<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Recurrent neural networks with specialized word embedding for Chinese Clinical Named Entity Recognition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhenzhen Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qun Zhang</string-name>
          <email>chriszhang0511@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Liu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dawei Feng</string-name>
          <email>davyfeng.c@qq.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhen Huang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>College of Computer, National University of Defense Technology</institution>
          ,
          <addr-line>Changsha, Hunan 410073, China</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Extracting medical clinical entity mentions from patient clinical records is an essential step in clinical research. Recently, many researchers have employed neural architectures to tackle the similar tasks of clinical concept extraction and drug name recognition from English clinical records, and have made prominent progress. However, most previous systems for Chinese Clinical Named Entity Recognition (CNER) have focused on a combination of text “feature engineering” and conventional machine learning algorithms. In this paper, we propose a neural network system based on bidirectional LSTMs and a CRF for Chinese CNER. We also use health-domain datasets to create richer, more specialized word embeddings; combined with external health-domain lexicons, the performance is further improved.</p>
      </abstract>
      <kwd-group>
        <kwd>Neural networks</kwd>
        <kwd>Chinese Clinical Named Entity Recognition</kwd>
        <kwd>specialized word embedding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Patient clinical records contain longitudinal records of a patient's health, diseases, conducted tests and responses to treatment, and are often useful for epidemiologic and clinical research. Extracting this information is thus of immense value both for clinical practice and for improving the quality of patient care while reducing healthcare costs. The evaluation task of Clinical Named Entity Recognition (CNER) aims to identify medical clinical entity mentions in Electronic Health Record narratives and classify them into predefined categories, such as disease, symptom, examination, etc.</p>
      <p>
        Traditional approaches to Named Entity Recognition (NER) relied on rule-based systems or dictionaries (lexicons) using string comparison to identify the entity mentions of interest. Although these systems achieve high precision, they suffer from low recall and are hard to scale. Much related research regards NER as a sequence labelling task; the applied methods, including Conditional Random Fields (CRFs) and Hidden Markov Models (HMMs), try to jointly infer the most likely label sequence for a given sentence. However, these methods rely heavily on hand-crafted features and task-specific resources, which are costly to develop. To overcome these limitations, neural networks have been applied to this task [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and have achieved competitive performance. This paper employs a bidirectional LSTM-CRF for automatic feature learning, thus avoiding time-consuming feature engineering.
      </p>
      <p>
        As to Chinese NER, Chinese has more complicated properties, for example, the lack of word boundaries, complex composition forms, uncertain entity lengths and so on [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. To obtain representations that embed more precise semantics, sentences are segmented into words or phrases by NLP tools, and each word or phrase is represented by a numerical vector (its embedding). Since the task of CNER is limited to a specific domain, CNER systems should also focus on text with specific dictionaries and topics, together with dedicated sets of named entities. The NER system [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has provided evidence that embeddings retrained on a domain-specific dataset can help learn vector representations for domain-specific words and increase classification accuracy. Therefore, this paper crawls a Chinese health-domain corpus to create richer, more specialized word embeddings.
      </p>
      <p>In this paper, we develop a neural network architecture for Chinese clinical named entity recognition with more complex and specialized word embeddings. Moreover, we use external clinical lexicons to optimize the extracted entity mentions, which proves to be effective. External clinical lexicons are also used to label the unlabeled dataset, giving us a sizable, albeit noisy, training dataset that further improves the neural architecture's performance.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Model</title>
      <p>
        Our neural network is inspired by the work of Lample et al. (2016) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where
feature vectors are computed by lookup tables and concatenated together, and
then fed into a BiLSTM neural network. Instead of using general-purpose,
pre-trained word embeddings, we retrain word embeddings on health-related
and clinical datasets.
      </p>
      <sec id="sec-2-1">
        <title>2.1 Input Word Embeddings</title>
        <p>Specialized Word Embeddings A word embedding maps a word to a numerical vector in a vector space, where semantically similar words are expected to be assigned similar vectors. To perform this mapping, we use the gensim tool and choose the CBOW (Continuous Bag-Of-Words) algorithm to train a word2vec model.</p>
        <p>The datasets used to train the embeddings consist of two parts. One is
crawled from several Chinese health-related or clinical websites, such as 39 Health (http://www.39.net/), a healthcare question-and-answer community (http://club.xywy.com/), a medical encyclopedia (http://www.a-hospital.com/) and so on. The crawled
corpora contain the descriptions about diseases, the symptoms, the therapeutic
methods and the doctor’s responses to the messages, which amount to over one
million sentences. The other corpus is published by the CNER Evaluation,
including the labeled data and unlabeled data. All these corpora are preprocessed
by removing unwanted characters, such as special characters, punctuation and
stop words. Then the specialized word embeddings are trained on these datasets,
which contain over 20 thousand unique words.</p>
        <p>As a matter of fact, in health corpora it is common to find technical and unusual words that are specific to the health domain. Such datasets can therefore generate good embeddings in many cases. However, for the domain-specific task of CNER they still suffer from some lack of vocabulary. These out-of-vocabulary words are initialized with random assignments, and we also use character-level embeddings to complement their semantics.</p>
        <p>
          Character-level embeddings Following Lample et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] we also add character-level embeddings of the words and phrases. A Chinese character is the minimum unit of a segmented word or phrase and can reflect part of the phrase's semantics. A character lookup table, initialized at random, contains an embedding for every character. The character embeddings corresponding to every character in a phrase are fed in direct and reverse order to a forward and a backward LSTM. This character-level representation is then concatenated with the word-level representation from a word lookup table.
        </p>
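<p>A minimal sketch of this character-level representation (in PyTorch; the class name and dimensions are our own illustrative choices, not the paper's settings):</p>

```python
import torch
import torch.nn as nn

class CharBiLSTMEmbedding(nn.Module):
    """Character-level word representation: embed each character, run a
    BiLSTM over the character sequence, and concatenate the final
    forward and backward hidden states."""

    def __init__(self, n_chars, char_dim=25, char_hidden=25):
        super().__init__()
        # Character lookup table, initialized at random
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)

    def forward(self, char_ids):            # char_ids: (1, word_len)
        x = self.char_emb(char_ids)         # (1, word_len, char_dim)
        _, (h, _) = self.char_lstm(x)       # h: (2, 1, char_hidden)
        # Concatenate the last forward and last backward hidden states
        return torch.cat([h[0], h[1]], dim=-1)   # (1, 2 * char_hidden)
```

<p>The resulting vector is concatenated with the word-level embedding before entering the sentence-level BiLSTM.</p>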
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Bidirectional LSTM-CRF Networks</title>
        <p>
          We provide a brief introduction on the hybrid tagging architecture, which is
based on LSTMs and CRFs. The architecture is similar to the ones presented
by Lample et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which show that NER can be successfully resolved as a sequence labeling task.
Tagging scheme The task of CNER is to assign a named entity label to every word in a sentence. A single named entity can span several tokens within a sentence. Sentences are usually represented in the IOB (Inside, Outside, Beginning) format. In this paper, we use the IOBES (Inside, Outside, Beginning, Ending, Singleton) tagging scheme, which encodes more information about the following tag.
        </p>
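<p>The conversion from IOB to IOBES can be sketched in plain Python (the category names in the example are illustrative):</p>

```python
def iob_to_iobes(tags):
    """Convert an IOB tag sequence to IOBES: a one-token span becomes
    S-X, and the last token of a multi-token span becomes E-X."""
    iobes = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 != len(tags) else "O"
        if tag == "O":
            iobes.append("O")
        elif tag.startswith("B-"):
            # Singleton when the span ends immediately
            iobes.append(tag if nxt == "I-" + tag[2:] else "S-" + tag[2:])
        else:  # an "I-" tag
            # End of span when the next tag does not continue it
            iobes.append(tag if nxt == "I-" + tag[2:] else "E-" + tag[2:])
    return iobes

iob_to_iobes(["B-Disease", "I-Disease", "I-Disease", "O", "B-Body"])
# → ["B-Disease", "I-Disease", "E-Disease", "O", "S-Body"]
```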
        <p>
          BiLSTM-CRF The Long Short-Term Memory (LSTM) network is designed to learn long-term dependencies in sequences by incorporating a gated memory cell. It does so using several gates that control the proportion of the input to give to the memory cell, and the proportion of the previous state to forget [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In this paper, we use a popular LSTM variant, introduced by [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which adds “peephole connections”: the gate layers are allowed to look at the cell state. We also use coupled forget and input gates. The implementation follows these equations:
it = σ(Wxi xt + Whi ht−1 + Wci ct−1 + bi)   (1)
C̃t = tanh(Wxc xt + Whc ht−1 + bc)   (2)
Ct = it ⊙ C̃t + (1 − it) ⊙ Ct−1   (3)
ot = σ(Wxo xt + Who ht−1 + Wco ct + bo)   (4)
ht = ot ⊙ tanh(Ct)   (5)
where σ is the element-wise sigmoid function and ⊙ is the element-wise product. In the bidirectional LSTM, for any given sentence, the network computes both a left-context representation, →ht, and a right-context representation, ←ht, at every input xt. The final representation is created by concatenating them: ht = [→ht ; ←ht].
        </p>
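<p>As a toy illustration (not an optimized implementation), Equations (1)-(5) translate directly into code; here W is a dictionary holding the weight matrices and biases named above:</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W):
    """One step of the peephole LSTM variant with coupled input/forget
    gates, following Eqs. (1)-(5)."""
    i_t = sigmoid(W["xi"] @ x_t + W["hi"] @ h_prev + W["ci"] @ c_prev + W["bi"])  # (1)
    c_tilde = np.tanh(W["xc"] @ x_t + W["hc"] @ h_prev + W["bc"])                 # (2)
    c_t = i_t * c_tilde + (1.0 - i_t) * c_prev        # (3): coupled gates, forget = 1 - input
    o_t = sigmoid(W["xo"] @ x_t + W["ho"] @ h_prev + W["co"] @ c_t + W["bo"])     # (4): peephole on c_t
    h_t = o_t * np.tanh(c_t)                          # (5)
    return h_t, c_t
```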
        <p>The output vectors of the BiLSTM are fed to a CRF layer to jointly decode the best label sequence. Since there are strong dependencies across output labels (e.g., I-PER cannot follow B-ORG), the CRF layer can exploit these dependencies and decide the final output labels jointly. The labels are typically predicted using a Viterbi-style algorithm, which provides the optimal prediction for the sequence as a whole. Combining continuous output labels of the same type, we get the candidate entity mentions of interest.</p>
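<p>A generic sketch of Viterbi decoding over BiLSTM emission scores and CRF transition scores (simplified: start/stop transitions, which a full CRF layer would include, are omitted):</p>

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label sequence.

    emissions:   (seq_len, n_labels) per-token scores from the BiLSTM
    transitions: (n_labels, n_labels) score of moving from label i to j
    """
    seq_len, n_labels = emissions.shape
    score = emissions[0].copy()                 # best score ending in each label
    backptr = np.zeros((seq_len, n_labels), dtype=int)
    for t in range(1, seq_len):
        # total[i, j] = best score of a path ending with labels i -> j at step t
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = total.argmax(axis=0)
        score = total.max(axis=0)
    # Follow back-pointers from the best final label
    best = [int(score.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]
```

<p>Setting a large negative transition score for a forbidden pair (such as B-ORG to I-PER) makes the decoder avoid that pair in the returned sequence.</p>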
        <p>Optimizing with external knowledge It is undeniable that external knowledge such as lexicons or knowledge bases is crucial to NER systems, especially in specific domains. Thus, we construct lexicons from annotated data and online data, related to the five categories (Symptom, Examination, Disease, Treatment, Body) defined by the CCKS 2017 CNER Evaluation task. The lexicons are applied to rectify some incorrectly extracted entity mentions by rules. For example, the entity name “ ” is extracted by our system automatically, combined from two tokens, “ ”, while the gold label at the same position is “ ”. This is due to wrong word segmentation: after segmentation, if “ ” is regarded as one token, the system can never automatically pick out “ ” as an entity name. With the help of the lexicons, such errors can be fixed by replacing the extracted entity mention with the matched entry in the lexicon. A match is successful when the extracted entity mention overlaps an entry in the lexicon with the same type and the entry also appears in the original sentence.</p>
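<p>The matching rule can be sketched as follows (function and argument names are illustrative, not the paper's implementation):</p>

```python
def rectify(mention, mtype, sentence, lexicon):
    """Replace a system-extracted mention with a lexicon entry when an
    entry of the same type overlaps the mention and also appears in the
    original sentence; otherwise keep the mention unchanged.

    lexicon is a list of (entry_string, entity_type) pairs.
    """
    for entry, etype in lexicon:
        overlaps = entry in mention or mention in entry
        if etype == mtype and overlaps and entry in sentence:
            return entry
    return mention
```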
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Evaluation</title>
      <p>Evaluation was performed on the CCKS-2017 CNER shared task dataset. We ran each experiment multiple times and retained the best hyper-parameters. We also applied a dropout mask to the final embedding layer just before the input to the BiLSTM network.</p>
      <sec id="sec-3-1">
        <title>3.1 Datasets</title>
        <p>The training set has 1198 electronic medical records, and the test set has 796
records. The five predefined categories are symptoms and signs, examination
and inspection, disease and diagnosis, treatment, body parts, which are
abbreviated as Symptom, Examination, Disease, Treatment, Body. Table 1 shows the
quantity of entity mentions labeled in the datasets from 5 categories.
As Table 2 shows, with the raw training dataset the performance of two categories (Treatment, Disease) is poor, so we reasonably guess that this is caused by the imbalance of the training data. Therefore, we repeat all the sentences from the training dataset that contain entity mentions of the Treatment category twice, and five times for the Disease category. After verifying the effectiveness of this method, we test our model on the revised training dataset.</p>
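<p>This data-revision step can be sketched as follows (names and the exact repetition semantics are our reading of the text above):</p>

```python
def revise_dataset(sentences, repeats=None):
    """Oversample sentences containing entities of under-represented
    categories. `sentences` is a list of (tokens, labels) pairs with
    IOB/IOBES-style labels such as "B-Disease"."""
    repeats = repeats or {"Treatment": 2, "Disease": 5}
    revised = []
    for tokens, labels in sentences:
        # Entity categories mentioned in this sentence
        cats = {lab.split("-", 1)[1] for lab in labels if "-" in lab}
        times = max([repeats.get(c, 1) for c in cats] or [1])
        revised.extend([(tokens, labels)] * times)
    return revised
```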
        <p>For the dataset, we performed the following preprocessing:
– All sequences of digits 0-9 are replaced by a single “0”.</p>
        <p>– All sentences are segmented by the HanLP tool.</p>
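<p>The digit-normalization step can be sketched with a one-line regular expression (HanLP segmentation itself is omitted here):</p>

```python
import re

def normalize_digits(tokens):
    """Replace every run of the digits 0-9 inside each token with a
    single "0", as in the preprocessing above."""
    return [re.sub(r"[0-9]+", "0", tok) for tok in tokens]

normalize_digits(["CT123", "2017", "scan"])
# → ["CT0", "0", "scan"]
```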
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Evaluation Methodology</title>
        <p>In this work, we employ the “strict” evaluation method, where both the
entity class and its exact boundaries are expected to be correct. We report the
performance of the model in terms of the F1-score.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Results and Analysis</title>
        <table-wrap id="tbl2">
          <label>Table 2</label>
          <caption><p>Results with different training datasets</p></caption>
          <table>
            <thead>
              <tr><th>Dataset/F1-score</th><th>Symptom</th><th>Examination</th><th>Disease</th><th>Treatment</th><th>Body</th><th>Overall</th></tr>
            </thead>
            <tbody>
              <tr><td>raw training dataset</td><td>0.9094</td><td>0.9000</td><td>0.6728</td><td>0.6686</td><td>0.7393</td><td>0.8250</td></tr>
              <tr><td>revised training dataset</td><td>0.8199</td><td>0.8734</td><td>0.7054</td><td>0.8607</td><td>0.8049</td><td>0.8301</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Thus we can conclude that this method of revising the training dataset is effective for the task. Our architecture has several components with different impacts on the overall performance. Table 3 presents our comparison of different word embeddings. Compared with random assignments, our model obtained better performance with specialized word embeddings by +2.7%. With the help of the lexicons, the F1-score gained a further 2.2%.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Conclusion</title>
      <p>In this paper, we set out to investigate the effectiveness of the BiLSTM-CRF architecture with specialized word embeddings for Chinese clinical named entity recognition, and compared it with a baseline neural network model. As input features, we applied combinations of specialized word embeddings with character-level embeddings. The flexible application of the lexicons also improves the performance further.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Lample</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            <given-names>S</given-names>
          </string-name>
          , et al.
          <article-title>Neural architectures for named entity recognition[J]</article-title>
          <source>In Proc. of NAACL-2016</source>
          , San Diego, California, USA, June.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Duan</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zheng</surname>
            <given-names>Y.</given-names>
          </string-name>
          <article-title>A study on features of the CRFs-based Chinese Named Entity Recognition</article-title>
          [J].
          <source>International Journal of Advanced Intelligence</source>
          ,
          <year>2011</year>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>287</fpage>
          -
          <lpage>294</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Unanue</surname>
            <given-names>I J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borzeshi</surname>
            <given-names>E Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piccardi</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>Recurrent neural networks with specialized word embeddings for health-domain named-entity recognition[J]</article-title>
          .
          <source>CoRR:abs/1706.09569</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural Computation</source>
          ,
          <year>1997</year>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Gers</surname>
            <given-names>F A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Recurrent nets that time and count</article-title>
          [C]//
          <source>Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000)</source>
          . IEEE,
          <year>2000</year>
          ,
          <volume>3</volume>
          :
          <fpage>189</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>