<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Deep Learning-Based System for the MEDDOCAN Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dehuan Jiang</string-name>
          <email>jiangdehuan@stu.hit.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yedan Shen</string-name>
          <email>shenyedan@stu.hit.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shuai Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Buzhou Tang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xiaolong Wang</string-name>
          <email>wangxl@insun.hit.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qingcai Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruifeng Xu</string-name>
          <email>xuruifeng@hit.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jun Yan</string-name>
          <email>Jun.YAN@Yiducloud.cn</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yi Zhou</string-name>
          <email>zhouyi@sysu.edu.cn</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Harbin Institute of Technology Shenzhen Graduate School</institution>
          ,
          <addr-line>Shenzhen, China, 518055</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sun Yat-sen University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Yidu Cloud (Beijing) Technology Co., Ltd</institution>
          ,
          <addr-line>Beijing</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>761</fpage>
      <lpage>767</lpage>
      <abstract>
        <p>Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). IberLEF 2019, 24 September 2019, Bilbao, Spain.</p>
      </abstract>
      <kwd-group>
        <kwd>De-identification</kwd>
        <kwd>Protected Health Information</kwd>
        <kwd>medical document anonymization</kwd>
        <kwd>deep learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        De-identification is a prerequisite for accessing and sharing clinical records outside of
hospitals, which is very important for the secondary use of clinical data. In the past few
years, de-identification has attracted plenty of attention, and considerable effort has been
devoted to it, especially for clinical documents in English. Representative works are the
natural language processing (NLP) challenges that include a de-identification task on
clinical text, such as the i2b2 (the Center of Informatics for
Integrating Biology and Bedside) 2006 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and 2014 [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2-4</xref>
        ], and the N-GRID (the
Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-scale and
RDOC Individualized Domains) 2016 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. As these challenges are public and provide
manually annotated corpora for de-identification, they have attracted many research teams,
which have developed various kinds of systems [
        <xref ref-type="bibr" rid="ref6 ref7 ref8 ref9">6-9</xref>
        ]. According to the overview
report of the N-GRID 2016 NLP challenge [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the best system is a hybrid system
based on deep learning methods [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        In 2019, Martin Krallinger et al. organized a challenge task dedicated to the
de-identification of medical documents in Spanish, called the MEDDOCAN (Medical
Document Anonymization) task [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The organizers provided a training set of 500
clinical records, a development set of 250 clinical records, and a test set of 250 clinical
records embedded in a background set of 3751 clinical records. We participated in this
challenge task and developed a system based on recent deep learning methods such as
BERT (Bidirectional Encoder Representations from Transformers)
(https://github.com/google-research/bert) and flair
(https://github.com/zalandoresearch/flair). The system, developed on the training
and development sets, achieved a “strict” F1-score of 0.9646 at entity level, a “strict”
F1-score of 0.97 at span level, and a “merged” F1-score of 0.9821 at span level. Note
that the results reported here are new results obtained after we added a
post-processing module to fix tokenization errors at test time.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Material and Methods</title>
      <p>
        The overall architecture of our system for the MEDDOCAN task is shown in Fig. 1.
We first tokenized the raw clinical texts in Spanish, and then applied two separate
deep learning methods (i.e., BERT+CRF and flair) for de-identification.
The system is described in detail below.
      </p>
      <p>
        The organizers of the MEDDOCAN task provided participants with a synthetic
corpus of 1000 discharge summaries and medical genetics clinical records, manually
annotated by medical experts according to a guideline defining 22 types of protected
health information (PHI). The
corpus was divided into three parts: a training set of 500 records with 11,333 PHI
mentions, a development set of 250 records with 5801 PHI mentions, and a test set of
250 records with 5661 PHI mentions. The test set was embedded in a background set
of 3751 clinical records that had been manually split into sentences. The statistics of
the corpus, including the numbers of documents, sentences, and PHI mentions, are listed in
Table 1, where “NA” denotes unknown.
      </p>
      <p>
        Sentence splitting and tokenization are two important preprocessing steps for natural
language processing (NLP). We developed a simple rule-based system for both.
A document was split into sentences at ‘;’, ‘?’, ‘!’, ‘\n’, or any ‘.’ that is not part of a
number, and each sentence was tokenized by the method proposed by Liu et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
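      <p>The sentence-splitting rule can be sketched as follows. This is a minimal illustration, not the system's actual code: the function names are hypothetical, and reading “not in numbers” as “a ‘.’ with a digit on both sides” is an assumption.</p>
      <preformat>
```python
def is_boundary(text, i):
    """Return True if text[i] ends a sentence under the rule sketched here."""
    ch = text[i]
    if ch in ";?!\n":
        return True
    if ch == ".":
        # Assumption: a '.' with a digit on both sides is part of a number.
        prev_digit = i > 0 and text[i - 1].isdigit()
        next_digit = len(text) > i + 1 and text[i + 1].isdigit()
        return not (prev_digit and next_digit)
    return False


def split_sentences(text):
    """Split text into sentences, keeping each delimiter with its sentence."""
    sentences = []
    start = 0
    for i in range(len(text)):
        if is_boundary(text, i):
            piece = text[start:i + 1].strip()
            if piece:
                sentences.append(piece)
            start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Peso: 3.5 kg; estable"))
print(split_sentences("Domicilio: Av. de Jaén, 28."))
```
      </preformat>
      <p>With this rule the decimal point in “3.5” is kept, while the address is cut after the abbreviation “Av.”, which is the kind of over-splitting that the post-processing step is meant to repair.</p>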
      <sec id="sec-2-1">
        <title>Deep Learning Methods</title>
        <p>De-identification is a typical named entity recognition problem, which is usually
treated as a sequence labeling problem. In this study, we applied two deep learning
methods to the MEDDOCAN task, BERT+CRF and flair, as follows:</p>
        <p>BERT+CRF: a method that appends a conditional random field (CRF) layer to BERT.
We compared several settings of this method, including adding part-of-speech (POS) features and
further fine-tuning on the combined training and development sets.</p>
        <p>Flair: a sequence labeling method based on contextual string embeddings.</p>
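        <p>To illustrate what a CRF layer contributes on top of per-token scores, the following is a minimal sketch of Viterbi decoding for a linear-chain CRF. It is not the implementation used in this work: the three-tag set, the emission scores (standing in for encoder outputs), and the transition scores are all made-up values chosen for illustration.</p>
        <preformat>
```python
TAGS = ["O", "B", "I"]


def viterbi(emissions, transitions):
    """Return the highest-scoring tag index sequence for a linear-chain CRF.

    emissions[t][j] is the score of tag j at token t (hypothetical values
    standing in for encoder outputs); transitions[i][j] is the score of
    moving from tag i to tag j.
    """
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        new_scores = []
        pointers = []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Made-up scores: taken token by token, the best tags would be O then I,
# which is not a valid tag sequence.
emissions = [[2, 0, 0], [0, 3, 4]]
# A large penalty on the O -> I transition rules that sequence out.
transitions = [[0, 0, -100], [0, 0, 0], [0, 0, 0]]
print([TAGS[i] for i in viterbi(emissions, transitions)])
```
        </preformat>
        <p>In this toy case the decoder returns O followed by B rather than the locally best but inconsistent O followed by I, which is the usual motivation for adding a CRF layer to a token classifier.</p>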
      </sec>
      <sec id="sec-2-2">
        <title>Post-processing</title>
        <p>Because the clinical records in the test set had already been manually split into
sentences, we fixed the errors caused by our own sentence splitting by mapping our split
sentences back to the gold ones and combining neighboring PHI mentions of the same type.</p>
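        <p>The merging step might look like the sketch below. The paper does not define “neighbor” precisely, so the condition used here (two mentions of the same type separated only by whitespace in the gold sentence) is an assumption, and the function name and tuple layout are hypothetical.</p>
        <preformat>
```python
def merge_neighbors(mentions, text):
    """Merge adjacent same-type mentions separated only by whitespace.

    mentions: list of (start, end, label) character offsets into text.
    """
    merged = []
    for start, end, label in sorted(mentions):
        if merged:
            prev_start, prev_end, prev_label = merged[-1]
            gap = text[prev_end:start]
            # Assumption: "neighbor" means nothing but whitespace in between.
            if prev_label == label and gap.strip() == "":
                merged[-1] = (prev_start, max(prev_end, end), label)
                continue
        merged.append((start, end, label))
    return merged


text = "Domicilio: Av. de Jaén, 28."
first = text.index("Av.")
second = text.index("de Jaén, 28")
fragments = [
    (first, first + len("Av."), "CALLE"),
    (second, second + len("de Jaén, 28"), "CALLE"),
]
for start, end, label in merge_neighbors(fragments, text):
    print(label, repr(text[start:end]))
```
        </preformat>
        <p>Here the two “CALLE” fragments produced by an over-eager sentence split are rejoined into the single mention “Av. de Jaén, 28”.</p>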
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation</title>
        <p>System performance was measured by micro-averaged precision (P), recall (R),
and F1-score (F1) under three criteria: “strict” at entity level (track 1), “strict” at
span level (track 2), and “merged” at span level. “Strict” at entity level checks
whether a recognized PHI mention exactly matches a gold one of the same type;
“strict” at span level checks whether a recognized PHI mention has the same span as a
gold one regardless of type; and “merged” at span level is the “strict” span-level
criterion applied after merging the spans of PHI mentions connected by non-alphanumerical characters.
All evaluations were conducted on the independent test set, and the measures
were calculated with the tool provided by the MEDDOCAN organizers.</p>
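        <p>For readers who want to see the two “strict” criteria side by side, the sketch below computes micro-averaged scores from pooled mention sets. It is only an illustration with made-up mentions; the official scores come from the organizers' tool, and the function name and tuple layout here are hypothetical. The “merged” criterion is not covered.</p>
        <preformat>
```python
def micro_prf(gold, pred, use_type=True):
    """Micro-averaged precision, recall and F1 over mentions pooled across documents.

    Each mention is a (doc_id, start, end, label) tuple. With use_type=True a
    prediction must match a gold mention in span and label (entity level);
    with use_type=False only the span must match (span level).
    """
    def key(mention):
        doc_id, start, end, label = mention
        return (doc_id, start, end, label) if use_type else (doc_id, start, end)

    gold_keys = {key(m) for m in gold}
    pred_keys = {key(m) for m in pred}
    true_positives = len(gold_keys.intersection(pred_keys))
    precision = true_positives / len(pred_keys) if pred_keys else 0.0
    recall = true_positives / len(gold_keys) if gold_keys else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up mentions: one prediction has the right span but the wrong type,
# one is fully correct, and one is spurious.
gold = [("doc1", 0, 5, "CALLE"), ("doc1", 10, 14, "PROFESION")]
pred = [
    ("doc1", 0, 5, "INSTITUCION"),
    ("doc1", 10, 14, "PROFESION"),
    ("doc1", 20, 22, "PROFESION"),
]
print(micro_prf(gold, pred, use_type=True))   # entity level
print(micro_prf(gold, pred, use_type=False))  # span level
```
        </preformat>
        <p>The prediction with the wrong type counts as an error at entity level but as a hit at span level, so span-level scores are never lower than entity-level ones for the same output.</p>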
      </sec>
      <sec id="sec-2-4">
        <title>Experiments Setup</title>
        <p>In this study, PHI mentions were represented with “BIO” tags (B: beginning of a PHI
mention, I: inside a PHI mention, O: outside a PHI mention). The hyper-parameters and
parameter estimation algorithm listed in Table 2 were used in the deep learning
methods. The pre-trained neural language models
(https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip and https://github.com/zalandoresearch/flair) were used in BERT+CRF
and flair, respectively. The other parameters were optimized on the development set,
and all models were evaluated on the independent test set.
</p>
        <p>The micro-averaged precision, recall, and F1-score of our system under the three
criteria are listed in Table 3. BERT+CRF outperformed flair by about 0.3% in
F1-score because of its higher recall. When POS features were added, the performance of
BERT+CRF decreased slightly. When we further fine-tuned BERT+CRF on the
combination of the training and development sets, its performance changed
little. Our system achieved the highest “strict” F1-score of 0.9646 at entity level, a
“strict” F1-score of 0.97 at span level, and a “merged” F1-score of 0.9821 at span
level.
</p>
        <p>The new results reported here (shown in Table 3) are the results of our first
submissions (shown in Table 4) after post-processing. The large differences between the “strict”
F1-scores (track 2) and the “merged” F1-scores prompted us to look for errors caused by
sentence splitting. For example, in the sentence “Domicilio: Av. de Jaén, 28.”, “Av. de Jaén, 28”
is a single entity of type “CALLE”, but it was split into two “CALLE” entities, “Av.” and “de
Jaén, 28”, because the sentence was split into two sentences, “Domicilio: Av.” and “de Jaén,
28.” by ‘.’. These sentence-splitting errors result in an F1-score difference of about 0.04
between “strict” F1-scores and “merged” F1-scores. We can see that the post-processing
module brings a “strict” F1-score gain of 0.0245 for track 1 and a “strict” F1-score
gain of 0.0243 for track 2. The differences between “strict” F1-scores (track 2) and
“merged” F1-scores decrease sharply when the post-processing module is added.</p>
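        <p>As a small illustration of the BIO representation mentioned in the experiment setup, the sketch below converts character-offset mentions into per-token tags. The whitespace tokenizer is only a stand-in for the tokenizer actually used, and all function names are hypothetical.</p>
        <preformat>
```python
def whitespace_tokens(text):
    """Return (start, end) character offsets of whitespace-separated tokens."""
    offsets = []
    position = 0
    for word in text.split():
        start = text.index(word, position)
        end = start + len(word)
        offsets.append((start, end))
        position = end
    return offsets


def to_bio(token_offsets, mentions):
    """Assign a BIO tag to each token given (start, end, label) mentions."""
    tags = ["O"] * len(token_offsets)
    for m_start, m_end, label in mentions:
        first = True
        for i, (t_start, t_end) in enumerate(token_offsets):
            # A token belongs to the mention if it lies fully inside it.
            if t_start >= m_start and m_end >= t_end:
                tags[i] = ("B-" if first else "I-") + label
                first = False
    return tags


text = "Domicilio: Av. de Jaén, 28"
mentions = [(text.index("Av."), len(text), "CALLE")]
tokens = whitespace_tokens(text)
for (t_start, t_end), tag in zip(tokens, to_bio(tokens, mentions)):
    print(text[t_start:t_end], tag)
```
        </preformat>
        <p>The first token is tagged O, “Av.” opens the mention with B-CALLE, and the remaining tokens continue it with I-CALLE.</p>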
        <p>To analyze the errors of our system, we evaluated its performance on each category of
entity and found that the F1-scores on “PROFESION” and “INSTITUCION” are
much lower than those on other categories, except for “OTROS_SUJETO_ASISTENCIA”, on
which the F1-score is zero. There are three main reasons why these three categories of
entities are not well recognized. Firstly, some categories contain too few entities. For
example, there are only 15 entities of “OTROS_SUJETO_ASISTENCIA” in the
training and development sets combined, and only 7 in the test set. Secondly, entities of
“INSTITUCION” vary greatly in format. Thirdly, some entities may be missing from the
gold standard. For example, “militar” and “ex-operario de industria textil”,
which mean “soldier” and “ex-textile industry operator” respectively, are recognized
by our system but are not labeled in the gold standard.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusion</title>
      <p>In this study, we developed a deep learning-based system for the MEDDOCAN task,
a challenge dedicated to the de-identification of clinical text in Spanish. The system
achieves promising performance, and BERT+CRF outperforms flair. In the
future, we will investigate whether BERT and flair can be combined for
further improvement.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>This work was supported in part by the following grants: the National Natural Science
Foundation of China (NSFC) (U1813215, 61876052 and 61573118), the National Key Research and
Development Program of China (2017YFB0802204), the Special Foundation for
Technology Research Program of Guangdong Province (2015B010131010), the Strategic
Emerging Industry Development Special Funds of Shenzhen
(JCYJ20170307150528934 and JCYJ20180306172232154), and the Innovation Fund of
Harbin Institute of Technology (HIT.NSRIF.2017052).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Ö. Uzuner,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Luo</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Szolovits</surname>
          </string-name>
          ,
          <article-title>Evaluating the state-of-the-art in automatic deidentification</article-title>
          ,
          <source>Journal of the American Medical Informatics Association</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>5</issue>
          ,
          <year>2007</year>
          , pp.
          <fpage>550</fpage>
          -
          <lpage>563</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Stubbs</surname>
          </string-name>
          and Ö. Uzuner,
          <article-title>Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus</article-title>
          ,
          <source>Journal of biomedical informatics</source>
          , vol.
          <volume>58</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>S20</fpage>
          -
          <lpage>S29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Ö. Uzuner and
          <string-name>
            <given-names>A.</given-names>
            <surname>Stubbs</surname>
          </string-name>
          ,
          <article-title>Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks</article-title>
          ,
          <source>Journal of biomedical informatics</source>
          , vol.
          <volume>58</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>S1</fpage>
          -
          <lpage>S5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>A.</given-names>
            <surname>Stubbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Kotfila</surname>
          </string-name>
          and Ö. Uzuner,
          <article-title>Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1</article-title>
          ,
          <source>Journal of biomedical informatics</source>
          , vol.
          <volume>58</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>S11</fpage>
          -
          <lpage>S19</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Stubbs</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Filannino</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            <given-names>Ö</given-names>
          </string-name>
          .
          <article-title>De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID Shared Tasks Track 1</article-title>
          .
          <source>Journal of biomedical informatics</source>
          ,
          <year>2017</year>
          ,
          <volume>75</volume>
          :
          <fpage>S4</fpage>
          -
          <lpage>S18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Meystre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.J.</given-names>
            <surname>Friedlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.R.</given-names>
            <surname>South</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.H.</given-names>
            <surname>Samore</surname>
          </string-name>
          ,
          <article-title>Automatic deidentification of textual documents in the electronic health record: a review of recent research</article-title>
          ,
          <source>BMC medical research methodology</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2010</year>
          , pp.
          <fpage>70</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>O.</given-names>
            <surname>Ferrández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.R.</given-names>
            <surname>South</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.J.</given-names>
            <surname>Friedlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.H.</given-names>
            <surname>Samore</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Meystre</surname>
          </string-name>
          ,
          <article-title>Evaluating current automatic de-identification methods with Veteran's health administration clinical documents</article-title>
          ,
          <source>BMC medical research methodology</source>
          , vol.
          <volume>12</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2012</year>
          , pp.
          <fpage>109</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>L.</given-names>
            <surname>Deleger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Savova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lingren</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Marsolo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jegga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Stoutenborough</surname>
          </string-name>
          ,
          <article-title>Large-scale evaluation of automated clinical note deidentification and its impact on information extraction</article-title>
          ,
          <source>Journal of the American Medical Informatics Association: JAMIA</source>
          , vol.
          <volume>20</volume>
          , no.
          <issue>1</issue>
          ,
          <year>2013</year>
          , pp.
          <fpage>84</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Deng</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Automatic de-identification of electronic medical records using token-level and character-level conditional random fields</article-title>
          ,
          <source>Journal of Biomedical Informatics</source>
          , vol.
          <volume>58</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>S47</fpage>
          -
          <lpage>S52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Liu</surname>
            <given-names>Z</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            <given-names>B</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            <given-names>X</given-names>
          </string-name>
          , et al.
          <article-title>De-identification of clinical notes via recurrent neural network and conditional random field</article-title>
          .
          <source>Journal of biomedical informatics</source>
          ,
          <year>2017</year>
          ,
          <volume>75</volume>
          :
          <fpage>S34</fpage>
          -
          <lpage>S42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Marimon</surname>
            <given-names>Montserrat</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            <given-names>Aitor</given-names>
          </string-name>
          , et al.
          <article-title>Automatic de-identification of medical texts in Spanish: the MEDDOCAN track, corpus, guidelines, methods and evaluation of results</article-title>
          ,
          <source>Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>