<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Protected Health Information Recognition by BiLSTM-CRF</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Cristobal Colon-Ruiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Isabel Segura- B</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad Carlos III de Madrid</institution>
          ,
          <addr-line>Leganes</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>679</fpage>
      <lpage>686</lpage>
      <abstract>
        <p>Medical records contain relevant information about patients, which can be beneficial to improving healthcare and research in the clinical domain. For this reason, there is growing interest in developing automatic methods to extract and exploit the information in medical records. However, medical records also contain protected health information about patients. To protect the confidentiality and privacy of patients, this sensitive information should be removed prior to any processing of these documents. In this paper, we describe an architecture for the detection and identification of protected health information in medical records. The architecture is composed of two bidirectional Long Short-Term Memory layers and a final layer based on Conditional Random Fields. Our system participated in the Meddocan shared task, obtaining a micro-F1 of 93.22%.</p>
      </abstract>
      <kwd-group>
        <kwd>Anonymization</kwd>
        <kwd>De-identification</kwd>
        <kwd>Long Short-Term Memory</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In recent years, the number of electronic health records (EHRs) has
increased massively. These records are very useful resources for studies
focused on the detection and prevention of diseases and on medical decision
making, among others. However, health records contain protected health
information (PHI): for instance, information about family history, treatment
tracking, and data that may help to identify a given patient (for example, the
patient's name, address, telephone number, or zip code). Because of this
protected information, medical records cannot be shared without a prior
de-identification process. This process consists of detecting and subsequently
replacing or removing all protected information from the records.</p>
      <p>
        The interest in addressing de-identification problems motivated the proposal
of two tasks, the 2006 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and 2014 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] de-identification tracks, organized by
Informatics for Integrating Biology and the Bedside (i2b2). These tracks have
significantly influenced the Natural Language Processing (NLP) community in the
medical field, in particular for the task of automated text anonymization.
Nevertheless, these tasks focused only on records written in English. Meddocan
2019 proposed the first task for the anonymization of medical records in Spanish [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        The identification of PHI in medical documents can be addressed as a
Named Entity Recognition (NER) problem. This problem has been widely
studied, and the approaches normally used can be divided into several main
categories: dictionary- and rule-based systems, machine learning, deep learning,
and hybrid systems. Dictionary-based methods are limited by the size of the
dictionaries themselves, by the constant growth of vocabulary, and by
spelling errors. Rule-based approaches usually provide high precision, but
they rarely cover all existing cases because of the complexity
of the language. Furthermore, rule-based and machine learning methods require
the prior generation of syntactic and semantic features, as well as
domain-specific information. Approaches based on deep learning automatically
learn relevant patterns, allowing a certain degree of independence from language
and domain. Moreover, these approaches have been shown to achieve better
results than the best hybrid systems in the i2b2 tasks. The system described in [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
which was based on Long Short-Term Memory (LSTM) layers combined with
Conditional Random Field (CRF) layers, achieved an F1 of 97.87%, surpassing the
winning i2b2 2014 system [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which scored 96.11% and was based on a hybrid model
combining conditional random fields with keyword and rule-based approaches.
      </p>
      <p>
        Considering the above, in this paper, we propose the use of an adaptation
of the NeuroNer tool [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] for sub-tasks 1 and 2 of MEDDOCAN-2019 on
Spanish records. This tool combines two bidirectional Long Short-Term
Memory (BiLSTM) layers with a final Conditional Random Fields layer.
The rest of the paper is organized as follows. Section 2 briefly describes the
datasets provided for the MEDDOCAN-2019 task. In Section 3, we describe
the architecture of our system. Section 4 presents the results obtained by our
system. In Section 5, we provide the conclusions.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>The training and development sets provided in the MEDDOCAN-2019 task are
composed of 500 and 250 clinical cases, respectively, and contain 22 different types
of PHI entities, listed in Table 5. The distribution of entity types is similar
in both sets, but it is unbalanced across types.</p>
      <p>The clinical cases are initially provided in BRAT format<sup>1</sup>, a standoff format
where the annotations are stored separately from the original text, in a
similar way to the BioNLP Shared Task standoff format<sup>2</sup>.</p>
    </sec>
    <sec id="sec-3">
      <title>Methods and system description</title>
      <sec id="sec-3-1">
        <title>Pre-processing</title>
        <p>We pre-process the text of the clinical cases in several steps.
First, sentences are split using spaCy (spacy.io), an open-source library that
provides support for texts in several languages, including Spanish. Due to the
nature of the language used in this type of text, we have defined a set of rules
to avoid treating dots that belong to acronyms or abbreviations as sentence
separators.</p>
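        <p>The rules themselves are not listed in this paper. As a minimal sketch of the idea, the following pure-Python splitter (which does not use spaCy) ignores a sentence-final dot when it closes a known abbreviation. The abbreviation list, the regular expression, and the function name are illustrative assumptions.</p>
        <preformat><![CDATA[
```python
import re

# Hypothetical abbreviation list; the rules used in the paper are not published.
ABBREVIATIONS = {"dr.", "dra.", "sr.", "sra.", "avda.", "tel."}

# A candidate boundary: '.', '!' or '?' followed by whitespace and then an
# uppercase letter or an opening Spanish punctuation mark.
BOUNDARY = re.compile(r"[.!?]\s+(?=[A-ZÁÉÍÓÚÑ¿¡])")


def split_sentences(text):
    """Split text into sentences, skipping dots that end an abbreviation."""
    sentences, start = [], 0
    for match in BOUNDARY.finditer(text):
        end = match.start() + 1  # position just after the punctuation mark
        words = text[start:end].split()
        last_token = words[-1].lower() if words else ""
        if last_token in ABBREVIATIONS:
            continue  # the dot belongs to an abbreviation, keep scanning
        sentences.append(text[start:end].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = "Remitido por el Dr. Ramos Martin. Acude a consulta el 12 de mayo."
print(split_sentences(example))
```
]]></preformat>
        <p>In this example the dot after "Dr" is not treated as a separator, so the text is split into two sentences rather than three.</p>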
        <p>Afterward, the resulting sentences are tokenized using spaCy. However,
some tokens follow the pattern "field:value". For instance, the token
"cp:28007", which refers to a postal code, should be split into the corresponding
field ("cp") and its value ("28007"). For this reason, we have included a set of rules
to correctly split tokens of this kind.</p>
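        <p>A rule of this kind can be sketched with a regular expression. The pattern below is an assumption (an alphabetic field name, a colon, and a value that does not start with a slash, so that URLs are left untouched), and keeping the colon as a separate token is also a choice made only for this illustration.</p>
        <preformat><![CDATA[
```python
import re

# Assumed pattern: alphabetic field, ':', value not starting with '/'.
FIELD_VALUE = re.compile(r"([A-Za-zÁÉÍÓÚÜÑáéíóúüñ]+):([^/\s]\S*)")


def split_field_value(tokens):
    """Split tokens such as 'cp:28007' into field, colon and value."""
    output = []
    for token in tokens:
        match = FIELD_VALUE.fullmatch(token)
        if match:
            output.extend([match.group(1), ":", match.group(2)])
        else:
            output.append(token)  # leave every other token unchanged
    return output


print(split_field_value(["cp:28007", "10:30", "http://example.org"]))
```
]]></preformat>
        <p>Here only "cp:28007" is split; the time expression and the URL do not start with an alphabetic field followed by a plain value, so they are kept as single tokens.</p>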
        <p>
          Finally, the text and its annotations are transformed into the CoNLL-2003
format<sup>3</sup> using the BIOES schema [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. In this schema, tokens are annotated using
the following tags: the B tag marks a token that is the beginning of an entity,
the I tag indicates that the token is inside an entity, the O tag indicates that
the token does not belong to any entity, the E tag marks a token as the end of
an entity, and the S tag indicates that an entity consists of a single
token.
        </p>
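        <p>The BIOES encoding can be illustrated with a short sketch that maps character-level annotations (as in a standoff format) onto token-level tags. The whitespace tokenizer, the function names, and the example sentence are assumptions made for illustration; the entity labels are taken from the MEDDOCAN label set.</p>
        <preformat><![CDATA[
```python
import re


def whitespace_spans(text):
    """Character offsets (start, end) of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def bioes_tags(token_spans, entities):
    """Return one BIOES tag per token.

    token_spans: list of (start, end) character offsets, one per token.
    entities: non-overlapping (start, end, label) character spans.
    """
    tags = ["O"] * len(token_spans)
    for ent_start, ent_end, label in entities:
        inside = [i for i, (start, end) in enumerate(token_spans)
                  if start >= ent_start and end <= ent_end]
        if not inside:
            continue  # annotation is not aligned with any token
        if len(inside) == 1:
            tags[inside[0]] = "S-" + label
        else:
            tags[inside[0]] = "B-" + label
            for i in inside[1:-1]:
                tags[i] = "I-" + label
            tags[inside[-1]] = "E-" + label
    return tags


text = "Paciente Juan Perez Garcia de Madrid"
entities = [(9, 26, "NOMBRE_SUJETO_ASISTENCIA"), (30, 36, "TERRITORIO")]
for (start, end), tag in zip(whitespace_spans(text), bioes_tags(whitespace_spans(text), entities)):
    print(text[start:end], tag)
```
]]></preformat>
        <p>The three-token name receives the tags B, I and E, the single-token location receives S, and the remaining tokens receive O.</p>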
      </sec>
      <sec id="sec-3-2">
        <title>Network description</title>
        <p>
          As noted in Section 1, approaches that combine bidirectional LSTM
layers with a CRF layer provide good results in NER tasks [
          <xref ref-type="bibr" rid="ref2 ref5">5, 2</xref>
          ].
Bidirectional LSTMs are a type of recurrent neural network (RNN) that takes into
account the context of words in the sentence by capturing past (previous words)
and future (next words) information. In addition, to improve the accuracy of the
predictions provided by the BiLSTM layer, the CRF layer uses information from
the neighboring tags (at sentence level) to predict the current tag.
Considering the above, to address the de-identification problem described in
the MEDDOCAN-2019 task, we propose to use the NeuroNer tool [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], which is
composed of three main layers:
1. Token representation with character-enhanced and token embedding layer.
2. BiLSTM prediction layer.
3. CRF sequence optimization layer.
Footnotes:
1 http://brat.nlplab.org/standoff.html
2 http://2011.bionlp-st.org/home/file-formats
3 https://www.clips.uantwerpen.be/conll2003/ner/
        </p>
        <p>
          The first layer generates vector representations of the tokens that
make up the input sequences. The direct mapping from token to vector (word
embedding) can be pre-trained or can be learned jointly with the rest
of the model by adjusting its weights. Pre-trained models can be obtained from
a large amount of unlabeled data with methods such as word2vec or GloVe [
          <xref ref-type="bibr" rid="ref7 ref8">7,
8</xref>
          ]. However, word embedding models do not contain representations
for tokens that are not included in their vocabularies. The first layer addresses this
problem by incorporating a representation of tokens based on their characters
(character embeddings). Each character of a token is represented by its own vector,
allowing the network to learn morphological information even from tokens that
are not included in the vocabulary of the word embedding model [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>The character embedding sequence of each token is passed as input to a
BiLSTM to obtain a character-based word embedding as output. Then, the
word embedding and the character-based word embedding are
concatenated for each token, and the result is the input to the second BiLSTM layer.
This BiLSTM layer outputs, for each token, the sequence of probabilities
of belonging to each label under the BIOES coding. The label for each token
will be the one with the highest probability.</p>
        <p>The last layer is a conditional random fields layer. This layer receives
as input the sequence of probabilities from the previous layer and improves the
predictions by taking into account the
dependencies between labels. The output of this layer is the
most probable sequence of labels.</p>
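        <p>Finding the most probable label sequence in a linear-chain CRF is typically done with Viterbi decoding. The following sketch is not the NeuroNer implementation: the per-token scores and transition scores are hypothetical values chosen for illustration, whereas in a trained model they would be learned.</p>
        <preformat><![CDATA[
```python
def viterbi_decode(emissions, transitions, labels):
    """Return the highest-scoring label sequence.

    emissions: list over tokens of {label: score}.
    transitions: {(previous_label, current_label): score}; missing pairs score 0.0.
    """
    if not emissions:
        return []
    best = {label: emissions[0][label] for label in labels}
    backpointers = []
    for scores in emissions[1:]:
        new_best, pointers = {}, {}
        for cur in labels:
            # Best previous label for reaching `cur` at this position.
            prev = max(labels, key=lambda p: best[p] + transitions.get((p, cur), 0.0))
            new_best[cur] = best[prev] + transitions.get((prev, cur), 0.0) + scores[cur]
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda label: best[label])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


labels = ["O", "B-X", "E-X"]
emissions = [
    {"O": 0.1, "B-X": 2.0, "E-X": 0.0},
    {"O": 1.0, "B-X": 0.0, "E-X": 0.9},
]
# Hypothetical transition scores that penalize invalid BIOES sequences.
transitions = {("B-X", "O"): -5.0, ("B-X", "E-X"): 1.0, ("O", "E-X"): -5.0}

greedy = [max(scores, key=scores.get) for scores in emissions]
print(greedy)                                          # per-token argmax
print(viterbi_decode(emissions, transitions, labels))  # sequence-level decoding
```
]]></preformat>
        <p>With these scores, choosing the best label for each token independently gives the invalid sequence B-X, O, while sequence-level decoding returns B-X, E-X because the transition scores penalize leaving an entity open.</p>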
        <p>The embedding parameters and hyperparameters of our model used
for the MEDDOCAN-2019 task are listed below:
– Word embeddings: randomly initialized and adjusted during training. The dimension of the vectors is 100.
– Character embeddings: randomly initialized and adjusted during training. The dimension of the vectors is 25.
– First BiLSTM hidden state dimension: 25 for the forward and backward layers.
– Second BiLSTM hidden state dimension: 100 for the forward and backward layers.
– Optimizer: stochastic gradient descent (SGD), learning rate: 0.01.
– Dropout: 0.5.
– Number of epochs: 100.</p>
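        <p>To make the resulting vector sizes explicit, the values above can be collected in a small configuration object. The class and field names are hypothetical (they are not NeuroNer parameter names), and the derived sizes assume that each hidden dimension applies per direction and that the forward and backward states are concatenated.</p>
        <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run0Config:
    """Hyperparameters listed above; names are illustrative only."""
    word_embedding_dim: int = 100
    char_embedding_dim: int = 25
    char_lstm_hidden_dim: int = 25    # assumed to be per direction
    token_lstm_hidden_dim: int = 100  # assumed to be per direction
    optimizer: str = "sgd"
    learning_rate: float = 0.01
    dropout: float = 0.5
    epochs: int = 100

    @property
    def char_based_word_dim(self):
        # Assumption: forward and backward states are concatenated.
        return 2 * self.char_lstm_hidden_dim

    @property
    def token_lstm_input_dim(self):
        # Word embedding concatenated with the character-based embedding.
        return self.word_embedding_dim + self.char_based_word_dim

    @property
    def token_lstm_output_dim(self):
        return 2 * self.token_lstm_hidden_dim


config = Run0Config()
print(config.token_lstm_input_dim, config.token_lstm_output_dim)
```
]]></preformat>
        <p>Under these assumptions, each token enters the second BiLSTM as a vector of 100 + 2 × 25 = 150 dimensions and leaves it as a vector of 200 dimensions.</p>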
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>The MEDDOCAN-2019 evaluation process consists of two different scenarios.
The first scenario consists of detecting the exact location in the text of each
PHI, as well as the type of entity. The second scenario consists of identifying
sensitive data so that it can be replaced, regardless of the type of entity. In the second
scenario, two different types of evaluation are performed: (1) strict evaluation of
the spans of PHI that belong to sensitive phrases, and (2) merged evaluation, where the
spans of PHI connected by non-alphanumerical characters are merged. To
evaluate both scenarios, the metrics proposed by the organizers are micro-averaged
precision, recall, and F1-score.</p>
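      <p>The following sketch illustrates how these micro-averaged metrics can be computed from pooled annotations, and how spans separated only by non-alphanumerical characters can be merged. It is not the official evaluation script; the function names and the toy annotations are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over pooled annotation sets."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def drop_labels(annotations):
    """Span-only view (second scenario): ignore the entity type."""
    return {(doc, start, end) for doc, start, end, _ in annotations}


def merge_spans(spans, text):
    """Merge spans of one document separated only by non-alphanumeric text."""
    merged = []
    for start, end in sorted(spans):
        if merged and not any(ch.isalnum() for ch in text[merged[-1][1]:start]):
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy annotations: (document id, start offset, end offset, entity type).
gold = {("d1", 0, 4, "CALLE"), ("d1", 10, 15, "TERRITORIO"), ("d2", 3, 8, "FECHAS")}
predicted = {("d1", 0, 4, "CALLE"), ("d1", 10, 15, "CALLE"),
             ("d2", 3, 8, "FECHAS"), ("d2", 20, 25, "FECHAS")}

print(micro_prf(gold, predicted))                            # typed spans
print(micro_prf(drop_labels(gold), drop_labels(predicted)))  # spans only
print(merge_spans([(0, 4), (6, 11)], "Juan, Perez"))
```
]]></preformat>
      <p>In the toy data, the span with the wrong entity type counts as an error when types are compared (precision 0.5) but as a hit when only spans are compared (precision 0.75), which mirrors the difference between the two scenarios.</p>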
      <p>To evaluate the trained models, as well as their hyperparameters, we
performed a set of experiments with the development dataset provided by the
MEDDOCAN-2019 organizers. We used grid search to adjust the word
embeddings dimension, the number of units in the BiLSTM hidden layer, the optimizer
and the learning rate.</p>
      <p>
        We can observe in Tables 2 and 3 that our best results on the overall
development set do not present significant differences. The run0 model was trained
using the hyperparameters mentioned in Section 3.2. The run1 model was trained
using ADAM [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] as optimizer, with a word embeddings dimension of 300 and
200 units in the second layer of the BiLSTM. The run2 model was trained using
ADAM employing a word embeddings dimension of 200 and 100 units in the
second layer of the BiLSTM.
      </p>
      <p>Since the run0 and run1 models achieved the best results
in the two sub-tasks, these models were used to process the test set provided
by MEDDOCAN-2019 in scenarios 1 and 2. The results obtained in both tasks
can be seen in Table 4. The run0 model
provides the best results in all scenarios (F1 of 93.22% in sub-task 1,
F1 of 94.26% in sub-task 2 for strict evaluation, and F1 of 95.77% for merged
evaluation).</p>
      <p>Once the run0 model has been selected as our best model, it is interesting
to perform a more detailed study for each PHI type, given the class imbalance of
the problem. As we can see in Table 5, not all types of entities are classified
equally well. For example, entities such as ID_EMPLEO_PERSONAL_SANITARIO
and OTROS_SUJETO_ASISTENCIA are never correctly classified. This may be
due to their low representation in the training set, as well as to confusion
with similar types of entities such as FAMILIARES_SUJETO_ASISTENCIA.
On the other hand, the entity types with the best results are those with the
highest representation in the training set or those with a specific structure, such
as ID_ASEGURAMIENTO and ID_CONTACTO_ASISTENCIAL.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>Medical records are sources for a wide range of studies to detect and prevent
diseases, as well as to provide information for medical decision making. The
growing volume of these records, together with the studies associated
with them, increases the importance of de-identification. This process
has the goal of eliminating or replacing sensitive data to allow access to medical
information without allowing the patients involved to be identified.</p>
      <p>Most previous efforts in the anonymization of medical records have
focused on texts written in English. Meddocan is the first shared task
devoted to the anonymization of medical records in Spanish. One of the
major challenges of this shared task is the large number of sensitive
data categories (22 different types of entities). In addition, these categories are
unbalanced in the texts, which makes them difficult to classify correctly.</p>
      <p>In this paper, we describe the system with which we participated in this task. It exploits
the NeuroNer tool, a tool based on deep learning with bidirectional LSTM and
CRF layers for the task of NER. In spite of the challenges described above, our
system obtains a micro-F1 of 93.22% on the test set.</p>
      <p>
        For future work, we plan to explore other deep learning architectures, to
exploit pre-trained word embedding models, and to try other types of
embeddings such as sense embeddings [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Due to the unbalanced data, we also
plan to explore how weighting the different classes during training affects
performance, as well as the use of different sampling methods. Furthermore, the
identification of certain types of entities in Table 5 (such as PROFESION,
INSTITUCION, and CENTRO_SALUD) might be improved by using dictionary-based
approaches in addition to our system.
      </p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgements</title>
      <p>This work was supported by the Research Program of the Ministry of Economy
and Competitiveness - Government of Spain (DeepEMR project
TIN2017-87548-C2-1-R).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Dernoncourt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Neuroner: an easy-to-use program for named-entity recognition based on neural networks</article-title>
          .
          <source>arXiv preprint arXiv:1705.05487</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dernoncourt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>De-identification of patient notes with recurrent neural networks</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>24</volume>
          (
          <issue>3</issue>
          ),
          <fpage>596</fpage>
          –
          <lpage>606</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luís</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marujo</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Astudillo</surname>
            ,
            <given-names>R.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amir</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Black</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Trancoso</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Finding function in form: Compositional character models for open vocabulary word representation</article-title>
          .
          <source>arXiv preprint arXiv:1508.02096</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lyu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Long short-term memory RNN for biomedical named entity recognition</article-title>
          .
          <source>BMC Bioinformatics</source>
          <volume>18</volume>
          (
          <issue>1</issue>
          ),
          <fpage>462</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Marimon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Intxaurrondo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez Martin</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic de-identification of medical texts in Spanish: the MEDDOCAN track, corpus, guidelines, methods and evaluation of results</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019)</source>
          . vol. TBA, p. TBA.
          <publisher-name>CEUR Workshop Proceedings (CEUR-WS.org)</publisher-name>
          , Bilbao, Spain (Sep
          <year>2019</year>
          ), TBA
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>3111</fpage>
          –
          <lpage>3119</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          –
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ratinov</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          .
          <source>In: Proceedings of the thirteenth conference on computational natural language learning</source>
          . pp.
          <fpage>147</fpage>
          –
          <lpage>155</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Stubbs</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotfila</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>58</volume>
          ,
          <fpage>S11</fpage>
          –
          <lpage>S19</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Trask</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michalak</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>sense2vec - a fast and accurate method for word sense disambiguation in neural word embeddings</article-title>
          .
          <source>arXiv preprint arXiv:1511.06388</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kohane</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>i2b2 workshop on natural language processing challenges for clinical records</article-title>
          .
          <source>In: Proceedings of the Fall Symposium of the American Medical Informatics Association. Citeseer</source>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garibaldi</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Automatic detection of protected health information from clinic narratives</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>58</volume>
          ,
          <fpage>S30</fpage>
          –
          <lpage>S38</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>