<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VSP at MEDDOCAN 2019</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, Carlos III University of Madrid. Leganes 28911</institution>
          ,
          <addr-line>Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>654</fpage>
      <lpage>662</lpage>
      <abstract>
        <p>This work presents the participation of the VSP team in the MEDDOCAN Task with a neural model for Named Entity Recognition in medical documents in Spanish. The Neural Network consists of a two-layer model that creates a feature vector for each word of the sentences. The first layer uses the character information of each word, and its output is aggregated in the second layer together with the word embedding in order to create the feature vector of the word. Both layers are implemented with a bidirectional Recurrent Neural Network with LSTM cells. Moreover, a Conditional Random Field layer classifies the word vectors into one of the 29 types of Protected Health Information (PHI). The system obtains a performance of 86.01%, 87.03%, and 89.12% in F1 for the classification of the entity types, the sensitive span detection, and both tasks merged, respectively. The model shows very high and promising results for a basic approach that uses neither pretrained word embeddings nor any hand-crafted feature.</p>
      </abstract>
      <kwd-group>
        <kwd>Named Entity Recognition</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Recurrent Neural Network</kwd>
        <kwd>Medical Documents</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Víctor Suarez-Paniagua</p>
      <p>Nowadays, healthcare professionals deal with a large amount of unstructured documents, which makes finding the essential data in medical documents very difficult. Reducing the time spent on retrieving the most relevant information can help doctors generate a diagnosis for their patients faster. Although a vast amount of information is available as Electronic Health Records (EHR), manually annotating them is impracticable because of the rapidly increasing number of documents generated per day, and also because they contain sensitive data and Protected Health Information (PHI). For this reason, the development of an automatic system that identifies sensitive information in medical documents is vital for helping doctors and preserving patient confidentiality.</p>
      <p>
        The i2b2 shared task was the first Natural Language Processing (NLP) challenge for identifying PHI in clinical narratives [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The second edition of the i2b2 shared task, Track 1 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], created a gold standard dataset with annotations of the PHI categories from 1,304 medical records in English. In this competition, the highest-ranking system used a Conditional Random Field (CRF) classifier together with hand-written rules for the de-identification of clinical narratives, obtaining very promising results with 97.68% in F1 [14].
      </p>
      <p>
        The goal of the Iberian Languages Evaluation Forum (IberLEF) 2019, which
includes the TASS and IberEval workshops, is to create NLP challenges using
corpora written in one of the Iberian languages (Spanish, Portuguese, Catalan,
Basque or Galician). Following the i2b2 de-identification task, the Medical
Document Anonymization task (MEDDOCAN) encourages the research community
to design NLP systems for the identification of PHI in clinical texts in Spanish
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For this purpose, a corpus of 1,000 clinical case studies with PHI phrases
was manually annotated by health documentalists.
      </p>
      <p>
        Currently, Deep Learning approaches outperform traditional machine learning
systems on the majority of NLP tasks, such as text classification [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], language
modeling [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and machine translation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Moreover, these models have the
advantage of automatically learning the most relevant features without defining rules
by hand. Concretely, the state-of-the-art performance on the Named Entity
Recognition (NER) task is obtained by the LSTM-CRF model proposed by [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The main idea of
this system is to create a word vector representation using a bidirectional
Recurrent Neural Network with LSTM cells (BiLSTM) with character information
encoded in another BiLSTM layer, in order to classify the tag of each word in the
sentence with a CRF classifier. Following this approach, the system proposed
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] uses a BiLSTM-CRF model with character and word levels for the
de-identification of patient notes using the i2b2 dataset. This approach outperforms
the top-ranking system in that task, reaching 97.88% in F1.
      </p>
      <p>
        This paper presents the participation of the VSP team at the tasks proposed
by MEDDOCAN: the classification of PHI types and sensitive span
detection from medical documents in Spanish. The proposed system follows the
same approaches as [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] with some modifications for the Spanish language,
implemented with the NeuroNER tool [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Dataset</title>
      <p>The corpus of the MEDDOCAN task contains 1,000 clinical cases with PHI entities manually annotated by health documentalists. The documents are randomly divided into the training, validation and test sets for creating, developing and ranking the different systems, respectively.</p>
      <p>Similarly to the annotation schema of the i2b2 de-identification tasks, the named entities are annotated with their offsets and their type for the detection and classification tasks (see Figure 1). The 29 types of annotated PHI mentions follow the Health Insurance Portability and Accountability Act (HIPAA) guidelines adapted to Spanish health records, aggregating some PHI entities.</p>
    </sec>
    <sec id="sec-method">
      <title>Method</title>
      <p>This section presents the neural architecture for the classification of the PHI entity types and the sensitive span detection in medical documents in Spanish. Figure 2 shows the entire process of the model, which uses two BiLSTMs at the character and token levels in order to create each word representation, up to its classification by a CRF.</p>
      <sec id="sec-2-1">
        <title>Data preprocessing</title>
        <p>
          Before using the system, the documents of the corpus are preprocessed in order to prepare the inputs for the neural model. Firstly, the clinical cases are separated into sentences by a sentence splitter, and the words of these sentences are extracted by a tokenizer; both were adapted to the Spanish language. Once the sentences are divided into words, the BIOES tag schema encodes each token with an entity type. The B tag defines the beginning token of a mention, the I tag defines an inside token of a mention, the E tag defines the ending token of a mention, the S tag indicates that the mention has a single token, and the O tag indicates the outside tokens that do not belong to any mention. In many previous NER tasks, this encoding performed better than the BIO tag scheme [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], but the number of labels increases because there are two additional tags for each class. Thus, the number of possible classes for the MEDDOCAN corpus is the 4 tags times the 29 PHI classes, plus the O tag. For the experiments, all the previous processes are performed with the spaCy tool in Python [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
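        <p>As an illustration of the tagging step above, the following sketch (a hypothetical helper, not the actual spaCy/NeuroNER pipeline) converts character-offset PHI annotations into BIOES tags over a tokenized sentence; the example tokens and labels are invented:</p>
        <preformat>
```python
# Hypothetical BIOES encoder: maps character-offset annotations onto
# tokens, as described in the preprocessing step (illustrative only).

def bioes_encode(tokens, spans):
    """tokens: list of (text, start, end); spans: list of (start, end, label)."""
    tags = ["O"] * len(tokens)
    for s_start, s_end, label in spans:
        # indices of tokens fully covered by the annotated span
        inside = [i for i, (_, t0, t1) in enumerate(tokens)
                  if t0 >= s_start and s_end >= t1]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = "S-" + label      # single-token mention
        else:
            tags[inside[0]] = "B-" + label      # beginning of the mention
            tags[inside[-1]] = "E-" + label     # end of the mention
            for i in inside[1:-1]:
                tags[i] = "I-" + label          # inside of the mention
    return tags

tokens = [("Paciente", 0, 8), ("Juan", 9, 13), ("Perez", 14, 19),
          ("Garcia", 20, 26), ("en", 27, 29), ("Madrid", 30, 36)]
spans = [(9, 26, "NOMBRE_SUJETO_ASISTENCIA"), (30, 36, "TERRITORIO")]
tags = bioes_encode(tokens, spans)
print(tags)
```
        </preformat>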
      </sec>
      <sec id="sec-2-2">
        <title>BiLSTM layers</title>
        <p>RNNs are very effective at feature learning when the inputs are sequences. This Deep Learning model uses two different weight matrices, one for the input and one for the previous output:</p>
        <p>h(t) = f(W x(t) + U h(t-1) + b)</p>
        <p>where h(t) is the output at time t for the input x, f is a non-linear function, W are the weights for the current input, U are the weights for the previous output, and b is the bias term of the Neural Network. However, the basic RNN cannot capture long dependencies, because it loses the information of the gradients as back-propagation is applied through the previous states. For this reason, the incorporation of cell units into the RNN computation solves this long gradient propagation problem.</p>
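        <p>The recurrence above can be sketched in a few lines of NumPy, with tanh as the non-linear function f (toy dimensions and random weights, not the model's actual configuration):</p>
        <preformat>
```python
import numpy as np

# The recurrence h(t) = f(W x(t) + U h(t-1) + b) with f = tanh.
# Toy dimensions and random weights, not the paper's configuration.

rng = np.random.default_rng(0)
d_in, d_h = 4, 3
W = rng.normal(size=(d_h, d_in))   # weights for the current input
U = rng.normal(size=(d_h, d_h))    # weights for the previous output
b = np.zeros(d_h)                  # bias term

def rnn_forward(xs):
    h = np.zeros(d_h)              # initial hidden state h(0)
    outputs = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        outputs.append(h)
    return outputs

xs = [rng.normal(size=d_in) for _ in range(5)]   # 5-step input sequence
hs = rnn_forward(xs)
print(len(hs), hs[-1].shape)
```
        </preformat>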
        <p>
          The Long Short-Term Memory cell (LSTM) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] defines four gates for creating a word representation, taking the information of the current and previous cells. The input gate i_t, the forget gate f_t and the output gate o_t of the current step t transform the input vector x_t, taking the previous output h_{t-1}, using their corresponding weights and biases computed with a sigmoid function. The cell state c_t takes the information from the previous cell state c_{t-1}, regulated by the forget gate, and the information from the current candidate cell state c̃_t, regulated by the input gate, using the element-wise product, as represented below:
        </p>
        <p>f_t = σ(W_f [h_{t-1}; x_t] + b_f)</p>
        <p>i_t = σ(W_i [h_{t-1}; x_t] + b_i)</p>
        <p>c̃_t = tanh(W_c [h_{t-1}; x_t] + b_c)</p>
        <p>c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t</p>
        <p>o_t = σ(W_o [h_{t-1}; x_t] + b_o)</p>
        <p>h_t = o_t ⊙ tanh(c_t)</p>
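        <p>A minimal NumPy sketch of one LSTM step following these gate equations (σ is the sigmoid and the products are element-wise; toy sizes and random weights, not the trained model):</p>
        <preformat>
```python
import numpy as np

# One LSTM step following the gate equations above: concatenated
# [h_{t-1}; x_t] input, sigmoid gates, element-wise products.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d_x, d_h = 4, 3
d = d_h + d_x
Wf, Wi, Wc, Wo = [rng.normal(size=(d_h, d)) for _ in range(4)]
bf = bi = bc = bo = np.zeros(d_h)

def lstm_step(h_prev, c_prev, x):
    z = np.concatenate([h_prev, x])      # [h_{t-1}; x_t]
    f = sigmoid(Wf @ z + bf)             # forget gate f_t
    i = sigmoid(Wi @ z + bi)             # input gate i_t
    c_tilde = np.tanh(Wc @ z + bc)       # candidate cell state
    c = f * c_prev + i * c_tilde         # element-wise cell update
    o = sigmoid(Wo @ z + bo)             # output gate o_t
    h = o * np.tanh(c)                   # current output h_t
    return h, c

h, c = np.zeros(d_h), np.zeros(d_h)
for x in [rng.normal(size=d_x) for _ in range(3)]:
    h, c = lstm_step(h, c, x)
print(h.shape, c.shape)
```
        </preformat>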
        <p>Finally, the current output h_t is the hyperbolic tangent of the cell state, controlled by the output gate. Furthermore, another LSTM can be applied in the other direction, from the end of the sequence to the start. Computing the two representations is beneficial for extracting the relevant features of each word, because words have dependencies in both directions.</p>
        <p>Character level. The first layer takes each word of the sentences individually. These tokens are decomposed into characters, which are the input of the BiLSTM. Once all the inputs are computed by the network, the last output vectors of both directions are concatenated in order to create the vector representation of the word according to its characters.</p>
        <p>Token level. The second layer takes the embedding of each word in the sentence and concatenates it with the character representation output by the first BiLSTM. In addition, a Dropout layer is applied to the word representation in order to prevent overfitting during the training phase. In this case, the outputs of both directions for each token are concatenated for the classification layer.</p>
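        <p>The token-level input described above can be sketched as follows: the word embedding is concatenated with the character-level representation, and inverted dropout is applied during training (toy dimensions; the dropout rate is a hyper-parameter, not the paper's exact setting):</p>
        <preformat>
```python
import numpy as np

# Token-level input: word embedding concatenated with the word's
# character-level representation, then inverted dropout in training.
# Dimensions and the dropout rate are illustrative.

rng = np.random.default_rng(2)
d_word, d_char = 6, 4
word_emb = rng.normal(size=d_word)        # word embedding
char_repr = rng.normal(size=d_char)       # output of the char BiLSTM

x = np.concatenate([word_emb, char_repr]) # input to the token BiLSTM

def dropout(v, p_drop, training=True):
    if not training:
        return v                          # no dropout at test time
    keep = (rng.random(v.shape) >= p_drop).astype(v.dtype)
    return v * keep / (1.0 - p_drop)      # inverted dropout scaling

y = dropout(x, 0.5)
print(x.shape, y.shape)
```
        </preformat>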
      </sec>
      <sec id="sec-2-3">
        <title>Conditional Random Field Classifier</title>
        <p>
          CRF [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] is the sequential version of the Softmax classifier that aggregates the label predicted at the previous output as part of the input. In NER tasks, the CRF shows better results than the Softmax because it assigns a higher probability to correctly labelled sequences. For instance, by definition the I tag cannot appear before a B tag or after an E tag. In the proposed system, the CRF classifies the output vector of the token-level BiLSTM layer into one of the classes.
        </p>
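        <p>A toy illustration of why the CRF helps: transition scores can forbid tag sequences that are invalid under the BIOES scheme (e.g. an I tag directly after an O tag), which a per-token Softmax cannot express. The scores below are made up:</p>
        <preformat>
```python
import numpy as np

# Transition scores give invalid BIOES tag bigrams a huge penalty,
# so a correctly ordered sequence always scores higher (toy scores).

tags = ["O", "B", "I", "E", "S"]
ix = {t: k for k, t in enumerate(tags)}

trans = np.zeros((5, 5))                 # transition scores trans[a, b]
trans[ix["O"], ix["I"]] = -1e9           # I cannot follow O
trans[ix["O"], ix["E"]] = -1e9           # E cannot follow O
trans[ix["E"], ix["I"]] = -1e9           # I cannot follow E

def score(seq, emissions):
    # sequence score = sum of emission scores + sum of transition scores
    s = sum(emissions[k][ix[t]] for k, t in enumerate(seq))
    s += sum(trans[ix[a], ix[b]] for a, b in zip(seq, seq[1:]))
    return s

emissions = np.ones((3, 5))              # uniform emissions, 3 tokens
valid, invalid = ["B", "I", "E"], ["O", "I", "E"]
print(score(valid, emissions), score(invalid, emissions))
```
        </preformat>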
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and Discussion</title>
      <p>The architecture was trained on the training set for 25 epochs with shuffled
mini-batches, choosing the best performance on the validation set. The
values of the two BiLSTM and CRF parameters for generating the prediction
of the test set are presented in Table 1. The embeddings of the characters and
words are randomly initialized and learned during the training of the network.</p>
      <p>Additionally, gradient clipping keeps the weights of the network in a low range, preventing the exploding gradient problem.</p>
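      <p>The gradient clipping mentioned above can be sketched as clipping by global norm (the threshold here is illustrative, not the value used in the experiments):</p>
      <preformat>
```python
import numpy as np

# Gradient clipping by global norm: if the combined norm of all
# gradients exceeds a threshold, rescale them to that threshold.

def clip_by_norm(grads, max_norm):
    total = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total > max_norm:
        scale = max_norm / total
        grads = [g * scale for g in grads]
    return grads

grads = [np.full(3, 100.0), np.full(2, -50.0)]
clipped = clip_by_norm(grads, 5.0)
norm = np.sqrt(sum(np.sum(g ** 2) for g in clipped))
print(norm)
```
      </preformat>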
      <p>The results were measured with precision (P), recall (R) and F-measure (F1)
using the True Positives (TP), False Positives (FP) and False Negatives (FN)
for its calculation. Table 2 presents the results of the Neural Model with the
two BiLSTM levels and the CRF classifier on the test set of the MEDDOCAN
task. The performance on the NER offset and entity type classification (Task 1) is 86.01% in F1, and the performance on sensitive span detection (Task 2) is 87.03% in F1, considering only entities with an exact boundary match and entity type (Strict). The results for both tasks merged reach 89.12% in F1.</p>
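      <p>For reference, the metrics are computed from the TP, FP and FN counts as follows (the counts in this sketch are invented for illustration; they are not the system's actual confusion counts):</p>
      <preformat>
```python
# Precision, recall and F1 from TP, FP, FN (illustrative counts).

def prf1(tp, fp, fn):
    p = tp / (tp + fp)            # precision: TP / (TP + FP)
    r = tp / (tp + fn)            # recall:    TP / (TP + FN)
    f1 = 2 * p * r / (p + r)      # harmonic mean of P and R
    return p, r, f1

p, r, f1 = prf1(tp=430, fp=70, fn=70)
print(round(p, 3), round(r, 3), round(f1, 3))
```
      </preformat>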
      <p>From the table, it can be observed that the numbers of FN and FP are very similar, giving very similar Precision and Recall results for all the classes. On the one hand, there are classes with very high performance, such as CORREO ELECTRONICO, EDAD SUJETO ASISTENCIA, FECHAS, NOMBRE SUJETO ASISTENCIA and PAIS, which reach more than 95% in F1, because the data appears in the same location across documents and these classes are easy to disambiguate from the remaining ones. On the other hand, the classes OTROS SUJETO ASISTENCIA and PROFESION show very low performance because they have a very small number of instances in the training set, making it hard to learn their representation in the network. In order to alleviate this problem, the use of oversampling techniques is proposed to increase the number of instances of the less represented classes and make the dataset more balanced.</p>
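      <p>The proposed oversampling can be sketched as randomly duplicating sentences that contain under-represented classes until a minimum count is reached (toy data; one label per sentence is a simplification, since real sentences may contain several PHI mentions):</p>
      <preformat>
```python
import random

# Random oversampling sketch: duplicate sentences whose PHI label is
# under-represented until each label reaches min_count (toy data).

def oversample(sentences, labels, min_count):
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    out = list(zip(sentences, labels))
    rng = random.Random(0)
    for lab, c in counts.items():
        pool = [pair for pair in out if pair[1] == lab]
        for _ in range(max(0, min_count - c)):
            out.append(rng.choice(pool))
    return out

data = oversample(["s1", "s2", "s3"], ["PROFESION", "FECHAS", "FECHAS"], 3)
print(len(data))
```
      </preformat>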
    </sec>
    <sec id="sec-4">
      <title>Conclusions and Future work</title>
      <p>This work proposes a neural model for the detection and classification of PHI in clinical texts in Spanish. The architecture is based on RNNs over both directions of the sentences, using LSTM cells for the computation of the outputs. Finally, a CRF classifier performs the classification for tagging the PHI entity types. The results show a performance of 86.01% and 87.03% in F1 for the classification of the entity types and the sensitive span detection over the MEDDOCAN corpus, giving 89.12% in F1 for the merged tasks as the official result. The results are very similar in Precision and Recall for all the classes, with low performance on the less represented classes and higher performance on the well-structured PHI entities, such as NOMBRE SUJETO ASISTENCIA, EDAD SUJETO ASISTENCIA, CORREO ELECTRONICO, FECHAS, and PAIS.</p>
      <p>As future work, exploring the contribution of each representation individually and fine-tuning the parameters of the model will be useful in order to increase the performance. In addition, aggregating embeddings from different external information, such as Part-of-Speech tags, syntactic parse trees or semantic tags, could enrich the representation of each word and improve its classification. Moreover, the sentence splitter of spaCy seems to divide sentences when some abbreviations appear, such as 'Dr.', 'Dra.', 'Sr.' or 'Sra.' (Spanish honorific prefixes). For this reason, creating simple rules to avoid these cases could be beneficial for increasing the performance. Furthermore, adding more layers to each BiLSTM is proposed as an extension of the architecture.</p>
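      <p>The rule proposed above could be implemented, for example, as a simple post-processing step that re-joins fragments wrongly split after a Spanish honorific (a sketch, not spaCy's actual API):</p>
      <preformat>
```python
# Post-processing rule sketch: re-join sentence fragments that were
# wrongly split after Spanish honorifics (not spaCy's actual API).

ABBREVS = ("Dr.", "Dra.", "Sr.", "Sra.")

def merge_bad_splits(sentences):
    merged = []
    for sent in sentences:
        if merged and merged[-1].rstrip().endswith(ABBREVS):
            # previous fragment ends with an honorific: glue them back
            merged[-1] = merged[-1].rstrip() + " " + sent.lstrip()
        else:
            merged.append(sent)
    return merged

split = ["Atendido por la Dra.", "Garcia en el hospital.", "Alta el lunes."]
merged = merge_bad_splits(split)
print(merged)
```
      </preformat>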
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cho</surname>
          </string-name>
          , K.,
          <string-name>
            <surname>van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning phrase representations using RNN encoder-decoder for statistical machine translation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1724</fpage>
          -
          <lpage>1734</lpage>
          . Association for Computational Linguistics, Doha, Qatar (Oct
          <year>2014</year>
          ). https://doi.org/10.3115/v1/
          <fpage>D14</fpage>
          -1179, https://www.aclweb. org/anthology/D14-1179
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Dernoncourt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>NeuroNER: an easy-to-use program for named-entity recognition based on neural networks</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          . pp.
          <fpage>97</fpage>
          -
          <lpage>102</lpage>
          . Association for Computational Linguistics, Copenhagen, Denmark (Sep
          <year>2017</year>
          ). https://doi.org/10.18653/v1/
          <fpage>D17</fpage>
          -2017, https:// www.aclweb.org/anthology/D17-2017
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dernoncourt</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Young</surname>
            <given-names>Lee</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Uzuner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Szolovits</surname>
          </string-name>
          ,
          <string-name>
            <surname>P.</surname>
          </string-name>
          :
          <article-title>De-identification of patient notes with recurrent neural networks</article-title>
          .
          <source>Journal of the American Medical Informatics Association : JAMIA</source>
          <volume>24</volume>
          (
          <year>06 2016</year>
          ). https://doi.org/10.1093/jamia/ocw156
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Explosion</surname>
            <given-names>AI</given-names>
          </string-name>
          : spaCy - Industrial-strength
          <source>Natural Language Processing in Python (</source>
          <year>2017</year>
          ), https://spacy.io/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <fpage>1746</fpage>
          -
          <lpage>1751</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lafferty</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.C.N.</given-names>
          </string-name>
          :
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          . pp.
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          (
          <year>2001</year>
          ), http://dl.acm.org/citation.cfm?id=
          <volume>645530</volume>
          .
          <fpage>655813</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Neural architectures for named entity recognition</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>260</fpage>
          -
          <lpage>270</lpage>
          . Association for Computational Linguistics, San Diego, California (Jun
          <year>2016</year>
          ). https://doi.org/10.18653/v1/
          <fpage>N16</fpage>
          -1030, https://www.aclweb.org/anthology/N16-1030
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Marimon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Intxaurrondo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez Martin</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic de-identification of medical texts in Spanish: the MEDDOCAN track, corpus, guidelines, methods and evaluation of results</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ). vol.
          <source>TBA</source>
          , p.
          <source>TBA. CEUR Workshop Proceedings (CEUR-WS.org)</source>
          , Bilbao,
          <source>Spain (Sep</source>
          <year>2019</year>
          ), TBA
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>3111</volume>
          {
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ratinov</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Design challenges and misconceptions in named entity recognition</article-title>
          .
          <source>In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL-2009)</source>
          . pp.
          <fpage>147</fpage>
          -
          <lpage>155</lpage>
          .
          Association for Computational Linguistics (
          <year>2009</year>
          ), http://aclweb.org/anthology/W09-1119
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Stubbs</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotfila</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>58</volume>
          (
          <year>2015</year>
          ). https://doi.org/10.1016/j.jbi.2015.06.007
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>Ö.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Evaluating the state-of-the-art in automatic de-identification</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>550</fpage>
          -
          <lpage>563</lpage>
          (
          <year>2007</year>
          ). https://doi.org/10.1197/jamia.M2444, http://www.sciencedirect.com/science/article/pii/S106750270700179X
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garibaldi</surname>
            ,
            <given-names>J.M.</given-names>
          </string-name>
          :
          <article-title>Automatic detection of protected health information from clinic narratives</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>58</volume>
          ,
          <fpage>S30</fpage>
          -
          <lpage>S38</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>