<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>E2EJ: Anonymization of Spanish Medical Records using End-to-End Joint Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mohammed Jabreel</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fadi Hassan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Najlaa Maarrof</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Sánchez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Josep Domingo-Ferrer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Moreno</string-name>
          <email>antonio.morenog@urv.cat</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CYBERCAT-Center for Cybersecurity Research of Catalonia. UNESCO Chair in Data Privacy. Universitat Rovira i Virgili</institution>
          ,
          <addr-line>Av. Països Catalans 26, E-43007 Tarragona, Catalonia</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>iTAKA: Intelligent Technologies for Advanced Knowledge Acquisition. Department of Computer Science and Mathematics</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>712</fpage>
      <lpage>719</lpage>
      <abstract>
        <p>This paper describes E2EJ, the system that we have developed to participate in the Medical Document Anonymization challenge in the shared task of IberLEF 2019. E2EJ is a data-driven, end-to-end neural network. It does not rely on external resources such as a part-of-speech tagger. It solves two problems jointly: the first problem is to automatically identify whether a token is sensitive, whereas the second one is to identify the type of the token. E2EJ shows results comparable to the state-of-the-art systems and outperforms the baseline systems. The F1 scores of our system on the test set are 96.61% and 95.83% for the sensitivity detection and the token type identification tasks, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Anonymization</kwd>
        <kwd>CRF</kwd>
        <kwd>Medical Documents</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Patient notes in electronic health records (EHRs) contain critical information
that may be useful for medical investigations. However, due to privacy
concerns, the vast majority of medical investigators can only access anonymized or
de-identified notes to protect the confidentiality of patients [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Anonymization
can be either manual or automated. Manual anonymization means that human
annotators label protected health information (PHI). This approach has some
drawbacks. First, only a limited set of individuals is allowed to access the
identified patient notes. Thus, the task cannot be crowd-sourced. Second, humans
are prone to mistakes. Third, manual anonymization is impractical given the
size of EHR databases. Therefore, a reliable automated anonymization system
would be of high value [
        <xref ref-type="bibr" rid="ref14 ref8">14, 8</xref>
        ]. In the literature, there are many systems for EHR anonymization,
which can be categorized as rule-based, feature-engineering-based, or
deep-learning-based approaches.
      </p>
      <p>
        Starting from a seed collection of sensitive tokens, the idea of rule-based systems
is to manually engineer rules based on regular expressions, syntactic structures, or
dependency structures to expand the collection iteratively [
        <xref ref-type="bibr" rid="ref13 ref9">13, 9</xref>
        ].
      </p>
      <p>
        Feature-engineering-based systems aim to train a sequence tagger with
rich, hand-crafted features based on linguistic or syntactic information from an
annotated corpus to predict a label (e.g., O, B&lt;entity&gt; or I&lt;entity&gt;)
for each token in a sentence [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
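<p>As a small illustration of the tagging scheme just described, the helper below converts entity spans into O / B / I labels. The function name, the example tokens, and the spans are hypothetical and not taken from the shared task tooling.</p>

```python
# Minimal sketch (hypothetical example): turn a token list plus entity
# spans into O / B-entity / I-entity tags as used by sequence taggers.
def bio_tags(tokens, spans):
    """spans: list of (start_idx, end_idx_exclusive, entity_type)."""
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = "B-" + etype          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = "I-" + etype          # remaining tokens of the entity
    return tags

tokens = ["El", "paciente", "Juan", "Garcia", "vive", "en", "Tarragona"]
spans = [(2, 4, "NOMBRE"), (6, 7, "TERRITORIO")]
print(bio_tags(tokens, spans))
```

Here the entity type strings are illustrative placeholders; the actual MEDDOCAN label set is defined by the challenge guidelines.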
      <p>
        Rule-based and feature-engineering-based approaches are labor-intensive, since
rules or features must be constructed from linguistic and syntactic information. Despite
some promising results, there are two main issues with these approaches. First,
the engineering of rules and features is a time-consuming task; moreover, rules
constantly need to be updated. Second, the systems of these two categories
depend on external requirements, such as a parser analyzing the syntactic
and dependency structure of sentences, so their performance
relies on the quality of the parsing results [
        <xref ref-type="bibr" rid="ref14 ref9">14, 9</xref>
        ]. To avoid these issues,
deep learning is used to develop systems that learn high-level representations for each
token, on which a classifier or sequence tagger can be trained [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        Medical Document Anonymization (MEDDOCAN) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] is a challenge in the
shared task of IberLEF 2019 dedicated to EHRs in the Spanish language.
It has two structured sub-tasks: "sensitive token detection" and "NER offset
and entity type classification". The first sub-task aims to identify the sensitive
tokens in a document. We can treat this sub-task as a token-level binary
classification problem, in which we develop a system that takes a document as input
and classifies each token as sensitive or not. The second sub-task aims at
identifying the type of each token in a document. We can model this problem as a
sequence tagging problem: the input is a sequence of tokens, and the output is
their corresponding labels.
      </p>
      <p>We participated in the MEDDOCAN challenge by developing E2EJ, a joint
and end-to-end neural network-based system for the two sub-tasks. The
proposed system provides an end-to-end solution and does not require any parsers
or other linguistic resources. Specifically, the proposed system is a multilayer
neural network in which the first three layers learn high-level representations for
a sequence of tokens; the output of these layers is then passed, jointly, to two
sub-models that are learned interactively. One extracts the sensitive tokens,
while the other identifies their types.</p>
      <p>The rest of the paper is structured as follows: Section 2 presents the
methodology; Section 3 explains the dataset, baselines, and experimental settings; Section
4 presents and discusses the results; finally, Section 5 concludes this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>System Description</title>
      <p>The main distinction between our model and the deep-learning-based
systems in the literature is the consideration of the interaction between the two tasks of sensitivity
detection and token type identification. In this section, we introduce E2EJ
and its implementation steps in detail. Fig. 1 depicts the architecture of our
model.</p>
      <p>[Fig. 1: Architecture of E2EJ. Each token passes through word-level and
char-level embeddings and a BiLSTM encoder (forward and backward LSTMs);
Conv1D layers feed two output heads, an MLP predicting whether the token is
sensitive and a CRF predicting its O / B / I tag.]</p>
      <p>
        The goal of the embedding layer is to represent each word w_i ∈ S by a
low-dimensional vector v_i ∈ R^d. Here, d is the size of the embedding layer.
We use two levels of embedding: word-level and character-level. For the
word-level embedding, we replace w_i with its pre-trained GloVe word embedding
vector v_i^w [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. We use a single-layer 1-dimensional convolutional neural
network (Conv1D) with max-over-time pooling to represent the word at
character level, as follows. Suppose that w_i is made up of a sequence of characters
[c_1, c_2, ..., c_n], where n is the length of w_i. First, we pass the sequence of
characters of the word w_i to a randomly initialized character embedding layer to get
the matrix C_i ∈ R^{r×n}, which is the character-level representation of w_i. Here,
the j-th column corresponds to the character embedding of c_j. After that, we
apply a narrow convolution between C_i and a filter (or kernel) H ∈ R^{r×k} of
width k, after which we add a bias and apply a nonlinearity to obtain a feature
map f_i ∈ R^{n−k+1}. Specifically, the m-th element of f_i is given by:

f_i[m] = tanh(⟨C_i[*, m : m + k − 1], H⟩ + b)   (1)

where C_i[*, m : m + k − 1] is the m-to-(m + k − 1)-th column slice of C_i and ⟨A, B⟩
is the Frobenius inner product. Finally, we take the max over time,

v_i^c = max_m f_i[m]   (2)

as the feature corresponding to the filter H (when applied to word w_i). A
filter is essentially picking out a character n-gram, where the size of the n-gram
corresponds to the filter width.
      </p>
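<p>The narrow convolution with max-over-time pooling described above can be sketched in NumPy. The shapes, the random filter values, and the `char_conv_max` helper name are illustrative assumptions, and a single filter is used for brevity (a real layer applies many filters of several widths).</p>

```python
import numpy as np

# Sketch of the character-level feature: a narrow convolution of width k
# over the character embedding matrix C (one column per character),
# followed by max-over-time pooling into a single scalar feature.
def char_conv_max(C, H, b, k):
    """C: (r, n) char embeddings; H: (r, k) filter; returns the pooled feature."""
    r, n = C.shape
    f = np.empty(n - k + 1)
    for m in range(n - k + 1):
        # Frobenius inner product of the filter with a width-k slice, plus bias
        f[m] = np.tanh(np.sum(C[:, m:m + k] * H) + b)
    return f.max()  # max-over-time pooling

rng = np.random.default_rng(0)
r, n, k = 8, 5, 3                 # char embedding dim, word length, filter width
C = rng.normal(size=(r, n))
H = rng.normal(size=(r, k))
v_c = char_conv_max(C, H, b=0.1, k=k)
```

With many such filters, the pooled scalars are stacked into the character-level vector of the word.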
      <p>The final representation of the word w_i is given by concatenating the
word-level vector and the character-level vector:</p>
      <p>v_i = [v_i^w; v_i^c]   (3)</p>
      <sec id="sec-2-1">
        <title>BiLSTM Layer</title>
        <p>The goal of the encoder layer is to represent the sequence of word
representations {v_1, v_2, ..., v_l} obtained from the embedding layer at a higher level of
abstraction and to model sequential phenomena. In this work we use a BiRNN
to design our encoder. A BiRNN consists of forward and backward
recurrent neural networks (RNNs). The first one reads the input sequence in the forward
direction and produces a sequence of forward hidden states (→h_1, ..., →h_l), whereas
the latter reads the sequence in the reverse order (v_l, ..., v_1), resulting in a
sequence of backward hidden states (←h_l, ..., ←h_1).</p>
        <p>We obtain a representation for each word v_t by concatenating the
corresponding forward hidden state →h_t and the backward one ←h_t. The following
equations illustrate the main ideas, where f_fw and f_bw denote the forward and
backward RNNs:

→h_t = f_fw(v_t, →h_{t−1})   (4)

←h_t = f_bw(v_t, ←h_{t+1})   (5)

h_t = [→h_t; ←h_t]   (6)</p>
        <p>
          In practice, RNNs are challenging to train. Gradients may explode or vanish
over long sequences [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. To overcome these problems, we use Long Short-Term
Memory (LSTM) [
          <xref ref-type="bibr" rid="ref3">3</xref>
] networks, which are a more sophisticated variant of regular RNNs.
        </p>
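<p>The bidirectional encoding above can be sketched as follows. To keep the example short, a plain tanh RNN cell stands in for the LSTM used by the system, and all names, shapes, and parameter values are illustrative.</p>

```python
import numpy as np

# One directional pass of a simple tanh RNN: h_t = tanh(W x_t + U h_{t-1} + b)
def rnn_pass(xs, W, U, b):
    h = np.zeros(U.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W @ x + U @ h + b)
        out.append(h)
    return out

# Bidirectional encoding: run forward on xs, backward on reversed xs,
# re-align the backward states, and concatenate per time step.
def birnn(xs, params_fw, params_bw):
    fw = rnn_pass(xs, *params_fw)              # forward hidden states
    bw = rnn_pass(xs[::-1], *params_bw)[::-1]  # backward states, re-aligned
    return [np.concatenate([f, b]) for f, b in zip(fw, bw)]

rng = np.random.default_rng(1)
d, h, l = 4, 3, 5                              # input dim, hidden dim, length
xs = [rng.normal(size=d) for _ in range(l)]
pf = (rng.normal(size=(h, d)), rng.normal(size=(h, h)), np.zeros(h))
pb = (rng.normal(size=(h, d)), rng.normal(size=(h, h)), np.zeros(h))
H = birnn(xs, pf, pb)                          # l vectors of size 2h
```

Each output vector concatenates a summary of the left context with a summary of the right context, which is exactly what the tagging heads consume.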
      </sec>
      <sec id="sec-2-2">
        <title>Sensitivity Detection Sub-Model</title>
        <p>The input to this sub-model is the sequence of vectors obtained from the BiLSTM
layer, and the output is the probability of each token being sensitive. As shown in
Fig. 1, it comprises two units: a single-layer Conv1D and a multi-layer
perceptron (MLP) with one hidden layer and one sigmoid neuron, i.e., the output
layer. The goal of the Conv1D layer is to enrich the representation of each token
with information about a fixed-size context depending on a kernel width k.
Formally, we get the final representation of the input sequence as follows:

[v_1^s, v_2^s, ..., v_l^s] = Conv1D([v_1, v_2, ..., v_l])   (7)

where Conv1D refers to the same operations as in Equations 1 and 2. Given that,
for each v_t^s, we obtain the final output as follows.</p>
        <p>x_t^s = tanh(v_t^s W_1^s + b_1^s)   (8)

y_t^s = sigmoid(x_t^s W_2^s + b_2^s)   (9)

Here, W_1^s ∈ R^{d_s×d_x}, b_1^s ∈ R^{d_x}, W_2^s ∈ R^{d_x×1} and b_2^s ∈ R are the MLP parameters,
where d_s is the dimensionality of the output vector from Conv1D and d_x is the
dimensionality of the output vector from the hidden layer.</p>
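<p>The sensitivity head, a tanh hidden layer followed by a single sigmoid output neuron, can be sketched per token as below. The `sensitivity_head` name and the parameter shapes are illustrative assumptions.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Per-token sensitivity head: tanh hidden layer, then one sigmoid neuron.
def sensitivity_head(v, W1, b1, W2, b2):
    x = np.tanh(v @ W1 + b1)        # hidden representation of the token
    return sigmoid(x @ W2 + b2)     # probability that the token is sensitive

rng = np.random.default_rng(2)
ds, dx = 6, 4                       # Conv1D output dim, hidden layer dim
v = rng.normal(size=ds)
W1, b1 = rng.normal(size=(ds, dx)), np.zeros(dx)
W2, b2 = rng.normal(size=(dx, 1)), np.zeros(1)
p = sensitivity_head(v, W1, b1, W2, b2)
```

The sigmoid keeps the output in (0, 1), so thresholding at 0.5 yields the binary sensitive/not-sensitive decision.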
      </sec>
      <sec id="sec-2-3">
        <title>NER Type Detection Sub-Model</title>
        <p>Similarly, the input to this sub-model is the sequence of vectors obtained from
the BiLSTM layer. The output, in this case, is the type tag of each
token. Formally, let H = [v_1^t, v_2^t, ..., v_l^t] be the sequence of vectors to be
labeled, which is produced by concatenating the output of the MLP layer in the Sensitivity
Detection sub-model and the output of the Conv1D layer in this sub-model, and let
Y^t = [y_1^t, y_2^t, ..., y_l^t] be the corresponding tag sequence. Each element y_i^t of Y^t is one
of the B&lt;entity&gt;, I&lt;entity&gt; or O tags. Both H and Y^t are assumed to
be random variables, and they are jointly modeled using a conditional random
field (CRF).</p>
      </sec>
      <sec id="sec-2-4">
        <title>Training</title>
        <p>We train our model to minimise the joint objective function J.</p>
        <p>J = J^s + J^t   (10)

where J^s is the sigmoid cross-entropy and J^t is the negative log-probability of
the correct tag sequence:

J^s = −Σ_t [ŷ_t^s log(y_t^s) + (1 − ŷ_t^s) log(1 − y_t^s)]   (11)

J^t = −log p(Y^t | H)   (12)

Here, ŷ_t^s is the golden label and y_t^s is the predicted one, and Y^t refers to
the sequence of tags. As optimization algorithm, we used the Stochastic Gradient
Descent (SGD)-based ADAM algorithm [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] with a learning rate of 0.001. To avoid
overfitting, we used dropout on the embeddings and decoder outputs with
a rate of 0.3 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].</p>
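<p>The joint objective can be illustrated numerically. The sketch below computes the sigmoid cross-entropy term over a few tokens; the CRF term is represented by a stand-in probability of the gold tag sequence, since a full CRF forward pass is beyond this example, and all values shown are hypothetical.</p>

```python
import numpy as np

# Sigmoid cross-entropy over per-token sensitivity predictions:
# Js = -sum_t [ gold_t * log(pred_t) + (1 - gold_t) * log(1 - pred_t) ]
def sigmoid_cross_entropy(gold, pred):
    pred = np.clip(pred, 1e-7, 1.0 - 1e-7)   # numerical safety
    return -np.sum(gold * np.log(pred) + (1.0 - gold) * np.log(1.0 - pred))

gold = np.array([1.0, 0.0, 1.0])             # golden sensitivity labels
pred = np.array([0.9, 0.2, 0.8])             # predicted probabilities
Js = sigmoid_cross_entropy(gold, pred)

Jt = -np.log(0.7)   # stand-in for the CRF negative log-probability term
J = Js + Jt         # joint objective minimized during training
```

Because both terms are summed into one scalar, a single backward pass trains the shared encoder and both heads jointly.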
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In this section, we discuss the dataset used and the different experimental settings
devised to evaluate our system.</p>
      <sec id="sec-3-1">
        <title>Dataset Details</title>
        <p>
          We trained and fine-tuned our system on the training and
development sets, respectively, provided by the organizers of the MEDDOCAN challenge. After
that, we submitted the predicted labels of the test set produced by our
system to evaluate its performance. The organizers withheld the golden labels of
the test set. The training set contains 500 documents, and the development and test
sets contain 250 documents each.
We used grid search to obtain the best hyper-parameter values based on the
development set. We list these values in Table 1.
We evaluated the performance of our system by comparing it against the
following baseline systems:
- RegEx: a rule-based system using only regular expressions.
- CRF: a CRF-based system trained on a set of features such as unigrams,
part-of-speech tags, word shape, affixes, etc. [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]
- E2E-LSTM: a version of our system that is trained only to identify the
type of tokens.
We have developed a system, called E2EJ, that automatically detects
sensitive entities and identifies their types in Spanish electronic health records. It
contains two sub-models that are trained jointly. The first one aims to detect
the sensitive entities and guides the second one to accurately predict the type
of the detected tokens. E2EJ provides an end-to-end solution and does not
require any external tools or other linguistic resources. The effectiveness of the
proposed system has been evaluated by participating in the Medical Document
Anonymization challenge for electronic health records in the Spanish language,
obtaining results comparable to the state-of-the-art systems
and outperforming the baseline systems. The reported results show that the
proposed system is stable and consistent. In our future work, we plan to perform
extensive error analysis to inspect the performance of the system and improve
it. For example, we plan to use a transformer-based interpretable model like
BERT [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] as a pre-trained sentence encoder instead of the BiLSTM.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>The authors acknowledge the support of Universitat Rovira i Virgili through a Martí
i Franquès PhD grant, the assistant/teaching grant of the Department of
Computer Engineering and Mathematics, and the Research Support Funds
2019PFR-URV-B2-60.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Act</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Health insurance portability and accountability act of 1996</article-title>
          . Public law
          <volume>104</volume>
          ,
          <issue>191</issue>
          (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
          </string-name>
          , J.:
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>arXiv preprint arXiv:1412.6980</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Automatic de-identification of electronic medical records using token-level and character-level conditional random fields</article-title>
          .
          <source>Journal of biomedical informatics 58, S47-S52</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>De-identification of clinical notes via recurrent neural network and conditional random field</article-title>
          .
          <source>Journal of biomedical informatics 75, S34-S42</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Marimon</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Intxaurrondo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodríguez</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez Martin</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villegas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Automatic de-identification of medical texts in Spanish: the MEDDOCAN track, corpus, guidelines, methods and evaluation of results</article-title>
          .
          <source>In: Proceedings of the Iberian Languages Evaluation Forum (IberLEF</source>
          <year>2019</year>
          ). vol.
          <source>TBA</source>
          , p.
          <source>TBA. CEUR Workshop Proceedings (CEUR-WS.org)</source>
          , Bilbao,
          <source>Spain (Sep</source>
          <year>2019</year>
          ), TBA
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Meystre</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedlin</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>South</surname>
            ,
            <given-names>B.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samore</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          :
          <article-title>Automatic de-identification of textual documents in the electronic health record: a review of recent research</article-title>
          .
          <source>BMC medical research methodology 10(1)</source>
          ,
          <volume>70</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Neamatullah</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Douglass</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li-wei</surname>
            ,
            <given-names>H.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reisner</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villarroel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , Moody, G.B.,
          <string-name>
            <surname>Mark</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clifford</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          :
          <article-title>Automated de-identification of free-text medical records</article-title>
          .
          <source>BMC medical informatics and decision making 8(1)</source>
          ,
          <volume>32</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pascanu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>On the difficulty of training recurrent neural networks</article-title>
          .
          <source>In: ICML</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hinton</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krizhevsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salakhutdinov</surname>
          </string-name>
          , R.:
          <article-title>Dropout: a simple way to prevent neural networks from overfitting</article-title>
          .
          <source>The Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Sweeney</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Replacing personally-identifying information in medical records, the scrub system</article-title>
          .
          <source>In: Proceedings of the AMIA annual fall symposium</source>
          . p.
          <fpage>333</fpage>
          . American Medical Informatics Association (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Evaluating the state-of-the-art in automatic de-identification</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>14</volume>
          (
          <issue>5</issue>
          ),
          <fpage>550</fpage>
          -
          <lpage>563</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>