<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Toronto CL at CLEF 2018 eHealth Task 1: Multi-lingual ICD-10 Coding using an Ensemble of Recurrent and Convolutional Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Serena Jeblee</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akshay Budhkar</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sasa Milic</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeff Pinto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chloe Pou-Prom</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krishnapriya Vishnubhotla</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Graeme Hirst</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frank Rudzicz</string-name>
          <email>frank@spoclab.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The Vector Institute</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Toronto Rehabilitation Institute-UHN</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Toronto</institution>
          ,
          <addr-line>Toronto</addr-line>
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We assign ICD-10 codes to cause-of-death phrases in multiple languages by creating rich and relevant word embedding models. We train 100-dimensional word embeddings on the provided training data, combined with language-specific Wikipedia corpora. We then use n-gram matching of the raw text against the provided ICD dictionary, followed by an ensemble model that combines predictions from a CNN classifier and a GRU encoder-decoder model.</p>
      </abstract>
      <kwd-group>
        <kwd>GRU</kwd>
        <kwd>CNN</kwd>
        <kwd>ensemble</kwd>
        <kwd>word embeddings</kwd>
        <kwd>medical coding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The International Classi cation of Diseases (ICD), established by the World
Health Organization, provides a standardized and universal way to encode
medical diagnoses. Used for the purposes of determining health trends and reporting
statistics, the ICD assigns each medical concept, disease, or disorder to a
hierarchical letter-digits combination (e.g., A08 denotes \viral and other speci ed
intestinal infections") [23]. Medical coding is important for public health research
and for making clinical and nancial decisions. However, the coding process is
often expensive and time-consuming, and can be error-prone due to its complex
pipeline [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Automated medical coding could offer a potential solution to these
problems.
      </p>
      <p>
        Here we present our methodology and results for Task 1 of the CLEF
2018 eHealth challenge: "Multilingual information extraction - ICD10 coding"
[
        <xref ref-type="bibr" rid="ref12">12, 20</xref>
        ]. This task consists of assigning ICD-10 codes [23] (the tenth revision of
the ICD; see http://apps.who.int/classifications/icd10/browse/2016/en) to the
text of death certificates. Datasets are provided for three languages:
French, Italian, and Hungarian, and we submit results for all three.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        The recent popularity of neural network-based methods has spread to the
healthcare domain, especially convolutional neural networks (CNNs) and recurrent
neural networks (RNNs), both of which we use in this paper. A variety of
models have been successfully applied to health-related automated tasks, such as
predicting hospital readmissions and suicide risk [
        <xref ref-type="bibr" rid="ref13">19, 13</xref>
        ] and automated
diagnosis and information extraction using deep convolutional belief networks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        For this task, we can treat ICD coding as a sequence-to-sequence problem of
words to ICD codes. Indeed, the best submission from the CLEF eHealth 2017
challenge achieved an F1 score of 85.01% on the test data using an
encoderdecoder model [21]. Other approaches from the 2017 challenge included
rulebased systems (IMSUNIPD [15], LITL [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), SVM classifiers (LIMSI [24]), and
query-based algorithms (SIBM [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], WBI [22], LITL).
      </p>
    </sec>
    <sec id="sec-3">
      <title>Data and pre-processing</title>
      <p>The provided training data consists of cause-of-death texts, ICD-10 codes, and
demographic information in the French, Hungarian, and Italian languages. The
data is given in either raw or aligned format. Raw data consists of the following
three files:
- A CausesBrutes file containing values for DocID (the death certificate ID),
YearCoded (the year the death certificate was processed), LineID (the line
number within the death certificate), RawText (the raw text entered by the
physician in the death certificate), IntType (the type of time interval during which the
patient had been suffering from the coded cause), and IntValue (the time interval
of IntType).
- A CausesCalculees file containing the ICD-10 codes. Similar to
CausesBrutes, this file contains the DocID, YearCoded, and LineID fields.
Additionally, this file includes values for CauseRank (the rank of the ICD-10 code
assigned by the coder), StandardText (the dictionary entry or excerpt of
RawText that supports the assigned code), and ICD10 (the gold-standard
ICD-10 code).
- An Ident file containing demographic information. This file contains the DocID,
YearCoded, and LineID fields, as well as the following fields: Gender (the
gender of the deceased), PrimCauseCode (the code of the primary cause of
death), Age (age at the time of death, rounded to the nearest five-year age
group), and LocationOfDeath.</p>
      <p>Files are provided in the raw format for the French, Hungarian, and Italian
datasets. Additionally, the French data is also provided in an aligned format,
consisting of one file in which the information from the CausesBrutes,
CausesCalculees, and Ident files is already combined.</p>
      <p>For each language, ICD-10 dictionaries are supplied. From these dictionaries,
we retain the DiagnosisText (the text description of the given code) and Icd1
(the ICD-10 code) fields.</p>
      <p>For this task we discover that, with the exception of n-gram matching, model
performance improves in proportion to the volume and quality of reference data.
To this end, we pre-process both the training data and the supplementary reference data to
maximize the model's coverage of terms in the text by standardizing the corpora
and reducing noise.</p>
      <sec id="sec-3-1">
        <title>Training Data</title>
        <p>We combine the raw-format files with the demographic data to create an
aligned-like file for each language, since preliminary experiments reveal that using the
demographic information greatly improves classification results. We create a
concatenated input stream by appending to each row of RawText the YearCoded,
Gender, Age, IntType, IntValue, and LocationOfDeath values that match the row's
DocID and LineID. If a document has multiple rows, each row has identical
appended data.</p>
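        <p>As an illustration, the following sketch merges the three raw files on their
shared keys to produce an aligned-like table (pandas; the file names and the field
separator are assumptions):</p>
        <preformat>
import pandas as pd

# Illustrative file names; the separator used in the challenge files is an
# assumption here.
brutes = pd.read_csv("CausesBrutes_FR.csv", sep=";")
calculees = pd.read_csv("CausesCalculees_FR.csv", sep=";")
ident = pd.read_csv("Ident_FR.csv", sep=";")

keys = ["DocID", "YearCoded", "LineID"]
aligned = brutes.merge(calculees, on=keys).merge(ident, on=keys)

# Append the demographic fields to the raw text, as described above.
extra = ["YearCoded", "Gender", "Age", "IntType", "IntValue", "LocationOfDeath"]
aligned["Input"] = (aligned["RawText"].astype(str) + " " +
                    aligned[extra].astype(str).agg(" ".join, axis=1))
        </preformat>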
        <p>After initial experiments, we determine that commas (,) in the RawText
field often indicate multiple ICD-10 codes. Our results improve by splitting each
unaligned training row on commas prior to processing for the n-gram and the
CNN models, thus assigning n + 1 codes to a row with n commas. Removing
accents from the text does not improve performance on the training data, so
we keep all accents for all languages. For all of our models, we lowercase the
text in the RawText field and strip it of all punctuation symbols.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Supplementary data</title>
        <p>
          For our word embedding models, we use publicly available monolingual Wikipedia
data from Linguatools [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. The supplementary data is pre-processed to reduce
the content to only alphabetical characters (including accented characters), since
the input RawText does not contain significant numeric data. In addition to
removing web URLs and extra whitespace, we remove any characters not in the
language-specific lists in Table 1, and convert all text to lower case. After
pre-processing, we have over 9M lines of Hungarian (120M tokens), 21M lines of
Italian (370M tokens), and 32M lines of French (540M tokens) supplemental
data.
We experiment with pre-trained French, Hungarian, and Italian word embedding
models from the Facebook MUSE dataset [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], but discover that they have fairly
low vocabulary coverage on the training data. In order to get better coverage, we
train our own word embeddings on the Wikipedia data (Section 3.2), which we
augment with the RawText field from the training data and the dictionaries. The
training and dictionary texts are repeated N = 4 times (N is chosen empirically
to increase the overall accuracy). We use the word2vec [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] implementation in
Gensim [18], with a context window size of 2 and a minimum word-occurrence
frequency of 4. We set the dimension of our word embeddings to 100,
chosen for the ease of potentially aligning these vectors with other publicly available
vectors trained on medical datasets.
        </p>
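        <p>A minimal sketch of this training step, assuming Gensim 4 (where the
dimension parameter is called vector_size) and an illustrative corpus file name:</p>
        <preformat>
from gensim.models import Word2Vec

# One pre-processed sentence per line; the training RawText and the
# dictionary entries are appended to the corpus N=4 times beforehand.
def sentences(path):
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.split()

model = Word2Vec(list(sentences("fr_wiki_plus_task.txt")),
                 vector_size=100,  # 100-dimensional embeddings
                 window=2,         # context window size
                 min_count=4)      # minimum word-occurrence frequency
model.wv.save("fr_vectors.kv")
        </preformat>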
        <p>Because we train the word embedding models on text from the provided
training data, we get 100% vocabulary coverage of the training data, and very
high coverage of the testing data. See Table 2 for the test-set coverage of
our word embedding models compared to the pre-trained models from Facebook
MUSE. We report vocabulary coverage as the percentage of types (i.e., distinct
words) and the percentage of tokens (i.e., all words) from the RawText field that
can be found in the given word vector model.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Models</title>
      <p>We use n-gram text matching followed by a variety of neural network models.
We implement the neural network models in PyTorch [16] with CUDA [14], and
train them on the sequence of word embeddings representing the text of each
line. We conduct 10-fold cross-validation on the training data in order to choose
the final models and parameters, using scikit-learn's GroupKFold function
[17].</p>
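      <p>A sketch of the cross-validation setup; grouping folds by DocID (so that
lines from the same certificate never straddle a fold) is our reading of the setup,
and the helper name is illustrative:</p>
      <preformat>
from sklearn.model_selection import GroupKFold

# X: per-line inputs; y: gold ICD-10 codes; doc_ids: DocID for each line.
gkf = GroupKFold(n_splits=10)
for train_idx, test_idx in gkf.split(X, y, groups=doc_ids):
    train_and_evaluate(X[train_idx], y[train_idx], X[test_idx], y[test_idx])
      </preformat>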
      <sec id="sec-4-1">
        <title>N-gram matching</title>
        <p>
          The "first pass" of our pipeline checks whether the text string has an exact
match in the dictionary. If so, we use the corresponding ICD code as our
target. We experiment with spelling correction on the input text before matching
it to dictionary text, since we notice that there are spelling errors in the
training data. For spelling correction we use the pyenchant package [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. pyenchant
computes the edit distance between a given string and a vocabulary; the vocabulary
we supply in this case is the set of words in the dictionary entries. Because spellcheck
improves precision only on the French dataset, our French classifier is the only
one that performs spellcheck before the exact-match procedure. Interestingly,
the French, Italian, and Hungarian datasets have notably different F1 scores
when this simple procedure is applied. This perhaps suggests that a different model
or approach for each language might be necessary for classification rather than
one language-agnostic approach. Occasionally, the exact same text might map to
more than one ICD code. If this happens, the most frequent ICD code is taken,
where frequency is computed across all training data in all languages (we assume
that the distribution of ICD codes is the same across years and regions).
        </p>
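        <p>A minimal sketch of this first pass, with illustrative names; the pyenchant
usage in the trailing comment assumes a personal word list built from the
dictionary-entry words:</p>
        <preformat>
from collections import defaultdict

def build_exact_matcher(dictionary_entries, code_counts):
    # dictionary_entries: (DiagnosisText, Icd1) pairs from the ICD dictionary;
    # code_counts: frequency of each gold code over all training data.
    codes_for_text = defaultdict(set)
    for text, code in dictionary_entries:
        codes_for_text[normalize(text)].add(code)

    def match(raw_text):
        candidates = codes_for_text.get(normalize(raw_text))
        if not candidates:
            return None
        # Ties between codes sharing the same text go to the most frequent code.
        return max(candidates, key=lambda c: code_counts[c])

    return match

def normalize(text):
    return " ".join(text.lower().split())

# French only: spelling correction before matching, e.g. with pyenchant:
#   import enchant
#   pwl = enchant.request_pwl_dict("dictionary_words.txt")
#   token = pwl.suggest(token)[0] if not pwl.check(token) else token
        </preformat>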
      </sec>
      <sec id="sec-4-2">
        <title>N-gram matching on the Hungarian dataset</title>
        <p>We note that the Hungarian dataset performs well with pattern matching;
thus, for one of our test runs, we
submit results obtained with a purely rule-based, pattern-matching classifier.
Specifically, for the Hungarian dataset, a "second pass" is performed in which
bigrams of the input text are matched to bigrams of dictionary text. Since there
is a many-to-many mapping between bigrams in the training text and bigrams
in the dictionary text, our chosen target ICD code is the one that matches the
largest number of bigrams from the input text. Again, ties are broken by choosing
the most frequent ICD code.</p>
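        <p>A sketch of this second pass, under the same illustrative naming:</p>
        <preformat>
def bigrams(tokens):
    return set(zip(tokens, tokens[1:]))

def second_pass(raw_tokens, dictionary_entries, code_counts):
    # Pick the code whose dictionary text shares the most bigrams with the
    # input; break ties by overall code frequency.
    query = bigrams(raw_tokens)
    best, best_key = None, (0, 0)
    for text, code in dictionary_entries:
        overlap = len(query.intersection(bigrams(text.split())))
        key = (overlap, code_counts[code])
        if overlap and key > best_key:
            best, best_key = code, key
    return best
        </preformat>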
      </sec>
      <sec id="sec-4-3">
        <title>Encoder-decoder</title>
        <p>Similar to the work of [21], we build an encoder-decoder model that takes as
input a sequence of words and outputs a sequence of ICD codes. We experiment
with one-hot vectors and various word embeddings as described in Section 3.2, and
obtain our best results using our own word embeddings trained on the Wikipedia
corpora augmented with the training data and dictionary data.</p>
        <p>
The encoder and decoder architectures consist of gated recurrent units (GRUs)
with hidden size 256, to which we apply a dropout of 0.1. The encoder-decoder
model is trained on pairs of sequences of tokens ⟨RawText, ICD10⟩ for each
line of each death certificate (i.e., for each LineID, DocID). The sequence of
ICD10 codes is ordered from left to right by increasing rank. The pre-processed
sequence of tokens from the RawText field is converted to a sequence of
100-dimensional word vectors, and the corresponding ICD-10 codes are converted to
one-hot word vectors. We also make use of the demographic information (i.e.,
the Gender, Age, LocationOfDeath, IntType, and IntValue fields) and pass it
as input to the decoder as a zero-padded 100-dimensional vector. We train our
model for 10 epochs, using the AdaGrad algorithm [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and a learning rate of 0.1,
where each epoch iterates through the whole training set. The encoder-decoder
model's training time depends on the size of the input dataset: it takes about
4 hours for the Italian dataset (the smallest one), and up to 12 hours for the
Hungarian dataset. Since we order the ICD-10 codes by rank during training,
our output sequence of ICD-10 codes is also ordered by rank. For example, for
an output of "T293 S299", we assign Rank 1 to "T293" and Rank 2 to "S299".
        </p>
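        <p>A skeletal PyTorch sketch of the two GRU components under the sizes above
(all names are illustrative; teacher forcing, the zero-padded demographic vector,
and the training loop are omitted):</p>
        <preformat>
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, emb_dim=100, hidden=256):
        super().__init__()
        self.drop = nn.Dropout(0.1)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)

    def forward(self, x):              # x: (batch, seq_len, 100) word vectors
        _, h = self.gru(self.drop(x))
        return h                       # final hidden state summarizes the line

class Decoder(nn.Module):
    def __init__(self, n_codes, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(n_codes, hidden)
        self.drop = nn.Dropout(0.1)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_codes)

    def forward(self, prev_codes, h):  # prev_codes: (batch, steps) code IDs
        e = self.drop(self.emb(prev_codes))
        o, h = self.gru(e, h)
        return self.out(o), h          # logits over ICD-10 codes per step

# Training: cross-entropy on the rank-ordered gold code sequence with
# torch.optim.Adagrad(parameters, lr=0.1) for 10 epochs.
        </preformat>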
        <p>We note that, unlike in the n-gram matching and CNN approaches, we do not
split the RawText on commas during training, and we give the entire text as input
to the encoder-decoder. Since this model treats ICD-10 prediction as a
sequence-to-sequence problem, it is able to "learn" the correct number of ICD-10 codes for
each line of text given, as well as the correct rank order.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Convolutional neural network</title>
        <p>We also use a convolutional neural network (CNN). Although CNNs are typically
used for image processing, they have also achieved good results on text problems,
including tasks in the medical domain [19].</p>
        <p>For French and Hungarian data, we apply spelling correction to the text
before passing it to the CNN. On the training data, spelling correction does not
improve CNN results for Italian, so we do not correct the Italian text. All text
is then pre-processed as described in Section 3.</p>
        <p>
The CNN model consists of filters of sizes 1 through 4 by the word embedding
size (100), a max-pooling layer, and a softmax prediction layer. We use Adam [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]
as the optimizer, with a learning rate of 0.001, and train for 20 epochs. Compared
to the encoder-decoder model, the CNN is very fast to train: 5-20 minutes versus
4-12 hours for the encoder-decoder.
        </p>
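        <p>A sketch of such a text CNN; the number of filters per width is an
assumption, since only the filter sizes and the embedding dimension are fixed
above:</p>
        <preformat>
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    def __init__(self, n_codes, emb_dim=100, n_filters=64):
        super().__init__()
        # One convolution per filter width 1..4, spanning the embedding dim.
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in (1, 2, 3, 4)])
        self.fc = nn.Linear(4 * n_filters, n_codes)

    def forward(self, x):             # x: (batch, seq_len, emb_dim)
        x = x.transpose(1, 2)         # Conv1d expects (batch, channels, seq)
        pooled = [F.relu(c(x)).max(dim=2).values for c in self.convs]
        logits = self.fc(torch.cat(pooled, dim=1))
        return F.softmax(logits, dim=1)  # per-code probabilities for the ensemble

# Trained with torch.optim.Adam(model.parameters(), lr=0.001) for 20 epochs.
        </preformat>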
        <p>One limitation of the CNN model is that it outputs exactly one ICD code
per input line. In order to overcome this limitation, we use the softmax output
probabilities from the last layer of the network as the input to the ensemble
model, which allows us to choose multiple codes per line for the final output.</p>
      </sec>
      <sec id="sec-4-5">
        <title>Ensemble model</title>
        <p>We combine the outputs from the models discussed in the previous sections
using a rule-based ensemble model. This model takes as inputs the CNN softmax
probabilities for all the ICD codes, the codes predicted by the encoder-decoder
model (including their ranks), and the codes predicted by n-gram matching.</p>
        <p>We empirically choose thresholds to optimize the accuracy on the first
cross-validation fold of every language. For each ICD code, the ensemble model
determines whether it is assigned to the given RawText based on the following
rule:</p>
        <p>(ICD ∈ ende_output and ICD_ende_rank &lt; t_ende_rank) or
(ICD_cnn_prob &gt; p_cnn) or
(ICD ∈ ngram_output and ICD_cnn_prob &gt; p_ngram)</p>
        <p>For every ICD code, we first check whether it is in the list of ICD-10 codes
producible by the encoder-decoder, ende_output, and then check whether its rank,
ICD_ende_rank, falls below the threshold t_ende_rank. Next, the ensemble model
checks whether the code's CNN probability, ICD_cnn_prob, is greater than the
threshold p_cnn. Then, the model checks whether the ICD code is in the list of
n-gram-matched codes, ngram_output, and verifies whether the corresponding
CNN probability ICD_cnn_prob is greater than the threshold p_ngram. The
CNN probabilities thus have two thresholds: one for the probabilities of all the ICD
codes, and one only for the n-gram-matched codes. Any codes that pass one of
the checks described above are included in our prediction. The n-gram-matched
codes are only used for French.</p>
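        <p>The rule reduces to a short predicate; a minimal sketch with illustrative
names:</p>
        <preformat>
def ensemble_keeps(icd, ende_output, ende_rank, cnn_prob,
                   ngram_output, t_ende_rank, p_cnn, p_ngram):
    # Encoder-decoder check: the code was generated with a high enough rank.
    if icd in ende_output and t_ende_rank > ende_rank[icd]:
        return True
    # CNN check: the softmax probability clears the global threshold.
    if cnn_prob[icd] > p_cnn:
        return True
    # N-gram check (French only): a laxer CNN threshold for matched codes.
    if icd in ngram_output and cnn_prob[icd] > p_ngram:
        return True
    return False
        </preformat>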
        <p>The ensemble optimizes the rank threshold t_ende_rank and the probability
thresholds p_cnn and p_ngram. See Table 3 for the threshold values for the three
languages.</p>
        <p>In our final submission, we use the n-gram and ensemble models for the
Hungarian dataset, and the encoder-decoder and ensemble models for both the Italian
and French datasets.</p>
        <p>For Hungarian, we achieve fairly good results using a simple n-gram matching
scheme. This suggests that the Hungarian ICD dictionary has better coverage
of the words in the death certificate text than the dictionaries of the other two
languages. The lines in the Hungarian test data have an average length of 2.9
words (standard deviation 2.0), compared to 2.1 (1.5) for Italian and 3.3 (2.5)
for French.</p>
        <p>For the encoder-decoder, CNN, and ensemble models, we use the same
framework for all three languages. Given training data, this approach could easily be
extended to new languages.</p>
        <p>Our test results show low recall values for French, especially for the raw data.
We suspect that the test data includes ICD codes that our models had not seen
during training. A potential way to ensure that our model sees all possible ICD
codes would be to include the ICD dictionaries during training. However, in
our cross-validation experiments, combining the dictionary data with the training
data when training the encoder-decoder model actually produced lower results.
Augmenting the data with the dictionary text skews the distribution
of ICD-10 codes, which in turn affects classification.</p>
        <p>Our best model achieves an F1 measure of 0.91 on the Hungarian-language
test data using the ensemble model. An ANOVA on the F1 measures of our test
results reveals no significant effect of language (F = 3.712, p = 0.212) or
model (F = 1.033, p = 0.492), nor any interaction between the two covariates
(F = 0.001, p = 0.982).</p>
        <p>We note that the amount of supplementary material is not proportional to
a language's vocabulary coverage. The French Wikipedia corpus has over three
times the quantity of tokens and sentences of the Hungarian one, yet covers 20%
fewer terms (see Table 2). This implies that the efficacy of the supplemental
material varies by language. Note that spelling is not corrected
in the supplemental data, which may reduce the term coverage.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have shown that using in-domain data in addition to a large corpus of general
language-specific data for word embeddings, along with neural network models,
can achieve fairly high performance on ICD coding in multiple languages. This
kind of model does not require any expert knowledge or feature engineering, and
therefore can be implemented for any language for which we have training data.</p>
      <p>The performance of this model could be improved in several ways. Our word
embedding models are trained on a large corpus of language-specific Wikipedia
data, but we would expect a better representation if the embedding models were
trained on large corpora of text in the respective languages from the medical
domain, such as clinical notes or medical textbooks.</p>
      <p>
        A multilingual ICD classification model could also benefit from cross-lingual
representations, such as word embeddings aligned in the same vector space [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
We were unable to get a good alignment for this task, but such an alignment
would allow us to train models on multiple languages, and apply them to any
language for which we have aligned word embeddings.
      </p>
      <p>This model could also potentially benefit from character-based embeddings
and RNN models, especially for agglutinative languages such as Hungarian,
which have complex morphology. For such languages, more pre-processing, such
as morphological analysis, could also help.</p>
      <p>This challenge exposes several aspects of disease prediction that must be
overcome if machine learning is to be applied globally to the electronic medical
record. Not least of these aspects are the differences between
languages, both in terms of their own structure and semantics and in terms
of the availability of data resources from which models are to be trained.
Although we have made progress in terms of overall precision and recall, more
work remains.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Cabot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <source>SIBM at CLEF eHealth Evaluation Lab</source>
          <year>2017</year>
          :
          <article-title>Multilingual information extraction with CIM-IND</article-title>
          . In:
          <article-title>CLEF 2017 Evaluation Labs</article-title>
          and Workshop: Online Working Notes, CEUR-WS (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Conneau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jegou</surname>
          </string-name>
          , H.:
          <article-title>Word translation without parallel data</article-title>
          .
          <source>arXiv preprint arXiv:1710.04087</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Duchi</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hazan</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singer</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Adaptive subgradient methods for online learning and stochastic optimization</article-title>
          .
          <source>Journal of Machine Learning Research 12(Jul)</source>
          ,
          <volume>2121</volume>
          -
          <fpage>2159</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ho-Dac</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boudraa</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourriot</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cassier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delvenne</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Gonzalez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piccinini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbacher</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seguier</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>LITL at CLEF eHealth2017: Automatic classification of death reports</article-title>
          . In:
          <article-title>CLEF 2017 Evaluation Labs</article-title>
          and Workshop: Online Working Notes, CEUR-WS (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kelly</surname>
          </string-name>
          , R.:
          <article-title>Python bindings for the enchant spellchecker</article-title>
          (
          <year>2017</year>
          ), https://github.com/rfk/pyenchant, Last accessed on 2018-05-24
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
          </string-name>
          , J.:
          <article-title>Adam: A method for stochastic optimization</article-title>
          . In:
          <source>International Conference on Learning Representations</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kolb</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Linguatools: Wikipedia monolingual corpora</article-title>
          (
          <year>2018</year>
          ), http://linguatools.org/tools/corpora/wikipedia-monolingual-corpora/, Last accessed on 2018-05-24
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ranzato</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Unsupervised machine translation using monolingual corpora only</article-title>
          .
          <source>arXiv preprint arXiv:1711.00043</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Liang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          , Zhang, G.,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Deep learning for healthcare decision making with EMRs</article-title>
          .
          <source>In: 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)</source>
          . pp.
          <volume>556</volume>
          -
          <fpage>559</fpage>
          .
          IEEE
          (November
          <year>2014</year>
          ). https://doi.org/10.1109/BIBM.2014.6999219, http://ieeexplore.ieee.org/document/6999219/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>26</volume>
          . pp.
          <volume>3111</volume>
          -
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>R.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rondet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth 2017 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in English and French</article-title>
          . In:
          <article-title>CLEF 2017 Evaluation Labs</article-title>
          and Workshop: Online Working Notes, CEUR-WS (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grippo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgand</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orsi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelikan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramadier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth 2018 Multilingual Information Extraction task Overview: ICD10 coding of death certificates in French, Hungarian and Italian</article-title>
          . In:
          <article-title>CLEF 2018 Evaluation Labs</article-title>
          and Workshop: Online Working Notes, CEUR-WS (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wickramasinghe</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venkatesh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Deepr: A convolutional net for medical records</article-title>
          .
          <source>IEEE Journal of Biomedical and Health Informatics</source>
          <volume>21</volume>
          (
          <issue>1</issue>
          ),
          <volume>22</volume>
          -
          <fpage>30</fpage>
          (
          <year>July 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. Nickolls, J., Buck, I., Garland, M., Skadron, K.: Scalable parallel programming with CUDA. Queue 6(2), 40-53 (2008). https://doi.org/10.1145/1365490.1365500</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. Nunzio, G.M.D., Beghini, F., Vezzani, F., Henrot, G.: A lexicon based approach to classification of ICD10 codes. IMS Unipd at CLEF eHealth Task 1. In: CLEF 2017 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS (2017)</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: Autodiff Workshop at Neural Information Processing Systems (NIPS) 2017 (2017)</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825-2830 (2011), https://dl.acm.org/citation.cfm?id=2078195</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. Rehurek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. pp. 45-50. ELRA, Valletta, Malta (May 2010), http://is.muni.cz/publication/884893/en</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. Shickel, B., Tighe, P., Bihorac, A., Rashidi, P.: Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE Journal of Biomedical and Health Informatics (June 2017). https://doi.org/10.1109/JBHI.2017.2767063</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. Suominen, H., Kelly, L., Goeuriot, L., Kanoulas, E., Azzopardi, L., Spijker, R., Li, D., Neveol, A., Ramadier, L., Robert, A., Palotti, J., Jimmy, Zuccon, G.: Overview of the CLEF eHealth Evaluation Lab 2018. In: CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS). Springer (2018)</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. Tutubalina, E., Miftahutdinov, Z.: An encoder-decoder model for ICD10 coding of death certificates. In: Machine Learning for Health Workshop at Neural Information Processing Systems (NIPS) 2017 (Dec 2017), http://arxiv.org/abs/1712.01213</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>22. Seva, J., Kittner, M., Roller, R., Leser, U.: Multi-lingual ICD-10 coding using a hybrid rule-based and supervised classification approach at CLEF eHealth 2017. In: CLEF 2017 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS (2017)</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>23. World Health Organization: International statistical classification of diseases and related health problems. 10th rev., vol. 1. World Health Organization, Geneva, Switzerland (2008)</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>24. Zweigenbaum, P., Lavergne, T.: Multiple methods for multi-class, multi-label ICD10 coding of multi-granularity, multilingual death certificates. In: CLEF 2017 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS (2017)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>