<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>WBI at CLEF eHealth 2018 Task 1: Language-independent ICD-10 coding using multi-lingual embeddings and recurrent neural networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jurica Ševa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mario Sanger</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ulf Leser</string-name>
          <email>leserg@informatik.hu-berlin.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Humboldt-Universitat zu Berlin</institution>
          ,
          <addr-line>Knowledge Management in Bioinformatics, Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the WBI team in the CLEF eHealth 2018 shared task 1 ("Multilingual Information Extraction - ICD-10 coding"). Our contribution focuses on the setup and evaluation of a baseline, language-independent neural architecture for ICD-10 classification as well as a simple, heuristic multi-language word embedding space. The approach builds on two recurrent neural network models to extract and classify causes of death from French, Italian and Hungarian death certificates. First, we employ an LSTM-based sequence-to-sequence model to obtain a death cause from each death certificate line. We then utilize a bidirectional LSTM model with attention mechanism to assign the respective ICD-10 codes to the received death cause description. Both models take multi-language word embeddings as inputs. During evaluation our best model achieves an F-score of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian. The results are encouraging for future work as well as the extension and improvement of the proposed baseline system.</p>
      </abstract>
      <kwd-group>
        <kwd>ICD-10 coding</kwd>
        <kwd>Biomedical information extraction</kwd>
        <kwd>Multi-lingual sequence-to-sequence model</kwd>
        <kwd>Representation learning</kwd>
        <kwd>Recurrent neural network</kwd>
        <kwd>Attention mechanism</kwd>
        <kwd>Multi-language embeddings</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Automatic extraction, classification and analysis of biological and medical
concepts from unstructured texts, such as scientific publications or electronic health
documents, is a highly important task to support many applications in research,
daily clinical routine and policy-making. Computer-assisted approaches can
improve decision making and support clinical processes, for example, by giving
a more sophisticated overview of a research area or providing detailed
information about the aetiopathology of a patient or disease patterns. In the past
years major advances have been made in the area of natural language
processing (NLP). However, improvements in the field of biomedical text mining lag
behind other domains, mainly due to privacy issues and concerns regarding the
processed data (e.g. electronic health records).</p>
      <p>
        The CLEF eHealth lab aims to address this situation through the
organization of various shared tasks that exploit electronically available medical content
[
        <xref ref-type="bibr" rid="ref34">34</xref>
        ]. In particular, Task 1 of the lab is concerned with the extraction and
classification of death causes from death certificates originating from different
languages [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Participants were asked to classify the death causes mentioned in
the certificates according to the International Classification of Disease version 10
(ICD-10). In previous years the task was concerned with French and English death
certificates. In contrast, this year the organizers provided annotated death
reports as well as ICD-10 dictionaries for French, Italian and Hungarian. The
development of language-independent, multilingual approaches was encouraged.
      </p>
      <p>
        Inspired by the recent success of recurrent neural network models (RNNs)
[
        <xref ref-type="bibr" rid="ref10 ref20 ref6">6,20,10</xref>
        ] in general and the convincing performance of the work of
Miftahutdinov and Tutubalina [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] in the last edition of the lab, we opted for the
development of a deep learning model for this year's competition. Our work introduces
a prototypical, language-independent approach for ICD-10 classification using
multi-language word embeddings and long short-term memory models (LSTMs).
We divide the proposed pipeline into two tasks. First, we perform named entity
recognition (NER), i.e. extract the death cause description from a certificate
line, with an encoder-decoder model. Given the death cause, named entity
normalization (NEN), i.e. assigning an ICD-10 code to the extracted death cause,
is performed by a separate LSTM. Our approach builds upon a heuristic
multi-language embedding space and therefore only needs one single model for all three
data sets. With this work we want to experiment with and evaluate the performance
that can be achieved with such a simple shared embedding space.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>This section highlights previous work related to our approach. We give a brief
introduction to the methodical foundations of our work, RNNs and word
embeddings. The section concludes with a summary of ICD-10 classification approaches
used in previous eHealth lab competitions.</p>
      <p>
        Recurrent neural networks (RNN)
RNNs are a widely used technique for sequence learning problems such as
machine translation [
        <xref ref-type="bibr" rid="ref1 ref6">1,6</xref>
        ], image captioning [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], NER [
        <xref ref-type="bibr" rid="ref20 ref40">20,40</xref>
        ], dependency parsing
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and part-of-speech tagging [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. RNNs model dynamic temporal behaviour
in sequential data through recurrent units, i.e. the hidden, internal state of a
unit in one time step depends on the state of the unit in the previous time step.
      </p>
      <sec id="sec-2-1">
        <title>1 https://sites.google.com/site/clefehealth/</title>
      </sec>
      <sec id="sec-2-2">
        <title>2 https://sites.google.com/view/clef-ehealth-2018/task-1-multilingual-information-extraction-icd10-coding</title>
      </sec>
      <sec id="sec-2-3">
        <title>3 http://www.who.int/classifications/icd/en/</title>
        <p>These feedback connections enable the network to memorize information from
recent time steps and add the ability to capture long-term dependencies.</p>
        <p>
          However, training of RNNs can be difficult due to the vanishing gradient
problem [
          <xref ref-type="bibr" rid="ref16 ref3">16,3</xref>
          ]. The most widespread modifications of RNNs to overcome this
problem are LSTMs [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and gated recurrent units (GRU) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Both modifications
use gated memories which control and regulate the information flow between two
recurrent units. A common LSTM unit consists of a cell and three gates: an input
gate, an output gate and a forget gate. In general, LSTMs are chained together
by connecting the outputs of the previous unit to the inputs of the next one.
        </p>
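        <p>To make the gating mechanism concrete, the following toy sketch computes one LSTM step with scalar weights and chains several units over a sequence; this is purely illustrative (our actual models use Keras LSTM layers with full weight matrices):</p>
        <preformat>
```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step with scalar states (illustrative only)."""
    i = sigmoid(w["i"] * x + w["ri"] * h_prev)    # input gate
    f = sigmoid(w["f"] * x + w["rf"] * h_prev)    # forget gate
    o = sigmoid(w["o"] * x + w["ro"] * h_prev)    # output gate
    g = math.tanh(w["g"] * x + w["rg"] * h_prev)  # candidate cell state
    c = f * c_prev + i * g   # gated memory update
    h = o * math.tanh(c)     # hidden state passed to the next unit
    return h, c

# Chain units over a toy input sequence, as described above.
w = dict.fromkeys(["i", "f", "o", "g", "ri", "rf", "ro", "rg"], 0.5)
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, w)
```
        </preformat>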
        <p>A further extension of the general RNN architecture are bidirectional
networks, which make the past and future context available in every time step. A
bidirectional LSTM model consists of a forward chain, which processes the
input data from left to right, and a backward chain, consuming the data in the
opposite direction. The final representation is typically the concatenation or a
linear combination of both states.</p>
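        <p>The forward/backward combination can be sketched with a stand-in recurrent unit (a toy linear update, not an actual LSTM):</p>
        <preformat>
```python
def rnn_chain(xs, step):
    """Run a toy recurrent unit over xs, collecting the state at each step."""
    h, states = 0.0, []
    for x in xs:
        h = step(x, h)
        states.append(h)
    return states

def step(x, h):
    return 0.5 * x + 0.5 * h  # stand-in for one recurrent unit

xs = [1.0, 2.0, 3.0]
forward = rnn_chain(xs, step)               # left-to-right chain
backward = rnn_chain(xs[::-1], step)[::-1]  # right-to-left chain
# Final representation: concatenation of both directions per time step.
bi_states = [(f, b) for f, b in zip(forward, backward)]
```
        </preformat>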
        <p>
          Word Embeddings
Distributional semantic models (DSMs) have been researched for decades in NLP
[
          <xref ref-type="bibr" rid="ref37">37</xref>
          ]. Based on a huge amount of unlabeled texts, DSMs aim to represent words
using a real-valued vector (also called embedding) which captures syntactic and
semantic similarities between the words. Starting with the publication of the
work from Collobert et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] in 2011, learning embeddings for linguistic units,
such as words, sentences or paragraphs, is one of the hot topics in NLP and a
plethora of approaches have been proposed [
          <xref ref-type="bibr" rid="ref23 ref28 ref30 ref4">4,23,28,30</xref>
          ].
        </p>
        <p>
          The majority of today's embedding models are deep learning models
trained to perform some kind of language modeling task [
          <xref ref-type="bibr" rid="ref29 ref30 ref32">29,30,32</xref>
          ]. The most
popular embedding model is the Word2Vec model introduced by Mikolov et
al. [
          <xref ref-type="bibr" rid="ref22 ref23">22,23</xref>
          ]. They propose two shallow neural network models, continuous
bag-of-words (CBOW) and SkipGram, that are trained to reconstruct the context given
a center word and vice versa. In contrast, Pennington et al. [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ] use the ratio
between co-occurrence probabilities of two words with a third one to learn a vector
representation. In [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] multi-layer, bi-directional LSTM models are utilized to
learn word embeddings that also capture the different contexts of a word.
        </p>
        <p>
          Several recent models focus on the integration of subword and
morphological information to provide suitable representations even for unseen,
out-of-vocabulary words. For example, Pinter et al. [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ] try to reconstruct a pre-trained
word embedding by learning a bi-directional LSTM model on character level.
Similarly, Bojanowski et al. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] adapt the SkipGram model by taking character n-grams
into account. Their fastText model assigns a vector representation to each
character n-gram and represents a word by summing over the representations
of all of its n-grams.
        </p>
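        <p>This n-gram composition can be sketched as follows; '#' stands in for fastText's word boundary markers, and the tiny n-gram lookup table is invented for the example:</p>
        <preformat>
```python
def char_ngrams(word, n=3, boundary="#"):
    """Character n-grams of a word, padded with boundary markers."""
    padded = boundary + word + boundary
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def word_vector(word, ngram_vectors, dim=4):
    """Represent a word by summing the vectors of its character n-grams."""
    vec = [0.0] * dim
    for ng in char_ngrams(word):
        for j, v in enumerate(ngram_vectors.get(ng, [0.0] * dim)):
            vec[j] += v
    return vec

ngram_vectors = {"#co": [1, 0, 0, 0], "col": [0, 1, 0, 0]}  # toy lookup table
vec = word_vector("colite", ngram_vectors)
```
        </preformat>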
        <p>
          In addition to embeddings that capture word similarities in one language,
multi- and cross-lingual approaches have also been investigated. Proposed
methods either learn a linear mapping between monolingual representations [
          <xref ref-type="bibr" rid="ref12 ref41">12,41</xref>
          ]
or utilize word- [
          <xref ref-type="bibr" rid="ref13 ref38">13,38</xref>
          ], sentence- [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] or document-aligned [
          <xref ref-type="bibr" rid="ref36">36</xref>
          ] corpora to build
a shared embedding space.
        </p>
        <p>
          ICD-10 Classification
The ICD-10 coding task has already been carried out in the 2016 [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] and 2017
[
          <xref ref-type="bibr" rid="ref25">25</xref>
          ] editions of the eHealth lab. Participating teams used a plethora of
different approaches to tackle the classification problem. The methods can essentially
be divided into two categories: knowledge-based [
          <xref ref-type="bibr" rid="ref18 ref24 ref5">5,18,24</xref>
          ] and machine learning
(ML) approaches [
          <xref ref-type="bibr" rid="ref11 ref15 ref21 ref8">8,11,15,21</xref>
          ]. The former rely on lexical sources, medical
terminologies and other dictionaries to match (parts of) the certificate text with
entries from the knowledge bases according to a rule framework. For example, Di
Nunzio et al. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] calculate a score for each ICD-10 dictionary entry by summing
the binary or tf-idf weights of each term of a certificate line segment and assign
the ICD-10 code with the highest score. In contrast, Ho-Dac et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] treat the
problem as an information retrieval task and utilize the Apache Solr search engine
to classify the individual lines.
        </p>
        <p>
          The ML-based approaches employ a variety of techniques, e.g. Conditional
Random Fields (CRFs) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], Labeled Latent Dirichlet Analysis (LDA) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] and
Support Vector Machines (SVMs) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] with diverse hand-crafted features. Most
similar to our approach is the work of Miftahutdinov and Tutubalina [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ],
which achieved the best results for English certificates in last year's
competition. They use a neural LSTM-based encoder-decoder model that processes
the raw certificate text as input and encodes it into a vector representation.
Additionally, a vector which captures the textual similarity between the certificate
line and the death causes of the individual ICD-10 codes is used to integrate
prior knowledge into the model. The concatenation of both vector
representations is then used to output the characters and numbers of the ICD-10 code in
the decoding step. In contrast to their work, our approach introduces a model
for multi-language ICD-10 classification. Moreover, we divide the task into two
distinct steps: death cause extraction and ICD-10 classification.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>Our approach models the extraction and classification of death causes as a two-step
process. First, we employ a neural, multi-language sequence-to-sequence model
to obtain a death cause description for a given death certificate line. We then
use a second classification model to assign the respective ICD-10 codes to the
obtained death cause. The remainder of this section gives a detailed explanation
of the architecture of the two models.</p>
      <p>Death Cause Extraction Model
The first step in our pipeline is the extraction of the death cause from a given
certificate line. We use the training certificate lines (with their corresponding</p>
      <sec id="sec-3-1">
        <title>4 http://lucene.apache.org/solr/</title>
        <p>ICD-10 codes) and the ICD-10 dictionaries as basis for our model. The
dictionaries provide us with death causes for each ICD-10 code. The goal of the model
is to reassemble the dictionary death cause text from the certi cate line.</p>
        <p>
          For this we adopt the encoder-decoder architecture proposed in [
          <xref ref-type="bibr" rid="ref35">35</xref>
          ]. Figure
1 illustrates the architecture of the model. As encoder we utilize a unidirectional
LSTM model, which takes the single words of a certificate line as inputs and scans
the line from left to right. Each token is represented using pre-trained fastText
word embeddings [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We utilize fastText embedding models for French, Italian
and Hungarian trained on Common Crawl and Wikipedia articles.
Independently of a word's original language, we represent it by looking up the word
in all three embedding models and concatenating the obtained vectors. Through
this we get a simple multi-language representation of the word. This heuristic
composition constitutes a naive solution to building a multi-language embedding
space. However, we opted to evaluate this approach as a simple baseline for
future work. The encoder's final state represents the semantic representation of the
certificate line and serves as the initial input for the decoding process.
        </p>
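        <p>The lookup heuristic can be sketched as follows; the three dictionaries are toy stand-ins for the pre-trained fastText models (the real vectors have 300 dimensions each, yielding a 900-dimensional input):</p>
        <preformat>
```python
def multilang_lookup(word, fr, it, hu, dim=3):
    """Concatenate the word's vectors from all three embedding models.

    Unknown words fall back to a zero vector in this sketch.
    """
    zero = [0.0] * dim
    return fr.get(word, zero) + it.get(word, zero) + hu.get(word, zero)

# Toy stand-ins for the French, Italian and Hungarian lookup tables.
fr = {"colite": [0.1, 0.2, 0.3]}
it = {"colite": [0.4, 0.5, 0.6]}
hu = {}
vec = multilang_lookup("colite", fr, it, hu)  # length 3 * dim
```
        </preformat>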
        <p>[Fig. 1. Illustration of the encoder-decoder model; the original figure shows the encoder reading the example certificate line "colite infectieuse ou ischemique" and the decoder generating the death cause description.]</p>
        <p>For the decoder we utilize another LSTM model. The initial input of the
decoder is the final state of the encoder model. Moreover, each token of the
dictionary death cause text (padded with special start and end tags) serves as
(sequential) input. Again, we use fastText embeddings of all three languages
to represent the input tokens. The decoder predicts one-hot-encoded words of
the death cause. During test time we use the encoder to obtain a semantic
representation of the certificate line and decode the death cause description
word by word, starting with the special start tag. The decoding process finishes
when the decoder outputs the end tag.
5 https://github.com/facebookresearch/fastText/
6 https://github.com/facebookresearch/fastText/blob/master/docs/crawl-vectors.md</p>
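        <p>The test-time decoding loop can be sketched as follows; step_fn abstracts away the actual LSTM decoder step and softmax prediction, and the token chain is a toy example:</p>
        <preformat>
```python
def greedy_decode(step_fn, start_token, end_token, max_len=20):
    """Generate a death cause word by word until the end token appears."""
    tokens, tok = [], start_token
    for _ in range(max_len):
        tok = step_fn(tok)
        if tok == end_token:
            break
        tokens.append(tok)
    return tokens

# Toy decoder that deterministically walks through a fixed description.
chain = {"[start]": "colite", "colite": "ischemique", "ischemique": "[end]"}
cause = greedy_decode(chain.get, "[start]", "[end]")
```
        </preformat>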
        <p>
          ICD-10 Classification Model
The second step in our pipeline is to assign an ICD-10 code to the generated
death cause description. For this we employ a bidirectional LSTM model which
is able to capture the past and future context for each token of a death cause
description. Just as in our encoder-decoder model, we encode each token using the
concatenation of the fastText embeddings of the word from all three languages.
To enable our model to attend to different parts of the death cause, we add
an extra attention layer [
          <xref ref-type="bibr" rid="ref33">33</xref>
          ] to the model. Through the attention mechanism
our model learns a fixed-size embedding of the death cause description by
computing an adaptive weighted average of the state sequence of the LSTM
model. This allows the model to better integrate information over time. Figure 2
presents the architecture of our ICD-10 classification model. We train the model
using the provided ICD-10 dictionaries from all three languages.
        </p>
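        <p>The attention pooling can be sketched as a softmax-weighted average over the LSTM state sequence (toy dimensions and uniform scores for illustration; the real layer learns the scores from the states):</p>
        <preformat>
```python
import math

def attention_pool(states, scores):
    """Adaptive weighted average of the state sequence (softmax over scores)."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * h[j] for w, h in zip(weights, states)) for j in range(dim)]
    return pooled, weights

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one LSTM state per token
pooled, weights = attention_pool(states, [0.0, 0.0, 0.0])  # uniform scores
```
        </preformat>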
        <p>Fig. 2. Illustration of the ICD-10 classification model (example input: "colite ischémique sigmoïde", predicted code K559). The model utilizes a bidirectional LSTM layer, which processes the death cause from left to right and vice versa. The attention layer summarizes the whole description by computing an adaptive weighted average over the LSTM states. The resulting death cause embedding is fed through a softmax layer to get the final classification. As in our encoder-decoder model, all input tokens are represented using the concatenation of the fastText embeddings of all three languages.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments and Results</title>
      <p>In this section we present the experiments and obtained results for the two
developed models, both individually as well as combined in a pipeline setting.</p>
      <p>Training Data and Experiment Setup
The CLEF eHealth 2018 Task 1 participants were provided with annotated
death certificates for the three selected languages: French, Italian and Hungarian.
Each of the languages is supported by training certificate lines as well as a
dictionary with death cause descriptions and diagnoses for the different ICD-10
codes. The provided training data sets were imbalanced concerning the different
languages: the Italian corpus consists of 49,823, the French corpus of 77,348 and
the Hungarian corpus of 323,175 certificate lines. We split each data set into a training
and a hold-out evaluation set. The complete training data set was then created by
combining the certificate lines of all three languages into one data set. Besides the
provided certificate data we used no additional knowledge resources or annotated
texts.</p>
      <p>Due to time constraints during development, no cross-validation to optimize
the (hyper-)parameters and the individual layers of our models was performed.
We either keep the default values of the hyper-parameters or set them to
reasonable values according to existing work. During model training we shuffle the
training instances and use varying instances to validate each epoch.</p>
      <p>The pre-trained fastText word embeddings were trained using the following
parameter settings: CBOW with position-weights, embedding dimension size 300,
character n-grams of length 5, a window of size 5 and 10 negative
samples. Unfortunately, they are trained on corpora not related to the biomedical
domain and therefore do not represent the best possible textual basis for an
embedding space for biomedical information extraction. The final embedding space
used by our models is created by concatenating the individual embedding vectors
for all three languages. Thus the input of our model is an embedding vector of size
900. All models were implemented with the Keras library.</p>
      <p>Death cause extraction model
To identify possible candidates for a death cause description, we focus on the use
of an encoder-decoder model. The encoder model uses an embedding layer with
input masking on zero values and an LSTM layer with 256 units. The encoder's
output is used as the initial state of the decoder model.</p>
      <p>
        Based on the input description from the dictionary and a special start token,
the decoder generates a death cause word by word. This decoding process
continues until a special end token is generated. The entire model is optimized using
the Adam optimization algorithm [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] and a batch size of 700. Model training
      </p>
      <sec id="sec-4-1">
        <title>7 For French we only took the provided data set from 2014.</title>
      </sec>
      <sec id="sec-4-2">
        <title>8 https://keras.io/</title>
        <p>was performed either for 100 epochs or until an early stopping criterion was met
(no change in validation loss for two epochs).</p>
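        <p>The stopping criterion can be sketched as follows (a simplified re-implementation; in practice this corresponds to an early stopping callback with a patience of two epochs):</p>
        <preformat>
```python
def epochs_until_stop(val_losses, patience=2):
    """Number of epochs trained before early stopping triggers."""
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if best > loss:  # improvement: remember it and reset the counter
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:  # no improvement for `patience` epochs
                return epoch
    return len(val_losses)

stopped = epochs_until_stop([0.9, 0.7, 0.71, 0.72, 0.5])
```
        </preformat>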
        <p>As the provided data sets are imbalanced regarding the tasks' languages,
we devised two different evaluation settings: (1) DCEM-Balanced, where each
language was supported by 49,823 randomly drawn instances (the size of the smallest
corpus), and (2) DCEM-Full, where all available data is used. Table 1 shows
the results obtained on the training and validation sets. The figures indicate that
the distribution of training instances per language has a huge influence on the
performance of the model. The model trained on the full training data achieves
an accuracy of 0.678 on the validation set. In contrast, using the balanced data
set the model reaches an accuracy of 0.899 (+32.5%).</p>
        <p>Setting        Trained Epochs  Train Accuracy  Train Loss  Validation Accuracy  Validation Loss
DCEM-Balanced  18              0.958           0.205       0.899                0.634
DCEM-Full      9               0.709           0.098       0.678                0.330
Table 1. Experiment results of our death cause extraction sequence-to-sequence model
concerning the balanced (equal number of training instances per language) and full data
set settings.</p>
        <p>ICD-10 Classification Model
The classification model is responsible for assigning an ICD-10 code to the death
cause description obtained during the first step. Our model uses an embedding
layer with input masking on zero values, followed by a bidirectional LSTM layer
with a hidden layer of 256 dimensions. Thereafter an attention layer builds an
adaptive weighted average over all LSTM states. The respective ICD-10 code is
determined by a dense layer with softmax activation function. We use the
Adam optimizer to perform model training. The model was validated on 25% of
the data. As for the extraction model, no cross-validation or hyper-parameter
optimization was performed.</p>
        <p>Once again, we devised two approaches. This was mainly caused by the lack of
adequate training data in terms of coverage for individual ICD-10 codes.
Therefore, we defined two training data settings: (1) a minimal one (ICD-10 Minimal), where
only ICD-10 codes with two or more supporting training instances are used. This
leaves us with 6,857 unique ICD-10 codes and discards 2,238 unique codes with
a support of one. This, of course, minimizes the number of ICD-10 codes in the
label space. Therefore, (2) an extended (ICD-10 Extended) data set was defined.
Here, the original ICD-10 code mappings, found in the supplied dictionaries, are
extended with the training instances from the individual certificate data of the
three languages. This yields 9,591 unique ICD-10 codes. Finally, for the
remaining ICD-10 codes that have only one supporting description, we duplicate
those data points.</p>
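        <p>The duplication step can be sketched as follows (the description/code pairs are toy examples; the real instances come from the dictionaries and certificate lines):</p>
        <preformat>
```python
from collections import Counter

def extend_with_duplicates(pairs):
    """Duplicate data points whose ICD-10 code has a support of one."""
    counts = Counter(code for _, code in pairs)
    extended = list(pairs)
    for text, code in pairs:
        if counts[code] == 1:
            extended.append((text, code))
    return extended

pairs = [
    ("colite ischemique", "K559"),  # only one description for K559
    ("infarctus", "I219"),
    ("infarctus aigu", "I219"),
]
extended = extend_with_duplicates(pairs)
```
        </preformat>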
        <p>The goal of this approach is to extend our possible label space to all available
ICD-10 codes. The results obtained from the two approaches on the validation
set are shown in Table 2. Using the minimal data set the model achieves an
accuracy of 0.937. In contrast, using the extended data set the model reaches an
accuracy of 0.954, which represents an improvement of 1.8%.</p>
        <sec id="sec-4-2-1">
          <title>Validation Results</title>
          <p>[Table 2: trained epochs and validation results for the ICD-10 Minimal and ICD-10 Extended* settings; the table values were lost during extraction.]</p>
        </sec>
        <sec id="sec-4-2-2">
          <title>Complete Pipeline</title>
          <p>The two models were combined to create the final pipeline. We tested both
death cause extraction models (based on the balanced and unbalanced data sets)
in the final pipeline, as their performance differs greatly. On the contrary, both
ICD-10 classification models perform similarly, so we just used the extended
ICD-10 classification model, with word level tokens, in the final pipeline. To
evaluate the pipeline we built a training and a hold-out validation set during
development. The obtained results on the validation set are presented in Table
3. The scores are calculated using a prevalence-weighted macro-average across
the output classes, i.e. we calculate precision, recall and F-score for each ICD-10
code and build the average by weighting the scores by the number of occurrences
of the code in the gold standard.</p>
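          <p>The prevalence-weighted macro average can be sketched as follows (toy per-code F-scores; precision and recall are averaged in the same way):</p>
          <preformat>
```python
def weighted_macro_average(per_code_score, support):
    """Average per-code scores, weighted by gold-standard occurrence counts."""
    total = sum(support.values())
    return sum(per_code_score[c] * support[c] for c in per_code_score) / total

f1 = {"K559": 1.0, "I219": 0.5}   # toy per-code F-scores
support = {"K559": 1, "I219": 3}  # occurrences in the gold standard
score = weighted_macro_average(f1, support)
```
          </preformat>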
          <p>Although the individual models, as shown in Tables 1 and 2, are promising,
the performance decreases considerably in a pipeline setting. The pipeline model
based on the balanced data set reaches an F-score of 0.61, whereas the full model
achieves a slightly higher value of 0.63. Both model configurations have a higher
precision than recall (0.73/0.61 resp. 0.74/0.62).</p>
          <p>This can be attributed to several factors. First of all, a pipeline architecture
always suffers from error propagation, i.e. errors in a previous step influence
the performance of the following layers and generally lower the performance of
the overall system. Investigating the obtained results, we found that the
imbalanced distribution of ICD-10 codes represents one of the main problems.</p>
        </sec>
      </sec>
      <sec id="sec-4-3">
        <title>9 Although models supporting character level tokens were developed and evaluated,</title>
        <p>their performance fared poorly compared to the word level tokens.</p>
        <sec id="sec-4-3-1">
          <title>Pipeline Results</title>
          <p>[Table 3: validation set results (precision, recall, F-score) of the Final-Balanced and Final-Full pipeline models; the table values were lost during extraction.]</p>
          <p>This severely impacts the encoder-decoder architecture used here, as the token
generation is biased towards the available data points. The models therefore very often
misclassify certificate lines associated with ICD-10 codes that have only a small number
of supporting training instances.</p>
          <p>Results obtained on the test data set, resulting from the two submitted official
runs, are shown in Table 4. Similar to the evaluation results during development,
the model based on the full data set performs slightly better than the model
trained on the balanced data set. The full model reaches an F-score of 0.34 for
French, 0.45 for Hungarian and 0.77 for Italian. All of our approaches perform
below the mean and median averages of all participants.</p>
          <p>Surprisingly, there is a substantial difference in the results obtained for the
individual languages. This confirms our assumptions about the (un-)suitability
of the proposed multi-lingual embedding space for this task. The results also
suggest that the size of the training corpora does not determine the final results:
the best results were obtained on the Italian data set, which is the smallest
corpus. The worst results were obtained on the mid-sized French corpus,
while the biggest corpus, Hungarian, is in second place.</p>
          <p>We identified several possible reasons for the obtained results. These also
represent (possible) points for future work. One of the main disadvantages of our
approach is the quality of the used word embeddings as well as the properties
of the proposed language-independent embedding space. The usage of
out-of-domain word embeddings which are not targeted to the biomedical domain is
likely a suboptimal solution to this problem. We tried to alleviate this by
finding suitable external corpora to train domain-dependent word embeddings for
each of the supported languages; however, we were unable to find any significant
amount of in-domain documents (e.g. a PubMed search for abstracts in
French, Hungarian or Italian found 7843, 786 and 1659 articles respectively).
Furthermore, we used a simple, heuristic solution by just concatenating the
embeddings of all three languages to build a shared vector space.</p>
          <p>Besides the issues with the used word embeddings, the inability to obtain full
ICD-10 dictionaries for the selected languages has also negatively influenced the
results. As a final limitation of our approach, the lack of multi-label classification
support has also been identified (i.e. not recognizing more than one death cause
in a single input text).</p>
        </sec>
        <sec id="sec-4-3-2">
          <title>Test Results</title>
          <p>[Table 4: official test set results (precision, recall, F-score) for French, Hungarian and Italian, comparing the Final-Balanced and Final-Full models with the task baseline, average and median; the table values were lost during extraction.]</p>
        </sec>
        <sec id="sec-4-3-3">
          <title>Conclusion and Future Work</title>
          <p>
In this paper we tackled the problem of information extraction of death causes
in a multilingual environment. The proposed solution focused on the setup
and evaluation of an initial language-independent model which relies on a
heuristic mutual word embedding space for all three languages. The proposed pipeline
is divided into two steps: first, possible tokens describing the death cause are generated
using a sequence-to-sequence model. Afterwards the generated token
sequence is normalized to an ICD-10 code using a distinct LSTM-based classification
model with attention mechanism. During evaluation our best model achieves an
F-score of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian. The obtained
results are encouraging for further investigation, however they can't compete with the
solutions of the other participants yet.</p>
          <p>We detected several issues with the proposed pipeline, which define
our future work. First, the representation of the input
words can be improved in several ways. The word embeddings we used are not
optimized for the biomedical domain but are trained on general text. Existing
work has shown that in-domain embeddings improve the quality of the achieved
results. Although this was our initial approach, finding adequate
in-domain corpora for the selected languages proved too difficult.
Moreover, the multi-language embedding space is currently heuristically defined
as the concatenation of the three word embedding models for individual tokens.
Creating a unified embedding space would yield a truly language-independent
token representation. The improvement of the input layer will be the main focus
of our future work.</p>
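          <p>One standard way to obtain such a unified space (not the approach used here, but a natural candidate for the future work above) is to learn an orthogonal map between two mono-lingual spaces from a small seed dictionary of translation pairs. A minimal sketch on synthetic data, where the target space is by construction an exact rotation of the source space:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_pairs = 4, 10

# Synthetic setup: pretend the second space is an exact rotation of the first.
src = rng.normal(size=(n_pairs, d))           # seed-dictionary vectors, language A
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden "true" rotation
tgt = src @ Q                                 # their translations in language B

# Orthogonal Procrustes: W = argmin ||src @ W - tgt||_F  s.t. W orthogonal
U, _, Vt = np.linalg.svd(src.T @ tgt)
W = U @ Vt

mapped = src @ W  # language-A vectors expressed in the language-B space
```

          <p>On real embeddings the map is only approximate, but all languages can then share one coordinate system instead of the concatenated blocks used in our heuristic.</p>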
          <p>The ICD-10 classification step also suffers from a lack of adequate training
data. Unfortunately, we were unable to obtain extensive ICD-10 dictionaries for
all languages and therefore cannot guarantee the completeness of the ICD-10 label
space. Another disadvantage of the current pipeline is the missing support for
multi-label classification.</p>
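          <p>Multi-label support could be added by replacing the final softmax with independent per-code sigmoid outputs and a decision threshold. A schematic comparison with hand-picked stand-in logits (not trained model outputs):</p>

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in logits for four ICD-10 codes on one certificate line.
logits = np.array([2.1, -1.5, 0.8, -3.0])

# Softmax-style decision (current pipeline): exactly one code per line.
single_label = int(np.argmax(logits))

# Sigmoid + threshold (multi-label): every code whose probability
# exceeds 0.5 is emitted, so several death causes can be recognized.
probs = sigmoid(logits)
multi_label = [i for i, p in enumerate(probs) if p > 0.5]
```

          <p>With the thresholded sigmoids, codes 0 and 2 are both emitted for this line, whereas the argmax variant is restricted to a single cause.</p>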
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.:</given-names>
          </string-name>
          <article-title>Neural machine translation by jointly learning to align and translate</article-title>
          .
          <source>In: Proceedings of the 6th International Conference on Learning Representations (ICLR)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jaitly</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shazeer</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          , pp.
          <volume>1171</volume>
          –
          <fpage>1179</fpage>
          . Curran Associates, Inc. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simard</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasconi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Learning long-term dependencies with gradient descent is difficult</article-title>
          .
          <source>IEEE transactions on neural networks 5(2)</source>
          ,
          <volume>157</volume>
          –
          <fpage>166</fpage>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bojanowski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grave</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joulin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Enriching Word Vectors with Subword Information</article-title>
          .
          <source>Transactions of the Association of Computational Linguistics</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <volume>135</volume>
          –
          <fpage>146</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cabot</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            ,
            <given-names>L.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dahamna</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Darmoni</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          :
          <source>SIBM at CLEF eHealth Evaluation Lab</source>
          <year>2016</year>
          :
          <article-title>Extracting Concepts in French Medical Texts with ECMT and CIMIND</article-title>
          .
          <source>In: CLEF 2016 Online Working Notes. CEUR-WS</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Merrienboer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gulcehre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bahdanau</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bougares</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schwenk</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Learning Phrase Representations using RNN Encoder{Decoder for Statistical Machine Translation</article-title>
          .
          <source>In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1724</volume>
          –
          <fpage>1734</fpage>
          . Association for Computational Linguistics, Doha, Qatar (
          <year>October 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Collobert</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weston</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karlen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kavukcuoglu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kuksa</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Natural language processing (almost) from scratch</article-title>
          .
          <source>Journal of Machine Learning Research 12(Aug)</source>
          ,
          <volume>2493</volume>
          –
          <fpage>2537</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dermouche</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Looten</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flicoteaux</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chevret</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velcin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taright</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>ECSTRA-INSERM@ CLEF eHealth2016-task 2: ICD10 Code Extraction from Death Certificates</article-title>
          .
          <source>In: CLEF 2016 Online Working Notes</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Di Nunzio</surname>
            ,
            <given-names>G.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beghini</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vezzani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henrot</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>A Lexicon Based Approach to Classification of ICD10 Codes. IMS Unipd at CLEF eHealth Task</article-title>
          .
          <source>In: CLEF 2017 Online Working Notes. CEUR-WS</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matthews</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>N.A.</given-names>
          </string-name>
          :
          <article-title>Transition-Based Dependency Parsing with Stack Long Short-Term Memory</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          : Long Papers)
          .
          , pp.
          <volume>334</volume>
          –
          <issue>343</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ebersbach</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herms</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eibl</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Fusion Methods for ICD10 Code Classification of Death Certificates in Multilingual Corpora</article-title>
          .
          <source>In: CLEF 2017 Online Working Notes. CEUR-WS</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Faruqui</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Improving vector space word representations using multilingual correlation</article-title>
          .
          <source>In: Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics</source>
          . pp.
          <volume>462</volume>
          –
          <issue>471</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Che</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yarowsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Cross-lingual dependency parsing based on distributed representations</article-title>
          .
          <source>In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</source>
          (Volume
          <volume>1</volume>
          : Long Papers)
          .
          , pp.
          <volume>1234</volume>
          –
          <issue>1244</issue>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ho-Dac</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabre</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birski</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boudraa</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourriot</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cassier</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delvenne</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Gonzalez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            ,
            <given-names>E.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piccinini</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>LITL at CLEF eHealth2017: automatic classification of death reports</article-title>
          .
          <source>In: CLEF 2017 Online Working Notes. CEUR-WS</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ho-Dac</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanguy</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grauby</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mby</surname>
            ,
            <given-names>A.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Malosse</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riviere</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>VeltzMauclair</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>LITL at CLEF eHealth2016: recognizing entities in French biomedical documents</article-title>
          .
          <source>In: CLEF 2016 Online Working Notes. CEUR-WS</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frasconi</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A field guide to dynamical recurrent neural networks</article-title>
          . IEEE Press (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Hochreiter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidhuber</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural computation 9(8)</source>
          ,
          <volume>1735</volume>
          –
          <fpage>1780</fpage>
          (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Jonnagaddala</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Automatic coding of death certificates to ICD-10 terminology</article-title>
          .
          <source>In: CLEF 2017 Online Working Notes. CEUR-WS</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Kingma</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ba</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Adam: A method for stochastic optimization</article-title>
          .
          <source>In: Proceedings of the 3rd International Conference on Learning Representations (ICLR)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Lample</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ballesteros</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Subramanian</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawakami</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dyer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Neural Architectures for Named Entity Recognition</article-title>
          .
          <source>In: Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <volume>260</volume>
          –
          <issue>270</issue>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Miftahutdinov</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tutubalina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>KFU at CLEF eHealth 2017 Task 1: ICD-10 coding of English death certificates with recurrent neural networks</article-title>
          .
          <source>In: CLEF 2017 Online Working Notes. CEUR-WS</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <volume>3111</volume>
          –
          <issue>3119</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>van Mulligen</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Afzal</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akhondi</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kors</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Erasmus MC at CLEF eHealth 2016: Concept Recognition and Coding in French Texts</article-title>
          .
          .
          <source>In: CLEF 2016 Online Working Notes. CEUR-WS</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>R.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rondet</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth 2017 Multilingual Information Extraction task overview: ICD10 coding of death certificates in English and French</article-title>
          .
          <source>In: CLEF 2017 Evaluation Labs and Workshop: Online Working Notes. CEUR-WS</source>
          . p.
          <volume>17</volume>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamon</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannier</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016</article-title>
          .
          <source>CEUR workshop proceedings 1609</source>
          , 28–42 (
          <year>September 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grippo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgand</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orsi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelikan</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramadier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>CLEF eHealth 2018 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in French, Hungarian and Italian</article-title>
          . In:
          <source>CLEF 2018 Evaluation Labs and Workshop: Online Working Notes. CEUR-WS</source>
          (
          <year>September 2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ammar</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bhagavatula</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Power</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised sequence tagging with bidirectional language models</article-title>
          .
          <source>In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          .
          <source>vol. 1</source>
          , pp.
          <fpage>1756</fpage>
          -
          <lpage>1765</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iyyer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gardner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zettlemoyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Deep contextualized word representations</article-title>
          .
          <source>In: The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luong</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Learning distributed representations for multilingual text sequences</article-title>
          .
          <source>In: Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing</source>
          . pp.
          <fpage>88</fpage>
          -
          <lpage>94</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Pinter</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guthrie</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eisenstein</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Mimicking Word Embeddings using Subword RNNs</article-title>
          .
          <source>In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</source>
          . pp.
          <fpage>102</fpage>
          -
          <lpage>112</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Raffel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ellis</surname>
            ,
            <given-names>D.P.</given-names>
          </string-name>
          :
          <article-title>Feed-forward networks with attention can solve some long-term memory problems</article-title>
          .
          <source>In: Workshop Extended Abstracts of the 4th International Conference on Learning Representations</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramadier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>
          .
          <source>In: CLEF 2018 - 8th Conference and Labs of the Evaluation Forum. Lecture Notes in Computer Science (LNCS)</source>
          , Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vinyals</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.V.</given-names>
          </string-name>
          :
          <article-title>Sequence to sequence learning with neural networks</article-title>
          .
          <source>In: Advances in neural information processing systems</source>
          . pp.
          <fpage>3104</fpage>
          -
          <lpage>3112</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Søgaard</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Agic</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alonso</surname>
            ,
            <given-names>H.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Plank</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bohnet</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johannsen</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Inverted indexing for cross-lingual NLP</article-title>
          .
          <source>In: The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015)</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Turney</surname>
            ,
            <given-names>P.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pantel</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>From frequency to meaning: Vector space models of semantics</article-title>
          .
          <source>Journal of artificial intelligence research 37</source>
          ,
          <fpage>141</fpage>
          -
          <lpage>188</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Vyas</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carpuat</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Sparse bilingual word representations for cross-lingual lexical entailment</article-title>
          .
          <source>In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>1187</fpage>
          -
          <lpage>1197</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qian</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soong</surname>
            ,
            <given-names>F.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Part-of-speech tagging with bidirectional long short-term memory recurrent neural network</article-title>
          .
          <source>arXiv preprint arXiv:1510.06168</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40.
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gui</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks</article-title>
          .
          <source>Database: The Journal of Biological Databases and Curation</source>
          <year>2016</year>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Normalized word embedding and orthogonal transform for bilingual word translation</article-title>
          .
          <source>In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <fpage>1006</fpage>
          -
          <lpage>1011</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>