<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">WBI at CLEF eHealth 2018 Task 1: Language-independent ICD-10 coding using multi-lingual embeddings and recurrent neural networks</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jurica</forename><surname>Ševa</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Management in Bioinformatics</orgName>
								<orgName type="institution">Humboldt-Universität zu Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mario</forename><surname>Sänger</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Management in Bioinformatics</orgName>
								<orgName type="institution">Humboldt-Universität zu Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ulf</forename><surname>Leser</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge Management in Bioinformatics</orgName>
								<orgName type="institution">Humboldt-Universität zu Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">WBI at CLEF eHealth 2018 Task 1: Language-independent ICD-10 coding using multi-lingual embeddings and recurrent neural networks</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">44167DC1EB5AC2327F8E5EC812206847</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T02:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>ICD-10 coding</term>
					<term>Biomedical information extraction</term>
					<term>Multilingual sequence-to-sequence model</term>
					<term>Representation learning</term>
					<term>Recurrent neural network</term>
					<term>Attention mechanism</term>
					<term>Multi-language embeddings</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the participation of the WBI team in the CLEF eHealth 2018 shared task 1 ("Multilingual Information Extraction - ICD-10 coding"). Our contribution focuses on the setup and evaluation of a baseline, language-independent neural architecture for ICD-10 classification, as well as a simple, heuristic multi-language word embedding space. The approach builds on two recurrent neural network models to extract and classify causes of death from French, Italian and Hungarian death certificates. First, we employ an LSTM-based sequence-to-sequence model to obtain a death cause from each death certificate line. We then utilize a bidirectional LSTM model with an attention mechanism to assign the respective ICD-10 code to the obtained death cause description. Both models take multi-language word embeddings as inputs. During evaluation our best model achieves an F-score of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian. The results are encouraging for future work, as well as for the extension and improvement of the proposed baseline system.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>The automatic extraction, classification and analysis of biological and medical concepts from unstructured texts, such as scientific publications or electronic health documents, is a highly important task supporting many applications in research, daily clinical routine and policy-making. Computer-assisted approaches can improve decision making and support clinical processes, for example by giving a more sophisticated overview of a research area or by providing detailed information about the aetiopathology of a patient or disease patterns. In the past years major advances have been made in the area of natural language processing (NLP). However, improvements in the field of biomedical text mining lag behind those in other domains, mainly due to privacy issues and concerns regarding the processed data (e.g. electronic health records).</p><p>The CLEF eHealth lab<ref type="foot" target="#foot_0">1</ref> aims to improve this situation through the organization of various shared tasks that exploit electronically available medical content <ref type="bibr" target="#b33">[34]</ref>. In particular, Task 1<ref type="foot" target="#foot_1">2</ref> of the lab is concerned with the extraction and classification of death causes from death certificates written in different languages <ref type="bibr" target="#b26">[27]</ref>. Participants were asked to classify the death causes mentioned in the certificates according to the International Classification of Diseases, version 10 (ICD-10)<ref type="foot" target="#foot_2">3</ref>. In previous years the task was concerned with French and English death certificates. In contrast, this year the organizers provided annotated death reports as well as ICD-10 dictionaries for French, Italian and Hungarian. 
The development of language-independent, multilingual approaches was encouraged.</p><p>Inspired by the recent success of recurrent neural network (RNN) models <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b19">20,</ref><ref type="bibr" target="#b9">10]</ref> in general, and by the convincing performance of the work of Miftahutdinov and Tutubalina <ref type="bibr" target="#b20">[21]</ref> in the last edition of the lab, we opted to develop a deep learning model for this year's competition. Our work introduces a prototypical, language-independent approach for ICD-10 classification using multi-language word embeddings and long short-term memory models (LSTMs). We divide the proposed pipeline into two tasks. First, we perform named entity recognition (NER), i.e. extract the death cause description from a certificate line, with an encoder-decoder model. Given the death cause, named entity normalization (NEN), i.e. assigning an ICD-10 code to the extracted death cause, is performed by a separate LSTM. Our approach builds upon a heuristic multi-language embedding space and therefore needs only a single model for all three data sets. With this work we want to evaluate what performance can be achieved with such a simple shared embedding space.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related work</head><p>This section highlights previous work related to our approach. We give a brief introduction to the methodical foundations of our work, RNNs and word embeddings. The section concludes with a summary of ICD-10 classification approaches used in previous eHealth Lab competitions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Recurrent neural networks (RNN)</head><p>RNNs are a widely used technique for sequence learning problems such as machine translation <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b5">6]</ref>, image captioning <ref type="bibr" target="#b1">[2]</ref>, NER <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b39">40]</ref>, dependency parsing <ref type="bibr" target="#b9">[10]</ref> and part-of-speech tagging <ref type="bibr" target="#b38">[39]</ref>. RNNs model dynamic temporal behaviour in sequential data through recurrent units, i.e. the hidden, internal state of a unit in one time step depends on the state of the unit in the previous time step.</p><p>These feedback connections enable the network to memorize information from recent time steps and add the ability to capture long-term dependencies.</p><p>However, training RNNs can be difficult due to the vanishing gradient problem <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b2">3]</ref>. The most widespread modifications of RNNs to overcome this problem are LSTMs <ref type="bibr" target="#b16">[17]</ref> and gated recurrent units (GRUs) <ref type="bibr" target="#b5">[6]</ref>. Both modifications use gated memories which control and regulate the information flow between two recurrent units. A common LSTM unit consists of a cell and three gates: an input gate, an output gate and a forget gate. In general, LSTM units are chained together by connecting the outputs of the previous unit to the inputs of the next one.</p><p>A further extension of the general RNN architecture are bidirectional networks, which make the past and future context available at every time step. A bidirectional LSTM model consists of a forward chain, which processes the input data from left to right, and a backward chain, consuming the data in the opposite direction. 
The final representation is typically the concatenation or a linear combination of both states.</p></div>
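The concatenation of forward and backward final states can be illustrated with a minimal numpy sketch. For brevity it uses a plain tanh recurrent unit instead of gated LSTM cells, and all dimensions and weights are illustrative:

```python
import numpy as np

def rnn_pass(xs, Wx, Wh, b):
    """Run a simple tanh recurrent unit over a sequence and return its final state."""
    h = np.zeros(Wh.shape[0])
    for x in xs:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

def bidirectional_final_state(xs, params_fwd, params_bwd):
    """Concatenate the final states of a forward and a backward pass over the input."""
    h_fwd = rnn_pass(xs, *params_fwd)        # left-to-right chain
    h_bwd = rnn_pass(xs[::-1], *params_bwd)  # right-to-left chain
    return np.concatenate([h_fwd, h_bwd])    # merged representation

rng = np.random.default_rng(0)
d_in, d_h, T = 4, 3, 5  # illustrative dimensions
def make_params():
    return (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h))

xs = rng.normal(size=(T, d_in))
h = bidirectional_final_state(xs, make_params(), make_params())
print(h.shape)  # (6,)
```

A gated unit would replace the single tanh update with the input/forget/output gate equations, but the bidirectional wiring and the final concatenation stay the same.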
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Word Embeddings</head><p>Distributional semantic models (DSMs) have been researched for decades in NLP <ref type="bibr" target="#b36">[37]</ref>. Based on huge amounts of unlabeled text, DSMs aim to represent words using real-valued vectors (also called embeddings) which capture syntactic and semantic similarities between words. Starting with the publication of the work of Collobert et al. <ref type="bibr" target="#b6">[7]</ref> in 2011, learning embeddings for linguistic units, such as words, sentences or paragraphs, has been one of the hot topics in NLP, and a plethora of approaches have been proposed <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b22">23,</ref><ref type="bibr" target="#b27">28,</ref><ref type="bibr" target="#b29">30]</ref>.</p><p>The majority of today's embedding models are deep learning models trained to perform some kind of language modeling task <ref type="bibr" target="#b28">[29,</ref><ref type="bibr" target="#b29">30,</ref><ref type="bibr" target="#b31">32]</ref>. The most popular embedding model is the Word2Vec model introduced by Mikolov et al. <ref type="bibr" target="#b21">[22,</ref><ref type="bibr" target="#b22">23]</ref>. They propose two shallow neural network models, continuous bag-of-words (CBOW) and SkipGram, that are trained to reconstruct the context given a center word and vice versa. In contrast, Pennington et al. <ref type="bibr" target="#b27">[28]</ref> use the ratio between the co-occurrence probabilities of two words with a third one to learn a vector representation. In <ref type="bibr" target="#b29">[30]</ref> multi-layer, bi-directional LSTM models are utilized to learn word embeddings that also capture the different contexts in which a word occurs.</p><p>Several recent models focus on the integration of subword and morphological information to provide suitable representations even for unseen, out-of-vocabulary words. For example, Pinter et al. 
<ref type="bibr" target="#b31">[32]</ref> try to reconstruct a pre-trained word embedding by learning a bi-directional LSTM model on character level. Similarly, Bojanowski et al. <ref type="bibr" target="#b3">[4]</ref> adapt the SkipGram model by taking character n-grams into account. Their fastText model assigns a vector representation to each character n-gram and represents a word by summing the representations of all its n-grams.</p><p>In addition to embeddings that capture word similarities in one language, multi- and cross-lingual approaches have also been investigated. Proposed methods either learn a linear mapping between monolingual representations <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b40">41]</ref> or utilize word- <ref type="bibr" target="#b12">[13,</ref><ref type="bibr" target="#b37">38]</ref>, sentence- <ref type="bibr" target="#b30">[31]</ref> or document-aligned <ref type="bibr" target="#b35">[36]</ref> corpora to build a shared embedding space.</p></div>
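The subword mechanism just described can be sketched in a few lines; the hash function, bucket size and vector table below are illustrative stand-ins for the actual fastText implementation, not its real internals:

```python
import zlib
import numpy as np

def char_ngrams(word, n=3):
    """Character n-grams of a word padded with boundary markers, as in fastText."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

class ToyFastText:
    """Toy subword lookup: every n-gram hashes into a fixed table of vectors and
    a word vector is the sum over its n-gram vectors, so even unseen words
    receive a representation."""
    def __init__(self, dim=8, buckets=1000, seed=0):
        self.table = np.random.default_rng(seed).normal(size=(buckets, dim))
        self.buckets = buckets

    def __getitem__(self, word):
        idxs = [zlib.crc32(g.encode()) % self.buckets for g in char_ngrams(word)]
        return self.table[idxs].sum(axis=0)

emb = ToyFastText()
print(char_ngrams("ab"))       # ['<ab', 'ab>']
print(emb["infarctus"].shape)  # (8,)
```

The key property for the task at hand is the last line: a word never seen during training still maps to a vector, which matters for noisy death certificate text.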
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">ICD-10 Classification</head><p>The ICD-10 coding task was already carried out in the 2016 <ref type="bibr" target="#b25">[26]</ref> and 2017 <ref type="bibr" target="#b24">[25]</ref> editions of the eHealth lab. Participating teams used a plethora of different approaches to tackle the classification problem. The methods can essentially be divided into two categories: knowledge-based <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b17">18,</ref><ref type="bibr" target="#b23">24]</ref> and machine learning (ML) approaches <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b14">15,</ref><ref type="bibr" target="#b20">21]</ref>. The former rely on lexical sources, medical terminologies and other dictionaries to match (parts of) the certificate text with entries from the knowledge bases according to a rule framework. For example, Di Nunzio et al. <ref type="bibr" target="#b8">[9]</ref> calculate a score for each ICD-10 dictionary entry by summing the binary or tf-idf weights of each term of a certificate line segment and assign the ICD-10 code with the highest score. In contrast, Ho-Dac et al. <ref type="bibr" target="#b13">[14]</ref> treat the problem as an information retrieval task and utilize the Apache Solr search engine<ref type="foot" target="#foot_3">4</ref> to classify the individual lines.</p><p>The ML-based approaches employ a variety of techniques, e.g. Conditional Random Fields (CRFs) <ref type="bibr" target="#b14">[15]</ref>, Labeled Latent Dirichlet Allocation (LDA) <ref type="bibr" target="#b7">[8]</ref> and Support Vector Machines (SVMs) <ref type="bibr" target="#b10">[11]</ref> with diverse hand-crafted features. Most similar to our approach is the work of Miftahutdinov and Tutubalina <ref type="bibr" target="#b20">[21]</ref>, which achieved the best results for English certificates in last year's competition. 
They use a neural LSTM-based encoder-decoder model that processes the raw certificate text as input and encodes it into a vector representation. Additionally, a vector which captures the textual similarity between the certificate line and the death causes of the individual ICD-10 codes is used to integrate prior knowledge into the model. The concatenation of both vector representations is then used to output the characters and numbers of the ICD-10 code in the decoding step. In contrast to their work, our approach introduces a model for multi-language ICD-10 classification. Moreover, we divide the task into two distinct steps: death cause extraction and ICD-10 classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methods</head><p>Our approach models the extraction and classification of death causes as a two-step process. First, we employ a neural, multi-language sequence-to-sequence model to obtain a death cause description for a given death certificate line. We then use a second classification model to assign the respective ICD-10 code to the obtained death cause. The remainder of this section gives a detailed explanation of the architecture of the two models.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Death Cause Extraction Model</head><p>The first step in our pipeline is the extraction of the death cause from a given certificate line. We use the training certificate lines (with their corresponding ICD-10 codes) and the ICD-10 dictionaries as the basis for our model. The dictionaries provide us with a death cause description for each ICD-10 code. The goal of the model is to reassemble the dictionary death cause text from the certificate line.</p><p>For this we adopt the encoder-decoder architecture proposed in <ref type="bibr" target="#b34">[35]</ref>. Figure <ref type="figure">1</ref> illustrates the architecture of the model. As encoder we utilize a unidirectional LSTM model, which takes the single words of a certificate line as inputs and scans the line from left to right. Each token is represented using pre-trained fastText<ref type="foot" target="#foot_4">5</ref> word embeddings <ref type="bibr" target="#b3">[4]</ref>. We utilize fastText embedding models for French, Italian and Hungarian trained on Common Crawl and Wikipedia articles<ref type="foot" target="#foot_5">6</ref>. Independently of a word's original language, we represent it by looking it up in all three embedding models and concatenating the obtained vectors. This gives us a simple multi-language representation of the word. This heuristic composition constitutes a naive solution to building a multi-language embedding space; however, we opted to evaluate this approach as a simple baseline for future work. The encoder's final state represents the semantics of the certificate line and serves as the initial input for the decoding process. Fig. <ref type="figure">1</ref>. Illustration of the encoder-decoder model for death cause extraction. The encoder processes a death certificate line token-wise from left to right. The final state of the encoder forms a semantic representation of the line and serves as initial input for the decoding process. 
The decoder is trained to predict the death cause text from the provided ICD-10 dictionaries word by word (using special tags \s and \e for the start resp. end of a sequence). All input tokens are represented using the concatenation of the fastText embeddings of all three languages.</p><p>For the decoder we utilize another LSTM model. The initial input of the decoder is the final state of the encoder model. Moreover, each token of the dictionary death cause text (padded with the special start and end tags) serves as (sequential) input. Again, we use the fastText embeddings of all three languages to represent the input tokens. The decoder predicts one-hot-encoded words of the death cause. During test time we use the encoder to obtain a semantic representation of the certificate line and decode the death cause description word by word, starting with the special start tag. The decoding process finishes when the decoder outputs the end tag.</p></div>
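The test-time procedure described above amounts to a greedy decoding loop. In this sketch, `decode_step` is a hypothetical stand-in for one application of the trained decoder (embedding lookup, LSTM step and argmax over the softmax output); only the loop structure reflects the description above:

```python
def greedy_decode(initial_state, decode_step, start=r"\s", end=r"\e", max_len=30):
    """Generate a death cause token by token, beginning with the start tag and
    stopping at the end tag (or at a hard length cap, which guards against a
    missing end tag)."""
    tokens, state, tok = [], initial_state, start
    for _ in range(max_len):
        tok, state = decode_step(tok, state)
        if tok == end:
            break
        tokens.append(tok)
    return tokens

# A toy decode_step that walks through a fixed phrase, for illustration only.
phrase = ["myocardial", "infarction", r"\e"]
step = lambda tok, i: (phrase[i], i + 1)
print(greedy_decode(0, step))  # ['myocardial', 'infarction']
```

In the real pipeline, `initial_state` is the encoder's final state for the certificate line, so the loop is conditioned on the whole input sentence.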
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">ICD-10 Classification Model</head><p>The second step in our pipeline is to assign an ICD-10 code to the generated death cause description. For this we employ a bidirectional LSTM model which is able to capture the past and future context for each token of a death cause description. Just as in our encoder-decoder model, we encode each token using the concatenation of the fastText embeddings of the word from all three languages. To enable our model to attend to different parts of the death cause, we add an extra attention layer <ref type="bibr" target="#b32">[33]</ref> to the model. Through the attention mechanism our model learns a fixed-size embedding of the death cause description by computing an adaptive weighted average of the state sequence of the LSTM model. This allows the model to better integrate information over time. Figure <ref type="figure">2</ref> presents the architecture of our ICD-10 classification model. We train the model using the provided ICD-10 dictionaries from all three languages. Fig. <ref type="figure">2</ref>. Illustration of the ICD-10 classification model. The model utilizes a bidirectional LSTM layer, which processes the death cause from left to right and vice versa. The attention layer summarizes the whole description by computing an adaptive weighted average over the LSTM states. The resulting death cause embedding is fed through a softmax layer to obtain the final classification. As in our encoder-decoder model, all input tokens are represented using the concatenation of the fastText embeddings of all three languages.</p><p>In the following sections we present the experiments and obtained results for the two developed models, both individually and combined in a pipeline setting.</p></div>
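The adaptive weighted average computed by the attention layer of Section 3.2 can be sketched in numpy. Scoring each state with a single learned vector `w` is one common formulation and an assumption of this sketch, not a detail specified in the paper:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(states, w):
    """Summarize a sequence of LSTM states (shape T x d) into one fixed-size
    vector: score each state with the learned vector w, normalize the scores
    with a softmax, and take the resulting weighted average."""
    alpha = softmax(states @ w)  # attention weights over the T time steps
    return alpha @ states        # adaptive weighted average, shape (d,)

rng = np.random.default_rng(1)
states = rng.normal(size=(7, 16))  # e.g. 7 tokens, 16-dim BiLSTM states
pooled = attention_pool(states, rng.normal(size=16))
print(pooled.shape)  # (16,)
```

Because the output dimension no longer depends on the sequence length T, the pooled vector can be fed directly into the dense softmax layer that predicts the ICD-10 code.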
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Training Data and Experiment Setup</head><p>The CLEF eHealth 2018 Task 1 participants were provided with annotated death certificates for the three selected languages: French, Italian and Hungarian. Each language is supported by training certificate lines as well as a dictionary with death cause descriptions resp. diagnoses for the different ICD-10 codes. The provided training data sets are imbalanced concerning the different languages: the Italian corpus consists of 49,823, the French corpus of 77,348<ref type="foot" target="#foot_6">7</ref> and the Hungarian corpus of 323,175 certificate lines. We split each data set into a training and a hold-out evaluation set. The complete training data set was then created by combining the certificate lines of all three languages into one data set. Besides the provided certificate data we used no additional knowledge resources or annotated texts.</p><p>Due to time constraints during development, no cross-validation was performed to optimize the (hyper-) parameters and the individual layers of our models. We either keep the default values of the hyper-parameters or set them to reasonable values according to existing work. During model training we shuffle the training instances and use varying instances to validate each epoch.</p><p>The pre-trained fastText word embeddings were trained using the following parameter settings: CBOW with position-weights, embedding dimension 300, character n-grams of length 5, a window of size 5 and 10 negative samples. Unfortunately, they are trained on corpora unrelated to the biomedical domain and therefore do not represent the best possible textual basis for an embedding space for biomedical information extraction. The final embedding space used by our models is created by concatenating the individual embedding vectors for all three languages; thus the input to our models is an embedding vector of size 900. 
All models were implemented with the Keras<ref type="foot" target="#foot_7">8</ref> library.</p></div>
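Assembling the 900-dimensional input described above can be sketched as follows; the zero-vector fallback for words missing from one language's model is our assumption, not prescribed by the setup:

```python
import numpy as np

def multilingual_vector(word, fr_emb, it_emb, hu_emb, dim=300):
    """Look the word up in the French, Italian and Hungarian embedding models
    and concatenate the three vectors (3 * 300 = 900 dimensions with the real
    models). Words missing from a model fall back to a zero vector here."""
    parts = [emb.get(word, np.zeros(dim)) for emb in (fr_emb, it_emb, hu_emb)]
    return np.concatenate(parts)

# Toy embeddings with dim=4 instead of 300, for illustration.
fr = {"infarctus": np.ones(4)}
it = {"infarto": 2 * np.ones(4)}
hu = {}
v = multilingual_vector("infarctus", fr, it, hu, dim=4)
print(v.shape)  # (12,)
```

Note that with real fastText models the subword mechanism supplies a vector even for unknown words, so the fallback would rarely trigger; with plain lookup tables it is needed.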
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Death cause extraction model</head><p>To identify possible candidates for a death cause description, we focus on the use of an encoder-decoder model. The encoder model uses an embedding layer with input masking on zero values and an LSTM layer with 256 units. The encoder's output is used as the initial state of the decoder model.</p><p>Based on the input description from the dictionary and a special start token, the decoder generates a death cause word by word. This decoding process continues until a special end token is generated. The entire model is optimized using the Adam optimization algorithm <ref type="bibr" target="#b18">[19]</ref> and a batch size of 700. Model training was performed either for 100 epochs or until an early stopping criterion was met (no change in validation loss for two epochs).</p><p>As the provided data sets are imbalanced regarding the tasks' languages, we devised two different evaluation settings: (1) DCEM-Balanced, where each language was supported by 49,823 randomly drawn instances (the size of the smallest corpus), and (2) DCEM-Full, where all available data is used. Table <ref type="table" target="#tab_1">1</ref> shows the results obtained on the training and validation set.</p></div>
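The early stopping criterion can be made concrete with a small sketch of the training loop. Here `run_epoch` is a hypothetical stand-in for one epoch of model training, and we read "no change in validation loss for two epochs" as "no improvement for two epochs":

```python
def train_with_early_stopping(run_epoch, max_epochs=100, patience=2):
    """Train for at most max_epochs, stopping once the validation loss has not
    improved for `patience` consecutive epochs. Returns (epochs run, best loss)."""
    best, since_best = float("inf"), 0
    for epoch in range(max_epochs):
        val_loss = run_epoch(epoch)
        if val_loss < best:
            best, since_best = val_loss, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return epoch + 1, best

# Toy loss curve: improves, then plateaus, so training stops early.
losses = [1.0, 0.6, 0.5, 0.5, 0.5, 0.4]
epochs, best = train_with_early_stopping(lambda e: losses[e], max_epochs=6)
print(epochs, best)  # 5 0.5
```

The same behaviour is available out of the box as the `EarlyStopping` callback in Keras, which is presumably what a Keras implementation would use.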
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">ICD-10 Classification Model</head><p>The classification model is responsible for assigning an ICD-10 code to the death cause description obtained in the first step. Our model uses an embedding layer with input masking on zero values, followed by a bidirectional LSTM layer with 256 hidden units. Thereafter an attention layer builds an adaptive weighted average over all LSTM states. The respective ICD-10 code is determined by a dense layer with softmax activation function. We use the Adam optimizer to perform model training. The model was validated on 25% of the data. As for the extraction model, no cross-validation or hyper-parameter optimization was performed.</p><p>Once again, we devised two approaches, mainly because of the lack of adequate training data in terms of coverage for individual ICD-10 codes. We defined two training data settings: (1) a minimal setting (ICD-10 Minimal), where only ICD-10 codes with two or more supporting training instances are used. This leaves us with 6,857 unique ICD-10 codes and discards 2,238 unique codes with a support of one. This, of course, minimizes the number of ICD-10 codes in the label space. Therefore, (2) an extended (ICD-10 Extended) data set was defined. Here, the original ICD-10 code mappings found in the supplied dictionaries are extended with the training instances from the individual certificate data of the three languages. This yields 9,591 unique ICD-10 codes. Finally, for the remaining ICD-10 codes that have only one supporting description, we duplicate those data points.</p><p>The goal of this approach is to extend our possible label space to all available ICD-10 codes. The results obtained from the two approaches on the validation set are shown in Table <ref type="table" target="#tab_2">2</ref>. Using the minimal data set the model achieves an accuracy of 0.937. 
In contrast, using the extended data set the model reaches an accuracy of 0.954, an improvement of 1.8%. </p></div>
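The construction of the two data settings can be sketched with a few lines of bookkeeping; representing the data as (description, code) pairs is illustrative:

```python
from collections import Counter

def build_training_sets(dict_pairs, cert_pairs):
    """Sketch of the two data settings; inputs are lists of
    (description, icd10_code) pairs from the dictionaries and certificates."""
    # (1) Minimal: keep only dictionary codes with two or more instances.
    support = Counter(code for _, code in dict_pairs)
    minimal = [(text, code) for text, code in dict_pairs if support[code] >= 2]

    # (2) Extended: add the certificate data, then duplicate the data points
    # of codes that still have only one supporting description.
    extended = dict_pairs + cert_pairs
    support = Counter(code for _, code in extended)
    extended = extended + [(t, c) for t, c in extended if support[c] == 1]
    return minimal, extended

dict_pairs = [("desc a", "I21"), ("desc b", "I21"), ("desc c", "J18")]
cert_pairs = [("desc d", "E11")]
minimal, extended = build_training_sets(dict_pairs, cert_pairs)
print(len(minimal), len(extended))  # 2 6
```

In the toy example, the Minimal setting drops code J18 (support of one) entirely, while the Extended setting keeps every code by duplicating the singletons, mirroring the trade-off between label-space coverage and per-class support described above.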
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.4">Complete Pipeline</head><p>The two models were combined to create the final pipeline. We tested both death cause extraction models (based on the balanced and the full data set) in the final pipeline, as their performance differs greatly. In contrast, both ICD-10 classification models perform similarly, so we used only the extended ICD-10 classification model, with word-level tokens<ref type="foot" target="#foot_8">9</ref>, in the final pipeline. To evaluate the pipeline we built a training and a hold-out validation set during development. The obtained results on the validation set are presented in Table <ref type="table">3</ref>. The scores are calculated using a prevalence-weighted macro-average across the output classes, i.e. we calculated precision, recall and F-score for each ICD-10 code and built the average by weighting the scores by the number of occurrences of the code in the gold standard.</p><p>Although the results of the individual models, as shown in Tables <ref type="table" target="#tab_1">1</ref> and <ref type="table" target="#tab_2">2</ref>, are promising, the performance decreases considerably in the pipeline setting. The pipeline model based on the balanced data set reaches an F-score of 0.61, whereas the full model achieves a slightly higher value of 0.63. Both model configurations have a higher precision than recall (0.73/0.61 resp. 0.74/0.62).</p><p>This can be attributed to several factors. First of all, a pipeline architecture always suffers from error propagation, i.e. errors in an earlier step influence the following steps and generally lower the performance of the overall system. Investigating the obtained results, we found that the imbalanced distribution of ICD-10 codes represents one of the main problems. This severely impacts the encoder-decoder architecture used here, as the token generation is biased towards the available data points. Therefore, the models very often misclassify certificate lines associated with ICD-10 codes that have only a small number of supporting training instances.</p><p>Results obtained on the test data set, resulting from the two submitted official runs, are shown in Table <ref type="table">4</ref>. Similar to the evaluation results during development, the model based on the full data set performs slightly better than the model trained on the balanced data set. The full model reaches an F-score of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian. All of our approaches perform below the mean and median averages of all participants. Surprisingly, there is a substantial difference in results between the individual languages. This confirms our assumptions about the (un-)suitability of the proposed multi-lingual embedding space for this task. The results also suggest that the size of the training corpora does not determine the final results: the best results were obtained on the Italian data set, which has the smallest corpus, the worst on the middle-sized French corpus, while the biggest corpus, Hungarian, is in second place.</p><p>We identified several possible reasons for the obtained results, which also represent possible points for future work. One of the main disadvantages of our approach is the quality of the used word embeddings as well as the properties of the proposed language-independent embedding space. The usage of out-of-domain word embeddings which are not targeted to the biomedical domain is likely a suboptimal solution to this problem. We tried to alleviate this by finding suitable external corpora to train domain-dependent word embeddings for each of the supported languages; however, we were unable to find any significant amount of in-domain documents (e.g. a PubMed search for abstracts in French, Hungarian or Italian found 7843, 786 and 1659 articles respectively). Furthermore, we used a simple, heuristic solution by just concatenating the embeddings of all three languages to build a shared vector space.</p><p>Besides the issues with the used word embeddings, the inability to obtain full ICD-10 dictionaries for the selected languages has also negatively influenced the results. As a final limitation of our approach, the lack of multi-label classification support has also been identified (i.e. the pipeline cannot recognize more than one death cause in a single input text).</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 3.</head><figDesc>Evaluation results of the final pipeline on the validation set of the training data. Reported figures represent the prevalence-weighted macro-average across the output classes. Final-Balanced = DCEM-Balanced + ICD-10 Extended. Final-Full = DCEM-Full + ICD-10 Extended.</figDesc><table><row><cell>Model</cell><cell>Precision</cell><cell>Recall</cell><cell>F-score</cell></row><row><cell>Final-Balanced</cell><cell>0.73</cell><cell>0.61</cell><cell>0.61</cell></row><row><cell>Final-Full</cell><cell>0.74</cell><cell>0.62</cell><cell>0.63</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 4.</head><figDesc>Test results of the final pipeline. Final-Balanced = DCEM-Balanced + ICD-10 Extended. Final-Full = DCEM-Full + ICD-10 Extended.</figDesc></figure>
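The prevalence-weighted macro-average used for the pipeline evaluation can be sketched directly; to our understanding it corresponds to scikit-learn's `average='weighted'` option:

```python
from collections import Counter

def weighted_macro_f1(gold, pred):
    """Per-class F-scores averaged with weights proportional to each class's
    count in the gold standard (its prevalence)."""
    total_f = 0.0
    for c, n in Counter(gold).items():
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        total_f += n * f1  # weight by gold-standard prevalence
    return total_f / len(gold)

# Toy gold/predicted ICD-10 codes, for illustration.
gold = ["I21", "I21", "J18", "E11"]
pred = ["I21", "J18", "J18", "E11"]
print(round(weighted_macro_f1(gold, pred), 3))  # 0.75
```

The weighting means that frequent codes dominate the reported score, which is worth keeping in mind given the heavily skewed ICD-10 code distribution discussed above.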
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Conclusion and Future Work</head><p>In this paper we tackled the problem of extracting death causes from death certificates in a multilingual environment. The proposed solution focuses on the setup and evaluation of an initial language-independent model which relies on a heuristic mutual word embedding space for all three languages. The proposed pipeline is divided into two steps: first, tokens describing the death cause are generated using a sequence-to-sequence model; afterwards, the generated token sequence is normalized to an ICD-10 code using a distinct LSTM-based classification model with an attention mechanism. During evaluation our best model achieves an F-score of 0.34 for French, 0.45 for Hungarian and 0.77 for Italian. The obtained results are encouraging for further investigation, but cannot yet compete with the solutions of the other participants.</p><p>We detected several issues with the proposed pipeline, which serve as prospective future work. First of all, the representation of the input words can be improved in several ways. The word embeddings we used are not optimized for the biomedical domain but are trained on general text. Existing work has shown that in-domain embeddings improve the quality of the achieved results. Although this was our initial approach, finding adequate in-domain corpora for the selected languages proved too hard to tackle. Moreover, the multi-language embedding space is currently heuristically defined as the concatenation of the three word embedding models for individual tokens. Creating a unified embedding space would yield a truly language-independent token representation. The improvement of the input layer will be the main focus of our future work.</p><p>The ICD-10 classification step also suffers from a lack of adequate training data. 
Unfortunately, we were unable to obtain extensive ICD-10 dictionaries for all languages and therefore cannot guarantee the completeness of the ICD-10 label space. Another disadvantage of the current pipeline is its missing support for multi-label classification.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1 .</head><label>1</label><figDesc>Experiment results of our death cause extraction sequence-to-sequence model for the balanced (equal number of training instances per language) and full data set settings. The results on the training and validation set indicate that the distribution of training instances per language has a huge influence on the performance of the model: trained on the full data set, the model achieves an accuracy of 0.678 on the validation set, whereas with the balanced data set it reaches an accuracy of 0.899 (+32.5%).</figDesc><table><row><cell>Setting</cell><cell>Trained Epochs</cell><cell>Train Accuracy</cell><cell>Loss</cell><cell cols="2">Validation Accuracy Loss</cell></row><row><cell>DCEM-Balanced</cell><cell>18</cell><cell>0.958</cell><cell>0.205</cell><cell>0.899</cell><cell>0.634</cell></row><row><cell>DCEM-Full</cell><cell>9</cell><cell>0.709</cell><cell>0.098</cell><cell>0.678</cell><cell>0.330</cell></row></table></figure>
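The two-step structure of the pipeline described in the conclusion can be sketched as plain control flow. The function names and the stub models below are illustrative assumptions, not the paper's actual Keras implementation; the real components are a trained sequence-to-sequence extractor and an attention-based LSTM classifier.

```python
# Step 1: generate the token sequence describing the death cause.
def extract_death_cause(certificate_line, seq2seq_model):
    return seq2seq_model(certificate_line)

# Step 2: normalize the generated tokens to a single ICD-10 code.
def normalize_to_icd10(cause_tokens, classifier):
    return classifier(cause_tokens)

def pipeline(line, seq2seq_model, classifier):
    tokens = extract_death_cause(line, seq2seq_model)
    return normalize_to_icd10(tokens, classifier)

# Toy stand-ins for the two trained models:
seq2seq = lambda text: text.lower().split()
classifier = lambda tokens: "J18.9" if "pneumonia" in tokens else "R99"

print(pipeline("Died of Pneumonia", seq2seq, classifier))  # → J18.9
```

Note that the pipeline returns exactly one code per input line, which makes the missing multi-label support visible: a line mentioning two death causes would still yield a single ICD-10 code.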
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>Experiment results for our ICD-10 classification model regarding different data settings. The Minimal setting uses only ICD-10 codes with two or more training instances in the supplied dictionary. In contrast, Extended additionally takes the diagnosis texts from the certificate data and duplicates ICD-10 training instances with only one diagnosis text in the dictionary and certificate lines. * Used in final pipeline.</figDesc><table><row><cell></cell><cell>Trained Epochs</cell><cell>Train Accuracy</cell><cell>Loss</cell><cell cols="2">Validation Accuracy Loss</cell></row><row><cell>ICD-10 Minimal</cell><cell>69</cell><cell>0.925</cell><cell>0.190</cell><cell>0.937</cell><cell>0.169</cell></row><row><cell>ICD-10 Extended*</cell><cell>41</cell><cell>0.950</cell><cell>0.156</cell><cell>0.954</cell><cell>0.141</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://sites.google.com/site/clefehealth/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://sites.google.com/view/clef-ehealth-2018/task-1-multilingual-informationextraction-icd10-coding</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">http://www.who.int/classifications/icd/en/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">http://lucene.apache.org/solr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">https://github.com/facebookresearch/fastText/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">https://github.com/facebookresearch/fastText/blob/master/docs/crawlvectors.md</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">For French we only took the provided data set from 2014.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">https://keras.io/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">Although models supporting character level tokens were developed and evaluated, their performance fared poorly compared to the word level tokens.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Neural machine translation by jointly learning to align and translate</title>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 6th International Conference on Learning Representations</title>
				<meeting>the 6th International Conference on Learning Representations<address><addrLine>ICLR</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2018">2018. 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Jaitly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Shazeer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="1171" to="1179" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Learning long-term dependencies with gradient descent is difficult</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Simard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Frasconi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE transactions on neural networks</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="157" to="166" />
			<date type="published" when="1994">1994</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Enriching Word Vectors with Subword Information</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bojanowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grave</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Joulin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Transactions of the Association of Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">5</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="135" to="146" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">SIBM at CLEF eHealth Evaluation Lab 2016: Extracting Concepts in French Medical Texts with ECMT and CIMIND</title>
		<author>
			<persName><forename type="first">C</forename><surname>Cabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">F</forename><surname>Soualmia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Dahamna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Darmoni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2015 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation</title>
		<author>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Merrienboer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Gulcehre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Bahdanau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Bougares</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schwenk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)</title>
				<meeting>the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)<address><addrLine>Doha, Qatar</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2014-10">October 2014</date>
			<biblScope unit="page" from="1724" to="1734" />
		</imprint>
	</monogr>
	<note>Association for Computational Linguistics</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Natural language processing (almost) from scratch</title>
		<author>
			<persName><forename type="first">R</forename><surname>Collobert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Weston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bottou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Karlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kavukcuoglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Kuksa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Machine Learning Research</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="2493" to="2537" />
			<date type="published" when="2011-08">Aug. 2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">ECSTRA-INSERM@ CLEF eHealth2016-task 2: ICD10 Code Extraction from Death Certificates</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dermouche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Looten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Flicoteaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chevret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Velcin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Taright</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2016 Online Working Notes</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A Lexicon Based Approach to Classification of ICD10 Codes. IMS Unipd at CLEF eHealth Task</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">M</forename><surname>Di Nunzio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Beghini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Vezzani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Henrot</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Transition-Based Dependency Parsing with Stack Long Short-Term Memory</title>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Matthews</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Smith</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</title>
				<meeting>the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="334" to="343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Fusion Methods for ICD10 Code Classification of Death Certificates in Multilingual Corpora</title>
		<author>
			<persName><forename type="first">M</forename><surname>Ebersbach</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Herms</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Eibl</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Improving vector space word representations using multilingual correlation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Faruqui</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics</title>
				<meeting>the 14th Conference of the European Chapter of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="462" to="471" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Cross-lingual dependency parsing based on distributed representations</title>
		<author>
			<persName><forename type="first">J</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Che</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yarowsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Liu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</title>
				<meeting>the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1234" to="1244" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">LITL at CLEF eHealth2017: automatic classification of death reports</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Ho-Dac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Fabre</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Birski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Boudraa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Bourriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Cassier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Delvenne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Garcia-Gonzalez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">B</forename><surname>Kang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Piccinini</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">LITL at CLEF eHealth2016: recognizing entities in French biomedical documents</title>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">M</forename><surname>Ho-Dac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Tanguy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Grauby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Mby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Malosse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Rivière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Veltz-Mauclair</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2016 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A field guide to dynamical recurrent neural networks</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Bengio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Frasconi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2001">2001</date>
			<publisher>IEEE Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Long short-term memory</title>
		<author>
			<persName><forename type="first">S</forename><surname>Hochreiter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Schmidhuber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Neural computation</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">8</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">Automatic coding of death certificates to ICD-10 terminology</title>
		<author>
			<persName><forename type="first">J</forename><surname>Jonnagaddala</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Hu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Adam: A method for stochastic optimization</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Kingma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ba</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 3rd International Conference on Learning Representations (ICLR)</title>
				<meeting>the 3rd International Conference on Learning Representations (ICLR)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Neural Architectures for Named Entity Recognition</title>
		<author>
			<persName><forename type="first">G</forename><surname>Lample</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ballesteros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Subramanian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kawakami</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Dyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="260" to="270" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">KFU at CLEF eHealth 2017 task 1: ICD-10 coding of English death certificates with recurrent neural networks</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Miftahutdinov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tutubalina</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Efficient estimation of word representations in vector space</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1301.3781</idno>
		<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">S</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="3111" to="3119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Erasmus MC at CLEF eHealth 2016: Concept Recognition and Coding in French Texts</title>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">M</forename><surname>Van Mulligen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Afzal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">A</forename><surname>Akhondi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Vo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Kors</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2016 Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">CLEF eHealth 2017 Multilingual Information Extraction task overview: ICD10 coding of death certificates in English and French</title>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">N</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">B</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Grouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavergne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Rondet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zweigenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Evaluation Labs and Workshop: Online Working Notes</title>
		<title level="s">CEUR-WS</title>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">17</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Clinical Information Extraction at the CLEF eHealth Evaluation lab</title>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">B</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Grouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lavergne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Tannier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zweigenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CEUR workshop proceedings</title>
				<imprint>
			<date type="published" when="2016-09">2016. September 2016</date>
			<biblScope unit="volume">1609</biblScope>
			<biblScope unit="page" from="28" to="42" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">CLEF eHealth 2018 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in French, Hungarian and Italian</title>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Grippo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Morgand</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Orsi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pelikán</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramadier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Rey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Zweigenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2018 Evaluation Labs and Workshop: Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2018-09">September 2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Glove: Global vectors for word representation</title>
		<author>
			<persName><forename type="first">J</forename><surname>Pennington</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</title>
				<meeting>the 2014 conference on empirical methods in natural language processing (EMNLP)</meeting>
		<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="1532" to="1543" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Semi-supervised sequence tagging with bidirectional language models</title>
		<author>
			<persName><forename type="first">M</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ammar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bhagavatula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Power</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 55th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1756" to="1765" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<analytic>
		<title level="a" type="main">Deep contextualized word representations</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">E</forename><surname>Peters</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Neumann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Iyyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Gardner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b30">
	<analytic>
		<title level="a" type="main">Learning distributed representations for multilingual text sequences</title>
		<author>
			<persName><forename type="first">H</forename><surname>Pham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Luong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Manning</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing</title>
				<meeting>the 1st Workshop on Vector Space Modeling for Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="88" to="94" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Mimicking Word Embeddings using Subword RNNs</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Pinter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Guthrie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Eisenstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing</title>
				<meeting>the 2017 Conference on Empirical Methods in Natural Language Processing</meeting>
		<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="102" to="112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<analytic>
		<title level="a" type="main">Feed-forward networks with attention can solve some long-term memory problems</title>
		<author>
			<persName><forename type="first">C</forename><surname>Raffel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">P</forename><surname>Ellis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop Extended Abstracts of the 4th International Conference on Learning Representations</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Overview of the CLEF eHealth Evaluation Lab 2018</title>
		<author>
			<persName><forename type="first">H</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Kanoulas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Azzopardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Spijker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ramadier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zuccon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Palotti</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2018 - 8th Conference and Labs of the Evaluation Forum</title>
		<title level="s">Lecture Notes in Computer Science (LNCS)</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b34">
	<analytic>
		<title level="a" type="main">Sequence to sequence learning with neural networks</title>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Vinyals</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in neural information processing systems</title>
				<imprint>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="3104" to="3112" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b35">
	<analytic>
		<title level="a" type="main">Inverted indexing for cross-lingual NLP</title>
		<author>
			<persName><forename type="first">A</forename><surname>Søgaard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Agić</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">M</forename><surname>Alonso</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Plank</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Bohnet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Johannsen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP)</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b36">
	<analytic>
		<title level="a" type="main">From frequency to meaning: Vector space models of semantics</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">D</forename><surname>Turney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Pantel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of artificial intelligence research</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="page" from="141" to="188" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b37">
	<analytic>
		<title level="a" type="main">Sparse bilingual word representations for cross-lingual lexical entailment</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Vyas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Carpuat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="1187" to="1197" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b38">
	<monogr>
		<title level="m" type="main">Part-of-speech tagging with bidirectional long short-term memory recurrent neural network</title>
		<author>
			<persName><forename type="first">P</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Qian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">K</forename><surname>Soong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Zhao</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1510.06168</idno>
		<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b39">
	<analytic>
		<title level="a" type="main">Disease named entity recognition by combining conditional random fields and bidirectional recurrent neural networks</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Gui</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Database: The Journal of Biological Databases and Curation</title>
		<imprint>
			<biblScope unit="page">2016</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b40">
	<analytic>
		<title level="a" type="main">Normalized word embedding and orthogonal transform for bilingual word translation</title>
		<author>
			<persName><forename type="first">C</forename><surname>Xing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Lin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</title>
				<meeting>the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</meeting>
		<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="1006" to="1011" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
