<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Multi-lingual ICD-10 coding using a hybrid rule-based and supervised classification approach at CLEF eHealth 2017</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Jurica</forename><surname>Ševa</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge management in Bioinformatics</orgName>
								<orgName type="institution">Humboldt Universität zu Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Madeleine</forename><surname>Kittner</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge management in Bioinformatics</orgName>
								<orgName type="institution">Humboldt Universität zu Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author role="corresp">
							<persName><forename type="first">Roland</forename><surname>Roller</surname></persName>
							<email>roland.roller@dfki.de</email>
							<affiliation key="aff1">
								<orgName type="department">Language Technology</orgName>
								<orgName type="institution">Deutsches Forschungszentrum für Künstliche Intelligenz</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ulf</forename><surname>Leser</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Knowledge management in Bioinformatics</orgName>
								<orgName type="institution">Humboldt Universität zu Berlin</orgName>
								<address>
									<settlement>Berlin</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Multi-lingual ICD-10 coding using a hybrid rule-based and supervised classification approach at CLEF eHealth 2017</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">57A2D10AF69A28379F78BB19CE62BCE3</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:30+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>ICD-10 codes</term>
					<term>Multilingual Candidates Ranking</term>
					<term>Language-independent Information Extraction</term>
					<term>Language-independent Information Retrieval</term>
					<term>Hierarchical Document Classification</term>
					<term>Named Entity Recognition</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper we present our research efforts and the results obtained within the CLEF eHealth challenge 2017, Track 1. The task involves the recognition and mapping of ICD-10 codes to English and French death certificates. We propose a two-tier, two-stage process. First, we use a rule-based system, based on handcrafted rules and Apache Solr, to perform ICD-10 code Named Entity Recognition (NER). This step produces a set of possible candidates extracted from the input text. Next, we use tf-idf weighted character n-gram classification models to normalize and rank the previously generated ICD-10 candidate set. The classification models follow the hierarchical structure of the given ICD-10 dictionaries: individual models are created for the first two hierarchical levels (chapters and blocks). Finally, the top candidate, generated from the overlap between the list of possible ICD-10 code candidates (input list) and the ranked list of final ICD-10 candidates (output list), is taken as the final ICD-10 code. Although the ICD-10 candidate NER is language-dependent, the normalization and ranking of candidates use a language-independent approach.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In recent years we have witnessed significant advances in automated natural language processing. This was partly stimulated by the growing number of available gold standard corpora, which are a foundation of scientific research in the field. Research efforts in biomedical text mining (BTM) have been less fortunate, especially in the automatic analysis of electronic health (eHealth) records. This is primarily due to privacy concerns linked with such documents. The CLEF eHealth competition <ref type="bibr" target="#b5">[6]</ref>, through various organized tasks <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b7">8]</ref>, circumvents these restrictions by providing gold standard data sets/corpora. Its main focus is on creating pipelines that automatically extract valuable information from eHealth documents.</p><p>The CLEF eHealth 2017 Task 1 <ref type="foot" target="#foot_0">3</ref>  <ref type="bibr" target="#b9">[10]</ref> serves as an extension of the CLEF eHealth 2016 Task 2 <ref type="bibr" target="#b8">[9]</ref>. The goal was to develop a multilingual approach for information extraction of ICD-10 codes from written text. In particular, participants were asked to assign codes from the International Classification of Diseases version 10 (ICD-10) <ref type="foot" target="#foot_1">4</ref> to French and English death certificates. Additionally, participants were encouraged to explore multilingual approaches/models as opposed to language-dependent models. For both languages, customized dictionaries of ICD-10 codes and related annotations were provided by the organizers; the use of other resources was also permitted. The task had to be performed fully automatically.</p><p>In 2016, the CLEF eHealth ICD-10 coding task was applied to French death certificates only.
Participating teams used different rule-based and machine-learning approaches. Ho-Dac et al. <ref type="bibr" target="#b6">[7]</ref>, for instance, used a CRF with various features combined with a rule-based system in order to identify more complex entities. Other participants used machine-learning approaches such as labeled LDA, SVM, and Naive Bayes <ref type="bibr" target="#b3">[4]</ref>, or treated the task as an information retrieval task using tf-idf models <ref type="bibr" target="#b11">[12]</ref>. Van Mulligen et al. <ref type="bibr" target="#b10">[11]</ref>, the best performing team, extended the terminology with code-term combinations annotated in the training corpus and used a rule-based approach for indexing. Additionally, they processed the initial annotations using precision scores derived from the training data <ref type="bibr" target="#b10">[11]</ref>.</p><p>We approached this year's task as a two-stage process, combining NER and document classification to generate the final ICD-10 code. In particular, dictionary-based indexing through Apache Solr <ref type="foot" target="#foot_2">5</ref> was used for Named Entity Recognition, and document classification for candidate normalization/ranking. Indexing is based on exact and fuzzy dictionary lookup and thus provides potential candidates for a term sequence. The focus of this step was to increase Recall (R) by providing a list of potential candidates. Candidate normalization and ranking, through trained classification models, is then applied to rank the list of potential candidates. The focus of this step is to increase Precision (P). Similar to our approach, Zweigenbaum and Lavergne <ref type="bibr" target="#b11">[12]</ref> also divided the task into two steps in 2016, to i) generate candidate ICD-10 codes and ii) re-rank candidates. While their approach uses tf-idf models for both parts, we use a rule-based system to generate candidate ICD-10 codes.
In addition, the models in the second part of our pipeline are trained on the ICD-10 hierarchy and thus include information about dictionary chapters and blocks.</p><p>In the following we describe our system and its evaluation on training and test data. Compared to all participating systems, our results are well above the average for the French test data, and only average for the English test data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methods</head><p>Here we describe the corpora, the terminologies used, candidate generation by indexing, and candidate ranking using classification.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Corpora</head><p>The French data set is the CépiDC Causes of Death corpus. The corpus contains free text descriptions of causes of death as reported in the standardized causes of death forms. Documents are manually annotated with ICD-10 codes by medical experts. Each document can contain several lines, and each line can contain multiple causes and therefore multiple ICD-10 code annotations. Additionally, year of coding, patient age, gender, location of death, and the time the patient had been suffering from the coded cause are provided for each document. The English corpus is set up similarly but is provided in a different format. The origin of the data set is not mentioned in the challenge.</p><p>Lines in both corpora mostly contain only a few words rather than well-formed sentences, which is common for medical text and a challenge for any NER or Named Entity Normalization (NEN) task. The majority of death certificate lines (about 60%) consist of two to four tokens in the English corpus and two to five tokens in the French corpus. Consequently, as there is almost no context available, the applicability of trained machine learning models is limited.</p><p>The French training set contains 65,843 death certificates from 2006 to 2012 with 264,334 ICD-10 codes annotated. The French test set contains 31,682 documents from 2014 and 2015. The English set is much smaller: the training set consists of 13,329 death certificates from 2015 with 38,908 annotated ICD-10 codes, and the test set of 6,665 documents.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Terminologies</head><p>The organizers provided custom terminologies for both languages. For French, six dictionaries are available, related to different years of coding (2006-2015), each providing ICD-10 codes and related terms. Roughly 15% of the terms collected in all dictionaries link to multiple ICD-10 codes, with no correlation to the year of coding. Evidently, different ICD-10 codes have been applied depending on the context. In the provided English terminology, on the other hand, each unique term almost always links to a unique ICD-10 code.</p><p>For supervised classification we used the hierarchy within the ICD-10 terminology as provided for French and English <ref type="foot" target="#foot_3">6</ref>. The terminology consists of 22 chapters, which are divided into blocks and further into classes and subclasses. For instance, Chapter VI: Diseases of the nervous system contains the block Inflammatory diseases of the central nervous system, which includes ICD-10 codes G00-G09. The class G00: Bacterial meningitis, not elsewhere classified within this block can be further divided into ICD-10 codes like G00.2: Streptococcal meningitis. In Section 2.4 we explain how this hierarchy is used to train classifiers for ranking candidate terms.</p></div>
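<div xmlns="http://www.tei-c.org/ns/1.0"><p>The chapter, block, and class levels described above can be illustrated with a minimal sketch. It is not part of the submitted system: the table BLOCKS and the function locate are hypothetical names, and the table contains only the single block used as an example in the text, whereas a complete table would list every ICD-10 block.</p><p>
```python
import re

# Illustrative excerpt: only the block used as an example in the text is listed.
# Each entry is (first class, last class, chapter, block title).
BLOCKS = [
    ("G00", "G09", "VI", "Inflammatory diseases of the central nervous system"),
]


def locate(code):
    """Return (chapter, block range, class) for an ICD-10 code, or None."""
    match = re.fullmatch(r"([A-Z][0-9]{2})(\.[0-9]+)?", code.strip().upper())
    if match is None:
        return None
    class_code = match.group(1)
    for first, last, chapter, _title in BLOCKS:
        # Classes are one letter plus two digits, so string order equals code order.
        if class_code >= first and last >= class_code:
            return chapter, first + "-" + last, class_code
    return None


print(locate("G00.2"))
```
</p><p>For the code G00.2 the sketch returns chapter VI, block G00-G09, and class G00, which are the labels that the first two hierarchical levels provide for classifier training.</p></div>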
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Candidate generation</head><p>To assign ICD-10 codes to death certificates, our system applies two methods:</p><p>1. ICD-10 code recognition focusing on high R measure values and 2. candidate normalization and ranking to improve P measure values.</p><p>ICD-10 candidates are generated from the input text by dictionary look-up and fuzzy search. For both languages, the customized dictionaries provided by the organizers are used. Documents and dictionaries are preprocessed to increase the probability of matching the correct concepts. Preprocessing includes conversion to lower case characters, removal of punctuation, and conversion of special characters.</p><p>NER follows a stepwise matching strategy. All possible n-grams (n ≤ 5) of an input text are compared to the dictionary by exact match. If no exact match is found, fuzzy matching is applied using Apache Solr. We allow an edit distance of 1 for each token longer than five characters. Multi-token terms are queried using an AND-query. Solr results are ranked such that the first result contains most of the search tokens, and only the top 10 Solr results are exported to the candidate list. Overlapping sequences are removed from the candidate list by keeping only the longest matching sequences, which slightly decreases the number of candidates. The resulting list of candidates has high recall but low precision. The following step aims at increasing precision while keeping a similar level of recall.</p></div>
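<div xmlns="http://www.tei-c.org/ns/1.0"><p>The exact-match part of this strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the submitted implementation: the dictionary TOY_DICTIONARY and all function names are hypothetical, the conversion of special characters is omitted, and the fuzzy fallback through Apache Solr is not shown.</p><p>
```python
import string

# Hypothetical toy dictionary mapping normalised terms to ICD-10 codes.
TOY_DICTIONARY = {
    "pneumonia": ["J18.9"],
    "aspiration pneumonia": ["J69.0"],
    "heart failure": ["I50.9"],
}


def normalise(text):
    """Lower-case the text and replace punctuation with spaces."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(text.lower().translate(table).split())


def word_ngrams(tokens, max_n=5):
    """Yield (start, end, phrase) for every token n-gram with n up to max_n."""
    for start in range(len(tokens)):
        for end in range(start + 1, min(start + max_n, len(tokens)) + 1):
            yield start, end, " ".join(tokens[start:end])


def exact_candidates(text, dictionary, max_n=5):
    """Exact dictionary look-up that keeps only the longest non-overlapping matches."""
    tokens = normalise(text).split()
    matches = [(s, e, p) for s, e, p in word_ngrams(tokens, max_n) if p in dictionary]
    matches.sort(key=lambda m: (-(m[1] - m[0]), m[0]))  # longest match first
    kept, used = [], set()
    for start, end, phrase in matches:
        span = set(range(start, end))
        if used.isdisjoint(span):
            used |= span
            kept.append((start, phrase, dictionary[phrase]))
    # Report the surviving matches in the order they occur in the text.
    return [(phrase, codes) for _, phrase, codes in sorted(kept)]


print(exact_candidates("Aspiration pneumonia; heart failure.", TOY_DICTIONARY))
```
</p><p>In this toy example the shorter match pneumonia is dropped because it overlaps with the longer match aspiration pneumonia, which mirrors the removal of overlapping sequences described above.</p></div>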
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Candidate normalization and ranking</head><p>The following step was developed to normalize and rank the output of Section 2.3 to a single ICD-10 code. For this we used supervised document classification. Unlike the NER process, here we developed a language-independent approach. The following classification models were considered during model selection and optimization: Decision Tree Classifier, Random Forest Classifier, Stochastic Gradient Descent Classifier and Linear Support Vector Classifier. The classification models are based on the content of the first two hierarchical levels of the ICD-10 dictionaries (chapters and blocks) for French and English. Altogether, this pipeline uses 23 different classification models:</p><p>1. A single general classification model which classifies the input text to one of 22 ICD-10 chapters and 2. 22 chapter classification models which classify and rank the input text to blocks belonging to the respective ICD-10 chapter.</p><p>The normalization and ranking process, as seen in Figure <ref type="figure" target="#fig_0">1</ref>, was performed in two stages, representing the (shallow) hierarchical structure of the available ICD-10 dictionaries used to train the previously mentioned classification models. The process iterates through the steps of this pipeline for each input text.</p><p>Given the type of text and the number of characters available in the training data for each chapter or block label, character-level n-gram features (with n between 2 and 5) were used to build the classification models. The extracted features were weighted with the tf-idf scheme, which produced a more distinctive set of features. The tf-idf values were then normalized with the L2 norm, and feature selection based on the chi-squared test, keeping the top 10% of features, was performed.
For each of the 23 classification models, model selection and hyper-parameter optimization with randomized search and 10-fold cross validation were performed. This reduces the risk of overfitting.</p><p>Table <ref type="table">3</ref>. Performance on training and test data for both languages. Performance is given in precision (P), recall (R), and F-measure (F) for each part of our system: after candidate generation and after re-ranking using supervised classification. Only results after candidate re-ranking were submitted. Average and median scores, based on results of all participating teams, are also given.</p></div>
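<div xmlns="http://www.tei-c.org/ns/1.0"><p>The feature construction described in this section (character n-grams with n between 2 and 5, tf-idf weighting, L2 normalization) can be sketched in plain Python as follows. This is an illustration only: the function names are hypothetical, the smoothed idf formula is an assumption of the sketch rather than a detail reported here, and the chi-squared feature selection and the trained classifiers are not shown.</p><p>
```python
import math
from collections import Counter


def char_ngrams(text, n_min=2, n_max=5):
    """Count all character n-grams of length n_min to n_max in a string."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def tfidf_l2(documents):
    """Return one L2-normalised tf-idf vector (a dict) per input string."""
    term_counts = [char_ngrams(doc) for doc in documents]
    doc_freq = Counter()
    for counts in term_counts:
        doc_freq.update(counts.keys())
    n_docs = len(documents)
    vectors = []
    for counts in term_counts:
        # Smoothed idf; this particular formula is an assumption of the sketch.
        weights = {g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0)
                   for g, tf in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({g: w / norm for g, w in weights.items()} if norm else {})
    return vectors


def cosine(u, v):
    """Dot product of two sparse vectors; the cosine similarity after L2 normalisation."""
    return sum(w * v.get(g, 0.0) for g, w in u.items())


vectors = tfidf_l2(["bacterial meningitis", "streptococcal meningitis", "heart failure"])
print(cosine(vectors[0], vectors[1]) > cosine(vectors[0], vectors[2]))
```
</p><p>Because the features are built from characters rather than words, the same code applies unchanged to French and English text, which is the sense in which this stage is language-independent.</p></div>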
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The different performances for French and English data may be a result of the differences between the datasets. For instance, we did not deal with abbreviations or resolve coordinated clauses. While both are present in both language sets, we have the impression that the English data contains more abbreviations. This could explain the poor performance of the system on the English set. In general, spell checking may improve the overall performance for both languages. Additionally, candidate generation may be improved by taking context information into account.</p><p>As far as candidate normalization and ranking is concerned, there are several ways to improve the results. For instance, the current approach, based on optimized language-independent ML models and character-level n-grams, ignored other features available in the training data (e.g. sex, age, location). Including more diverse data in the classification models would be an interesting next step. One could also look at the entire hierarchical structure of ICD-10. Our ML models used the first two hierarchical levels of the ICD-10 dictionaries. We also tried a more in-depth classification by creating models below the second level of the ICD-10 dictionary taxonomy. Unfortunately, those approaches failed to produce satisfactory results. This can be attributed to the lack of sufficient data in the supplied training data sets for all possible labels in the taxonomy. We also tested more complex features such as word embeddings, which did not yield satisfactory results. This can be explained by the fact that we used available models that were not trained on in-domain documents. Word embeddings produced from in-language and in-domain documents can be expected to perform considerably better.
Even though the domain and language use differ slightly and the available corpora are small, one could test training word embeddings on available biomedical French and English corpora such as Quaero <ref type="bibr" target="#b2">[3]</ref>, EMEA <ref type="bibr" target="#b0">[1]</ref> or Mantra <ref type="bibr" target="#b1">[2]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Fig. 1 .</head><label>1</label><figDesc>Fig. 1. Normalization and ranking pipeline for final ICD-10 code selection</figDesc><graphic coords="5,134.77,115.84,345.81,97.74" type="bitmap" /></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>An overview of final models, based on best classification score, and their occurrence number is given in Table 1. Average P, R and F values across all classification models, for the two used hierarchical levels, are given in Table 2.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 .</head><label>1</label><figDesc>Frequency of use per optimized classification models</figDesc><table><row><cell></cell><cell>#models</cell></row><row><cell>SVM_LinearSVC</cell><cell>13</cell></row><row><cell>RandomForestClassifier</cell><cell>6</cell></row><row><cell>LogisticRegression</cell><cell>4</cell></row></table></figure>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results &amp; Discussion</head><p>We applied our system to both language sets. Results on the French test set are well above the average over all participating systems. Test set results on the English data show only average performance. Results for training and test data and the performance of the individual parts of our system are shown in Table 3. The rule-based NER part, referred to as candidate generation and explained in detail in Section 2.3, focuses on the R measure. For the French data sets candidate generation reaches an R value of 0.860 for training and 0.844 for test data, while P is low, as expected. After candidate ranking, explained in Section 2.4, using the classifiers built on the ICD-10 hierarchy, the R value drops by 0.09 but the P value increases to 0.774 for training and 0.800 for test data. For the English data sets we see a similar trend but an overall lower performance. Candidate generation only reaches an R value of 0.76 for training and test sets. Again, the R value drops after candidate ranking, here by about 0.15, while the P value increases to about 0.61.</p></div>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 .</head><label>2</label><figDesc>Classification models average performance across ICD-10 dictionaries hierarchical levels</figDesc><table><row><cell>Language</cell><cell>Method</cell><cell></cell><cell>Training</cell><cell></cell><cell></cell><cell>Test</cell><cell></cell></row><row><cell></cell><cell></cell><cell>P</cell><cell>R</cell><cell>F</cell><cell>P</cell><cell>R</cell><cell>F</cell></row><row><cell></cell><cell>candidate generation</cell><cell>0.548</cell><cell>0.860</cell><cell>0.669</cell><cell>0.557</cell><cell>0.844</cell><cell>0.671</cell></row><row><cell>French</cell><cell>candidate ranking</cell><cell>0.774</cell><cell>0.770</cell><cell>0.772</cell><cell>0.800</cell><cell>0.751</cell><cell>0.765</cell></row><row><cell></cell><cell>average score</cell><cell></cell><cell></cell><cell></cell><cell>0.648</cell><cell>0.556</cell><cell>0.593</cell></row><row><cell></cell><cell>median score</cell><cell></cell><cell></cell><cell></cell><cell>0.629</cell><cell>0.540</cell><cell>0.548</cell></row><row><cell></cell><cell>candidate generation</cell><cell>0.305</cell><cell>0.756</cell><cell>0.435</cell><cell>0.320</cell><cell>0.763</cell><cell>0.451</cell></row><row><cell>English</cell><cell>candidate ranking</cell><cell>0.610</cell><cell>0.610</cell><cell>0.610</cell><cell>0.616</cell><cell>0.606</cell><cell>0.611</cell></row><row><cell></cell><cell>average score</cell><cell></cell><cell></cell><cell></cell><cell>0.655</cell><cell>0.559</cell><cell>0.602</cell></row><row><cell></cell><cell>median score</cell><cell></cell><cell></cell><cell></cell><cell>0.646</cell><cell>0.527</cell><cell>0.589</cell></row></table></figure>
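<div xmlns="http://www.tei-c.org/ns/1.0"><p>For readers who want to relate the P, R and F columns to each other, the following minimal sketch computes the balanced F-measure as the harmonic mean of precision and recall. That the reported F values use this balanced form is an assumption of the sketch, the function name f_measure is hypothetical, and scores produced by the official evaluation need not match the formula exactly.</p><p>
```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# French training data after candidate ranking: P = 0.774, R = 0.770.
print(round(f_measure(0.774, 0.770), 3))
```
</p><p>For the French training data after candidate ranking (P = 0.774, R = 0.770) the function gives approximately 0.772, in line with the reported F value.</p></div>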
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://sites.google.com/site/clefehealth2017/task-1</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">http://www.who.int/classifications/icd/en/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">http://lucene.apache.org/solr/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_3">see http://www.who.int/classifications/icd/icdonlineversions/ en/ and http://apps.who.int/classifications/icd10/browse/2016/en</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="http://opus.lingfil.uu.se/EMEA.php" />
		<title level="m">EMEA corpus</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="http://biosemantics.org/index.php/resources/mantra-gsc" />
		<title level="m">Mantra corpus</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="https://quaerofrenchmed.limsi.fr/" />
		<title level="m">Quaero corpus</title>
				<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">Mohammed</forename><surname>Dermouche</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Vincent</forename><surname>Looten</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rémy</forename><surname>Flicoteaux</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sylvie</forename><surname>Chevret</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Julien</forename><surname>Velcin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Namik</forename><surname>Taright</surname></persName>
		</author>
		<title level="m">ECSTRA-INSERM@ CLEF eHealth2016-task 2: ICD10 code extraction from death certificates</title>
				<imprint>
			<publisher>CLEF</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Overview of the CLEF eHealth Evaluation Lab</title>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liadh</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hanna</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Leif</forename><surname>Hanlen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cyril</forename><surname>Grouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">João</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guido</forename><surname>Zuccon</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2015">2015</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="429" to="443" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">CLEF 2017 eHealth Evaluation Lab Overview</title>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liadh</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hanna</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aude</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Evangelos</forename><surname>Kanoulas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Rene</forename><surname>Spijker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">João</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guido</forename><surname>Zuccon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 -8th Conference and Labs of the Evaluation Forum</title>
		<title level="s">Lecture Notes in Computer Science (LNCS)</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">LITL at CLEF ehealth2016: recognizing entities in french biomedical documents</title>
		<author>
			<persName><forename type="first">Lydia-Mai</forename><surname>Ho-Dac</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ludovic</forename><surname>Tanguy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Céline</forename><surname>Grauby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nkauj</forename><surname>Hnub</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurore</forename><forename type="middle">Heu</forename><surname>Mby</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Justine</forename><surname>Malosse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Laura</forename><surname>Rivière</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Amélie</forename><surname>Veltz-Mauclair</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Marine</forename><surname>Wauquier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of CLEF 2016 -Conference and Labs of the Evaluation forum</title>
				<meeting><address><addrLine>Évora, Portugal</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016-09-08">5-8 September 2016</date>
			<biblScope unit="page" from="81" to="93" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Overview of the CLEF eHealth Evaluation Lab</title>
		<author>
			<persName><forename type="first">Liadh</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Hanna</forename><surname>Suominen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">João</forename><surname>Palotti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Guido</forename><surname>Zuccon</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>Springer International Publishing</publisher>
			<biblScope unit="page" from="255" to="266" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Clinical information extraction at the CLEF eHealth evaluation lab</title>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Lorraine</forename><surname>Goeuriot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Liadh</forename><surname>Kelly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Kevin</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cyril</forename><surname>Grouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thierry</forename><surname>Hamon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Lavergne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Grégoire</forename><surname>Rey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aude</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xavier</forename><surname>Tannier</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of CLEF 2016 Evaluation Labs and Workshop: Online Working Notes</title>
				<meeting>CLEF 2016 Evaluation Labs and Workshop: Online Working Notes</meeting>
		<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2016-09">September 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">CLEF eHealth 2017 Multilingual Information Extraction task overview: ICD10 coding of death certificates in English and French</title>
		<author>
			<persName><forename type="first">Aurélie</forename><surname>Névéol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Robert</forename><forename type="middle">N</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Bretonnel Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cyril</forename><surname>Grouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Lavergne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Grégoire</forename><surname>Rey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Aude</forename><surname>Robert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Claire</forename><surname>Rondet</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Pierre</forename><surname>Zweigenbaum</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF 2017 Evaluation Labs and Workshop: Online Working Notes</title>
				<imprint>
			<publisher>CEUR-WS</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><surname>Van Mulligen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Zubair</forename><surname>Afzal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Saber</forename><forename type="middle">A</forename><surname>Akhondi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Dang</forename><surname>Vo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jan</forename><forename type="middle">A</forename><surname>Kors</surname></persName>
		</author>
		<title level="m">Erasmus MC at CLEF eHealth 2016: Concept recognition and coding in French texts</title>
				<imprint>
			<publisher>CLEF</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">LIMSI ICD10 coding experiments on CépiDC death certificate statements</title>
		<author>
			<persName><forename type="first">Pierre</forename><surname>Zweigenbaum</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Thomas</forename><surname>Lavergne</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2016">2016</date>
			<publisher>CLEF</publisher>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
