<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>CLEF eHealth 2017 Multilingual Information Extraction task overview: ICD10 coding of death certificates in English and French</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aurélie Névéol</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert N. Anderson</string-name>
          <email>RNAnderson@cdc.gov</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>K. Bretonnel Cohen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Cyril Grouin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Lavergne</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grégoire Rey</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aude Robert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claire Rondet</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierre Zweigenbaum</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INSERM-CepiDc</institution>
          ,
          <addr-line>Le Kremlin-Bicêtre</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LIMSI, CNRS, Université Paris-Saclay</institution>
          ,
          <addr-line>Orsay</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Center for Health Statistics</institution>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Université Paris-Sud</institution>
          ,
          <addr-line>Orsay</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>University of Colorado</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper reports on Task 1 of the 2017 CLEF eHealth evaluation lab, which extended the previous information extraction tasks of the ShARe/CLEF eHealth evaluation labs. The task continued with the coding of death certificates, as introduced in CLEF eHealth 2016. This large-scale classification task consisted of extracting causes of death as coded in the International Classification of Diseases, tenth revision (ICD10). The languages offered for the task this year were English and French. Participant systems were evaluated against a blind reference standard of 31,690 death certificates in the French dataset and 6,665 certificates in the English dataset using precision, recall and F-measure. In total, eleven teams participated: 10 teams submitted runs for the English dataset and 9 for the French dataset. Five teams submitted their systems to the reproducibility track. For death certificate coding, the highest performance was an F-measure of 0.8674 for French and 0.8501 for English.</p>
      </abstract>
      <kwd-group>
        <kwd>Natural Language Processing</kwd>
        <kwd>Entity Linking</kwd>
        <kwd>Text Classification</kwd>
        <kwd>French</kwd>
        <kwd>Biomedical Text</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The methodology applied is the shared task model. In shared tasks, multiple
groups agree on a "shared" task definition, a shared data set, and a shared
evaluation metric. The idea is to allow evaluation of multiple approaches to a
problem while minimizing avoidable differences related to the task definition,
the data used, and the figure of merit applied [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>
        Over the past four years, CLEF eHealth offered challenges addressing several
aspects of clinical information extraction (IE), including named entity
recognition, normalization [5-7] and attribute extraction [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Initially, the focus was
on a widely studied type of corpus, namely written English clinical text [
        <xref ref-type="bibr" rid="ref5 ref8">5, 8</xref>
        ].
Starting in 2015, the lab's IE challenge evolved to address lesser-studied corpora,
including biomedical texts in a language other than English, i.e., French [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This
year, we continue to offer a shared task based on a large set of gold standard
annotated corpora in French, with a coding task that required normalized
entity extraction at the sentence level. We also provided an equivalent dataset in
English.
      </p>
      <p>
        The significance of this work comes from the observation that challenges and
shared tasks have played a significant role in advancing Natural Language
Processing (NLP) research in the clinical and biomedical domains [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ], especially for
the extraction of named entities of clinical interest and entity normalization.
      </p>
      <p>One of the goals of this shared task is to foster research addressing multiple
languages for the same task, in order to encourage the development of
multilingual and language adaptation methods.</p>
      <p>This year's lab suggests that the task of coding can be addressed
reproducibly, with comparable performance in French and in English, without relying
on translation.</p>
    </sec>
    <sec id="sec-2">
      <title>Material and Methods</title>
      <p>In the CLEF eHealth 2017 Evaluation Lab Task 1, two datasets were used. The
French dataset was supplied by the French CepiDc1 and the English dataset was
supplied by the American CDC2. Both datasets refer to the International
Classification of Diseases, tenth revision (ICD10), a reference classification of about
14,000 diseases and related concepts managed by the World Health Organization
and used worldwide to register causes of death and reasons for hospital
admissions. Further details on the datasets, tasks and evaluation metrics are given
below.</p>
      <sec id="sec-2-1">
        <title>Datasets</title>
        <p>
          The CepiDc corpus was provided by the French institute for health and
medical research (INSERM) for the task of ICD10 coding in CLEF eHealth
2017 (Task 1). It consists of free-text death certificates collected from physicians
and hospitals in France over the period 2006-2014 [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
1 Centre d'épidémiologie sur les causes médicales de décès, Unité Inserm US10,
http://www.cepidc.inserm.fr/.
2 American Center for Disease Control, https://www.cdc.gov/
The CDC corpus was provided by the American Center for Disease Control
(CDC). It consists of free-text death certificates collected electronically in the
United States during the year 2015. These are all records of deaths due to natural
causes, i.e., no injury-related deaths are included.
        </p>
        <p>
          Dataset excerpts. Death certificates are standardized documents filled in by
physicians to report the death of a patient. The content of the medical
information reported in a death certificate, and its subsequent coding for public health
statistics, follows complex rules described in a document that was supplied to
participants [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. Tables 1 and 2 present excerpts of the CepiDC and CDC
corpora that illustrate the heterogeneity of the data that participants had to deal
with. While some of the text lines were short and contained a term that could be
directly linked to a single ICD10 code (e.g., "choc septique"), other lines could
contain non-diacritized text (e.g., "peritonite...", missing the diacritic on the first
"e") or abbreviations (e.g., "DM II" instead of "diabetes mellitus, type 2"). Other
challenges included run-on narratives and mixed text alternating between
upper-case non-diacritized text and lower-case diacritized text.
3 septic shock
4 colon perforation leading to stercoral peritonitis
5 Acute Respiratory Distress Syndrome
6 multiple organ failure
7 HBP: High Blood Pressure
Descriptive statistics. Tables 3 and 4 present statistics for the specific sets
provided to participants. For both languages, the dataset construction was
time-oriented in order to reflect the practical use case of coding death certificates,
where historical data is available to train systems that can then be applied to
current data to assist with new document curation. For French, the training set
covered the 2006-2012 period, the development set contained death certificates
from 2013, and the test set certificates from 2014. For English, data was only
available for the year 2015, but the training and test sets were nonetheless divided
chronologically within that year. While the French dataset offers more
documents spread over an eight-year period, it also reflects changes in coding
rules and practices over that period. In contrast, the English dataset is smaller
but more homogeneous.
Dataset format. In compliance with World Health Organization (WHO)
international standards, death certificates comprise two parts: Part I is dedicated
to the reporting of diseases related to the main train of events leading directly to
death, and Part II is dedicated to the reporting of contributory conditions not
directly involved in the main death process.10 According to WHO recommendations,
the completion of both parts is free of any automatic assistance that might
influence the certifying physician. The processing of death certificates, including
ICD10 coding, is performed independently of physician reporting. In France and
in the United States, coding of death certificates is performed within 18 months
of reporting using the IRIS system [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In the course of coding practice, the data
is stored in different files: a file that records the native text entered in the death
certificates (referred to as 'raw causes' hereafter) and a file containing the result of
ICD code assignment (referred to as 'computed causes' hereafter). The 'computed
causes' file may contain normalized text that supports the coding decision and
can be used in the creation of dictionaries for the purpose of coding assistance.
8 These numbers were obtained using the Linux wc -w command.
9 These numbers were obtained using the Linux wc -w command applied to the fourth field.
10 As can be seen in the sample documents, the line numbering in the raw causes file
may (Table 2) or may not (Table 1) be the same in the computed causes file. In some
cases, the ordering in the computed causes file was changed to follow the causal chain
of events leading to death.
We found that the formatting of the data into raw and computed causes made
it difficult to directly relate the assigned codes to the original death certificate texts.
This makes the datasets more suitable for approaching the coding problem as a
text classification task at the document level rather than as a named entity
recognition and normalization task. We have reported separately on the challenges
presented by the separation of data into raw and computed causes, and proposed
solutions to merge the French data into a single 'aligned' format, relying on the
normalized text supplied with the French raw causes [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. Table 1 presents a
sample French death certificate in 'raw' and 'aligned' format. It illustrates the
challenge of alignment: line 2 in the raw file, "peritonite stercorale sur
perforation colique", has to be mapped to line 4, "peritonite stercorale"
(code K65.9), and line 5, "perforation colique" (code K63.1), in the computed file.
        </p>
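        <p>The kind of mapping involved can be sketched as a diacritic-insensitive substring search of the normalized computed-causes text inside a raw line. The sketch below is a toy illustration with helper names of our choosing, not the merging procedure of [13], which also handles reordering and partial overlaps:</p>

```python
import unicodedata

def normalize(text: str) -> str:
    """Lowercase and strip diacritics, so 'péritonite' matches 'peritonite'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch)).lower()

def align_line(raw_line, computed_entries):
    """Return the (normalized_text, code) computed-file entries whose text
    occurs inside the raw line, after diacritic-insensitive normalization."""
    raw = normalize(raw_line)
    return [(text, code) for text, code in computed_entries
            if normalize(text) in raw]

# The example from Table 1: one raw line maps to two computed entries.
raw = "peritonite stercorale sur perforation colique"
computed = [("péritonite stercorale", "K65.9"),
            ("perforation colique", "K63.1")]
align_line(raw, computed)
# -> [("péritonite stercorale", "K65.9"), ("perforation colique", "K63.1")]
```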
        <p>As can be seen in Table 2, similar alignment challenges are encountered in
the English dataset. In sample certificate 2, line 1 in the raw file, "STROKE IN
SEPTEMBER LEFT HEMIPARESIS", has to be mapped to line 1 (code I64,
"Stroke, not specified") and line 2 (code G819, "Hemiplegia, unspecified") in
the computed file. However, no normalized text was available for English, and we
were not able to offer an aligned version of the raw and computed files for the
American dataset in this edition of the shared task.</p>
        <p>Data files. Table 5 presents a description of the files that were provided to
the participants: training (train) and development (dev, French only) files were
distributed early in the challenge (in January 2017); test files (test, with no gold
standard) were distributed at test time (at the end of April 2017); and the gold
standard for the test files (test+g in aligned format, test, computed in raw format)
was disclosed to the participants after the test phase (in May 2017), just before
the submission of their workshop papers, so that participants could reproduce
the performance measures announced by the organizers.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Tasks</title>
        <p>ICD10 coding. The coding task consisted of mapping lines in the death
certificates to one or more relevant codes from the International Classification of
Diseases, tenth revision (ICD10). For the raw datasets, codes were assessed at
the certificate level. For the aligned dataset, codes were assessed at the line level.</p>
        <p>Format Lang Split Type Year File name
Aligned fr train aligned 2006-2012 corpus/train/AlignedCauses 2006-2012full.csv
Aligned fr dev aligned 2013 corpus/dev/AlignedCauses 2013full.csv
Aligned fr test aligned 2014 aligned/corpus/AlignedCauses 2014test.csv
Aligned fr test+g aligned 2014 aligned/corpus/AlignedCauses 2014 full.csv
Raw fr train raw 2006-2012 corpus/train/CausesBrutes FR training.csv
Raw fr train ident 2006-2012 corpus/train/Ident FR training.csv
Raw fr train computed 2006-2012 corpus/train/CausesCalculees FR training.csv
Raw fr dev raw 2013 corpus/dev/CausesBrutes FR dev.csv
Raw fr dev ident 2013 corpus/dev/Ident FR dev full.csv
Raw fr dev computed 2013 corpus/dev/CausesCalculees FR dev.csv
Raw fr test raw 2014 raw/corpus/CausesBrutes FR test2014.csv
Raw fr test ident 2014 raw/corpus/Ident FR test2014.csv
Raw fr test computed 2014 raw/corpus/CausesCalculees FR test2014 full.csv
Raw en train raw 2015 corpus/CausesBrutes EN training.csv
Raw en train ident 2015 corpus/Ident EN training.csv
Raw en train computed 2015 corpus/CausesCalculees EN training.csv
Raw en test raw 2015 raw/corpus/CausesBrutes EN test.csv
Raw en test ident 2015 raw/corpus/Ident EN test.csv
Raw en test computed 2015 raw/corpus/CausesCalculees EN test full.csv</p>
        <p>Replication. The replication task invited lab participants to submit a system
used to generate one or more of their submitted runs, along with instructions
to install and use the system. Then, two of the organizers independently worked
with the submitted material to replicate the results the teams had submitted as
their official runs.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Evaluation metrics</title>
        <p>System performance was assessed using the usual metrics of information extraction:
precision (Formula 1), recall (Formula 2) and F-measure (Formula 3; specifically,
we used β = 1).</p>
        <p>Precision = true positives / (true positives + false positives) (1)</p>
        <p>Recall = true positives / (true positives + false negatives) (2)</p>
        <p>F-measure = (1 + β²) · (precision · recall) / (β² · precision + recall) (3)</p>
        <p>Results were computed using two Perl scripts, one for the raw datasets (in
English and in French) and one for the aligned dataset (in French only). The
evaluation tools were supplied to task participants along with the training data.
Measures were computed for "ALL" causes in the datasets as our main
evaluation reference for the task. In this case, the evaluation is performed for all ICD
codes. Measures were also computed for "EXTERNAL" causes as our secondary
reference for the task. In this case, the evaluation is limited to ICD codes
addressing a particular type of death, called "external causes" or violent deaths.
These causes are of particular interest for two reasons: first, they are
considered "avoidable", and public health policies can target them specifically, e.g.,
suicide prevention. Second, the context associated with these deaths is often
quite different from that of other deaths in terms of comorbidity, population affected
and terminology used to describe the event. In practice, external causes are
characterized by codes V01 to Y98.</p>
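        <p>Since external causes are exactly the codes V01 to Y98, the secondary-reference filter reduces to a range check on the first three characters of a code. The helper below is a minimal sketch of ours, not the organizers' evaluation scripts:</p>

```python
def is_external_cause(code: str) -> bool:
    """True when an ICD10 code falls in the external-cause range V01-Y98."""
    letter, num = code[0], code[1:3]
    if letter == "V":
        return num >= "01"          # V01-V99
    if letter in ("W", "X"):
        return True                 # W00-X99 lie entirely inside the range
    if letter == "Y":
        return num <= "98"          # Y00-Y98, excluding Y99
    return False

is_external_cause("X70")   # True: an intentional self-harm code
is_external_cause("I64")   # False: a circulatory-system code
```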
        <p>For the raw datasets, matches (true positives) were counted for each ICD10
full code supplied that matched the reference for the associated document.</p>
        <p>For the aligned dataset, matches (true positives) were counted for each ICD10
full code supplied that matched the reference for the associated document line.</p>
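        <p>The match counting described above amounts to micro-averaging over (document, code) pairs for the raw datasets, or (document, line, code) triples for the aligned dataset. The sketch below is an illustrative reimplementation of Formulas 1-3 with β = 1, not the organizers' Perl scripts, and the example pairs are hypothetical:</p>

```python
def micro_prf(predicted, reference):
    """Micro-averaged precision, recall and F1 over sets of
    (document_id, icd10_code) pairs, following Formulas 1-3 with beta = 1."""
    pred, ref = set(predicted), set(reference)
    tp = len(pred & ref)                      # true positives
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Hypothetical example: one missed code and one spurious code.
gold = {("doc1", "K659"), ("doc1", "K631"), ("doc2", "I64")}
pred = {("doc1", "K659"), ("doc2", "I64"), ("doc2", "G819")}
p, r, f = micro_prf(pred, gold)   # p = r = f = 2/3
```

For the aligned dataset, the same function applies once the pairs carry a line number as well, e.g. ("doc1", 2, "K659").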
        <p>The evaluation of the submissions to the replication task was essentially
qualitative: we used a scoring grid to record the ease of installing and running
the systems, the time spent obtaining results with the systems (analysts were
committed to spending at most one working day, or 8 hours, on each
system), and whether we managed to obtain the exact same results submitted
as official runs.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results and Discussion</title>
      <p>Participating teams included between one and twelve members and resided
in Australia (team UNSW), France (teams LIMSI, LIRMM, LITL, Mondeca
and SIBM), Germany (teams TUC and WBI), Italy (team UNIPD) and Russia
(team KFU). Teams often comprised members with a variety of backgrounds,
drawing from computer science, informatics, statistics, information and library
science, and clinical practice. It can be noted that one team (LITL) participated in
the challenge as a master's-level class project. One team (LIMSI) was composed of
members of the organizing team and submitted unofficial runs due to a conflict
of interest. One team submitted baseline runs.</p>
      <p>For the English raw dataset, we received 15 official runs from 9 teams,
including one baseline run and one invalid run (due to formatting issues). For the
French raw dataset, we received 7 official runs from 4 teams. For the French
aligned dataset, we received 9 official runs from 6 teams, including one baseline
run.</p>
      <p>Five systems were submitted to the replication track, allowing us to attempt
to replicate a total of 22 system runs.</p>
      <sec id="sec-3-1">
        <title>Methods implemented in the participants' systems</title>
        <p>
          Participants used a variety of methods, many of which relied on lexical resources,
including the dictionaries supplied as part of the training data as well as other
medical terminologies and ontologies. Some of these knowledge-based methods
exploited the gold standard training data as an additional knowledge source.
IMS-UNIPD. The UNIPD team submitted official runs for the English dataset
and later submitted unofficial runs for the French datasets as well [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This
team implemented a minimal expert system based on rules to translate acronyms,
together with a binary weighting approach (run 1) and a tf-idf approach (run 2)
to retrieve the dictionary items most similar to the portion of the certificate
of death. For both configurations, a basic approach was used to select the class
with the highest weight.
        </p>
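        <p>Dictionary retrieval by tf-idf similarity, of the general kind used in such runs, can be sketched as follows. This is a generic illustration with a made-up three-entry mini-dictionary, not the UNIPD system:</p>

```python
import math
from collections import Counter

def tfidf_retrieve(query, dictionary):
    """Rank (term -> ICD10 code) dictionary entries against a certificate
    line by tf-idf cosine similarity; return the best (score, code) pair."""
    docs = [term.lower().split() for term, _ in dictionary]
    n = len(docs)
    # Document frequency of each token across the dictionary terms.
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log(n / df[tok]) for tok in df}

    def vec(tokens):
        tf = Counter(tok for tok in tokens if tok in idf)
        return {tok: tf[tok] * idf[tok] for tok in tf}

    def cosine(u, v):
        num = sum(u[t] * v.get(t, 0.0) for t in u)
        den = (math.sqrt(sum(x * x for x in u.values()))
               * math.sqrt(sum(x * x for x in v.values())))
        return num / den if den else 0.0

    qv = vec(query.lower().split())
    return max((cosine(qv, vec(doc)), code)
               for doc, (_, code) in zip(docs, dictionary))

# Hypothetical mini-dictionary; real dictionaries were supplied with the data.
icd_dict = [("septic shock", "A419"),
            ("cardiogenic shock", "R570"),
            ("acute respiratory distress syndrome", "J80")]
tfidf_retrieve("probable septic shock", icd_dict)
# Highest similarity goes to the "septic shock" entry (code A419).
```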
        <p>
          KFU. The KFU team submitted two runs for the English dataset [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. They used
sequence-to-sequence deep learning models based on recurrent neural networks.
The method takes the raw text as input and outputs a sequence of ICD10
codes. Both the supplied corpus and dictionary were used for training, exclusive
of any additional data.
        </p>
        <p>
          LITL. The LITL team submitted runs for the French dataset in the raw and
aligned formats [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. The LITL team's system was specifically designed by
master's students (LITL programme, University of Toulouse) and their teachers for
the challenge. The system is based on the search platform Solr. Training data
was indexed using the SolrXML format. The core is organized into ICD codes
associated with the corresponding "raw texts", "diagnostic texts", ICD headings
and SNOMED labels. The raw texts from the test dataset were automatically
transformed into queries and submitted to Solr. The two runs submitted are
based on the same collection and Solr configuration. For Run 1, raw texts were
automatically split into several queries when different causes were detected by
a custom-made rule-based system. For Run 2, each query corresponds to
the entire raw text of each CepiDC line.
        </p>
        <p>
          LIMSI. The LIMSI team submitted unofficial runs for all datasets [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The
starting point for these submissions was their last published system [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ], which
relied upon dictionary projection and supervised multi-class, mono-label text
classification using simple features (bag of normalized tokens, character trigrams,
and coding year). They extended this system to multi-label classification and
added dictionary and token bigram features to the classifier. Character n-grams
did not improve the F1-score on the training set and were discarded. The coding year
was kept for the French data but not for the English data, which only spans
the year 2015. Because it relies only on the material provided by the task organizers,
the same system could be applied to both the French and English datasets. In
each case, Run 1 used a supervised machine learning method (a multi-label SVM
with unigrams, bigrams and, for French, coding year), and Run 2 used a hybrid
method: the union of a calibrated dictionary and the multi-label SVM.
        </p>
        <p>
          LIRMM. The LIRMM team submitted runs for all datasets [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. They
annotated death certificate text with the SIFR BioPortal Annotator
(http://bioportal.lirmm.fr/annotator), using different configurations of the web
service. For French, a Simple Knowledge Organization System (SKOS) resource was built
using ICD10 content from the CISMeF portal, the set of dictionaries provided in
the challenge, and the training corpus. For the first run, the ontology was
generated with a heuristic whereby labels that correspond to multiple codes are
assigned to the most frequent code only. For the second run, a fall-back strategy
relaxes the most-frequent-code heuristic for lines that were not assigned any
codes initially. For English, in the first run, the SKOS resource was built using the
American dictionary supplied with the training data. In the second run, the dictionary
was combined with an OWL version of ICD10 and ICD10CM (extracted from the
Unified Medical Language System).
        </p>
        <p>
          Mondeca. The Mondeca team submitted unofficial runs11 for all datasets [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ].
They approached multilingual extraction of ICD10 codes by combining semantic
web technology and NLP concepts in four steps: (i) transform all the datasets
into RDF for graph-based manipulation; (ii) transform the dictionaries for all
the years into SKOS for better enrichment across the knowledge bases; (iii)
design a GATE workflow to annotate the RDF datasets based on gazetteers
extracted from the dictionaries; and (iv) work on both the French (raw data) and
English corpora within a single workflow, in a multilingual approach
enabling simultaneous processing of multiple languages.
        </p>
        <p>
          SIBM. The SIBM team submitted runs for all datasets [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. Their approach performs
term extraction at the phrase level using natural language
processing. The system is built using Python and Python/C extensions and produces
the following output for each identified concept: (i) the entry text, (ii) the offset
of the first and the final word contained in the health concept, (iii) the ICD10
identifier and (iv) the ICD10 term. Three main steps lead to the identification
of ICD10 concepts for a given text. During tokenization, the input text is sliced
into phrases, then words; stop words are filtered and spell checking is performed
using the Enchant library. Next, during ICD10 candidate selection, a method
based on the phonetic encoding algorithm Double Metaphone (DM) is used for
approximate term search; this step relies on a database storing pre-computed
DM codes for each word in the ICD10 dictionaries. Finally, during
candidate ranking, a combination of the longest-common-substring and fuzzy-match
algorithms ranks the candidates. The term with the
highest score is retained as the matching ICD10 code for the phrase.
TUC. The TUC team submitted runs for all datasets [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. Their approach
focused on the exploration of relevant feature groups for multilingual text
classification with respect to ICD10 codes. First, a large-scale brute-force feature set was
constructed using the groups bag of words, bag of bigrams, bag of trigrams,
latent Dirichlet allocation, and the ontologies of WordNet and UMLS. In the
development phase, three different strategies were evaluated in conjunction with
support vector machines for the English and French corpora: each feature group
separately, early fusion of all feature groups, and late fusion. For English, early
fusion (run 1) and the feature group bag of bigrams (run 2) achieved the best
results. For French, average late fusion of bag of words and bag of bigrams
(run 1), and the feature group bag of bigrams (run 2), performed best.
11 One official run was submitted but did not comply with the format required by the
challenge and could not be evaluated.
UNSW. The UNSW team submitted runs for the American dataset [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. They
deployed a knowledge-based approach, tackling the task solely with
dictionary lookup. The first step is to index a manually coded ICD10 lexicon, followed
by dictionary matching. Priority rules are applied to retrieve the relevant
entity/entities and their corresponding ICD10 code(s) given a free-text cause-of-death
description. Two priority methods were implemented in the submitted
runs: the first relied on BM25 and the second on direct term matching.
The advantages of a knowledge-based method include speed and the absence of any need for
training data.
        </p>
        <p>
          WBI. The WBI team submitted runs for the English raw dataset and for the
French aligned dataset [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. They combined standard rule-based methods for
Named Entity Recognition (NER) with machine-learning approaches for
candidate ranking. For NER, rule-based dictionary lookup and fuzzy matching using
Lucene Solr were applied. Preference was given to generating potential candidates for
each match, to increase recall. Candidates were then ranked using a
machine-learning approach: based on the hierarchy of the ICD10 terminology (chapters,
blocks, sub-chapters), combined with the ICD10 codes and text available from the
provided dictionaries, a classifier was developed for ranking candidates.
Baselines. To provide a better assessment of the task difficulty and system
performance, this year we offer baseline results using two methods: (1) the ICD
baseline consisted of exact string matching between the terms in the ICD and
the death certificate text; (2) the frequency baseline consisted in assigning to
a certificate line from the test set the top 2 ICD10 codes most frequently
associated with it in the training and development sets, using case- and
diacritic-insensitive line matching.
        </p>
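        <p>The frequency baseline can be sketched directly from its description: index training lines case- and diacritic-insensitively, then return the top 2 codes for a matching test line. The helper names and mini training set below are ours, not the organizers' baseline scripts:</p>

```python
import unicodedata
from collections import Counter, defaultdict

def strip_accents(text: str) -> str:
    """Remove diacritics so 'démence' and 'demence' index identically."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if not unicodedata.combining(ch))

def build_frequency_baseline(training_lines, k=2):
    """Map each normalized certificate line to its k most frequently
    associated ICD10 codes in the training data."""
    counts = defaultdict(Counter)
    for text, code in training_lines:
        counts[strip_accents(text).lower()][code] += 1
    return {line: [code for code, _ in ctr.most_common(k)]
            for line, ctr in counts.items()}

def predict(baseline, test_line):
    """Return the stored top-k codes, or no codes for an unseen line."""
    return baseline.get(strip_accents(test_line).lower(), [])

# Hypothetical mini training set (line text, assigned code).
train = [("choc septique", "A419"), ("Choc septique", "A419"),
         ("choc septique", "R572"), ("démence", "F03")]
baseline = build_frequency_baseline(train)
predict(baseline, "CHOC SEPTIQUE")   # -> ["A419", "R572"]
```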
      </sec>
      <sec id="sec-3-2">
        <title>System performance on death certificate coding</title>
        <p>Tables 6 to 8 present system performance on the ICD10 coding task for each
dataset. Team KFU obtained the best performance in terms of F-measure, both
overall and for the external causes of death, on the English dataset. Team SIBM
obtained the best official performance in terms of F-measure, both overall and for
the external causes of death, on the French datasets. It is interesting to note that
the participants who obtained the best scores on the French datasets (SIBM and
LIMSI) are returning teams who also participated in the coding task in 2016.
Team SIBM's performance improved from an F-measure of .680 in 2016 to an
F-measure of .804 this year, while team LIMSI's performance improved from an
F-measure of .652 in 2016 to an F-measure of .867 this year, which also exceeds
the best performance of 2016, obtained by team Erasmus with an F-measure of
.848.12 This suggests that there is room for improvement on this task, and that
iterations of the task are useful to help identify the best ideas and methods to
address it.</p>
        <p>To provide a more in-depth analysis of results, this year we also introduced
a measure of system performance on the external causes of death, which are
of specific interest to public health specialists and are also thought to be more
difficult to code. This hypothesis was confirmed by the results: system
performance was much lower on the external causes vs. all causes for all systems, both
for the English and French datasets. Interestingly, some systems offered very
good performance overall but comparatively low performance on external
causes, and vice versa. We also note that the performance of the frequency
baseline was much higher on the French aligned dataset, compared to the French raw
dataset and the English dataset. This suggests that there is value in the alignment
step of data preparation, and in the size of the dataset (the French dataset was
significantly larger than the English dataset).
12 We note that these comparisons are indicative, since the data sets used in 2016 and
2017 are not identical; specifically, the 2016 test set was distributed in 2017 as a
development set, and the 2017 test set consisted of new data (unreleased in 2016).</p>
        <p>The results show that both knowledge-based and statistical methods can
perform well on the task. For English, the best performance was obtained by
a statistical neural method (team KFU) and the second best by
a machine learning method relying on knowledge-based sources (team LIMSI).
For French, the best performance was obtained by a machine learning method
relying on knowledge-based sources (team LIMSI), while the second best was
obtained with a combination of knowledge-based and natural language processing
methods (team SIBM). In addition, many teams relied on a system architecture
that was the same for both languages and used language-specific features
or knowledge sources, requiring little language adaptation. The results are very
encouraging from a practical perspective and indicate that a coding assistance
system could prove very useful for the effective processing of death certificates
in multiple languages.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Replication track and replicability of the results</title>
        <p>Five teams submitted systems to our replication track. Only one of these teams
had also participated in the replication track last year. Four systems covered
both French and English, and one system only processed English.
In addition, the replication track covered the simple scripts used to produce
the baseline runs.</p>
        <p>Most of the baseline and system runs could be replicated by at least one
analyst. However, the analysts still experienced varying degrees of difficulty in
installing and running the systems. Differences were mainly due to the technical
set-up of the computers used to replicate the experiments. Analysts also report
that additional information on system requirements, installation procedure and
practical use would be useful for all the systems submitted, although documentation
was overall more abundant and detailed than in last year's experiments.
In some cases, system authors were contacted for help. They were responsive
and contributed to facilitating the use of their systems. The results of the
experiments suggest that replication is achievable. However, it continues to be more
of a challenge than one would hope.</p>
        <p>Formatting issues. In the French dataset, a formatting issue affected the
certificates whose narratives contained a semicolon. The data export from IRIS
to csv failed to adequately protect the text field with quotes, so that some of the
data instances were made difficult to parse. Nonetheless, this problem affected
less than 1% of the lines, so we believe it had limited impact on the results. The
export format will be corrected in future releases of the dataset. However, we
would like to note that this type of issue fits within the practical `real life' element
of this challenge. While it certainly may have made system development more
difficult, it also advocated for systems with strategies for dealing with potentially
less-than-perfect data. While unintended, we believe this situation in fact makes
for a robust evaluation, because this kind of data would also be present in a
practical workflow.</p>
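        <p>As an illustration of such a defensive strategy (our sketch; the four-column
layout below is hypothetical, not the actual export format), a parser can rejoin
the surplus fields created by unescaped semicolons inside the narrative:</p>

```python
# Defensive parsing sketch for a semicolon-delimited export in which the
# free-text field may contain unescaped semicolons. Hypothetical 4-column
# layout: certificate id; line number; raw text; ICD-10 code.
EXPECTED_FIELDS = 4
TEXT_FIELD = 2  # position of the free-text field

def parse_line(line):
    parts = line.rstrip("\n").split(";")
    if len(parts) > EXPECTED_FIELDS:
        # Surplus fields come from semicolons inside the narrative:
        # rejoin them into a single text field.
        surplus = len(parts) - EXPECTED_FIELDS
        parts[TEXT_FIELD:TEXT_FIELD + surplus + 1] = [
            ";".join(parts[TEXT_FIELD:TEXT_FIELD + surplus + 1])
        ]
    return parts

# The narrative "infection aigue; choc septique" contains a semicolon,
# so a naive split would yield five fields instead of four.
row = parse_line("42;1;infection aigue; choc septique;A419")
```

This rejoining only works when the fields after the text field are reliably
present; a production parser would also validate the trailing code field.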
        <p>Did smoking contribute to the death? In the American dataset, the
assignment of code F179 "Mental and behavioral disorders due to use of tobacco,
unspecified" may be supported by information supplied by the reporting
physician either in the certificate narrative or in a structured data form. As a result,
the gold standard assignment of F179 is sometimes unsupported by text. The
prevalence of F179 due to form filling vs. text report is unknown, and the two cases
are currently indistinguishable in the dataset. The sample document shown in
Table 2 illustrates a case of F179 assignment supported by the data form and not
by the text. The prevalence of the code is 4.7% in the training set and 3.9% in the
test set, which creates a bias for all evaluated systems. We estimate that the
bias could create differences of up to 2% in the overall F-measure. However, we
note that the external causes evaluation is not impacted, because F179 does not
belong to the external cause of death category.</p>
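        <p>The 2% estimate admits a back-of-envelope check (our illustration, under a
pessimistic reading): if every F179 assignment were text-unsupported and missed
by an otherwise perfect system, recall would be capped while precision stayed at
one, so the F-measure ceiling follows directly:</p>

```python
# Back-of-envelope sketch (our illustration): if a fraction u of the gold
# code assignments is unsupported by the text, an otherwise perfect system
# misses them all, so recall is capped at 1 - u while precision stays 1.
def f_ceiling(u):
    precision, recall = 1.0, 1.0 - u
    return 2 * precision * recall / (precision + recall)

# Pessimistic worst case: every F179 assignment (3.9% of the test set)
# is text-unsupported.
drop = 1.0 - f_ceiling(0.039)  # roughly a 2% drop in F-measure
```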
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>We released a new set of death certificates to evaluate systems on the task of
ICD10 coding in multiple languages. This is the third edition of a biomedical
NLP challenge that provides large gold-standard annotated corpora in French.
Results show that NLP systems can achieve high performance on the task
of coding death certificates in French and in English. The level of performance
observed shows that there is potential for integrating automated assistance into
the death certificate coding workflow in both languages. We hope that continued
efforts towards reproducibility will support the shift from research prototypes
to operational production systems. The corpus used and the participating teams'
system results are an important contribution to the research community. In
addition, the focus on a language other than English (French) remains a rare
initiative in the biomedical NLP community.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>We want to thank all participating teams for their effort in addressing new
and challenging tasks. The organization work for CLEF eHealth 2017 task 1
was supported by the Agence Nationale pour la Recherche (French National
Research Agency) under grant number ANR-13-JCJC-SIMI2-CABeRneT. The
CLEF eHealth 2016 evaluation lab has been supported in part by the CLEF
Initiative and Data61.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Liadh Kelly, Hanna Suominen, Aurelie Neveol, Aude Robert, Evangelos Kanoulas, Rene Spijker, João Palotti, and Guido Zuccon.
          <article-title>CLEF 2017 eHealth Evaluation Lab Overview</article-title>
          .
          <source>CLEF 2017 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          , Springer, September,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <source>World Health Organization. ICD-10. International Statistical Classification of Diseases and Related Health Problems. 10th Revision</source>
          . Volume
          <volume>2</volume>
          . Instruction manual.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Jones</surname>
            <given-names>KS</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galliers</surname>
            <given-names>JR</given-names>
          </string-name>
          .
          <source>Evaluating natural language processing systems: An analysis and review</source>
          .
          <source>1995</source>
          . Springer Science &amp; Business Media:
          <volume>1083</volume>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Voorhees</surname>
            <given-names>EM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harman</surname>
            <given-names>DK</given-names>
          </string-name>
          <article-title>and others</article-title>
          . TREC:
          <article-title>Experiment and evaluation in information retrieval</article-title>
          , vol
          <volume>1</volume>
          .
          <year>2005</year>
          . MIT press Cambridge.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Suominen</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salantera</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            <given-names>WW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savova</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pradhan</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>South</surname>
            <given-names>BR</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mowery</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            <given-names>GJF</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leveling</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Overview of the ShARe/CLEF eHealth Evaluation Lab 2013</article-title>
          . In: Forner P, Muller H,
          <string-name>
            <surname>Paredes</surname>
            <given-names>R</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosso</surname>
            <given-names>P</given-names>
          </string-name>
          , Stein B (eds),
          <source>Information Access Evaluation</source>
          . Multilinguality, Multimodality, and Visualization.
          <source>LNCS</source>
          (vol.
          <volume>8138</volume>
          ):
          <fpage>212</fpage>
          -
          <lpage>231</lpage>
          . Springer,
          <year>2013</year>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Goeuriot</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanlen</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            <given-names>G</given-names>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2015</article-title>
          . In: Information Access Evaluation. Multilinguality, Multimodality, and Interaction. Springer,
          <year>2015</year>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kelly</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2016</year>
          )
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2016</article-title>
          . In: Fuhr N. et al. (
          <article-title>eds) Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>CLEF 2016. Lecture Notes in Computer Science</source>
          , vol
          <volume>9822</volume>
          . Springer, Cham
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kelly</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            <given-names>H</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schreck</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leroy</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mowery</surname>
            <given-names>DL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            <given-names>S</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            <given-names>WW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            <given-names>D</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palotti</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Overview of the ShARe/CLEF eHealth Evaluation Lab 2014</article-title>
          . In:
          <string-name>
            <surname>Kanoulas</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupu</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clough</surname>
            <given-names>P</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanderson</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanbury</surname>
            <given-names>A</given-names>
          </string-name>
          , Toms E (eds),
          <source>Information Access Evaluation</source>
          . Multilinguality, Multimodality, and Interaction.
          <source>LNCS</source>
          (vol.
          <volume>8685</volume>
          ):
          <fpage>172</fpage>
          -
          <lpage>191</lpage>
          . Springer,
          <year>2014</year>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Chapman</surname>
            <given-names>WW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nadkarni</surname>
            <given-names>PM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hirschman</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>D'Avolio</surname>
            <given-names>LW</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savova</surname>
            <given-names>GK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            <given-names>O</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <volume>18</volume>
          (
          <issue>5</issue>
          ):
          <fpage>540</fpage>
          -
          <lpage>3</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Huang</surname>
            <given-names>CC</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            <given-names>Z</given-names>
          </string-name>
          (
          <year>2015</year>
          ).
          <article-title>Community challenges in biomedical text mining over 10 years: success, failure and the future</article-title>
          .
          <source>Brief Bioinform</source>
          ,
          <year>2015</year>
          <article-title>May 1</article-title>
          . pii: bbv024.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pavillon</surname>
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Laurent</surname>
            <given-names>F</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Certification et codification des causes médicales de décès</article-title>
          .
          <source>Bulletin Epidemiologique</source>
          Hebdomadaire - BEH:
          <fpage>134</fpage>
          -
          <lpage>138</lpage>
          . http://opac.invs.sante.fr/doc_num.php?explnum_id=2065 (accessed: 2016-06-06)
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Johansson</surname>
            <given-names>LA</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pavillon</surname>
            <given-names>G</given-names>
          </string-name>
          (
          <year>2005</year>
          ).
          <article-title>IRIS: A language-independent coding system based on the NCHS system MMDS</article-title>
          .
          <source>In WHO-FIC Network Meeting</source>
          , Tokyo, Japan
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lavergne</surname>
            <given-names>T</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neveol</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grouin</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            <given-names>P.</given-names>
          </string-name>
          <article-title>A Dataset for ICD-10 Coding of Death Certificates: Creation and Usage</article-title>
          .
          <source>Proceedings of the Fifth Workshop on Building and Evaluating Resources for Health and Biomedical Text Processing - BioTxtM2016</source>
          .
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Di Nunzio</surname>
            <given-names>GM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beghini</surname>
            <given-names>F</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vezzani</surname>
            <given-names>F</given-names>
          </string-name>
          and
          <string-name>
            <surname>Henrot</surname>
            <given-names>G</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>A Lexicon Based Approach to Classification of ICD10 Codes. IMS Unipd at CLEF eHealth Task 1</article-title>
          . CLEF 2017 Online Working Notes. CEUR-WS
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Miftakhutdinov</surname>
            <given-names>Z</given-names>
          </string-name>
          and
          <string-name>
            <surname>Tutubalina</surname>
            <given-names>E</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>KFU at CLEF eHealth 2017 Task 1: ICD-10 Coding of English Death Certificates with Recurrent Neural Networks</article-title>
          .
          <source>CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ho-Dac</surname>
            <given-names>LM</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabre</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Birski</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boudraa</surname>
            <given-names>I</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bourriot</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cassier</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Delvenne</surname>
            <given-names>L</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Gonzalez</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
            <given-names>EB</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piccinini</surname>
            <given-names>E</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rohrbacher</surname>
            <given-names>C</given-names>
          </string-name>
          and
          <string-name>
            <surname>Seguier</surname>
            <given-names>A</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>LITL at CLEF eHealth2017: automatic classification of death reports</article-title>
          .
          <source>CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zweigenbaum</surname>
            <given-names>P</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lavergne</surname>
            <given-names>T</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Multiple methods for multi-class, multilabel ICD-10 coding of multi-granularity, multilingual death certificates</article-title>
          .
          <source>CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zweigenbaum</surname>
            <given-names>P</given-names>
          </string-name>
          and
          <string-name>
            <surname>Lavergne</surname>
            <given-names>T.</given-names>
          </string-name>
          <article-title>Hybrid methods for ICD-10 coding of death certificates</article-title>
          .
          <source>In Seventh International Workshop on Health Text Mining and Information Analysis</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>105</lpage>
          , Austin, Texas, USA,
          <year>November 2016</year>
          .
          <source>EMNLP</source>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tchechmedjiev</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abdaoui</surname>
            <given-names>A</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Emonet</surname>
            <given-names>V</given-names>
          </string-name>
          and
          <string-name>
            <surname>Jonquet</surname>
            <given-names>C</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>ICD10 coding of death certificates with the NCBO and SIFR Annotator(s) at CLEF eHealth 2017 Task 1</article-title>
          . CLEF 2017 Online Working Notes. CEUR-WS
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Atemezing</surname>
            <given-names>GA</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>NoNLP: Annotating Medical Domain by using Semantic Technologies</article-title>
          .
          <source>CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Cabot</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soualmia</surname>
            <given-names>LF</given-names>
          </string-name>
          and
          <string-name>
            <surname>Darmoni</surname>
            <given-names>SJ</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <source>SIBM at CLEF eHealth Evaluation Lab</source>
          <year>2017</year>
          :
          <article-title>Multilingual Information Extraction with CIM-IND</article-title>
          .
          <article-title>CLEF 2017 Online Working Notes</article-title>
          . CEUR-WS
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Ebersbach</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herms</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <string-name>
            <surname>Eibl</surname>
            <given-names>M</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Fusion Methods for ICD10 Code Classification of Death Certificates in Multilingual Corpora</article-title>
          .
          <source>CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Jonnagaddala</surname>
            <given-names>J</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hu</surname>
            <given-names>F</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Automatic coding of death certificates to ICD-10 terminology</article-title>
          .
          <source>CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Seva</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kittner</surname>
            <given-names>M</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roller</surname>
            <given-names>R</given-names>
          </string-name>
          and
          <string-name>
            <surname>Leser</surname>
            <given-names>U</given-names>
          </string-name>
          (
          <year>2017</year>
          ).
          <article-title>Multi-lingual ICD-10 coding using a hybrid rule-based and supervised classification approach at CLEF eHealth 2017</article-title>
          .
          <article-title>CLEF 2017 Online Working Notes</article-title>
          . CEUR-WS
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>