<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Lexicon Based Approach to Classification of ICD10 Codes. IMS Unipd at CLEF eHealth Task 1</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Beghini</string-name>
          <email>fede.beghini92@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Vezzani</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Genevieve Henrot</string-name>
          <email>genevieve.henrot@unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Linguistic and Literary Studies</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the participation of the Information Management Systems (IMS) group at CLEF eHealth 2017 Task 1. In this task, participants are required to extract causes of death from death reports (in French and in English) and label them with the correct International Classification of Diseases (ICD10) code. We tackled this task by focusing on the replicability and reproducibility of the experiments and, in particular, on building a basic compact system that produces a clean dataset that can be used to implement more sophisticated approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we report the experimental results of the IMS group, which
participated for the first time in the CLEF eHealth Lab [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], in particular in Task 1:
"Multilingual Information Extraction - ICD10 coding" [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. This task consists
in labelling death certificate texts written in English or in French with
International Classification of Diseases (ICD10) codes. This work is usually performed by
experts in medicine; however, when large volumes of data need to be organized
and labelled, manual work is not only expensive but also time consuming, and it is
probably not feasible when hundreds of thousands of death certificates need to
be classified according to a taxonomy of thousands of codes. For this reason, a
possible solution is to approach this task from a machine learning
perspective and/or a natural language processing perspective by using syntactic and/or
semantic decision rules [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The main goal of our participation in this task was to build a reproducible
set of experiments for a system that i) converts raw data into a cleaned dataset,
ii) implements a set of manual rules to split sentences and translate medical
acronyms, and iii) implements a lexicon based classification approach with the
aim of building a sufficiently strong baseline (our initial objective was to achieve
a classifier with precision and recall equal to 0.5). We intentionally did not make
use of any machine learning approach to improve the accuracy of the
classification of death certificates; in fact, the main objective was to build a modular
system that can be easily enhanced in order to make use of the cleaned training
data available. For this purpose, we devised a pipeline for processing each death
certificate and producing a 'normalized' version of the text. Indeed, death
certificates are standardized documents filled in by physicians to report the death of
a patient, but the content of each document contains heterogeneous and noisy
data that participants had to deal with [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. For example, some certificates
contain non-diacritized text, a mix of cases and diacritized text, acronyms and/or
abbreviations, and so on.
      </p>
      <p>The main points of our contribution to this task can be summarized as
follows:
- a reproducibility framework to explain each step of the pipeline from raw
data to cleaned data;
- a minimal expert system based on rules to split sentences and translate
acronyms;
- experiments with different weighting approaches to retrieve the items in the
dictionary most similar to the portion of the death certificate;
- a simple classification approach to select the ICD code with the highest
weight.</p>
      <p>For this task, we submitted 2 official English runs plus 3 unofficial English
runs and 8 unofficial French runs.</p>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <p>In this section, we describe the main aspects of our contribution: the software
used to build the reproducibility framework, the data cleaning pipeline, and the
classification approach.</p>
      <sec id="sec-2-1">
        <title>R Markdown for Reproducible Research</title>
        <p>
          The problem of reproducibility in Information Retrieval has been addressed by
many researchers in the field in recent years [
          <xref ref-type="bibr" rid="ref12 ref4 ref6">6, 4, 12</xref>
          ]. The main concerns for
reproducibility in IR are related to system runs; in fact, even if a researcher uses
the same datasets and the same open source software, there are many hidden
parameters that make the full reproducibility of an experiment very difficult. For
this reason, there are important initiatives in the main IR conferences that
support this kind of activity (see for example the open source information retrieval
reproducibility challenge at SIGIR, https://github.com/lintool/IR-Reproducibility,
or the Reproducibility track at ECIR [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]) as
well as in the Natural Language Processing community [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>
          During the same time span, the Data Science community has questioned the
same issues (see http://www.nature.com/news/reproducibility-1.17552) and has
produced interesting solutions from a software point of view. The R Markdown
framework (http://rmarkdown.rstudio.com) is now considered one of the possible
solutions for documenting the results of an experiment and, at the same time,
reproducing each step of the experiment itself. Following the indications given by [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], we
developed the experimental framework in R and published the source code on
GitHub (https://github.com/gmdn/CLEF-eHealth-Task-1) to allow other
participants to reproduce our results.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>Pipeline for Data Cleaning</title>
        <p>In order to produce a clean dataset, we implemented the following pipeline for
data ingestion and preparation for all the experiments:
- read a line of a death certificate;
- split the line according to the expressions listed in Table 1;
- remove extra white space (leading, trailing, internal);
- transform letters to lower case;
- remove punctuation;
- expand acronyms (if any);
- correct common patterns (if any).</p>
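        <p>As a rough illustration, the pipeline above can be sketched in Python. The splitting expression and the acronym table below are small hypothetical stand-ins: the actual expressions are listed in Table 1, and the actual acronym tables are described in the following paragraphs.</p>
        <preformat>
```python
import re

# Hypothetical splitting expressions standing in for those listed in Table 1.
SPLITTER = re.compile(r"\b(?:due to|with)\b")

# Hypothetical acronym table; the real tables are gathered from the Web
# and curated manually (see below).
ACRONYMS = {"mi": "myocardial infarction"}

def clean_line(line: str) -> list[str]:
    """Apply the cleaning pipeline to one line of a death certificate."""
    cleaned = []
    for seg in SPLITTER.split(line):               # split the line
        seg = " ".join(seg.split())                # remove extra white space
        seg = seg.lower()                          # transform to lower case
        seg = re.sub(r"[^\w\s]", "", seg)          # remove punctuation
        seg = " ".join(ACRONYMS.get(t, t) for t in seg.split())  # expand acronyms
        cleaned.append(seg)
    return cleaned
```
        </preformat>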
        <p>Acronym Expansion. Acronym expansion is a crucial step to normalize the data
and make the death certificates clearer and more coherent with the ICD10 codes.
For the English experiments, we used a manual approach to build the list of
expanded acronyms and an automatic approach that gathers acronyms from the
Web. For the French experiments, we automatically created a list of expanded
medical acronyms available on Wikipedia and then manually cleaned the same
list.</p>
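        <p>A minimal sketch of how such a table can be normalized, under the assumption that the scraped list is a sequence of (acronym, expansion) pairs; the entries below are hypothetical examples, and one row is kept per distinct variant, as described later for the French list:</p>
        <preformat>
```python
from collections import OrderedDict

# Hypothetical raw entries as they might be scraped from a Web list:
# an acronym can have several expansions, and duplicates may occur.
raw_entries = [
    ("AVC", "accident vasculaire cérébral"),
    ("AVC", "accident vasculaire cérébral"),   # duplicate: dropped
    ("IDM", "infarctus du myocarde"),
    ("IDM", "insuffisance du myocarde"),       # second variant: kept on its own row
]

def build_acronym_table(entries):
    """Deduplicate (acronym, expansion) pairs, one row per distinct variant."""
    seen = OrderedDict()
    for acronym, expansion in entries:
        seen[(acronym.lower(), expansion.lower())] = None
    return list(seen)

table = build_acronym_table(raw_entries)
```
        </preformat>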
        <p>Indeed, the automatic creation of a list of acronyms gathered from the
Web presents some problems:
- sometimes acronyms have more than one expansion, some of which do not
belong to the medical field;
- some entries contain more than one language, for example the English and/or
French and/or Latin expanded acronym;
- some others contain spelling mistakes.</p>
        <p>In order to deal with these issues, we referred to the ICD10 dictionary code list,
which contains a list of diseases and causes of death, to other French dictionaries
(Larousse, http://www.larousse.fr/dictionnaires/francais-monolingue; Le Tresor de
la Langue Francaise Informatise, http://atilf.atilf.fr/tlfi.htm), and to some reliable
websites (http://www.cnci.univ-paris5.fr/medecine/abreviations.html and
http://dictionnaire.doctissimo.fr/).</p>
        <p>Moreover, we removed the wrong definitions and the acronym expansions
written in English and in Latin, and we corrected the spelling mistakes
concerning some of the accents (especially on the grapheme &lt;e&gt;) and some typos
(e.g. "isoniazide" instead of "izoniazide"). Additionally, there were some variants that
differed only in the hyphen, e.g. broncho-pulmonaire/bronchopulmonaire,
antiagregant plaquettaire/anti-agregant plaquettaire. In these cases, we chose the
definition present in the ICD10 dictionary and, if both variants were present, we
entered the one with more occurrences on the Web.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Classification</title>
        <p>
          We used a simple unsupervised lexicon based approach to label each (segment
of a) line of a death certificate [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The procedure to assign an ICD10 code,
which does not require any training, is the following:
- for each (segment of a) line, compute the score of each entry of the dictionary;
- group the ICD10 codes that have the maximum score;
- assign the most frequent code within this group.
        </p>
        <p>
          The score of each entry is the sum of the weights of its terms, computed with either
binary weighting (term present or absent) or a term frequency - inverse document
frequency (Tf-Idf) approach [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. In those cases where two or more classes have
the same number of entries with the maximum score, the first class in the list is
assigned by default.
        </p>
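        <p>The scoring and tie-breaking procedure can be sketched as follows. The dictionary entries and codes below are hypothetical, and the Tf-Idf variant uses a simple log(N/df) inverse document frequency as one plausible reading of the standard formulation; this is an illustration, not the exact implementation of our runs.</p>
        <preformat>
```python
import math
from collections import Counter

# Hypothetical dictionary: text of each entry with its ICD10 code.
DICTIONARY = [
    ("myocardial infarction", "I21"),
    ("acute myocardial infarction", "I21"),
    ("cerebral infarction", "I63"),
]

N = len(DICTIONARY)
# document frequency of each term across dictionary entries
df = Counter(t for text, _ in DICTIONARY for t in set(text.split()))

def score(segment: str, entry_text: str, weighting: str = "binary") -> float:
    """Sum the weights of the entry terms that occur in the segment."""
    seg_terms = set(segment.split())
    total = 0.0
    for term in entry_text.split():
        if term in seg_terms:
            if weighting == "binary":
                total += 1.0          # term present
            else:
                total += math.log(N / df[term])  # Tf is 1 in a short entry
    return total

def classify(segment: str, weighting: str = "binary") -> str:
    """Group top-scoring entries and assign the most frequent code among them."""
    scores = [(score(segment, text, weighting), code) for text, code in DICTIONARY]
    best = max(s for s, _ in scores)
    candidates = [code for s, code in scores if s == best]
    return Counter(candidates).most_common(1)[0][0]
```
        </preformat>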
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>In our experiments, we implemented:
1. a minimal expert system based on rules to translate acronyms, together with
2. a binary weighting approach or a Tf-Idf approach to retrieve the items in
the dictionary most similar to the portion of the death certificate, and
3. a lexicon based classification approach that selects the most frequent class
with the highest weight.</p>
      <p>We submitted two official runs for the English raw dataset. Then, we
submitted 3 unofficial English runs and 8 unofficial French runs (four for the raw
dataset and four for the aligned dataset).
For the two official English runs, we pre-processed the raw dataset in the
following way:
1. Read the first three fields of the American dictionary (DiagnosisText, Icd1,
Icd2, Icd3) and skip lines from 69328 to 69332, since there were some problems
with the data format, as shown below:
...</p>
      <p>LATE EFFECTS TRAUMATIC DUODENAL HEMATOMA;CTS TRAUMATIC ...
LATE EFFECTS TRAUMATIC DUODENUM HEMORRHAGE;FECTS TRAUMATIC ...
LATE EFFECTS TRAUMATIC ELBOW HEMATOMA; TRAUMATIC ELBOW HEMORRHAGE; ...
LATE EFFECTS TRAUMATIC EMPHYSEMATOUS BULLOUS DISEASE;;
LATE EFFECTS TRAUMATIC EMPHYSEMATOUS LUNG BLEB;
...
2. Index the dictionary using either binary weights or Tf-Idf weights;
3. Build a test run by reading (and cleaning) the causes brutes file and
- split the sentences according to the following set of patterns: "with", "due
to", "also due to", "that caused", "sec to", "on top of";
- expand each acronym using a table of manually curated acronyms;
4. Classify each line by assigning the ICD code with the highest score, if unique,
or the most frequent code if more than one code matches the line of the death
certificate.</p>
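      <p>The sentence splitting of step 3 can be sketched as a single alternation, with the longer patterns placed before their substrings so that, for example, "also due to" is matched before "due to"; this is a minimal sketch, not the exact code of our runs:</p>
      <preformat>
```python
import re

# The six patterns listed above, longest first so that "also due to"
# is not pre-empted by the shorter "due to".
PATTERNS = ["also due to", "due to", "that caused", "on top of", "sec to", "with"]
SPLITTER = re.compile(r"\b(?:" + "|".join(re.escape(p) for p in PATTERNS) + r")\b")

def split_causes(line: str) -> list[str]:
    """Split one line of a death certificate into cause segments."""
    return [seg.strip() for seg in SPLITTER.split(line) if seg.strip()]
```
      </preformat>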
      <p>The acronyms were expanded by manually checking the acronyms
in the training data and building a table of expanded acronyms by means of the
Web page https://www.allacronyms.com/_medical.</p>
      <p>The results of the two runs, Unipd-run1 for the binary weighting approach
and Unipd-run2 for the Tf-Idf weighting approach, are reported in Table 2.</p>
      <p>The results of the binary weighting run were very close to our expectations,
that is, to classify correctly almost half of the ICD10 codes (both in terms of
Recall and Precision) by just cleaning and normalizing the data, without the
help of any expert in the field.</p>
      <p>The poor result of the Tf-Idf weighting approach on the second run was
unexpected. For this reason, we investigated this matter and, thanks to the
reproducibility approach, we were able to immediately spot two bugs in the
code: 1) we unintentionally selected the Tf weights instead of Tf-Idf during the
indexing phase; 2) more importantly, we made a mistake in the classification
code (step 4 in the above list) that prevented the algorithm from selecting the most
frequent code (it just assigned the first ICD code in the initial list of results).
For this reason, we decided to correct the code and submit a second version of
Tf-Idf as an unofficial run.</p>
      <sec id="sec-3-1">
        <title>Unofficial Runs</title>
        <p>We also submitted unofficial runs both for French and English with the same
original goal but a slightly different approach for the collection of acronyms and
the use of transliteration of French diacritics. In particular, we were interested in
automatically gathering medical acronyms from a Wikipedia page and manually
cleaning the table of expanded acronyms (removing, for example, duplicated entries,
entries with both the English and French versions, wrong diacritics, and so on).</p>
        <p>For the expansion of French acronyms, we used the Wikipedia page "Liste
d'abreviations en medecine" (https://fr.wikipedia.org/wiki/Liste_d%27abréviations_en_médecine),
which contains 1,059 acronyms. After a manual
cleaning of the broken/missing/duplicated entries, we produced a table of 1,179
expanded acronyms.</p>
        <p>The increase in the number of acronyms is due to the fact that the same
acronym often had several expansions relevant to the medical field. Indeed, we
decided to place each variant in a different row with the aim of providing a more
complete overview of medical terminology. Furthermore, we applied the same
procedure when two acronyms corresponded to the same expansion, keeping
both alternatives and positioning them in different rows. Finally, we decided to
remove the acronym expansions that were not relevant to the medical field.</p>
        <p>For the expansion of the English acronyms, we decided not to use the English
Wikipedia list of medical abbreviations, since it is much less informative
compared to the French version. Instead, we chose a public Web page that
contains 445 common medical abbreviations
(http://www.spinalcord.org/resource-center/askus/index.php?pg=kb.page&amp;id=1413).
For the English unofficial runs, we did not perform any manual corrections of the
table of expanded acronyms.</p>
        <p>The results for the unofficial English runs are reported in Table 3. The first
half of the table shows the results of the unofficial runs, while the second half
reports the official results for comparison.</p>
        <p>French Run Results. For the French dataset, we had to slightly change the
code that reads the aligned and the raw causes, since some lines (less than 1%
of the data) had some issues with the number of fields (more than expected)
and/or contained a semicolon in the death certificate (the semicolon being the
separating character of the fields). See the files available for the reproducibility
track for more details.</p>
        <p>A total of sixteen unofficial French runs were submitted: eight for the raw
dataset, eight for the aligned dataset. For each type of dataset we tried the
following settings:
- Unipd-run6 (raw), Unipd-run14 (aligned): binary weights, automatic
creation of expanded acronyms, without transliteration of diacritics;
- Unipd-run7 (raw), Unipd-run15 (aligned): binary weights, automatic
creation of expanded acronyms, with transliteration of diacritics;
- Unipd-run8 (raw), Unipd-run16 (aligned): binary weights, manually
curated expanded acronyms, without transliteration of diacritics;
- Unipd-run9 (raw), Unipd-run17 (aligned): binary weights, manually
curated expanded acronyms, with transliteration of diacritics;
- Unipd-run10 (raw), Unipd-run18 (aligned): Tf-Idf weights, automatic
creation of expanded acronyms, without transliteration of diacritics;
- Unipd-run11 (raw), Unipd-run19 (aligned): Tf-Idf weights, automatic
creation of expanded acronyms, with transliteration of diacritics;
- Unipd-run12 (raw), Unipd-run20 (aligned): Tf-Idf weights, manually
curated expanded acronyms, without transliteration of diacritics;
- Unipd-run13 (raw), Unipd-run21 (aligned): Tf-Idf weights, manually
curated expanded acronyms, with transliteration of diacritics.</p>
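        <p>The transliteration of diacritics used in the runs above can be implemented by Unicode decomposition; a minimal sketch (one common way to do it, not necessarily the exact code of our runs):</p>
        <preformat>
```python
import unicodedata

def transliterate(text: str) -> str:
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```
        </preformat>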
        <p>The results for the unofficial French runs are reported in Table 4.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Final remarks and Future Work</title>
      <p>The aim of our participation was to implement a reproducible lexicon based
classifier that can be used as a baseline for further experiments. The performance
was sufficiently good, and in some cases the classifier achieved a classification
performance above 50% both for Recall and Precision, which was our initial
ideal threshold for a baseline.</p>
      <p>Moreover, the preliminary results of the experiments (official and unofficial)
have shown interesting differences between the English and French datasets:
- Tf-Idf works better for English, while binary weighting performs consistently
better for the French dataset;
- for the expansion of the acronyms, there seems to be a trade-off between
manual curation of data and quantity of data gathered from the Web; a lot of
noisy data is comparable to a small curated set (see for example Unipd-run3
and Unipd-run5). With lots of data, a round of manual curation allows for
small (if not negligible) improvements in terms of classification accuracy;
- for the French dataset, the normalization of diacritics was a key factor that
led to improvements of 10 percentage points over the non-normalized version.</p>
      <p>Before turning to a more complex system (based on a machine learning
approach), we will investigate other forms of data cleaning. In particular, we want
to further investigate the problem with diacritics and include an automatic
correction of misspelled words (very frequent in the dataset) based, for
example, on the Hamming distance to the words of the ICD10 codes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Kevin B.</given-names>
            <surname>Cohen</surname>
          </string-name>
          , Jingbo Xia, Christophe Roeder, and Lawrence Hunter.
          <article-title>Reproducibility in natural language processing: A case study of two R libraries for mining PubMed/MEDLINE</article-title>
          . In
          <source>LREC 4REAL Workshop: Workshop on Research Results Reproducibility and Resources Citation in Science and Technology of Language</source>
          , pages
          <fpage>6</fpage>
          -
          <lpage>12</lpage>
          . European Language Resources Association (ELRA),
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Mohamed</given-names>
            <surname>Dermouche</surname>
          </string-name>
          , Vincent Looten, Remi Flicoteaux, Sylvie Chevret, Julien Velcin, and
          <string-name>
            <given-names>Namik</given-names>
            <surname>Taright</surname>
          </string-name>
          .
          <article-title>ECSTRA-INSERM @ CLEF eHealth2016-task 2: ICD10 code extraction from death certificates</article-title>
          . In
          <source>Working Notes of CLEF 2016 - Conference and Labs of the Evaluation Forum, Evora, Portugal, 5-8 September, 2016</source>
          , pages
          <fpage>61</fpage>
          -
          <lpage>68</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          .
          <article-title>Unsupervised learning for lexicon-based classification</article-title>
          . In
          <source>Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA</source>
          , pages
          <fpage>3188</fpage>
          -
          <lpage>3194</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Ferro</surname>
          </string-name>
          .
          <article-title>Reproducibility challenges in information retrieval evaluation</article-title>
          .
          <source>J. Data and Information Quality</source>
          ,
          <volume>8</volume>
          (
          <issue>2</issue>
          ):8:1-8:4, January
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Ferro</surname>
          </string-name>
          , Fabio Crestani,
          <string-name>
            <given-names>Marie-Francine</given-names>
            <surname>Moens</surname>
          </string-name>
          , Josiane Mothe, Fabrizio Silvestri, Giorgio Maria Di Nunzio, Claudia Hauff, and Gianmaria Silvello, editors.
          <source>Advances in Information Retrieval - 38th European Conference on IR Research, ECIR 2016, Padua, Italy, March 20-23, 2016. Proceedings</source>
          , volume
          <volume>9626</volume>
          of Lecture Notes in Computer Science. Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Ferro</surname>
          </string-name>
          , Norbert Fuhr, Kalervo Jarvelin, Noriko Kando, Matthias Lippold, and
          <string-name>
            <given-names>Justin</given-names>
            <surname>Zobel</surname>
          </string-name>
          .
          <article-title>Increasing reproducibility in IR: Findings from the Dagstuhl seminar on "Reproducibility of data-oriented experiments in e-science"</article-title>
          .
          <source>SIGIR Forum</source>
          ,
          <volume>50</volume>
          (
          <issue>1</issue>
          ):
          <fpage>68</fpage>
          -
          <lpage>82</lpage>
          ,
          <year>2016</year>
          . http://sigir.org/files/forum/2016J/p068.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Gandrud</surname>
          </string-name>
          .
          <source>Reproducible Research with R and RStudio</source>
          . Chapman and Hall/CRC, second edition,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Liadh Kelly, Hanna Suominen, Aurelie Neveol, Aude Robert, Evangelos Kanoulas, Rene Spijker, João Palotti, and Guido Zuccon, editors.
          <article-title>CLEF 2017 eHealth Evaluation Lab Overview</article-title>
          . In
          <source>CLEF 2017 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science</source>
          . Springer,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Liadh</given-names>
            <surname>Kelly</surname>
          </string-name>
          , Lorraine Goeuriot, Hanna Suominen, Aurelie Neveol,
          <string-name>
            <given-names>João R. M.</given-names>
            <surname>Palotti</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Guido</given-names>
            <surname>Zuccon</surname>
          </string-name>
          .
          <article-title>Overview of the CLEF eHealth evaluation lab 2016</article-title>
          . In
          <source>Experimental IR Meets Multilinguality, Multimodality, and Interaction - 7th International Conference of the CLEF Association, CLEF 2016, Evora, Portugal, September 5-8, 2016, Proceedings</source>
          , pages
          <fpage>255</fpage>
          -
          <lpage>266</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Prabhakar Raghavan, and Hinrich Schütze.
          <article-title>Scoring, term weighting, and the vector space model</article-title>
          . In
          <source>Introduction to Information Retrieval</source>
          , pages
          <fpage>100</fpage>
          -
          <lpage>123</lpage>
          . Cambridge University Press,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Aurelie</given-names>
            <surname>Neveol</surname>
          </string-name>
          , Robert N. Anderson, K. Bretonnel Cohen, Cyril Grouin, Thomas Lavergne, Gregoire Rey, Aude Robert, Claire Rondet, and
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Zweigenbaum</surname>
          </string-name>
          .
          <article-title>CLEF eHealth 2017 multilingual information extraction task overview: ICD10 coding of death certificates in English and French</article-title>
          . In
          <source>CLEF 2017 Evaluation Labs and Workshop: Online Working Notes, CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>Aurelie</given-names>
            <surname>Neveol</surname>
          </string-name>
          , Kevin Cohen, Cyril Grouin, and
          <string-name>
            <given-names>Aude</given-names>
            <surname>Robert</surname>
          </string-name>
          .
          <article-title>Replicability of research in biomedical natural language processing: a pilot evaluation for a coding task</article-title>
          . In
          <source>Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis</source>
          , pages
          <fpage>78</fpage>
          -
          <lpage>84</lpage>
          , Austin, TX, November
          <year>2016</year>
          . Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>