<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classification of ICD10 Codes with no Resources but Reproducible Code. IMS Unipd at CLEF eHealth Task 1</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>giorgiomaria.dinunzio@unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Padua</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we describe the second participation of the Information Management Systems (IMS) group at CLEF eHealth 2018 Task 1. In this task, participants are required to extract causes of death from multilingual death reports (French, Hungarian and Italian) and label them with the correct International Classification of Diseases (ICD10) code. We tackled this task by focusing on the reproducible code that we published last year, which produces a clean dataset that can be used to implement more sophisticated approaches.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In this paper, we report the experimental results of the second participation of
the IMS group in the CLEF eHealth Lab [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], in particular in Task 1:
"Multilingual Information Extraction - ICD10 coding" [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This task consists in
automatically labelling death certificates written in different languages (French,
Hungarian, and Italian) with International Classification of Diseases (ICD10) codes.
      </p>
      <p>
        The main goal of our participation in the task this year was to test the
effectiveness of the reproducible code made available by [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which builds a
classification system that i) converts raw data into a cleaned dataset following a
'tidyverse' approach<sup>1</sup>, ii) implements a set of manual rules to split sentences and
translate medical acronyms, and iii) implements a lexicon-based classification
approach [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>The contributions of our experiments to this task can be summarized as
follows:
        <list list-type="bullet">
          <list-item><p>A study of a reproducibility framework to explain each step of the pipeline
from raw data to cleaned data;</p></list-item>
          <list-item><p>An evaluation of a classification system prepared for one
language (French) and applied, without any additional training or changes
to the source code, to two different languages (Hungarian and Italian).</p></list-item>
        </list>
      </p>
      <p>We submitted three official runs, one for each language, and prepared a
number of additional unofficial runs that we will evaluate and compare in order to
study the change in performance when adding more information to the pipeline.</p>
      <p><sup>1</sup> https://www.tidyverse.org</p>
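      <table-wrap>
        <label>Table 1</label>
        <caption><p>Expressions used to split a line (French).</p></caption>
        <table>
          <thead>
            <tr><th>French</th></tr>
          </thead>
          <tbody>
            <tr><td>avec</td></tr>
            <tr><td>sur</td></tr>
            <tr><td>par</td></tr>
            <tr><td>suite a un[e]</td></tr>
            <tr><td>dans un contexte de</td></tr>
            <tr><td>apres</td></tr>
            <tr><td>",", ";", "/"</td></tr>
          </tbody>
        </table>
      </table-wrap>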
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <p>
        In this section, we summarize the pipeline used in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that has been reproduced
in this work for each run.
      </p>
      <sec id="sec-2-1">
        <title>Pipeline for Data Cleaning</title>
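        <p>In order to produce a clean dataset, we followed the same pipeline for data
ingestion and preparation in all the experiments:
          <list list-type="bullet">
            <list-item><p>read a line of a death certificate;</p></list-item>
            <list-item><p>split the line according to the expressions listed in Table 1;</p></list-item>
            <list-item><p>remove extra white space (leading, trailing, internal);</p></list-item>
            <list-item><p>transform letters to lower case;</p></list-item>
            <list-item><p>remove punctuation;</p></list-item>
            <list-item><p>expand acronyms (if any);</p></list-item>
            <list-item><p>correct common patterns (if any).</p></list-item>
          </list>
        </p>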
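        <p>The cleaning steps listed above can be sketched as follows. This is only an illustrative Python sketch, not the original code (which follows a tidyverse approach): the function names, the regular expression built from the Table 1 expressions, the choice to replace punctuation with a space, and the optional <italic>patterns</italic> argument are all assumptions made for the example.</p>
        <preformat>
```python
import re
import string

# Splitting expressions taken from Table 1 (French), plus "," ";" "/".
# Assumption: expressions are matched as whole words, ignoring case.
SPLIT_PATTERN = re.compile(
    r"\b(?:dans un contexte de|suite a une?|avec|apres|sur|par)\b|[,;/]",
    flags=re.IGNORECASE,
)

# Assumption: each punctuation mark is replaced by a space, not deleted.
PUNCT_TABLE = str.maketrans({char: " " for char in string.punctuation})


def split_line(line):
    """Split a raw line into segments on the Table 1 expressions."""
    return [part for part in SPLIT_PATTERN.split(line) if part.strip()]


def clean_segment(segment, acronyms=None, patterns=None):
    """Apply the remaining cleaning steps to one segment."""
    text = " ".join(segment.split())    # leading, trailing, internal white space
    text = text.lower()                 # lower case
    text = text.translate(PUNCT_TABLE)  # punctuation
    text = " ".join(text.split())       # white space left by punctuation removal
    if acronyms:
        # Token-by-token acronym expansion (hypothetical lookup table).
        text = " ".join(acronyms.get(token, token) for token in text.split())
    for regex, replacement in patterns or []:
        # Hypothetical (regex, replacement) pairs for common patterns.
        text = re.sub(regex, replacement, text)
    return text


def clean_line(line, acronyms=None, patterns=None):
    """Return the cleaned, non-empty segments of one line."""
    cleaned = (clean_segment(part, acronyms, patterns) for part in split_line(line))
    return [segment for segment in cleaned if segment]
```
        </preformat>
        <p>For instance, under these assumptions, <monospace>clean_line("  Pneumopathie  infectieuse AVEC choc septique; AA ", {"aa": "aorte ascendante"})</monospace> returns the three segments "pneumopathie infectieuse", "choc septique" and "aorte ascendante".</p>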
        <p>
          <bold>Acronym Expansion.</bold> Acronym expansion is a crucial step to normalize data
and make the death certificate clearer and more coherent with the ICD10 codes.
For the French experiments, we used the original list of 1179 acronyms prepared
with a semi-automated approach by [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>We show the first ten acronym expansions in Table 2. We want to stress
that, in those cases where there is more than one choice (for example "aa"), this
particular implementation of the expansion selects only the first choice.
Improving this step of the pipeline is part of our current work.</p>
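        <p>The "first choice" behaviour described above can be illustrated with a short Python sketch. The rows are a subset of those shown in Table 2; the function names and the token-by-token lookup are assumptions made for the example and do not reproduce the original implementation.</p>
        <preformat>
```python
# A subset of the acronym/expansion pairs of Table 2, in the order listed there.
ACRONYM_ROWS = [
    ("a1at", "alpha-1-antitrypsine"),
    ("a1at", "a1-antitrypsine"),
    ("aa", "aorte ascendante"),
    ("aa", "affection actuelle"),
    ("aa", "antiarthrosique"),
]


def build_first_choice(rows):
    """Keep only the first expansion listed for each acronym."""
    table = {}
    for acronym, expansion in rows:
        table.setdefault(acronym, expansion)  # later choices are ignored
    return table


def expand_acronyms(text, table):
    """Replace every token that is a known acronym with its expansion."""
    return " ".join(table.get(token, token) for token in text.split())


first_choice = build_first_choice(ACRONYM_ROWS)
expanded = expand_acronyms("dissection aa", first_choice)
```
        </preformat>
        <p>With these rows, "aa" is always expanded to "aorte ascendante", even when one of the other expansions would be the appropriate one; this is the limitation mentioned above.</p>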
      </sec>
      <sec id="sec-2-2">
        <title>Classification</title>
        <p>
          We used a simple unsupervised lexicon-based approach to label each (segment
of a) line of a death certificate [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The procedure for assigning an ICD10 code, which
does not require any training, is the following:
        </p>
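        <p>
          <list list-type="bullet">
            <list-item><p>for each term in the (segment of a) line, sum one for each ICD10 label that
contains the term;</p></list-item>
            <list-item><p>for each (segment of a) line, compute the score of each ICD10 label;</p></list-item>
            <list-item><p>group the ICD10 labels that have the maximum score;</p></list-item>
            <list-item><p>assign the most frequent code within this group.</p></list-item>
          </list>
        </p>
        <table-wrap>
          <label>Table 2</label>
          <caption><p>First ten acronym expansions.</p></caption>
          <table>
            <thead>
              <tr><th>acronym</th><th>expansion</th></tr>
            </thead>
            <tbody>
              <tr><td>5-hiaa</td><td>acide 5-hydroxyindolactique</td></tr>
              <tr><td>5-ht</td><td>5-hydroxytryptamine</td></tr>
              <tr><td>5-ht</td><td>sérotonine</td></tr>
              <tr><td>a1at</td><td>alpha-1-antitrypsine</td></tr>
              <tr><td>a1at</td><td>a1-antitrypsine</td></tr>
              <tr><td>aa</td><td>aorte ascendante</td></tr>
              <tr><td>aa</td><td>affection actuelle</td></tr>
              <tr><td>aa</td><td>acide aminé</td></tr>
              <tr><td>aa</td><td>antiarthrosique</td></tr>
              <tr><td>aaa</td><td>anévrisme de l'aorte abdominale</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <table-wrap>
          <label>Table 3</label>
          <caption><p>Example of the first three steps of the classification procedure.</p></caption>
          <table>
            <thead>
              <tr><th>step</th><th>data</th></tr>
            </thead>
            <tbody>
              <tr><td>line</td><td>pneumopathie infectieuse lobaire inferieure droite</td></tr>
              <tr><td>terms</td><td>pneumopathie, infectieuse, lobaire, inferieure, droite</td></tr>
              <tr><td>ICD10 scores</td><td>J181 = 7, J13 = 1</td></tr>
            </tbody>
          </table>
        </table-wrap>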
        <p>The score of each label is the sum of the binary weights. In those cases where
two or more classes have the same number of entries with the maximum score,
the first class in the list is assigned by default. This is another part of the pipeline
that requires more effort in order to improve the effectiveness of the classifier.
In Table 3, we show an example of the first three steps, while Table 4 lists the
definitions of the ICD10 codes that received the highest score.</p>
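        <p>The scoring procedure can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the original code: the dictionary below contains only the eight entries shown in Table 4 (a real ICD10 dictionary is much larger), and the function names are hypothetical. Each dictionary entry receives one point for every term of the line it contains, the entries with the maximum score are grouped, and the most frequent code in that group is assigned, with ties resolved in favour of the code seen first.</p>
        <preformat>
```python
from collections import Counter

# (ICD10 code, definition) pairs copied from Table 4.
DICTIONARY = [
    ("J13", "pneumopathie franche lobaire inferieure"),
    ("J181", "pneumopathie commune lobaire inferieure"),
    ("J181", "pneumopathie infectieuse lobaire aigue"),
    ("J181", "pneumopathie infectieuse lobaire moyenne"),
    ("J181", "pneumopathie infectieuse lobaire superieure"),
    ("J181", "pneumopathie lobaire inferieure"),
    ("J181", "pneumopathie lobaire inferieure aigue"),
    ("J181", "pneumopathie lobaire inferieure bilaterale"),
]


def entry_scores(line, dictionary):
    """Binary weights: one point per term of the line found in an entry."""
    terms = set(line.split())
    return [
        (code, len(terms.intersection(definition.split())))
        for code, definition in dictionary
    ]


def classify(line, dictionary):
    """Return the assigned code and the per-code counts of top-scoring entries."""
    scored = entry_scores(line, dictionary)
    best = max((score for _, score in scored), default=0)
    if best == 0:
        return None, Counter()  # no term of the line matches any entry
    votes = Counter(code for code, score in scored if score == best)
    # most_common keeps first-seen order among equal counts, so a tie is
    # resolved in favour of the first class in the list.
    return votes.most_common(1)[0][0], votes


code, votes = classify("pneumopathie infectieuse lobaire inferieure droite", DICTIONARY)
```
        </preformat>
        <p>On the example line of Table 3, every entry above matches three terms, so the group of top-scoring entries contains seven J181 entries and one J13 entry, and J181 is assigned, in agreement with the scores reported in Table 3.</p>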
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments and Results</title>
      <p>
        We submitted three official runs, one for each language: French, Hungarian, and
Italian. The idea of these experiments was to test the effectiveness of the original
French ICD10 classifier on two new languages without any modification to the
source code. That is, acronym expansion and sentence splitting are done using
French resources. We used only the raw dataset for all the languages.
      </p>
      <table-wrap>
        <label>Table 4</label>
        <caption><p>Definitions of the ICD10 codes that received the highest score.</p></caption>
        <table>
          <thead>
            <tr><th>ICD10</th><th>definition</th></tr>
          </thead>
          <tbody>
            <tr><td>J13</td><td>pneumopathie franche lobaire inferieure</td></tr>
            <tr><td>J181</td><td>pneumopathie commune lobaire inferieure</td></tr>
            <tr><td>J181</td><td>pneumopathie infectieuse lobaire aigue</td></tr>
            <tr><td>J181</td><td>pneumopathie infectieuse lobaire moyenne</td></tr>
            <tr><td>J181</td><td>pneumopathie infectieuse lobaire superieure</td></tr>
            <tr><td>J181</td><td>pneumopathie lobaire inferieure</td></tr>
            <tr><td>J181</td><td>pneumopathie lobaire inferieure aigue</td></tr>
            <tr><td>J181</td><td>pneumopathie lobaire inferieure bilaterale</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        The results of the three experiments are shown in Table 5. The French run
performed sufficiently well, with results comparable to those presented in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The
F1 measure is close to the average of the results of all the participants in this
task. This confirms that a solid clean dataset is a good starting point to build a
classifier, even a simple classifier like the one we implemented.
      </p>
      <p>The Hungarian and Italian results are, as we expected, worse than the average
scores (much worse for Italian). However, it seems that the Hungarian dataset
was, in a sense, "easier" than the Italian one, judging from the results of our
experiments in the two subtasks. We are going to investigate the reasons for this large difference
in performance as future work. Another interesting fact is that, while for the
French task Precision was much higher than Recall, for the Hungarian and Italian
datasets these two measures seem more "balanced". This suggests that
better acronym expansion and better sentence splitting may favour Precision
over Recall.</p>
      <p><bold>Unofficial Runs</bold></p>
      <p>As part of current and future work, we have prepared a set of unofficial runs. A
first set of runs studies the effect of an alternative weighting scheme, tf-idf instead
of binary weighting; another set of runs (for Hungarian and Italian) explores the
effectiveness of splitting the sentence with the correct words (see Table 6), as well
as of expanding acronyms in the appropriate language. More runs will be created
with additional parameters concerning the multiple label assignment and a better
acronym expansion algorithm.</p>
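        <p>To make the difference between the two weighting schemes concrete, the following Python sketch scores the same line with binary weights and with a tf-idf variant. The exact tf-idf formula used in the runs is not specified here, so the sketch assumes a common one: each term occurs at most once per entry (term frequency 1) and its inverse document frequency is log(N / df), computed over the dictionary entries. The four-entry dictionary is a subset of Table 4, and all function names are hypothetical.</p>
        <preformat>
```python
import math
from collections import Counter

# Small subset of the Table 4 entries, used only for illustration.
DICTIONARY = [
    ("J13", "pneumopathie franche lobaire inferieure"),
    ("J181", "pneumopathie infectieuse lobaire aigue"),
    ("J181", "pneumopathie lobaire inferieure"),
    ("J181", "pneumopathie lobaire inferieure bilaterale"),
]


def idf_weights(dictionary):
    """Assumed idf: log(N / df), with df counted over dictionary entries."""
    n_entries = len(dictionary)
    document_frequency = Counter()
    for _, definition in dictionary:
        document_frequency.update(set(definition.split()))
    return {
        term: math.log(n_entries / count)
        for term, count in document_frequency.items()
    }


def score_entries(line, dictionary, weights=None):
    """Score each entry with binary weights, or with idf weights if given."""
    terms = set(line.split())
    scored = []
    for code, definition in dictionary:
        matched = terms.intersection(definition.split())
        if weights is None:
            score = float(len(matched))  # binary weighting
        else:
            score = sum(weights[term] for term in matched)  # tf = 1 per term
        scored.append((code, definition, score))
    return scored


line = "pneumopathie infectieuse lobaire inferieure droite"
binary = score_entries(line, DICTIONARY)
tfidf = score_entries(line, DICTIONARY, idf_weights(DICTIONARY))
best_tfidf = max(tfidf, key=lambda item: item[2])
```
        </preformat>
        <p>In this toy example all four entries are tied under binary weighting, whereas the assumed tf-idf weights favour the entry containing the rarer term "infectieuse". Whether such a preference helps depends on the dictionary and the language, as the results discussed in this section indicate.</p>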
      <p>
        At present, we have been able to evaluate the effectiveness of some
combinations of these parameters. In particular, we tested the binary weighting
approach against the tf-idf approach, using the original French source code
('inappropriate' acronyms and sentence splitting); results are shown in the first two
lines of Table 7. These results confirm that for Hungarian and Italian the binary
weighting approach performs better than tf-idf (the only language that showed
some improvement in this task with the tf-idf weights was English [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]).
      </p>
      <p>Then, we performed an experiment with binary weights and a 'correct'
sentence splitting (see Table 6), with or without the French acronym expansion.
Results are shown in the last two rows of Table 7. Using a
language-specific sentence splitting did not produce any significant change in the
performance of the classifier. This is probably because the Hungarian
and Italian death certificates are much more structured (from a language
standpoint) than the French ones. For example, we could rarely find complex sentences
with the words or terms listed in Table 6 in the Italian certificates. It seems that
punctuation marks work sufficiently well for these two languages. Moreover, by
removing the French acronym expansion, we obtained a slight improvement, since
we removed the noise introduced by this module of the pipeline.
In this case, results are better in terms of both Precision
and Recall compared to the official runs.</p>
    </sec>
    <sec id="sec-4">
      <title>Final remarks and Future Work</title>
      <p>The aim of our second participation in the CLEF eHealth Task 1 was to test
the reproducibility of the source code of the lexicon-based classifier that was
implemented the previous year. The performance of the French run was good,
and we plan to use it as a baseline to build a new and improved classifier.
The application of this classifier to two different languages gave interesting results:
the results of the Hungarian run were surprisingly high and close to the average
of the results of the participants. However, the high value of the median of F1
(close to 90%) suggests that this subtask may be easier than the French one. For
the Italian run, we obtained a worse performance, the reasons for which we will
investigate in a failure analysis.</p>
      <p>
        As current and future work, we are studying:
        <list list-type="bullet">
          <list-item><p>the adaptation of the pipeline to the two new languages (better sentence splitting
and acronym expansion [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]);</p></list-item>
          <list-item><p>the possibility to include multiple acronym expansions;</p></list-item>
          <list-item><p>how to assign multiple labels to the same line (when scores are tied).</p></list-item>
        </list>
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Eisenstein</surname>
          </string-name>
          .
          <article-title>Unsupervised learning for lexicon-based classification</article-title>
          .
          <source>In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9</source>
          ,
          <year>2017</year>
          , San Francisco, California, USA, pages
          <fpage>3188</fpage>
          -
          <lpage>3194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>A.</given-names>
            <surname>Neveol</surname>
          </string-name>
          , A. Robert,
          <string-name>
            <given-names>F.</given-names>
            <surname>Grippo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Morgand</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Orsi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pelikan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ramadier</surname>
          </string-name>
          , G. Rey, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Zweigenbaum</surname>
          </string-name>
          .
          <article-title>CLEF eHealth 2018 multilingual information extraction task overview: ICD10 coding of death certificates in French, Hungarian and Italian</article-title>
          .
          <source>In CLEF 2018 Evaluation Labs and Workshop: Online Working Notes. CEUR-WS.org</source>
          ,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          , Federica Beghini, Federica Vezzani, and
          <string-name>
            <given-names>Genevieve</given-names>
            <surname>Henrot</surname>
          </string-name>
          .
          <article-title>A lexicon based approach to classification of ICD10 codes. IMS Unipd at CLEF eHealth Task 1</article-title>
          .
          <source>In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          , Dublin, Ireland, September 11-14,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Borbala</given-names>
            <surname>Siklosi</surname>
          </string-name>
          and
          <string-name>
            <given-names>Attila</given-names>
            <surname>Novak</surname>
          </string-name>
          .
          <article-title>Detection and expansion of abbreviations in Hungarian clinical notes</article-title>
          . In Felix Castro, Alexander Gelbukh, and Miguel Gonzalez, editors,
          <source>Advances in Artificial Intelligence and Its Applications</source>
          , pages
          <fpage>318</fpage>
          -
          <lpage>328</lpage>
          , Berlin, Heidelberg,
          <year>2013</year>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Hanna</given-names>
            <surname>Suominen</surname>
          </string-name>
          , Liadh Kelly, Lorraine Goeuriot, Evangelos Kanoulas, Leif Azzopardi, Rene Spijker,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aurelie</given-names>
            <surname>Neveol</surname>
          </string-name>
          , Lionel Ramadier, Aude Robert, Guido Zuccon, and Joao Palotti, editors.
          <source>Overview of the CLEF eHealth Evaluation Lab</source>
          <year>2018</year>
          .
          <source>CLEF 2018 - 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science</source>
          . Springer,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>