<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>ECSTRA-APHP @ CLEF eHealth2018-task 1: ICD10 Code Extraction from Death Certificates</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Remi Flicoteaux</string-name>
          <email>remi.flicoteaux@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>INSERM, U1153 Epidemiology and Biostatistics Sorbonne Paris Cité Research Center (CRESS), ECSTRA team</institution>
          ,
          <addr-line>Paris, F-75010</addr-line>
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper describes the participation of the ECSTRA-APHP team in CLEF eHealth 2018, task 1. The task involved extracting ICD-10 codes from death certificates, mainly described with short plain texts. We cast the task as a machine learning problem: the prediction of ICD-10 codes (a categorical variable) from raw text transformed into word embeddings. We relied on a probabilistic convolutional neural network for classification. Because the ICD codes are imbalanced, we complemented the prediction with a dictionary-based lexical matching classifier for codes represented by fewer than 1,000 documents. Our best F1-score was 80.0% on a test set and 69.1% on the validation set (gold standard delivered by the organizers at the end of the challenge). To our knowledge, this was the first time convolutional neural networks were used for this multilabel classification task. The performance of our models was below that of the best neural predictor (a recurrent network) described last year on the same task at CLEF eHealth (F1-score around 85%).</p>
      </abstract>
      <kwd-group>
        <kwd>ICD-10 coding</kwd>
        <kwd>ICD-10 codes</kwd>
        <kwd>cause of death extraction</kwd>
        <kwd>convolutional neural network</kwd>
        <kwd>lexical matching</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Completing death certificates is a routine task in hospitals and healthcare
institutions. In France, death certificates are produced by physicians and
transmitted to the French Epidemiological Center for the Causes of Death (CepiDC,
http://www.cepidc.inserm.fr/). Beyond the administrative and personal information, death certificates
usually contain a free-text description of the cause(s) of death. The CepiDC
converts these free texts into formal standardized codes that are processed for
statistical purposes. As in many countries, the World Health Organisation
(WHO) International Classification of Diseases (ICD) taxonomy
(http://www.who.int/classifications/icd/) is used for
this normalized representation. The ICD taxonomy covers a wide range of
diseases, symptoms, signs, and other content related to diseases. The WHO issues
separate versions of ICD per language/country. In this paper, we use the French
release of ICD, which is now at its 10th revision (called ICD-10). It covers more
than 38,000 diagnosis codes, but only a subset of these codes can be causes
of death.</p>
      <p>
        Because it requires manual work and expertise, ICD-10 code extraction
from text is time-consuming: the ICD-10 taxonomy contains
thousands of possible causes of death. Within CLEF eHealth 2018, task 1
focuses on the automatic extraction of causes of death from their
textual description [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Classification of health-related text is considered a
special case of multilabel text classification, which may be approached either from a
machine learning perspective (supervised classification) or from a Natural Language
Processing (NLP) perspective, using syntactic and/or semantic decision rules.
Machine learning algorithms such as Support Vector Machines, Latent Dirichlet
Allocation and neural networks have been successfully applied for this purpose.
Both approaches aim at automating ICD-10 code extraction from death
certificates. In this paper, we mainly focus on a probabilistic Convolutional Neural
Network (CNN). Because the ICD labels are unbalanced, we enriched its predictions with
a dictionary-based lexical matching classifier.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methods</title>
      <p>
        CNNs use layers with convolving filters that are applied to local features
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Originally invented for computer vision, CNN models have subsequently
been shown to be effective for NLP and have achieved excellent results in text
classification [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The filter can be seen as sliding over the columns of the
feature matrix, performing an element-wise multiplication and summation on the current
overlap before moving one position to the right. For NLP tasks, the filter spans the whole of
the first axis, so it slides in a single direction and the resulting vector is one-dimensional.
One of the key differentiators between CNNs and traditional machine learning
approaches is the ability of CNNs to learn complex feature representations.
      </p>
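      <p>As an illustration, the sliding operation described above can be sketched in a few lines of plain Python. This is a minimal sketch and not the implementation used in this work: the function name conv1d_over_words, the 2-dimensional features and the filter values are assumptions invented for the example.</p>
      <preformat>
```python
def conv1d_over_words(features, kernel):
    """Slide `kernel` over the columns (word positions) of `features`.

    features: list of rows, shape (embedding_dim, n_words)
    kernel:   list of rows, shape (embedding_dim, window)
    Returns one value per valid window position (a one-dimensional feature map).
    """
    n_words = len(features[0])
    window = len(kernel[0])
    outputs = []
    for start in range(n_words - window + 1):
        total = 0.0
        # Element-wise multiplication and summation on the current overlap.
        for row_f, row_k in zip(features, kernel):
            for offset in range(window):
                total += row_f[start + offset] * row_k[offset]
        outputs.append(total)
    return outputs


# Toy input (hypothetical values): 2-dimensional features for a 4-word statement.
features = [[1.0, 0.0, 2.0, 1.0],
            [0.0, 1.0, 1.0, 3.0]]
# One filter covering the whole first axis and a window of 2 words.
kernel = [[1.0, -1.0],
          [0.5, 0.5]]

feature_map = conv1d_over_words(features, kernel)
print(feature_map)  # one value per window position
```
      </preformat>
      <p>Because the filter covers every row of the feature matrix, it can only move along the word axis, which is why the output is a flat list with one value per window position.</p>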
      <p>The state of the art for feature extraction is to use word vector models
(embeddings), in which each word is represented by a single real-valued vector. In this model,
sentences are projected from a sparse vector space of the size of the
vocabulary onto a lower-dimensional vector space which encodes semantic features of
words in its dimensions. Semantically close words are then likewise close in the
new lower-dimensional vector space, as measured with a vector distance
such as Euclidean or cosine distance.</p>
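      <p>The notion of closeness in the embedding space can be made concrete with a short example. The sketch below is illustrative only: the three 3-dimensional vectors are invented for the example and are not embeddings learned in this work.</p>
      <preformat>
```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical embeddings, invented for illustration only.
embeddings = {
    "infarctus": [0.9, 0.1, 0.0],
    "cardiaque": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.1, 0.9],
}

close = cosine_similarity(embeddings["infarctus"], embeddings["cardiaque"])
far = cosine_similarity(embeddings["infarctus"], embeddings["fracture"])
print(close > far)  # related words should have the higher cosine similarity
```
      </preformat>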
      <p>In the present work, the weights of the word vectors are learned jointly as the first
hidden layer of the classifier itself, and we train a CNN that uses multiple
filters (with varying window sizes) to obtain multiple features on top of the word
vectors. For this multi-label classification task, the prediction layer (the last one)
has the size of the number of distinct labels (entries in the ICD). Here we only
focus on codes that have already been used. Finally, the prediction is a vector
of real numbers, each of which can be interpreted as a probability for one ICD label (but
the whole vector does not sum to one). A grid search was performed to determine the
threshold above which a code was selected.</p>
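      <p>The thresholding step can be illustrated as follows. This is a minimal sketch under stated assumptions: the criterion optimised by the grid search is assumed here to be the micro-averaged F1-score, and the function names, candidate thresholds, scores and gold labels are all invented for the example.</p>
      <preformat>
```python
def select_codes(scores, threshold):
    """Keep every code whose score reaches the threshold (multi-label output)."""
    return {code for code, score in scores.items() if score >= threshold}


def micro_f1(predicted, gold):
    """Micro-averaged F1 over a list of predicted and gold code sets."""
    true_pos = sum(len(p.intersection(g)) for p, g in zip(predicted, gold))
    if true_pos == 0:
        return 0.0
    precision = true_pos / sum(len(p) for p in predicted)
    recall = true_pos / sum(len(g) for g in gold)
    return 2 * precision * recall / (precision + recall)


def grid_search_threshold(all_scores, gold, candidates):
    """Return the candidate threshold with the highest micro-F1."""
    return max(
        candidates,
        key=lambda t: micro_f1([select_codes(s, t) for s in all_scores], gold),
    )


# Hypothetical per-line scores for three ICD-10 codes (values are invented).
all_scores = [
    {"I219": 0.92, "J189": 0.40, "C349": 0.05},
    {"I219": 0.30, "J189": 0.75, "C349": 0.55},
    {"I219": 0.10, "J189": 0.20, "C349": 0.15},
]
gold = [{"I219"}, {"J189", "C349"}, set()]

best_threshold = grid_search_threshold(all_scores, gold, [0.1, 0.3, 0.5, 0.7, 0.9])
print(best_threshold)
```
      </preformat>
      <p>Since the scores are not normalised to sum to one, a single line can yield zero, one or several codes, which matches the multi-label setting of the task.</p>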
      <p>The dictionary-based lexical matching classifier relies on word recognition from
a knowledge base built from several available dictionaries on the French ICD-10
classification: the second volume of ICD, the Orphanet thesaurus, French SNOMED CT,
and the CepiDC dictionaries that were provided for the challenge.</p>
      <p>Once index entries (i.e. words) are detected in the text, ranking
scores are usually computed individually for each concept mention. For this
purpose we used a very simple score based on the probability of a code given
a word and the number of words recognized in the text:
<disp-formula><tex-math>\mathrm{score} = \frac{p_{\mathrm{code} \mid \mathrm{word}}}{n_{\mathrm{match}}}</tex-math></disp-formula></p>
      <p>We used this approach to predict rare codes (those represented by fewer than 1,000
lines), and we chose only one code per statement. Our main idea was to improve
the prediction of the CNN classifier, so we used this result to add a weight to the
output vector of the CNN predictor for the selected code. A grid search was performed
to determine the size of this weight.</p>
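      <p>One possible reading of this combination step is sketched below. It assumes that the weight is simply added to the CNN score of the code proposed by the dictionary-based classifier; the function name boost_rare_code, the scores, the weight and the decision threshold are hypothetical values chosen for the example.</p>
      <preformat>
```python
def boost_rare_code(cnn_scores, dictionary_code, weight):
    """Return a copy of the CNN scores with `weight` added to one code.

    `dictionary_code` is the single code proposed by the lexical matcher
    for this statement, or None when nothing was recognised.
    """
    boosted = dict(cnn_scores)
    if dictionary_code is not None:
        boosted[dictionary_code] = boosted.get(dictionary_code, 0.0) + weight
    return boosted


# Hypothetical CNN output where no code reaches a 0.5 decision threshold.
cnn_scores = {"I219": 0.25, "Q878": 0.25}

# The lexical matcher proposes the rare code Q878 (assumed for the example).
boosted = boost_rare_code(cnn_scores, "Q878", weight=0.5)
selected = sorted(code for code, score in boosted.items() if score >= 0.5)
print(selected)
```
      </preformat>
      <p>With an additive weight, the dictionary can only raise the score of the code it proposes; all other CNN scores are left unchanged.</p>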
    </sec>
    <sec id="sec-3">
      <title>Dataset</title>
      <p>
        The CepiDC corpus was created by the French Center for Epidemiology
and Medical Causes of Death (CepiDC) specifically for the CLEF eHealth task
1 [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ]. It is composed of separate train/test samples of death certificates. Only
the textual description of the causes of death is available for analysis. The CepiDC
dataset is highly imbalanced: about 80% of the documents are assigned to fewer than
20% of the codes.
      </p>
      <p>The task was defined at the level of each statement (line) in a death
certificate: one statement could be associated with 0, 1 or more ICD-10 codes, which
represent causes of death at various levels in the causal chain that led to the
death. Each line was tagged with 0 codes (n=7,598; 2.5%), 1 code (n=238,929;
78.5%), 2 codes (n=40,572; 13.38%), and up to 14 codes (n=1). The dataset
included 70,656 lines and was divided into a training set and a test set (25,000 lines). A
validation set was also provided at the end of the challenge, with 70,656 lines.
In the validation set, there were 1,431 lines (2.0%) without a code, 51,383 (72.7%)
with one code and 11,981 (16.9%) with 2 codes; the maximum number of codes
per line was 16 (n=1).</p>
      <p>We removed stopwords and numerals. After applying a homemade spelling correction
algorithm based on Levenshtein distance, only words from the knowledge base
were retained for classification. The preprocessed text, vectorized into tokens, was used
as input for both the CNN and the dictionary-based lexical matching.</p>
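      <p>The preprocessing chain can be sketched as follows. This is not the homemade algorithm used in this work, whose details are not given here; it is a minimal illustration assuming whitespace tokenisation, a maximum edit distance of 2 and a tiny invented vocabulary and stopword list.</p>
      <preformat>
```python
def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_token(token, vocabulary, max_distance=2):
    """Map a token to its closest vocabulary word, or None if none is close enough."""
    if token in vocabulary:
        return token
    best = min(vocabulary, key=lambda word: levenshtein(token, word))
    return best if max_distance >= levenshtein(token, best) else None


def preprocess(text, vocabulary, stopwords):
    """Lowercase, drop stopwords and digits, then keep only knowledge-base words."""
    tokens = [t for t in text.lower().split()
              if t not in stopwords and not t.isdigit()]
    corrected = [correct_token(t, vocabulary) for t in tokens]
    return [t for t in corrected if t is not None]


# Hypothetical knowledge-base vocabulary and stopword list.
vocabulary = ["cardiaque", "infarctus", "insuffisance", "myocarde"]
stopwords = {"du", "de", "le"}

tokens = preprocess("Infarctus du myocrade 2", vocabulary, stopwords)
print(tokens)
```
      </preformat>
      <p>In this example the misspelled token "myocrade" is two edits away from "myocarde" and is therefore corrected, while the stopword and the digit are dropped.</p>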
    </sec>
    <sec id="sec-4">
      <title>Results</title>
      <p>We first built the knowledge base from the available thesauri (see Methods) and
the 2015 (most recent) CepiDC dictionaries that were provided for the task.
In the end, we had an ICD thesaurus of 216,110 entries, where entries ranged from 1 to
314 words. A word/code index was then built with 19,606 distinct entries.</p>
      <p>For the statistical approach with the convolutional network, we used word
embedding features that were fitted on the data together with the classification task;
no external weights were used. Models were fitted with and without adding data
from the knowledge base. Our best prediction gave an F1-score of 79.6% on the test
set. We submitted two unofficial runs to the challenge, and on the validation
set our best F1-score was 69.1%.</p>
      <p>We also studied performance at the code level as a function of the number of
documents available for each class. Results are presented in Table 2.</p>
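      <p>As expected, results were weaker on the poorly represented classes: a
significant reduction in performance was recorded below 1,000 representatives per
class.</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Code frequency</th><th>Proportion of true positives</th><th>Proportion of false negatives</th></tr>
          </thead>
          <tbody>
            <tr><td>[1000; inf[</td><td>84.4%</td><td>15.6%</td></tr>
            <tr><td>[500; 1000[</td><td>75.0%</td><td>25.0%</td></tr>
            <tr><td>[200; 500[</td><td>67.7%</td><td>32.3%</td></tr>
            <tr><td>[0; 200[</td><td>28.5%</td><td>71.5%</td></tr>
          </tbody>
        </table>
      </table-wrap>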
      <p>We then used the second classifier, based on word recognition. It did not improve
performance on the global criteria on the test set, but it improved
the F1-score on the very poorly represented categories.</p>
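      <table-wrap>
        <table>
          <thead>
            <tr><th>Code frequency</th><th>Proportion of true positives</th><th>Proportion of false negatives</th></tr>
          </thead>
          <tbody>
            <tr><td>[1000; inf[</td><td>85.3%</td><td>14.7%</td></tr>
            <tr><td>[500; 1000[</td><td>76.0%</td><td>24.0%</td></tr>
            <tr><td>[200; 500[</td><td>68.5%</td><td>31.5%</td></tr>
            <tr><td>[0; 200[</td><td>34.2%</td><td>65.8%</td></tr>
          </tbody>
        </table>
      </table-wrap>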
      <p>Finally, we also looked at the performance of the models on lines that
were labeled with 0 ICD codes. On the validation set, the performance of our
final predictor for these lines was very poor: precision = 19.7%, recall = 5.2%
and F1-score = 8.2%.</p>
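      <p>The F1-score is the harmonic mean of precision and recall, so the three figures reported for lines without a code can be checked against each other. The short computation below only reuses the precision and recall quoted above; the function name f1_score is an assumption for the example.</p>
      <preformat>
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for lines labeled with 0 ICD codes.
f1 = f1_score(0.197, 0.052)
print(round(100 * f1, 1))  # prints 8.2
```
      </preformat>
      <p>The low recall dominates the harmonic mean, which explains why the F1-score stays close to the recall despite a precision almost four times higher.</p>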
    </sec>
    <sec id="sec-5">
      <title>Discussion</title>
      <p>
        The problem of ICD code extraction has been investigated from a larger
perspective involving code assignment to various types of medical documents.
CLEF eHealth is one of the few international conferences to propose a specific
task on this subject each year, which makes it possible to follow the evolution
of the methods and their performance. Two main approaches are commonly
developed in parallel, statistical learning and entity recognition, and in some cases
a combination of both. Convolutional networks have shown their
effectiveness for the automatic classification of documents, particularly medical documents
[
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. To our knowledge, this is the first time they have been used in the CLEF eHealth
Challenge. We observed a difference of about 10 percentage points in model performance between the test and
validation data. This could be due to overfitting on the training data and to
the fact that our test set might not represent the true distribution of data and
labels.
      </p>
      <p>
        Other neural network architectures are used for text classification. Recurrent
networks have become more popular in the NLP domain and seem to outperform
CNNs. Miftahutdinov et al. successfully used an RNN at CLEF
eHealth 2017 and obtained an F-measure of 85% [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. New character-based
approaches, which allow a large reduction in the time spent on feature engineering,
also seem very promising in terms of performance [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and should be
investigated for this specific task. However, classifier performance was expected to
be lower for the less well represented classes, an issue that will be challenging
for every machine learning algorithm. Knowledge-based methods will be powerful
complements, especially for rare diseases or documents
reporting unusual situations.
      </p>
      <p>
        Various methods have been explored in this area. In 2016, Van Mulligen et al.
obtained the best results at CLEF eHealth by combining a Solr tagger with ICD-10
terminologies. The terminologies were derived from the task training
set and a manually curated ICD-10 dictionary. They achieved an F-measure of
84.8%. The contribution of mixed methods therefore makes sense. Our
dictionary-based lexical matching was too simplistic and only marginally
improved the performance of the CNN classifier, even though the gain in the less well
represented categories is interesting. Zweigenbaum et al. reported a similar combined
approach with very promising results [
        <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>We have studied CNN performance for multi-label ICD-10 classification of
death certificates. To compensate for the low performance of machine
learning methods in situations where data are poorly represented, we combined the CNN with
a dictionary-based method, which brought a small improvement in performance on the
rarest situations.
</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kanoulas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Azzopardi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spijker</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramadier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Palotti</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuccon</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Overview of the CLEF eHealth Evaluation Lab 2018</article-title>
          .
          <article-title>CLEF 2018</article-title>
          .
          <source>In: 8th Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science (LNCS)</source>
          , Springer,
          <year>September 2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In: arXiv preprint</source>
          <year>2014</year>
          . arXiv:1408.5882.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hughes</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suzumura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Medical text classification using convolutional neural networks</article-title>
          .
          <source>In: Stud Health Technol Inform</source>
          .
          <year>2017</year>
          ;
          <volume>235</volume>
          :
          <fpage>246</fpage>
          -
          <lpage>250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>LeCun</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bottou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haffner</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Gradient-based learning applied to document recognition</article-title>
          .
          <source>In: Proceedings of the IEEE</source>
          ,
          <volume>86</volume>
          (
          <issue>11</issue>
          ):
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          , November
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Miftahutdinov</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tutubalina</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>KFU at CLEF eHealth 2017 Task 1: ICD-10 Coding of English Death Certificates with Recurrent Neural Networks</article-title>
          .
          <source>In: CLEF 2017 Online Working Notes. CEUR-WS</source>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Universal Language Model Fine-tuning for Text Classification</article-title>
          . In: arXiv preprint
          <year>2018</year>
          . arXiv:1801.06146 [cs.CL]
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Multiple methods for multi-class, multi-label ICD10 coding of multi-granularity, multilingual death certificates</article-title>
          .
          <source>In : CLEF 2017 Evaluation Labs and Workshop: Online Working Notes. CEUR Workshop Proceedings</source>
          , Dublin, Ireland,
          <year>September 2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Hybrid methods for ICD-10 coding of death certificates</article-title>
          .
          <source>In: Seventh International Workshop on Health Text Mining and Information Analysis</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>105</lpage>
          , Austin, Texas, USA,
          <year>November 2016</year>
          .
          <source>EMNLP</source>
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Névéol</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Robert</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grippo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lavergne</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morgand</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Orsi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelikán</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramadier</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rey</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zweigenbaum</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
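          :
          <article-title>CLEF eHealth 2018 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in French, Hungarian and Italian</article-title>
          .
          <source>In: CLEF 2018 Evaluation Labs and Workshop: Online Working Notes, CEUR-WS</source>
          ,
          <year>September 2018</year>
          .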
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>