<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Classifying clinical case studies with ICD-10 at CodiEsp CLEF eHealth 2020 Task 1 - Diagnostics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paula Queipo-Alvarez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Israel Gonzalez-Carrasco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, Universidad Carlos III de Madrid</institution>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, the authors describe their approach and results for participation in Task 1 (multilingual information extraction) of CLEF eHealth 2020. This work addresses the task of automatically assigning ICD-10 diagnosis codes to clinical case studies in Spanish and English. A dictionary-based approach has been used, relying on the terminological resource provided by the organization. The system achieved a mean average precision of 0.115 (precision: 0.866, recall: 0.066).</p>
      </abstract>
      <kwd-group>
        <kwd>ICD-10</kwd>
        <kwd>Classification</kwd>
        <kwd>Clinical case studies</kwd>
        <kwd>Named-Entity Recognition</kwd>
        <kwd>Dictionary-based</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-1">
        <title>State of the art</title>
        <p>
          CLEF eHealth has run annual evaluation campaigns since 2013 in information retrieval, information management and information extraction; the multilingual information extraction task, for example, covers named entity recognition (NER), text classification and acronym normalization. In 2018 [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the CLEF organization shared a task to promote automatic clinical coding systems over death reports in French, and in 2019 over German non-technical summaries (NTSs) of animal experiments.
        </p>
      </sec>
      <sec id="sec-1-2">
        <title>Task description</title>
        <p>This task uses the International Classification of Diseases, 10th revision (ICD-10), a terminology resource. The sub-task CodiEsp Diagnosis aims to assign ICD10-CM codes (CIE10 Diagnóstico, in Spanish) to clinical case documents [7]. This terminology is tree-shaped: annotated codes are at least 3 characters long, and codes with more characters are more granular. The organization provided a list of valid codes for this sub-task with their English and Spanish descriptions, as well as an annotated corpus. The automatic coding is evaluated against manually generated ICD-10 codifications. The motivation of the task is to determine the most competitive approaches for coding this type of document and to generate new clinical coding tools for other languages and data collections.</p>
        <p>Task 1, Multilingual Information Extraction, was built upon information extraction tasks from previous years [9, 10]. Information extraction can be treated either as cascaded named entity recognition with normalization, or as text classification. The task was proposed to explore multilingual approaches, even though the two languages (English and Spanish) were addressed individually.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Method</title>
      <p>In the following subsections, we describe the corpora, the terminology, the dictionary-based approach, and the SpaCy Language Processing Pipelines.</p>
      <p>The system must predict the codes in both languages, English and Spanish. The dictionary matches the codes with terms in English and Spanish, and a translation of the Spanish clinical case studies into English was offered. We therefore recognised entities in each language, using the dictionary in both languages.</p>
      <p>Automatic classification with ICD-10 codes is a multi-class, multi-label problem. It can also be treated as a Named-Entity Recognition (NER) task.</p>
      <p>
        The code is stored in a GitHub repository [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <sec id="sec-2-1">
        <title>Corpora and Terminologies</title>
        <p>
          The CodiEsp corpus is available in several versions; in this work, the third version is used [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The corpus is composed of 1,000 clinical case studies annotated by clinical coding professionals meeting strict quality criteria. The clinical case studies have been randomly sampled into three subsets: the train set (500 clinical cases), the development set (250) and the test set (250); an additional background set is composed of 2,751 clinical cases.
        </p>
        <p>Apart from the text files with all the clinical case studies, the annotations for the train, development and test sets are provided. Each annotation has the following fields: articleID, which refers to the clinical case study, and every ICD-10 code found in that clinical case study.</p>
        <p>The organization has provided terminological resources, such as the valid codes for the task [8]. One of the files contains a list of 71,486 CIE10-Diagnósticos terms (2018 version) with their descriptions in Spanish and English. Diagnostic codes have the following fields: code, es-description and en-description. To create the dictionary in Python, it was necessary to map the codes to the references and find the frequency of the tags for each code. We observed that the frequencies differ and the classes are not balanced.</p>
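        <p>As a minimal illustration of this step (the file contents and annotation rows below are toy stand-ins for the real CodiEsp term list and annotation files, not the actual data), the dictionary and code frequencies could be built as follows:</p>

```python
import csv
import io
from collections import Counter

# Toy stand-in for the CIE10-Diagnosticos term list:
# one row per code, with fields code, es-description, en-description.
codes_tsv = io.StringIO(
    "j20.9\tbronquitis aguda\tacute bronchitis\n"
    "e11\tdiabetes mellitus tipo 2\ttype 2 diabetes mellitus\n"
)

# Map each English description to its code.
en_dict = {}
for code, es_desc, en_desc in csv.reader(codes_tsv, delimiter="\t"):
    en_dict[en_desc] = code

# Toy stand-in for the annotations (articleID, code): counting the tags
# per code shows that the frequencies differ and classes are unbalanced.
annotations = [("case1", "j20.9"), ("case2", "j20.9"), ("case2", "e11")]
freq = Counter(code for _, code in annotations)
```

        <p>The same loop over the es-description field yields the Spanish dictionary.</p>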
        <p>Data preprocessing First of all, it is necessary to join all the clinical case studies, which come in different files for train, development and test, in order to work with them easily. In addition, alphanumeric ICD-10 terms were mapped to distinct numeric values to avoid errors in the Recognizer. Finally, the clinical case studies were tokenized with SpaCy.</p>
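        <p>The mapping of alphanumeric codes to numeric values can be sketched as follows (the code list is an illustrative example, not the full 71,486-term inventory):</p>

```python
# Toy list of alphanumeric ICD-10 codes.
codes = ["j20.9", "e11", "p96.5"]

# Assign each distinct code a stable integer label, plus the inverse
# mapping so numeric predictions can be translated back into codes.
code_to_id = {code: i for i, code in enumerate(sorted(set(codes)))}
id_to_code = {i: code for code, i in code_to_id.items()}
```
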
      </sec>
      <sec id="sec-2-2">
        <title>SpaCy Language Processing Pipelines</title>
        <p>SpaCy Language Processing Pipelines [4] have been used in several steps. First, to segment text into tokens and produce Doc objects that are processed through the pipeline. Second, to assign part-of-speech tags with the tagger. The pipeline also uses a parser to assign dependency labels, and it includes an Entity Recognizer to detect and label named entities.</p>
        <p>Figure 1 shows the structure of the SpaCy Language Processing Pipeline.</p>
        <p>In order to use the Entity Recognizer, it is necessary to add Named Entity metadata to Doc objects in SpaCy. First, we loaded the English (or Spanish) model. Then, to avoid overlapping entities, we replaced the default NER module with the dictionary in English (or Spanish).</p>
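        <p>The core of this dictionary-based recognition, stripped of SpaCy, amounts to a greedy longest-match of dictionary descriptions over the tokens, which guarantees non-overlapping entities. The sketch below is a simplified stand-in for the actual SpaCy-based component, with a whitespace tokenizer and a toy dictionary:</p>

```python
# Simplified stand-in for the dictionary-based entity recognition:
# greedy longest-match of dictionary descriptions over whitespace tokens,
# so matched entities never overlap. The real system plugs the dictionary
# into SpaCy in place of the default NER module.
def find_entities(text, dictionary):
    tokens = text.lower().split()
    max_len = max(len(desc.split()) for desc in dictionary)
    entities, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in dictionary:
                entities.append((span, dictionary[span]))
                i += n  # jump past the match: no overlapping entities
                break
        else:
            i += 1
    return entities
```
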
        <p>After that, we detected Named Entities over the test and background set,
processing the text and showing its entities.</p>
        <p>
          This procedure has been used with two different models [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]: en_core_web_sm (2.2.5) and es_core_news_sm (2.2.5) for English and Spanish, respectively. These models include convolutional layers, residual connections, layer normalization and maxout non-linearity.
        </p>
        <p>
          Fuzzy matching With the fuzzywuzzy library, it is possible to adjust the maximum (-1) and minimum (50) scores allowed for a match between terms. Finally, the predicted terms are saved and evaluated with the organization's script [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
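        <p>To illustrate the minimum-score cut-off without depending on fuzzywuzzy, the sketch below scores candidates with the standard library's SequenceMatcher, whose ratio gives a comparable 0-100 similarity; the dictionary and mentions are toy examples:</p>

```python
from difflib import SequenceMatcher

# Stand-in for fuzzywuzzy's ratio(): a 0-100 similarity score.
def fuzzy_score(a, b):
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())

# Return (code, score) of the best-scoring description at or above
# min_score (the task's cut-off of 50), or None if nothing qualifies.
def best_match(mention, dictionary, min_score=50):
    best = None
    for desc, code in dictionary.items():
        score = fuzzy_score(mention, desc)
        if score >= min_score and (best is None or score > best[1]):
            best = (code, score)
    return best
```

        <p>A slightly misspelled mention still scores above the cut-off and resolves to the same code, which is exactly the extra flexibility fuzzy matching adds over exact dictionary lookup.</p>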
      </sec>
      <sec id="sec-2-3">
        <title>Other approaches</title>
        <p>There are other approaches, such as semantic rules, transfer learning, and RNN-based or BERT-based models. Multilingual information retrieval (MLIR) is the retrieval of documents in several languages from a single query, so it would be interesting to combine English and Spanish in one model.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Results &amp; Discussion</title>
      <p>In this section, we present the results of one official run. This allows us to compare the effectiveness of the classifiers and study the differences in failure analysis.</p>
      <sec id="sec-3-1">
        <title>Experimental setup</title>
        <p>This team submitted predictions for 3,001 documents, including test and background files. However, the submission was processed to include only the predictions for test files (250 documents), which were considered for computing the metrics.</p>
        <p>
          The evaluation results came from the organization, which used the scripts distributed as part of the Clinical Cases Coding in Spanish language Track (CodiEsp) [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. All the experiments were performed without cross-validation.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Test results</title>
        <p>The official metric for the sub-task CodiEsp Diagnostic is the mean average precision (MAP). Other metrics, computed over a maximum of 1, are precision, recall and mean average precision for a query of 30 elements (MAP@30).</p>
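        <p>As a sketch of how the MAP metric works (illustrative only, not the organization's implementation), average precision rewards ranking correct codes early; MAP is the mean of this value across documents:</p>

```python
# Average precision of a ranked list of predicted codes against the gold
# set: each correct code contributes precision-at-its-rank, normalized by
# the number of gold codes. MAP averages this over all documents.
def average_precision(ranked_codes, gold_codes):
    hits, total = 0, 0.0
    for rank, code in enumerate(ranked_codes, start=1):
        if code in gold_codes:
            hits += 1
            total += hits / rank
    return total / len(gold_codes) if gold_codes else 0.0
```
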
        <p>These metrics take into account only the predictions for codes present in the train and development sets, because the test set contained codes that had not appeared in the train and validation sets.</p>
        <p>In code search over a set of clinical case studies, precision is the number of correct codes divided by the number of all returned codes, while recall is the number of correct codes divided by the number of codes that should have been returned.</p>
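        <p>These definitions reduce to set operations over predicted and gold code sets, as in this minimal sketch (the code sets are toy examples):</p>

```python
# Precision and recall over sets of predicted and gold ICD-10 codes:
# precision = correct / returned, recall = correct / expected.
def precision_recall(predicted, gold):
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```
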
        <p>Precision reaches 86.6%, whereas recall is only 6.6%. This means that 86.6% of the retrieved codes were correct, which is impressive given the difficulty of the task. Recall, in contrast, measures the fraction of the gold-standard codes that the system actually returned; such a low value means a high number of false negatives, i.e. codes not detected because of lexical variability. Our dictionary-based approach was not able to detect the majority of the codes because of a lack of flexibility in the recognition.</p>
        <p>In addition, three metrics are computed over correct categories, where a category is the first three characters of a CIE10-Diagnóstico code. For example, codes P96.5 and P96.89 both belong to the category P96.</p>
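        <p>The category relaxation is a simple truncation of each code to its first three characters, as sketched here:</p>

```python
# Category relaxation: the category of an ICD-10 code is its first three
# characters, so e.g. P96.5 and P96.89 collapse into the same category P96.
def category(code):
    return code[:3]
```
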
        <p>These results are slightly better than the official ones, due to the relaxation of codes into categories.</p>
        <p>We observe a good result in precision, showing that most of the detected entities were correct, whereas the recall results are weak. The F-measure is a good metric when the class distribution is uneven, as in our case. Low recall is the essential problem of this dictionary-matching approach: to increase it, a state-of-the-art approach would have to be used. Our approach needs to be improved given the difficulty of the task.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>This working note presents our contribution to Task 1 of the CLEF eHealth 2020 competition [7]. The task addresses the automatic assignment of ICD-10 diagnosis codes to clinical case studies.</p>
      <p>In earlier stages, multi-label classification was tried, but the model did not succeed due to the high number of codes, the class imbalance and the few examples per class. Then, NER over the clinical case studies was attempted to detect diagnoses, but the model could not predict the codes. Given the lack of results from these approaches, a more traditional dictionary-based approach was built, which was able to automatically assign codes to the test set with SpaCy's help.</p>
      <p>The evaluation results highlight that our approach was not good enough to detect all the entities present in the clinical case studies, due to the variability of the terms. This lexical variability hurts recall, since many terms are not detected.</p>
      <p>Possible improvements include semantic rules, reducing the number of irrelevant codes, a post-processing filtering phase, and including more features. Other upgrades are the treatment of abbreviations and the detection of typos. Further enhancement could come from new terminological and linguistic resources.</p>
      <p>Another direction is to include deep learning models to infer the categories that have not been seen, possibly trained on an additional corpus. Finally, it would be interesting to implement a system that combines English and Spanish into a multilingual model.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project TIN2017-87548-C2-1-R).</p>
      <p>7. Miranda-Escalada, A., Gonzalez-Agirre, A., Armengol-Estape, J., Krallinger, M.: Overview of automatic clinical coding: annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of CLEF eHealth 2020. In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings (2020)
8. Miranda-Escalada, A., Krallinger, M.: CodiEsp codes: list of valid CIE10 codes for the CodiEsp task (Jan 2020). https://doi.org/10.5281/zenodo.3706838. Funded by the Plan de Impulso de las Tecnologías del Lenguaje (Plan TL)
9. Neveol, A., Anderson, R.N., Cohen, K.B., Grouin, C., Lavergne, T., Rey, G., Robert, A., Rondet, C., Zweigenbaum, P.: CLEF eHealth 2017 Multilingual Information Extraction task overview: ICD10 coding of death certificates in English and French. Tech. rep., https://www.cdc.gov/
10. Neveol, A., Cohen, K.B., Grouin, C., Hamon, T., Lavergne, T., Kelly, L., Goeuriot, L., Rey, G., Robert, A., Tannier, X., Zweigenbaum, P.: Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016. Tech. rep., http://quaerofrenchmed.limsi.fr/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. CodiEsp, https://temu.bsc.es/codiesp/</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. GitHub - pqueipo/Codiesp-CLEF-2020-eHealth-Task1, https://github.com/pqueipo/Codiesp-CLEF-2020-eHealth-Task1</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. GitHub - TeMU-BSC/CodiEsp-Evaluation-Script: Evaluation library for CodiEsp Task, https://github.com/TeMU-BSC/CodiEsp-Evaluation-Script</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. Language Processing Pipelines, spaCy Usage Documentation, https://spacy.io/usage/processing-pipelines</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. Models spaCy Models Documentation, https://spacy.io/models</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. Miranda, A., Gonzalez-Agirre, A., Krallinger, M.: CodiEsp corpus: Spanish clinical cases coded in ICD10 (CIE10) - eHealth CLEF2020 (Apr 2020). https://doi.org/10.5281/zenodo.3758054</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>