<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Text Augmentation Techniques for Clinical Case Classification</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science, University of Exeter</institution>
          ,
          <addr-line>Exeter EX4 4QE</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Clinical coding is the transformation (or classification) of patient record information into a structured or coded format using internationally recognized class codes. Coding accuracy is an ongoing challenge which has led to the organization of challenges and shared tasks to evaluate AI-enhanced, computer-assisted coding systems. In this paper we present our contribution to CodiEsp: Clinical Case Coding Task (CLEF eHealth 2020) on the automatic assignment of clinical codes (diagnoses and procedures) to clinical cases in Spanish. We approach the task as a multi-label classification problem, leverage the Multilingual BERT (M-BERT) language model to represent the clinical cases, and design various deep learning architectures based on a Convolutional Neural Network and Long Short-Term Memory Network (CNN-LSTM) classifier. To handle the class-imbalance problem, we present further models based on data augmentation techniques (i.e. word-level transformations and text generation methods) for synthesizing labeled data. Models based on data augmentation pipelines obtain the best results, measured by the F1-score, in comparison to the other proposed models for both tasks. The pipeline based on word-level transformations obtains the best F1-score (0.143) for the CodiEsp-D task, while the data augmentation technique using the text generation method achieves the best F1-score (0.216) for the CodiEsp-P task.</p>
      </abstract>
      <kwd-group>
        <kwd>Medical text classification</kwd>
        <kwd>Data augmentation</kwd>
        <kwd>Text generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The International Classification of Diseases (ICD) is a health care classification system which provides standardized codes for reporting diseases and health conditions. ICD codes are widely used in Electronic Medical Records (EMR) to describe a patient's diagnosis or treatment. In current practice medical coders review a physician's clinical diagnosis (almost always recorded as free text) and then manually assign ICD codes according to coding guidelines. While the process of standardizing EMR is important for making clinical and financial decisions, manual ICD coding is expensive, time-consuming and prone to error [
        <xref ref-type="bibr" rid="ref12 ref3">12,3</xref>
        ].
Considering these constraints, automated ICD coding has become an important line of research in the Artificial Intelligence community. Traditional machine learning and deep learning techniques have been applied successfully in this context and show promising results [
        <xref ref-type="bibr" rid="ref10 ref13 ref8">13,8,16,10</xref>
        ]. However, developing an accurate computational system to support automated ICD coding is still a challenging task. The idiosyncrasies of medical language, the scarcity of hospitals using EMR and the class-imbalance problem in training datasets are among the persistent challenges [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Issues related to clinical coding have led to the organization of challenges and shared tasks aiming to evaluate automated clinical coding systems, such as the CLEF eHealth Evaluation Lab. CLEF eHealth2 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], established in 2012 as part of the Conference and Labs of the Evaluation Forum (CLEF), is a workshop offering evaluation labs (datasets, evaluation frameworks, and events) in the medical and biomedical domain on different tracks such as information extraction, information management and information retrieval in mono- and multilingual settings. During CLEF eHealth 2020 the Clinical Case Coding in Spanish Shared Task3 (CodiEsp) [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] was introduced with the aim of evaluating systems devoted to the automatic assignment of ICD codes to EMR in Spanish. This task includes three sub-tasks: (1) CodiEsp Diagnosis Coding (CodiEsp-D), which consists of automatically assigning ICD10-Clinical Modification codes to clinical cases in Spanish; (2) CodiEsp Procedure Coding (CodiEsp-P), which focuses on assigning ICD10-Procedure codes to clinical cases in Spanish; (3) CodiEsp Explainable Artificial Intelligence (CodiEsp-X), which evaluates the explainability/interpretability of the proposed systems (i.e. systems are requested to return the text spans supporting the ICD10 code assignment).
      </p>
      <p>
        This paper presents our contribution to the CLEF eHealth CodiEsp 2020 CodiEsp-D and CodiEsp-P sub-tasks. In total five models were submitted during the official evaluation, all based on a Convolutional Neural Network and Long Short-Term Memory Network (CNN-LSTM) classifier. Multilingual BERT (M-BERT) achieved the best performances in the CLEF eHealth 2019 Multilingual Information Extraction task [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], hence we proposed to leverage the M-BERT pre-trained model as a part of various deep learning architectures. Then, in order to handle the class-imbalance problem, we designed data augmentation pipelines exploring word-level transformations and a text generation method for synthesizing labeled data. To compare all the proposed systems we carried out empirical comparisons against a standard CNN architecture, used here as a baseline.
2 https://clefehealth.imag.fr/ Date of access: 18th June 2020.
3 https://temu.bsc.es/codiesp/ Date of access: 18th June 2020.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Data</title>
      <p>The Codiesp corpus4 consists of a set of 1000 clinical cases manually annotated by clinical coding professionals5. Documents were coded with clinical diagnosis and procedure codes from the Spanish official version of ICD10-Clinical Modification and ICD10-Procedure. The released corpus has around 16,504 sentences and 396,988 words, with an average of 396.2 words per clinical case. The corpus has been randomly sampled into three subsets: the training set (500 clinical cases), the development set and the test set (250 clinical cases each). Each subset provides clinical cases in plain text format stored as single files (each filename corresponds to a unique clinical case identifier) and a tab-separated file with either ICD10-Diagnóstico (equivalent to ICD10-CM) or ICD10-Procedimiento (equivalent to ICD10-PCS) code assignments according to the target task. Table 1 summarises the top-5 most frequent ICD10-Diagnóstico and ICD10-Procedimiento codes from the training and development datasets for both tasks.</p>
      <p>
        As we can observe in Table 1, the datasets provided are highly imbalanced (i.e. there is a high disparity between classes), with 15.73% and 9.87% respectively as the highest frequency rates for the Codiesp-D and Codiesp-P tasks. In total, 10,711 codes were assigned for both tasks, of which 1819 are unique in the Codiesp-D datasets and 608 in the Codiesp-P datasets. The proportion of rare classes (i.e. classes with only one observation) is also high: 1022 classes (i.e. 56.18%) in the Codiesp-D datasets and 393 (i.e. 64.64%) in the Codiesp-P datasets. These findings led us to investigate data augmentation techniques which have shown promise in scarce labeled data situations [
        <xref ref-type="bibr" rid="ref2">17,2</xref>
        ].
      </p>
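      <p>The imbalance figures above can be reproduced with a short helper; a minimal sketch (the function name and the toy labels are our own):

```python
from collections import Counter

def imbalance_stats(code_assignments):
    """Summarise class imbalance: the highest frequency rate and the
    share of rare codes (codes with a single observation)."""
    counts = Counter(code_assignments)
    total = sum(counts.values())
    top_code, top_n = counts.most_common(1)[0]
    rare = sum(1 for n in counts.values() if n == 1)
    return {
        "unique_codes": len(counts),
        "top_code": top_code,
        "top_rate": top_n / total,         # e.g. 15.73% for Codiesp-D
        "rare_share": rare / len(counts),  # e.g. 56.18% for Codiesp-D
    }
```

Applied to the concatenated training and development label files, this yields the rates reported above.</p>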
      <p>Moreover, to expand the training and development corpora, the organizers have also released several additional data resources6, including medical literature abstracts (i.e. abstracts from Lilacs and Ibecs with ICD10 codes), linguistic resources, gazetteers and a machine-translated version from English of the Codiesp corpus clinical cases.
4 Codiesp corpus available online: https://zenodo.org/record/3837305#.XvsEN5bTVhF Date of access: 30th June 2020.
5 Information about annotation guidelines: https://zenodo.org/record/3632523#.Xvw2N5bTU5m Date of access: 1st July 2020.
6 https://temu.bsc.es/codiesp/index.php/2019/09/19/resources/ Date of access: 30th June 2020.
</p>
    </sec>
    <sec id="sec-3">
      <title>System architectures</title>
      <p>
        Empirical studies conducted on the development sets for each task7 found the best performance using a CNN-LSTM classifier. Figure 1 details the architecture used and the shared parameters for both tasks. The model takes as input a time-ordered sequence of tokens (words) of arbitrary length (truncated to 396 words, which corresponds to the average number of words per document, and then padded with zero vectors) and outputs a document-level prediction. After the embedding layer, the layer corresponding to the CNN classifier (one-headed) is introduced, using a configuration of 100 parallel feature maps and a kernel size of 3. Immediately afterwards, an LSTM layer is added (set to 100 internal units). Then a dense layer of 64 nodes with ReLU is inserted. Finally an output layer with a softmax function is used. The models have been trained using the Adam optimizer, with a learning rate of 0.001 and a batch size fixed to 32 for both tasks.
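The layer stack described above can be sketched as follows (a minimal PyTorch rendition; the embedding dimension and the multi-label output head are our assumptions, as the text does not state them):

```python
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    """Sketch of the CNN-LSTM architecture of section 3:
    embedding -> one-headed CNN (100 feature maps, kernel 3)
    -> LSTM (100 units) -> dense 64 + ReLU -> output layer."""

    def __init__(self, vocab_size, num_labels, embed_dim=128, max_len=396):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.conv = nn.Conv1d(embed_dim, 100, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(100, 100, batch_first=True)
        self.dense = nn.Linear(100, 64)
        self.out = nn.Linear(64, num_labels)  # one score per ICD code

    def forward(self, token_ids):
        x = self.embedding(token_ids)                           # (batch, seq, embed)
        x = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        _, (h, _) = self.lstm(x)                                # final hidden state
        x = torch.relu(self.dense(h[-1]))
        return self.out(x)                                      # raw logits
```

For multi-label ICD coding these logits would typically go through a per-label sigmoid; a softmax, as mentioned above, applies when a single code is predicted at a time.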
All proposed methods were trained and tested using the Spanish version of the released corpora. Concerning the preprocessing steps, clinical cases were converted to lowercase and stop-words were removed. After the tokenization process, all tokens consisting only of non-alphanumeric characters and all short tokens (with &lt; 3 characters) were also deleted. In total five models were submitted to the official evaluation; we provide below a detailed description of each of them:
- CNN-LSTM: this default approach (used here as a baseline) is based on the architecture presented in section 3.
- M-BERT: this approach is based on the BERT language model [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Briefly, BERT, which is based on a transformer architecture, is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context. Several pre-trained language models
7 Evaluations (not reported here) were conducted on LSTM, BiLSTM, BiGRU, CNN and CNN-LSTM using the same architecture as presented here.
      </p>
      <p>
        (PTM) have been built from this text encoding model, which has previously been successfully applied to various biomedical NLP tasks [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. In the CLEF eHealth 2019 Multilingual Information Extraction task, models relying on BERT and its variants (BioBERT and M-BERT) obtained the best results [
        <xref ref-type="bibr" rid="ref1 ref15 ref6">1,6,15</xref>
        ]. Here, we propose to explore a Sequential Transfer Learning-based technique (STL) using M-BERT8. In an STL scenario the source and target tasks are different and training is performed in sequence. Typically, STL consists of two stages: a pre-training phase in which general representations are learned on a source task or domain, and an adaptation phase during which the learned knowledge is transferred to the target task or domain [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In the proposed models the pre-trained language representation (i.e. M-BERT) is introduced during the pre-training phase and then the CNN-LSTM architecture (c.f. section 3) is applied during the adaptation phase to fine-tune models to the target task.
- WordNet: this approach explores a traditional textual data augmentation technique consisting of a word-level transformation: synonym replacement. Introduced in [17], the application of this kind of local change was shown to improve performance on text classification tasks, especially for small training datasets. The process of synonym replacement is implemented as a preprocessing step in a data generator pipeline (this pipeline generates batches of tensor data with real-time data augmentation). For each batch, 10% of each document's words (randomly selected, excluding stopwords) are substituted by WordNet synonyms9. Finally, edited documents are used to feed models relying on a CNN-LSTM architecture. Below is an example of synonym replacement on a clinical case sample.
      </p>
      <p>
        - original: Paciente de 50 años con antecedente de litiasis renal de repetición que consultó por hematuria recidivante y sensación de malestar. El estudio citológico seriado de orinas demostró la presencia de células atípicas sospechosas de malignidad. (English: 50-year-old patient with a history of recurrent renal lithiasis who presented with recurrent haematuria and a feeling of malaise. Serial urine cytology showed the presence of atypical cells suspicious for malignancy.)
- edited: Paciente de 50 años con antecedente de litiasis nefrítico de repetición que consultó por hematuria recidivante y percepción de malestar. El estudio citológico seriado de orinas demostró la apariencia de células atípicas sospechosas de malignidad.
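The replacement step above can be sketched as follows; a toy synonym table stands in for WordNet here (the actual pipeline relies on the NLPAug library, and the pairs below are illustrative only):

```python
import random

# Illustrative stand-in for Spanish WordNet lookups (hypothetical pairs).
SYNONYMS = {
    "renal": ["nefrítico"],
    "sensación": ["percepción"],
    "presencia": ["apariencia"],
}

def synonym_replace(text, aug_p=0.1, rng=None):
    """Swap ~10% of a document's words for a synonym, mirroring the
    per-batch augmentation step of the WordNet pipeline."""
    rng = rng or random.Random(0)
    tokens = text.split()
    n_swap = max(1, int(len(tokens) * aug_p))
    candidates = [i for i, t in enumerate(tokens) if t in SYNONYMS]
    for i in rng.sample(candidates, min(n_swap, len(candidates))):
        tokens[i] = rng.choice(SYNONYMS[tokens[i]])
    return " ".join(tokens)
```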
- WordNet M-BERT: based on the two previous approaches, WordNet and M-BERT, we explore the combination of a word-level data augmentation technique and the M-BERT pre-trained language model representation. Also implemented as a preprocessing step of the data generator pipeline, synonym replacement is based on the same setup as in the original approach, i.e. 10% of each document's words are substituted. Then, as introduced in the M-BERT approach, models are trained as a part of an STL scenario.
- TEXT GEN: in this approach we propose to explore a novel data augmentation technique based on a text generation method. This strategy was recently introduced for synthesizing labeled data to improve text classification tasks. Approaches leveraging text generation have shown promise, outperforming state-of-the-art techniques for data augmentation, specifically for handling scarce data situations [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The proposed data augmentation pipeline consists of two stages: a pre-training phase in which a language model is learned from the given training sets and a generative phase during which the pre-trained language model is used to generate artificial data. In detail, the pre-trained language model is built using an n-gram modeling approach which estimates n-gram distribution probabilities learned from a given corpus. The language models are trained for both tasks using the CNN-LSTM architecture presented in section 3. During the generative phase the appropriate pre-trained language model is introduced to generate artificial data as a preprocessing step in the data generator pipeline. 30% of each document is altered for each mini-batch. Formally, each document is split into sentences, then 30% of the sentences are replaced by synthesized data. To synthesize new data, 30% of the beginning of a given sentence is used as a seed, then extended according to the average length of sentences in the corpus (set to 20 words). Below is an example of a synthesized sentence using the pre-trained language model learned from the CodiEsp-D training set.
8 BioBERT, trained from the original BERT pre-trained model and medical resources in English, cannot be applied to clinical cases in Spanish.
9 Synonym replacement is performed using the python library NLPAug.
      </p>
      <p>- original: Analytical analysis showed hydroxyvitamin lion
ponesium sodium.
- edited: Analytical analysis showed sequence made
transoperative urine outpatient image microbiological markers ap
immunohistochemical intravenous dorsolumbar remained level signs 1788
partially establishing transplantation.
</p>
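      <p>A minimal sketch of this two-stage pipeline, with a toy bigram model standing in for the n-gram language model (the seed fraction and the 20-word target length follow the description above; the corpus in the example is our own):

```python
import random
from collections import defaultdict

def train_bigram_lm(sentences):
    """Pre-training phase: collect bigram continuation counts
    from a tokenised corpus."""
    model = defaultdict(list)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev].append(nxt)
    return model

def synthesize(sentence, model, avg_len=20, seed_frac=0.3, rng=None):
    """Generative phase: keep the first 30% of a sentence as a seed,
    then extend it with the language model up to the corpus-average
    sentence length."""
    rng = rng or random.Random(0)
    tokens = sentence.split()
    seed = tokens[:max(1, int(len(tokens) * seed_frac))]
    while len(seed) < avg_len:
        choices = model.get(seed[-1])
        if not choices:
            break
        seed.append(rng.choice(choices))
    return " ".join(seed)
```

In the actual pipeline the model is estimated from the CodiEsp training sets, and the synthesized sentences replace 30% of each document's sentences per mini-batch.</p>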
    </sec>
    <sec id="sec-4">
      <title>Experimental Results on the Test sets</title>
      <p>Models were trained on a workstation with a 36-core CPU and an AMD FirePro W2100 GPU. Systems were evaluated according to the following metrics: Mean Average Precision (MAP), MAP@30, precision, recall and F1-score. For experimental purposes two versions of the F1-score metric are computed: the F1-score measure, which considers the full code for both tasks, and the F1-score CAT, which considers only the first three digits of ICD10-Clinical Modification codes (e.g. codes R20.1 and R20.9 are mapped to R20) and the first four digits of ICD10-Procedure codes (e.g. the code bw40zzz is mapped to bw40). Table 2 summarizes the results obtained for both the CodiEsp-D task and the CodiEsp-P task on the test sets. For readability purposes only the MAP, the F1-score and the F1-score CAT are reported. Due to lack of time, not all models for each task were proposed at the official evaluation. However we performed the missing evaluations using the evaluation library released by the organizers; results are presented in italic font.</p>
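      <p>The category mapping used by the F1-score CAT metric can be sketched as follows (the helper name is ours):

```python
def to_category(code, task):
    """Map a full ICD10 code to its category: the first three characters
    for ICD10-CM (CodiEsp-D, task "D") and the first four for ICD10-PCS
    (CodiEsp-P, task "P"), ignoring the dot."""
    code = code.replace(".", "").lower()
    return code[:3] if task == "D" else code[:4]
```
</p>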
      <p>As we can observe, the results obtained depend on both the tasks and the models used. For the CodiEsp-D task the WordNet model (WN) achieves the best MAP, followed closely by the model based on the pre-trained language model M-BERT. For the F1-score, the WN model also obtains the best performance against the other proposed approaches (+0.029 over the baseline). For the F1-score CAT, the baseline is first-ranked, slightly outperforming the WN model (−0.001 relative to the baseline). For the CodiEsp-P task the best MAP is obtained using the combination of M-BERT and the data augmentation technique based on synonym replacement (WN M-BERT), while the TEXT GEN model achieves the best performance for both F1-scores (+0.031 for the F1-score and +0.096 for the F1-score CAT, measured relative to the baseline). Concerning the unofficial results (in italic font), the WN model is first-ranked on the CodiEsp-P task while the TEXT GEN model is the least efficient for the CodiEsp-D task.</p>
      <p>In the overall evaluation, the use of an STL-based architecture combined with a pre-trained model has shown its efficiency, outperforming the baseline on both tasks on the majority of evaluation metrics. Concerning data augmentation techniques, despite missing evaluations, the proposed techniques produced strong results in comparison with the other models, outperforming both the baseline and the models based on M-BERT on both tasks on the majority of evaluation metrics.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper we presented our contribution to the CLEF eHealth CodiEsp 2020 CodiEsp-D and CodiEsp-P sub-tasks. In total we proposed five models during the official evaluation, in which we explored both the Multilingual BERT (M-BERT) language model and two data augmentation techniques, word-level transformation and text generation methods, for synthesizing labeled data. Models based on data augmentation pipelines achieved the best performances in comparison to the other proposed models for both tasks on the majority of evaluation metrics.
16. Shi, H., Xie, P., Hu, Z., Zhang, M., Xing, E.P.: Towards automated icd coding using deep learning. CoRR arXiv:1711.04075 (2017)
17. Wei, J., Zou, K.: Eda: Easy data augmentation techniques for boosting performance on text classification tasks. CoRR arXiv:1901.11196 (2019)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Amin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <article-title>Dunfield</article-title>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Vechkaeva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.A.</given-names>
            ,
            <surname>Wixted</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.K.</surname>
          </string-name>
          :
          <article-title>Mlt-dfki at clef ehealth 2019: Multi-label classification of icd-10 codes with bert</article-title>
          .
          <source>In: Working Notes of Conference and Labs of the Evaluation (CLEF) Forum. CEUR Workshop Proceedings</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Anaby-Tavor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carmeli</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goldbraich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kantor</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kour</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shlomov</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tepper</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zwerdling</surname>
          </string-name>
          , N.:
          <article-title>Do not have enough data? deep learning to the rescue!</article-title>
          <source>In: AAAI Conference on Artificial Intelligence</source>
          . pp.
          <volume>7383</volume>
          –
          <issue>7390</issue>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Campbell</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giadresco</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Computer-assisted clinical coding: A narrative review of the literature on its benefits, limitations, implementation and impact on clinical coding professionals</article-title>
          .
          <source>Health Information Management Journal</source>
          <volume>49</volume>
          (
          <issue>1</issue>
          ),
          <volume>5</volume>
          –
          <fpage>18</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Catling</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spithourakis</surname>
            ,
            <given-names>G.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riedel</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards automated clinical coding</article-title>
          .
          <source>International journal of medical informatics 120</source>
          ,
          <volume>50</volume>
          –
          <fpage>61</fpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>CoRR arXiv</source>
          :
          <year>1810</year>
          .
          <volume>04805</volume>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Dorendahl,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Leich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Hummel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            , Schonfelder, G.,
            <surname>Grune</surname>
          </string-name>
          ,
          <string-name>
            <surname>B.</surname>
          </string-name>
          :
          <article-title>Overview of the clef ehealth 2019 multilingual information extraction (</article-title>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goeuriot</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kelly</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miranda-Escalada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzales</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Viviani</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Overview of the clef ehealth evaluation lab 2020</article-title>
          . In: Arampatzis,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Tsikrika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Vrochidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Joho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Lioma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Eickhoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Neveol</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            , and Nicola Ferro, L.C. (eds.)
            <surname>Experimental IR Meets Multilinguality</surname>
          </string-name>
          , Multimodality, and
          <source>Interaction: Proceedings of the Eleventh International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ) . LNCS Volume number:
          <volume>12260</volume>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Kavuluru</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rios</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records</article-title>
          .
          <source>Artificial intelligence in medicine 65(2)</source>
          ,
          <volume>155</volume>
          –
          <fpage>166</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yoon</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>So</surname>
            ,
            <given-names>C.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kang</surname>
          </string-name>
          , J.:
          <article-title>Biobert: a pre-trained biomedical language representation model for biomedical text mining</article-title>
          .
          <source>Bioinformatics</source>
          <volume>36</volume>
          (
          <issue>4</issue>
          ),
          <volume>1234</volume>
          –
          <fpage>1240</fpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>ICD coding from clinical text using multi-filter residual convolutional neural network</article-title>
          .
          <source>In: AAAI Conference on Artificial Intelligence</source>
          . pp.
          <fpage>8180</fpage>
          –
          <lpage>8187</lpage>
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Miranda-Escalada</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gonzalez-Agirre</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Armengol-Estapé</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krallinger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Overview of automatic clinical coding: annotations, guidelines, and solutions for non-English clinical cases at CodiEsp track of eHealth CLEF 2020</article-title>
          . In:
          <source>Working Notes of Conference and Labs of the Evaluation Forum (CLEF)</source>
          . CEUR Workshop Proceedings (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>O'Dowd</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Coding errors in NHS cause up to £1bn worth of inaccurate payments</article-title>
          .
          <source>BMJ: British Medical Journal (Online)</source>
          <volume>341</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Perotte</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pivovarov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Natarajan</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weiskopf</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wood</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Diagnosis code assignment: models and evaluation metrics</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>21</volume>
          (
          <issue>2</issue>
          ),
          <fpage>231</fpage>
          –
          <lpage>237</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ruder</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Neural Transfer Learning for Natural Language Processing</article-title>
          .
          <source>Ph.D. thesis</source>
          , National University of Ireland, Galway (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Sänger</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weber</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kittner</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leser</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>Classifying German animal experiment summaries with multilingual BERT at CLEF eHealth 2019 Task 1</article-title>
          . In:
          <source>Working Notes of Conference and Labs of the Evaluation Forum (CLEF)</source>
          . CEUR Workshop Proceedings (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>