=Paper=
{{Paper
|id=Vol-1866/paper_92
|storemode=property
|title=Automatic Coding of Death Certificates to ICD-10 Terminology
|pdfUrl=https://ceur-ws.org/Vol-1866/paper_92.pdf
|volume=Vol-1866
|authors=Jitendra Jonnagaddala,Feiyan Hu
|dblpUrl=https://dblp.org/rec/conf/clef/JonnagaddalaH17
}}
==Automatic Coding of Death Certificates to ICD-10 Terminology==
<pdf width="1500px">https://ceur-ws.org/Vol-1866/paper_92.pdf</pdf>
<pre>
       Automatic coding of death certificates to ICD-10
                        terminology

                         Jitendra Jonnagaddala1,2, * and Feiyan Hu3
       1 School of Public Health and Community Medicine, UNSW Sydney, Australia
                 2 Prince of Wales Clinical School, UNSW Sydney, Australia

                                  z3339253@unsw.edu.au
             3 Insight Centre for Data Analytics, Dublin City University, Ireland

                                       feiyan.hu@dcu.ie


       Abstract. In this study, we present methods to automatically assign ICD-10
       codes to short plain text descriptions extracted from death certificates in
       English. We tackled the task solely using dictionary lookup, also known as
       dictionary matching or dictionary projection. The first step is to index a
       manually coded ICD-10 lexicon, followed by dictionary matching. Priority rules
       are then applied to retrieve the relevant entity or entities and their
       corresponding ICD-10 code(s) for a given free text cause of death description.
       Because the dictionary based method requires no training, we were able to
       evaluate it even on the training set. The advantages of a dictionary lookup
       method include speed and no need for training data. We present the results of
       3 different experimental settings, each of which has 2 individual runs. The
       performance is evaluated by precision, recall and F-measure. We identified
       several major issues in the corpus contributing to the low performance of our
       methods. This reiterates the fact that the quality of the lexicon plays a
       significant role in the performance of dictionary lookup based methods.

       Keywords: Death certificates coding, Cause of death coding, ICD-10 coding,
       ICD-10 code assignment, Concept normalization, String matching


1      Introduction


ICD, also known as the International List of Causes of Death, was adopted by the
International Statistical Institute in 1893 [1]. ICD includes the universe of diseases,
disorders, injuries and other related health conditions, listed in a comprehensive,
hierarchical way to facilitate storage, retrieval, analysis and exchange of information.
It is one of the most widely used international standards to report diseases and health
conditions and to identify health trends and statistics globally. Uses of ICD include
monitoring of the incidence and prevalence of diseases, observing reimbursements and
resource allocation trends, and keeping track of safety and quality guidelines. Another
important use is to report deaths as well as diseases, injuries, symptoms, reasons for
encounter, factors that influence health status, and external causes of disease.

* Corresponding author

   The World Health Organization (WHO) published the 6th revision in 1948, which is
known as ICD-6. All member states of the WHO are required to use the most current ICD
revision to report mortality and morbidity statistics. The ICD has been revised and
published in a series of editions to reflect advances in health and medical science over
time. The current ICD version is ICD-10, which was initially used in 1990. It covers
more than 20,000 codes including diagnoses and procedures, but only a subset of these
codes can be causes of death. ICD-11 is currently being drafted and, although delayed,
is expected to be released in 2017.

   Manually assigning ICD codes to a free text description is expensive and
time-consuming due to the vast coverage and size of the ICD terminology; automated
methods are therefore required to assist manual coders and public health reporting
officials [2]. Depending on the context, we can consider the process of assigning ICD
codes as a classification problem, an entity recognition problem, or an entity
recognition and normalization problem. This allows us to leverage various techniques
based on machine learning and/or natural language processing. In recent automatic image
classification tasks, deep learning based neural networks have surpassed human-level
performance [3]. It is therefore legitimate to hypothesize that the ICD code assignment
task could be completely automated in the future.

    Researchers have been investigating ICD code assignment in different types of
medical records such as pathology reports, discharge summaries and death certificates.
Recent studies proposed various methods specifically for ICD code assignment in death
certificates. Supervised learning methods using Support Vector Machines (SVM) have
recently been applied to assign ICD codes [4-6]. Unsupervised methods are used in a few
other studies [7, 8]. Methods that are not based on classification models are normally
based on dictionary lookup, also known as dictionary matching or projection. Mottin et
al. used entity relocation and entity normalization to automatically categorize text,
computing a similarity metric such as cosine similarity between feature vectors in order
to find and rank matches for the input text [7]. The feature vector is formed by a
TF-IDF weighted bag of words. Others claim that a hybrid of dictionary projection and
supervised learning can outperform either method used alone [4, 6]. Our method is based
on dictionary lookup and priority rules. We applied exact and partial string matching to
look up a manually coded ICD-10 dictionary. The result is the ICD codes corresponding to
the dictionary entries that match the query. The performance of dictionary projection
depends on the quality of the provided lexicon. The advantage of such a method is that
it is easy and cheap to compute on a large scale dataset.
2      Methods

2.1    Corpus
We used the CDC corpus, distributed as part of CLEF eHealth 2017 Task 1, to develop
our methods [9, 10]. The corpus included censored free-text descriptions of causes of
death reported by clinicians in death certificates. These free-text descriptions were
manually coded by experts using ICD-10 terminology [11]. A manually curated ICD-10
lexicon was provided with the corpus. The methods employed in the construction of this
corpus are the same as those used for the CépiDc Causes of Death French corpus [12].
The corpus comprised training and test sets.

   A sample (with modified content) ICD-10 coded death certificate from the corpus is
shown in Fig. 1. In the sample death certificate with ID 0808, there were 3 cause of
death entities with ICD-10 codes assigned at lines 1, 2 and 6 of the original full death
certificate (i.e. “pneumonia”, “atrial fibrillation”, and “CVA parkinsons disease”).
Two ICD-10 codes were assigned and ranked manually by the experts for the cause of
death statement “CVA PARKINSONS DISEASE”. The primary cause of death is coded as I48,
which stands for “Atrial fibrillation and flutter” in the ICD-10 standard terminology.
In this study, we focused only on coding all the entities observed in the death
certificate. The identification of the primary cause of death is beyond the scope of
this study. It is also important to note that the corpus didn’t include the full
original contents of the death certificates; it included only the ‘cause of death’
entities.


                       Fig. 1. Sample ICD-10 coded death certificate.
2.2     Concept coding using dictionary lookup and priority rules
The proposed methods are based on our previous work on coding PubMed articles with
MeSH terminology [13, 14]. Our methods are mainly based on dictionary lookup and
priority rules. String matching is a critical technique for dictionary lookup, and can
be either exact or partial (i.e. proximity and fuzzy matching). The dictionary lookup
approach has various advantages and can provide competitive results when used with the
right lexicon [14].

    Initially, the ICD-10 lexicon and the input free text descriptions in the corpus are
subjected to a few pre-processing steps. The pre-processing included tokenization,
lemmatization and stop word removal using the Apache Lucene* library. This is followed
by the expansion of abbreviations identified in the free text descriptions based on an
abbreviations lexicon, which was developed by the authors in a previous study [14].
Finally, dictionary matching is performed between the ICD-10 lexicon and the free text
descriptions. To identify the right code, we implemented several priority rules. The
highest priority is given to a code with an exact match, followed by a partial phrase
match and then a partial token match. In many situations, more than one code is
identified by each rule; we therefore employed another rule to keep only the top code
retrieved, i.e. the one with the highest score, provided that this score was greater
than 0.5. Similar methods have been employed in a previous study where dictionary lookup
was used in conjunction with priority rules [7]. However, in our study the priority
rules are not limited to exact matches but also cover phrase and term matches.
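
The priority rules above can be illustrated with a minimal Python sketch. It is not
the authors' Lucene-based implementation: the function names, the toy lexicon, the
regex tokenizer and the two partial-match scores (fraction of query tokens covered
for phrase matches, Jaccard overlap for token matches) are all assumptions chosen
only to show the ordering exact > phrase > token and the 0.5 score cut-off.

```python
import re

def tokens(text):
    """Lowercase and split on non-alphanumerics (a stand-in for Lucene analysis)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_score(query, term):
    """Return (priority, score) for one lexicon term, or None if nothing matches.

    Priority 0 = exact match, 1 = partial phrase match, 2 = partial token match.
    The scores are illustrative assumptions, not the scoring used in the paper.
    """
    q, t = tokens(query), tokens(term)
    if not q or not t:
        return None
    if q == t:
        return (0, 1.0)
    n = len(t)
    # Phrase match: the term's tokens occur contiguously inside the query.
    if any(q[i:i + n] == t for i in range(len(q) - n + 1)):
        return (1, n / len(q))
    # Token match: any shared token, scored by Jaccard overlap.
    shared = set(q) & set(t)
    if shared:
        return (2, len(shared) / len(set(q) | set(t)))
    return None

def assign_code(query, lexicon, threshold=0.5):
    """Pick the single best code: best priority first, then highest score."""
    best = None
    for term, code in lexicon.items():
        m = match_score(query, term)
        if m is None or m[1] <= threshold:
            continue  # keep only candidates scoring above the threshold
        key = (m[0], -m[1])
        if best is None or key < best[0]:
            best = (key, code)
    return best[1] if best else None

# Toy lexicon for illustration only (not the lexicon supplied with the corpus).
TOY_LEXICON = {
    "pneumonia": "J189",
    "atrial fibrillation": "I48",
    "parkinsons disease": "G20",
}
```

With this toy lexicon, assign_code("Pneumonia", TOY_LEXICON) resolves through the
exact-match rule, while "CVA parkinsons disease" falls through to the phrase rule.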


2.3     Experimental Setup

The training set from the corpus was used to perform initial experiments. The methods
discussed in the above section were later evaluated on the test set. Three different
experiments (Exp1, Exp2, Exp3), each with two runs (Run1, Run2), were performed on the
test set. In each experiment, Run1 used Okapi BM25 scoring and Run2 used TF-IDF scoring
to rank the retrieved ICD-10 codes [15].
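
To make the difference between the two runs concrete, the following Python sketch
scores a tokenized query against tokenized lexicon entries with textbook variants of
Okapi BM25 and TF-IDF. It is an assumption-laden illustration: the paper relied on
Lucene's scoring, whose exact formulas and normalizations differ, and the function
names and the parameter defaults k1=1.2 and b=0.75 are chosen here for illustration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[t] * (k1 + 1) / norm
        scores.append(s)
    return scores

def tfidf_scores(query, docs):
    """Plain TF-IDF score: sum of tf * ln(N / df) over matching query terms."""
    n_docs = len(docs)
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(sum(tf[t] * math.log(n_docs / df[t]) for t in query if tf[t]))
    return scores

# Toy lexicon entries, already tokenized (illustrative only).
DOCS = [["atrial", "fibrillation"], ["atrial", "flutter"], ["pneumonia"]]
QUERY = ["atrial", "fibrillation"]
```

Both functions rank the first entry highest for this query; they differ in how term
frequency saturates and in how entry length is taken into account.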

   Exp1 considered only the retrieved ICD-10 codes which met the priority rule
conditions and had the highest-ranking score. No lemmatization or stop word removal
steps were employed. Exp2 applied the same selection criteria as Exp1, but with the
lemmatization and stop word removal steps enabled. Exp3 was very similar to Exp2 except
for the addition of an abbreviation expansion component. We developed a separate lexicon
which included abbreviations with their full forms using the MEDIC vocabulary [16]. Exp1
and Exp2 were performed on the test set solely based on our initial experiments on the
training set. In other words, we didn’t access the ground truth of the test set while
performing these experiments. Exp3 was performed after an error analysis of the ICD-10
codes predicted in the previous experiments, which involved accessing the ground truth
of the test set.


* http://lucene.apache.org/core/
2.4    Evaluation metrics
The performance of the proposed methods was assessed using the standard metrics pre-
cision (P), recall (R) and F-measure (F) by identifying the true positives (TP), false
positives (FP) and false negatives (FN).


                              P = \frac{TP}{TP + FP}                                   (1)

                              R = \frac{TP}{TP + FN}                                   (2)

                              F = \frac{2 \times P \times R}{P + R}                    (3)

   By default, the metrics consider all ICD-10 codes irrespective of their type or
group in the terminology. However, the metrics were also used to evaluate performance
on the violent deaths type of ICD-10 codes (codes from V01 to Y98). This type was
evaluated separately because public health professionals are in general keen to
identify, analyze and intervene in these avoidable deaths. Only the Exp2 runs were
evaluated for the violent deaths type.
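
Equations (1) to (3) and the violent deaths filter can be sketched in a few lines of
Python. The function names are hypothetical, and the range check assumes codes are
written as a letter followed by at least two digits (e.g. W19), as in the examples
quoted in this paper.

```python
def precision_recall_f(tp, fp, fn):
    """Compute P, R and F from raw counts, following equations (1)-(3)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def is_violent_death_code(code):
    """True if the code falls in the V01 to Y98 range (assumed letter+digits form)."""
    letter, number = code[0].upper(), int(code[1:3])
    return ("V", 1) <= (letter, number) <= ("Y", 98)

# Counts reported for Exp1-Run1 in Table 2.
p, r, f = precision_recall_f(tp=8112, fp=19137, fn=10666)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
```

Applied to the Exp1-Run1 counts, the function reproduces the P, R and F values listed
for that run in Table 2.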


3      Results

The automatic methods proposed above were applied to all the death certificates in the
dataset. The distribution of the training and test sets of the corpus is summarized in
Table 1. We noticed that the performance on the test set was lower than in the initial
experiments performed on the training set.

                         Table 1. Distribution of train and test sets.

                                                                 Training set   Test set
      No. of death certificates                                  13,330         6,665
      No. of entities in all death certificates                  40,351         18,444
      No. of ICD-10 codes (excluding those without codes)        39,332         18,928
      No. of entities without ICD-10 codes                       2,252          119
      No. of unique ICD-10 codes                                 1,255          900
      No. of tokens* in all death certificates                   96,177         45,354
      Average token count per death certificate                  7.22           6.81


* Tokens are calculated using the NLTK tokenizer (http://www.nltk.org/)
   The results on the test set of the experiments described in the previous section are
presented in Table 2. In Exp2 and Exp3, the BM25 scoring based Run1 outperformed the
TF-IDF scoring based Run2. The performance results specifically for the violent deaths
type for the Exp2 runs were as follows: Exp2-Run1 achieved 0.1684 (P), 0.2619 (R) and
0.205 (F), while Exp2-Run2 achieved 0.043 (P), 0.3095 (R) and 0.0755 (F).

                        Table 2. Performance results on the test set

                                  Evaluation metrics
Setup          TP       FP        FN        P         R         F
Exp1-Run1      8112     19137     10666     0.2977    0.4320    0.3525
Exp1-Run2      7915     17870     10863     0.3070    0.4215    0.3552
Exp2-Run1      6607     9891      12171     0.4005    0.3518    0.3746
Exp2-Run2      6156     10441     12622     0.3709    0.3278    0.3480
Exp3-Run1      7090     9604      11688     0.4247    0.3776    0.3998
Exp3-Run2      6605     10081     12173     0.3958    0.3517    0.3725


4       Discussion

Our results demonstrate that the performance of the dictionary lookup based approach
for ICD-10 code assignment in death certificates is inferior to supervised and/or hybrid
methods [5, 6]. To identify the possible reasons for the large number of FN and FP, a
thorough error analysis was manually performed on a subset of the predicted ICD-10 codes
based on the Exp2 setup. Many issues were noticed, ranging from the quality of the
lexicon supplied in the corpus to shortcomings in our experimental setup. One of the
shortcomings we addressed was the lack of abbreviation expansion, which was added as
part of Exp3. We identified that the test and training sets included various
abbreviated ‘cause of death’ entities which were not addressed in Exp1 and Exp2. HTN,
CAD, COPD, CHF, CAR and CVA were some of the frequently abbreviated entities appearing
in the death certificates. Our custom abbreviations lexicon had around 350 entries and
it increased our F score from 0.3746 to 0.3998.
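
The abbreviation expansion step can be sketched as a simple table lookup applied
before dictionary matching. The Python example below is an assumption: the table holds
common clinical expansions of some of the abbreviations named above, not entries taken
from the authors' lexicon of around 350 abbreviations, and the function name is
hypothetical.

```python
import re

# Common clinical expansions, listed for illustration only.
ABBREVIATIONS = {
    "HTN": "hypertension",
    "CAD": "coronary artery disease",
    "COPD": "chronic obstructive pulmonary disease",
    "CHF": "congestive heart failure",
    "CVA": "cerebrovascular accident",
}

def expand_abbreviations(text, table=ABBREVIATIONS):
    """Replace each alphabetic token found in the table with its full form."""
    def replace(match):
        word = match.group(0)
        return table.get(word.upper(), word)  # unknown tokens are left unchanged
    return re.sub(r"[A-Za-z]+", replace, text)

print(expand_abbreviations("CVA PARKINSONS DISEASE"))
```

Expanding before lookup lets a lexicon that lists only full forms match an abbreviated
entry.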

   One of the key reasons for our low performance was the quality of the ICD-10 lexicon
supplied. We observed many issues, including inconsistent formatting and incomplete
coverage of ICD-10 codes in the lexicon. For example, we noticed that there were over
100 instances where the ICD-10 codes manually assigned by the experts were not part of
the ICD-10 lexicon. W19, W75 and B334 were a few such examples observed in the corpus.
There were also several issues with the coding performed by the experts, including
inconsistencies between the lexicon and the codes assigned manually by the experts. For
example, there are instances where the experts coded a few entities to J101 but the
corresponding code in the lexicon is J1010. Another similar issue is that the ‘cause of
death’ entities in a death certificate don’t always match the expert coded version. For
example, consider the death certificate with ID 00004. According to the file which
doesn’t include ICD-10 codes, there is only one entity (STROKE), but the expert coded
version contains two ICD-10 codes (I64 and F179).
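
Coverage problems of this kind can be detected automatically before running any lookup
experiment. The following Python sketch is a hypothetical check, not part of the
reported method: it lists expert-assigned codes that are absent from the lexicon and
flags those that are a prefix of a lexicon code, as in the J101 versus J1010 case.

```python
def missing_codes(gold_codes, lexicon_codes):
    """Expert-assigned codes that do not appear in the lexicon at all."""
    return sorted(set(gold_codes) - set(lexicon_codes))

def prefix_conflicts(gold_codes, lexicon_codes):
    """Map each missing gold code to lexicon codes that merely extend it."""
    conflicts = {}
    for code in missing_codes(gold_codes, lexicon_codes):
        longer = sorted(c for c in set(lexicon_codes) if c.startswith(code))
        if longer:
            conflicts[code] = longer
    return conflicts

# Toy inputs built from the codes quoted in the text (illustrative only).
gold = ["I64", "W19", "J101", "I64"]
lexicon = {"I64", "J1010", "F179"}
print(missing_codes(gold, lexicon))
print(prefix_conflicts(gold, lexicon))
```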

   Inconsistencies in the representation of multiple entities appearing on the same line
of the death certificate were also frequently observed throughout the corpus. “CVA
PARKINSONS DISEASE” is one such example, where Cerebrovascular accident (CVA) and
PARKINSONS DISEASE are not clearly separated. “H/O CAD AND ELEVATED B/P”, “Respiratory
Distress/arrest”, “HEMORRHAGE S/P AORTOBIFEMORAL BYPASS” and “CHF - DIASTOLIC” are
similar examples where entities are separated inconsistently with no standard guidelines
or notation. There were over 2,000 instances of such inconsistencies across the training
and test sets. We strongly believe that by enhancing the current ICD-10 lexicon, we can
further improve the performance of dictionary lookup. One enhancement worth exploring in
future is to incorporate synonyms, spelling variations and corrections (for example,
PNUEMONIA => PNEUMONIA; ATRAIL FIBRILLATION => ATRIAL FIBRILLATION) into the ICD-10
lexicon, in addition to addressing the issues discussed earlier.
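
One possible way to handle such spelling variations is to map each token to its
closest lexicon token before lookup. The Python sketch below uses the standard library
difflib module; the function name, the toy vocabulary and the 0.8 similarity cut-off
are assumptions, and this is an illustration of the proposed future enhancement rather
than something evaluated in this study.

```python
import difflib

def correct_spelling(text, vocabulary, cutoff=0.8):
    """Replace each token by its closest vocabulary token, if one is similar enough."""
    corrected = []
    for token in text.split():
        matches = difflib.get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)  # keep unknown tokens
    return " ".join(corrected)

# Toy vocabulary of lexicon tokens (illustrative only).
VOCABULARY = ["PNEUMONIA", "ATRIAL", "FIBRILLATION"]

print(correct_spelling("PNUEMONIA", VOCABULARY))
print(correct_spelling("ATRAIL FIBRILLATION", VOCABULARY))
```

A stricter cut-off reduces the risk of mapping a correctly spelled but unrelated word
onto a lexicon token, at the cost of missing some misspellings.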


5      Conclusion

In conclusion, we have described our methods to automatically code death certificates
to ICD-10 terminology. Our dictionary lookup based methods are simple and effective, and
require no training phase. However, the performance of these methods is not as good as
that of machine learning based topic modeling, learning to rank or hybrid methods. The
performance of dictionary lookup relies heavily on the quality of the lexicon used. In
addition to a high-quality lexicon, enhancements such as synonyms and spelling
variations need to be incorporated into the dictionary lookup approach for better
performance. In future, we would like to improve our results by employing learning to
rank algorithms in conjunction with an improved dictionary lookup approach.


Acknowledgements

   This study was conducted as part of the electronic Practice Based Research Network
(ePBRN) and Translational Cancer Research Network (TCRN) research programs. ePBRN is
funded in part by the School of Public Health & Community Medicine, Ingham Institute for
Applied Medical Research, UNSW Medicine and South West Sydney Local Health District.
TCRN is funded by Cancer Institute of New South Wales and Prince of Wales Clinical
School, UNSW Medicine. We would like to thank the organizers of CLEF eHealth 2017 Task 1
for providing us with the ICD-10 coded text content from death certificates. The content
of this publication is solely the responsibility of the authors and does not necessarily
reflect the official views of the funding bodies.
References

1.    WHO. [cited 2017 June]; Available from:
      http://www.who.int/classifications/icd/en/HistoryOfICD.pdf.
2.    Jonnagaddala, J., et al., Mining Electronic Health Records to Guide and Support
      Clinical Decision Support Systems, in Improving Health Management through Clinical
      Decision Support Systems. 2016, IGI Global. p. 252-269.
3.    He, K., et al. Delving deep into rectifiers: Surpassing human-level performance on
      imagenet classification. in Proceedings of the IEEE international conference on
      computer vision. 2015.
4.    Boytcheva, S. Automatic matching of ICD-10 codes to diagnoses in discharge letters.
      in Proceedings of the Workshop on Biomedical Natural Language Processing. 2011.
5.    Dermouche, M., et al. ECSTRA-INSERM@ CLEF eHealth2016-task 2: ICD10 code
      extraction from death certificates. 2016. CLEF.
6.    Zweigenbaum, P. and T. Lavergne, Hybrid methods for ICD-10 coding of death
      certificates. EMNLP 2016, 2016: p. 96.
7.    Mottin, L., et al., BiTeM at CLEF eHealth Evaluation Lab 2016 Task 2: Multilingual
      Information Extraction.
8.    Zweigenbaum, P. and T. Lavergne. LIMSI ICD10 coding experiments on CépiDC
      death certificate statements. 2016. CLEF.
9.    Goeuriot, L., et al., CLEF 2017 eHealth Evaluation Lab Overview, in CLEF 2017 - 8th
      Conference and Labs of the Evaluation Forum, Lecture Notes in Computer Science
      (LNCS). 2017, Springer.
10.   Névéol, A., et al., CLEF eHealth 2017 Multilingual Information Extraction task
      overview: ICD10 coding of death certificates in English and French, in CLEF 2017
      Evaluation Labs and Workshop: Online Working Notes. 2017, CEUR-WS.
11.   WHO, The ICD-10 Classification of Diseases, Clinical Descriptions and Diagnostic
      Guidelines. Geneva: WHO, 1992.
12.   Lavergne, T., et al., A Dataset for ICD-10 Coding of Death Certificates: Creation and
      Usage. BioTxtM 2016, 2016: p. 60.
13.   Jonnagaddala, J., et al. Recognition and normalization of disease mentions in PubMed
      abstracts. in Proceedings of the fifth BioCreative challenge evaluation workshop,
      Sevilla, Spain, September 9-11, 2015. 2015.
14.   Jonnagaddala, J., et al., Improving the dictionary lookup approach for disease
      normalization using enhanced dictionary and query expansion. Database, 2016. 2016:
      p. baw112-baw112.
15.   Manning, C.D., P. Raghavan, and H. Schütze, Introduction to information retrieval.
      Vol. 1. 2008: Cambridge University Press.
16.   Davis, A.P., et al., MEDIC: a practical disease vocabulary used at the Comparative
      Toxicogenomics Database. Database, 2012. 2012: p. bar065.

</pre>