=Paper=
{{Paper
|id=Vol-3060/paper-6
|storemode=property
|title=Leveraging Structured Data in Predictive Process Monitoring: the Case of the ICD-9-CM in the Scenario of the Home Hospitalization Service
|pdfUrl=https://ceur-ws.org/Vol-3060/paper-6.pdf
|volume=Vol-3060
|authors=Roberto Aringhieri,Guido Boella,Enrico Brunetti,Luigi Di Caro,Chiara Di Francescomarino,Mauro Dragoni,Roger Ferrod,Chiara Ghidini,Renata Marinello,Massimiliano Ronzani,Emilio Sulis
|dblpUrl=https://dblp.org/rec/conf/aiia/AringhieriBBCFD21
}}
==Leveraging Structured Data in Predictive Process Monitoring: the Case of the ICD-9-CM in the Scenario of the Home Hospitalization Service==
<pdf width="1500px">https://ceur-ws.org/Vol-3060/paper-6.pdf</pdf>
<pre>
Leveraging structured data in Predictive Process
Monitoring: the case of the ICD-9-CM in the scenario
of the Home Hospitalization Service
Roberto Aringhieri (1), Guido Boella (1), Enrico Brunetti (1,2), Luigi Di Caro (1), Chiara Di
Francescomarino (3), Mauro Dragoni (3), Roger Ferrod (1), Chiara Ghidini (3),
Renata Marinello (2), Massimiliano Ronzani (3) and Emilio Sulis (1)
(1) University of Turin, Torino, Italy
(2) City of Health and Science, Torino, Italy
(3) Fondazione Bruno Kessler, Trento, Italy


Abstract
The large availability of hospital administrative and clinical data has encouraged the application of
Process Mining techniques to the healthcare domain. Predictive Process Monitoring techniques can be used
to learn from data about past executions and to predict the future of incomplete cases. However, some of
these data, possibly the most informative ones, are often available only as natural language text,
whereas structured information extracted from them would be more beneficial for training predictive
models. In this paper we focus on the scenario of the Home Hospitalization Service and support the team
in making decisions on the home hospitalization of a patient, by predicting whether a new patient is
likely to successfully undergo home hospitalization. We investigate whether, in this scenario, mapping
the unstructured textual diagnoses reported by the doctor in the Emergency Department into structured
information, such as the standardized ICD-9-CM disease codes, yields more accurate predictions. To this
aim, we devise an approach for mapping textual diagnoses to ICD-9-CM codes and leverage the resulting
structured information for making predictions.

Keywords
Healthcare processes, Predictive Process Monitoring, Natural Language Processing, Home Hospitalization
Service


1. Introduction
The improvement of healthcare processes and the support of clinical personnel in making
decisions can have an impact on the efficiency of healthcare services, as well as on the
quality of the work of the clinical personnel, who, by spending less time on administrative tasks,
have more time available for taking care of patients, thus improving the patients' quality of life.
Process Mining (PM) [1], which deals with the analysis of business processes based on their
behaviour as observed and recorded in event logs, can be a useful instrument in this setting. PM
supports the analysis of business process event logs in different ways [2], including process
discovery (i.e., extracting process models from an event log) [1], prediction of the future of
ongoing cases [3] and process optimization [1]. PM techniques can be leveraged for the discovery
and analysis of both clinical and administrative processes in healthcare. Their application is
further encouraged by the wide availability of administrative and clinical data in hospitals. These
data could be leveraged for discovering (and improving) processes, as well as for supporting
hospital teams in making decisions on clinical and administrative issues [4, 5]. Such data are
often collected in national standard forms and documents shared among hospitals across the country.
In Italy, for instance, one of these documents is the Hospital Discharge Form (HDF), which collects
information related to the clinical history of a patient during his/her hospitalization. The data
collected in the discharge form range from data with temporal information, related to the hospital
admission, discharge and examinations carried out during the hospitalization, to data such as the
number of days of hospitalization. Unfortunately, not all these data are structured. Some of them,
possibly the most informative ones, are unstructured textual fields, as in the case of the
patients' diagnoses reported by the doctor when the patient arrives at the Emergency Department.

AIxIA 2021 SMARTERCARE Workshop, November 29, 2021, Milan, IT
roberto.aringhieri@unito.it (R. Aringhieri); guido.boella@unito.it (G. Boella); enrico.brunetti@unito.it
(E. Brunetti); luigi.dicaro@unito.it (L. Di Caro); dfmchiara@fbk.eu (C. Di Francescomarino); dragoni@fbk.eu
(M. Dragoni); roger.ferrod@unito.it (R. Ferrod); ghidini@fbk.eu (C. Ghidini); rmarinello@cittadellasalute.to.it
(R. Marinello); mronzani@fbk.eu (M. Ronzani); emilio.sulis@unito.it (E. Sulis)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Roberto Aringhieri et al. CEUR Workshop Proceedings                                           48–60
   In this paper we investigate whether mapping unstructured data into the structured information
provided by the ICD-9-CM¹ taxonomy helps when making predictions in the scenario of the Home
Hospitalization Service. We first provide some preliminaries (Section 2) and introduce the Home
Hospitalization scenario (Section 3). In Section 4 we present the proposed approach, which aims at
(i) mapping unstructured data to ICD-9-CM codes and (ii) leveraging this structured information
when making predictions. We report on the evaluation in Section 5, discuss related work in
Section 6, and finally conclude in Section 7.


2. Background
In this section we report the background concepts useful for understanding the remainder of
the paper.

Predictive Process Monitoring. Predictive Process Monitoring (PPM) [3] is a relatively new
branch of PM that aims at predicting, at runtime and as early as possible, the future development
of an ongoing, incomplete execution of a process. The predictions that state-of-the-art approaches
provide about the future of an incomplete process execution (also known as a case) can be
classified into macro-categories [6]: numeric predictions (e.g., time or cost predictions);
categorical predictions (e.g., risk predictions or specific categorical outcome predictions, such
as the fulfillment or the violation of a certain property); and next-activity predictions (e.g.,
the sequence of future activities, possibly with their payloads).
   Together with these techniques, a few frameworks implementing and collecting them have also been
developed recently, such as Nirdizati [7]. These frameworks take as input a set of past executions
and use them to train predictive models, which are then used to provide users with predictions at
runtime. They are usually characterized by two main modules: one for the case encoding and one
for the supervised learning. Each of them can be instantiated with different techniques.

    ¹ https://www.cdc.gov/nchs/icd/icd9cm.htm

ICD-9-CM. ICD-9-CM is the ninth edition of the International Classification of Diseases. It
contains a structured standard codification of diseases and procedures that is used internationally
both in the management of public health and for statistical and epidemiological purposes.
   The ICD-9-CM assigns specific codes (and associated descriptions) to both diseases and
procedures. It is organized as a taxonomy, so that each code corresponding to a specific disease
variant (subprocedure) is classified as a disease (procedure), which, in turn, is classified as a
category of diseases (procedures), and so on. In the case of diagnoses, each code is composed of up
to five digits: the first three digits represent a high-level disease category, the fourth digit
indicates the specific disease, and the fifth digit identifies the specific variant of the disease.
In turn, the three-digit categories are grouped into numeric ranges corresponding to families of
diseases. For instance, the code 410.22, whose description is Acute myocardial infarction of
inferolateral wall, subsequent episode of care, is a leaf of the hierarchy:

 390–459: Diseases of The Circulatory System
        410–414: Ischemic Heart Diseases
             410: Acute myocardial infarction
                  410.2: Acute myocardial infarction of inferolateral wall
                       410.22: Acute myocardial infarction of inferolateral wall, subsequent episode of care

  This simple structure allows us to select, for a given diagnosis code, the level of abstraction,
i.e., an ancestor in the lower levels of the taxonomy, by truncating the last digit or the last two
digits of the ICD-9-CM code.
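The truncation rule above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the helper name `icd9_ancestor` is hypothetical, codes are assumed to be strings in the dotted form used in this paper (e.g., "410.22"), and for codes shorter than five digits we assume that trimming stops at the three-digit category.

```python
def icd9_ancestor(code: str, trim: int) -> str:
    """Return an ancestor of an ICD-9-CM diagnosis code (hypothetical helper).

    `trim` is the number of trailing digits to drop (0, 1 or 2). Codes are
    assumed to be dotted strings such as "410.22"; trimming never goes above
    the three-digit category (an assumption for codes with fewer digits).
    """
    if trim not in (0, 1, 2):
        raise ValueError("trim must be 0, 1 or 2")
    category, _, detail = code.partition(".")
    # Drop `trim` digits from the part after the dot, never below the category.
    kept = detail[: max(len(detail) - trim, 0)]
    return f"{category}.{kept}" if kept else category


if __name__ == "__main__":
    for t in (0, 1, 2):
        print(t, icd9_ancestor("410.22", t))  # 410.22, 410.2, 410
```

Clamping at the category keeps four-digit codes such as 208.9 and the default code "0" well-formed at every abstraction level.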


3. The Home Hospitalization Service Scenario
The Home Hospitalization Service (HHS) of the City of Health and Science (CHS), which has been
in operation for over 30 years, has proven to be a valid alternative to hospitalization for a variety
of acute and chronic exacerbated diseases [8], such as uncomplicated ischemic stroke, congestive
heart failure, exacerbations of chronic obstructive pulmonary disease, onco-hematological
diseases with high transfusion requirements, and dementia with behavioral disorders [9]. The HHS
relies on a multidisciplinary team. The essential criteria for taking care of an acute patient
at home are threefold: (i) clinical aspects, e.g., no need for continuous or invasive monitoring
of vital parameters or for invasive diagnostic interventions; (ii) geographical aspects (residence
in the area of competence of the HHS); (iii) social welfare (constant presence of one or more
caregivers, formal or informal). Every year, the service manages about 500 admissions of patients,
in most cases coming from the same hospital and in a small part upon direct request of the
General Practitioner (GP). At the end of the treatment period, more than 80% of patients are
discharged to the GP, 10.5% die during hospitalization and about 8% are transferred to hospital.
Over the past 8 years, the percentage of patients unable to continue care management at home has
remained constant, despite the increase in clinical complexity and care burden of the patients
taken into care. In 2018, the HHS cared for 492 patients, with a high average age (about 84 years).
The overall goal is to support the HHS team in the timely identification and notification of the
patients who can be managed through the HHS, as well as in the efficient management of the HHS
processes.

Data Description. The administrative and clinical data available so far for this case study are
related to the Emergency Department Discharge Forms (EDDF) and the Hospitalization Discharge Forms
(HDF) of about 400 CHS patients benefiting from the HHS. The EDDF contains information collected
at the Emergency Department (ED) such as: (i) date and time information related to the ED
admission, triage, discharge, and last and latest update of the anamnesis; (ii) structured
information, e.g., on the patient triage colour code; and (iii) textual notes, e.g., on the
diagnosis. The HDF, instead, contains information about the clinical history of the patient
during the hospitalization, such as: (i) date and time information related to, e.g., the hospital
admission, discharge and main intervention; (ii) structured information related to, e.g., the
patient's data (age, sex, civil status, etc.) and the number of visits; and (iii) textual
information related to, e.g., the hospitalization cause and the anamnesis.


4. Approach
In order to support the HHS team in making decisions on the home hospitalization of a patient,
the overall idea is to apply existing PPM approaches to the data related to the administrative
and clinical management of ED patients. To this aim, patient data need to be transformed into a
trace describing the history of the patient, whose features are then used to learn and provide
predictions about the home hospitalization of the patient. Most of these data are structured
bits of information, while others, equally or more informative, are collected as unstructured
text, as for instance the diagnosis informally reported by the doctor when the patient reaches
the ED. In order to apply PPM approaches and properly leverage this information when making
predictions, we devised the following pipeline:

    ∙ we preprocess data so as to generate an event log describing the patient histories
      (Section 4.1);

    ∙ we map the informal diagnosis descriptions into the standardized diagnosis codes of the
      ICD-9-CM taxonomy (Section 4.2);

    ∙ we leverage the mapped structured ICD-9-CM code, or one of its ancestors, as a structured
      feature to be used in making predictions (Section 4.3).

Figure 1: Overview of the ICD-9-CM mapping pipeline.

4.1. Data Preprocessing and Analysis
The dataset of the HDFs extracted from the hospital information systems has first been cleaned by
removing hospitalizations lasting only a few days or consisting of “routine” procedures, and then
joined with the dataset of the ED. The following steps have then been applied to the joined dataset:
   ∙ The dataset has been transformed into an event log, using the hospital discharge id number
     as trace id. For the HDF data, the date and time fields related to the hospital admission,
     the discharge and the interventions undergone by the patient during the hospitalization have
     been used as timestamps for the activities H_admission, H_discharge and the intervention
     activities (labelled with the corresponding ICD-9-CM code or with the procedure category
     they belong to in the ICD-9-CM procedures), respectively. Patient personal data and other
     structured data, such as the setting of referral, have been added as case attributes.
     Similarly, for the EDDF data, the date and time fields related to the ED admission,
     discharge, triage, anamnesis and diagnostic hypothesis have been used as timestamps for the
     activities ED_admission, ED_discharge, ED_triage, ED_anamnesis and ED_diagnostic_hp,
     respectively. The diagnosis and a few other attributes have instead been used as case
     attributes. The resulting event log is composed of 413 cases with 270 different paths and
     49 different activities.
   ∙ In order to make predictions at the time of the discharge from the ED, each trace in the
     log has been truncated at the activity ED_discharge, and the attributes that cannot be known
     at the time of the ED discharge have been removed, e.g., the attribute
     H_number_of_days_in_the_facility, which is known only at the end of the hospitalization.
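The truncation step above can be illustrated with a short sketch. It assumes a hypothetical in-memory representation (a trace as a list of (activity, timestamp) pairs with ISO-8601 timestamps, plus a dictionary of case attributes); the function name and the attribute list passed to it are illustrative, not taken from the authors' pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: an event is (activity label, ISO-8601 timestamp).
Event = Tuple[str, str]


def truncate_at_ed_discharge(
    events: Iterable[Event],
    attributes: Dict[str, object],
    unknown_at_discharge: Iterable[str],
) -> Tuple[List[Event], Dict[str, object]]:
    """Cut a trace right after ED_discharge and drop attributes known only later."""
    # ISO-8601 strings sharing the same format sort chronologically.
    ordered = sorted(events, key=lambda e: e[1])
    prefix: List[Event] = []
    for activity, timestamp in ordered:
        prefix.append((activity, timestamp))
        if activity == "ED_discharge":
            break  # later events are still in the future at prediction time
    hidden = set(unknown_at_discharge)
    visible = {k: v for k, v in attributes.items() if k not in hidden}
    return prefix, visible
```

If a trace has no ED_discharge event, this sketch keeps all of its events; how such traces were handled in the paper is not stated.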
  Finally, the cases have been labeled according to whether (i) the patient has been hospitalized
at home and the home hospitalization had a positive outcome (hh, i.e., Home Hospitalization); or
(ii) the patient has been hospitalized in a different ward or the home hospitalization had a
negative outcome (no-hh). Out of the 413 cases, 368 (89%) were labeled with hh and 45 (11%) with
no-hh.

4.2. Mapping the Diagnosis field to the ICD-9-CM dictionary
In this section we briefly illustrate the Natural Language Processing (NLP) techniques applied
to short unstructured textual diagnoses in order to map them to structured ICD-9-CM diagnosis
codes. Since all the textual diagnoses we want to decode are in Italian, we refer to the Italian
translation of the ICD-9-CM descriptions². This is used to create a description/code dictionary
of diseases, after removing the codes starting with the letter ‘E’ (supplementary classification
of external causes) and ‘V’ (supplementary classification of factors influencing health status
and contact with health services). The technique is organized in three steps, which are
illustrated in Figure 1:
   ∙ Preprocessing step: the input textual diagnosis is preprocessed by removing stop-words and
     replacing acronyms with their expansions;
   ∙ Step 1: if the input diagnosis exactly matches one of the ICD-9-CM descriptions, the
     corresponding code is taken from the dictionary;
   ∙ Step 2: if there is no match in the previous step, we try to identify the ICD-9-CM
     description closest to the input diagnosis through the following procedure:
            – we stem the words³ in both the input and the ICD-9-CM diagnoses. Moreover, we delete
              some indefinite qualifiers (e.g., “non specificato”, meaning unspecified), so as to
              favor the identification of generic diagnoses over specialized ones;
            – we identify the subset of ICD-9-CM diagnoses that share the maximum number of
              stems with the input diagnosis D_input;
            – among the diagnoses of this subset, we select the diagnosis D_ICD9 with the highest
              value of the metric g(D_input, D_ICD9), defined as

              $$ g(D_1, D_2) = \frac{1}{\mathrm{len}(D_1)\,\mathrm{len}(D_2)} \sum_{stem_1 \in D_1} \sum_{stem_2 \in D_2} \mathrm{lev.ratio}(stem_1, stem_2) \qquad (1) $$

              where lev.ratio(s_1, s_2) is the Levenshtein ratio between two stems s_1, s_2 and
              len(D) counts the number of stems composing the sentence D. The numerator sums
              len(D_1) · len(D_2) ratios, each between 0 and 1, so it grows with the number of
              words in the diagnoses; the denominator normalizes it, so that g lies between 0 and 1.
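The preprocessing step, Steps 1 and 2 and Equation (1) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the stop-word list, acronym table and three-entry dictionary are hypothetical toy data, stemming is replaced by a placeholder (the paper uses NLTK's Snowball stemmer), and lev.ratio is assumed to follow the convention of the python-Levenshtein `ratio` function, (|a| + |b| − d)/(|a| + |b|), with d the edit distance in which a substitution costs 2.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy resources (a real pipeline would use full Italian lists).
STOP_WORDS = {"di", "del", "della", "e", "con", "da", "in"}
ACRONYMS = {"ima": "infarto miocardico acuto"}
UNDEFINED = ("non specificato", "non specificata", "non specificati", "non specificate")


def preprocess(text: str) -> List[str]:
    """Preprocessing step: lowercase, expand acronyms, drop stop-words."""
    expanded = " ".join(ACRONYMS.get(t, t) for t in text.lower().split())
    return [t for t in expanded.split() if t not in STOP_WORDS]


def stems(text: str) -> List[str]:
    """Step 2 tokens: remove indefinite qualifiers, then preprocess.

    Placeholder: no real stemming is done here.
    """
    text = text.lower()
    for phrase in UNDEFINED:
        text = text.replace(phrase, " ")
    return preprocess(text)


def lev_ratio(a: str, b: str) -> float:
    """(|a|+|b|-d)/(|a|+|b|), d = edit distance with substitution cost 2 (assumed)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            sub = 0 if ca == cb else 2
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + sub)
        prev = cur
    return (total - prev[-1]) / total


def g(d1: List[str], d2: List[str]) -> float:
    """Equation (1): mean lev.ratio over all pairs of stems."""
    if not d1 or not d2:
        return 0.0
    return sum(lev_ratio(s1, s2) for s1 in d1 for s2 in d2) / (len(d1) * len(d2))


def map_diagnosis(
    text: str, dictionary: Dict[str, str]
) -> Optional[Tuple[str, int, float, int]]:
    """Return (code, step, g value, shared stems) for the best description, or None."""
    # Step 1: exact match after preprocessing.
    query = preprocess(text)
    for desc, code in dictionary.items():
        if preprocess(desc) == query:
            return code, 1, 1.0, len(set(query))
    # Step 2: keep the descriptions sharing the most stems, then maximise g.
    d_in = stems(text)
    shared = {desc: len(set(d_in) & set(stems(desc))) for desc in dictionary}
    best_shared = max(shared.values(), default=0)
    if best_shared == 0:
        return None  # assumption: no stem in common means no mapping
    candidates = [desc for desc, n in shared.items() if n == best_shared]
    best = max(candidates, key=lambda desc: g(d_in, stems(desc)))
    return dictionary[best], 2, g(d_in, stems(best)), best_shared


if __name__ == "__main__":
    icd = {  # illustrative entries only
        "leucemia non specificata": "208.9",
        "altre acni": "706.1",
        "infarto miocardico acuto": "410",
    }
    print(map_diagnosis("IMA", icd))
    print(map_diagnosis("leucemia e polmonite", icd))
```

With this toy dictionary, "IMA" is resolved in Step 1 after acronym expansion, while "leucemia e polmonite" falls through to Step 2 and is mapped to 208.9, mirroring the fair assignment discussed in Section 5.1.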
Once the input diagnosis is associated with an ICD-9-CM code, we assign it a MatchingRate (MR)
from 0 to 100. This metric is meant to estimate the probability that the mapping is correct. If a
match is found in Step 1, then MR = 100; if the mapping is found in Step 2, MR is computed as
follows:

$$ MR = \min\left(50\, g(D_1, D_2)\,\bigl(1 + r(D_1, D_2)\bigr),\ 100\right) \qquad (2) $$

where g is the metric defined in (1) and r(D_1, D_2) is the number of stems shared by the
diagnoses D_1 and D_2. Whether this metric reliably reflects the quality of the mapping is
investigated in Section 5.1.
   The value of the MatchingRate is used as a filter parameter: when it is above a given
MatchingRate threshold, we use the associated ICD-9-CM code; otherwise, we assign the default
code “0”. The impact of the choice of the MatchingRate threshold on the predictions is
investigated in Section 5.2.
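Equation (2) and the threshold filter can be written as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, and "above the threshold" is read as strictly greater, consistent with the "higher than TMR" wording of Table 1.

```python
def matching_rate(g_value: float, shared_stems: int, exact_match: bool) -> float:
    """MatchingRate of Equation (2); an exact (Step 1) match always scores 100."""
    if exact_match:
        return 100.0
    return min(50.0 * g_value * (1 + shared_stems), 100.0)


def code_for_prediction(code: str, mr: float, t_mr: float) -> str:
    """Keep the mapped code only if its MR exceeds the threshold, else the default "0"."""
    return code if mr > t_mr else "0"
```

For instance, a Step 2 match with g = 0.5 and two shared stems gets MR = min(50 · 0.5 · 3, 100) = 75, so its code survives a threshold of 70, while a match with MR = 60 would be replaced by "0".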
   ² https://www.salute.gov.it/portale/documentazione/p6_2_2_1.jsp?lingua=italiano&id=2251
   ³ We used the Snowball stemmer from the NLTK package: https://www.nltk.org/_modules/nltk/stem/snowball.html




4.3. Predicting the Home Hospitalization Outcome
The structured data, either extracted from the textual diagnosis fields or already stored in
structured fields, can then be provided as input to PPM algorithms that use these features to
learn a predictive model. At runtime, when the HHS team has to decide whether a new patient should
undergo home hospitalization, the predictive model, given the features of the new patient,
predicts whether she/he is likely to successfully undergo home hospitalization (hh) or whether it
is better to proceed with the hospitalization in another ward (no-hh). PPM algorithms, e.g., those
available in Nirdizati [7], a PPM tool that collects a rich set of state-of-the-art approaches
based on machine learning, can be used to train a predictive model that learns the correlations
between the features describing the patient data and the examinations she/he has undergone, on
the one hand, and the hospitalization at home or in another hospital ward, on the other.


5. Evaluation
In this section we evaluate the proposed approach. In detail, we first evaluate the mapping
of the textual fields to the ICD-9-CM disease codes (Section 5.1) and then the impact of the
mapping to ICD-9-CM codes at different levels of abstraction of the ICD-9-CM taxonomy, when
making predictions on the home hospitalization outcome (Section 5.2).

5.1. ICD-9-CM Mapping Evaluation
In this section we aim at evaluating: (i) the correctness of the ICD-9-CM assignments; and (ii)
whether the MatchingRate is a good metric to evaluate the quality of each ICD-9-CM assignment.
   In order to evaluate the correctness of the assignments, we analyzed the ICD-9-CM assign-
ments given to 490 different textual diagnoses in the dataset. We then asked a domain expert to
classify each assignment according to three categories:

   ∙ Good assignment: the assigned ICD-9-CM code correctly represents the semantics of the
     textual diagnosis, e.g., “anemia” (anemia) is mapped to code 599.0 corresponding to “altre
     e non specificate anemie” (other and unspecified anemias);

   ∙ Fair assignment: the assigned ICD-9-CM code represents the semantics of the textual
     diagnosis only partially, possibly as a superclass, e.g., “leucemia e polmonite” (leukemia
     and pneumonia) is mapped to code 208.9: “leucemia non specificata” (unspecified leukemia),
     so the information about pneumonia is lost;

   ∙ Bad assignment: the assigned ICD-9-CM code represents a diagnosis that is unrelated to the
     textual one, e.g., “acufeni” (tinnitus) is mapped to code 706.1: “altre acni” (other acne).

  Based on the classification of the domain expert, 40.6% of the assignments are good, 32.3%
are fair and 27.1% are bad. This is a reasonable result: by discarding the bad assignments, we
are able to map at least fairly about 72.9% (40.6% + 32.3%) of the textual diagnoses.
   In order to check whether the MatchingRate is a good metric to evaluate the quality of the
assignments, and thus whether it can be used to select the assignments we can trust as features
for prediction tasks, Figure 2 shows the distributions of the three categories of assignments
with respect to the MatchingRate. The plot shows that almost all the bad assignments have a low
MR value: only ten bad assignments have an MR higher than 70.


Figure 2: Number of diagnoses per MR value. The number of diagnoses is plotted on a log scale.


                           TMR       tot     good       fair      bad     undefined
                             0      100%     40.6%     32.3%     27.1%         0%
                             50     71.0%    38.1%     23.9%      9.0%        29.0%
                             70     44.9%    32.7%     10.2%      2.0%        55.1%

Table 1
Percentage of ICD-9-CM diagnosis assignments with an MR value higher than TMR, per quality category
(tot = good + fair + bad; undefined = 100% − tot).

   The metric separates bad assignments reasonably well; hence, by setting a MatchingRate
threshold (TMR), it can be used to automatically exclude the bad assignments that do not reach
the threshold. Table 1 reports, for different TMR values, the percentage of diagnosis assignments
above the threshold for each quality category.⁴
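Table 1 can be recomputed from the per-diagnosis MR values and the expert labels with a few lines of code. A minimal sketch, assuming each assignment is a hypothetical (MR, category) pair and that the undefined column counts the assignments falling below the threshold; the toy data below are not the paper's data.

```python
from typing import Dict, List, Tuple

CATEGORIES = ("good", "fair", "bad")


def threshold_row(assignments: List[Tuple[float, str]], t_mr: float) -> Dict[str, float]:
    """Percentages of assignments with MR > t_mr, per expert category (as in Table 1)."""
    n = len(assignments)
    kept = [cat for mr, cat in assignments if mr > t_mr]
    row = {cat: 100.0 * kept.count(cat) / n for cat in CATEGORIES}
    row["tot"] = 100.0 * len(kept) / n
    # Assumed meaning of "undefined": assignments below the threshold.
    row["undefined"] = 100.0 - row["tot"]
    return row


if __name__ == "__main__":
    toy = [(100, "good"), (80, "fair"), (40, "bad"), (60, "good")]
    print(threshold_row(toy, 50))
```

Raising the threshold moves assignments from the three quality columns into the undefined column, which is the trade-off Table 1 quantifies.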

     ⁴ The percentages in Table 1 refer to the number of assignments per diagnosis. Note that these
are in principle different from the number of assignments per trace in which the diagnosis appears,
since the same diagnosis may appear in more than one trace.

5.2. Home Hospitalization Outcome Prediction Evaluation
In this section we report on the accuracy of the predictions in the HHS scenario. The accuracy of
the predictions is evaluated using the Matthews correlation coefficient (MCC) [10], defined as:

$$ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (3) $$

where TP, TN, FP and FN are the numbers of true positive, true negative, false positive and false
negative predictions, respectively. The MCC ranges from −1 to 1, where a perfect prediction
scores 1, a random prediction scores 0 and a completely wrong prediction scores −1. In unbalanced
datasets like ours, where the numbers of positive and negative traces are very different (368 vs.
45), this metric is more suitable than others, such as accuracy and F-measure, for measuring the
quality of the predictions [11].
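To see why MCC is preferred on this dataset, consider a trivial classifier that labels every case as hh. The plain-Python sketch below computes Equation (3) and accuracy for it; treating hh as the positive class and returning 0 when the denominator vanishes are our assumptions, not statements from the paper.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, Equation (3).

    When a marginal sum is zero the denominator vanishes; we then return 0.0,
    a common convention (assumed here).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


if __name__ == "__main__":
    # Always predicting hh on 368 hh and 45 no-hh cases (hh = positive class).
    print(round(accuracy(368, 0, 45, 0), 3), mcc(368, 0, 45, 0))  # 0.891 0.0
```

The trivial classifier reaches about 89% accuracy simply by mirroring the class imbalance, yet its MCC is 0, i.e., no better than random.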
   In order to evaluate whether structured features, such as ICD-9-CM codes or their ancestors,
can be leveraged to obtain more accurate predictions than unstructured ones, such as textual
diagnoses, we analyzed and compared the results obtained with different sets of features:
    ∙ without the diagnosis (no_diag);
    ∙ with the textual diagnosis (text_diag);
    ∙ with the ICD-9-CM code assigned to the textual diagnosis or one of its ancestors (icd_diag(all)).
This last case is further refined into sub-cases based on two parameters: (i) the TMR (see
Table 1), for which we consider two values, i.e., 50 and 70; and (ii) the level of abstraction in the
ICD-9-CM classification, which corresponds to the number of digits (none, one or two) that we
trim from the right side of the ICD-9-CM codes (see Section 2): the more digits are trimmed, the
higher the abstraction level in the ICD-9-CM taxonomy.
   As predictive model we used a Random Forest classifier trained on the incomplete traces,
preprocessed as described in Section 4.1. Moreover, we tested the predictions assuming that only
the first five activities at the ED have been observed. For the feature encoding we used the
frequency-based encoding [12] enriched with trace attribute features. The classifier is trained on
70% of the traces; 10% of the traces is used to perform hyper-parameter optimization on the MCC
metric (3); and the classifier is finally tested on the remaining 20% of the traces. Since the
training is non-deterministic, each experiment is repeated 20 times, and the average MCC value
together with its standard deviation σ is used as reference metric.
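The encoding and the data split described above can be sketched with the Python standard library only. This is a hedged illustration, not Nirdizati's implementation: the function names are hypothetical, trace attributes are assumed to be already numeric (categorical ones would first need, e.g., one-hot encoding), and the split is a plain random shuffle, since the paper does not say whether it is stratified.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def frequency_encode(
    activities: Sequence[str],
    attributes: Dict[str, float],
    vocabulary: Sequence[str],
    attribute_names: Sequence[str],
    prefix_len: int = 5,
) -> List[float]:
    """Frequency-based encoding of a trace prefix, enriched with trace attributes.

    One feature per activity in `vocabulary` (its number of occurrences among the
    first `prefix_len` events), followed by the numeric case attributes.
    """
    counts = Counter(activities[:prefix_len])
    return [float(counts[a]) for a in vocabulary] + [
        float(attributes[name]) for name in attribute_names
    ]


def split_70_10_20(n_traces: int, seed: int) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle trace indices into train (70%), validation (10%) and test (20%) sets."""
    indices = list(range(n_traces))
    random.Random(seed).shuffle(indices)
    n_train = int(0.7 * n_traces)
    n_val = int(0.1 * n_traces)
    return (
        indices[:n_train],
        indices[n_train : n_train + n_val],
        indices[n_train + n_val :],
    )


if __name__ == "__main__":
    trace = ["ED_admission", "ED_triage", "ED_triage", "ED_anamnesis",
             "ED_diagnostic_hp", "ED_discharge"]
    vocab = ["ED_admission", "ED_triage", "ED_discharge"]
    print(frequency_encode(trace, {"age": 84}, vocab, ["age"]))
    print([len(part) for part in split_70_10_20(413, seed=0)])
```

The resulting vectors could then be passed to a Random Forest (for instance scikit-learn's `RandomForestClassifier`), tuned for MCC on the validation indices and evaluated on the test indices, repeating the procedure with 20 different seeds.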
   The results are reported in Table 2. The first and second columns of Table 2 show the
diagnosis information used for the prediction and its description. The third and fourth columns
contain, respectively, the mean and the standard deviation σ of the MCC computed over the 20
runs, while the fifth column contains the maximum MCC obtained across the 20 runs.

   The worst performance is obtained when the textual diagnosis is used as a feature (text_diag):
even no_diag performs better than text_diag. This is possibly due to the high variability of the
textual information, which acts as noise for the predictive model. The best way of taking the
diagnosis into account is via the mapped ICD-9-CM codes. To further validate this analysis, we
checked the statistical significance of these differences and found them to be significant:
                        no_diag > text_diag                     p-value = 0.006
                        icd_diag(all) > text_diag               p-value ≤ 0.001
                        icd_diag(all) > no_diag                 p-value ≤ 0.002

           Diagnosis information   Description                    avg(MCC)     σ(MCC)   max(MCC)
           no_diag                 without diagnosis                0.50         0.08        0.65
           text_diag               textual diagnosis                0.4          0.1         0.6
           icd9_diag_50-5          ICD-9-CM, TMR = 50, 5 digits     0.58         0.05        0.65
           icd9_diag_50-4          ICD-9-CM, TMR = 50, 4 digits     0.58         0.05        0.70
           icd9_diag_50-3          ICD-9-CM, TMR = 50, 3 digits     0.57         0.05        0.65
           icd9_diag_70-5          ICD-9-CM, TMR = 70, 5 digits     0.60         0.08        0.70
           icd9_diag_70-4          ICD-9-CM, TMR = 70, 4 digits     0.59         0.06        0.70
           icd9_diag_70-3          ICD-9-CM, TMR = 70, 3 digits     0.63         0.05        0.70

Table 2
Prediction accuracy (MCC over 20 runs) obtained with different diagnosis information used in the encoding.


   Moreover, by comparing the results obtained with the different choices made for the ICD-9-CM
codes, we observe that TMR = 70 performs better than TMR = 50. Also in this case, we tested the
hypothesis and found a statistically significant difference between icd9_diag_70-3 and each
ICD-9-CM configuration with TMR = 50 (p-value ≤ 0.02).
   If we instead focus on the number of digits of the ICD-9-CM code, i.e., on the level of
abstraction of the diagnoses, we observe that the choice of the level of abstraction also has an
impact on the accuracy. Also in this case, the difference is statistically significant, i.e.,
icd9_diag_70-3 > icd9_diag_70-4 with p-value ≤ 0.05. This result suggests that grouping the
diagnoses into categories is a good approximation for prediction purposes.


6. Related Work
The literature related to this work mainly pertains to two research areas: Predictive Process
Monitoring (in particular with unstructured data) and the mapping of textual fields to the ICD.
   Predictive Process Monitoring approaches can be classified based on the type of prediction
they provide: (i) numeric predictions, (ii) outcome-based predictions, and (iii) next-activity
predictions. In this work we focus on outcome-based predictions, which concern the fulfilment
of a predicate on an ongoing trace, here the outcome of the home hospitalization. Almost all
the approaches in this field rely on implicit models, such as machine learning and statistical
methods. Maggi et al. [3] report an approach that classifies the fulfilment of a predicate on
an ongoing trace by exploiting both control flow and data flow. This work has then been
extended in [13, 12, 14, 15]. Di Francescomarino et al. [13] extend the work by adding clustering
techniques on top of the previous approach, so that several classifiers are trained, each on a
smaller subset of the data. Leontjeva et al. [12] treat the execution traces as complex symbolic
sequences, while Verenich et al. [14] combine these two approaches. Teinemaa et al. [15]
exploit unstructured (textual) information contained in messages exchanged between process
instances during execution in order to improve the accuracy of the predictions. More recently, in
[16] Pegoraro et al. apply natural language processing techniques and LSTM neural networks to
integrate information from text documents written in natural language into the prediction model.
In this work we borrow from these PPM approaches the idea of extracting structured information
from textual data so as to improve the accuracy of outcome-based predictive models. To this end,
however, we leverage a mapping of textual diagnoses to ICD-9-CM disease codes.
   The mapping of free text to the ICD classification has been considered in several works. In
[17] Akshara et al. propose an automated ICD-9-CM diagnosis prediction approach that integrates
structured patient data with unstructured clinical text notes. In [18] Gangavarapu et al. present
a method for ICD-9-CM code group prediction from unstructured clinical nursing notes, using
vector space and topic modeling approaches; in [19] this approach is integrated with a fuzzy
similarity cleansing approach to merge anomalous and redundant data. In [20] machine learning
and natural language processing approaches are used to automatically map narrative text fields
to ICD-10 codes, and the performance of different classical machine learning classifiers is
compared in terms of accuracy, precision and recall. In [21] and [22] machine learning
techniques are used to derive ICD-10 codes from textual death certificates. In [23]
recurrent neural networks are used to derive ICD-10 codes from Dutch cardiology discharge
letters. Differently from all the above state-of-the-art approaches, we focus on Italian textual
data and define an approach that is able to cope with the NLP resources available for Italian.


7. Conclusions
With the purpose of improving prediction accuracy by using structured rather than unstructured
information in PPM, we have proposed an approach for mapping textual fields to an existing
dictionary, such as the ICD-9-CM classification of diagnoses. We have applied the
proposed approach to a real-life healthcare scenario related to the HHS, and we have evaluated (i)
the quality of the mappings, and (ii) the accuracy of the predictions obtained without the diagnosis
information, with the textual diagnosis information, or with the structured information
contained in ICD-9-CM codes. Overall, the results are reasonable and confirm that using
structured rather than unstructured features, at a medium level of abstraction, improves the
accuracy of the predictions.
   We plan, as future work, to further refine the pipeline devised for mapping textual fields to
the ICD-9-CM codes, e.g., by taking into account the fact that some textual descriptions are
richer than a single ICD-9-CM code and can hence be mapped to more than one code.
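The mapping step, and the multi-code extension planned as future work, can be illustrated with a small sketch. It is not the authors' pipeline. It assumes, for illustration only, that TMR acts as a minimum similarity score on a 0-100 scale; it uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure; and `ICD9_TOY`, `similarity` and `map_diagnosis` are hypothetical names, with a three-entry toy dictionary in place of the full ICD-9-CM list.

```python
import re
from difflib import SequenceMatcher

# Hypothetical toy dictionary (English descriptions) standing in for the
# full ICD-9-CM diagnosis list.
ICD9_TOY = {
    "428.0": "Congestive heart failure, unspecified",
    "486": "Pneumonia, organism unspecified",
    "496": "Chronic airway obstruction, not elsewhere classified",
}


def normalize(text: str) -> str:
    """Lower-case and collapse every run of non-word characters to one space."""
    return re.sub(r"\W+", " ", text.lower()).strip()


def similarity(a: str, b: str) -> float:
    """Character-level similarity on a 0-100 scale (difflib ratio * 100)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() * 100


def map_diagnosis(text, dictionary, tmr, max_codes=1):
    """Return up to max_codes (code, score) pairs whose score is >= tmr.

    max_codes=1 gives a single-code mapping; a larger value sketches a
    mapping of one rich description to several codes.
    """
    scored = [(code, similarity(text, desc)) for code, desc in dictionary.items()]
    kept = [cs for cs in scored if cs[1] >= tmr]
    kept.sort(key=lambda cs: (-cs[1], cs[0]))  # best score first, ties by code
    return kept[:max_codes]


print(map_diagnosis("Congestive heart failure", ICD9_TOY, tmr=70))  # [('428.0', 80.0)]
```

Raising `tmr` trades coverage for precision: fewer textual diagnoses receive a code, but the codes assigned are closer to the text.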


Acknowledgments
This research has been partially carried out within the “Circular Health for Industry” project,
funded by “Compagnia di San Paolo” under the call “Intelligenza Artificiale, uomo e società”.


References
 [1] W. M. P. van der Aalst, Process Mining - Data Science in Action, Second Edition, Springer,
     2016.
 [2] W. M. P. van der Aalst et al., Process mining manifesto, in: F. Daniel, K. Barkaoui,
     S. Dustdar (Eds.), BPM Workshops, Clermont-Ferrand, France, August 29, 2011, Revised
     Selected Papers, Part I, volume 99 of Lecture Notes in Business Information Processing,
     Springer, 2011, pp. 169–194.
 [3] F. M. Maggi, C. Di Francescomarino, M. Dumas, C. Ghidini, Predictive monitoring of
     business processes, in: Advanced Information Systems Engineering - 26th International
     Conference, CAiSE 2014, volume 8484 of Lecture Notes in Computer Science, Springer, 2014,
     pp. 457–472.
 [4] I. A. Amantea, E. Sulis, G. Boella, R. Marinello, D. Bianca, E. Brunetti, M. Bo, C. Fernandez-
     Llatas, A process mining application for the analysis of hospital-at-home admissions,
     Stud Health Technol Inform 270 (2020) 522–526.
 [5] E. Sulis, P. Terna, A. Di Leva, G. Boella, A. Boccuzzi, Agent-oriented decision support
     system for business processes management with genetic algorithm optimization: an
     application in healthcare, J. Med. Syst. 44 (2020) 1–7.
 [6] C. Di Francescomarino, C. Ghidini, F. M. Maggi, F. Milani, Predictive Process Monitoring
     Methods: Which One Suits Me Best?, in: M. Weske, M. Montali, I. Weber, J. v. Brocke (Eds.),
     Business Process Management - 16th International Conference, BPM 2018, Sydney, NSW,
     Australia, September 9-14, 2018, Proceedings, volume 11080 of Lecture Notes in Computer
     Science, Springer, 2018, pp. 462–479.
 [7] W. Rizzi, L. Simonetto, C. Di Francescomarino, C. Ghidini, T. Kasekamp, F. M. Maggi,
     Nirdizati 2.0: New Features and Redesigned Backend, in: Demonstration Track at BPM
     2019, September 1-6, 2019, volume 2420 of CEUR Workshop Proceedings, CEUR-WS.org,
     2019, pp. 154–158.
 [8] E. Sulis, I. A. Amantea, G. Boella, R. Marinello, D. Bianca, E. Brunetti, M. Bo, A. Bianco,
     F. Cattel, C. Cena, et al., Monitoring patients with fragilities in the context of de-
     hospitalization services: An ambient assisted living healthcare framework for e-health
     applications, in: 23rd ISCT, IEEE, 2019, pp. 216–219.
 [9] G. Isaia, P. Bertone, G. C. Isaia, N. Ricauda, Home care for patients with chronic obstructive
     pulmonary disease, Arch Phys Med Rehabil 100 (2010) 664–5.
[10] B. Matthews, Comparison of the predicted and observed secondary structure of T4 phage
     lysozyme, Biochimica et Biophysica Acta (BBA) - Protein Structure 405 (1975) 442–451.
[11] D. Chicco, G. Jurman, The advantages of the Matthews correlation coefficient (MCC) over
     F1 score and accuracy in binary classification evaluation, BMC Genomics 21 (2020) 6.
[12] A. Leontjeva, R. Conforti, C. Di Francescomarino, M. Dumas, F. M. Maggi, Complex
     symbolic sequence encodings for predictive monitoring of business processes, in: Business
     Process Management - 13th International Conference, BPM 2015, Innsbruck, Austria,
     Proceedings, volume 9253 of Lecture Notes in Computer Science, Springer, 2015, pp. 297–
     313.
[13] C. Di Francescomarino, M. Dumas, F. M. Maggi, I. Teinemaa, Clustering-based predictive
     process monitoring, IEEE Trans. Serv. Comput. 12 (2019) 896–909.
[14] I. Verenich, M. Dumas, M. La Rosa, F. M. Maggi, C. Di Francescomarino, Complex symbolic
     sequence clustering and multiple classifiers for predictive process monitoring, in: Business
     Process Management Workshops - BPM 2015, 13th International Workshops, Innsbruck,
     Austria, volume 256 of Lecture Notes in Business Information Processing, Springer, 2015, pp.
     218–229.
[15] I. Teinemaa, M. Dumas, F. M. Maggi, C. Di Francescomarino, Predictive business process
     monitoring with structured and unstructured data, in: Business Process Management -
     14th International Conference, BPM 2016, Rio de Janeiro, Brazil. Proceedings, volume 9850
     of Lecture Notes in Computer Science, Springer, 2016, pp. 401–417.
[16] M. Pegoraro, M. S. Uysal, D. Georgi, W. M. P. van der Aalst, Text-aware predictive monitoring
     of business processes (2021).
[17] P. Akshara, S. Shidharth, K. Gokul S., K. Sowmya, Integrating structured and unstructured
     patient data for ICD-9 disease code group prediction, in: 8th ACM IKDD CODS and 26th
     COMAD, Association for Computing Machinery, 2021, p. 436.
[18] T. Gangavarapu, G. S. Krishnan, S. Kamath S., J. Jeganathan, Farsight: Long-term disease
     prediction using unstructured clinical nursing notes, IEEE Transactions on Emerging
     Topics in Computing 9 (2021) 1151–1169.
[19] T. Gangavarapu, A. Jayasimha, G. Krishnan, S. Kamath S., Predicting ICD-9 code groups
     with fuzzy similarity based supervised multi-label classification of unstructured clinical
     nursing notes, Knowledge-Based Systems 190 (2020) 105321.
[20] R. Nkolele, Mapping of narrative text fields to ICD-10 codes using natural language
     processing and machine learning, in: Proceedings of the Fourth Widening Natural
     Language Processing Workshop, Association for Computational Linguistics, Seattle, USA,
     2020, pp. 131–135.
[21] B. Koopman, G. Zuccon, A. Nguyen, A. Bergheim, N. Grayson, Automatic ICD-10
     classification of cancers from free-text death certificates, International Journal of
     Medical Informatics 84 (2015).
[22] F. Duarte, B. Martins, C. Pinto, M. Silva, A deep learning method for ICD-10 coding of
     free-text death certificates, 2017, pp. 137–149.
[23] A. Bagheri, A. Sammani, P. G. Heijden, F. Asselbergs, D. Oberski, Automatic ICD-10
     classification of diseases from Dutch discharge letters, 2020, pp. 281–289.
</pre>