=Paper=
{{Paper
|id=Vol-3821/paper3
|storemode=property
|title=Subpopulation process comparison with the help of ontological foundation: A
discussion
|pdfUrl=https://ceur-ws.org/Vol-3821/paper3.pdf
|volume=Vol-3821
|authors=Faiza Bukhsh,Priya Naguine,Jeewanie Jayasinghe
|dblpUrl=https://dblp.org/rec/conf/vmbo/BukhshNA24
}}
==Subpopulation process comparison with the help of ontological foundation: A discussion==
Subpopulation process comparison with the help of
ontological foundation: A discussion
Faiza Bukhsh*, Priya Naguine and Jeewanie Jayasinghe
University of Twente, Enschede, The Netherlands
Abstract
Process modelling and mining frameworks have demonstrated their effectiveness across diverse domains,
including healthcare. However, existing frameworks often lack explicit guidance on learning from best
practices. Consider, for instance, frozen shoulder (FS), a condition with multiple treatment options and
varying outcomes. Understanding how care paths differ among patient groups and determining the most
effective approach remains a challenge. To address this gap, our research employs the Process Mining
Project Methodology in Healthcare (PM²HC) alongside the MIMIC-IV dataset to uncover distinctions
in care paths among different age groups and genders. This experimental validation seeks to identify
optimal strategies for treating frozen shoulder through ontological concepts. The study concludes
by presenting a set of open challenges, aiming to guide future research in healthcare by integrating
ontological concepts to learn from the best care paths. While this research does not offer a singular
solution, it contributes by opening a new dimension of ontological research. Specifically, it examines
how various care paths can be compared and aligned with the help of an ontological foundation.
Keywords
subpopulation comparison, ontology, frozen shoulder, process mining
1. Introduction
Medical professionals often question whether there is a difference in the treatment procedures
followed by subgroups of patients diagnosed with the same disease [1]. In this case, a subgroup
refers to a group of patients with a common characteristic, e.g., all female patients diagnosed with
frozen shoulder. When comparing subpopulations, experts’ knowledge is essential. In addition,
ontology-based interpretation is a valuable technique for gaining better insight into a complex
domain like healthcare.
In the context of information science and knowledge representation, an ontology is a formal
and explicit specification of a shared conceptualization [2]. It provides a structured framework
for representing knowledge in a particular domain by defining the entities, their properties,
and the relationships between them. Ontologies aim to capture a common understanding
of a domain and facilitate communication and interoperability among different systems and
applications [3].
Proceedings of the 17th International Workshop on Value Modelling and Business Ontologies (VMBO 2024), February
26–27, 2024, ’s Hertogenbosch, The Netherlands
* Corresponding author.
Email: f.a.bukhsh@utwente.nl (F. Bukhsh); p.v.naguine@student.utwente.nl (P. Naguine);
j.a.jayasinghearachchige@utwente.nl (J. Jayasinghe)
ORCID: 0000-0001-5978-2754 (F. Bukhsh); 0000-0002-1225-2040 (P. Naguine); 0000-0001-8619-6523 (J. Jayasinghe)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
This research aims to identify care paths for different subpopulations and learn from best
practices. Since age and gender play a role in the development of a disease such as FS [4],
these can be chosen as the subgroups. There has not been much research on comparing these
subpopulations. Ontological concepts are rich in nature and can provide a methodological way
to compare subpopulations [5].
Therefore, the objective of this research is to find the differences and similarities between
the care paths for different subpopulations and learn from best practices with the guidance of
ontological foundations.
As an example scenario, we used the MIMIC-IV database and analysed the procedures
followed by FS patients; this database contains data on approximately 300,000 patients that
were admitted to a tertiary academic medical centre in Boston, the USA, between 2008 and 2019
[6, 7]. When applied in the healthcare domain, process mining takes data from hospital information
systems (HIS) [8]. Event logs are then created from the HIS data to show the sequence of processes
followed by patients. These event logs can then be used to find the differences in the care paths
followed by subgroups of patients with FS. Further, the differences and similarities of the care
paths were analyzed with the BPMNDiffViz tool¹ to show comparison examples.
This paper is structured as follows. Section 2 describes the state of the art.
Section 3 describes the methodology under the title "An example scenario: Frozen
shoulder exploration with the application of process mining (PM)". Section 4 discusses
subpopulation comparison through ontological foundations. Finally, Section 5 concludes the
paper by opening research directions for the reader.
2. State of the art
Process mining techniques can be used for various purposes in the healthcare domain, e.g., with
BPMN diagrams to get the graph edit distances. The work of [5] used process mining techniques
specifically for the process comparison of subgroups, with a focus on subpopulation process
comparison between patients diagnosed with different types of cancer.
The tool BPMNDiffViz can be used to find graph similarity measures. It takes as input two
BPMN diagrams and gives the minimal graph edit distance (GED) as a result. The GED can
be defined as the minimum number of steps required to transform one graph into another [9].
This tool makes use of Business Process Model and Notation (BPMN) 2.0, which is one of the
frequently used notations for process modelling [10]. [11] states that although the use of
BPMN diagrams in medicine is a recent development, it can be used to model clinical pathways
to teach and train medical staff.
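To make the notion of edit steps concrete, the following minimal Python sketch counts unit-cost node and edge insertions and deletions between two small labelled graphs. It assumes that nodes are matched purely by identical label, so the count is only an upper bound on the GED; it is not the algorithm implemented in BPMNDiffViz, and the function name and toy graphs are hypothetical.

```python
def label_matched_edit_count(nodes_a, edges_a, nodes_b, edges_b):
    """Count unit-cost edits that turn graph A into graph B.

    Assumption: nodes with the same label are the same node, so no
    relabelling (substitution) is considered. The result is therefore
    an upper bound on the true graph edit distance with unit costs.
    """
    nodes_a, nodes_b = set(nodes_a), set(nodes_b)
    edges_a, edges_b = set(edges_a), set(edges_b)
    node_edits = len(nodes_a ^ nodes_b)  # nodes to delete plus nodes to insert
    edge_edits = len(edges_a ^ edges_b)  # edges to delete plus edges to insert
    return node_edits + edge_edits


# Hypothetical toy models: A has activity "B" where the other model has "C".
model_a_nodes = {"start", "A", "B", "end"}
model_a_edges = {("start", "A"), ("A", "B"), ("B", "end")}
model_b_nodes = {"start", "A", "C", "end"}
model_b_edges = {("start", "A"), ("A", "C"), ("C", "end")}

distance = label_matched_edit_count(
    model_a_nodes, model_a_edges, model_b_nodes, model_b_edges
)
print(distance)  # 2 node edits + 4 edge edits = 6
```

A search-based algorithm that also allows relabelling could find a cheaper transformation for this toy pair, which is why the count above should be read as an upper bound only.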
Visual comparison can be used to differentiate between the care paths followed by subgroups
of patients and the tool BPMNDiffViz can be used for that. BPMNDiffViz allows for a choice
between six comparison algorithms: Greedy, TabuSearch, Genetic, AStar, Ants and Simulated
Annealing. [9] compares the algorithms mentioned except Genetic and concludes that the
Greedy algorithm gives the best performance results while the TabuSearch algorithm gives
more precise and accurate results. The Genetic algorithm only gives an approximation of the
GED [12].
¹ https://pais.hse.ru/en/research/projects/CompBPMN/
Subpopulation comparison based on visual aspects or graph edit distances provides us with
an initial view. However, these comparisons need to be robust beyond the statistical figures
if they are to support decisions in real situations, especially in complex domains like
healthcare.
Ontological foundation is one of the potential approaches that can be used to ensure the
accuracy of domain structures. Ontological concepts are hierarchical domain structures that
provide a domain theory, have a syntactically and semantically rich language, and a shared and
consensual terminology [13].
The work of [14] explores ontology learning, a dynamic research field crucial for effective
ontology engineering. It distinguishes ontology-based definitions from conventional label-
centric ones, emphasizing the interconnected nature of objects. This shift allows for advanced
functionalities such as scenario search, ontology fusion, and recommendation through nuanced
relation labelling. Moreover, the work of [13] discusses the potential of ontology-based process
modelling (OBPM) to enhance business process management theoretically.
3. An example scenario: Frozen shoulder exploration with the
application of process mining (PM)
In the following section, we elaborate on a specific scenario to illustrate the
design and comparison of subpopulations. Throughout this example, we highlight the
potential role that ontological concepts could play in shaping and evaluating these
subpopulations. The methodology used in this research is the Process Mining Project
Methodology in Healthcare (PM²HC) [15]. PM²HC involves 6 phases: planning, extraction,
data processing, mining and analysis, evaluation, and improvement and support.
3.1. Planning
During this phase, we chose specific subgroups to explore various care paths and organized
the sequence of events. Additionally, we conducted thorough background research on frozen
shoulder and process mining in healthcare, as detailed in [16].
3.2. Extraction
In order to get access to and query the MIMIC-IV database, Google Cloud Platform BigQuery²
was used. Since the MIMIC-IV database stores the diagnoses given to the patients at the end
of their ICU stay using the International Classification of Diseases (ICD) Version 9 and 10
codes, the first step was to find the ICD codes associated with frozen shoulder. These were
found in the D_ICD_DIAGNOSES table [17] using the keywords frozen shoulder and adhesive
capsulitis for the long_title. The ICD codes are 7260, M750, M7500, M7501, and M7502 and
their corresponding diagnoses are "Adhesive capsulitis of shoulder", "Adhesive capsulitis of
shoulder", "Adhesive capsulitis of unspecified shoulder", "Adhesive capsulitis of right shoulder"
and "Adhesive capsulitis of the left shoulder." It is important to note that a patient may be
given more than one diagnosis associated with frozen shoulder in a single
hospitalization, e.g., a patient can be diagnosed with both M7501 and M7502.
² https://cloud.google.com/bigquery
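The keyword lookup described above can be illustrated with a minimal Python sketch over an in-memory stand-in for the D_ICD_DIAGNOSES table. The study itself queried the database through BigQuery; the list below only reuses the five codes and titles reported in the text, and the last row is a hypothetical placeholder added to show that non-matching titles are skipped.

```python
# In-memory stand-in for D_ICD_DIAGNOSES (columns: icd_code, icd_version, long_title).
d_icd_diagnoses = [
    {"icd_code": "7260", "icd_version": 9, "long_title": "Adhesive capsulitis of shoulder"},
    {"icd_code": "M750", "icd_version": 10, "long_title": "Adhesive capsulitis of shoulder"},
    {"icd_code": "M7500", "icd_version": 10, "long_title": "Adhesive capsulitis of unspecified shoulder"},
    {"icd_code": "M7501", "icd_version": 10, "long_title": "Adhesive capsulitis of right shoulder"},
    {"icd_code": "M7502", "icd_version": 10, "long_title": "Adhesive capsulitis of the left shoulder"},
    # Hypothetical placeholder row, not a real ICD entry.
    {"icd_code": "X0000", "icd_version": 10, "long_title": "Unrelated diagnosis (placeholder)"},
]

KEYWORDS = ("frozen shoulder", "adhesive capsulitis")


def find_codes(rows, keywords):
    """Return the ICD codes whose long_title contains any keyword (case-insensitive)."""
    return [
        row["icd_code"]
        for row in rows
        if any(keyword in row["long_title"].lower() for keyword in keywords)
    ]


fs_codes = find_codes(d_icd_diagnoses, KEYWORDS)
print(fs_codes)
```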
To apply process mining algorithms to the data, the cases, events, start times and end times
have to be defined. For both the subgroup process comparison and bottleneck analysis, a case is
a patient’s admission to the hospital and the events are the procedures that the patients were
billed for.
Since the start and end times were not stored for the subgroup process comparison, the
sequence number was used instead to indicate the order in which the procedures were carried
out.
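As a rough illustration of this case and event definition, the sketch below groups billed procedures by admission and orders them by sequence number to obtain one trace per case. The rows are invented toy data, `hadm_id` and `seq_num` follow the MIMIC-IV column names, and the `procedure` field and the function name are hypothetical.

```python
from collections import defaultdict

# Invented toy rows: one row per billed procedure.
rows = [
    {"hadm_id": 101, "seq_num": 2, "procedure": "Synovectomy, shoulder"},
    {"hadm_id": 101, "seq_num": 1, "procedure": "Rotator cuff repair"},
    {"hadm_id": 102, "seq_num": 1, "procedure": "Other repair of shoulder"},
]


def build_traces(rows):
    """Map each case (admission) to its procedures ordered by seq_num."""
    by_case = defaultdict(list)
    for row in rows:
        by_case[row["hadm_id"]].append((row["seq_num"], row["procedure"]))
    # Sorting the (seq_num, procedure) pairs orders events within a case.
    return {
        case: [name for _, name in sorted(events)]
        for case, events in by_case.items()
    }


traces = build_traces(rows)
for case, trace in traces.items():
    print(case, trace)
```

Because only the order is known, the resulting traces support control-flow comparison but not duration-based analysis.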
3.3. Data processing
In this phase, the CSV files for the subgroups were imported into ProM, converted into XES files
and visualised using the LogVisualiser (LogDialog) plugin. Table 1 gives an overview of the
number of cases and events per subgroup, given by the LogDialog. Also, further filtering was
required to find the differences in care paths between the different patient groups. This was
done using the Filter Log on Event Attribute Values plugin, where specific procedures were
filtered out from the care paths.
Table 1
Number of cases and events per subgroup
Subgroup                    #Cases   #Events
Female *                    29       61
Male *                      34       55
Age below 40 **             8        18
Age between 40 and 60 **    39       73
Age above 60 **             16       25
* Includes FS patients from all age groups
** Includes FS patients from both genders
3.4. Mining and analysis
This phase involved finding the differences in the care paths between the different subgroups
and the bottlenecks in the medications taken and the procedures followed by patients during
their ICU stays. To do this, process models were created in ProM³ and Disco⁴.
The Inductive Miner plugin was chosen because it gives the best fitness, i.e., the degree to
which the generated process models can recreate the cases in the event log [18]. At first, the
plugin Mine with Inductive visual Miner was used because it can create animations showing
the order in which the processes occur; it was used with the activities slider set to 1 and the
paths slider set to 0.8. These settings were chosen so that the Petri net and the Inductive visual
Miner models are equivalent. Secondly, Mine Petri net with Inductive Miner was used to
create static process models that can be used for visual comparison, with a noise threshold of
0.2 to allow for slight deviations. Lastly, in order to convert the Petri net models into BPMN
diagrams so that they can be loaded into BPMNDiffViz to get the GED, Convert Petri net to
BPMN diagram was used.
³ https://promtools.org/
⁴ https://fluxicon.com/disco/
The process models created in ProM and Disco for the subgroup process comparison and
bottleneck analysis can be found in the author’s GitHub repository⁵.
When comparing the care paths of the subgroups, three keywords are used. Firstly,
parallel is used when two procedures both occur but the order in which they occur does not
matter. Secondly, sequence is used when one procedure follows another. Lastly, exclusive is
used when only one of two procedures can occur.
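The three keywords can be approximated directly on traces, as in the following minimal sketch. It is a simplified heuristic on invented toy traces, not the semantics of the mined BPMN gateways: two procedures count as parallel if both orders are observed, as sequence if only one order is observed, and as exclusive if they never appear in the same trace. The function name and the traces are hypothetical.

```python
def relation(traces, a, b):
    """Classify the relation between activities a and b across a list of traces."""
    a_before_b = False
    b_before_a = False
    for trace in traces:
        if a in trace and b in trace:
            # Compare the first occurrence of each activity.
            if trace.index(a) < trace.index(b):
                a_before_b = True
            else:
                b_before_a = True
    if a_before_b and b_before_a:
        return "parallel"
    if a_before_b or b_before_a:
        return "sequence"
    return "exclusive"


RCR = "Rotator cuff repair"
SYN = "Synovectomy, shoulder"

# Invented toy traces, not taken from MIMIC-IV.
traces_sequence = [[RCR, SYN], [RCR]]
traces_exclusive = [[RCR], [SYN]]
traces_parallel = [[RCR, SYN], [SYN, RCR]]

print(relation(traces_sequence, RCR, SYN))
print(relation(traces_exclusive, RCR, SYN))
print(relation(traces_parallel, RCR, SYN))
```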
Also, visual comparison is performed in BPMNDiffViz using the TabuSearch algorithm with
maximum expansions and tabu list size set to 100 as this gives precise results faster than other
algorithms [9]. The activities in the BPMN diagrams are encoded with different colours: blue
denotes elements that match between the subgroups, green denotes elements that should be
added to transform one diagram into the other and red denotes elements that should be deleted
to transform one diagram into the other.
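The colour coding corresponds to a three-way split of diagram elements, which the sketch below mimics with plain set operations. It assumes that elements are matched by identical label and that shares are taken over all matched, deleted and added elements; how BPMNDiffViz itself matches elements and computes its percentages is not specified here, so the function and the toy element sets are hypothetical.

```python
def diff_summary(source_elements, target_elements):
    """Split elements into matched, deleted and added when turning source into target."""
    source, target = set(source_elements), set(target_elements)
    groups = {
        "matched": source & target,  # shown in blue
        "deleted": source - target,  # shown in red
        "added": target - source,    # shown in green
    }
    total = sum(len(group) for group in groups.values())
    # Percentage share of each group, rounded to whole numbers.
    shares = {
        name: round(100 * len(group) / total) if total else 0
        for name, group in groups.items()
    }
    return groups, shares


# Hypothetical element labels for two small diagrams.
groups, shares = diff_summary({"A", "B", "C"}, {"B", "C", "D", "E"})
print(groups)
print(shares)
```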
Visual comparison was made in BPMNDiffViz for the care paths followed by male and female
FS patients, resulting in a final score of 167 using the TabuSearch algorithm. 37% of the elements
matched between the care paths, 33% of the elements were deleted and 30% of the elements
were added. Table 2 shows the procedures that are only performed on either female or male FS
patients, but not both.
Table 2
Procedures performed on either male or female FS patients
Procedure Female Male
Drainage of Right Shoulder Joint, Percutaneous Approach, Diagnostic ✓
Excision of Left Shoulder Bursa and Ligament, Percutaneous Endoscopic Approach ✓
Excision of Right Shoulder Joint, Percutaneous Endoscopic Approach ✓
Other total shoulder replacement ✓
Release Right Shoulder Joint, External Approach ✓
Repair of recurrent dislocation of shoulder ✓
Repair Right Shoulder Joint, Percutaneous Endoscopic Approach ✓
Repair Right Shoulder Tendon, Open Approach ✓
The procedure "Other repair of shoulder" can be done in parallel with "Division of joint
capsule, ligament, or cartilage, shoulder" in male patients while in female patients, these
procedures are performed in sequence. Furthermore, it is performed in sequence with "Rotator
cuff repair" in male FS patients. However, in female patients, these processes are exclusive. This
can be seen in figures 1a and 1b.
⁵ https://github.com/PriyaNaguine/Complete-Process-Models-Frozen-Shoulder
[Figure 1: BPMNDiffViz comparison results (BPMN Comparator by PAIS Lab). (a) Snapshot of the BPMN diagram for female FS patients. (b) Snapshot of the BPMN diagram for male FS patients.]
As can be seen in figures 1a and 1b, the procedure "Synovectomy, shoulder" is always the
last process in male FS patients, in case it is performed. In female patients, it is exclusive to
"Rotator cuff repair", while in male patients, they can occur in sequence, where "Rotator cuff
repair" is the first procedure and "Synovectomy, shoulder" is the last procedure to take place.
3.5. Evaluation and improvements
In this phase, the insights obtained in the previous phase were used to suggest improvements
and to learn from care paths by considering best practices. In this phase, the stakeholders, e.g.,
medical professionals, decide on the path to be followed to implement the improvements.
This phase was conducted with an expert physiotherapist at Fysiotherapie Polman in Enschede,
The Netherlands, in order to discuss and evaluate the results of this research. The discussion
drew on his 8 years of experience working as a physiotherapist for FS and provided more
insight into FS patients. In particular, there is a ratio of approximately 7:3 between female and
male FS patients. This could be because female patients ask for help earlier on. Based on his
experience, there is no difference in the care paths followed by male and female FS patients.
He also mentioned that the age group between 40 and 60 years old is more prone to developing
FS and that this applies to both genders. Furthermore, older people, i.e., those aged above 60,
are more likely to experience FS after shoulder trauma. Patients in this age group are less likely
to undergo surgery as it is an invasive procedure. In general, depending on the health conditions
of the patient, the older they are, the more they are at risk of developing complications.
4. Subpopulation comparison through ontological foundation
Section 3 of our research shows how different care paths of subpopulations are derived using
process mining and how they are compared using the BPMNDiffViz tool. Subpopulation comparison
based on visual aspects or graph edit distances provides us with an initial view. The robustness
of these comparisons can be argued in terms of statistical figures.
However, the question is whether these statistical figures are sufficient for making decisions in
real situations, especially in complex domains like healthcare. An ontological foundation is one of
the approaches that can be used to ensure not only the structural correctness but also the
accuracy of domain knowledge in the derived models.
In our research case study, we analyzed the treatment procedures and care paths for FS within
two distinct subpopulations. The central focus of this study revolves around understanding
the variations in care paths among different patient groups and determining the most effective
approach. Naturally, the expertise of domain professionals serves as the primary and most
fitting source of knowledge for these investigations. Secondly, the ontological foundation can
be employed to determine the best care path. Surprisingly, research work is scarce on using
ontologies for comparing (assessing similarities or differences) different care paths based on
subpopulations.
To address this, our case study explores a research direction on establishing a method for
comparing subpopulations within a given knowledge domain, along with defining appropriate
evaluation criteria. These criteria encompass the ontological richness and the reliability of
methodologies in conceptualization, shareability in terms of sources and granularity, explicitness
and formality through implementation tools and formalization language, and adherence to
design criteria within the methodological process of building ontologies.
In essence, our proposal leverages the significance of the ontology definition as a foundation
for comparison features, ensuring a comprehensive evaluation that goes beyond traditional
similarity metrics.
5. Food for thought
The use of ontology in subpopulation comparison involves various dimensions. Ontologies
prove beneficial in comprehending and conceptualizing a domain. Below are key discussion points
highlighting the ways in which ontology can be utilized for comparing subpopulations.
• Conceptual Clarity: Ontologies help to define and clarify the concepts related to subpop-
ulations. By establishing a common understanding of terms, attributes, and relationships,
ontology ensures clarity in the representation of diverse sub-groups.
• Semantic Interoperability: Ontological representations facilitate semantic interoperability,
allowing for the integration of diverse data sources and the comparison of subpopula-
tions across different datasets. This is crucial for ensuring consistency and accuracy in
comparisons.
• Granular Attribute Definition: Ontologies allow for the granular definition of attributes as-
sociated with subpopulations. This includes demographic information, medical conditions,
or any relevant factors. This granularity enhances the precision of comparisons.
• Relationship Modeling: Ontologies capture relationships between entities, enabling the
modelling of complex interactions within subpopulations. This is particularly valuable
when comparing the influence of different factors on health outcomes or other relevant
criteria.
• Automated Inference: Ontologies support automated reasoning and inference, allowing
for the deduction of additional information based on the defined relationships. This
capability aids in uncovering hidden patterns or correlations within subpopulations.
• Consistent Terminology: Ontologies promote the use of consistent and standardized
terminology, reducing ambiguity in the description of subpopulations. Consistency in
terminology is crucial for accurate and meaningful comparisons.
• Facilitating Data Integration: Ontologies provide a common framework for integrating
data from diverse sources, making it easier to compare subpopulations across different
studies or datasets. This promotes a more comprehensive understanding of variations
and similarities.
• Enabling Query and Retrieval: Ontologies enhance the efficiency of querying and retriev-
ing relevant information about subpopulations. Researchers can formulate queries using
ontological terms, streamlining the comparison process.
While we acknowledge that this list may not be exhaustive, it represents our initial effort to
address the multifaceted nature of this complex research. In essence, ontology serves as a
powerful tool in subpopulation comparison by offering a structured, standardized, and semanti-
cally rich representation of entities and their relationships. This approach contributes to more
meaningful, accurate, and efficient comparisons across diverse subsets of a population.
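To illustrate how a shared conceptualization could change a comparison, the following minimal Python sketch lifts procedure labels to parent concepts before comparing two subgroups. The is-a hierarchy, the concept names and the two procedure sets are all hypothetical and are not derived from a validated clinical ontology or from the study data; the sketch only shows the idea of comparing at concept level rather than at label level.

```python
# Hypothetical is-a hierarchy: child concept -> parent concept.
IS_A = {
    "Rotator cuff repair": "Repair procedure",
    "Other repair of shoulder": "Repair procedure",
    "Synovectomy, shoulder": "Excision procedure",
    "Repair procedure": "Shoulder procedure",
    "Excision procedure": "Shoulder procedure",
}


def ancestors(concept):
    """Infer all ancestors of a concept by following is-a links transitively."""
    result = []
    while concept in IS_A:
        concept = IS_A[concept]
        result.append(concept)
    return result


def direct_concepts(procedures):
    """Replace each procedure label by its direct parent concept."""
    return {IS_A[procedure] for procedure in procedures}


# Invented procedure sets for two subgroups.
group_a = {"Rotator cuff repair", "Synovectomy, shoulder"}
group_b = {"Other repair of shoulder", "Synovectomy, shoulder"}

label_overlap = group_a & group_b
concepts_a = direct_concepts(group_a)
concepts_b = direct_concepts(group_b)

print(label_overlap)             # only one label is shared
print(concepts_a == concepts_b)  # both groups cover the same concepts
print(ancestors("Rotator cuff repair"))
```

In this toy case the two subgroups differ at label level but coincide at concept level, which is the kind of distinction a purely structural similarity score would not expose.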
Acknowledgments
We would like to thank the physiotherapist at Fysiotherapie Polman in Enschede for sharing
important insights on frozen shoulder.
References
[1] R. S. Mans, W. M. P. van der Aalst, R. J. B. Vanwersch, A. J. Moleman, Process mining
in healthcare: Data challenges when answering frequently posed questions, in: R. Lenz,
S. Miksch, M. Peleg, M. Reichert, D. Riaño, A. ten Teije (Eds.), Process Support and Knowl-
edge Representation in Health Care, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013,
pp. 140–153. doi:10.1007/978-3-642-36438-9_10.
[2] N. Guarino, D. Oberle, S. Staab, What is an ontology?, Handbook on ontologies (2009)
1–17.
[3] T. Gruber, What is an ontology, 1993.
[4] R. Koorevaar, E. Riet, M. Ipskamp, S. Bulstra, Incidence and prognostic factors for postop-
erative frozen shoulder after shoulder surgery: a prospective cohort study, Archives of
Orthopaedic and Trauma Surgery 137 (2017). doi:10.1007/s00402-016-2589-3.
[5] F. Marazza, F. A. Bukhsh, J. Geerdink, O. Vijlbrief, S. Pathak, M. v. Keulen, C. Seifert,
Automatic process comparison for subpopulations: Application in cancer care, Inter-
national Journal of Environmental Research and Public Health 17 (????). doi:10.3390/
ijerph17165707.
[6] A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus, G. Moody,
C. Peng, H. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new
research resource for complex physiologic signals, Circulation [Online] 101 (2000) e215–e220.
[7] A. Johnson, L. Bulgarelli, T. Pollard, S. Horng, L. A. Celi, R. Mark, MIMIC-IV (version 1.0),
2021. doi:10.13026/s6n6-xd98.
[8] J. Munoz-Gama, N. Martin, C. Fernandez-Llatas, O. A. Johnson, M. Sepúlveda, E. Helm,
V. Galvez-Yanjari, E. Rojas, A. Martinez-Millana, D. Aloini, I. A. Amantea, R. Andrews,
M. Arias, I. Beerepoot, E. Benevento, A. Burattin, D. Capurro, J. Carmona, M. Comuzzi,
B. Dalmas, R. de la Fuente, C. Di Francescomarino, C. Di Ciccio, R. Gatta, C. Ghidini,
F. Gonzalez-Lopez, G. Ibanez-Sanchez, H. B. Klasky, A. Prima Kurniati, X. Lu, F. Mannhardt,
R. Mans, M. Marcos, R. Medeiros de Carvalho, M. Pegoraro, S. K. Poon, L. Pufahl, H. A.
Reijers, S. Remy, S. Rinderle-Ma, L. Sacchi, F. Seoane, M. Song, A. Stefanini, E. Sulis,
A. H. ter Hofstede, P. J. Toussaint, V. Traver, Z. Valero-Ramon, I. van de Weerd, W. M.
van der Aalst, R. Vanwersch, M. Weske, M. T. Wynn, F. Zerbato, Process mining for
healthcare: Characteristics and challenges, Journal of Biomedical Informatics 127 (2022).
doi:10.1016/j.jbi.2022.103994.
[9] A. Skobtsov, A. Kalenkova, Efficient algorithms for finding differences between process
models, 2019, pp. 60–66. doi:10.1109/ISPRAS47671.2019.00015.
[10] S. Ivanov, A. Kalenkova, W. Aalst, Bpmndiffviz: A tool for bpmn models comparison 1418
(2015) 35–39.
[11] H. Scheuerlein, F. Rauchfuss, Y. Dittmar, R. Molle, T. Lehmann, N. Pienkos, U. Settmacher,
New methods for clinical pathways-business process modeling notation (bpmn) and tan-
gible business process modeling (t.bpm), Langenbeck’s archives of surgery / Deutsche
Gesellschaft für Chirurgie 397 (2012) 755–761. doi:10.1007/s00423-012-0914-z.
[12] K. Riesen, A. Fischer, H. Bunke, Improving approximate graph edit distance using genetic
algorithms, 2014, pp. 63–72. doi:10.1007/978-3-662-44415-3_7.
[13] C. Corea, M. Fellmann, P. Delfmann, Ontology-based process modelling-will we live to
see it?, in: Conceptual Modeling: 40th International Conference, ER 2021, Virtual Event,
October 18–21, 2021, Proceedings 40, Springer, 2021, pp. 36–46.
[14] M. Somodevilla García, D. Vilariño Ayala, I. Pineda, An overview of ontology learning
tasks, Computación y Sistemas 22 (2018) 137–146.
[15] G. Pereira, E. Santos, M. Maceno, Process mining project methodology in healthcare: a
case study in a tertiary hospital, Network Modeling Analysis in Health Informatics and
Bioinformatics 9 (2020). doi:10.1007/s13721-020-00227-w.
[16] P. Naguine, Subpopulation Process Comparison and Bottleneck Analysis: A Case Study of
Frozen Shoulder, B.S. thesis, University of Twente, 2022.
[17] d_icd_diagnoses, https://mimic.mit.edu/docs/iv/modules/hosp/d_icd_diagnoses/, 2020.
[Online; accessed 17. May. 2022].
[18] A. Bogarín, R. Cerezo, C. Romero, Discovering learning processes using inductive miner:
A case study with learning management systems (lmss), Psicothema 30 (2018) 322–329.
doi:10.7334/psicothema2018.116.