Subpopulation process comparison with the help of ontological foundation: A discussion

Faiza Bukhsh*, Priya Naguine and Jeewanie Jayasinghe
University of Twente, Enschede, The Netherlands

Abstract
Process modelling and mining frameworks have demonstrated their effectiveness across diverse domains, including healthcare. However, existing frameworks often lack explicit guidance on learning from best practices. Consider, for instance, frozen shoulder (FS), a condition with multiple treatment options and varying outcomes. Understanding how care paths differ among patient groups and determining the most effective approach remains a challenge. To address this gap, our research employs the Process Mining Project Methodology in Healthcare (PM²HC) alongside the MIMIC-IV dataset to uncover distinctions in care paths among different age groups and genders. This experimental validation seeks to identify optimal strategies for treating frozen shoulder through ontological concepts. The study concludes by presenting a set of open challenges, aiming to guide future research in healthcare by integrating ontological concepts to learn from the best and optimal care paths. It is important to note that while this research does not offer a singular solution, it contributes by opening a new dimension of ontological research. Specifically, it examines how various care paths can be compared and aligned with the help of an ontological foundation.

Keywords
subpopulation comparison, ontology, frozen shoulder, process mining

1. Introduction

Medical professionals often question whether there is a difference in the treatment procedures followed by subgroups of patients diagnosed with the same disease [1]. Here, a subgroup refers to a group of patients with a common characteristic, e.g., all female patients diagnosed with frozen shoulder. When comparing subpopulations, experts' knowledge is essential.
Beyond expert knowledge, ontology-based interpretation is a valuable technique for gaining better insight into a complex domain like healthcare. In the context of information science and knowledge representation, an ontology is a formal and explicit specification of a shared conceptualization [2]. It provides a structured framework for representing knowledge in a particular domain by defining the entities, their properties, and the relationships between them. Ontologies aim to capture a common understanding of a domain and facilitate communication and interoperability among different systems and applications [3].

Proceedings of the 17th International Workshop on Value Modelling and Business Ontologies (VMBO 2024), February 26–27, 2024, ’s Hertogenbosch, The Netherlands
* Corresponding author.
Email: f.a.bukhsh@utwente.nl (F. Bukhsh); p.v.naguine@student.utwente.nl (P. Naguine); j.a.jayasinghearachchige@utwente.nl (J. Jayasinghe)
ORCID: 0000-0001-5978-2754 (F. Bukhsh); 0000-0002-1225-2040 (P. Naguine); 0000-0001-8619-6523 (J. Jayasinghe)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

This research aims to identify care paths for different subpopulations and to learn from best practices. Since age and gender play a role in the development of a disease such as FS [4], these can be chosen as the subgroups. There has not been much research on comparing these subpopulations. Ontological concepts are rich in nature and can provide a methodological way to compare subpopulations [5]. Therefore, the objective of this research is to find the differences and similarities between the care paths for different subpopulations and to learn from best practices with the guidance of ontological foundations.
As an example scenario, we used the MIMIC-IV database and analysed the procedures followed by FS patients; this database contains data on approximately 300,000 patients who were admitted to a tertiary academic medical centre in Boston, USA, between 2008 and 2019 [6, 7]. When applied in the healthcare domain, process mining takes its data from hospital information systems (HIS) [8]. Event logs are then created from the HIS data to show the sequence of processes followed by patients. These event logs can be used to find the differences in the care paths followed by subgroups of patients with FS. Furthermore, the differences and similarities of the care paths were analysed with the BPMNDiffViz tool¹ to show comparison examples.

This paper is structured as follows. Section 2 describes the state of the art. Section 3 describes the methodology through an example scenario: frozen shoulder exploration with the application of process mining (PM). Section 4 discusses subpopulation comparison through ontological foundations. Finally, Section 5 concludes the paper by opening research directions for the reader.

2. State of the art

Process mining techniques can be used for various purposes in the healthcare domain, e.g., with BPMN diagrams to obtain graph edit distances. The authors of [5] used process mining techniques specifically for the process comparison of subgroups, with a focus on subpopulation process comparison between patients diagnosed with different types of cancer. The tool BPMNDiffViz can be used to find graph similarity measures. It takes two BPMN diagrams as input and returns the minimal graph edit distance (GED). The GED can be defined as the minimum number of steps required to transform one graph into another [9].
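To make the GED definition concrete, the following is a minimal sketch and not the algorithm implemented in BPMNDiffViz. It assumes that every node carries a unique label, that nodes with equal labels are matched to each other, and that only unit-cost insertions and deletions are allowed (no relabelling), so it yields an upper bound on the general GED. The function name and the two toy models are hypothetical.

```python
def label_based_edit_distance(nodes_a, edges_a, nodes_b, edges_b):
    """Count the insertions and deletions needed to turn graph A into graph B.

    Assumption: nodes are matched purely by their (unique) labels, and each
    node or edge insertion/deletion costs 1. Relabelling is not modelled.
    """
    nodes_a, nodes_b = set(nodes_a), set(nodes_b)
    edges_a, edges_b = set(edges_a), set(edges_b)
    node_ops = len(nodes_a ^ nodes_b)  # nodes present in exactly one graph
    edge_ops = len(edges_a ^ edges_b)  # edges present in exactly one graph
    return node_ops + edge_ops


# Hypothetical toy models, each a simple chain of activities.
nodes_a = {"start", "Arthroscopy", "Rotator cuff repair", "end"}
edges_a = {
    ("start", "Arthroscopy"),
    ("Arthroscopy", "Rotator cuff repair"),
    ("Rotator cuff repair", "end"),
}
nodes_b = {"start", "Arthroscopy", "Synovectomy", "end"}
edges_b = {
    ("start", "Arthroscopy"),
    ("Arthroscopy", "Synovectomy"),
    ("Synovectomy", "end"),
}

distance = label_based_edit_distance(nodes_a, edges_a, nodes_b, edges_b)
print(distance)  # 2 node operations + 4 edge operations = 6
```

Fixing the node matching by label keeps the sketch short; a general GED computation has to search over possible node mappings, which is why approximate search algorithms are used in practice.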
BPMNDiffViz makes use of Business Process Model and Notation (BPMN) 2.0, a frequently used notation for process modelling [10]. According to [11], although the use of BPMN diagrams in medicine is a recent development, they can be used to model clinical pathways to teach and train medical staff. Visual comparison can be used to differentiate between the care paths followed by subgroups of patients, and BPMNDiffViz supports this. BPMNDiffViz offers a choice between six comparison algorithms: Greedy, TabuSearch, Genetic, AStar, Ants and simulated annealing. The study in [9] compares all of these algorithms except Genetic and concludes that the Greedy algorithm gives the best performance results while the TabuSearch algorithm gives more precise and accurate results. The Genetic algorithm only gives an approximation of the GED [12].

¹ https://pais.hse.ru/en/research/projects/CompBPMN/

Subpopulation comparison based on visual aspects or graph edit distances provides an initial view. However, the robustness of these comparisons should extend beyond statistical figures when decisions are made in real situations, especially in complex domains like healthcare. An ontological foundation is one potential approach for ensuring the accuracy of domain structures. Ontological concepts are hierarchical domain structures that provide a domain theory, a syntactically and semantically rich language, and a shared and consensual terminology [13]. The work of [14] explores ontology learning, a dynamic research field crucial for effective ontology engineering. It distinguishes ontology-based definitions from conventional label-centric ones, emphasizing the interconnected nature of objects. This shift allows for advanced functionalities such as scenario search, ontology fusion, and recommendation through nuanced relation labelling.
Moreover, the work of [13] discusses, from a theoretical standpoint, the potential of ontology-based process modelling (OBPM) to enhance business process management.

3. An example scenario: Frozen shoulder exploration with the application of process mining (PM)

In this section, we methodically elaborate on a specific scenario to illustrate the design and comparison of subpopulations. Throughout this example, we highlight the potential role that ontological concepts could play in shaping and evaluating these subpopulations. The methodology used in this research is the Process Mining Project Methodology in Healthcare (PM²HC) [15]. PM²HC involves six phases: planning, extraction, data processing, mining and analysis, evaluation, and improvement and support.

3.1. Planning

During this phase, we chose specific subgroups to explore various care paths and organized the sequence of events. Additionally, we conducted thorough background research on frozen shoulder and process mining in healthcare, as detailed in [16].

3.2. Extraction

To access and query the MIMIC-IV database, Google Cloud Platform BigQuery² was used. Since the MIMIC-IV database stores the diagnoses given to the patients at the end of their ICU stay using the International Classification of Diseases (ICD) Version 9 and 10 codes, the first step was to find the ICD codes associated with frozen shoulder. These were found in the D_ICD_DIAGNOSES table [17] by searching the long_title column for the keywords "frozen shoulder" and "adhesive capsulitis". The ICD codes are 7260, M750, M7500, M7501, and M7502, and their corresponding diagnoses are "Adhesive capsulitis of shoulder", "Adhesive capsulitis of shoulder", "Adhesive capsulitis of unspecified shoulder", "Adhesive capsulitis of right shoulder" and "Adhesive capsulitis of the left shoulder."

² https://cloud.google.com/bigquery
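The keyword lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the query that was run in BigQuery: the rows are a small hand-made sample whose codes and titles are copied from the text, the last row is a made-up unrelated entry, and the function name find_codes is hypothetical.

```python
# Hand-made sample shaped like (icd_code, icd_version, long_title).
# The five FS rows repeat the codes and titles listed in the text;
# the final row is a hypothetical unrelated diagnosis.
SAMPLE_ROWS = [
    ("7260", 9, "Adhesive capsulitis of shoulder"),
    ("M750", 10, "Adhesive capsulitis of shoulder"),
    ("M7500", 10, "Adhesive capsulitis of unspecified shoulder"),
    ("M7501", 10, "Adhesive capsulitis of right shoulder"),
    ("M7502", 10, "Adhesive capsulitis of the left shoulder"),
    ("X0000", 10, "Hypothetical unrelated diagnosis"),
]


def find_codes(rows, keywords):
    """Return the codes whose title contains any keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        code
        for code, _version, title in rows
        if any(keyword in title.lower() for keyword in lowered)
    ]


fs_codes = find_codes(SAMPLE_ROWS, ["frozen shoulder", "adhesive capsulitis"])
print(fs_codes)  # ['7260', 'M750', 'M7500', 'M7501', 'M7502']
```

In this sample only "adhesive capsulitis" produces matches; searching for both keywords simply guards against titles that use the colloquial name.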
It is important to note that a patient may be given more than one FS-related diagnosis in a single hospitalization, e.g., a patient can be diagnosed with both M7501 and M7502. To apply process mining algorithms to the data, the cases, events, start times and end times have to be defined. For both the subgroup process comparison and the bottleneck analysis, a case is a patient's admission to the hospital and the events are the procedures that the patient was billed for. Since start and end times were not stored for the subgroup process comparison, the sequence number was used instead to indicate the order in which the procedures were carried out.

3.3. Data processing

In this phase, the CSV files for the subgroups were imported into ProM, converted into XES files and visualised using the LogVisualiser (LogDialog) plugin. Table 1 gives an overview of the number of cases and events per subgroup, as reported by the LogDialog. Further filtering was required to find the differences in care paths between the patient groups. This was done using the Filter Log on Event Attribute Values plugin, with which specific procedures were filtered out of the care paths.

Table 1
Number of cases and events per subgroup

Subgroup                    #Cases  #Events
Female *                        29       61
Male *                          34       55
Age below 40 **                  8       18
Age between 40 and 60 **        39       73
Age above 60 **                 16       25

* Includes FS patients from all age groups
** Includes FS patients from both genders

3.4. Mining and analysis

This phase involved finding the differences in the care paths between the subgroups, as well as the bottlenecks in the medications taken and the procedures followed by patients during their ICU stays. To do this, process models were created in ProM³ and Disco⁴. The Inductive Miner plugin was chosen because it gives the best fitness, i.e., the degree to which the generated process models can reproduce the cases in the event log [18].
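The event log that is handed to the miner follows the case and event definitions given in Section 3.2. A minimal sketch of that construction is shown below, assuming each input row holds a case identifier (one hospital admission), a sequence number and a procedure title. The rows, the identifiers and the function name build_traces are hypothetical and do not come from MIMIC-IV.

```python
from collections import defaultdict

# Hypothetical rows: (case_id, seq_num, procedure title).
ROWS = [
    (101, 2, "Rotator cuff repair"),
    (101, 1, "Arthroscopy, shoulder"),
    (102, 1, "Synovectomy, shoulder"),
    (103, 1, "Arthroscopy, shoulder"),
    (103, 2, "Synovectomy, shoulder"),
]


def build_traces(rows):
    """Group events by case and order them by sequence number.

    The sequence number stands in for the missing timestamps, as in the text.
    """
    events_by_case = defaultdict(list)
    for case_id, seq_num, activity in rows:
        events_by_case[case_id].append((seq_num, activity))
    return {
        case_id: [activity for _seq, activity in sorted(events)]
        for case_id, events in events_by_case.items()
    }


traces = build_traces(ROWS)
n_cases = len(traces)
n_events = sum(len(trace) for trace in traces.values())
print(n_cases, n_events)  # 3 5
print(traces[101])  # ['Arthroscopy, shoulder', 'Rotator cuff repair']
```

Counting cases and events in this way corresponds to the kind of per-subgroup summary reported in Table 1.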
First, the ProM plugin Mine with Inductive visual Miner was used because it can create animations showing the order in which the processes occur; it was used with the activities slider set to 1 and the paths slider set to 0.8. These settings were chosen so that the Petri net and the Inductive visual Miner models are equivalent. Secondly, Mine Petri net with Inductive Miner was used to create static process models for visual comparison, with a noise threshold of 0.2 to allow for slight deviations. Lastly, Convert Petri net to BPMN diagram was used to convert the Petri net models into BPMN diagrams so that they could be loaded into BPMNDiffViz to obtain the GED. The process models created in ProM and Disco for the subgroup process comparison and bottleneck analysis can be found in the author's GitHub repository⁵.

³ https://promtools.org/
⁴ https://fluxicon.com/disco/

When comparing the care paths of the subgroups, three keywords are used. Firstly, parallel is used when two procedures both occur but the order in which they occur does not matter. Secondly, sequence is used when one procedure follows another. Lastly, exclusive is used when only one of two procedures can occur. Visual comparison is performed in BPMNDiffViz using the TabuSearch algorithm with maximum expansions and tabu list size set to 100, as this gives precise results faster than other algorithms [9]. The activities in the BPMN diagrams are encoded with different colours: blue denotes elements that match between the subgroups, green denotes elements that should be added to transform one diagram into the other and red denotes elements that should be deleted to transform one diagram into the other. Visual comparison was made in BPMNDiffViz for the care paths followed by male and female FS patients, resulting in a final score of 167 using the TabuSearch algorithm.
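The three keywords introduced above (parallel, sequence, exclusive) can be illustrated with a simple log-level heuristic. This is a sketch under stated assumptions and not how the Inductive Miner derives its operators: it looks only at the first occurrence of each procedure in a trace, it assumes both procedures appear somewhere in the log, and the toy traces and the function name relation are hypothetical.

```python
def relation(traces, a, b):
    """Classify two procedures as 'parallel', 'sequence' or 'exclusive'.

    parallel:  both orders are observed among traces containing a and b
    sequence:  only one order is ever observed
    exclusive: a and b never occur in the same trace
    """
    a_before_b = False
    b_before_a = False
    for trace in traces:
        if a in trace and b in trace:
            if trace.index(a) < trace.index(b):
                a_before_b = True
            else:
                b_before_a = True
    if a_before_b and b_before_a:
        return "parallel"
    if a_before_b or b_before_a:
        return "sequence"
    return "exclusive"


# Hypothetical toy traces; the procedure names are reused for illustration only.
toy_traces = [
    ["Arthroscopy, shoulder", "Rotator cuff repair"],
    ["Arthroscopy, shoulder", "Synovectomy, shoulder"],
    ["Other repair of shoulder", "Arthroscopy, shoulder"],
    ["Arthroscopy, shoulder", "Other repair of shoulder"],
]

print(relation(toy_traces, "Arthroscopy, shoulder", "Rotator cuff repair"))
print(relation(toy_traces, "Rotator cuff repair", "Synovectomy, shoulder"))
print(relation(toy_traces, "Arthroscopy, shoulder", "Other repair of shoulder"))
```

On the toy traces the three calls print sequence, exclusive and parallel, in that order. Applying the same check to two subgroup logs separately would show whether a pair of procedures is related differently in each subgroup.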
In the comparison of the male and female care paths, 37% of the elements matched, 33% of the elements were deleted and 30% of the elements were added. Table 2 shows the procedures that are performed on either female or male FS patients, but not both.

Table 2
Procedures performed on either male or female FS patients

Drainage of Right Shoulder Joint, Percutaneous Approach, Diagnostic
Excision of Left Shoulder Bursa and Ligament, Percutaneous Endoscopic Approach
Excision of Right Shoulder Joint, Percutaneous Endoscopic Approach
Other total shoulder replacement
Release Right Shoulder Joint, External Approach
Repair of recurrent dislocation of shoulder
Repair Right Shoulder Joint, Percutaneous Endoscopic Approach
Repair Right Shoulder Tendon, Open Approach

The procedure "Other repair of shoulder" can be done in parallel with "Division of joint capsule, ligament, or cartilage, shoulder" in male patients, while in female patients these procedures are performed in sequence. Furthermore, it is performed in sequence with "Rotator cuff repair" in male FS patients. However, in female patients, these procedures are exclusive. This can be seen in figures 1a and 1b.

⁵ https://github.com/PriyaNaguine/Complete-Process-Models-Frozen-Shoulder

Figure 1: (a) Snapshot of the BPMN diagram for female FS patients. (b) Snapshot of the BPMN diagram for male FS patients.

As can be seen in figures 1a and 1b, the procedure "Synovectomy, shoulder" is always the last procedure in male FS patients, if it is performed at all. In female patients, it is exclusive to "Rotator cuff repair", while in male patients the two can occur in sequence, with "Rotator cuff repair" as the first procedure and "Synovectomy, shoulder" as the last.

3.5. Evaluation and improvements

In this phase, the insights obtained in the previous phase were used to suggest improvements and to learn from the care paths by considering best practices. The stakeholders, e.g., medical professionals, decide on the path to be followed to implement the improvements.

This phase was conducted with an expert physiotherapist at Fysiotherapie Polman in Enschede, The Netherlands, in order to discuss and evaluate the results of this research. The discussion, which drew on his 8 years of experience treating FS as a physiotherapist, provided more insight into FS patients. In particular, there is a ratio of approximately 7:3 between female and male FS patients. This could be because female patients ask for help earlier on. Based on his experience, there is no difference in the care paths followed by male and female FS patients.
It was also mentioned that the age group between 40 and 60 years old is more prone to developing FS, and that this applies to both genders. Furthermore, older people, i.e., those aged above 60, are more likely to experience FS after shoulder trauma. Patients in this age group are less likely to undergo surgery as it is an invasive procedure. In general, depending on the health conditions of the patient, the older they are, the more they are at risk of developing complications.

4. Subpopulation comparison through ontological foundation

Section 3 shows how the care paths of different subpopulations were derived using process mining and how they were compared using the BPMNDiffViz tool. Subpopulation comparison based on visual aspects or graph edit distances provides an initial view. The robustness of these comparisons can certainly be argued in terms of statistical figures. The question, however, is whether these statistical figures are sufficient for making decisions in real situations, especially in complex domains like healthcare. An ontological foundation is one promising approach for ensuring not only the structural correctness but also the accuracy of domain knowledge in the derived models.

In our case study, we analysed the treatment procedures and care paths for FS within two distinct subpopulations. The central focus of this study is understanding the variations in care paths among different patient groups and determining the most effective approach. Naturally, the expertise of domain professionals serves as the primary and most fitting source of knowledge for these investigations. Secondly, the ontological foundation can be employed to determine the best care path. Surprisingly, research on using ontologies for comparing different care paths based on subpopulations (assessing their similarities or differences) is scarce.
To address this, our case study explores a research direction on establishing a method for comparing subpopulations within a given knowledge domain, along with defining appropriate evaluation criteria. These criteria encompass the ontological richness and the reliability of methodologies in conceptualization, shareability in terms of sources and granularity, explicitness and formality through implementation tools and formalization language, and adherence to design criteria within the methodological process of building ontologies. In essence, our proposal uses the ontology definition as a foundation for comparison features, ensuring a comprehensive evaluation that goes beyond traditional similarity metrics.

5. Food for thought

The use of ontology in subpopulation comparison involves various dimensions. Ontologies prove beneficial for comprehension and conceptualization. Below are key discussion points highlighting the ways in which ontology can be utilized for comparing subpopulations.

• Conceptual Clarity: Ontologies help to define and clarify the concepts related to subpopulations. By establishing a common understanding of terms, attributes, and relationships, ontology ensures clarity in the representation of diverse subgroups.
• Semantic Interoperability: Ontological representations facilitate semantic interoperability, allowing for the integration of diverse data sources and the comparison of subpopulations across different datasets. This is crucial for ensuring consistency and accuracy in comparisons.
• Granular Attribute Definition: Ontologies allow for the granular definition of attributes associated with subpopulations. This includes demographic information, medical conditions, or any relevant factors. This granularity enhances the precision of comparisons.
• Relationship Modeling: Ontologies capture relationships between entities, enabling the modelling of complex interactions within subpopulations. This is particularly valuable when comparing the influence of different factors on health outcomes or other relevant criteria.
• Automated Inference: Ontologies support automated reasoning and inference, allowing for the deduction of additional information based on the defined relationships. This capability aids in uncovering hidden patterns or correlations within subpopulations.
• Consistent Terminology: Ontologies promote the use of consistent and standardized terminology, reducing ambiguity in the description of subpopulations. Consistency in terminology is crucial for accurate and meaningful comparisons.
• Facilitating Data Integration: Ontologies provide a common framework for integrating data from diverse sources, making it easier to compare subpopulations across different studies or datasets. This promotes a more comprehensive understanding of variations and similarities.
• Enabling Query and Retrieval: Ontologies enhance the efficiency of querying and retrieving relevant information about subpopulations. Researchers can formulate queries using ontological terms, streamlining the comparison process.

While we acknowledge that this list may not be exhaustive, it represents our initial effort to address the multifaceted nature of this complex research. In essence, ontology serves as a powerful tool in subpopulation comparison by offering a structured, standardized, and semantically rich representation of entities and their relationships. This approach contributes to more meaningful, accurate, and efficient comparisons across diverse subsets of a population.

Acknowledgments

We would like to thank the physiotherapist at Fysiotherapie Polman in Enschede for sharing important insights on frozen shoulder.

References

[1] R. S. Mans, W. M. P. van der Aalst, R. J. B. Vanwersch, A. J. Moleman, Process mining in healthcare: Data challenges when answering frequently posed questions, in: R. Lenz, S. Miksch, M. Peleg, M. Reichert, D. Riaño, A. ten Teije (Eds.), Process Support and Knowledge Representation in Health Care, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 140–153. doi:10.1007/978-3-642-36438-9_10.
[2] N. Guarino, D. Oberle, S. Staab, What is an ontology?, Handbook on ontologies (2009) 1–17.
[3] T. Gruber, What is an ontology, 1993.
[4] R. Koorevaar, E. Riet, M. Ipskamp, S. Bulstra, Incidence and prognostic factors for postoperative frozen shoulder after shoulder surgery: a prospective cohort study, Archives of Orthopaedic and Trauma Surgery 137 (2017). doi:10.1007/s00402-016-2589-3.
[5] F. Marazza, F. A. Bukhsh, J. Geerdink, O. Vijlbrief, S. Pathak, M. v. Keulen, C. Seifert, Automatic process comparison for subpopulations: Application in cancer care, International Journal of Environmental Research and Public Health 17 (????). doi:10.3390/ijerph17165707.
[6] A. Goldberger, L. Amaral, L. Glass, J. Hausdorff, P. Ivanov, R. Mark, J. Mietus, G. Moody, C. Peng, H. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals, Circulation [Online] 101 (2000) e215–e220.
[7] A. Johnson, L. Bulgarelli, T. Pollard, S. Horng, L. A. Celi, R. Mark, MIMIC-IV (version 1.0), 2021. doi:10.13026/s6n6-xd98.
[8] J. Munoz-Gama, N. Martin, C. Fernandez-Llatas, O. A. Johnson, M. Sepúlveda, E. Helm, V. Galvez-Yanjari, E. Rojas, A. Martinez-Millana, D. Aloini, I. A. Amantea, R. Andrews, M. Arias, I. Beerepoot, E. Benevento, A. Burattin, D. Capurro, J. Carmona, M. Comuzzi, B. Dalmas, R. de la Fuente, C. Di Francescomarino, C. Di Ciccio, R. Gatta, C. Ghidini, F. Gonzalez-Lopez, G. Ibanez-Sanchez, H. B. Klasky, A. Prima Kurniati, X. Lu, F. Mannhardt, R. Mans, M. Marcos, R. Medeiros de Carvalho, M. Pegoraro, S. K. Poon, L. Pufahl, H. A. Reijers, S. Remy, S. Rinderle-Ma, L. Sacchi, F. Seoane, M. Song, A. Stefanini, E. Sulis, A. H. ter Hofstede, P. J. Toussaint, V. Traver, Z. Valero-Ramon, I. van de Weerd, W. M. van der Aalst, R. Vanwersch, M. Weske, M. T. Wynn, F. Zerbato, Process mining for healthcare: Characteristics and challenges, Journal of Biomedical Informatics 127 (2022). doi:10.1016/j.jbi.2022.103994.
[9] A. Skobtsov, A. Kalenkova, Efficient algorithms for finding differences between process models, 2019, pp. 60–66. doi:10.1109/ISPRAS47671.2019.00015.
[10] S. Ivanov, A. Kalenkova, W. Aalst, BPMNDiffViz: A tool for BPMN models comparison 1418 (2015) 35–39.
[11] H. Scheuerlein, F. Rauchfuss, Y. Dittmar, R. Molle, T. Lehmann, N. Pienkos, U. Settmacher, New methods for clinical pathways-business process modeling notation (BPMN) and tangible business process modeling (t.BPM), Langenbeck’s archives of surgery / Deutsche Gesellschaft für Chirurgie 397 (2012) 755–761. doi:10.1007/s00423-012-0914-z.
[12] K. Riesen, A. Fischer, H. Bunke, Improving approximate graph edit distance using genetic algorithms, 2014, pp. 63–72. doi:10.1007/978-3-662-44415-3_7.
[13] C. Corea, M. Fellmann, P. Delfmann, Ontology-based process modelling-will we live to see it?, in: Conceptual Modeling: 40th International Conference, ER 2021, Virtual Event, October 18–21, 2021, Proceedings 40, Springer, 2021, pp. 36–46.
[14] M. Somodevilla García, D. Vilariño Ayala, I. Pineda, An overview of ontology learning tasks, Computación y Sistemas 22 (2018) 137–146.
[15] G. Pereira, E. Santos, M. Maceno, Process mining project methodology in healthcare: a case study in a tertiary hospital, Network Modeling Analysis in Health Informatics and Bioinformatics 9 (2020). doi:10.1007/s13721-020-00227-w.
[16] P. Naguine, Subpopulation Process Comparison and Bottleneck Analysis: A Case Study of Frozen Shoulder, B.S. thesis, University of Twente, 2022.
[17] d_icd_diagnoses, https://mimic.mit.edu/docs/iv/modules/hosp/d_icd_diagnoses/, 2020. [Online; accessed 17 May 2022].
[18] A. Bogarín, R. Cerezo, C. Romero, Discovering learning processes using inductive miner: A case study with learning management systems (LMSs), Psicothema 30 (2018) 322–329. doi:10.7334/psicothema2018.116.