
Hertogenbosch, The Netherlands
* Corresponding author.
f.a.bukhsh@utwente.nl (F. Bukhsh); p.v.naguine@student.utwente.nl (P. Naguine); j.a.jayasinghearachchige@utwente.nl (J. Jayasinghe)

Subpopulation process comparison with the help of ontological foundation: A discussion

Faiza Bukhsh

Priya Naguine

Jeewanie Jayasinghe

University of Twente, Enschede, The Netherlands

2024


Process modelling and mining frameworks have demonstrated their effectiveness across diverse domains, including healthcare. However, existing frameworks often lack explicit guidance on learning from best practices. Consider, for instance, frozen shoulder (FS), a condition with multiple treatment options and varying outcomes. Understanding how care paths differ among patient groups and determining the most effective approach remains a challenge. To address this gap, our research employs the Process Mining Project Methodology in Healthcare alongside the MIMIC-IV dataset to uncover distinctions in care paths among different age groups and genders. This experimental validation seeks to identify optimal strategies for addressing frozen shoulder through ontological concepts. The study concludes by presenting a set of open challenges, aiming to guide future research in healthcare by integrating ontological concepts to learn from the best and optimal care paths. While this research does not offer a singular solution, it contributes by opening a new dimension of ontological research. Specifically, it examines how various care paths can be compared and aligned with the help of an ontological foundation.

Keywords: subpopulation comparison, ontology, frozen shoulder, process mining

1. Introduction

Medical professionals often question whether there is a difference in the treatment procedures followed by subgroups of patients diagnosed with the same disease [1]. In this case, a subgroup refers to a group of patients with a common characteristic, e.g., all female patients diagnosed with frozen shoulder. When comparing subpopulations, experts’ knowledge is essential. However, ontology-based interpretation is a valuable technique for gaining better insight into a complex domain like healthcare.

In the context of information science and knowledge representation, an ontology is a formal and explicit specification of a shared conceptualization [2]. It provides a structured framework for representing knowledge in a particular domain by defining the entities, their properties, and the relationships between them. Ontologies aim to capture a common understanding of a domain and facilitate communication and interoperability among different systems and applications [3].

This research aims to identify care paths for different subpopulations and learn from best practices. Since age and gender play a role in the development of a disease such as FS [4], these can be chosen as the subgroups. There has not been much research on comparing these subpopulations. Ontological concepts are rich in nature and can provide a methodological way to compare subpopulations [5].

Therefore, the objective of this research is to find the differences and similarities between the care paths for different subpopulations and learn from best practices with the guidance of ontological foundations.

As an example scenario, we used the MIMIC-IV database and analysed the procedures followed by FS patients; this database contains data on approximately 300,000 patients who were admitted to a tertiary academic medical centre in Boston, USA, between 2008 and 2019 [6, 7]. When applied in the healthcare domain, process mining takes data from hospital information systems (HIS) [8]. Event logs are then created from the HIS data to show the sequence of processes followed by patients. These event logs can then be used to find the differences in the care paths followed by subgroups of patients with FS. Further, the differences and similarities of the care paths were analysed with the BPMNDiffViz tool¹ to show comparison examples.

This paper is structured as follows. Section 2 describes the state of the art. Section 3 describes the methodology under the title "An example scenario: Frozen shoulder exploration with the application of process mining (PM)". Section 4 discusses subpopulation comparison through ontological foundations. Finally, Section 5 concludes the paper by opening research directions for the reader.

2. State of the art

Process mining techniques can be used for various purposes in the healthcare domain, e.g., with BPMN diagrams to get the graph edit distances. [5] used process mining techniques specifically for the process comparison of subgroups, with a focus on subpopulation process comparison between patients diagnosed with different types of cancer.

The tool BPMNDiffViz can be used to find graph similarity measures. It takes two BPMN diagrams as input and gives the minimal graph edit distance (GED) as a result. The GED can be defined as the minimum number of steps required to transform one graph into another [9]. This tool makes use of Business Process Model and Notation (BPMN) 2.0, which is one of the frequently used notations for process modelling [10]. [11] states that although the use of BPMN diagrams in medicine is a recent development, it can be used to model clinical pathways to teach and train medical staff.
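To make the notion of graph edit distance concrete, the following Python sketch counts unit-cost edits between two small process graphs. It assumes that nodes are matched by their activity label, which sidesteps the search that makes exact GED expensive; it is a simplification for illustration and not the algorithm implemented in BPMNDiffViz. The function name and the two toy graphs are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def label_matched_edit_distance(
    nodes_a: Set[str], edges_a: Set[Edge],
    nodes_b: Set[str], edges_b: Set[Edge],
) -> int:
    """Count unit-cost insertions and deletions that turn graph A into graph B.

    Assumption: a node in A and a node in B are the same element exactly
    when they carry the same label, so node substitution is not considered.
    """
    node_edits = len(nodes_a ^ nodes_b)  # nodes present in only one graph
    edge_edits = len(edges_a ^ edges_b)  # edges present in only one graph
    return node_edits + edge_edits


# Hypothetical toy graphs; the labels are placeholders, not study data.
nodes_a = {"start", "A", "B", "end"}
edges_a = {("start", "A"), ("A", "B"), ("B", "end")}
nodes_b = {"start", "A", "C", "end"}
edges_b = {("start", "A"), ("A", "C"), ("C", "end")}

distance = label_matched_edit_distance(nodes_a, edges_a, nodes_b, edges_b)
```

Here two nodes and four edges differ between the graphs, so the sketch reports six edit steps. When labels do not identify elements uniquely, a search over possible node mappings is needed, which is where the comparison algorithms discussed in this section come in.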

Visual comparison can be used to differentiate between the care paths followed by subgroups of patients, and the tool BPMNDiffViz can be used for that. BPMNDiffViz allows for a choice between six comparison algorithms: Greedy, TabuSearch, Genetic, AStar, Ants and simulated annealing. [9] compares the algorithms mentioned except Genetic and concludes that the Greedy algorithm gives the best performance results while the TabuSearch algorithm gives more precise and accurate results. The Genetic algorithm only gives an approximation of the GED [12].

¹ https://pais.hse.ru/en/research/projects/CompBPMN/

Subpopulation comparison based on visual aspects or graph edit distances provides us with an initial view. However, the robustness of these comparisons should be strengthened beyond the statistical figures before they are used for making decisions in real situations, especially in complex domains like healthcare.

An ontological foundation is one potential approach for ensuring the accuracy of domain structures. Ontological concepts are hierarchical domain structures that provide a domain theory, have a syntactically and semantically rich language, and offer a shared and consensual terminology [13].

The work of [14] explores ontology learning, a dynamic research field crucial for effective ontology engineering. It distinguishes ontology-based definitions from conventional label-centric ones, emphasizing the interconnected nature of objects. This shift allows for advanced functionalities such as scenario search, ontology fusion, and recommendation through nuanced relation labelling. Moreover, the work of [13] discusses the potential of ontology-based process modelling (OBPM) to enhance business process management theoretically.

3. An example scenario: Frozen shoulder exploration with the application of process mining (PM)

In the following section, we methodically elaborate on a specific scenario to illustrate the design and comparison of subpopulations. Throughout this example, we highlight the potential role that ontological concepts could play in shaping and evaluating these subpopulations. The methodology used in this research is the Process Mining Project Methodology in Healthcare [15]. It involves six phases: planning, extraction, data processing, mining and analysis, evaluation, and improvement and support.

3.1. Planning

During this phase, we chose specific subgroups to explore various care paths and organized the sequence of events. Additionally, we conducted thorough background research on frozen shoulder and process mining in healthcare, as detailed in [16].

3.2. Extraction

In order to get access to and query the MIMIC-IV database, Google Cloud Platform BigQuery² was used. Since the MIMIC-IV database stores the diagnoses given to the patients at the end of their ICU stay using the International Classification of Diseases (ICD) Version 9 and 10 codes, the first step was to find the ICD codes associated with frozen shoulder. These were found in the D_ICD_DIAGNOSES table [17] by searching the long_title for the keywords frozen shoulder and adhesive capsulitis. The ICD codes are 7260, M750, M7500, M7501, and M7502 and their corresponding diagnoses are "Adhesive capsulitis of shoulder", "Adhesive capsulitis of shoulder", "Adhesive capsulitis of unspecified shoulder", "Adhesive capsulitis of right shoulder" and "Adhesive capsulitis of the left shoulder." It is important to note that a patient may be given more than one diagnosis associated with frozen shoulder in a single hospitalization, e.g., a patient can be diagnosed with both M7501 and M7502.

² https://cloud.google.com/bigquery
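The keyword lookup described in this subsection can be sketched in Python as follows. The study ran its queries in BigQuery; this sketch swaps in a plain in-memory filter purely for illustration. The rows reproduce the five codes and titles listed above plus one clearly hypothetical placeholder row, and the function name is an assumption.

```python
from typing import Dict, Iterable, List

KEYWORDS = ("frozen shoulder", "adhesive capsulitis")


def find_fs_codes(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Return the ICD codes whose long_title mentions one of the keywords."""
    return [
        row["icd_code"]
        for row in rows
        if any(keyword in row["long_title"].lower() for keyword in KEYWORDS)
    ]


rows = [
    {"icd_code": "7260", "long_title": "Adhesive capsulitis of shoulder"},
    {"icd_code": "M750", "long_title": "Adhesive capsulitis of shoulder"},
    {"icd_code": "M7500", "long_title": "Adhesive capsulitis of unspecified shoulder"},
    {"icd_code": "M7501", "long_title": "Adhesive capsulitis of right shoulder"},
    {"icd_code": "M7502", "long_title": "Adhesive capsulitis of the left shoulder"},
    # Hypothetical placeholder row, included only to show that it is excluded.
    {"icd_code": "X0000", "long_title": "Placeholder unrelated diagnosis"},
]

fs_codes = find_fs_codes(rows)
```

The match is case-insensitive, so a title containing "Frozen shoulder" would be picked up as well.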

To apply process mining algorithms to the data, the cases, events, start times and end times have to be defined. For both the subgroup process comparison and bottleneck analysis, a case is a patient’s admission to the hospital and the events are the procedures that the patients were billed for.

Since the start and end times were not stored for the subgroup process comparison, the sequence number was used instead to indicate the order in which the procedures were carried out.
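As a minimal sketch of this event-log definition, the Python snippet below groups billed procedures by admission and orders them by sequence number. The tuple layout, the admission identifiers and the function name are assumptions made for illustration; they are not the study's extraction code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (admission id, sequence number, procedure title); layout is an assumption.
Row = Tuple[int, int, str]


def build_traces(rows: List[Row]) -> Dict[int, List[str]]:
    """Group events by case and order each case by its sequence number."""
    by_case: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for case_id, seq_num, procedure in rows:
        by_case[case_id].append((seq_num, procedure))
    return {
        case_id: [procedure for _, procedure in sorted(events, key=lambda e: e[0])]
        for case_id, events in by_case.items()
    }


# Hypothetical admissions; only the procedure titles come from the text.
rows = [
    (1001, 2, "Synovectomy, shoulder"),
    (1001, 1, "Rotator cuff repair"),
    (1002, 1, "Other repair of shoulder"),
]

traces = build_traces(rows)
```

Each resulting trace is one case, i.e., one hospital admission, and its list of procedures stands in for the timestamps that the source data does not provide.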

3.3. Data processing

In this phase, the CSV files on the subgroups were imported into ProM, converted into XES files and visualised using the LogVisualiser (LogDialog) plugin. Table 1 gives an overview of the number of cases and events per subgroup, as reported by the LogDialog. Further filtering was required to find the differences in care paths between the different patient groups. This was done using the Filter Log on Event Attribute Values plugin, where specific procedures were filtered out from the care paths.
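The effect of this filtering step on the number of cases and events can be illustrated with a small Python sketch. It does not reproduce the ProM plugins; in particular, dropping cases that become empty is a choice made in this sketch, and the case identifiers and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Traces = Dict[int, List[str]]


def filter_out(traces: Traces, excluded: Set[str]) -> Traces:
    """Remove the excluded procedures from every case; drop cases left empty."""
    filtered = {
        case_id: [event for event in events if event not in excluded]
        for case_id, events in traces.items()
    }
    return {case_id: events for case_id, events in filtered.items() if events}


def overview(traces: Traces) -> Tuple[int, int]:
    """Return (number of cases, number of events) for an event log."""
    return len(traces), sum(len(events) for events in traces.values())


# Hypothetical cases; only the procedure titles come from the text.
traces = {
    1: ["Rotator cuff repair", "Synovectomy, shoulder"],
    2: ["Synovectomy, shoulder"],
    3: ["Other repair of shoulder"],
}

before = overview(traces)
after = overview(filter_out(traces, {"Synovectomy, shoulder"}))
```

In this toy log, filtering out one procedure reduces the overview from three cases with four events to two cases with two events.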

3.4. Mining and analysis

This phase involved finding the differences in the care paths between the different subgroups and the bottlenecks in the medications taken and the procedures followed by patients during their ICU stays. To do this, process models were created in ProM³ and Disco⁴.

The Inductive Miner plugin was chosen because it gives the best fitness, i.e., the degree to which the process models generated can recreate the cases in the event log [18]. At first, the plugin Mine with Inductive visual Miner was used because it can create animations showing the order in which the processes occur; it was used with the activities slider set to 1 and the paths slider set to 0.8. These settings were chosen so that the Petri net and the Inductive visual Miner models are equivalent. Secondly, Mine Petri net with Inductive Miner was used to create static process models that can be used for visual comparison, with a noise threshold of 0.2 to allow for slight deviations. Lastly, in order to convert the Petri net models into BPMN diagrams so that they can be loaded into BPMNDiffViz to get the GED, Convert Petri net to BPMN diagram was used.

³ https://promtools.org/

⁴ https://fluxicon.com/disco/

The process models created in ProM and Disco for the subgroup process comparison and bottleneck analysis can be found in the author’s GitHub repository⁵.

When comparing the care paths of the subgroups, three keywords will be used. Firstly, parallel will be used when two procedures occur but the order in which they occur does not matter. Secondly, sequence is used when one procedure follows another. Lastly, exclusive will be used when only one of two procedures can occur.
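The three keywords can be illustrated with a simple heuristic over observed traces, sketched below in Python. This is not how the Inductive Miner derives its operators; it only classifies a pair of procedures by the orders in which they are seen together. It assumes that the two procedures are distinct and that each occurs at most once per trace. The traces are hypothetical and were chosen to mirror the patterns described in this section.

```python
from typing import List


def relation(traces: List[List[str]], a: str, b: str) -> str:
    """Classify how procedures a and b relate across the observed traces."""
    a_before_b = b_before_a = False
    for trace in traces:
        if a in trace and b in trace:
            if trace.index(a) < trace.index(b):
                a_before_b = True
            else:
                b_before_a = True
    if a_before_b and b_before_a:
        return "parallel"   # both orders observed, so order does not matter
    if a_before_b or b_before_a:
        return "sequence"   # only one order observed
    return "exclusive"      # never observed together in one trace


REPAIR = "Other repair of shoulder"
DIVISION = "Division of joint capsule, ligament, or cartilage, shoulder"

# Hypothetical traces, not study data.
group_one = [[REPAIR, DIVISION], [DIVISION, REPAIR]]
group_two = [[REPAIR, DIVISION]]
group_three = [[REPAIR], [DIVISION]]

results = [relation(g, REPAIR, DIVISION) for g in (group_one, group_two, group_three)]
```

Note that the heuristic also returns "exclusive" when neither procedure appears at all, so it should only be applied to pairs that each occur somewhere in the log.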

Also, visual comparison is performed in BPMNDiffViz using the TabuSearch algorithm with maximum expansions and tabu list size set to 100, as this gives precise results faster than other algorithms [9]. The activities in the BPMN diagrams are encoded with different colours: blue denotes elements that match between the subgroups, green denotes elements that should be added to transform one diagram into the other and red denotes elements that should be deleted to transform one diagram into the other.

Visual comparison was made in BPMNDiffViz for the care paths followed by male and female FS patients, resulting in a final score of 167 using the TabuSearch algorithm. 37% of the elements matched between the care paths, 33% of the elements were deleted and 30% of the elements were added. Table 2 shows the procedures that are only performed on either female or male FS patients, but not both.
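One plausible reading of the matched, deleted and added shares is sketched below in Python: elements are compared by label and each share is taken over all classified elements. How BPMNDiffViz itself computes these percentages is not stated here, so this is an assumption, and the element sets and function names are hypothetical.

```python
from typing import Dict, Set


def classify_elements(source: Set[str], target: Set[str]) -> Dict[str, Set[str]]:
    """Split elements into the three colour classes used in the comparison."""
    return {
        "matched": source & target,  # blue: present in both diagrams
        "deleted": source - target,  # red: only in the source diagram
        "added": target - source,    # green: only in the target diagram
    }


def shares(groups: Dict[str, Set[str]]) -> Dict[str, int]:
    """Return each class as a rounded percentage of all classified elements."""
    total = sum(len(elements) for elements in groups.values())
    if total == 0:
        return {name: 0 for name in groups}
    return {name: round(100 * len(elements) / total) for name, elements in groups.items()}


# Hypothetical element labels, not study data.
source = {"A", "B", "C"}
target = {"B", "C", "D", "E"}

groups = classify_elements(source, target)
percentages = shares(groups)
```

With these toy sets, two of five elements match, one is deleted and two are added, giving shares of 40%, 20% and 40%.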

The procedure "Other repair of shoulder" can be done in parallel with "Division of joint capsule, ligament, or cartilage, shoulder" in male patients, while in female patients these procedures are performed in sequence. Furthermore, it is performed in sequence with "Rotator cuff repair" in male FS patients. However, in female patients, these processes are exclusive. This can be seen in Figures 1a and 1b.

⁵ https://github.com/PriyaNaguine/Complete-Process-Models-Frozen-Shoulder

As can be seen in Figures 1a and 1b, the procedure "Synovectomy, shoulder" is always the last process in male FS patients, in case it is performed. In female patients, it is exclusive to "Rotator cuff repair", while in male patients, they can occur in sequence, where "Rotator cuff repair" is the first procedure and "Synovectomy, shoulder" is the last procedure to take place.

3.5. Evaluation and improvements

In this phase, the insights obtained in the previous phase were used to suggest improvements and to learn from the care paths by considering best practices. The stakeholders, e.g., medical professionals, then decide on the path to be followed to implement the improvements.

This phase was conducted with an expert physiotherapist at Fysiotherapie Polman in Enschede, The Netherlands, in order to discuss and evaluate the results of this research. The discussion, which drew on his 8 years of experience working as a physiotherapist for FS, provided more insight into FS patients. In particular, there is a ratio of approximately 7:3 between female and male FS patients. This could be because female patients ask for help earlier on. Based on his experience, there is no difference in the care paths followed by male and female FS patients. He also mentioned that the age group between 40 and 60 years old is more prone to developing FS and that this applies to both genders. Furthermore, older people, i.e., those aged above 60, are more likely to experience FS after shoulder trauma. Patients in this age group are less likely to get surgery as it is an invasive procedure. In general, depending on the health conditions of the patient, the older they are, the more they are at risk of developing complications.

4. Subpopulation comparison through ontological foundation

Section 3 shows how the care paths of different subpopulations were derived using process mining and how they compare using the BPMNDiffViz tool. Subpopulation comparison based on visual aspects or graph edit distances provides us with an initial view. There is no doubt that we can argue for the robustness of these comparisons in terms of statistical figures. However, the question is whether these statistical figures are sufficient for making decisions in real situations, especially in complex domains like healthcare. An ontological foundation is one approach that can be used to ensure not only the structural correctness but also the accuracy of domain knowledge in the derived models.

In our research case study, we analyzed the treatment procedures and care paths for FS within two distinct subpopulations. The central focus of this study is understanding the variations in care paths among different patient groups and determining the most effective approach. Naturally, the expertise of domain professionals serves as the primary and most fitting source of knowledge for these investigations. Secondly, the ontological foundation can be employed to determine the best care path. Surprisingly, research work is scarce on using ontologies for comparing (assessing similarities or differences) different care paths based on subpopulations.

To address this, our case study explores a research direction on establishing a method for comparing subpopulations within a given knowledge domain, along with defining appropriate evaluation criteria. These criteria encompass the ontological richness and the reliability of methodologies in conceptualization, shareability in terms of sources and granularity, explicitness and formality through implementation tools and formalization language, and adherence to design criteria within the methodological process of building ontologies.

In essence, our proposal leverages the significance of the ontology definition as a foundation for comparison features, ensuring a comprehensive evaluation that goes beyond traditional similarity metrics.

5. Food for thought

The use of ontology in subpopulation comparison involves various dimensions. Ontologies prove beneficial for comprehension and conceptualization. Below are key discussion points highlighting the ways in which ontology can be utilized for comparing subpopulations.

• Conceptual Clarity: Ontologies help to define and clarify the concepts related to subpopulations. By establishing a common understanding of terms, attributes, and relationships, ontology ensures clarity in the representation of diverse subgroups.
• Semantic Interoperability: Ontological representations facilitate semantic interoperability, allowing for the integration of diverse data sources and the comparison of subpopulations across different datasets. This is crucial for ensuring consistency and accuracy in comparisons.
• Granular Attribute Definition: Ontologies allow for the granular definition of attributes associated with subpopulations. This includes demographic information, medical conditions, or any relevant factors. This granularity enhances the precision of comparisons.
• Relationship Modeling: Ontologies capture relationships between entities, enabling the modelling of complex interactions within subpopulations. This is particularly valuable when comparing the influence of different factors on health outcomes or other relevant criteria.
• Automated Inference: Ontologies support automated reasoning and inference, allowing for the deduction of additional information based on the defined relationships. This capability aids in uncovering hidden patterns or correlations within subpopulations.
• Consistent Terminology: Ontologies promote the use of consistent and standardized terminology, reducing ambiguity in the description of subpopulations. Consistency in terminology is crucial for accurate and meaningful comparisons.
• Facilitating Data Integration: Ontologies provide a common framework for integrating data from diverse sources, making it easier to compare subpopulations across different studies or datasets. This promotes a more comprehensive understanding of variations and similarities.
• Enabling Query and Retrieval: Ontologies enhance the efficiency of querying and retrieving relevant information about subpopulations. Researchers can formulate queries using ontological terms, streamlining the comparison process.

While we acknowledge that this list may not be exhaustive, it represents our initial effort to address the multifaceted nature of this complex research. In essence, ontology serves as a powerful tool in subpopulation comparison by offering a structured, standardized, and semantically rich representation of entities and their relationships. This approach contributes to more meaningful, accurate, and efficient comparisons across diverse subsets of a population.
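As a minimal illustration of the automated inference and query points, the Python sketch below uses a toy is-a hierarchy to compare two subpopulations at the level of broader concepts rather than raw procedure labels. The hierarchy, the parent concept names and the two procedure sets are hypothetical and are not taken from any published ontology or from the study data.

```python
from typing import Dict, List, Set

# Hypothetical is-a links (child -> parent); an assumption for illustration.
IS_A: Dict[str, str] = {
    "Rotator cuff repair": "Shoulder repair procedure",
    "Other repair of shoulder": "Shoulder repair procedure",
    "Synovectomy, shoulder": "Shoulder excision procedure",
    "Shoulder repair procedure": "Shoulder procedure",
    "Shoulder excision procedure": "Shoulder procedure",
}


def ancestors(concept: str) -> List[str]:
    """Infer all broader concepts by following is-a links transitively."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def lift(procedures: Set[str], level: int = 0) -> Set[str]:
    """Replace each procedure by its ancestor at the given level, if it has one."""
    lifted = set()
    for procedure in procedures:
        chain = ancestors(procedure)
        lifted.add(chain[level] if len(chain) > level else procedure)
    return lifted


# Hypothetical procedure sets for two subpopulations.
group_a = {"Rotator cuff repair", "Synovectomy, shoulder"}
group_b = {"Other repair of shoulder", "Synovectomy, shoulder"}

shared_labels = group_a & group_b
shared_concepts = lift(group_a) & lift(group_b)
```

At the label level the two groups share only one procedure, whereas at the concept level both groups involve a repair and an excision procedure. A comparison of this kind depends entirely on the quality of the underlying hierarchy, which is why the evaluation criteria discussed in Section 4 matter.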

Acknowledgments

We would like to thank the physiotherapist at Fysiotherapie Polman in Enschede for sharing important insights on frozen shoulder.

References

[1] R. S. Mans, W. M. P. van der Aalst, R. J. B. Vanwersch, A. J. Moleman, Process mining in healthcare: Data challenges when answering frequently posed questions, in: R. Lenz, S. Miksch, M. Peleg, M. Reichert, D. Riaño, A. ten Teije (Eds.), Process Support and Knowledge Representation in Health Care, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 140-153. doi:10.1007/978-3-642-36438-9_10.

[2] Guarino, Oberle, Staab, What is an ontology?, Handbook on ontologies (2009) 1-17.

[3] Gruber, What is an ontology, 1993.

[4] Koorevaar, E. Riet, Ipskamp, Bulstra, Incidence and prognostic factors for postoperative frozen shoulder after shoulder surgery: a prospective cohort study, Archives of Orthopaedic and Trauma Surgery 137 (2017). doi:10.1007/s00402-016-2589-3.

[5] Marazza, F. A. Bukhsh, Geerdink, Vijlbrief, Pathak, M. v. Keulen, Seifert, Automatic process comparison for subpopulations: Application in cancer care, International Journal of Environmental Research and Public Health 17. doi:10.3390/ijerph17165707.

[6] Goldberger, Amaral, Glass, Hausdorff, Ivanov, Mark, Mietus, G. Moody, C. Peng, H. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals, Circulation [Online] 101 (2000) e215-e220.

[7] Johnson, L. Bulgarelli, Pollard, Horng, L. A. Celi, Mark, MIMIC-IV (version 1.0), 2021. doi:10.13026/s6n6-xd98.

[8] Munoz-Gama, Martin, Fernandez-Llatas, O. A. Johnson, Sepúlveda, Helm, Galvez-Yanjari, Rojas, Martinez-Millana, Aloini, I. A. Amantea, Andrews, Arias, Beerepoot, Benevento, Burattin, Capurro, Carmona, Comuzzi, Dalmas, R. de la Fuente, Di Francescomarino, Di Ciccio, Gatta, Ghidini, Gonzalez-Lopez, Ibanez-Sanchez, H. B. Klasky, A. Prima Kurniati, Lu, Mannhardt, Mans, Marcos, R. Medeiros de Carvalho, Pegoraro, S. K. Poon, Pufahl, H. A. Reijers, Remy, Rinderle-Ma, Sacchi, Seoane, Song, Stefanini, Sulis, A. H. ter Hofstede, P. J. Toussaint, Traver, Valero-Ramon, I. van de Weerd, W. M. van der Aalst, R. Vanwersch, Weske, M. T. Wynn, Zerbato, Process mining for healthcare: Characteristics and challenges, Journal of Biomedical Informatics 127 (2022). doi:10.1016/j.jbi.2022.103994.

[9] Skobtsov, Kalenkova, Efficient algorithms for finding differences between process models, 2019, pp. 60-66. doi:10.1109/ISPRAS47671.2019.00015.

[10] Ivanov, Kalenkova, W. Aalst, BPMNDiffViz: A tool for BPMN models comparison 1418 (2015) 35-39.

[11] Scheuerlein, Rauchfuss, Dittmar, Molle, Lehmann, Pienkos, U. Settmacher, New methods for clinical pathways-business process modeling notation (BPMN) and tangible business process modeling (t.BPM), Langenbeck's archives of surgery / Deutsche Gesellschaft für Chirurgie 397 (2012) 755-761. doi:10.1007/s00423-012-0914-z.

[12] Riesen, Fischer, Bunke, Improving approximate graph edit distance using genetic algorithms, 2014, pp. 63-72. doi:10.1007/978-3-662-44415-3_7.

[13] Corea, Fellmann, Delfmann, Ontology-based process modelling-will we live to see it?, in: Conceptual Modeling: 40th International Conference, ER 2021, Virtual Event, October 18-21, 2021, Proceedings 40, Springer, 2021, pp. 36-46.

[14] Somodevilla García, D. Vilariño Ayala, I. Pineda, An overview of ontology learning tasks, Computación y Sistemas 22 (2018) 137-146.

[15] Pereira, E. Santos, Maceno, Process mining project methodology in healthcare: a case study in a tertiary hospital, Network Modeling Analysis in Health Informatics and Bioinformatics 9 (2020). doi:10.1007/s13721-020-00227-w.

[16] Naguine, Subpopulation Process Comparison and Bottleneck Analysis: A Case Study of Frozen Shoulder, B.S. thesis, University of Twente, 2022.

[17] d_icd_diagnoses, https://mimic.mit.edu/docs/iv/modules/hosp/d_icd_diagnoses/, 2020. [Online; accessed 17 May 2022].

[18] Bogarín, Cerezo, Romero, Discovering learning processes using inductive miner: A case study with learning management systems (LMSs), Psicothema 30 (2018) 322-329. doi:10.7334/psicothema2018.116.