Towards a Data-Centric Framework for Modelling, Managing and Mining BPM Processes over Pandemic Events⋆

Alfredo Cuzzocrea1,2,‡, Islam Belmerabet2, Carlo Combi3, Enrico Franconi4 and Paolo Terenziani5

1 iDEA Lab, University of Calabria, Rende, Italy
2 Department of Computer Science, University of Paris City, Paris, France
3 Department of Computer Science, University of Verona, Verona, Italy
4 Faculty of Engineering, Free University of Bozen-Bolzano, Bolzano, Italy
5 DISIT Department, University of Piemonte Orientale, Alessandria, Italy

Abstract
The COVID-19 pandemic exposed significant shortcomings in the ability of modern healthcare information systems to manage and mitigate pandemics, especially when faced with unforeseen events. This paper addresses these shortcomings by introducing PROTECTION, a data-centric framework aimed at enhancing pandemic control and prevention, with a specific focus on BPM processes. PROTECTION offers a comprehensive approach to handling pandemic-related complexities. We provide an overview of PROTECTION's structure and main functions.

Keywords
Process Modelling, Pandemic Events, Pandemic Control and Prevention.

⋆ Published in the Proceedings of the Workshops of the EDBT/ICDT 2025 Joint Conference (March 25-28, 2025), Barcelona, Spain.
‡ This research has been made in the context of the Excellence Chair in Big Data Management and Analytics at University of Paris City, Paris, France.
alfredo.cuzzocrea@unical.it (A. Cuzzocrea); islam.belmerabet@unical.it (I. Belmerabet); carlo.combi@univr.it (C. Combi); franconi@inf.unibz.it (E. Franconi); paolo.terenziani@unipmn.it (P. Terenziani)
Copyright © 2025 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

1. Introduction

Pandemics have changed many aspects of our daily life. In the healthcare and clinical domains, they have highlighted the need to manage Knowledge-, Decision- and Data-Intensive (KDDI) processes related to the care of swab-positive patients and to the definition of public health policies, with the twofold goal of preventing and controlling the pandemic spread.

In the short term, Information and Communication Technology (ICT) and Artificial Intelligence techniques can provide methodologies and tools for collecting, analyzing, storing, sharing, and visualizing pandemic-related information. Beyond this short-term support, recent pandemic events like COVID-19 also call for long-term research efforts devoted to studying and proposing new approaches that support healthcare and clinical organizations in planning and analyzing their activities, with a specific focus on care, monitoring, and prevention during pandemic events.

In this context, process modeling, management, and mining play a leading role in effectively supporting pandemic control policies at large. Special emphasis falls on the integration of these methodologies with the emerging big data trend, which leads to the innovative definition of KDDI process modeling, management, and mining for pandemic scenarios, as in recent COVID-19 related studies.

From this long-term perspective, we propose PROTECTION, a framework for supporting data-centric process modeling, management, and mining for pandemic prevention and control. PROTECTION focuses on methodological issues in modeling, managing and mining healthcare/clinical KDDI processes for the management of worldwide pandemics.

In more detail, the long-term aims of our proposed framework are to provide:

1. clinical stakeholders with a set of methodologies/tools to manage KDDI processes for the prevention and management of worldwide pandemics;
2. healthcare decision-makers with methodologies/tools for monitoring KDDI processes and resource consumption in their organizations, in order to control the care quality and the social impact of such pandemic-related processes;
3. software designers with a set of building blocks and methodologies to support the efficient development of KDDI process systems devoted to managing worldwide pandemics.

As regards its conceptual/software structure, PROTECTION is articulated into the following research assets/components:

1. clinical and healthcare KDDI process modeling and management, to represent knowledge of the target application scenario, plus its conceptual interconnections;
2. clinical and healthcare KDDI process mining, both to discover implicit processes (or process fragments) and to perform an “a-posteriori” comparison between designed and actual processes;
3. a specific software architecture for: (i) modelling, managing, and evaluating the healthcare and clinical KDDI processes for preventing and managing pandemic events, and (ii) continuous KDDI process mining to monitor actual processes and obtain useful feedback for improvement.

Following this main vision, this paper introduces and discusses the framework PROTECTION. In more detail, we describe the anatomy and main functionalities of PROTECTION, discussing how the proposed framework can effectively deal with the complex domain of pandemic control and prevention.

2. PROTECTION: The Overall View

In this Section, we provide the reference architecture of PROTECTION, whose final goal is to capture the many facets of pandemic prevention and control, as also demonstrated by the recent worldwide COVID-19 epidemic. The proposed architecture is modular in nature, and it exposes the complex interaction of process modeling, management methods, and data mining approaches in the context of treating such pandemics.

PROTECTION looks at pandemic events through a scientific lens. It aims to uncover the fundamental processes that regulate their transmission patterns, the efficacy of various intervention strategies, and the critical role of data-driven methods in shaping public health policy. Through rigorous analysis, this study not only elucidates the complexity inherent in pandemic management but also emphasizes the importance of adaptable methods based on strong analytical frameworks, with respect to the specific case studies of pandemic prevention and control. Nevertheless, the main research results can be extended to other contexts, such as general bio-informatics, vaccine campaigns, cancer-related population screenings, workplace health promotion and well-being initiatives, etc.

Figure 1: PROTECTION Architecture.

Figure 1 shows the reference architecture of our proposed framework PROTECTION. Our reference architecture consists of several component layers, namely (i) Big Data Sources layer; (ii) Big Data Storage layer; (iii) Process Modeling layer; (iv) KDDI Processes layer; (v) Big Data Analytics layer. In the following, we discuss these layers in detail.

Big Data Sources Layer. Healthcare data logs, often derivable also from Electronic Healthcare/Medical Records, include valuable information such as patient demographics, clinical symptoms, laboratory findings, treatment procedures, and utilization trends. Using such comprehensive facts, we can build complex models that reflect the spatio-temporal development of the pandemic, evaluate the success of containment methods, and improve resource allocation in healthcare settings. Furthermore, healthcare data logs allow for the inclusion of real-time information, permitting dynamic modeling techniques that react to changing epidemiological patterns and healthcare demands. The rigorous examination of these massive data sources provides useful insights for optimizing pandemic response efforts and improving public health preparedness measures.

Big Data Storage Layer. Cloud data lakes provide scalable and cost-effective storage for pandemic-management data sources such as epidemiological surveillance, genomic sequences, healthcare records, and social media sentiment analysis. Using the flexibility and accessibility of Cloud infrastructures, we can seamlessly combine diverse information, allowing for holistic modeling techniques that reflect the complex interaction of numerous factors influencing disease transmission and response tactics. Cloud data lakes thus provide a sound basis for data-driven insights, contributing to the refinement of our process modeling framework and to the optimization of pandemic mitigation efforts.

Process Modeling Layer. This layer allows stakeholders to suitably represent healthcare and clinical processes that relate either to the application of clinical guidelines or to the specific care pathways within a specific healthcare organization. Both kinds of processes need to be explicitly modeled, because both contain clinical tasks that need the support of data analysis tools, and both can be inferred through process mining so as to explicitly describe this kind of organizational knowledge.

KDDI Processes Layer. Incorporating knowledge-driven decision-making processes within data-intensive techniques applied on extensive Cloud data lakes is a very effective approach that can be adopted in this layer of PROTECTION. By leveraging advanced methodologies, such as machine learning algorithms, statistical modeling, and natural language processing, we can extract valuable insights from the diverse and voluminous big datasets stored within these repositories. By coupling big data analytics tools and techniques with the explicit representation of knowledge-driven decision-making processes, we can also gain a comprehensive understanding of the pandemic dynamics. This facilitates informed decision-making in public health policy formulation, resource allocation, and intervention strategies aimed at mitigating the spread of pandemics and minimizing their impact on society.

Big Data Analytics Layer. We can successfully manage and analyze massive amounts of heterogeneous big data stored in PROTECTION repositories by employing advanced approaches such as distributed computing frameworks (e.g., Hadoop, Spark, Hive, etc.), scalable data processing engines, and Cloud-native analytics services. Indeed, machine learning algorithms, deep learning models, and statistical techniques enable the extraction of significant insights from a wide range of datasets, including genomic sequences, clinical data, mobility patterns, sentiment analysis from social media platforms, and epidemiological records. These analytics tools and methodologies enable us to uncover hidden patterns, correlations, and helpful insights, which are crucial for driving evidence-based decision-making and establishing successful public health initiatives in response to pandemics. In particular, the strategy of PROTECTION consists of exploiting recent multidimensional big data analytics methodologies, given their proven effectiveness in several application scenarios, including healthcare analytics. In summary, these methodologies advocate applying knowledge discovery techniques over multidimensionally shaped big datasets, so as to fully benefit from powerful multidimensional modelling paradigms.

3. Related Work

In this Section, we provide an analysis of research proposals that are related to our work. We identify three relevant research areas that directly influence our work, namely: (i) pandemic data source modeling, (ii) clinical guidelines and care pathways representation and management formalisms, and (iii) process modeling and mining.

3.1. Pandemic Data Source Modelling

How do we model pandemic data sources? This challenging question can be investigated by carefully looking at the recent COVID-19 pandemic outbreak. Indeed, this critical event has attracted a lot of research in many intertwined fields, from healthcare and medicine to bioinformatics, from data science to artificial intelligence, from risk analysis to multi-parameter optimization, and so forth. Therefore, the issue of modelling and making publicly available COVID-19-related data and information (e.g., [1,2,3]) has attracted great effort from the worldwide scientific community. Among these emerging data sources, which contain directions for modelling pandemic data with specific reference to COVID-19, we can identify the following ones.

First, the European Centre for Disease Prevention and Control, an agency of the European Union, provides a huge amount of open healthcare data repositories describing the worldwide history of this pandemic [4]. One of the main sources related to the evolution of the pandemic is the COVID-19 Data Repository at Johns Hopkins University [5]. Another example of a repository of multiple datasets related to healthcare and social COVID-related issues is [6]. As for the Italian context, the Istituto Superiore di Sanità provides information and historical data about the COVID-19 healthcare situation [3].

Second, open clinical data repositories are also relevant to the scope of PROTECTION. Indeed, even though clinical datasets related to COVID-19 are complex to build and share for scientific purposes, some attempts have been made to allow scientists to analyze such data (e.g., [7,8,9]).

Further, the treatment and prevention of COVID-19 have received attention from healthcare institutions worldwide, which provide continuously evolving recommendations. These can be interpreted as authoritative clinical and healthcare guidelines, made available in the form of procedures or technical guidance for different social, healthcare and clinical contexts (e.g., [10,11,12]).

Finally, even bibliographic repositories are important sources of knowledge and information. Indeed, various publishers and health organizations have launched initiatives to make the most recent scientific articles about COVID-19 available through a shared effort (e.g., [2]).

3.2. Clinical Guidelines and Care Pathways

Clinical guidelines (GLs) consist of therapeutic and diagnostic recommendations encoding the “best practice” to care for specific patient categories. GLs are “systematically developed statements to assist practitioner and patient decisions about appropriate health care in specific clinical circumstances”. Care pathways (CPs) are instead defined as “structured multidisciplinary care plans which detail essential steps in the care of patients with a specific clinical problem” [13]. CPs are often the concrete application of GLs, where it is necessary to explicitly identify decision-based activities and all the complex clinical knowledge and data needed to suitably perform the planned activities. GLs and CPs are very relevant in PROTECTION, as they support knowledge modelling in clinical and healthcare processes.

Several formalisms and tools have been proposed to represent, execute, and verify GLs, often integrating formalized medical knowledge with data and workflow aspects and supporting monitoring of GLs over time (e.g., [14]). Reviews of the state-of-the-art of these models for Decision Support Systems (DSS) have been published in [15] and [16]. When GLs are instantiated into a CP, their execution by various actors needs to be coordinated, and this may be done both by computerized guideline systems and by Business Process Management (BPM) systems (e.g., [17,18,19]).

3.3. Process Modeling and Mining

Clinical process management may also benefit from BPM systems [19, 20], which can rely on a growing general interest and on work on many proprietary and open-source tools. A plethora of data and information is generated during the execution of clinical processes, thus fostering the adoption of BPM-like approaches to model and verify the observed behavior. The intrinsic complexity of the health field calls for models that adapt to change and that are able to deal with incomplete information, i.e., models that enjoy flexibility. At the same time, the involved entities are expected to comply with the specific medical/healthcare knowledge, regulations, norms, business rules, protocols, and temporal constraints (e.g., [21]). Such GL systems (either BPM-based or not) require medical knowledge formalization, often relying on ontologies. Ontologies have been extensively used in the medical domain for many years but still deserve research efforts, in particular focusing on process-aware knowledge representation and on data-intensive process models (e.g., [23,24,25,26]).

Finally, data from already-executed CPs would allow the discovery of “actual” processes, as well as their emerging correlations with healthcare and clinical data. Comparing designed and “actual” processes may help discover either errors in following a clinical guideline or new, partially unknown, best practices that could be suitably integrated into clinical guidelines/pathways. Recent approaches for complex processes try to take advantage of distributed architectures, tackling the mining of new processes (e.g., [27]), complex multidimensional process mining (e.g., [28]), and the monitoring of compliance of process executions (e.g., [29,30]).

4. PROTECTION: Methodology

The proposed framework PROTECTION is part of a long-term computer science and artificial intelligence project focusing on theoretical, methodological, and application-oriented aspects for developing KDDI process systems able to deal with the complex domain of pandemic control and prevention. In this Section, we describe some important aspects of the emerging methodology induced by the overall PROTECTION proposal.
From a long-term perspective, our proposed research addresses methodological issues in modeling, managing, and mining KDDI processes for pandemic management in healthcare and clinical organizations, focusing on KDDI pathways and guidelines. In particular, the proposed framework focuses on process modeling, management, and mining methodologies in order to effectively support pandemic control policies at large, with a special emphasis on the integration of these methodologies with the emerging big data trend, thus achieving the innovative definition of so-called data-centric process modeling, management and mining for pandemic scenarios. As a proof of concept, PROTECTION targets the management of pandemics.

While much attention has been devoted to healthcare and clinical data analysis and mining for pandemic management, little attention has been paid to more long-term perspectives, mainly focusing on the KDDI processes that use and generate such data. The main goal of PROTECTION is to propose a methodological approach and some related software tools to face future pandemics by considering the healthcare and clinical processes to be enacted to fight them. In summary, we shift the focus from data to KDDI processes, which have to be suitably designed and executed to bring such a critical pandemic under control through a seamless integration of knowledge-, decision-, and data-related aspects.

The information sources used to evaluate and tune the PROTECTION framework come both from open-access repositories and from some specific clinical and healthcare datasets. As for clinical/healthcare guidelines and pathways, we considered guidelines for patients from the US and Europe [4,10]. We also used the technical guidance from WHO related to both the clinical and healthcare actions for pandemics [11]. As for healthcare datasets, we considered the history-oriented dataset from Johns Hopkins University [5] for the worldwide healthcare monitoring of the pandemic. Moreover, we considered specific healthcare datasets related to the pharmacological monitoring of patients receiving therapies with monoclonal antibodies and the forthcoming pharmacovigilance activity related to pandemic-related vaccines. As for clinical datasets, we used some clinical data repositories from the pandemic research database [9] containing electronic medical records of (mainly) ambulatory patients.

In summary, the main aspects of the proposed PROTECTION framework are as follows:

- Modeling and analyzing healthcare KDDI processes dealing with the management of pandemics. Such processes must be designed and changed according to the possibly exponential diffusion of pandemics. They are characterized by many decision- and knowledge-intensive tasks. Here, integration with data (e.g., medical records, healthcare population data, and so on) and temporal constraints have to be considered. The simulation of such processes needs to be considered to estimate feasibility, resource allocation, and so on. Different technical questions have to be addressed in this direction: How can medical knowledge of pandemic-related clinical guidelines be represented? How do we merge and evaluate healthcare and clinical guidelines for pandemic prevention and patient management? How do healthcare processes change according to the evolution of the pandemic? Can we specialize healthcare pandemic control processes according to data from vaccine pharmacovigilance?

- Pandemic-related process mining, which is used to discover process models from logs. Whenever log files are not available for mining process models, the main idea is to consider both medical and healthcare records as an indirect kind of log, where therapeutic and specialized exams represent actions, main diagnoses represent (possibly) intermediate states of patients, and decisions among the allowed therapies/interventions/pathways represent knowledge-intensive decisional tasks. Here the questions are: Can we discover recurrent patterns of therapeutic actions/decisions not considered in the guidelines? Do the tasks recorded in medical records confirm the main indications of clinical and healthcare guidelines? Are there suggestions in the guidelines that are never reflected in the medical records? Can we suggest improvements to the guidelines based on the task patterns discovered from medical records? Can we discover specific recurring care patterns for specific high-risk patients undergoing monoclonal antibody therapies?

Reaching such goals would lead to significant advantages for the National Healthcare System (NHS) in promptly managing and preventing pandemic events. The progressive adoption of ICT techniques can, in fact, play a strategic role in the current rationalization process aimed at guaranteeing high-quality services while reducing costs. This holds even in a pandemic event, where management and prevention have to be enacted and monitored in a fast and dynamic way, to promptly react to diseases spreading at an exponential rate. Such a framework motivates the growing attention towards clinical and healthcare process definition and analysis.

PROTECTION pursues such goals through the development of several advanced and innovative research activities. In particular, process management in the clinical and healthcare domains is a significant topic, and we aim to address new challenges in the following research areas: ontological tools; languages based on different kinds of logics; data models and design tools for capturing events and temporal constraints; temporal extensions of GL and CP representation formalisms; constraint-based temporal reasoning; design-time and run-time GL verification; multidimensional analysis of healthcare processes; declarative and incremental process mining methods.

It should be noted here that, even though the above-mentioned aspects are strictly related, they have so far been considered in isolation and not yet applied cooperatively to managing worldwide pandemics. Starting from this limitation, PROTECTION aims to provide a set of methodologies and prototype software tools for the process-oriented prevention and management of worldwide pandemics.

5. Conclusions

This paper has proposed PROTECTION, an innovative data-centric process modelling, management and mining framework for pandemic control and prevention that is based on the well-known KDDI processes paradigm. Future work is currently focused on further experimentally testing the capabilities of the framework.

Acknowledgements

This research is supported by the ICSC National Research Centre for High Performance Computing, Big Data and Quantum Computing within the NextGenerationEU program (Project Code: PNRR CN00000013).

References

[1] Open-Access Data and Computational Resources to Address COVID-19. https://datascience.nih.gov/covid-19-open-access-resources.
[2] The World Health Organization, Global Research on Coronavirus Disease (COVID-19). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/global-research-on-novel-coronavirus-2019-ncov.
[3] Epidemiology for Public Health. Istituto Superiore di Sanità. https://www.epicentro.iss.it/coronavirus/.
[4] European Centre for Disease Prevention and Control, Coronavirus Threats and Outbreaks: COVID-19 Pandemic. https://www.ecdc.europa.eu/en/covid-19-pandemic.
[5] Johns Hopkins University, COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE). https://github.com/CSSEGISandData/COVID-19.
[6] The COVID-19 Data Repository. https://www.openicpsr.org/openicpsr/covid19.
[7] Carbon Health and Braid Health, Coronavirus Disease 2019 Clinical Data Repository. https://covidclinicaldata.org.
[8] Data Science for COVID-19 (DS4C) in South Korea. https://www.kaggle.com/kimjihoo/coronavirusdataset.
[9] COVID-19 Research Database. https://covid19researchdatabase.org/.
[10] NIH, Coronavirus Disease 2019 (COVID-19) Treatment Guidelines. https://pubmed.ncbi.nlm.nih.gov/39348691/.
[11] WHO, Country & Technical Guidance - Coronavirus Disease (COVID-19). https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance-publications.
[12] European Respiratory Society, COVID-19: Guidelines and Recommendations Directory. https://www.ersnet.org/covid-19/covid-19-guidelines-and-recommendations-directory/.
[13] Campbell, H., Hotchkiss, R., Bradshaw, N., Porteous, M.: Integrated Care Pathways. British Medical Journal 316(7125), 133–137 (1998).
[14] Combi, C., Keravnou-Papailiou, E., Shahar, Y.: Temporal Information Systems in Medicine. Springer Science & Business Media (2010).
[15] Peleg, M.: Computer-Interpretable Clinical Guidelines: A Methodological Review. Journal of Biomedical Informatics 46(4), 744–763 (2013).
[16] Wright, A., Sittig, D. F., Ash, J. S., Feblowitz, J., Meltzer, S., McMullen, C., et al.: Development and Evaluation of a Comprehensive Clinical Decision Support Taxonomy: Comparison of Front-End Tools in Commercial and Internally Developed Electronic Health Record Systems. Journal of the American Medical Informatics Association 18(3), 232–242 (2011).
[17] Peleg, M.: Clinical Decision Support: The Road Ahead. Guidelines and Workflow Models. San Diego, US, Elsevier, 281–306 (2007).
[18] Quaglini, S., Stefanelli, M., Lanzola, G., Caporusso, V., Panzarasa, S.: Flexible Guideline-Based Patient Careflow Systems. Artificial Intelligence in Medicine 22(1), 65–80 (2001).
[19] Combi, C., Oliboni, B., Zardini, A., Zerbato, F.: A Methodological Framework for the Integrated Design of Decision-Intensive Care Pathways - An Application to the Management of COPD Patients. Journal of Healthcare Informatics Research 1, 157–217 (2017).
[20] Combi, C., Gambini, M., Migliorini, S., Posenato, R.: Representing Business Processes through a Temporal Data-Centric Workflow Modeling Language: An Application to the Management of Clinical Pathways. IEEE Trans. on Sys., Man, and Cybernetics: Systems 44(9), 1182–1203 (2014).
[21] Chesani, F., Mello, P., Montali, M.: Abductive Reasoning on Compliance Monitoring: Balancing Flexibility and Regulation. In Foundations of Intelligent Systems: 23rd International Symposium, ISMIS 2017, pp. 3–16 (2017).
[22] Austin, C.A., Mohottige, D., Sudore, R.L., Smith, A.K., Hanson, L.C.: Tools to Promote Shared Decision Making in Serious Illness: A Systematic Review. JAMA Internal Medicine 175(7), 1213–1221 (2015).
[23] Schulz, S., Jansen, L.: Formal Ontologies in Biomedical Knowledge Representation. Yearbook of Medical Informatics 22(1), 132–146 (2013).
[24] Cohn, D., Hull, R.: Business Artifacts: A Data-Centric Approach to Modeling Business Operations and Processes. IEEE Data Eng. Bull. 32(3), 3–9 (2009).
[25] Calvanese, D., De Giacomo, G., Montali, M.: Foundations of data-aware process analysis: a database theory perspective. In Proceedings of the 32nd ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pp. 1–12 (2013).
[26] Artale, A., Kovtunova, A., Montali, M., van der Aalst, W.M.: Modeling and reasoning over declarative data-aware processes with object-centric behavioral constraints. In Business Process Management: 17th International Conference, BPM 2019, pp. 139–156 (2019).
[27] Sun, S. X., Zeng, Q., Wang, H.: Process-Mining-Based Workflow Model Fragmentation for Distributed Execution. IEEE Trans. on Sys., Man, and Cybernetics-Part A: Systems and Humans 41(2), 294–310 (2010).
[28] Knoll, D., Reinhart, G., Prüglmeier, M.: Enabling Value Stream Mapping for Internal Logistics using Multidimensional Process Mining. Expert Systems with Applications 124, 130–142 (2019).
[29] Rinderle-Ma, S., Winter, K., Benzin, J.-V.: Predictive Compliance Monitoring in Process-Aware Information Systems: State of The Art, Functionalities, Research Directions. Information Systems 115, art. 102210 (2023).
[30] Loreti, D., Chesani, F., Ciampolini, A., Mello, P.: A Distributed Approach to Compliance Monitoring of Business Process Event Streams. Future Generation Computer Systems 82, 104–118 (2018).