<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Process Data Science for Workflow Optimization in Digital Pathology: A status report</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Patrick</forename><surname>Stünkel</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Pathology</orgName>
								<orgName type="institution">Haukeland University Hospital</orgName>
								<address>
									<settlement>Bergen</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sabine</forename><surname>Leh</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Pathology</orgName>
								<orgName type="institution">Haukeland University Hospital</orgName>
								<address>
									<settlement>Bergen</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="department">Department of Research and Development</orgName>
								<orgName type="institution">Haukeland University Hospital</orgName>
								<address>
									<settlement>Bergen</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Friedemann</forename><surname>Leh</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Pathology</orgName>
								<orgName type="institution">Haukeland University Hospital</orgName>
								<address>
									<settlement>Bergen</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Process Data Science for Workflow Optimization in Digital Pathology: A status report</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">BDEEF382E08BD5A75F923F4557460F58</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T10:20+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Pathology</term>
					<term>Process Mining</term>
					<term>Workflow Modelling</term>
					<term>Event Log Data</term>
					<term>Process Analysis</term>
					<term>Data Management</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Pathology is the study of causes and effects of diseases. It is an integral part of medical diagnostics based on the microscopic analysis of tissue, cells, or body fluids. Like other medical disciplines, pathology is currently undergoing a "digital transformation", i.e., witnessing a transition from the assessment of physical tissue slides under a microscope towards analysing digital images of the same tissue slides on a computer screen. The recent advent of powerful machine learning methods and tools for digital image analysis opened the door to novel ways of conducting pathological diagnostics. Still, in order to yield a digital image of a specimen, the specimen has to pass through an elaborate multi-stage preparation process in the laboratory. We argue that in order to achieve a holistic framework of digital pathology, one must not only consider digital image analysis techniques, but also consider means for analysing the process as a whole. Concretely, we propose to analyse the event log data of the laboratory information system in order to understand flow patterns of specimens, find bottlenecks, predict the amounts of incoming samples, and plan resource allocations in an optimal manner. This is highly relevant to meet the ever-increasing number and complexity of specimens, that are handled by pathology departments around the world. The data science method working with event data is called process mining. Process mining is a relatively young but growing research discipline that seeks to bridge the gap between classic data science and business process management. It enables the discovery of control flow structures, data flow patterns, resource utilization, process performance, and more. This paper represents a report on the current state of a work-in-progress project on process mining at a large regional hospital in Western Norway. 
The main contribution of this report is a list of concrete challenges that we encountered when conducting a process mining project in pathology, some of which, we believe, have received less attention in the literature so far. Concretely, we find current process mining techniques not perfectly suited to be directly applied to the pathology laboratory process.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>"Good health and well-being" is the third of the United Nations' 17 sustainable development goals. Health care is an integral public service that every government around the globe has to provide for its population. A growing and increasingly older population combined with the limited availability of trained medical personnel complicates the delivery of such health care services. For example, in its 2013 report, the OECD states that health care accounts for roughly 10% of the gross domestic product of its member states, and this share is expected to grow even further in the future <ref type="bibr" target="#b0">[1]</ref>. Information and communication technology (ICT) is seen as an opportunity to address the aforementioned issue by supporting health care professionals in their daily work and by offloading repetitive, and thus automatable, tasks onto machines in order to utilize the limited human resources more efficiently <ref type="bibr" target="#b1">[2]</ref>. Traditional applications of ICT are information systems <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref>, which make the "right" information available at the "right time" to the "right people". Well-known examples of health care information systems are electronic health record systems <ref type="bibr" target="#b4">[5]</ref>. Another application of ICT in healthcare lies in the area of computer-aided diagnostics. The latter is mainly facilitated by recent breakthroughs in the field of artificial intelligence/machine learning (AI/ML) in the context of medical image analysis <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7]</ref>.</p><p>Pathology is a diagnostic medical discipline that, through microscopy of tissues, cells, and fluids, often in combination with molecular diagnostics, determines the presence of diseases as well as morphological and molecular abnormalities. 
With the increasing availability of so-called whole slide scanners and image viewing software, pathology, too, is becoming increasingly digitized <ref type="bibr" target="#b7">[8]</ref>, i.e., the examination is no longer performed under a microscope but on a computer screen. The latter enables the application of AI/ML methods for automatic image analysis <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10]</ref>. Still, in order to arrive at a diagnostic result, specimens have to undergo an elaborate preparation process beforehand. While "classical data science" methods are commonplace in image analysis (classification, clustering, etc.), "process data science" is less prevalent. The latter is also known as process mining <ref type="bibr" target="#b10">[11]</ref>, i.e., the discovery of process models from event log data. There are several reports on successful applications of process mining in healthcare, see <ref type="bibr" target="#b11">[12]</ref> for a comprehensive survey, but none for pathology in particular.</p><p>The goal of this paper is to present an ongoing project at the pathology department of a regional hospital on the west coast of Norway, namely Haukeland universitetssjukehus, in the following abbreviated as HUS. In this project, the present authors are applying process mining techniques to the preparation workflow of specimens in the pathology laboratory in order to analyse cycle times, detect possible bottlenecks, and, in the long run, optimize the flow times of the samples. The project is still at an early stage, but already from first experiences, we can report on some issues that have received less attention in the process mining literature. 
Thus, this paper aims to shed more light on the possibilities of process mining in pathology and on the intricacies that arise on the organizational, methodological, technical, and social level when conducting such a project, as well as to present our approach to addressing these problems.</p><p>The paper is structured as follows: Section 2 introduces the problem and solution domain of this project, namely pathology and process mining. Afterwards, Section 3 presents the project itself, its context, and goals. Section 4 presents the challenges that we have faced until now, those we are facing right now, and those we expect to encounter in the future. Section 5 presents our approach to one of our main challenges, i.e., managing sensitive and heterogeneous data. Finally, Section 6 concludes this paper.</p></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Problem Domain: Pathology</head><p>The term pathology comprises the two Greek words "pathos" (suffering) and "logos" (study) and hence literally translates as the "study of diseases". Today, it is understood in a narrower sense as the "study of causes and effects of diseases". A pathologist takes the role of a consultant towards another clinician, who is providing primary care to a patient. The primary clinician takes a specimen from the patient, e.g., a tissue sample, and sends it to the pathologist, who examines the specimen and writes a report, most often with a conclusive diagnosis, which helps the clinician decide on further treatment, e.g., whether surgery or chemotherapy has to be scheduled. The historical development of pathology is closely related to the historical development of medicine itself and is characterized by several technological breakthroughs.</p><p>Starting with cultural changes in Europe during the 16th century, autopsy (i.e., the examination of human corpses) became possible, elucidating the understanding of the human body, its organs, and the effects of diseases. 
With the use of the microscope to study body tissues during the 19th century, histology, a.k.a. microscopic anatomy, was established as a discipline. Most recently, methods and techniques within immunohistochemistry and molecular biology enabled further means to understand and diagnose diseases on the cellular and molecular level. When talking about pathology, one distinguishes between the sub-disciplines autopsy, histology, cytology (analysis of cell specimens), and molecular pathology. Here, we will focus on histopathology.</p><p>In order to yield a histological slide, which can be analysed under a microscope, the specimen has to undergo a preparation process. This process is abstractly visualized in Fig. <ref type="figure" target="#fig_0">1</ref> in the form of a Petri net <ref type="bibr" target="#b12">[13]</ref>. When a specimen arrives at the pathology laboratory, it is first assigned to a case ("Accessioning"), i.e., various metadata (patient data, information about the sample type, clinical inquiries) are aggregated in the laboratory information system (LIS), a priority is assigned, and the specimens are labelled with a lab-internal identifier. In most modern labs, this identifier has the form of an industrial barcode, which enables electronic tracing throughout the process. When the specimen has been immersed in a fixative solution (e.g., formalin) for a sufficient amount of time, it can be delivered to the next stage of the process: "Grossing".</p><p>Here, the tissue is examined on a macroscopic level (i.e., "with the naked eye") for abnormal findings and marked. In the case of larger specimens, slices with findings of interest are selected from the specimen. Tissues are placed in a cassette and delivered to "Processing". This step is performed by a specialized machine that automates dehydration, clearing, and infiltration of the tissue with paraffin wax. Afterwards, the processed tissue is taken to "Embedding". 
This means that it is placed in molten paraffin wax to form a so-called block. The cooled-down paraffin block is mounted on a microtome, which allows cutting very thin slices (approximately 3-4 µm) from the tissue-paraffin block. The slices are placed on a glass slide and delivered to the "Staining" process step. Here, the slide is put through different chemicals, which amplify contrasts and highlight certain biological structures, e.g., hematoxylin stains cell nuclei blue and eosin stains cell bodies (cytoplasm) red. Finally, a protective cover-slip is mounted on top of the stained tissue slice and the slide is ready to be analysed by a pathologist.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Solution Domain: Process Mining</head><p>Process mining is a scientific approach that bridges business process management (BPM) and data science. The former is an interdisciplinary field with roots in Taylor's theory of "Scientific Management" <ref type="bibr" target="#b13">[14]</ref> and gained significant attention during the 1990s when enterprise resource planning software and process-aware information systems were introduced in many organizations <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b15">16]</ref>. BPM advocates organizing a business around the services that are delivered and the processes that are executed. The associated academic discipline is concerned with all aspects of identifying, analysing, and (re-)designing such business processes. Data science is another interdisciplinary field that brings together statistics, computer science, and other related disciplines <ref type="bibr" target="#b16">[17]</ref>. Its increasing popularity and significance are mainly due to the abundant availability of "big" data, allowing businesses to gain new insights <ref type="bibr" target="#b17">[18]</ref>.</p><p>While "classic" data science focuses on the derivation of prediction variables (structural features) from a set of given predictor variables, process mining is about the discovery of process models (dynamical features) from event data. Process mining started as a project proposal in the late 1990s at Eindhoven University of Technology and has since then grown into its own discipline, with an active community holding conferences and workshops<ref type="foot" target="#foot_0">1</ref>. 
In terms of publications, <ref type="bibr" target="#b18">[19]</ref> and <ref type="bibr" target="#b19">[20]</ref> are considered to be the seminal papers in this line of research, while the textbook <ref type="bibr" target="#b10">[11]</ref> provides the most recent comprehensive overview of the field.</p><p>The principal idea of process mining is sketched in Fig. <ref type="figure" target="#fig_1">2</ref>. The base data set is called an event log: a collection of events, where each event must contain at least (i) a case identifier (to group a set of events w.r.t. a case), (ii) a timestamp (to order the execution of activities within a case), and (iii) the name of an activity (to identify the activities within a case). The first step after obtaining an event log is to identify the control flow structure of the process model, i.e., the order in which activities may be executed. This is called "play-in" <ref type="bibr" target="#b10">[11]</ref>. With a control flow model and an event log at hand, one may do a "replay" <ref type="bibr" target="#b10">[11]</ref>. This means simulating the execution of the event log on the control flow model, which enables conformance checking <ref type="bibr" target="#b20">[21]</ref>, i.e., verifying whether there are deviations between the process model and the event log. Moreover, one is able to discover additional perspectives of a process model. These perspectives are called data, resource, and time <ref type="bibr" target="#b10">[11]</ref>. The data perspective highlights how certain properties of a process instance affect the paths that the case takes in the control flow structure. The resource perspective highlights the resources that are required for the execution of certain activities. Finally, the time perspective looks into the execution times of activities and cases. 
Hence, process mining does not only encompass the discovery of control flow, but also enables the detection of bottlenecks (performance analysis) and hidden dependencies (data flow analysis).</p></div>
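<div xmlns="http://www.tei-c.org/ns/1.0"><p>The minimal event log structure described above (case identifier, timestamp, activity name) can be illustrated with a small sketch. The following Python code is not taken from the project; the toy events and the function names are assumptions made purely for illustration. It groups events into traces and counts the directly-follows relation, which is one common starting point for "play-in".</p>
<p><code xml:space="preserve">
```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical toy event log: (case id, ISO timestamp, activity name).
EVENTS = [
    ("c1", "2023-01-02T08:00", "Accessioning"),
    ("c1", "2023-01-02T10:30", "Grossing"),
    ("c1", "2023-01-02T18:00", "Processing"),
    ("c2", "2023-01-02T08:15", "Accessioning"),
    ("c2", "2023-01-02T11:00", "Grossing"),
    ("c2", "2023-01-03T07:00", "Processing"),
    ("c2", "2023-01-03T09:00", "Embedding"),
]


def traces(events):
    """Group events by case and order each group by timestamp."""
    by_case = defaultdict(list)
    for case_id, stamp, activity in events:
        by_case[case_id].append((datetime.fromisoformat(stamp), activity))
    return {case: [a for _, a in sorted(evts)] for case, evts in by_case.items()}


def directly_follows(events):
    """Count how often activity a is directly followed by activity b."""
    counts = Counter()
    for trace in traces(events).values():
        counts.update(zip(trace, trace[1:]))
    return counts


if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(EVENTS).items()):
        print(f"{a} -> {b}: {n}")
```
</code></p>
<p>Actual discovery algorithms build considerably richer models on top of such counts; the sketch only shows why the three mandatory event attributes suffice to order activities within a case.</p></div>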
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">The Project's Goal</head><p>According to <ref type="bibr" target="#b21">[22]</ref>, a (process) data science project may seek answers to one or more of the following questions:</p><p>• "What happened?"</p><p>• "Why did it happen?"</p><p>• "What will happen?"</p><p>• "What is the best that can happen?"</p><p>These questions can be associated with four activities: report, analyse, predict, and plan, which are depicted in the upper left quadrant of Fig. <ref type="figure" target="#fig_2">3</ref>. These activities also correspond to the sub-goals of our project at HUS. The overarching objective of the project is to reduce the overall cycle time in the pathology department, i.e., the time from receiving a specimen to sending a diagnostic report back. This is a highly relevant concern because of four key challenges the pathology department at HUS is facing: long cycle times, an increasing number of specimens, an increasing number of analyses per specimen, and a more or less constant number of resources. In order to answer the first question, "What happened?", a precise model of the current process is required. The right-hand side of Fig. <ref type="figure" target="#fig_2">3</ref> shows the kinds of models involved. Descriptive models are representations of reality and are a result of the reporting activity. Predictive models (e.g., simulations) allow making forecasts about the future and are produced in the prediction activity based on existing descriptive models. They play an important role in arriving at prescriptive models. The latter are also called specifications. They steer how the actual operational tasks are performed. There are many examples of prescriptive models: they range from more abstract work guidelines (e.g., in what order blocks shall be cut on a microtome) through daily plans (e.g., worker assignments to process steps) to the level of concrete machine instructions (e.g., a computer program that routes specimens into different pathways). 
The ultimate objective of this project is to create such prescriptive models in order to reduce the cycle times in the laboratory.</p></div>
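<div xmlns="http://www.tei-c.org/ns/1.0"><p>The cycle time, i.e., the time from receiving a specimen to sending the diagnostic report back, can be computed directly from an event log. The following Python sketch is an illustration under assumptions: the toy events and the activity name "Report sent" are hypothetical and do not stem from the LIS at HUS.</p>
<p><code xml:space="preserve">
```python
from datetime import datetime
from statistics import median

# Hypothetical events: (case id, ISO timestamp, activity name).
EVENTS = [
    ("c1", "2023-01-02T08:00", "Accessioning"),
    ("c1", "2023-01-04T15:00", "Report sent"),
    ("c2", "2023-01-02T09:00", "Accessioning"),
    ("c2", "2023-01-03T09:00", "Report sent"),
    ("c3", "2023-01-02T10:00", "Accessioning"),
    ("c3", "2023-01-07T10:00", "Report sent"),
]


def cycle_times_hours(events):
    """Return hours between the first and last event of every case."""
    first, last = {}, {}
    for case_id, stamp, _activity in events:
        t = datetime.fromisoformat(stamp)
        first[case_id] = min(first.get(case_id, t), t)
        last[case_id] = max(last.get(case_id, t), t)
    return {c: (last[c] - first[c]).total_seconds() / 3600 for c in first}


# Median cycle time over all cases, in hours.
MEDIAN_HOURS = median(cycle_times_hours(EVENTS).values())
```
</code></p>
<p>A descriptive report of this kind answers "What happened?"; the later activities (analyse, predict, plan) would build on such figures.</p></div>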
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Challenges …</head><p>For the first phase of our project (i.e., reporting), process mining has been selected as a methodology. In this section, we report on our experiences one and a half years after starting a process mining project in pathology.</p><p>In <ref type="bibr" target="#b10">[11]</ref>, van der Aalst presents the so-called 𝐿*-model of process mining. It describes the architecture, stages, and activities of a process mining project. It is motivated by CRISP-DM <ref type="bibr" target="#b22">[23]</ref>, a cross-industry reference model for conducting data science projects. Fig. <ref type="figure" target="#fig_3">4</ref> contains a graphical depiction of the 𝐿*-model, taken from <ref type="bibr" target="#b10">[11]</ref>, augmented with situations where we experienced or expect to experience concrete issues. Our project currently finds itself in stage two of this model. Thus, this section mainly focuses on the first three issues.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">…until now, …</head><p>In the preliminary stage of an 𝐿*-project, one has to justify the purpose of the project and apply for data access. Since we are conducting our project within the health care domain, there are especially strict requirements concerning access to data: the project had to apply for exemption from the duty of confidentiality, conduct a data protection impact assessment, carry out a risk analysis, and establish a data management plan.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Issue #1 (Organizational)</head><p>There are cyclic dependencies when writing data access applications.</p><p>When writing these applications, we found that providing the required documentation about what kind of data we need to extract and how we plan to safeguard privacy requires extensive knowledge about the database of the laboratory information system. To overcome this "chicken-and-egg" problem, it was essential to identify key persons who have both clearance for accessing the database (because of their regular job description) and a sufficient understanding of the objectives of the process mining project. In our experience, this can be a challenging endeavour because these persons are often occupied with their operational work. A complementary approach is to have the process data scientists sign respective non-disclosure agreements (NDAs). This requires that a legal framework is already in place. Otherwise, legal personnel have to be involved in the project. In our case, the solution was to employ the primary technical investigator at the hospital.</p><p>After the applications are approved, the first stage of the process mining project (extraction) is entered. Here, the goal is to obtain event logs that can be processed by process mining algorithms.</p><p>Issue #2 (Technical)</p><p>The source information system generally does not offer a viable event log structure.</p><p>The concept of an event log has been introduced in Sect. 2.2. In our case, the LIS logs all types of analyses performed on specimens, including histological slide preparation. The main challenge, however, is that the LIS database does not directly provide the relevant events. The latter have to be extracted by combining records from several tables. In addition, not every process step is always tracked. 
For example, in our lab, there is no explicit registration of when the staining of a slide begins (there is only a notification when it is finished). However, it is possible to infer the start timestamp when the executed staining programme is known. In a different situation, i.e., to identify when the grossing or microscopic analysis is started and finished, a separate "user access log" table has to be consulted to retrieve this information. Another issue is that the granularity of the logged events varies greatly, e.g., the system logs some internal function calls, which are not relevant for our analysis. Furthermore, event names in the database are at times cryptic and ambiguous, which requires combining multiple fields and context information to map event records to real lab actions. Moreover, case meta-information and resource-specific event attributes are sometimes missing (e.g., at what workstation an activity was performed). Bose et al. <ref type="bibr" target="#b23">[24]</ref> discuss such "data quality" issues and group them into three categories: (i) the event log does not contain events that really happened, (ii) the event log contains more events than in reality, and (iii) the real events are concealed in the log. All three categories apply in our case.</p><p>An uncleansed log leads to unwanted results during the process discovery phase. By conducting several small iterations, where we extracted a small excerpt of raw events from the database, mapped them to an event log, and performed process discovery on it, we could quickly see that a "naive" approach leads to inappropriate results. In our case, it was possible to assess the "quality" of the event log through the resulting control flow model because we have a clear understanding of what the general process should look like, see Sect. 2.1.</p><p>Thus, we had to deviate from the principle of "keeping the event data as raw as possible" <ref type="bibr" target="#b24">[25]</ref> and to design a transformation from the LIS database structure to an appropriate event log. Designing this transformation, however, required extensive knowledge of both the information system and the domain. To bridge the gap between the domain and IT experts, we had positive experiences with regular meetings where both sides could exchange their knowledge and with giving the IT experts direct "hands-on" experience in the laboratory. Seeing how the lab technicians work with the LIS helped immensely in understanding how the system digitally represents the physical actions in reality.</p></div>
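<div xmlns="http://www.tei-c.org/ns/1.0"><p>Two of the cleansing steps mentioned above, dropping irrelevant low-level events and inferring the start of a staining step from its end timestamp and the executed programme, can be sketched as follows. This Python code is an illustration only: the event codes, programme names, durations, and field names are hypothetical assumptions and do not reflect the actual LIS schema.</p>
<p><code xml:space="preserve">
```python
from datetime import datetime, timedelta

# Hypothetical programme durations in minutes; real values are lab-specific.
PROGRAMME_MINUTES = {"HE-standard": 45, "PAS": 60}

# Hypothetical mapping from cryptic LIS codes to lab activities.
# Codes without an entry (e.g. internal function calls) are dropped.
CODE_TO_ACTIVITY = {"ST_FIN": "Staining", "GR_OK": "Grossing"}

RAW = [
    {"case": "c1", "code": "ST_FIN", "timestamp": "2023-01-03T10:00",
     "programme": "HE-standard"},
    {"case": "c1", "code": "SYS_CALL", "timestamp": "2023-01-03T09:00"},
    {"case": "c1", "code": "GR_OK", "timestamp": "2023-01-02T12:00"},
]


def to_event_log(raw_records):
    """Map raw LIS-like records to events with start and end timestamps."""
    log = []
    for rec in raw_records:
        activity = CODE_TO_ACTIVITY.get(rec["code"])
        if activity is None:
            continue  # irrelevant granularity, e.g. internal system calls
        end = datetime.fromisoformat(rec["timestamp"])
        start = end
        minutes = PROGRAMME_MINUTES.get(rec.get("programme"))
        if activity == "Staining" and minutes is not None:
            # Only the end is logged; infer the start from the programme.
            start = end - timedelta(minutes=minutes)
        log.append({"case": rec["case"], "activity": activity,
                    "start": start, "end": end})
    return sorted(log, key=lambda e: (e["case"], e["start"]))
```
</code></p>
<p>Keeping the mappings as plain data rather than code is a deliberate choice in this sketch, since such tables can be reviewed by domain experts.</p></div>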
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">…right now, …</head><p>The second stage of 𝐿* describes the transition from an event log to a control-flow process model. This is facilitated by a process discovery algorithm.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Issue #3 (Methodological)</head><p>The existing process mining algorithms are not perfectly suited for the specimen preparation workflow in the pathology laboratory.</p><p>There is a plethora of process discovery algorithms, see <ref type="bibr" target="#b10">[11]</ref> for an overview. All of these algorithms are based on the notion of atomic token-based workflow modelling languages, i.e., a case is represented as an atomic token that flows through a net structure representing the control flow. The token may become duplicated if activities are performed in parallel but, in general, the case is not decomposed during the execution of the process. In pathology, however, there is a hierarchy of different token types flowing through the lab at the different process steps: a diagnostic request (i.e., case) may contain multiple specimens, which can become multiple cassettes/blocks, which again may result in multiple slides. The fact that a pathologist can order additional analyses in between (i.e., creating new blocks and/or slides) requires considering all these artefacts on different levels of granularity at the same time.</p><p>When we experimented with the various process discovery algorithms implemented in the open-source tool ProM<ref type="foot" target="#foot_1">2</ref>, most algorithms produced unwanted results: in most cases, they simply returned a control flow where, in principle, all activities could happen in parallel. The fuzzy miner <ref type="bibr" target="#b25">[26]</ref> algorithm produced the "best" result so far, in the sense that it discovered the general structure of Fig. <ref type="figure" target="#fig_0">1</ref>. Still, the algorithm was not able to discover the correct causal dependencies between less frequent process steps, and when decreasing the abstraction level, "spurious cycles" appeared on all activities. 
The latter phenomenon can be explained by the fact that the process steps happen in parallel while operating at different levels of granularity.</p><p>Different granularity levels are discussed in <ref type="bibr" target="#b10">[11,</ref><ref type="bibr">Chap. 5.5]</ref>, where the aforementioned atomic token abstraction is described as "flattening". The chapter mentions the idea of "proclets" <ref type="bibr" target="#b26">[27]</ref>, i.e., disassembling the overall process into several processes operating at different levels of granularity, and refers to a research project (ACSI project) that promotes the use of such proclets. However, the referenced website no longer seems to be active.</p><p>In our case, we are more or less aware of what the control flow must look like. Thus, process discovery algorithms are actually less interesting for us and we can resort to creating a precise process model by hand. The latter is a confirming sign that we are dealing with a so-called "Lasagna" process <ref type="bibr" target="#b10">[11]</ref>, i.e., a process with a simple and well-understood control flow. We found that coloured Petri nets (CPNs) <ref type="bibr" target="#b27">[28]</ref> are an appropriate formalism for our case, since they naturally model the idea of different types of tokens flowing through the net. Hence, our immediate next objective is to design the pathology lab process in the form of a coloured Petri net and to extend the notion of event-log replay on Petri nets with the notion of different token types. This is necessary to obtain the performance information of the individual process steps and different types of specimens.</p></div>
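<div xmlns="http://www.tei-c.org/ns/1.0"><p>The effect of "flattening" can be reproduced on a toy example. The following Python sketch is an assumption-laden illustration, not the analysis performed in the project: the events, the activity name "Cutting", and the logical timestamps are hypothetical. It contrasts the directly-follows relation obtained when all events are grouped by case with the one obtained when they are grouped by slide.</p>
<p><code xml:space="preserve">
```python
from collections import Counter, defaultdict

# Hypothetical events: (case, slide, logical time, activity).
# One case yields two slides that are processed interleaved.
EVENTS = [
    ("case1", "slide1", 1, "Cutting"),
    ("case1", "slide2", 2, "Cutting"),
    ("case1", "slide1", 3, "Staining"),
    ("case1", "slide2", 4, "Staining"),
]


def directly_follows(events, key_index):
    """Directly-follows counts when events are grouped by the given column."""
    groups = defaultdict(list)
    for event in sorted(events, key=lambda e: e[2]):
        groups[event[key_index]].append(event[3])
    counts = Counter()
    for trace in groups.values():
        counts.update(zip(trace, trace[1:]))
    return counts


flattened = directly_follows(EVENTS, key_index=0)  # group by case
per_slide = directly_follows(EVENTS, key_index=1)  # group by slide
```
</code></p>
<p>Grouped by case, the relation contains self-loops such as "Cutting" followed by "Cutting", whereas grouping by slide yields only "Cutting" followed by "Staining". This is one plausible reading of the spurious cycles described above; it does not replace a model with typed tokens such as a coloured Petri net.</p></div>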
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3.">…and later</head><p>According to the 𝐿*-model, our project is currently in stage two. Yet, we want to give an outlook on the issues that we expect to arise in the coming stages. The third stage is the creation of an integrated model, i.e., a process model combining the notions of control flow, data, resources, and time. This will, for the first time, allow giving feedback to the original process. The 𝐿*-model discusses several options for this, namely redesigning (changing the whole process model), adjusting (changing the process configuration, for example, resource allocations), or intervening (performing concrete actions during the execution of a process instance).</p><p>Issue #4 (Practical)</p><p>It is not exactly clear how process mining observations can be translated into actions.</p><p>We are currently uncertain how we can eventually turn the analytical results into operational results. For instance, there are physical limitations on the degree to which a "redesign" of the process is possible. The literature mentions approaches for transitioning from process mining to simulation <ref type="bibr" target="#b28">[29]</ref>. However, it does not mention specific methodologies for arriving at means of operational support, the final stage of 𝐿*.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Issue #5 (Social)</head><p>It is not clear how to best anticipate and mitigate social ramifications.</p><p>The final objective is to reduce the overall cycle times via intelligent planning of resources and routing of specimens. When automatically assigning tasks to individual workers, individual skills, individual preferences for particular tasks, and the laboratory's current need for specific activities all matter. There is a theoretical possibility to assess the performance data of individual workers. Our project therefore has to ensure that such an assessment remains infeasible. Currently, we are hashing all usernames with a random and hidden salt. When designing reporting solutions, we have to make sure that performance data is only presented aggregated over multiple cases, such that it is not possible to identify individuals from context information of a single case. In all of this, it is paramount to include all stakeholders in the project to make them aware of the technical possibilities and the data stored in the system. Even though this issue lies in the more distant future, it is important to be aware of it already.</p></div>
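<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two safeguards mentioned above, hashing usernames with a hidden salt and reporting performance data only in aggregated form, can be sketched in Python. The sketch is a minimal illustration under assumptions: it uses HMAC-SHA256 as one possible keyed hash (the text only states that a salted hash is used), and the function names and the threshold of five cases are hypothetical choices.</p>
<p><code xml:space="preserve">
```python
import hashlib
import hmac
import secrets
from collections import defaultdict
from statistics import mean

# Random salt kept secret and outside the exported data (assumption).
SALT = secrets.token_bytes(32)


def pseudonymize(username, salt=SALT):
    """Return a keyed SHA-256 digest of the username."""
    return hmac.new(salt, username.encode("utf-8"), hashlib.sha256).hexdigest()


def mean_duration_per_activity(records, min_cases=5):
    """Average durations per activity, suppressing small groups.

    records: iterable of (activity, case id, duration in minutes).
    Activities observed in fewer than min_cases distinct cases are omitted.
    """
    durations = defaultdict(list)
    cases = defaultdict(set)
    for activity, case_id, minutes in records:
        durations[activity].append(minutes)
        cases[activity].add(case_id)
    return {a: mean(d) for a, d in durations.items()
            if len(cases[a]) >= min_cases}
```
</code></p>
<p>Because the salt is generated at import time in this sketch, pseudonyms are only stable within one run; a real pipeline would have to store the salt securely so that exports from different runs remain linkable.</p></div>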
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Executable Data Management: A Model-based Approach</head><p>In Sect. 4 we have seen that the raw event data poses several challenges. First, there is an (organizational) challenge in gaining access to it, which necessitates documenting what is extracted and how sensitive data is protected. Second, there is a (technical) challenge in mapping the raw data into an event log so that it can be used for process mining.</p><p>It turns out that metadata plays a crucial role in addressing these challenges. Metadata documents serve both as documentation and as specification for extraction and transformation. Since they have to be put under version control anyway to enable auditing, revision, and iterative development, one might as well use these documents more "directly". Hence, we decided to adopt a model-based paradigm <ref type="bibr" target="#b29">[30]</ref> and consider these artefacts not merely as a means of documentation (descriptive) but also as a means to configure the extraction and transformation. The resulting architecture is shown in Fig. <ref type="figure">5</ref>. The bottom half of the figure shows the data layer. The data "flows" from left to right, starting from the LIS database with the raw data. In the first step, the contents of relevant tables are exported as comma-separated values (CSV) files, where the contents of columns containing sensitive information are hashed. In the second step, this data is transformed into an event log structure. This transformation step has to address the challenges related to data quality, see Sect. 4.1. Eventually, the event log is replayed on the process model to obtain performance data about case and activity durations.</p><p>The top half of Fig. <ref type="figure">5</ref> contains the metadata documents. 
The database is described via SQL CREATE TABLE statements, which were manually extracted from a PDF provided by the LIS supplier. An Excel sheet declares which columns are extracted and which column contents are hashed. In our case, Excel turned out to be a viable compromise: domain experts are familiar with it and, at the same time, it can easily be integrated into automated toolchains. Similarly, the mapping of event codes from the LIS to the individual process steps is defined in an Excel sheet. For the latter, we first created a domain model of histopathology. The domain model has the form of a class diagram and is encoded using Ecore <ref type="bibr" target="#b30">[31]</ref>, a standard serialization format in the context of model-based engineering. Moreover, there is the eXtensible Event Stream (XES) schema definition <ref type="bibr" target="#b31">[32]</ref>, which defines a standard for representing event logs, and the process model defined as a coloured Petri net, see Sect. 4.2.</p><p>All these documents are interrelated because they refer to each other's elements, e.g. the transition names in the CPN model must correspond to activity names defined in the domain model. These relations are visualized as cyan-coloured links in Fig. <ref type="figure">5</ref>.</p><p>For the foundation of this infrastructure, we built on CorrLang<ref type="foot" target="#foot_2">3</ref>, an academic prototype tool that addresses semantic interoperability via mediation and is based on a textual domain-specific language. It was developed in the context of the first author's PhD thesis <ref type="bibr" target="#b32">[33]</ref>. The tool establishes generic relations (the cyan links) between the various metadata documents, which are interpreted to perform the extraction and transformation on the data level<ref type="foot" target="#foot_3">4</ref>.</p></div>
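<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the second step of the data layer more tangible, the following sketch shows how a declarative mapping from event codes to activity names could drive the transformation of exported CSV rows into an event log. It is a simplified illustration and does not use CorrLang: the column names (<code>case_id</code>, <code>event_code</code>, <code>timestamp</code>), the event codes, and the activity names are hypothetical, and the case duration is computed naively as last minus first timestamp rather than by replaying the log on the process model.</p>
```python
import csv
import io
from datetime import datetime

# Hypothetical mapping from LIS event codes to domain-model activity names.
# In the described setup such a mapping is declared in a spreadsheet.
EVENT_CODE_TO_ACTIVITY = {
    "REG": "Registration",
    "GRO": "Grossing",
    "EMB": "Embedding",
}


def build_event_log(csv_text: str, mapping: dict) -> dict:
    """Group mapped events by case and sort each trace by timestamp.

    Rows whose event code has no entry in the mapping are skipped.
    """
    traces = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        activity = mapping.get(row["event_code"])
        if activity is None:
            continue
        timestamp = datetime.fromisoformat(row["timestamp"])
        traces.setdefault(row["case_id"], []).append((timestamp, activity))
    return {case: sorted(events) for case, events in traces.items()}


def case_durations(event_log: dict) -> dict:
    """Duration of each case: last event timestamp minus first event timestamp."""
    return {
        case: events[-1][0] - events[0][0]
        for case, events in event_log.items()
    }
```
<p>Because the mapping is plain data, changing which event codes count as which activity only requires editing the mapping document, not the transformation code. This is the sense in which a metadata document acts as a configuration and not only as documentation.</p></div>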
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Conclusion</head><p>To summarize the content and contributions of this state-of-the-project report: we began by introducing (digital) pathology, emphasizing that one should not only focus on classic data science for image analysis but also consider process data science for event data stored in health care information systems. Concretely, we want to gain insights into the specimen preparation process in the lab, as it accounts for a significant share of the time within the diagnostic process. There are several reports on successful applications of process mining in the health care domain <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b33">34]</ref>. Also, the ProHealth workshop series (now KR4HC) offers a significant body of knowledge about applications of process-centric approaches within healthcare. However, to our knowledge, none of these works has addressed pathology so far.</p><p>The main contributions of this paper are (a) an experience report (Sect. 4) about conducting a process mining project in pathology (currently in the reporting phase of Fig. <ref type="figure" target="#fig_2">3</ref> and stage two of the 𝐿* model), and (b) a conceptual approach for exploiting project documents as an executable specification for a data transformation pipeline (Sect. 5). Our experience report comprises insights that, we believe, have received little attention in the process mining literature. In particular, the conceptual mismatch between atomic token-based workflow modelling languages and the flow of specimens/blocks/slides in the pathology laboratory deserves to be highlighted.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Specimen Workflow in the Pathology Laboratory</figDesc><graphic coords="3,164.10,166.80,69.08,51.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Process Mining schematically, adapted from <ref type="bibr" target="#b10">[11]</ref></figDesc><graphic coords="5,322.89,84.46,182.54,134.78" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Process Activities and Artefacts</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Challenges throughout a Process Mining Project</figDesc><graphic coords="7,227.61,84.54,277.69,279.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>shows three different types of models. Descriptive models (e.g., process diagrams, statistical indicators, or plots) are simplified representations</figDesc><table><row><cell></cell><cell>activities</cell><cell>artefacts</cell><cell></cell></row><row><cell></cell><cell>Report Analyse</cell><cell>Descriptive Models</cell><cell>Process Diagrams/Maps, Statistics (average cycle time, ...), Visualisations (Histogram, ...)</cell></row><row><cell>analytical operational</cell><cell>Predict Plan Operate</cell><cell>Predictive Models Prescriptive Models</cell><cell>Simulations Queueing Models Optimisation Models (LP, ILP, DP,...) Algorithms &amp; Data Models Executable Workflow Models Work Guidelines</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://icpmconference.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://promtools.org/doku.php</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">https://www.corrlang.io/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">A demonstration of these definitions is found in: https://github.com/webminz/piv-data-mgmt</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The present study is part of the project "PiV -Pathology services in the Western Norwegian Health Region: a centre for applied digitization". The project is funded by the Western Norway Regional Health Authority. We also would like to thank the anonymous reviewers for their helpful remarks.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<ptr target="https://www.oecd-ilibrary.org/social-issues-migration-health/health-at-a-glance-2013_health_glance-2013-en" />
		<title level="m">Health at a Glance 2013: OECD Indicators, Organisation for Economic Co-operation and Development</title>
				<meeting><address><addrLine>Paris</addrLine></address></meeting>
		<imprint>
			<publisher>OECD</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="https://www.oecd-ilibrary.org/social-issues-migration-health/improving-health-sector-efficiency_9789264084612-en" />
		<title level="m">Improving Health Sector Efficiency: The Role of Information and Communication Technologies, Organisation for Economic Co-operation and Development</title>
				<meeting><address><addrLine>Paris</addrLine></address></meeting>
		<imprint>
			<publisher>OECD</publisher>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Evaluation of health information systems research in information systems research: A meta-analysis</title>
		<author>
			<persName><forename type="first">P</forename><surname>Haried</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Claybaugh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Dai</surname></persName>
		</author>
		<idno type="DOI">10.1177/1460458217704259</idno>
	</analytic>
	<monogr>
		<title level="j">Health Informatics Journal</title>
		<imprint>
			<biblScope unit="volume">25</biblScope>
			<biblScope unit="page" from="186" to="202" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Process-Aware Information System Development for the Healthcare Domain -Consistency, Reliability, and Effectiveness</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Mans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">C</forename><surname>Russell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J M</forename><surname>Bakker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Moleman</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-12186-9_61</idno>
	</analytic>
	<monogr>
		<title level="m">Business Process Management Workshops</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Rinderle-Ma</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Sadiq</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Leymann</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="635" to="646" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Electronic health records implementation: an evaluation of information system impact and contingency factors</title>
		<author>
			<persName><forename type="first">L</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bellucci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">T</forename><surname>Nguyen</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.ijmedinf.2014.06.011</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Medical Informatics</title>
		<imprint>
			<biblScope unit="volume">83</biblScope>
			<biblScope unit="page" from="779" to="796" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A survey on deep learning in medical image analysis</title>
		<author>
			<persName><forename type="first">G</forename><surname>Litjens</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kooi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">E</forename><surname>Bejnordi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A A</forename><surname>Setio</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ciompi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ghafoorian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A W M</forename><surname>Van Der Laak</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Van Ginneken</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">I</forename><surname>Sánchez</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.media.2017.07.005</idno>
	</analytic>
	<monogr>
		<title level="j">Medical Image Analysis</title>
		<imprint>
			<biblScope unit="volume">42</biblScope>
			<biblScope unit="page" from="60" to="88" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Deep Learning Applications in Medical Image Analysis</title>
		<author>
			<persName><forename type="first">J</forename><surname>Ker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Rao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Lim</surname></persName>
		</author>
		<idno type="DOI">10.1109/ACCESS.2017.2788044</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Access</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="9375" to="9389" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Twenty Years of Digital Pathology: An Overview of the Road Travelled, What is on the Horizon, and the Emergence of Vendor-Neutral Archives</title>
		<author>
			<persName><forename type="first">L</forename><surname>Pantanowitz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sharma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">B</forename><surname>Carter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kurc</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sussman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Saltz</surname></persName>
		</author>
		<idno type="DOI">10.4103/jpi.jpi_69_18</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Pathology Informatics</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">40</biblScope>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases</title>
		<author>
			<persName><forename type="first">A</forename><surname>Janowczyk</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madabhushi</surname></persName>
		</author>
		<idno type="DOI">10.4103/2153-3539.186902</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Pathology Informatics</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">29</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Machine Learning Methods for Histopathological Image Analysis</title>
		<author>
			<persName><forename type="first">D</forename><surname>Komura</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ishikawa</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.csbj.2018.01.001</idno>
	</analytic>
	<monogr>
		<title level="j">Computational and Structural Biotechnology Journal</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<biblScope unit="page" from="34" to="42" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<title level="m">Process Mining: Data Science in Action</title>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Process mining in healthcare: A literature review</title>
		<author>
			<persName><forename type="first">E</forename><surname>Rojas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Munoz-Gama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Sepúlveda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Capurro</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.jbi.2016.04.007</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Biomedical Informatics</title>
		<imprint>
			<biblScope unit="volume">61</biblScope>
			<biblScope unit="page" from="224" to="236" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Petri</surname></persName>
		</author>
		<title level="m">Kommunikation mit Automaten</title>
				<imprint>
			<date type="published" when="1962">1962</date>
		</imprint>
		<respStmt>
			<orgName>Technische Hochschule Darmstadt</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Doctoral Thesis</note>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">W</forename><surname>Taylor</surname></persName>
		</author>
		<title level="m">The Principles of Scientific Management</title>
				<meeting><address><addrLine>New York</addrLine></address></meeting>
		<imprint>
			<publisher>Harper &amp; Brothers Publishers</publisher>
			<date type="published" when="1911">1911</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Davenport</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">E</forename><surname>Short</surname></persName>
		</author>
		<ptr target="https://sloanreview.mit.edu/article/the-new-industrial-engineering-information-technology-and-business-process-redesign/" />
		<title level="m">The New Industrial Engineering: Information Technology and Business Process Redesign</title>
				<imprint>
			<date type="published" when="1990">1990</date>
		</imprint>
		<respStmt>
			<orgName>MIT Sloan Management Review</orgName>
		</respStmt>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Reengineering Work: Don&apos;t Automate, Obliterate</title>
		<author>
			<persName><forename type="first">M</forename><surname>Hammer</surname></persName>
		</author>
		<ptr target="https://hbr.org/1990/07/reengineering-work-dont-automate-obliterate" />
	</analytic>
	<monogr>
		<title level="j">Harvard Business Review</title>
		<imprint>
			<date type="published" when="1990">1990</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Data Science: A Comprehensive Overview</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cao</surname></persName>
		</author>
		<idno type="DOI">10.1145/3076253</idno>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page">42</biblScope>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">H</forename><surname>Davenport</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Patil</surname></persName>
		</author>
		<ptr target="https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century" />
		<title level="m">Data Scientist: The Sexiest Job of the 21st Century</title>
				<imprint>
			<publisher>Harvard Business Review</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Process mining: a research agenda</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J M M</forename><surname>Weijters</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.compind.2003.10.001</idno>
	</analytic>
	<monogr>
		<title level="j">Computers in Industry</title>
		<imprint>
			<biblScope unit="volume">53</biblScope>
			<biblScope unit="page" from="231" to="244" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Workflow mining: A survey of issues and approaches</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F</forename><surname>Van Dongen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Herbst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Maruster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Schimm</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J M M</forename><surname>Weijters</surname></persName>
		</author>
		<idno type="DOI">10.1016/S0169-023X(03)00066-1</idno>
	</analytic>
	<monogr>
		<title level="j">Data &amp; Knowledge Engineering</title>
		<imprint>
			<biblScope unit="volume">47</biblScope>
			<biblScope unit="page" from="237" to="267" />
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Conformance checking of processes based on monitoring real behavior</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rozinat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.is.2007.07.001</idno>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="64" to="95" />
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">Data Scientist: The Engineer of the Future</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-319-04948-9_2</idno>
	</analytic>
	<monogr>
		<title level="m">Enterprise Interoperability VI</title>
				<editor>
			<persName><forename type="first">K</forename><surname>Mertins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Bénaben</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Poler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J.-P</forename><surname>Bourrières</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2014">2014</date>
			<biblScope unit="page" from="13" to="26" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><surname>Chapman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clinton</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kerber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Khabaza</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Reinartz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Shearer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Wirth</surname></persName>
		</author>
		<title level="m">CRISP-DM 1.0 Step-by-step data mining guide</title>
				<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
		<respStmt>
			<orgName>The CRISP-DM consortium</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Wanna improve process mining results?</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J C</forename><surname>Bose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Mans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<idno type="DOI">10.1109/CIDM.2013.6597227</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Computational Intelligence and Data Mining (CIDM)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="127" to="134" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<title level="m">Extracting event data from databases to unleash process mining</title>
				<imprint>
			<publisher>BPMcenter.org</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">BPM reports</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Fuzzy Mining -Adaptive Process Simplification Based on Multi-perspective Metrics</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">W</forename><surname>Günther</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-540-75183-0_24</idno>
	</analytic>
	<monogr>
		<title level="m">Business Process Management</title>
				<editor>
			<persName><forename type="first">G</forename><surname>Alonso</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Dadam</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Rosemann</surname></persName>
		</editor>
		<meeting><address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="328" to="343" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Proclets: a framework for lightweight interacting workflow processes</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Barthelmess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">A</forename><surname>Ellis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wainer</surname></persName>
		</author>
		<idno type="DOI">10.1142/S0218843001000412</idno>
	</analytic>
	<monogr>
		<title level="j">International Journal of Cooperative Information Systems</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="443" to="481" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Coloured Petri Nets</title>
		<author>
			<persName><forename type="first">K</forename><surname>Jensen</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-60794-3</idno>
	</analytic>
	<monogr>
		<title level="s">Monographs in Theoretical Computer Science An EATCS Series</title>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Discovering simulation models</title>
		<author>
			<persName><forename type="first">A</forename><surname>Rozinat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Mans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.is.2008.09.002</idno>
	</analytic>
	<monogr>
		<title level="j">Information Systems</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="page" from="305" to="327" />
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Brambilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Cabot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wimmer</surname></persName>
		</author>
		<title level="m">Model-Driven Software Engineering in Practice</title>
		<imprint>
			<publisher>Morgan &amp; Claypool Publishers</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note>2nd ed</note>
</biblStruct>

<biblStruct xml:id="b30">
	<monogr>
		<title level="m" type="main">EMF: Eclipse Modeling Framework</title>
		<author>
			<persName><forename type="first">D</forename><surname>Steinberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Budinsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Merks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Paternostro</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
			<publisher>Pearson Education</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b31">
	<analytic>
		<title level="a" type="main">Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams</title>
		<idno type="DOI">10.1109/IEEESTD.2016.7740858</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Std</title>
		<imprint>
			<biblScope unit="volume">1849</biblScope>
			<biblScope unit="issue">2016</biblScope>
			<biblScope unit="page" from="1" to="50" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b32">
	<monogr>
		<title level="m" type="main">A framework for multi-model consistency management</title>
		<author>
			<persName><forename type="first">P</forename><surname>Stünkel</surname></persName>
		</author>
		<ptr target="https://hdl.handle.net/11250/2837740" />
		<imprint>
			<date type="published" when="2022">2022</date>
			<pubPlace>Bergen</pubPlace>
		</imprint>
		<respStmt>
			<orgName>Høgskulen på Vestlandet</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Doctoral Thesis</note>
</biblStruct>

<biblStruct xml:id="b33">
	<analytic>
		<title level="a" type="main">Process Mining in Healthcare: Data Challenges When Answering Frequently Posed Questions</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Mans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J B</forename><surname>Vanwersch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Moleman</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-36438-9_10</idno>
	</analytic>
	<monogr>
		<title level="m">Process Support and Knowledge Representation in Health Care</title>
		<editor>
			<persName><forename type="first">R</forename><surname>Lenz</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Miksch</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Peleg</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Reichert</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Riaño</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>ten Teije</surname></persName>
		</editor>
		<meeting>Process Support and Knowledge Representation in Health Care<address><addrLine>Berlin, Heidelberg</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="140" to="153" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
