<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Enhancing Event Log Manipulation and Insight Discovery through Querying Process Representations with DFGs</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">María</forename><surname>Salas-Urbano</surname></persName>
							<email>msurbano@us.es</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Sevilla</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Enhancing Event Log Manipulation and Insight Discovery through Querying Process Representations with DFGs</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">B0238B8594431D8157249831A6049156</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T20:03+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>LoVizQL</term>
					<term>process mining</term>
					<term>query language</term>
					<term>Directly-Follows Graph</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This doctoral thesis project addresses some limitations of current process mining tools for analyzing business processes. To this end, we propose to develop and evaluate a tool, based on a query language, for analyzing and visualizing business processes from event logs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Process mining techniques use event logs to discover, analyze, and optimize business processes <ref type="bibr" target="#b0">[1]</ref>. Current process mining tools offer several functionalities, such as data filtering or process visualization using Directly-Follows Graphs (DFGs).</p><p>Process mining analysts frequently perform a kind of data analysis that requires significant manual effort to obtain multiple sets of traces from an event log. In addition, the analysis involves identifying specific subsets that meet certain criteria, which requires repetitive actions and comparisons between DFGs and relies heavily on the user. For instance, as observed in prior research <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, a typical workflow for analysts involves comparing different case subsets (e.g., cases grouped by product category within a procure-to-pay process) to identify patterns or behaviors in the process data (such as cases containing transitions with unusually high cycle times).</p><p>The substantial user workload stems from the fact that current process mining tools are not designed to handle multiple DFGs simultaneously and consistently. Conducting this type of analysis with existing process mining tools is typically a time-consuming task that involves several steps. First, the analyst filters the event log to isolate the cases associated with a specific product. Then, the analyst configures and explores the DFG to uncover insights related to those cases. This process is repeated for all products, of which there may be dozens or hundreds. Often, these DFGs are compared either with each other or with a specific pattern of interest to the analyst, such as transitions with a high cycle time. 
This comparison is usually performed by applying filters back and forth because most process mining tools can visualize only one process at a time.</p><p>Our goal is to support the analyst in carrying out this labor-intensive analysis by developing the Log data Visualization Query Language (LoVizQL), a query language for obtaining collections of DFGs that satisfy conditions specified by the user. With our approach, users can discover insights about the process without manually manipulating the event log, explore the data, and compare the various visualizations that are generated during the analysis. For instance, in a single LoVizQL query, users can filter the event log traces by each organizational unit involved in the process, obtain the corresponding DFG for each data subset, and search for those DFGs where the frequency of activity rework exceeds the average.</p><p>To address this problem, this PhD thesis is driven by the following research question: RQ: How can the manipulation of event logs and the discovery of aspects of interest in the DFGs be facilitated for users?</p><p>This research question can be answered by addressing the following objectives: OBJ1: Identify the frequent workflows followed by analysts to compare different subsets of cases (e.g., cases grouped by product category of a purchase-to-pay process) and to identify interesting patterns or behaviors in process data through the use of DFGs.</p><p>OBJ2: Develop a query language to manipulate collections of DFGs and discover those that may contain relevant information.</p><p>OBJ3: Develop a support tool to effectively use the query language in order to visualize and analyze business processes from event logs.</p><p>OBJ4: Validate the tool with real scenarios and real users. 
The objective is to determine whether the tool yields results similar to those obtained with typical process mining tools, but in a more agile way.</p><p>The rest of the document describes the methodology that will be followed to address these objectives, the details of the proposal and its current state of development, and an analysis of the work related to this doctoral thesis project.</p></div>
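<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the manual workflow described in the Introduction, the following minimal Python sketch filters a toy event log by product category, builds one DFG per subset, and flags the transitions whose mean cycle time exceeds a threshold. The event data, the function names (build_dfg, dfgs_by_product, slow_edges), and the 24-hour threshold are assumptions made for this example only; they are not taken from any existing process mining tool or from LoVizQL.</p>
<p><code lang="python"><![CDATA[
```python
from collections import defaultdict
from datetime import datetime

# Hypothetical toy event log: (case id, product category, activity, timestamp).
EVENTS = [
    ("c1", "laptops", "Create order", "2024-01-01T09:00"),
    ("c1", "laptops", "Approve order", "2024-01-01T11:00"),
    ("c1", "laptops", "Pay invoice", "2024-01-02T11:00"),
    ("c2", "chairs", "Create order", "2024-01-03T09:00"),
    ("c2", "chairs", "Approve order", "2024-01-03T10:00"),
    ("c2", "chairs", "Pay invoice", "2024-01-03T12:00"),
    ("c3", "laptops", "Create order", "2024-01-04T09:00"),
    ("c3", "laptops", "Approve order", "2024-01-04T13:00"),
    ("c3", "laptops", "Pay invoice", "2024-01-06T13:00"),
]


def build_dfg(events):
    """Return {(a, b): mean hours from a to b} for directly-following pairs."""
    traces = defaultdict(list)
    for case_id, _product, activity, timestamp in events:
        traces[case_id].append((datetime.fromisoformat(timestamp), activity))
    durations = defaultdict(list)
    for trace in traces.values():
        trace.sort()  # order the events of each case by timestamp
        for (t1, a), (t2, b) in zip(trace, trace[1:]):
            durations[(a, b)].append((t2 - t1).total_seconds() / 3600)
    return {edge: sum(hours) / len(hours) for edge, hours in durations.items()}


def dfgs_by_product(events):
    """Filter the log once per product category and build one DFG per subset."""
    products = sorted({event[1] for event in events})
    return {p: build_dfg([e for e in events if e[1] == p]) for p in products}


def slow_edges(dfg, threshold_hours):
    """Return the edges whose mean cycle time exceeds the threshold."""
    return sorted(edge for edge, hours in dfg.items() if hours > threshold_hours)


if __name__ == "__main__":
    for product, dfg in dfgs_by_product(EVENTS).items():
        print(product, slow_edges(dfg, threshold_hours=24))
```
]]></code></p>
<p>With dozens or hundreds of products, an analyst using a tool that displays one DFG at a time would repeat the filtering and inspection steps once per product; the loop in this sketch makes that repetition explicit.</p></div>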
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Methodology</head><p>Design Science (DS) is the research methodology followed in this work. This methodology serves several purposes: it aligns with existing literature and provides both a nominal process model and a mental model for presenting and evaluating DS research in information systems <ref type="bibr" target="#b3">[4]</ref>. To achieve the target objectives and in accordance with the steps of DS, we pursue the following milestones: 1) to identify the problem and motivate it, 2) to define objectives and solutions, 3) to design and develop a support tool for process mining analysts, 4) to demonstrate the utility of the tool, 5) to evaluate its utility, and 6) to communicate and promote the obtained results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposal</head><p>Regarding the steps of DS mentioned in Section 2, we have already identified the problem in Section 1 and explained why it poses a challenge for analysts. We have also defined our research question and four objectives to address it. So far, we have addressed OBJ1 and OBJ2.</p><p>In relation to the first objective (OBJ1), we have relied on the results published at BPM 2022 <ref type="bibr" target="#b2">[3]</ref>. In that work, we used Business Process Intelligence Challenge (BPIC) reports to discover how process analysts answered specific business questions related to time performance. We coded 110 answers to time performance questions in more than 60 process mining reports. As a result, we identified 55 different operations with 137 variants used in them. We analyzed the types of answers and their similarities and examined how contextual information as well as existing process mining support may have affected them. These results provide an overview of the state of practice at that time in addressing questions related to time performance and have revealed opportunities for enhancing process mining tools. For instance, the study identified the iterative use of filters on event logs and the comparison of multiple DFGs from various subsets of traces as time-consuming tasks when using these tools.</p><p>In addition, we carried out an extended study of this work, which is currently under review in a journal. Through a mixed-method approach, the study analyzes the operations performed by process analysts in response to such questions, using the previous reports and 12 screen/audio recordings. 
The research provides a detailed and fine-grained characterization of these operations, allowing for classification, comparison, and assessment of how contextual information influences the analysis.</p><p>Regarding the second objective (OBJ2), we have designed, using Python, a first version of a language called LoVizQL <ref type="bibr" target="#b4">[5]</ref>, based on our previous work <ref type="bibr" target="#b5">[6]</ref>. LoVizQL aims to automatically generate collections of DFGs containing insights about a process without the need for manual manipulation and visualization comparisons. The user can specify the characteristics of each resulting DFG collection using the query fields defined in each query row (cf. Figure <ref type="figure" target="#fig_0">1</ref>). Specifically, the user can determine how to manipulate the data (Filter step) and how to create the collection of DFGs (DFG creation step), that is, the characteristics of the DFGs (metric used, nodes, percentage of activities and paths shown). In addition, users can define the conditions that the DFGs must meet to be returned (Selection step).</p><p>On top of this language, we aim to develop tool support to help users visualize and analyze business processes from event logs. We have already used LoVizQL to answer questions provided in a BPIC and obtained results similar to those of some participants. Next, we plan to develop a tool that facilitates the use of this query language, and to evaluate it with real users and real scenarios through experiments that address real analytic questions and compare the performance and effectiveness of the tool against current process mining tools such as Disco.</p><p>Finally, we plan to disseminate the results and promote the use of process mining through the publication of the results in high-impact journals and participation in different conferences.</p></div>
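<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three query steps described in this section (Filter, DFG creation, and Selection) can be illustrated with a minimal Python sketch. It is not LoVizQL syntax and does not reproduce the LoVizQL implementation: the toy log, the function names, and the rework measure (the total frequency of self-loop edges in a frequency DFG) are assumptions chosen only to mirror the example of selecting the DFGs whose activity rework exceeds the average.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter

# Hypothetical traces per organizational unit; each trace is a list of activities.
LOG = {
    "Sales": [["Register", "Check", "Check", "Close"], ["Register", "Check", "Close"]],
    "Support": [["Register", "Check", "Close"]],
    "Billing": [["Register", "Check", "Check", "Check", "Close"]],
}


def filter_step(log, units):
    """Filter step: keep only the traces of the requested units."""
    return {unit: log[unit] for unit in units if unit in log}


def dfg_creation_step(subsets):
    """DFG creation step: one frequency DFG (edge -> count) per subset."""
    dfgs = {}
    for name, traces in subsets.items():
        edges = Counter()
        for trace in traces:
            edges.update(zip(trace, trace[1:]))  # directly-follows pairs
        dfgs[name] = edges
    return dfgs


def rework(dfg):
    """Assumed rework measure: total frequency of self-loop edges (a -> a)."""
    return sum(count for (a, b), count in dfg.items() if a == b)


def selection_step(dfgs):
    """Selection step: keep the DFGs whose rework exceeds the collection average."""
    if not dfgs:
        return {}
    average = sum(rework(d) for d in dfgs.values()) / len(dfgs)
    return {name: d for name, d in dfgs.items() if rework(d) > average}


if __name__ == "__main__":
    subsets = filter_step(LOG, ["Sales", "Support", "Billing"])
    selected = selection_step(dfg_creation_step(subsets))
    print(sorted(selected))
```
]]></code></p>
<p>Keeping the three steps as separate functions mirrors the structure of a query row: each step consumes the output of the previous one, so the condition in the Selection step can be changed without rebuilding the filtered subsets.</p></div>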
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Related work</head><p>In the last decade, specific query languages have been developed for business process domains to obtain useful information about processes and assist in their execution. The process querying framework <ref type="bibr" target="#b6">[7]</ref> has categorized these languages into various groups. Some of them have been categorized as event log query languages <ref type="bibr" target="#b7">[8]</ref>, encompassing diverse subject areas.</p><p>Some query languages focus on event log data, treating it as graphs to discover hierarchies and summarize information, such as <ref type="bibr" target="#b8">[9]</ref>. Others <ref type="bibr" target="#b9">[10]</ref> aim to simplify query writing, combining process and data perspectives for easier selections and insights. Some languages <ref type="bibr" target="#b10">[11]</ref> facilitate querying Key Performance Indicators (KPIs) and Process Performance Indicators (PPIs) of activities or cases. Additionally, certain languages handle complex relations (constraints) between process elements, while a software company developed its own language for formalizing business questions as queries.</p><p>However, none of these languages allows users to iteratively filter event log data and select instances meeting specific conditions through comparisons. Existing process mining tools often require manual modification of DFGs to specify conditions, resulting in a tedious trial-and-error process. Inspired by data science, where query languages have addressed similar challenges in exploratory data analysis and visualization, we have designed LoVizQL, extending concepts from <ref type="bibr" target="#b11">[12]</ref>.</p><p>In addition, some work on identifying the actions performed in process analysis has already been carried out. 
<ref type="bibr" target="#b1">[2]</ref> qualitatively analyzes BPIC reports to understand how process analysts perform their work, focusing on visual representations. We complement this investigation by identifying all the specific low-level operations in order to understand how specific issues are addressed. Furthermore, <ref type="bibr" target="#b12">[13]</ref> carries out an empirical study to understand how analysts perform a process mining task, focusing only on the initial exploratory phase of process mining.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Query language steps</figDesc><graphic coords="3,89.29,502.60,416.70,65.60" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Acknowledgments</head><p>This research is partially funded by projects PID2021-126227NB-C21 (PERSEO), RTI2018-100763-J-I00 (CONFLEX) and TED2021-131023B-C22 (ORCHID), granted by MCIN/AEI/10.13039/501100011033 and by "ERDF A way of making Europe".</p><p>This PhD thesis is supervised by Manuel Resinas Arias de Reyna and Cristina Cabanillas Macías from the University of Seville.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A practitioner&apos;s guide to process mining: Limitations of the directly-follows graph</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Procedia Computer Science</title>
		<imprint>
			<biblScope unit="volume">164</biblScope>
			<biblScope unit="page" from="321" to="328" />
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Mining Process Mining Practices: An Exploratory Characterization of Information Needs in Process Analytics</title>
		<author>
			<persName><forename type="first">C</forename><surname>Klinkmüller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Müller</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Business Process Management (BPM)</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="322" to="337" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Analyzing how process mining reports answer time performance questions</title>
		<author>
			<persName><forename type="first">C</forename><surname>Capitán-Agudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Salas-Urbano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cabanillas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Resinas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Business Process Management (BPM)</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="234" to="250" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A design science research methodology for information systems research</title>
		<author>
			<persName><forename type="first">K</forename><surname>Peffers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tuunanen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Rothenberger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chatterjee</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of management information systems</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="45" to="77" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">LoVizQL: A query language for visualizing and analyzing business processes from event logs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Salas-Urbano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Capitán-Agudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cabanillas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Resinas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Service-Oriented Computing (ICSOC)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note>In press</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A query language for exploring directly-follows graph collections</title>
		<author>
			<persName><forename type="first">M</forename><surname>Salas-Urbano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Capitán-Agudo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Cabanillas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Resinas</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Jornadas de Ciencia e Ingeniería de Servicios</title>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>JCIS</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Process querying: Enabling business intelligence through query-based process analytics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Polyvyanyy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Barros</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Decision Support Systems</title>
		<imprint>
			<biblScope unit="volume">100</biblScope>
			<biblScope unit="page" from="41" to="56" />
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Process Querying Methods</title>
		<author>
			<persName><forename type="first">A</forename><surname>Polyvyanyy</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>Springer</publisher>
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Process Querying Methods</title>
		<author>
			<persName><forename type="first">A</forename><surname>Beheshti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Benatallah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">R</forename><surname>Motahari-Nezhad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ghodratnama</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Amouzgar</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>Springer</publisher>
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Everything you always wanted to know about your process, but did not know how to ask</title>
		<author>
			<persName><forename type="first">E</forename><surname>González López De Murillas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Reijers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M P</forename><surname>Van Der Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BPM Workshops</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="volume">281</biblScope>
			<biblScope unit="page" from="296" to="309" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<title level="m" type="main">Process Querying Methods</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M P</forename><surname>Álvarez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">C</forename><surname>Díaz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Parody</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">M R</forename><surname>Quintero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Gómez-López</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
			<publisher>Springer</publisher>
			<pubPlace>Cham</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The patent holder&apos;s dilemma: Buy, sell, or troll?</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Abril</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Plant</surname></persName>
		</author>
		<idno type="DOI">10.1145/1188913.1188915</idno>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">50</biblScope>
			<biblScope unit="page" from="36" to="44" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Initial insights into exploratory process mining practices</title>
		<author>
			<persName><forename type="first">F</forename><surname>Zerbato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Soffer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">BPM Forum</title>
		<imprint>
			<biblScope unit="page" from="145" to="161" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
