<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Activity and Sequence Detection Evaluation Metrics: A Comprehensive Tool for Event Log Comparison</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Aaron</forename><forename type="middle">Friedrich</forename><surname>Kurz</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">University of St. Gallen</orgName>
								<address>
									<settlement>St. Gallen</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Ronny</forename><surname>Seiger</surname></persName>
							<email>ronny.seiger@unisg.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">University of St. Gallen</orgName>
								<address>
									<settlement>St. Gallen</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Marco</forename><surname>Franceschetti</surname></persName>
							<email>marco.franceschetti@unisg.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">University of St. Gallen</orgName>
								<address>
									<settlement>St. Gallen</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Barbara</forename><surname>Weber</surname></persName>
							<email>barbara.weber@unisg.ch</email>
							<affiliation key="aff0">
								<orgName type="institution">University of St. Gallen</orgName>
								<address>
									<settlement>St. Gallen</settlement>
									<country key="CH">Switzerland</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Activity and Sequence Detection Evaluation Metrics: A Comprehensive Tool for Event Log Comparison</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">CDEA621361AA8AD675E75A65D7DAC14A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Business Process Management</term>
					<term>Internet of Things</term>
					<term>Activity Recognition</term>
					<term>Activity Detection</term>
					<term>Sequence Detection</term>
					<term>Event Log Comparison</term>
					<term>ORCID 0000-0002-2547-6780 (A. F. Kurz)</term>
					<term>ORCID 0000-0003-1675-2592 (R. Seiger)</term>
					<term>ORCID 0000-0001-7030-282X (M. Franceschetti)</term>
					<term>ORCID 0000-0002-6004-4860 (B. Weber)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Nowadays, event logs are not only created by traditional information systems, but also new data sources such as the IoT are considered to derive and construct event logs. This makes it necessary to evaluate the quality of these detected event logs and their underlying detection methods by comparison with given ground truth logs. We present AquDeM, enabling the comparison of XES-based event logs to evaluate activity and sequence detection methods. AquDeM features 1) a Python library that allows for programmatic comparison of event logs featuring a comprehensive set of metrics, and 2) a web app for visual event log comparison.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>A frequently investigated subject at the intersection of Business Process Management (BPM) and the Internet of Things (IoT) is the abstraction of low-level IoT events to BPM-level activities <ref type="bibr" target="#b0">[1]</ref>, which can be seen as a multi-class activity detection problem. In previous work, we presented a corresponding method <ref type="bibr" target="#b1">[2]</ref> that aims to detect business process activities in real time, based on annotated IoT data streams, to enable online process conformance checking <ref type="bibr" target="#b2">[3]</ref>. While investigating methods to evaluate the quality of the IoT-based detection of activities, which we capture in event logs in XES <ref type="bibr" target="#b3">[4]</ref> format, we realized that most event log comparison tools in the BPM field are not suitable for evaluating activity detection methods. They do not provide helpful metrics for comparing a detected event log (e.g., created from IoT data by the detection method) with a ground truth event log (representing the correct sequence and timing of activities as manually annotated or predefined) in order to evaluate and improve a specific detection method, since they i) focus on variant comparison to derive insights for process analysts regarding business outcomes (i.e., comparing process performance indicators) <ref type="bibr" target="#b4">[5]</ref>; and/or ii) produce results that are not suitable for rapid comparison due to non-quantitative outputs (e.g., graphs or natural language) <ref type="bibr" target="#b4">[5]</ref>. Thus, we sought out methods from other relevant fields in the literature.</p><p>Core requirements for the event log comparison tools and metrics derived from our use cases <ref type="bibr" target="#b2">[3]</ref> are: i) they need to provide insights relevant for the detection quality of an activity detection method, i.e., on whether the activities are detected to be in-/active at the right times and/or whether the sequence of detected activities is correct w.r.t. a given ground truth; ii) they need to provide quantitative results for rapid and automatable comparison (e.g., for programmatic exploration of a potentially large number of method parameters); and iii) they should provide insights over multiple cases within the event logs. We found multiple suitable metrics in the areas of information theory, signal processing, and general activity recognition. However, none of them could be directly applied to BPM-related concepts (i.e., XES-based event logs): some important metrics (e.g., from <ref type="bibr" target="#b5">[6]</ref>) were not available as open implementations, and others needed modification to make sense in the context of BPM (e.g., cross-correlation). Thus, we decided to implement, modify, extend, and integrate them in a tool ourselves.</p><p>The result is the Python library AquDeM<ref type="foot" target="#foot_0">1</ref>: Activity and Sequence Detection Evaluation Metrics, which takes two event logs in XES format as input, one ground truth (GT) log and one log containing the detected (DET) activities, and allows for the calculation of a variety of comparison metrics for evaluation. Besides the library as the core contribution, we have created a web application that utilizes the library, allowing for quick, intuitive, visual comparison of two event logs. In our research <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, the Python library is used not only in this web app, but also in other, more automated pipelines. The separation into library and web app allows for more varied use cases without impacting the functionality or usability of either.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Innovation and Features</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Library</head><p>The metrics available in AquDeM can be categorized into activity-level metrics and sequence-level metrics. Activity-level metrics are calculated for each activity type in each case separately. A sequence-level metric is calculated for each case separately, but over all activity types in that case. For calculations that span multiple cases/activities, the results are aggregated, currently using the mean. Another categorization is into frame-based and event-based metrics (cf. <ref type="bibr" target="#b5">[6]</ref>). Frame-based metrics are calculated based on specific time points when an activity is detected as (in-)active, making them dependent on the (IoT) data's sampling frequency (i.e., how often data is recorded). Event-based metrics work on the classification of events themselves and do not take the sampling frequency into account. For the calculation of the metrics, the event logs must minimally adhere to these requirements: i) each activity execution needs both a start and a complete event; and ii) the logs need a sampling frequency for the frame-based metrics.</p><p>The metrics were selected from the literature, then implemented and modified based on the requirements in Section 1. To give a better understanding of the metrics, we provide intuitive (non-exhaustive) explanations below. Note that all of these metrics are also available in normalized form in the library, allowing for comparison among different event logs. In Table <ref type="table" target="#tab_0">1</ref> we give an overview of the metrics regarding the categories described above, together with references to their definitions. Furthermore, we provide a usage example of AquDeM in Listing 1.</p><p>• Cross Correlation (CC) measures the similarity between the DET and GT time series by determining the shift at which they are most alike and quantifying that similarity, relative to perfect equality for time series of that length.</p><p>• Two Set (TS) metrics classify frames into categories such as true positive, true negative, deletions, fragmentations, mergings, insertions, and over-fillings or under-fillings at the start and end of an activity instance.</p><p>• Event Analysis (EA) metrics categorize the GT events as correct, deleted, fragmented, merged, or both fragmented and merged; and DET events as correct, inserted, fragmenting, merging, or both fragmenting and merging.</p><p>• The Levenshtein Distance (LD) calculates the minimum number of single activity instance edits (insertions, deletions, or substitutions) needed to transform the sequence of activity instances in DET to match GT.</p><p>• The Damerau-Levenshtein Distance (DLD) extends the LD metric by also considering the transposition of two adjacent activity instances as a single edit.</p></div>
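<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two sequence-level edit distances described above can be illustrated with a minimal, self-contained sketch. It does not use the AquDeM API and makes no claim about how the library implements or normalizes these metrics; the function names and the activity labels (drill, mill, burn, sort) are hypothetical, and the DLD shown is the restricted (optimal string alignment) variant, which is an assumption on our side.</p>
<p><code lang="python"><![CDATA[
```python
from typing import Sequence


def levenshtein(det: Sequence[str], gt: Sequence[str]) -> int:
    """Minimum number of insertions, deletions, or substitutions turning det into gt."""
    previous = list(range(len(gt) + 1))
    for i, det_item in enumerate(det, start=1):
        current = [i]
        for j, gt_item in enumerate(gt, start=1):
            cost = 0 if det_item == gt_item else 1
            current.append(min(
                previous[j] + 1,         # delete det_item
                current[j - 1] + 1,      # insert gt_item
                previous[j - 1] + cost,  # substitute (or keep on match)
            ))
        previous = current
    return previous[-1]


def damerau_levenshtein(det: Sequence[str], gt: Sequence[str]) -> int:
    """Like levenshtein, but swapping two adjacent items counts as one edit.

    Restricted (optimal string alignment) variant; an assumption of this sketch.
    """
    rows, cols = len(det) + 1, len(gt) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i
    for j in range(cols):
        dist[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if det[i - 1] == gt[j - 1] else 1
            best = min(
                dist[i - 1][j] + 1,
                dist[i][j - 1] + 1,
                dist[i - 1][j - 1] + cost,
            )
            swapped = (
                i > 1 and j > 1
                and det[i - 1] == gt[j - 2]
                and det[i - 2] == gt[j - 1]
            )
            if swapped:
                best = min(best, dist[i - 2][j - 2] + 1)  # adjacent transposition
            dist[i][j] = best
    return dist[-1][-1]


# Hypothetical activity sequences of one case (ordered by start event).
gt_sequence = ["drill", "mill", "burn", "sort"]
det_sequence = ["drill", "burn", "mill", "sort"]

ld = levenshtein(det_sequence, gt_sequence)
dld = damerau_levenshtein(det_sequence, gt_sequence)
```
]]></code></p>
<p>For the example sequences, two adjacent activities are detected in the wrong order, so the LD is 2 (two substitutions) while the DLD is 1 (one transposition). Comparing both values for a case can therefore hint at whether a detection method mainly confuses the order of neighboring activities.</p></div>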
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Web App</head><p>The web app, built using Streamlit,<ref type="foot" target="#foot_1">2</ref> has proven useful for the iterative and exploratory process of evaluating and developing the detection method in <ref type="bibr" target="#b1">[2]</ref>. Notably, the library has been developed in tandem with the web app: it is built with interactive and repeated calculations in mind, i.e., browsing metrics and going back and forth between different analysis parameters. The library internally relies on caching to speed up recurring requests and to re-use computations from previous requests for similar requests (e.g., a filtered view that contains data calculated in a previous view). Screenshots of the web app can be seen in Figure <ref type="figure" target="#fig_0">1</ref>. After uploading two XES logs, the user can choose a metric to visualize. The visualization, tabular presentation, and filtering options differ per metric to suit exploration with that particular metric. The app provides specific visualizations we deemed useful (based on our use cases), while more flexible exploration is possible with the library.</p></div>
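<div xmlns="http://www.tei-c.org/ns/1.0"><p>The re-use of earlier computations for filtered views can be pictured with the following sketch. It is not AquDeM's actual caching code: the frame data, the stand-in metric frame_agreement, and the helper aggregated are hypothetical names introduced only for illustration, and memoization via functools.lru_cache is one possible way to obtain the described behavior. The aggregation by mean follows the description in Section 2.1.</p>
<p><code lang="python"><![CDATA[
```python
from functools import lru_cache
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical frame data per (case, activity): (ground truth, detected),
# with 1 for active frames and -1 for inactive frames.
FRAMES: Dict[Tuple[str, str], Tuple[List[int], List[int]]] = {
    ("case1", "drill"): ([1, 1, -1, -1], [1, 1, 1, -1]),
    ("case1", "mill"): ([-1, -1, 1, 1], [-1, -1, 1, 1]),
    ("case2", "drill"): ([1, -1, -1, -1], [1, -1, -1, -1]),
    ("case2", "mill"): ([-1, 1, 1, -1], [-1, -1, 1, -1]),
}

computations = 0  # counts how often the expensive step really runs


@lru_cache(maxsize=None)
def frame_agreement(case_id: str, activity: str) -> float:
    """Share of frames where DET and GT agree; stands in for a costly metric."""
    global computations
    computations += 1
    gt, det = FRAMES[(case_id, activity)]
    return sum(g == d for g, d in zip(gt, det)) / len(gt)


def aggregated(case_ids: Tuple[str, ...], activities: Tuple[str, ...]) -> float:
    """Mean over the selected cases and activities, re-using cached results."""
    return mean(frame_agreement(c, a) for c in case_ids for a in activities)


# First request: all cases and activities, four real computations.
full_view = aggregated(("case1", "case2"), ("drill", "mill"))
# Second request: a filtered view, answered entirely from the cache.
filtered_view = aggregated(("case1",), ("drill",))
```
]]></code></p>
<p>Because results are cached per case and activity rather than per view, any later filter over already visited combinations only pays for the aggregation step.</p></div>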
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Maturity and Evaluation</head><p>We consider the maturity of the library and web app to be relatively high: they are continually improved and extended as they are used in our research, and include an automated test suite with combined branch and line coverage of &gt; 90%. To better understand the library from a performance and usability perspective, we have measured runtimes with a variety of event log pairs (GT and DET), including experimental logs from the smart factory scenario described in <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. The results can be seen in Figure <ref type="figure" target="#fig_1">2</ref>, indicating acceptable performance and scalability. The log pairs are: small-synth (synthetic logs, 2 cases each and 34 start/complete events in total); medium-exp (experimental logs, 1 case each and 262 start/complete events in total); and large-exp (experimental logs, 4 cases each and 1290 start/complete events in total). Machine: CPU cores: 16; Model: AMD Ryzen 7 7840HS with Radeon 780M Graphics; Threads per core: 2; RAM: 32.0 GiB.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>In this work we presented AquDeM: a tool featuring activity and sequence detection evaluation metrics to be used for event log comparison by BPM researchers. Besides the main, programmatically usable Python library, we provide a web app for fast, visual comparison of two event logs. The modular and decoupled design of library and web app allows for flexible usage. Given the increasing research attention in the area of activity detection in BPM and the absence of appropriate tools, we believe this to be a valuable addition to the pool of community resources.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Screenshots of web app for event log comparison; LEFT: users can choose which metric to view, filter cases and activities; CENTER: visualizations are provided for the selected metric; RIGHT: users can view an interactive timeline of the logs, comparing detected activities with the ground truth.</figDesc><graphic coords="4,89.29,108.98,117.53,119.42" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Boxplots (quartiles) for runtime of calculation of all available metrics over all possible case/activity combinations for ground truth/detected log pairs of varying complexity; 10 runs each.</figDesc><graphic coords="4,165.16,467.80,150.80,140.91" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Available metrics, with activity/sequence and frame/event categorization, and definition references.</figDesc><table><row><cell>Metric Abbr.</cell><cell>Activity/Sequence</cell><cell>Frame/Event</cell><cell>Definition</cell></row><row><cell>CC</cell><cell>Activity</cell><cell>Frame</cell><cell>cf. [7], zero-padding; input vector 1 when active, -1 when inactive</cell></row><row><cell>TS</cell><cell>Activity</cell><cell>Frame</cell><cell>cf. [6]</cell></row><row><cell>EA</cell><cell>Activity</cell><cell>Event</cell><cell>cf. [6]</cell></row><row><cell>LD</cell><cell>Sequence</cell><cell>Event</cell><cell>cf. [8]</cell></row><row><cell>DLD</cell><cell>Sequence</cell><cell>Event</cell><cell>cf. [9]</cell></row></table></figure>
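<div xmlns="http://www.tei-c.org/ns/1.0"><p>The CC entry of Table 1 (input vector 1 when active, -1 when inactive, with zero-padding) can be made concrete with a small sketch. It is written in plain Python, does not call AquDeM, and the helper names to_frames and best_cross_correlation, the interval values, and the normalization by series length are assumptions chosen to mirror the intuitive description of CC in Section 2.1, not the library's implementation.</p>
<p><code lang="python"><![CDATA[
```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, complete) timestamps in seconds


def to_frames(intervals: List[Interval], duration: float, sampling_hz: float) -> List[int]:
    """Encode one activity as frames: 1 while active, -1 while inactive."""
    n_frames = int(round(duration * sampling_hz))
    frames = [-1] * n_frames
    for start, complete in intervals:
        first = int(round(start * sampling_hz))
        last = int(round(complete * sampling_hz))
        for k in range(max(first, 0), min(last, n_frames)):
            frames[k] = 1
    return frames


def best_cross_correlation(gt: List[int], det: List[int]) -> Tuple[float, int]:
    """Return (similarity, shift) for the shift at which det best matches gt.

    Frames shifted outside the series count as zero (zero-padding). The sum is
    divided by the series length, so two identical series yield 1.0 at shift 0.
    A positive shift means the detection lags behind the ground truth.
    """
    n = len(gt)
    if n == 0 or len(det) != n:
        raise ValueError("series must be non-empty and of equal length")
    best_value, best_shift = float("-inf"), 0
    for shift in range(-(n - 1), n):
        total = 0
        for k in range(n):
            m = k + shift
            if 0 <= m < n:
                total += gt[k] * det[m]
        value = total / n
        closer = value == best_value and abs(shift) < abs(best_shift)
        if value > best_value or closer:
            best_value, best_shift = value, shift
    return best_value, best_shift


# Hypothetical activity: active from 2 s to 5 s, detected one second late.
gt_frames = to_frames([(2.0, 5.0)], duration=10.0, sampling_hz=1.0)
det_frames = to_frames([(3.0, 6.0)], duration=10.0, sampling_hz=1.0)

similarity, shift = best_cross_correlation(gt_frames, det_frames)
```
]]></code></p>
<p>In this example the best alignment is found at a shift of one frame with a similarity of 0.9: the detection has the right duration but is one second late, and the one frame lost to zero-padding keeps the value below 1.0. The result depends on the sampling frequency, which is why frame-based metrics require it.</p></div>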
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Video: https://youtu.be/dM4Y-80L3gA; Code: https://github.com/ics-unisg/aqudem, tags: pkg-v0.1.1, fe-v0.1.1</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">https://streamlit.io/, last accessed 3rd May 2024</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has received funding from the Swiss National Science Foundation under Grant No. IZSTZ0_208497 (ProAmbitIon project).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">The Internet of Things meets Business Process Management: A manifesto</title>
		<author>
			<persName><forename type="first">C</forename><surname>Janiesch</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Koschmider</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Mecella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Weber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Burattin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Di Ciccio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Systems, Man, and Cybernetics Magazine</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="34" to="44" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Data-driven generation of services for IoT-based online activity detection</title>
		<author>
			<persName><forename type="first">R</forename><surname>Seiger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Franceschetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Service-Oriented Computing</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="186" to="194" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">ProAmbitIon: Online process conformance checking with ambiguities driven by the Internet of Things</title>
		<author>
			<persName><forename type="first">M</forename><surname>Franceschetti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Seiger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">J G</forename><surname>González</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Garcia-Ceja</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">A R</forename><surname>Flores</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>García-Bañuelos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Weber</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CAiSE Research Projects Exhibition</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="52" to="59" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">IEEE Standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams</title>
	</analytic>
	<monogr>
		<title level="m">IEEE Std 1849-2023 (Revision of IEEE Std 1849-2016)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1" to="55" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Business process variant analysis: Survey and classification</title>
		<author>
			<persName><forename type="first">F</forename><surname>Taymouri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>La Rosa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Dumas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">M</forename><surname>Maggi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Knowledge-Based Systems</title>
		<imprint>
			<biblScope unit="volume">211</biblScope>
			<biblScope unit="page">106557</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Performance metrics for activity recognition</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">A</forename><surname>Ward</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Lukowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">W</forename><surname>Gellersen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Intell. Syst. Technol</title>
		<imprint>
			<biblScope unit="volume">2</biblScope>
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The Discrete Fourier Transform, Part 6: Cross-Correlation</title>
		<author>
			<persName><forename type="first">D</forename><surname>Lyon</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">The Journal of Object Technology</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="page">17</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Binary codes capable of correcting deletions, insertions, and reversals</title>
		<author>
			<persName><forename type="first">V</forename><forename type="middle">I</forename><surname>Levenshtein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Soviet Physics Doklady</title>
				<imprint>
			<publisher>Soviet Union</publisher>
			<date type="published" when="1966">1966</date>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="page" from="707" to="710" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">A technique for computer detection and correction of spelling errors</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">J</forename><surname>Damerau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page" from="171" to="176" />
			<date type="published" when="1964">1964</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
