<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Work Tagger: A Labelling Companion</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Manuel</forename><surname>Resinas</surname></persName>
							<email>resinas@us.es</email>
							<affiliation key="aff0">
								<orgName type="institution">Universidad de Sevilla</orgName>
								<address>
									<addrLine>Avda. Reina Mercedes s/n</addrLine>
									<postCode>41012</postCode>
									<settlement>Seville</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Rocío</forename><surname>Goñi-Medina</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universidad de Sevilla</orgName>
								<address>
									<addrLine>Avda. Reina Mercedes s/n</addrLine>
									<postCode>41012</postCode>
									<settlement>Seville</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iris</forename><surname>Beerepoot</surname></persName>
							<email>i.m.beerepoot@uu.nl</email>
							<affiliation key="aff1">
								<orgName type="institution">Utrecht University</orgName>
								<address>
									<addrLine>Heidelberglaan 8</addrLine>
									<postCode>3584 CS</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Adela</forename><surname>Del-Río-Ortega</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Universidad de Sevilla</orgName>
								<address>
									<addrLine>Avda. Reina Mercedes s/n</addrLine>
									<postCode>41012</postCode>
									<settlement>Seville</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Hajo</forename><forename type="middle">A</forename><surname>Reijers</surname></persName>
							<email>h.a.reijers@uu.nl</email>
							<affiliation key="aff1">
								<orgName type="institution">Utrecht University</orgName>
								<address>
									<addrLine>Heidelberglaan 8</addrLine>
									<postCode>3584 CS</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Work Tagger: A Labelling Companion</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">D40A6925F7E8CF805F4A9199960FB0FB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:33+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>process mining, event abstraction, active window tracking, task classification Orcid 0000-0003-1575-406X (M. Resinas)</term>
					<term>0000-0002-6301-9329 (I. Beerepoot)</term>
					<term>0000-0003-3089-4431 (A. del-Río-Ortega)</term>
					<term>0000-0001-9634-5852 (H. A. Reijers)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In settings where data is recorded at a fine-granular level, it needs to be abstracted to enable process mining. While several event abstraction techniques exist, the majority are supervised and require manually labelled datasets, a process that is both time-consuming and critical for developing new methods. To streamline this process, we introduce a tool designed to facilitate the tagging of finegranular data using predefined activities, with a specific focus on Active Window Tracking (AWT) data. The tool offers features such as data visualization, filtering, and automatic classification based on GPT, which can be adjusted by the user. Our evaluation, involving four researchers tagging their AWT data, demonstrates that increased experience with the tool leads to faster tagging, and we discuss potential future enhancements.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>One of the core requirements for process mining is the recording of process activities <ref type="bibr" target="#b0">[1]</ref>. However, process behavior is not always recorded at the right granularity level <ref type="bibr" target="#b1">[2]</ref>. In settings where data is recorded at a very fine-granular level, groups of events may need to be abstracted to a higher-level activity <ref type="bibr" target="#b2">[3]</ref>. Several event abstraction techniques have been proposed, the majority of which are supervised techniques <ref type="bibr" target="#b3">[4]</ref>. Supervised techniques require additional information, typically a subset of the data that is manually labelled by a domain expert. Although this is a laborious process, the importance of labelled datasets for the development of new techniques cannot be overstated. As such, it is vital that this labelling needs to be made as quick and easy as possible.</p><p>In this paper, we present a tool that aims to support the tagging of fine-granular data using a set of predefined activities. The basic features of the tool are applicable to different types of datasets, but this particular implementation focuses on the labelling of so-called Active Window Tracking (AWT) data. The opportunities of this data for mining work practices are described in <ref type="bibr" target="#b4">[5]</ref>. AWT data contains information about a person's computer behavior in the form of start and end times of each active application and window. It allows for the use of different views on the data, the application of filters, and the modification of visualizations. Additionally, it allows a user to start from an automatic classification based on GPT and modify the labels. Through an evaluation by four researchers tagging their own AWT data, we demonstrate how more experience with the tool results in faster tagging, and reflect on future improvements.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Tool Description</head><p>Work Tagger is a web-based tool that facilitates the classification of AWT data. Work Tagger has been developed using Streamlit, which is an open-source Python framework to create interactive web-applications, whose main characteristic is that it integrates the development of both a web-based frontend and backend into a single Python code base.</p><p>Work Tagger is designed to use the AWT data collected by an application called Tockler<ref type="foot" target="#foot_0">1</ref> , which records all active windows on the computer while the application is installed and running. We have chosen Tockler because it is open source and runs locally, which helps to avoid privacy concerns. However, Work Tagger is designed to easily integrate data coming from other similar tools. The web-based user interface of Work Tagger allows users to upload files, select data for classification, and visualize the results using different views. The user interface is designed to be user-friendly and interactive, featuring dynamic UI elements such as buttons, select boxes and sliders for ease of use. Streamlit's interactive widgets enhance user experience by providing responsive and intuitive controls.</p><p>Once users upload their AWT logs (in csv format) through the user interface, the backend processes these files, converting them into a dataframe. During this process, columns are prepared with the necessary formats for efficient data manipulation. In contrast to other webbased tools, Work Tagger does not use a database for data storage. Instead, Work Tagger relies on session state variables to store data temporarily. This approach ensures that each user's data is isolated and managed independently, preventing conflicts in a multi-user environment. These session state variables are maintained for the duration of the user's interaction with the application, ensuring a personalized and consistent experience. Additionally, this decision is related to privacy concerns, we do not store records of individuals' computer usage, thus protecting users' personal information and ensuring their privacy.</p><p>When the AWT log is loaded in the application, Work Tagger displays the AWT events in a table and allows the user to label the events with the activity and case the user was performing at that moment. For activities, the user may opt to undertake the classification process either manually or automatically. In the former case, the user has to choose the activity from a predefined list of activities and subactivities based on the one used in <ref type="bibr" target="#b4">[5]</ref> for academic work activities. However, Work Tagger is designed so that the set of activities can be easily modified <ref type="foot" target="#foot_1">2</ref> . In the latter case, once automatic classification is initiated, the backend logic sends the data to the classification core to interact with the OpenAI API to perform zero-shot classification using the GPT-4o model based on the same set of activities and subactivities used in the manual classification. We opted for this approach to provide a highly flexible and adaptive classification system that does not require training the model beforehand.</p><p>Concerning the labeling of cases, only manual labeling is possible. Moreover, unlike activities, the set of cases are open and users can pick from case labels already used in the dataset or can enter new case labels. The reason for following a different approach for cases is because, unlike activities, they are very specific to a particular person and a particular moment in time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Tool Functionality</head><p>In this section, we describe the different functionalities Work Tagger has:</p><p>AWT Event Log Upload. To start using Work Tagger, the user must upload an event log from Tockler. The accepted file type is CSV with a size limit of 200 MB, although it can be easily extended. By default, the labels of Activity and Subactivity will be "No work-related" and "Unspecified No work-related." The user can also upload a CSV file that has been previously labeled in the application, or load a publicly available sample dataset <ref type="bibr" target="#b5">[6]</ref>.</p><p>AWT Data Visualization. Once the AWT event log is uploaded, the data is displayed by the application in a table (see blue box in Fig. <ref type="figure" target="#fig_0">1</ref>). Work Tagger uses pagination in the table to display the classify data in manageable chunks allowing users to navigate through the data by selecting the page they want to view and also to modify the size of the page they are visualizing. Users can personalize the visualization of the data in the table by choosing between three different views (green box in Fig. <ref type="figure" target="#fig_0">1</ref>):</p><p>• Time view. In this view, each row of the table is an event in the AWT. The rows are ordered by timestamp. Several aspects can be configured in this view using the controls in the yellow box of Fig. <ref type="figure" target="#fig_0">1</ref>. Using a calendar, users can select the date for which the data should be displayed. The calendar allows selection from the earliest to the most recent entry in the uploaded event log. Users can also select the start time of their day to adapt to people that have different schedules, e.g., night owl workers. It is also possible to filter events by activity possibly showing a window of events before or after them of configurable size. Finally, when Begin-End colouring is enabled, if the time difference between the End Time of one row and the Begin Time of the following row exceeds the number of minutes selected in the slider, the row will be marked in gray. • Active window view. In this view, the rows of the table are the different active window titles that appear in the log. The view is sorted by the number of times the title appears in the log, although it can also be sorted by duration. In this view, users can filter by application, so that only the active window titles of a certain application are shown, and by window title, so that only the titles that contain the words entered by the user are shown. This view is useful to label some activities that are clearly related to a certain window title. For instance, if the window title contains Overleaf, then it is likely that the activity is related to Write research papers. • Activity view. This view is similar to the Time view, but events are grouped by subactivity.</p><p>Like the time view, this view is sorted by timestamp, but it can also be sorted by duration of the subactivity and number of events included in a subactivity. Furthermore, users can also filter by activity. This view is useful to provide an overview of the activity labels applied and to facilitate case labeling.</p><p>Finally, users can also enable the Blocks colours option depicted in the orange box of Fig. <ref type="figure" target="#fig_0">1</ref>. When this option is enabled, each row will be colored differently based on its Activity value.</p><p>Manual Classification. It is performed by selecting one or more rows in the table using the checkboxes and then applying the labels that appear in the left sidebar of the application. Users can modify the subactivity value using the sidebar (red box in Fig. <ref type="figure" target="#fig_0">1</ref>), in three different ways:</p><p>1. Selecting from a comprehensive list of all activities in the first select box. 2. Clicking on one of the buttons that display the last three used subactivities. 3. Using a select box that categorizes subactivities by activity.</p><p>To label cases, users can choose from the cases that have already been used in the dataset by clicking in the corresponding button (cyan box in Fig. <ref type="figure" target="#fig_0">1</ref>), or can add a new case label using the corresponding textbox. Automatic Classification. It is performed using an expandable box that appears in the sidebar (grey box in Fig. <ref type="figure" target="#fig_0">1</ref>). Once expanded, a form will appear, allowing the user to enter their OpenAI key and organization details, and select the data the user wishes to classify: all data, only selected rows, or only data from a selected date. Once the necessary information is provided, the user clicks a button to start the automatic classification process.</p><p>Undo and Download CSV Buttons By clicking these buttons (see purple box in Fig. <ref type="figure" target="#fig_0">1</ref>), the user can undo the last change made, and download the updated data with all modifications made, respectively. by clicking the "Download CSV" button.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Tool Maturity</head><p>In order to evaluate and improve the tool, four authors of this paper collected data using Tockler and used Work Tagger to label a week's worth of data. While doing so, they recorded the time they spent on labelling each day of that week and the number of rows labelled. The results are depicted in Table <ref type="table" target="#tab_0">1</ref>. Generally speaking, the time it takes to tag a row in the dataset strongly decreases the more time the user spends in the tool, as can be seen in Fig. <ref type="figure" target="#fig_1">2</ref>. This is especially striking for researcher 4, who went from taking 7.2 seconds per row to 0.3 and 0.4 seconds per row. For researcher 1, a decrease between day 1 and 5 can also be seen, but it is less linear. This may be due to the fact that researcher 1 tagged days 1 through 4 within 8 days of each other, while day 5 was tagged 11 days later, when the researcher had to get back into the swing.</p><p>After tagging days 1 and 2 simultaneously, the researchers discussed their experiences and proposed changes to the tool. This resulted in the addition of a list of the subactivities last used and a pagination button on the bottom of the page. When researchers 1 through 3 had finished tagging all days, there was another round of changes to the tool. We added the option of uploading sample data, active window and activity views, a visualization depicting the duration, and an 'undo' button. Researcher 4 completed the final three days using the current version of the tool, resulting in the fastest tagging times observed.  </p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: User Interface of Work Tagger with an uploaded file in the Time view</figDesc><graphic coords="4,90.13,84.19,412.53,193.35" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Overview of Time Spent per Researcher.</figDesc><graphic coords="6,140.13,84.19,312.52,205.22" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Overview of time spent, rows tagged, and seconds spent per event tagged.</figDesc><table><row><cell>Researcher</cell><cell>Metric</cell><cell cols="5">Day 1 Day 2 Day 3 Day 4 Day 5</cell></row><row><cell></cell><cell>Time spent (in minutes)</cell><cell>70</cell><cell>35</cell><cell>24</cell><cell>25</cell><cell>7</cell></row><row><cell>Researcher 1</cell><cell># rows tagged</cell><cell>968</cell><cell>1250</cell><cell>1694</cell><cell>1472</cell><cell>238</cell></row><row><cell></cell><cell cols="2">Seconds spent per row tagged 4.3</cell><cell>1.7</cell><cell>0.9</cell><cell>1</cell><cell>1.8</cell></row><row><cell></cell><cell>Time spent (in minutes)</cell><cell>60</cell><cell>22</cell><cell>22</cell><cell>23</cell><cell>12</cell></row><row><cell>Researcher 2</cell><cell># rows tagged</cell><cell>804</cell><cell>561</cell><cell>536</cell><cell>514</cell><cell>442</cell></row><row><cell></cell><cell cols="2">Seconds spent per row tagged 4.5</cell><cell>2.4</cell><cell>2.5</cell><cell>2.7</cell><cell>1.6</cell></row><row><cell></cell><cell>Time spent (in minutes)</cell><cell>37</cell><cell>17</cell><cell>17</cell><cell>15</cell><cell>3</cell></row><row><cell>Researcher 3</cell><cell># rows tagged</cell><cell>616</cell><cell>329</cell><cell>614</cell><cell>476</cell><cell>72</cell></row><row><cell></cell><cell cols="2">Seconds spent per row tagged 3.6</cell><cell>3.1</cell><cell>1.7</cell><cell>1.9</cell><cell>2.5</cell></row><row><cell></cell><cell>Time spent (in minutes)</cell><cell>52</cell><cell>59</cell><cell>33</cell><cell>3</cell><cell>0.5</cell></row><row><cell>Researcher 4</cell><cell># rows tagged</cell><cell>433</cell><cell>862</cell><cell>924</cell><cell>550</cell><cell>69</cell></row><row><cell></cell><cell cols="2">Seconds spent per row tagged 7.2</cell><cell>4.1</cell><cell>2.1</cell><cell>0.3</cell><cell>0.4</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://maygo.github.io/tockler/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">More information on how to do it can be found at https://github.com/project-pivot/worktagger, in the README file. Still, we plan to extend the application's flexibility by allowing users to upload their own custom list of activities, enabling customization and adaptation to various domains.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work has been partially supported by grants PID2021-126227NB-C21 and PID2022-140221NB-I00 funded by MCIN/AEI /10.13039/501100011033/FEDER, EU, and TED2021-131023B-C22 funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenera-tionEU"/PRTR.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Process mining: Overview and opportunities</title>
		<author>
			<persName><forename type="first">W</forename><surname>Van Der Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Management Information Systems (TMIS)</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="page" from="1" to="17" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Wanna improve process mining results?</title>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">J C</forename><surname>Bose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">S</forename><surname>Mans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">M</forename><surname>Van Der Aalst</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE symposium on computational intelligence and data mining (CIDM), IEEE</title>
				<imprint>
			<date type="published" when="2013">2013. 2013</date>
			<biblScope unit="page" from="127" to="134" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Mannhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Tax</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1704.03520</idno>
		<title level="m">Unsupervised event abstraction using pattern abstraction and local process models</title>
				<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Event abstraction in process mining: literature review and taxonomy</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">J</forename><surname>Van Zelst</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Mannhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>De Leoni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Koschmider</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Granular Computing</title>
		<imprint>
			<biblScope unit="volume">6</biblScope>
			<biblScope unit="page" from="719" to="736" />
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A window of opportunity: Active window tracking for mining work practices</title>
		<author>
			<persName><forename type="first">I</forename><surname>Beerepoot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Barenholz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Beekhuis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gulden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Overbeek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Van De Weerd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Van Der Werf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Reijers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2023 5th International Conference on Process Mining (ICPM), IEEE</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="57" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">A month&apos;s worth of labelled active window tracking data</title>
		<author>
			<persName><forename type="first">I</forename><surname>Beerepoot</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Best Dissertation Award, Doctoral Consortium, and Demonstration and Resources Forum at BPM 2024</title>
				<meeting>the Best Dissertation Award, Doctoral Consortium, and Demonstration and Resources Forum at BPM 2024</meeting>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
