<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">A Month&apos;s Worth of Labelled Active Window Tracking Data</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Iris</forename><surname>Beerepoot</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Utrecht University</orgName>
								<address>
									<addrLine>Heidelberglaan 8</addrLine>
									<postCode>3584 CS</postCode>
									<settlement>Utrecht</settlement>
									<country key="NL">The Netherlands</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">A Month&apos;s Worth of Labelled Active Window Tracking Data</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">9D28A1C9036689369489E5640E384FC0</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T17:02+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Event data</term>
					<term>Active Window Tracking</term>
					<term>UI logs</term>
					<term>cross-system data</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Recent process mining techniques offer new ways to uncover and understand complex work practices within organisations. The efficacy of process mining, however, depends on the availability and quality of event logs. This paper introduces and describes a publicly available dataset of labelled Active Window Tracking data, capturing my app usage and active window titles over four weeks. It aims to support diverse studies: classifying activities from window titles, identifying action patterns, and developing new visualisations for detailed process data. By making this resource available, I aim to encourage the development of new process mining techniques that provide detailed insights into work practices.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Process mining has evolved into a means to discover and understand the complex work practices of employees in organisations <ref type="bibr" target="#b0">[1]</ref>. However, the quality of the results relies heavily on the availability of event logs. Most process mining studies draw on event data from a single work system <ref type="bibr" target="#b1">[2]</ref>, which yields an incomplete picture of the breadth of work that has taken place. In <ref type="bibr" target="#b0">[1]</ref>, we proposed using so-called Active Window Tracking (AWT) data for mining work practices and outlined the opportunities that this type of detailed event data offers. AWT records the apps that a worker uses and the title of the window that is active at any point in time, providing data that sits between UI logs and traditional single-system event data.</p><p>The window titles and apps provide a very detailed view of how the worker performs certain tasks, which may be interesting in itself for some types of analyses. For other analyses, however, window titles may need to be abstracted into higher-level activities. To support such analyses and to encourage the development of novel abstraction techniques, this resource presents a month's worth of manually labelled and pseudonymised AWT data. The data is publicly available on GitHub (https://github.com/project-pivot/labelled-awt-data). The following sections describe the resource, present a preliminary analysis, and outline its possible uses.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Description of the Resource</head><p>The resource contains a set of files, all of which relate to one month's worth of AWT data. For the recording, I used a tool called Tockler (http://maygo.github.io/tockler/), which has been running on my computer since December 2022. It logs the application name, window title, start time, and end time. Unlike many time-tracking tools, which aggregate the total time spent in an app or window and thereby render the data unusable for process mining, Tockler keeps the timestamps of each individual window title that was active.</p><p>The repository contains a labelled subset of the resulting data, covering March 6 to April 2, 2023. I did the labelling locally in Excel by exporting the data from Tockler and adding two columns to the log: one for the corresponding activity and one for the case. I deductively selected activities from the University Job Classification system used by Dutch universities (https://edu.nl/hkdr4). Examples of such activities are 'Assessing exams and giving marks' and 'Conducting research'. When the behaviour did not fit any of the existing activities, I created a new activity. Examples of added activities include 'Communicating about events', 'Planning teaching activities', and 'Reviewing journal and conference papers'. I selected the cases inductively, e.g., courses that I taught, students that I supervised, research papers that I worked on, and events that I organised. This procedure is also described in <ref type="bibr" target="#b0">[1]</ref>.</p><p>Table <ref type="table" target="#tab_0">1</ref> provides an overview of the datasets in the repository. The first file, awt_data_1_pseudonymized, contains the full pseudonymised version of the labelled data, in which I replaced names and other sensitive information with placeholders. 
A snippet is provided in Table <ref type="table" target="#tab_1">2</ref>. The remaining files in the data folder contain versions to which I applied further processing; the Python notebook in the repository details the steps I took. In awt_data_2_merged_titles, I merged all sequential events with the same activity label, resulting in an abstracted event log with considerably fewer events (1,227 instead of 10,066). In awt_data_3_added_duration, I added a Duration column holding the time between the start and end time, and in awt_data_4_added_case_type I added a case type attribute.</p><p>Depending on the use case, you might want to work with the detailed window titles in awt_data_1_pseudonymized or with the final abstracted data in awt_data_4_added_case_type. If you are interested in, e.g., a technique that automatically recognises higher-level tasks from (a set of) window titles, awt_data_1_pseudonymized is the file to look at. If you want to explore the data with process mining techniques right away, awt_data_4_added_case_type is the better starting point.</p></div>
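<div xmlns="http://www.tei-c.org/ns/1.0"><p>The merging and duration steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from the repository's notebook: it assumes events are sorted by Begin, uses the column names shown in Table 2, and introduces the hypothetical helpers at, merge_sequential, and add_duration. The text only says that sequential events with the same activity label are merged; this sketch additionally requires the case to match, an assumption made so that each merged event keeps a single case. The input is the seven rows of Table 2.</p>

```python
from datetime import datetime


def at(hms):
    """Hypothetical helper: a timestamp on 8 March 2023, the date of the Table 2 rows."""
    return datetime.strptime("2023-03-08 " + hms, "%Y-%m-%d %H:%M:%S")


def merge_sequential(events):
    """Merge consecutive events that share both Activity and Case.

    Assumes events are sorted by Begin. Matching on Case as well is an
    assumption of this sketch; the text only mentions the activity label.
    """
    merged = []
    for ev in events:
        prev = merged[-1] if merged else None
        same_label = (prev is not None and prev["Activity"] == ev["Activity"]
                      and prev["Case"] == ev["Case"])
        if same_label:
            prev["End"] = ev["End"]  # stretch the merged event to this event's end
            prev["Titles"].append(ev["Title"])
        else:
            merged.append({"Begin": ev["Begin"], "End": ev["End"],
                           "Activity": ev["Activity"], "Case": ev["Case"],
                           "Titles": [ev["Title"]]})
    return merged


def add_duration(events):
    """Add a Duration column holding End minus Begin."""
    for ev in events:
        ev["Duration"] = ev["End"] - ev["Begin"]
    return events


LECTURE = ("Encouraging and giving lectures", "Guest lecture NWI bachelor")
ASSESS = ("Assessing the students' assignments and submitting the assessment "
          "to the Board of Examiners", "***name10***")
PDF = "***name10*** - Research proposal.pdf - Adobe Acrobat Pro (32-bit)"

# The seven rows of Table 2 as (Title, Begin, End, (Activity, Case)).
snippet = [
    ("20230309 Gastcollege bachelor NWI", "14:14:40", "14:14:46", LECTURE),
    ("20230309 gastcollege NWI - PowerPoint", "14:14:46", "14:14:49", LECTURE),
    ("20230309 gastcollege NWI - PowerPoint Presenter View", "14:14:49", "14:16:01", LECTURE),
    ("20230309 gastcollege NWI - PowerPoint", "14:16:01", "14:16:04", LECTURE),
    ("20230309 Gastcollege bachelor NWI", "14:16:13", "14:16:16", LECTURE),
    (PDF, "14:18:28", "14:18:34", ASSESS),
    (PDF, "14:19:14", "14:19:26", ASSESS),
]
events = [{"Title": t, "Begin": at(b), "End": at(e), "Activity": a, "Case": c}
          for t, b, e, (a, c) in snippet]

merged = add_duration(merge_sequential(events))
for ev in merged:
    print(ev["Case"], ev["Duration"])
# Guest lecture NWI bachelor 0:01:36
# ***name10*** 0:00:58
```

<p>Under these assumptions, the seven snippet rows collapse into two events. The merged Duration runs from the first Begin to the last End, so short gaps between merged events (such as 14:16:04 to 14:16:13 in Table 2) count towards it; whether the notebook treats such gaps the same way is not stated here.</p></div>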
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Preliminary Analysis</head><p>As an initial analysis, let us examine awt_data_4_added_case_type.csv. It contains 76 hours' worth of labelled AWT data collected over the four weeks. As Figure <ref type="figure" target="#fig_1">1</ref> shows, most of this time was spent on research projects, but a significant share also went to other case types such as courses, conferences, and presentations. The time spent on research projects and courses is reflected in the duration per activity (Figure <ref type="figure" target="#fig_2">2</ref>): most of my time went to 'conducting research', with 'encouraging and giving lectures' and 'preparing and providing teaching sessions for students' completing the top three. To demonstrate that this data is ready for process mining, imagine that we are interested in the activities related to the different academic conferences. We load the preprocessed file awt_data_4_added_case_type.csv into Fluxicon's Disco (https://fluxicon.com/disco/). We map the columns as follows:</p><p>Disco's global statistics show the events over time (Figure <ref type="figure" target="#fig_4">3a</ref>) and the active cases over time (Figure <ref type="figure" target="#fig_4">3b</ref>). In the former, the working days of each week are easy to spot. In the final week I was on holiday and only did some work on the Tuesday, which is reflected in both figures.</p><p>Now that we have a general idea of the data, we can discover the process. Since we are interested in the work that took place around academic conferences, we filter on 'Case type' → 'Academic conference'. Figure <ref type="figure" target="#fig_5">4</ref> shows the resulting process map, with both the activities and paths sliders set to 100%. 
During March 2023, I was involved in activities related to four cases, that is, four academic conferences: BPM, ECIS, ICIS, and RCIS. Most of the work was done for the BPM conference, where I was part of the organising team and program committee. For ECIS, I was mostly involved in organising and communicating about a workshop, as well as arranging my travel. For ICIS, there was only a brief reviewing activity, and the time spent on the RCIS case revolved around yet more travel arrangements. Note that these cases are unfinished: work on these four conferences took place before and after these four weeks, and the same is true for many of the other cases. To examine finished cases that span several months without manually labelling all of this data, it is vital to develop and apply techniques that (semi-)automatically recognise higher-level activities and cases in long-term AWT data; this is ongoing work.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Proceedings of the Best BPM Dissertation Award, Doctoral Consortium, and Demonstrations &amp; Resources Forum co-located with 22nd International Conference on Business Process Management (BPM 2024), Krakow, Poland, September 1st to 6th, 2024. i.m.beerepoot@uu.nl (I. Beerepoot); http://irisbeerepoot.com/ (I. Beerepoot); ORCID 0000-0002-6301-9329 (I. Beerepoot)</figDesc></figure>
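<div xmlns="http://www.tei-c.org/ns/1.0"><p>The aggregations behind Figures 1, 2, and 3a and the case-type filter applied in Disco can be approximated in plain Python. The sketch below assumes each event is a dict with Begin (a datetime), Activity, 'Case type', and Duration (a timedelta) entries, mirroring awt_data_4_added_case_type.csv; how Duration is actually encoded in that file is not specified here. The helper names are hypothetical, and the toy events are invented for illustration only (case type names other than 'Academic conference' are guesses).</p>

```python
from collections import Counter
from datetime import datetime, timedelta


def total_duration_by(events, column):
    """Sum the Duration of events per value of `column`, e.g. 'Case type' or 'Activity'."""
    totals = {}
    for ev in events:
        totals[ev[column]] = totals.get(ev[column], timedelta()) + ev["Duration"]
    return totals


def top_n(totals, n=3):
    """The n entries with the largest total duration, largest first."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def events_per_day(events):
    """Number of events per calendar day of their Begin timestamp."""
    return Counter(ev["Begin"].date() for ev in events)


def filter_case_type(events, case_type):
    """Keep only the events of one case type, like the 'Case type' filter in Disco."""
    return [ev for ev in events if ev["Case type"] == case_type]


def toy_event(day, case_type, activity, minutes):
    """Build a hypothetical event at 09:00 on the given day of March 2023."""
    return {"Begin": datetime(2023, 3, day, 9, 0), "Case type": case_type,
            "Activity": activity, "Duration": timedelta(minutes=minutes)}


# Invented toy events for illustration; these are NOT values from the dataset.
toy = [
    toy_event(6, "Research project", "Conducting research", 90),
    toy_event(6, "Course", "Encouraging and giving lectures", 45),
    toy_event(7, "Academic conference", "Reviewing journal and conference papers", 30),
    toy_event(8, "Research project", "Conducting research", 60),
]

for case_type, total in top_n(total_duration_by(toy, "Case type")):
    print(case_type, total)
# Research project 2:30:00
# Course 0:45:00
# Academic conference 0:30:00
conference_events = filter_case_type(toy, "Academic conference")
```

<p>On the real data, top_n(total_duration_by(events, "Activity")) would play the role of the top-three ranking discussed above, and filter_case_type(events, "Academic conference") that of the filter applied before discovering the process map.</p></div>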
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Duration per case type.</figDesc><graphic coords="3,151.80,460.30,291.67,175.77" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Duration per activity.</figDesc><graphic coords="4,130.96,84.19,333.36,154.52" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head></head><label></label><figDesc>(a) Events over time. (b) Active cases over time.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Events and active cases over time.</figDesc><graphic coords="4,96.38,574.70,199.99,57.80" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Process map of the activities related to 'Case type' → 'Academic conference'.</figDesc><graphic coords="5,89.29,84.19,416.68,240.32" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Description of the datasets.</figDesc><table><row><cell>File name</cell><cell>Number of events</cell><cell>Description</cell></row><row><cell>awt_data_1_pseudonymized.csv</cell><cell>10,066</cell><cell>Full pseudonymised dataset (individual window titles labelled in Excel)</cell></row><row><cell>awt_data_2_merged_titles.csv</cell><cell>1,227</cell><cell>Same as previous but with sequential events with same label merged (see preprocessing notebook)</cell></row><row><cell>awt_data_3_added_duration.csv</cell><cell>1,227</cell><cell>Same as previous but with duration column (see preprocessing notebook)</cell></row><row><cell>awt_data_4_added_case_type.csv</cell><cell>1,227</cell><cell>Same as previous but with case type added (manually)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Snippet of the labelled data (corresponding to awt_data_1_pseudonymized.csv).</figDesc><table><row><cell>App</cell><cell>Title</cell><cell>Begin</cell><cell>End</cell><cell>Activity</cell><cell>Case</cell></row><row><cell>Windows Explorer</cell><cell>20230309 Gastcollege bachelor NWI</cell><cell>8-03-23 14:14:40</cell><cell>8-03-23 14:14:46</cell><cell>Encouraging and giving lectures</cell><cell>Guest lecture NWI bachelor</cell></row><row><cell>Microsoft PowerPoint</cell><cell>20230309 gastcollege NWI - PowerPoint</cell><cell>8-03-23 14:14:46</cell><cell>8-03-23 14:14:49</cell><cell>Encouraging and giving lectures</cell><cell>Guest lecture NWI bachelor</cell></row><row><cell>Microsoft PowerPoint</cell><cell>20230309 gastcollege NWI - PowerPoint Presenter View</cell><cell>8-03-23 14:14:49</cell><cell>8-03-23 14:16:01</cell><cell>Encouraging and giving lectures</cell><cell>Guest lecture NWI bachelor</cell></row><row><cell>Microsoft PowerPoint</cell><cell>20230309 gastcollege NWI - PowerPoint</cell><cell>8-03-23 14:16:01</cell><cell>8-03-23 14:16:04</cell><cell>Encouraging and giving lectures</cell><cell>Guest lecture NWI bachelor</cell></row><row><cell>Windows Explorer</cell><cell>20230309 Gastcollege bachelor NWI</cell><cell>8-03-23 14:16:13</cell><cell>8-03-23 14:16:16</cell><cell>Encouraging and giving lectures</cell><cell>Guest lecture NWI bachelor</cell></row><row><cell>Adobe Acrobat</cell><cell>***name10*** - Research proposal.pdf - Adobe Acrobat Pro (32-bit)</cell><cell>8-03-23 14:18:28</cell><cell>8-03-23 14:18:34</cell><cell>Assessing the students' assignments and submitting the assessment to the Board of Examiners</cell><cell>***name10***</cell></row><row><cell>Adobe Acrobat</cell><cell>***name10*** - Research proposal.pdf - Adobe Acrobat Pro (32-bit)</cell><cell>8-03-23 14:19:14</cell><cell>8-03-23 14:19:26</cell><cell>Assessing the students' assignments and submitting the assessment to the Board of Examiners</cell><cell>***name10***</cell></row></table></figure>
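<div xmlns="http://www.tei-c.org/ns/1.0"><p>Loading a file such as awt_data_1_pseudonymized.csv could look like the following sketch. It assumes a comma-separated file with the column names of Table 2 and Begin/End values in the day-month-year layout shown there (given the March 6 to April 2 period, 8-03-23 is 8 March 2023). The CSV may store timestamps differently, so the format string is a parameter; the function name read_awt_csv is hypothetical.</p>

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layout, copied from how Begin/End appear in Table 2
# (day-month-two-digit-year). Adjust if the CSV stores another format.
TIMESTAMP_FORMAT = "%d-%m-%y %H:%M:%S"


def read_awt_csv(handle, fmt=TIMESTAMP_FORMAT):
    """Read labelled AWT rows and parse Begin/End into datetime objects.

    Column names follow Table 2; a comma delimiter is assumed.
    Rows are returned sorted by Begin.
    """
    events = []
    for row in csv.DictReader(handle):
        row["Begin"] = datetime.strptime(row["Begin"], fmt)
        row["End"] = datetime.strptime(row["End"], fmt)
        if row["Begin"] > row["End"]:
            raise ValueError("event ends before it begins: " + row["Title"])
        events.append(row)
    events.sort(key=lambda ev: ev["Begin"])
    return events


# First two rows of Table 2, written as CSV for illustration.
sample = io.StringIO(
    "App,Title,Begin,End,Activity,Case\n"
    "Windows Explorer,20230309 Gastcollege bachelor NWI,8-03-23 14:14:40,"
    "8-03-23 14:14:46,Encouraging and giving lectures,Guest lecture NWI bachelor\n"
    "Microsoft PowerPoint,20230309 gastcollege NWI - PowerPoint,8-03-23 14:14:46,"
    "8-03-23 14:14:49,Encouraging and giving lectures,Guest lecture NWI bachelor\n"
)
events = read_awt_csv(sample)
print(events[0]["Begin"].isoformat())  # 2023-03-08T14:14:40
```

<p>For the real file, open it with open("awt_data_1_pseudonymized.csv", newline="", encoding="utf-8") and pass the handle to read_awt_csv; the UTF-8 encoding is again an assumption.</p></div>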
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Possible Uses</head><p>This resource may be used in different ways: (1) to develop techniques that classify (a set of) window titles into higher-level activities, (2) to identify patterns of action in terms of repeated sequences of windows or activities, and (3) to develop new visualisations for detailed process data. I also hope it will spark entirely new ideas about how individuals organise their work. Feel free to reach out to me with questions or requests to verify insights.</p></div>
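<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a starting point for use (1), a simple baseline could assign a new window title the activity of its most similar labelled title. The sketch below uses nearest-neighbour matching on the Jaccard similarity of lower-cased word tokens; this technique, the 0.2 threshold, and all function names are illustrative assumptions, not a method proposed in this paper. The labelled pairs come from Table 2; the query titles are hypothetical.</p>

```python
import re


def tokens(title):
    """Lower-cased alphanumeric word tokens of a window title."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a.union(b)
    return len(a.intersection(b)) / len(union) if union else 0.0


def classify(title, labelled, min_similarity=0.2):
    """Return the activity of the most similar labelled title, or None.

    `labelled` is a list of (title, activity) pairs; min_similarity is an
    arbitrary cut-off below which no label is assigned.
    """
    query = tokens(title)
    best_activity, best_score = None, 0.0
    for known_title, activity in labelled:
        score = jaccard(query, tokens(known_title))
        if score > best_score:
            best_activity, best_score = activity, score
    return best_activity if best_score >= min_similarity else None


# Labelled pairs taken from Table 2.
labelled = [
    ("20230309 gastcollege NWI - PowerPoint", "Encouraging and giving lectures"),
    ("20230309 Gastcollege bachelor NWI", "Encouraging and giving lectures"),
    ("***name10*** - Research proposal.pdf - Adobe Acrobat Pro (32-bit)",
     "Assessing the students' assignments and submitting the assessment "
     "to the Board of Examiners"),
]

# Hypothetical new titles.
print(classify("20230309 gastcollege NWI - PowerPoint Slide Show", labelled))
# Encouraging and giving lectures
print(classify("Inbox - Outlook", labelled))
# None
```

<p>The same approach could predict the Case column instead of the Activity column by swapping the second element of each labelled pair.</p></div>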
			</div>
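<div xmlns="http://www.tei-c.org/ns/1.0"><p>For use (2), one simple way to surface repeated sequences is to count contiguous n-grams of apps, window titles, or activities. The sketch below applies this to the App column of the seven Table 2 rows; n-gram counting is a swapped-in illustrative technique, and the function name repeated_ngrams and its min_count default are assumptions.</p>

```python
from collections import Counter


def repeated_ngrams(sequence, n=2, min_count=2):
    """Contiguous subsequences of length n that occur at least min_count times."""
    grams = Counter(tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1))
    return {gram: count for gram, count in grams.items() if count >= min_count}


# App sequence of the seven Table 2 rows, in time order.
apps = ["Windows Explorer", "Microsoft PowerPoint", "Microsoft PowerPoint",
        "Microsoft PowerPoint", "Windows Explorer", "Adobe Acrobat", "Adobe Acrobat"]
print(repeated_ngrams(apps))
# {('Microsoft PowerPoint', 'Microsoft PowerPoint'): 2}
```

<p>Running the same function on merged events (e.g. the Activity column of awt_data_4_added_case_type.csv), per case or per day, would shift the focus from window switching to higher-level action patterns.</p></div>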

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A window of opportunity: Active window tracking for mining work practices</title>
		<author>
			<persName><forename type="first">I</forename><surname>Beerepoot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Barenholz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Beekhuis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gulden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Overbeek</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Van De Weerd</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Van Der Werf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">A</forename><surname>Reijers</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2023 5th International Conference on Process Mining (ICPM), IEEE</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="57" to="64" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">How is process mining technology used by organizations? A systematic literature review of empirical studies</title>
		<author>
			<persName><forename type="first">M</forename><surname>Thiede</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Fuerstenau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">P</forename><surname>Bezerra Barquet</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Business Process Management Journal</title>
		<imprint>
			<biblScope unit="volume">24</biblScope>
			<biblScope unit="page" from="900" to="922" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
