=Paper=
{{Paper
|id=Vol-2628/paper5
|storemode=property
|title=Framework for Process Discovery from Sensor Data
|pdfUrl=https://ceur-ws.org/Vol-2628/paper5.pdf
|volume=Vol-2628
|authors=Agnes Koschmider,Dominik Janssen,Felix Mannhardt
|dblpUrl=https://dblp.org/rec/conf/emisa/KoschmiderJM20
}}
==Framework for Process Discovery from Sensor Data==
<pdf width="1500px">https://ceur-ws.org/Vol-2628/paper5.pdf</pdf>
<pre>
Agnes Koschmider, Judith Michael, Bernhard Thalheim (Hrsg.): EMISA Workshop 2020
32 CEUR-WS.org Proceedings


Framework for Process Discovery from Sensor Data


Agnes Koschmider,1 Dominik Janssen,2 Felix Mannhardt3


Abstract: Process mining can give valuable insights into how real-life activities are performed when
meaningful activity instances are extracted from raw sensor events. However, in many cases the event
data generated during the execution of a process is at a much lower level of abstraction than process
mining expects and, in some cases, is even continuous, as with sensor data. This paper presents a
framework to discover activities and process models from (location) sensor event data. The framework
is flexible enough to be applied to any raw sensor data set that provides time and location data.

Keywords: Process Mining, Raw Sensor Data, Event Abstraction, Framework.


1    Introduction
There is a clear need for insights that go beyond the execution sequence and performance
of formal process activities, which are easily extracted from an organization’s enterprise
resource planning (ERP) system. In the domain of screen- or computer-based work, commercial
process mining tools have already introduced task mining (see footnote 4), which combines screen
recording and optical character recognition with topic modeling techniques, not only to
identify the existence of process problems but also to give automatic insights into the underlying
root causes. However, many processes span both the digital world and the physical world.
For example, some parts of a make-to-order manufacturing process will be traceable by
extracting events from the ERP and manufacturing execution systems and can be analyzed
well with current techniques. Other crucial parts of the process with high impact on the
performance, e.g., the activities on the shop-floor, will remain hidden since often there
is insufficient data on the individual steps. Thus, inferring what actually happened in the
physical world based on raw sensor data is required to fully understand such processes.

This paper presents our framework for mapping event data across the different levels, from
(location) sensor event data to a process model, in order to understand processes that can be
derived from raw sensor data. The framework consists of four modular steps: the correlation of
events to cases, activity discovery, the abstraction of activities from events and, finally,
process discovery. Each step of the framework is sketched in this paper.

The paper is structured as follows. The next section lists challenges that a framework for
process discovery from raw sensor data should consider. Section 3 presents our framework, which
meets some of these challenges. Related work is summarized in Section 4. The paper ends with a
conclusion and a discussion of future research directions.

1 Kiel University, Group Process Analytics, Hermann-Rodewald-Str. 3, 24118 Kiel, Germany, ak@informatik.uni-kiel.de
2 Kiel University, Group Process Analytics, Hermann-Rodewald-Str. 3, 24118 Kiel, Germany, dominik.janssen@informatik.uni-kiel.de
3 Dept. of Technology Management, SINTEF Digital, Norway, felix.mannhardt@sintef.no
4 https://www.celonis.com/process-mining/what-is-task-mining/

Copyright © 2020 for this paper by its authors.
Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


2    Challenges for Process Discovery from Raw Sensor Data
Supporting process discovery from raw sensor data requires relaxing several assumptions
that traditional process mining techniques impose on the structure and semantics of recorded
event data. In traditional process mining, i.e., process mining that works on recorded events
or real-time data of information systems rather than on low-level raw data, events are assumed
to be (i) available as part of a single, comprehensive dataset (commonly called an event log),
(ii) totally ordered, (iii) discrete, (iv) correct and accurate, (v) in an isomorphic
relation to individual activity executions and (vi) at a relatively high level (i.e., close to the
business level, like "truck loading ended"). In the context of raw sensor data, events can be
any kind of observation (e.g., a sensor value changed) with relevance to a certain process.
Process discovery from raw sensor data therefore calls for new techniques that address the
following challenges:

1.    no single, comprehensive dataset: interconnected devices like large-scale sensor
      networks or mobile devices emit events at distributed, potentially not even stationary
      sources representing low-level raw data (e.g., raw RFID readings). Thus, events are
      distributed. Solutions to this challenge would help to overcome assumptions (i) and (ii).
2.    no discrete events: usually, process mining works with recorded, discrete events or
      real-time data of information systems and not with low-level raw data [KMH18].
      Process mining algorithms expect event logs, which organize the events in terms of
      traces at a certain level of abstraction; this does not match the low-level concept of
      raw sensor event data [So19]. Solutions to this challenge would help to overcome
      assumption (iii).
3.    no correct event log: events that are composed of an observation of an activity
      accompanied by time information extracted from sensor data are mostly affected by
      noise, inaccurate measurements and ambiguous information, resulting in an uncertain
      event log. This challenge relates to assumption (iv).
4.    no isomorphic relation to activity executions: sensing or observations of the physical
      world generate low-level events. The translation of low-level data into high-level
      data, known as event abstraction, requires appropriate data aggregations. Besides
      this, the low-level event data is tracked in continuous form, requiring new techniques
      for event abstraction. For further reading we refer to [KMH18], which sketches a
      schematic overview of the hierarchy from low-level events to process model discovery.
      The state of the art of event abstraction is summarized in [Ze20]. Solutions to this
      challenge would help to overcome assumptions (v) and (vi).
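
To make the first challenge concrete, the following minimal sketch merges two per-sensor
streams into one time-ordered log. It is an illustration only and not part of the framework:
the record type SensorEvent, its field names and the sample readings are assumptions, and the
sketch presumes that each stream is already sorted by time and that sensor clocks are
synchronised, which real deployments may not guarantee.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class SensorEvent(NamedTuple):
    """One raw observation; the field names are illustrative assumptions."""
    time: float   # timestamp in seconds
    sensor: str   # identifier of the emitting sensor
    value: str    # raw reading, e.g. "ON" or "OFF"


def merge_streams(*streams: Iterable[SensorEvent]) -> Iterator[SensorEvent]:
    """Lazily merge streams that are each sorted by time into one ordered log."""
    return heapq.merge(*streams, key=lambda e: e.time)


# Hypothetical readings from two distributed sources.
kitchen = [SensorEvent(1.0, "M01", "ON"), SensorEvent(4.0, "M01", "OFF")]
hallway = [SensorEvent(2.5, "M02", "ON"), SensorEvent(3.0, "M02", "OFF")]

event_log = list(merge_streams(kitchen, hallway))
for event in event_log:
    print(event)
```

The merge yields a single ordered dataset in the sense of assumptions (i) and (ii), but it does
nothing about noise or abstraction level, which the remaining challenges concern.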

In the next section we present our approach, which addresses the fourth challenge.


3       Framework for Process Discovery from Raw Sensor Data
The discovery of process models from raw sensor data requires several data aggregations.
Fig. 1 presents our framework for mapping between the different levels of event data, which
incorporates such data aggregations.

[Figure: a pipeline of four steps. (1) Event/Case Correlation turns the Location Sensor
Event Log (columns: Time, Label, Location Data) into the Case Event Log (columns: Case, Time,
Label, Location Data). (2) Activity Discovery yields the Process Activities (A) (columns:
Activity Label, Activity Behaviour). (3) Event/Activity Abstraction yields the Process Event
Log (columns: Case, Time, Activity Label, Activity Instance, Activity Lifecycle with values
such as Start and Complete). (4) Process Discovery yields the Process Model.]

                      Fig. 1: Mapping to the different levels of event data.

Our framework assumes a sensor event log as input, derived from, e.g., networks of WiFi
access points or motion sensors in smart homes and smart factories. Such event data does not
necessarily carry a unique identifier of who triggered the event. One component of the
framework therefore allows multiple entities (i.e., people or objects) to be detected at the
same time, which is needed to ensure that all activities are associated with the correct
entity. Machine learning techniques are used to abstract activities from events. To provide a
solution for process discovery from raw sensor events, our framework consists of four steps:

1.    Event Correlation: In this step, raw sensor events are grouped into a set of (unlabelled)
      activity instances by correlating each individual sensor event to an activity instance.
      The output of this first module is a log that meets the requirements for a sensor
      event log and in which each event is additionally assigned an activity instance that
      can be retrieved within the set of instances.
2.    Activity Discovery: In the second step, process activities are discovered together with
      their labels and sensor-level process models describing the expected behaviour on a
      sensor level. To use our approach in scenarios with multiple entities, a mechanism is
      incorporated that distinguishes persons from objects. The approach is based on the
      relative distance of the event (location) data.
3.    Event Abstraction: The log from step 1 is abstracted to a process event log in which
      events are directly related to the start or completion of process activities. In particular,
      sub-traces from the event/case correlation (step 1) are combined with the activity
      clusters detected in the second step, Activity Discovery. The result of this step is a
      process event log, which specifies every event that has been recorded in the location
      sensor event log as an activity. These activities carry the following tags: the name of
      the activity (label), the entity that is executing the activity (instance) and whether
      the activity has been started or completed (life cycle).
4.    Discovery of Processes: Having promoted the raw sensor events to the level of activity
      instances, we still need to identify which process cases are meaningful for our analysis
      target. We have to assume that regular and routine behaviour can be observed. As a
      starting point, an activity has to be selected that is most likely the origin of the
      routine behaviour. Finally, a process model is discovered using a standard technique.
      The final output is a process model that reflects the observed behaviour of the entities
      and is aggregated only from raw sensor data.
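
Steps 3 and 4 can be illustrated with a minimal sketch. It is not the implementation evaluated
in this paper and rests on several assumptions: the types CorrelatedEvent and ProcessEvent,
the mapping labels (activity instance to activity label, standing in for the outcome of
step 2) and the sample data are hypothetical. The abstraction keeps only the first and last
sensor event of each activity instance as its start and complete event, and a directly-follows
count stands in for the "standard technique" of step 4.

```python
from collections import Counter
from typing import Dict, List, NamedTuple, Tuple


class CorrelatedEvent(NamedTuple):
    """Assumed output of step 1: a sensor event with case and activity instance."""
    case: int
    time: float
    sensor: str
    instance: int


class ProcessEvent(NamedTuple):
    """Assumed row of the process event log produced by step 3."""
    case: int
    time: float
    activity: str
    instance: int
    lifecycle: str  # "start" or "complete"


def abstract_events(events: List[CorrelatedEvent],
                    labels: Dict[int, str]) -> List[ProcessEvent]:
    """Step 3 (simplified): emit one start and one complete event per instance."""
    spans: Dict[Tuple[int, int], List[float]] = {}
    for e in events:
        span = spans.setdefault((e.case, e.instance), [e.time, e.time])
        span[0] = min(span[0], e.time)   # earliest sensor event of the instance
        span[1] = max(span[1], e.time)   # latest sensor event of the instance
    log = []
    for (case, inst), (start, end) in spans.items():
        log.append(ProcessEvent(case, start, labels[inst], inst, "start"))
        log.append(ProcessEvent(case, end, labels[inst], inst, "complete"))
    # Order by case and time; on equal timestamps "start" precedes "complete".
    return sorted(log, key=lambda p: (p.case, p.time, p.lifecycle == "complete"))


def directly_follows(log: List[ProcessEvent]) -> Counter:
    """Step 4 (simplified): count pairs (a, b) where b completes right after a."""
    traces: Dict[int, List[str]] = {}
    for p in log:
        if p.lifecycle == "complete":
            traces.setdefault(p.case, []).append(p.activity)
    counts: Counter = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


# Hypothetical correlated sensor events for two cases.
events = [
    CorrelatedEvent(1, 0.0, "M01", 1), CorrelatedEvent(1, 5.0, "M02", 1),
    CorrelatedEvent(1, 9.0, "M07", 2), CorrelatedEvent(1, 12.0, "M08", 2),
    CorrelatedEvent(2, 20.0, "M01", 3), CorrelatedEvent(2, 26.0, "M07", 4),
]
labels = {1: "A", 2: "B", 3: "A", 4: "B"}

process_log = abstract_events(events, labels)
dfg = directly_follows(process_log)
print(dfg)
```

In this toy log both cases execute A before B, so the only directly-follows pair is (A, B).
A discovery algorithm of choice could take the place of the directly-follows count without
changing the preceding steps.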

We evaluated our approach on the publicly available CASAS data set, which contains raw
sensor data from a smart home environment [Ra07]. The data set refers to a smart home
test-bed with two residents and a house equipped with 51 motion sensors. To discover
activities and process models from the motion sensor data, we combined a self-organizing map
(SOM), an unsupervised learning technique used here for clustering, with process mining. Our
framework can be applied to any other data set that meets the minimum requirement of time
and location data.
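
How a SOM can cluster location data is sketched below under strong simplifications. This is
not the configuration used in our evaluation: the one-dimensional map with two nodes, the
deterministic initialisation, the learning-rate and neighbourhood schedules and the sample
coordinates are all assumptions chosen to keep the example short and reproducible.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights: Sequence[Sequence[float]], x: Sequence[float]) -> int:
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: _sq_dist(weights[i], x))


def train_som(data: List[Point], n_nodes: int = 2, epochs: int = 20,
              lr0: float = 0.5, sigma0: float = 0.5) -> List[List[float]]:
    """Train a one-dimensional SOM on 2-D points.

    Nodes start evenly spaced between the minimum and maximum corner of the
    data, so the result does not depend on a random seed.
    """
    lo = [min(p[d] for p in data) for d in range(2)]
    hi = [max(p[d] for p in data) for d in range(2)]
    steps = max(n_nodes - 1, 1)
    weights = [[lo[d] + (hi[d] - lo[d]) * i / steps for d in range(2)]
               for i in range(n_nodes)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs        # decays linearly from 1 towards 0
        lr = lr0 * frac                    # learning rate of this epoch
        sigma = max(sigma0 * frac, 1e-3)   # neighbourhood width of this epoch
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured on the node index.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                for d in range(2):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


# Hypothetical sensor coordinates around two rooms.
kitchen = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0)]
bedroom = [(9.0, 9.5), (10.0, 10.0), (9.5, 9.0)]
data = kitchen + bedroom

weights = train_som(data)
clusters = [best_matching_unit(weights, p) for p in data]
print(clusters)
```

Each point is assigned to its best-matching node, so the two well-separated groups end up in
different clusters. In the framework such clusters would be candidates for activities in
step 2; a realistic setting would use a two-dimensional map and features derived from time as
well as location.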


4    Related Work

A large body of research addresses the discovery of events and activities at
different levels (see Fig. 1). In particular, there are approaches for activity recognition
from low-level events that address different sensor types like motion or
video [Di18; NS10; RMM16; ZS12] or that analyze data from wearable [Ci19; Sz16] or
reference sensors [LRP15], where machine learning is used for data aggregation. Our
framework provides a process model view on raw sensor data, which goes beyond existing
approaches and helps to evaluate the quality of data aggregations, which
machine learning alone is not capable of.

Related to our framework are approaches for activity recognition for process mining. The
technique of [ESA16] assumes the low-level event data to be discrete and expects particular
representations of traces. These assumptions do not apply to our framework.


5    Conclusion

The rise of the Internet of Things (IoT) gradually changes the granularity level at which event
data, captured during the execution of operational processes, is stored. While in the past
process models have been discovered from documents or interviews with domain experts,
which is error-prone and time-consuming, the challenge will be to automatically discover the
de-jure process model, under the umbrella of process mining, from large-scale sensor
networks or wearable devices [Ja17]. Process mining can give valuable and new insights
into how real-life activities in the physical world are performed when meaningful activity
instances are extracted from raw sensor data. In this paper we presented a framework for
process discovery from raw sensor data consisting of data aggregation steps. The framework
gives guidelines on how to discover high-level events from raw (location) event data and how
to discover processes from activity instances that were derived from high-level events.
Further research tasks are:

1.    We plan to conduct empirical studies that compare various clustering techniques for
      data aggregations. We plan to apply different machine learning and deep learning
      techniques for abstracting activity instances from event data and to compare them in
      terms of accuracy, scalability and efficiency.
2.    The recognition of entities (person vs. object) from raw sensor data also requires
      improvement in order to discover accurate process models. We plan to use different
      approaches for entity recognition and allocation to activities and to compare them.
      Hidden Markov Models and other machine learning-based approaches could remedy
      the situation.
3.    To foster a responsible use of event logs, we plan to consider privacy issues when
      abstracting activities from event data [Vo20].


References

[Ci19]    Civitarese, G.: Human Activity Recognition in Smart-Home Environments
          for Health-Care Applications. In: 2019 IEEE International Conference on
          Pervasive Computing and Communications Workshops (PerCom Workshops).
          Pp. 1–1, Mar. 2019.
[Di18]    Diete, A.; Sztyler, T.; Weiland, L.; Stuckenschmidt, H.: Improving Motion-
          based Activity Recognition with Ego-centric Vision. In: PerCom Workshops
          2018. IEEE Computer Society, pp. 488–491, 2018, isbn: 978-1-5386-3227-7,
          url: https://doi.org/10.1109/PERCOMW.2018.8480334.
[ESA16]   van Eck, M. L.; Sidorova, N.; van der Aalst, W. M. P.: Enabling process mining
          on sensor data from smart products. In: RCIS 2016. Pp. 1–12, 2016, url:
          https://doi.org/10.1109/RCIS.2016.7549355.
[Ja17]    Janiesch, C.; Koschmider, A.; Mecella, M.; Weber, B.; et al.: The Internet-of-Things
          Meets Business Process Management: Mutual Benefits and Challenges.
          CoRR abs/1709.03628, 2017, arXiv: 1709.03628,
          url: http://arxiv.org/abs/1709.03628.
[KMH18]   Koschmider, A.; Mannhardt, F.; Heuser, T.: On the Contextualization of
          Event-Activity Mappings. In: BPM 2018 Workshops. LNBIP, Springer, Sept.
          2018.
[LRP15]   Larue, G. S.; Rakotonirainy, A.; Pettitt, A. N.: Predicting Reduced Driver
          Alertness on Monotonous Highways. IEEE Pervasive Computing 14/2,
          pp. 78–85, Apr. 2015, issn: 1536-1268.
[NS10]    Nentwig, M.; Stamminger, M.: A method for the reproduction of vehicle test
          drives for the simulation based evaluation of image processing algorithms.
          In: 13th International IEEE Conference on Intelligent Transportation Systems.
          Pp. 1307–1312, 2010.
[Ra07]    Rashidi, P.; Youngblood, G. M.; Cook, D. J.; Das, S. K.: Inhabitant Guidance
          of Smart Environments. In: HCI (2). Vol. 4551. LNCS, Springer, pp. 910–919,
          2007.
[RMM16]   Ranasinghe, S.; Machot, F. A.; Mayr, H. C.: A review on applications of activity
          recognition systems with regard to performance and evaluation. International
          Journal of Distributed Sensor Networks 12/8, p. 1550147716665520, 2016,
          url: https://doi.org/10.1177/1550147716665520.
[So19]    Soffer, P.; Hinze, A.; Koschmider, A.; Ziekow, H.; Ciccio, C. D.; Koldehofe, B.;
          Kopp, O.; Jacobsen, A.; Sürmeli, J.; Song, W.: From event streams to process
          models and back: Challenges and opportunities. Information Systems 81,
          pp. 181–200, 2019, issn: 0306-4379,
          url: http://www.sciencedirect.com/science/article/pii/S0306437917300145.
[Sz16]    Sztyler, T.; Carmona, J.; Völker, J.; Stuckenschmidt, H.: Self-tracking Reloaded:
          Applying Process Mining to Personalized Health Care from Labeled Sensor
          Data. T. Petri Nets and Other Models of Concurrency 11, pp. 160–180, 2016.
[Vo20]    von Voigt, S. N.; Fahrenkrog-Petersen, S. A.; Janssen, D.; Koschmider, A.;
          Tschorsch, F.; Mannhardt, F.; Landsiedel, O.; Weidlich, M.: Quantifying
          the Re-identification Risk of Event Logs for Process Mining - Empirical
          Evaluation Paper. In (Dustdar, S.; Yu, E.; Salinesi, C.; Rieu, D.; Pant, V., eds.):
          Advanced Information Systems Engineering - 32nd International Conference,
          CAiSE 2020, Grenoble, France, June 8-12, 2020, Proceedings. Vol. 12127.
          Lecture Notes in Computer Science, Springer, pp. 252–267, 2020, url:
          https://doi.org/10.1007/978-3-030-49435-3_16.
[Ze20]    van Zelst, S.; Mannhardt, F.; de Leoni, M.; Koschmider, A.: Event abstraction
          in process mining: literature review and taxonomy. Granular Computing, 2020,
          url: https://doi.org/10.1007/s41066-020-00226-2.
[ZS12]    Zhang, M.; Sawchuk, A. A.: Motion Primitive-based Human Activity Recognition
          Using a Bag-of-features Approach. In: Proceedings of the 2nd ACM
          SIGHIT International Health Informatics Symposium. IHI ’12, ACM,
          pp. 631–640, 2012.

</pre>