=Paper= {{Paper |id=Vol-2673/paperDR07 |storemode=property |title=MQTT-XES: Real-time Telemetry for Process Event Data |pdfUrl=https://ceur-ws.org/Vol-2673/paperDR07.pdf |volume=Vol-2673 |authors=Andrea Burattin,Martin Eigenmann,Ronny Seiger,Barbara Weber |dblpUrl=https://dblp.org/rec/conf/bpm/BurattinESW20 }} ==MQTT-XES: Real-time Telemetry for Process Event Data== https://ceur-ws.org/Vol-2673/paperDR07.pdf
           MQTT-XES: Real-time Telemetry for
                 Process Event Data

 Andrea Burattin1 , Martin Eigenmann2 , Ronny Seiger2 , and Barbara Weber2
       1
           Software and Process Engineering, Technical University of Denmark,
                               2800 Kgs. Lyngby, Denmark
                2
                  Institue of Computer Science, University of St. Gallen,
                               9000 St. Gallen, Switzerland

       Abstract. This demo paper presents an infrastructure to enable real-
       time monitoring of process events (i.e., telemetry). The infrastructure
       relies on the MQTT protocol which ensures minimum logging overhead.
       The paper presents a Java library for producing (i.e., logging) and con-
       suming events, built on top of HiveMQ. Additionally, a prototype dash-
       board to display basic statistics is reported and described.

1    Introduction
Recording happenings of interest in the context of process executions is be-
coming more and more important. Specifically, the spread of process mining
techniques [2] has paved the way to an unprecedented set of opportunities rang-
ing from discovering the processes actually being executed (as opposed to the
intended ones), calculating the conformity of executions, identifying bottlenecks,
predicting the outcomes of processes, suggesting resource allocations, etc.
    In its typical form, process mining consumes offline recordings of historical
executions (the so called “event logs”). With this paper we suggest a new family
of tools for logging events in real-time (i.e., online) and share them with “in-
terested” entities. Such type of remote logging, also referred to as “telemetry”,
represents a key component for the deployment of online process mining tech-
niques [5] that, in turn, enables to extract knowledge that can be immediately
exploited with almost no delay. For example, let’s consider a conformance check-
ing problem in a hospital: the process managers would like to know immediately
whether patients are treated according to standard protocols (i.e., the reference
processes) or not. With offline techniques, the delay between when the violation
happened and when the managers are notified depends on how often logs are
extrapolated and processed (this time span can be in the range of days up to
months or years). With real-time process mining systems, violations would be
reported immediately after they occurred, thus giving the process manager time
to compensate the issue. In the rest of the paper we present some general ideas
about the principles behind the developed artifacts (cf. Sec. 2) as well as some
technical details (cf. Sec. 3).
2    Innovation and Related Work
To achieve the goal mentioned in the previous section, i.e., a real-time process
mining system, it is necessary to build a stream processing architecture suit-




Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons
License Attribution 4.0 International (CC BY 4.0).
2       A. Burattin et al.

                Branch1                  Topic: pmcep/Process1/#
                - Process1                                                         Conformance checking for
                - Process2                                                         Process 1 for all branches
                - Process3          Standard




                                                                                                                   MQTT-XES messages exchange
                                   MQTT broker
                                                                  Topic: pmcep/#
                Branch2
                - Process3                                                         General sta�s�cs for all




                                                                                                                   Topic subscrip�on
                - Process4                                                         processes across all branches

               ...

                Branchn                                                            Cross-branch and
                - Processx                                                         preven�ve model repair
                - Processy                       Topic: pmcep/#
                - Processz



Fig. 1. Graphical representation of possible use-cases of the contribution of the paper.


able for distributed settings and capable of high throughput. With this paper
we contribute towards this goal by suggesting a software architecture for unified
logging (i.e., an append only, ordered, distributed log) [6]. Specifically, we pro-
pose to utilize the Message Queuing Telemetry Transport (MQTT) protocol [7,
3] for tracking the execution of business process activities in real-time. MQTT
has been specifically devised to operate using a publish/subscribe protocol and,
in particular, the key idea was to support the logging and the telemetry for the
Internet of Things, i.e., devices where the computing power is low. Consistently
with such rationale, most of the complexity is delegated to the “broker”: the
remote component in charge of receiving the events from the producers and for-
warding them to the interested consumers. Though other protocols are possible
(e.g., XMPP, JMS, AMQP), MQTT represents the optimal choice to handle the
real-time logging of activities being executed, since the overhead is reduced to
the minimum. MQTT is a topic-based publish/subscribe protocol and in this
paper we propose a specific structure for the organization of the topics. By ex-
ploiting such a structure, the logging overhead is reduced even more since, even
with no payload, the mere fact that an event is published on a certain topic is
informative for most of the common process mining tasks. This aspect makes
MQTT-XES truly interoperable (relying on the payload makes the protocol not
really interoperable, since the payload is just a byte array [7, 3]).
    In Fig. 1, we sketched an idea concerning possible usages of MQTT-XES. For
example, we might have several branches of the organization, each of them run-
ning different processes, all generating events that are sent to the same MQTT
broker. Before reaching the broker it could be that some of these events are ac-
tually pre-processed, for example to reach the same abstraction level. From the
broker, events can be forwarded to several subscribed consumers, each of them in
charge of different aspects. These consumers will rely on two important aspects
of the data streams: (i) all events are available in real time, and (ii) events are
referring not just to individual processes but to the whole organization. There-
fore, it will be possible to construct dashboards monitoring the whole system (as
in “all the processes running across the whole organization”), calculating confor-
mity for the same process being executed at several branches, or defining model
repair algorithms capable of improving processes executed at a branch based
on observations coming from a different branch. When logs are stored locally in
an offline fashion, all these use cases are not feasible. The contribution of this
paper is three-fold. First, we propose a specific schema for the topic of published
                    MQTT-XES: Real-time Telemetry for Process Event Data           3

events that reflects the typical structure of the XES standard [1]. After that we
present a Java library to log the executions of events and, finally, we present a
dashboard to consume events and report some statistics.
    The idea of streaming events, in the BPM context, is not new in the liter-
ature [5] however, so far, no unified system has been devised for this purpose.
An approach based on a publish/subscribe protocol is available [8] but is is ac-
cessible just within a single instance of the ProM toolkit, making it not suitable
for distributed settings. An alternative approach [4], exclusively based on TCP
connections, leveraged the idea of transmitting small fragments of XES logs (i.e.,
each event is an XLog with one XTrace with one XEvent). However, the lack of
a unified protocol and the verbosity of the log fragments make this TCP-based
system not suitable to reach high throughput and interoperability.

3     Technical Details and Maturity
In MQTT, topics represent “addresses” where events can be sent to. They are
typically used to define scopes and, therefore, are used by consumers to receive
only relevant events. Topics are structured in “hierarchies” which are defined
using levels and level separators. An example of a topic where events referring
to a room temperature are published is the following: lyngby/B322/R212/temp.
We can expect that each event in such topic will contain as payload, a represen-
tation of the temperature for room R212 of building B322 of the lyngby campus.
Consumers, therefore, can decide to subscribe to specific topics, for example to
receive all temperature readings for that room. Consumers can also decide to re-
ceive events from whole hierarchies of topics by using wildcards. Two wildcards
are commonly used: a single-level (indicated with +) and multi-level wildcard
(#). With the single level wildcard a consumer could subscribe to all tempera-
ture events of building B322 (so temperatures of all offices): lyngby/B322/+/temp.
With the multi-level wildcard it is possible to subscribe to all events happening
at the lyngby campus: lyngby/#.
    In this paper we suggest a specific hierarchy of the topics that will allow
tracking process event data. The structure is the following:
           BASE/[SOURCE ID]/[PROCESS INSTANCE ID]/[ACTIVITY NAME]
In this structure, BASE refers to a predefined name useful to identify all events
part of the activity monitoring initiative. [SOURCE ID] refers to the name of the
source of the event. According to the XES standard this would be the identifier of
the log. Practically speaking, this could refer to the name of the process and the
branch where the process is performed. [PROCESS INSTANCE ID] is the identifier
of the case id and [ACTIVITY NAME] is the name of the activity executed. With
such a structure, topics are “reserved” for each process instance and for each
activity name.3 Therefore, the mere fact that an event was published in a topic
indicates that the corresponding activity happened, with no need to check the
actual payload of the event. This makes the whole system extremely lightweight
3
    Note that in MQTT topics do not need to be initialized however, conceptually, the
    namespace of topics would reflect the events universe [2].
    4        A. Burattin et al.


        Listing 1.1. Example of how to use the Java API to send an MQTT-XES event.

1   // Specification of the client with broker and root topic level
2   XesMqttSerializer client = new XesMqttSerializer ( " broker . hivemq . com " , " BASE " ) ;
3   // Construct of the MQTT event
4   XesMqttEvent event =
5       new XesMqttEvent ( " source - id " , " case - id " , " activity " )
6       . addTraceAttribute ( " name " , " value " ) . addEventAttribute ( " name " , " value " )
7       . removeTraceAttribute ( " dummy " ) ; // Manipulation using method chaining
8   // Connection to the broker and event sending
9   client . connect () ; client . send ( event ) ; client . disconnect () ;




          Fig. 2. The developed dashboard showing some basic statistics for all events

    and truly interoperable (the topic name must be an UTF-8 Encoded string [3]).
    With the current implementation of MQTT-XES, it is also possible to define
    attributes at the process instance and at the event levels, which are encoded
    into the payload as UTF-8 string containing a JSON object.
        To simplify the usage of MQTT-XES, a Java library has been developed
    (based on HiveMQ, cf. https://www.hivemq.com/). This library, which is aware
    of the typical BPM concepts, can be used to generate and publish events as well
    as to consume them. An example of usage of the library to send events is reported
    in Listing 1.1: in line 2 the basic configuration is provided, namely the broker’s
    host address (in this case, we used the the public broker of HiveMQ4 ) and the
    BASE of the topic. Then, in lines 4-7, the actual event to be sent is defined. The
    constructor (line 5) requires an identifier for the source (e.g., a concatenation
    of the name of the branch and the name of the process), the process instance
    identifier and the name of the activity. The following lines (6-7) show possible
    ways of adding attributes for the process instance or for the event. In line 9,
    finally, the event is sent to the broker.
        To subscribe to the events it is possible to use all standard MQTT-consumers.
    The Java library we developed can also be used to consume events using a call-
    back mechanism5 . For this work we also developed a dashboard, specifically tar-
    geting events generated for MQTT-XES. The dashboard is built using Grafana
    (see https://grafana.com/) for visualisation and PostgreSQL as storage back-
    4
      In their website they state that “Testing and usage is for free but please do not use
      it for sensitive information because everybody is allowed to subscribe to every topic”.
    5
      For a tutorial on how to use the library see https://github.com/pmcep/mqtt-xes.
                  MQTT-XES: Real-time Telemetry for Process Event Data            5

end. The MQTT event stream is collected and subsequently persisted to the
datastore. Grafana allows the visualisation of this data based on user defined
queries. These queries can also include filter and aggregation operations and
thus, the dashboard is capable of displaying a wide variety of metrics of multiple
streams in a unified manner. Although the current state of dashboard displays
several metrics, it is still an early prototype showing a proof of concept.
4   Conclusions
The work presented in this paper can be summarized as an infrastructure to
enable real-time monitoring of process events. This infrastructure relies on the
MQTT protocol, which is extremely lightweight and therefore imposes minimum
overhead. Exploiting the proposed structure of the MQTT topics, real interop-
erability is achieved (since no payload is formally necessary). The provided Java
library can be used to simplify the generation of the messages as well as their
consumption. The dashboard accompanying the paper can also be used to im-
mediately consume the events and get an idea of how the system is behaving.
Enabling BPM-engines to generate events using MQTT-XES comes with several
advantages and the ability to overcome some major limitations of existing BPM
technologies. Specifically, the ability to extend the mining beyond the single
process to focus on entire systems or investigate a single process across different
branches of an organization will pave the way to a whole new family of tech-
niques and algorithms, yet to be developed. All of that in an online fashion,
which enables rapid reaction times.
    A screencast showing the tools is available at https://youtu.be/-jrkd6kl8Nw.
A brief tutorial on how to use the APIs is available on the GitHub page of
the tools at https://github.com/pmcep/mqtt-xes/ and https://github.com/
pmcep/mqtt-xes-dashboard, where it is also possible to download all the source
code. An installation of the dashboard is available at http://mqttxes.ics.
unisg.ch/d/PkSsD3mGk/xes. MQTT-XES is used to stream the events of some
BPI Challenges (i.e., 2011, 2013-closed, 2015-m1, 2015-m2, and 2015-m3) to the
broker broker.hivemq.com:1883.

References
1. IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability
   in Event Logs and Event Streams. IEEE Std 1849-2016 pp. 1–50 (2016)
2. van der Aalst, W.: Process Mining: Data Science in Action. Springer (2016)
3. Banks, A., Briggs, E., Borgendale, K., Gupta, R.: MQTT V. 5.0. OASIS Std. (2019)
4. Burattin, A.: Process Mining Techniques in Business Environments, LNBIP,
   vol. 207. Springer (2015)
5. Burattin, A.: Streaming Process Discovery and Conformance Checking. In: Ency-
   clopedia of Big Data Technologies (2019)
6. Dean, A., Crettaz, V.: Event Streams in Action. Manning (2019)
7. Hunkeler, U., Truong, H.L., Stanford-Clark, A.: MQTT-S - A publish/subscribe pro-
   tocol for Wireless Sensor Networks. In: Proc. of COMSWARE. pp. 791–798 (2008)
8. van Zelst, S.J., Burattin, A., van Dongen, B.F., Verbeek, H.M.W.: Data Streams in
   ProM 6: A Single-node Architecture. In: BPM Demo Sessions 2014 (2014)