=Paper= {{Paper |id=Vol-2289/paper3 |storemode=property |title=Anomaly Detection in Discrete Manufacturing Systems using Event Relationship Tables |pdfUrl=https://ceur-ws.org/Vol-2289/paper3.pdf |volume=Vol-2289 |authors=Emil Laftchiev,Xinmiao Sun,Hoang-Anh Dau,Daniel Nikovski |dblpUrl=https://dblp.org/rec/conf/safeprocess/LaftchievSDN18 }} ==Anomaly Detection in Discrete Manufacturing Systems using Event Relationship Tables== https://ceur-ws.org/Vol-2289/paper3.pdf
Anomaly Detection in Discrete Manufacturing Systems using Event Relationship
                                  Tables
                  Emil Laftchiev1, Xinmiao Sun2∗, Hoang-Anh Dau3∗, and Daniel Nikovski1
                                       1
                                         Mitsubishi Electric Research Labs
                               e-mails: laftchiev@merl.com, nikovski@merl.com
                                                2
                                                  Boston University
                                              e-mail: xmsun@bu.edu
                                                   3
                                                     UC Riverside
                                             e-mail: hdau001@ucr.edu
    ∗ Work performed during internship at MERL.

                        Abstract

Anomalies in discrete manufacturing processes (DMPs) can result in reduced product quality, production delays, and physical danger to employees. It is difficult to detect anomalies in DMPs because sequencing devices such as programmable logic controllers (PLCs) usually do not allow a process engineer to easily determine which sequences of operations are observed, and checking against each sequence becomes computationally difficult. This paper proposes a new anomaly detection approach for discrete manufacturing systems. The approach models the normal behavior of the DMP from the PLC output as an event relationship table. This model is then used to determine whether new sequences of PLC outputs could be generated by the system. Outputs that do not fit the learned model are labeled anomalous. This method is tested in simulation for DMPs that contain concurrent sub-processes with unique or repeated events. The results are compared to a baseline method proposed in prior publications. Experiments show that the proposed algorithms are capable of achieving a higher F-score with less than 10% of the data required by the baseline method. Furthermore, this modeling approach has a linear space complexity in the length of the observed event sequences, as compared to the polynomial complexity of prior work.

1   Introduction

Discrete manufacturing is a type of manufacturing focused on producing distinct items through a sequence of discrete operations. A typical example of discrete manufacturing is vehicle assembly. During vehicle assembly, each part of the vehicle is added through a series of operations (events) on the part and the car as a whole. This manufacturing process occurs at high speeds, and is in large part performed through automated actions taken by powerful machines and robots. Because of the high speed and power of the component machines, monitoring the discrete manufacturing process (DMP) is critical to ensure high product quality, minimize production delays, and ensure the safety of human workers.

Monitoring a DMP for anomalies means checking for incorrect execution of events, incorrect event ordering, and incorrect event timing. However, monitoring a DMP is difficult because the processes are usually orchestrated by Programmable Logic Controllers (PLCs) whose instructions are programmed at design time and are often not available later to the process engineer. Still, PLCs, due to their proximity to the actual manufacturing process, are the ideal device for the deployment of anomaly detection algorithms.

This paper focuses on the development of anomaly detection algorithms for incorrect event execution order in DMPs using output data from a PLC. Our approach is to develop models of normal behavior inspired by process mining techniques [1; 2; 3; 4], such that anomalous behavior results in event sequences not observed previously. Learning models of normal behavior is an important task when monitoring DMPs, for three reasons. First, it is difficult to collect (balanced) data sets that contain examples of anomalies when the expected mean time between failures is long. Second, even data sets that do contain anomalies are unlikely to contain a sufficiently high number of anomaly samples to describe the full anomaly space of the problem. Third, data sets containing anomalies must be labeled accordingly. This means that a human operator must label all anomalies when they occur, which introduces additional complexity.

To develop these models, this paper evaluates two types of processes found in DMPs: manufacturing processes that are composed of concurrent linear sub-processes containing unique events, and manufacturing processes that are composed of concurrent processes with some events repeated in a loop in each sub-process. In both cases, it is assumed that the discrete event sequences are extracted from the discrete event stream of a PLC on a per-product basis through some existing technology such as RFID tags. Each modeled system is depicted visually as a Petri Net, although the end goal of the algorithms is anomaly detection and not recovery of the true underlying Petri Net.

There exists prior work in anomaly detection for DMPs [5; 6], but the approaches presented suffer from two significant disadvantages. First, prior work assumes that a complete data set exists such that a very precise anomaly detection algorithm can be trained. Second, the proposed methods scale non-linearly with the number of steps in the manufacturing process. In contrast, the methods proposed in this paper are designed to be efficient with respect to the size of the model and with respect to the size of the training data set. When compared to a baseline method in Sec. 4.4, the approach presented in this paper achieves a better
F-score, while requiring less than 10% of the training set. Comparing the space complexity of the learned models, the approach presented in this paper scales linearly in the number of unique events (N) observed, O(N), while the baseline method has space complexity O(MN), where M is the length of the observed event sequences.

2   Background and Related Work

2.1   Petri Nets

In its classical form, a Petri Net [7] is a directed bipartite graph with two node types called places and transitions. These nodes are connected via directed arcs. Each arc has a defined weight, and can only connect nodes of different types. For example, an arc may connect a place and a transition, but may not connect two transitions or two places. A simple Petri Net consisting of two places, one transition, and two arcs is shown in Fig. 1. Here places are represented by circles and transitions by squares. When describing discrete manufacturing processes, places correspond to conditions and transitions correspond to events that could signal the start and end of a manufacturing task. A condition is said to be satisfied when a token is placed in the corresponding place. Events may only take place when the preceding condition has been satisfied. Lastly, when the arc weights are set to 1, they are typically not shown in graphical depictions of the Petri Net.

Figure 1: A Petri Net with places p1, p2, one transition t1, and two connecting arcs, each with a weight of 1.

   Definition: Formally, a Petri net is a triple (P, T, F) where:
   • P is a finite set of places
   • T is a finite set of transitions
   • F is a set of arcs (F ⊆ (P × T) ∪ (T × P))

The number of tokens in a place at any given time is at least zero. The state or marking of a Petri Net refers to the distribution of tokens in the places. For example, for a Petri Net with four places, < p1, p2, p3, p4 >, a state representation such as 3p2 + 1p3 + 4p4 indicates that there are 3 tokens in place p2, 1 token in place p3, and 4 tokens in place p4. The inclusion of place p1 is optional; its absence explicitly indicates that it contains no tokens.

The transitions (or events) can be fired (take place) if all of the input places (conditions) have at least one token (are satisfied). A transition that is fired is said to consume one token from each of its input places and produce one token for each of its output places. Thus, when a condition is satisfied, an activity is executed, and a new condition is achieved. Of course, since multiple activities and conditions occur throughout the Petri Net, each time instant may see multiple changes to the token distribution in the Petri Net.

Each transition has the property of execution time. Ordinary Petri Nets have instantaneous transitions, while timed Petri Nets have a specific duration for each transition. When a timed Petri Net has variable (possibly stochastic) transition times, it is referred to as a stochastic Petri Net. Here the time is usually described by an exponential distribution with parameter λ. From a modeling perspective, this parameter can be used to induce anomalies. For example, increasing λ leads to a tighter distribution of transition durations, which could result in a new ordering of events in the final discrete event sequence. Thus anomalies in DMPs can be described by timing anomalies in Petri Nets.

2.2   Anomaly detection for discrete sequences

For a sufficiently finely sampled time step, the changes in the Petri Net token distribution can be observed as a sequence of discrete events that have taken place. This sequence can be analyzed for anomalies using anomaly detection methods for discrete sequences. A comprehensive survey of anomaly detection methods in discrete sequences can be found in Chandola et al. [8]. The authors discuss three formulations of an anomaly: whole-sequence anomalies, where the entire sequence is anomalous; long subsequence anomalies within a long discrete sequence; and subsequence anomalies, where the subsequence frequency within the larger discrete sequence is different than normal. Sequence-based anomaly detection, which addresses all three formulations, is the most relevant approach to this paper. Sequence-based anomaly detection is addressed through three main approaches. The first approach is kernel-based methods, which compute a similarity matrix for a given training sequence set and evaluate newly acquired sequences against this similarity matrix. The second approach is window-based methods, which combine the anomaly scores of subsequences of the test sequence into a total test sequence anomaly score. The third approach is to learn a Markovian model from the training sequences, where the goal is to learn the normal distribution of the data and thus assign a probability of being normal to a given test sequence.

A key drawback of similarity-based techniques with respect to the work in this paper is that most require a fixed training and test sequence size to facilitate distance calculations for anomaly scoring. Window-based techniques can be used to overcome this limitation, but they suffer from a high sensitivity to the window length parameter. Both types of techniques suffer from an unacceptably large search space as the number of unique symbols in the discrete sequence grows.

Markovian techniques are advantageous because they incorporate context information into analyzing future events. However, each discrete event in the sequence is not the state of a given system, and therefore Markovian methods cannot be used directly. In such cases it is typical to use Hidden Markov Model (HMM)-based techniques to address the problem. The HMM-based techniques consist of two main approaches. The first involves learning an HMM that best describes the normal training sequences, and then computing the probability of each test sequence given the learned model. The second approach computes the optimal hidden state sequence for each observation sequence in the training and test sets. Using the optimal hidden state model and a threshold, each state transition is evaluated as normal or abnormal. The sum of all anomalous state transitions in a test sequence is its anomaly score. This approach suffers from sensitivity to initialization conditions and from computational complexity.
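As a concrete illustration of the Markovian approach described above (a minimal sketch of the general technique, not the implementation of any cited method; the example sequences are invented), a first-order model can score a test sequence by the probability of its transitions under the training data:

```python
from collections import defaultdict

# Sketch: first-order Markovian scoring of discrete event sequences.
# Transition probabilities are estimated from training sequences; a test
# sequence containing a never-observed transition receives probability 0.

def fit_transitions(sequences):
    """Estimate P(next event | current event) from training sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
            for a, nexts in counts.items()}

def sequence_probability(seq, model):
    """Probability of the test sequence under the learned transitions."""
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= model.get(a, {}).get(b, 0.0)   # unseen transition -> 0
    return p

model = fit_transitions(["ABC", "ABC", "ACB"])
print(sequence_probability("ABC", model))  # 2/3: P(B|A)=2/3, P(C|B)=1
print(sequence_probability("BAC", model))  # 0.0: B -> A never observed
```

Thresholding this probability yields a normal/anomalous decision; the zero assigned to unseen transitions is the simplest choice and would typically be smoothed in practice.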
2.3   Process mining

An alternative approach to anomaly detection in discrete event sequences is to first learn a generative model and then evaluate each test sequence for adherence to the learned model. Learning Petri Net models from discrete event sequences has been previously investigated in the field of process mining. Here researchers have focused on identifying an underlying process from event log data in order to learn the typical behavior. Critically, unlike the work in this paper, this research did not focus on anomaly detection, but instead on identifying the event log conditions (and methods) under which a process can be recovered. The resulting algorithms are sensitive to uncertainty in the logs. In discrete event sequences, this uncertainty is high due to the variability of the duration of activities and the repetition of some activities.

A comprehensive overview of process discovery techniques is provided in De Weerdt et al. [9]. In this paper the authors note that learning Petri Nets is only tractable for specialized kinds of Petri nets such as Sound Workflow Nets (SWNs). SWNs are a class of Petri nets with dedicated start and end nodes. Algorithms to learn SWNs from process event logs were first introduced by W.M. van der Aalst [10; 11]. The first proposed algorithm was termed the α-algorithm [12]. This algorithm was shown to be sensitive to noise, incompleteness of event logs, and repeated events. To address the robustness problem of the α-algorithm, the HeuristicsMiner algorithm was developed in [4]. HeuristicsMiner addresses noise in event logs by adding weights to the graph arcs, and event path uncertainty by using a Causal Matrix instead of a Petri Net. The authors also argue that the F-score is the best metric to evaluate algorithm performance.

A successor to the α-algorithm, the α+ algorithm, focuses on detecting repeated events in short loops [2]. Here the authors first identify which elements are in a loop of length 1 and then proceed with the original α-algorithm. A third algorithm, the α++ algorithm, is designed to identify more complex event flows such as AND/XOR splits and joins [3]. Lastly, the β algorithm specifically focuses on identifying concurrent processes based on their start and end events [13].

3   Learning Models of Normal Operation using Event Relationship Tables

This section introduces the learning of models of the normal behavior of two types of processes found in DMPs: manufacturing processes that are composed of concurrent sub-processes, and manufacturing processes that are composed of concurrent processes with some events repeated in a loop in each sub-process. In the first case, all events that occur in the manufacturing sequence are unique, while in the second case, some events are repeated in a loop. Here we restate our assumption that the discrete event sequences are extracted from the discrete event stream of a PLC on a per-product basis through some existing technology such as RFID tags.

3.1   Algorithm I: Anomaly Detection without Event Repetition

Concurrent Processes in a Petri Net
The first discrete manufacturing process under consideration is one that is composed of concurrent sub-processes with unique events that occur only once during the DMP for each product. When modeling this DMP as a Petri Net, the multiple concurrent processes are modeled as simultaneously executing branches. To execute these branches, the first condition in each branch is triggered by a single event whose input is one or more origin conditions. The execution of events within each branch is independent of the other branches. It is assumed that the concurrent processes operate on a single product, which means that a single final event (or series of events) denotes the end of the DMP. An example of such a Petri Net that describes a DMP with concurrent processes is given in [14]. This model is depicted in Fig. 2, and is used to demonstrate the proposed algorithm.

Figure 2: A Petri Net representing three concurrent processes in a DMP.

The Petri Net in Fig. 2 has two initial places, P0 and P1 (either of which can start the process), and a final place, P18. The initial places are the input to transitions A and B, which share an output place, P2. The token at place P2 is consumed by transition C (t2), which in turn triggers four concurrent processes. The first process is represented by events D and E; the second process is represented by events F and G; the third process is represented by events H and I; and the fourth process is represented by events J and K. All four processes must complete in order to trigger event L (t11), which is followed by a sequence of events M, N, and O. This is a real-world example of concurrency in a DMP, and it is easy to see that the complexity of this system can be scaled while preserving the underlying concept.

In Fig. 2, λ is the parameter of the exponential distribution describing the execution time of each transition. For example, in the figure, transition C has a λ value of 1. This means that any single duration of this transition is a realization of an exponentially distributed random variable with rate parameter λ = 1.

Finding the Discrete Event Sequence in PLC Data
We now describe the design of an anomaly detection algorithm that is capable of finding anomalies in discrete event sequences. The algorithm presented herein is inspired by the previous work in process mining algorithms presented in Sec. 2. Note that the previously published algorithms focused on first identifying event successions in the form of event pairs and then applying sets of rules to the discovered events such that a valid Petri Net could be recovered. In the case of anomaly detection, it is not necessary to recover the final Petri Net; instead, it is sufficient to focus on the succession of events as evidenced by the discrete event sequences in the training data set. By removing the need to find a valid Petri Net, this approach reduces the sensitivity to noise in modeling. We begin by briefly describing how to recover discrete event sequences from PLC output. For this purpose we use the following procedure.
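The essence of this procedure can be sketched in a few lines of Python (an illustrative sketch only; the in-memory data layout and sensor names are assumptions for the example, not the paper's implementation):

```python
# Sketch: from a matrix of 0/1 PLC output traces to a discrete event
# sequence, by detecting activation (0 -> 1) changes and ordering them.

def extract_event_sequence(signals):
    """signals: dict mapping a sensor name to its list of 0/1 samples.
    Returns the sensor activation events ordered by time of occurrence."""
    events = []
    for name, trace in signals.items():
        for t in range(1, len(trace)):
            if trace[t - 1] == 0 and trace[t] == 1:   # activation only
                events.append((t, name))
    events.sort()                                     # order by time stamp
    return [name for _, name in events]

# Toy PLC output: sensor A activates first, then C, then B.
plc = {"A": [0, 1, 1, 1, 1],
       "B": [0, 0, 0, 0, 1],
       "C": [0, 0, 1, 1, 1]}
print(extract_event_sequence(plc))  # ['A', 'C', 'B']
```

Only activations are recorded here; recording deactivations as separate events (A0 alongside A1) would be a one-line extension of the same loop.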
First, the data is loaded into a table or matrix. The number of columns in this table is N and the number of rows is M. Here N denotes the number of digital outputs of the PLC. The rows correspond to the number of samples observed from the DMP. For sufficiently short processes (a small number of events, N) with reasonable sample rates and sufficiently short time duration (small M), it is possible to load the data entirely into memory. In such cases, it is simpler to remove missing or corrupted data values. PLC digital outputs are particularly straightforward to clean, because the digital signal consists of only two levels: 0 or 1. If the data is too large to fit into memory, then cleaning the data must be interwoven with the second step, change detection.

The second step in this procedure is to detect changes in the digital outputs, or simply change detection. Specifically, for each of the N signal traces observed in the PLC output, we find the instances at which the trace changes from 0 to 1 or from 1 to 0. For large files, change detection can be performed iteratively, scanning a single line of the data file and finding the columns whose state has changed. This means that instead of iterating through a large table, a state change table is created whose rows correspond to time stamps of at least one sensor state change, and whose columns correspond to sensors in the PLC output.

An important question is whether only activation state changes or both activation and deactivation state changes should be recorded. That is, should sensor A switching from 0 to 1 be an event A1, and sensor A switching from 1 to 0 also be an event, A0? For the Petri Net model proposed in this paper, we argue that it is possible to extract only the sequence of activations, e.g., A1. However, it is easy to imagine other cases where the deactivation events are equally important. In such cases, both activation and deactivation events can be incorporated into the discrete event sequence.

The third step, having isolated the changes in state for each sensor in the PLC data file, is to determine the sequence of changes. To find the sequence of changes, the sensor columns are first ordered with respect to time. The table is rebuilt by sequentially appending columns in order of occurrence. For example, if sensor A has been triggered at time 0, it is placed as the first column in the rebuilt table. If the next sensor to observe a change is sensor C, this column is placed as the next column in the table.

Having processed the data, the discrete event sequence observed from the PLC is simply the ordered sequence of column headers from the event change table. If the PLC data files include data that is specific to a particular product, then repeating this procedure across data files results in a set of discrete event sequences observed across the manufacture of a set of products.

Building an Event Relationship Table
Having extracted the discrete event sequence from PLC data, this section proposes a simple model for the normal operation of a DMP: the Event Relationship Table (ERT). The ERT is square and has rows and columns corresponding to all unique events in the observed discrete event sequences. Its entries represent the temporal relationship between any pair of events. An example of a relationship table with three unique events, A, B, and C, is shown in Table 1.

        A    B    C
   A         →
   B    ←         ||
   C         ||

Table 1: A simple event table describing the relationship among three events: A, B, and C.

The relationship table is populated using the algorithm shown in Alg. 1. Here the discrete sequence of events (ti) is denoted by σ. The succession of events is described by the following: A → B denotes that event B follows event A; A ← B denotes that event A follows event B; and || denotes that events A and B follow one another but can happen in any order.

Algorithm 1 Create Event Relationship Table
Input: a sequence of unique events, σ = t1, t2, ..., tn
Output: an event relationship table
 1: create an (N+1) × (N+1) dimensional table
 2: for n ∈ [0, len(σ) − 2] do
 3:    place → in row tn and column tn+1
 4:    place ← in column tn and row tn+1
 5:    if ← and → in cell then
 6:       replace with ||
 7:    end if
 8: end for
 9: return event relationship table

Table 1, built using Alg. 1, shows the following event relations: event A always precedes event B, and events B and C always occur after one another but the order is not defined. Example sequences that can be generated by this table are ABCB, AB, or BCBC. A sequence that cannot be generated according to this table is BAC.

To obtain a relationship table that fully describes a given DMP, the learning of the relationship table must be iterated through a training data set of discrete event sequences. Importantly, this data set must contain a sufficient sampling of the possible pairwise event sequences exhibited by the DMP to learn a complete event relationship table. This is not the same as observing all possible discrete event sequences generated by the DMP. Section 4.4 shows that relatively few sequences need to be observed to generate a full event relationship table.

Anomaly Detection Using the Event Relationship Table
Performing anomaly detection using a relationship table such as the one shown in Table 1 is straightforward. All that must be determined is whether the observed sequence can be the result of the pairwise event sequences shown in the event relationship table. To see if this is the case, simply read the discrete event sequence from the first event to the last event, checking for each pairwise event sequence whether it is an allowed transition in the table. If any transition is not allowed, then an anomaly is declared and the specific process in the DMP can be investigated.

The Weakness of the Proposed Algorithm
The algorithm presented in this section is attractive because of its simplicity and interpretability. However, the developed algorithm has been strictly limited to sequences that contain only unique discrete events that appear only once during a sequence. The reason for this is that anomaly detection using this approach may fail when events in the sequence are not unique, and specifically, when some events are repeated in a loop.

To see why this might be the case, consider the following. Suppose that event A always precedes event B, and both occur only once. Then, the presented algorithm would learn the relationship A → B. Now, suppose event A occurs first, then event B takes place, but this time event A is repeated after event B. Then, the learned relationships are both A → B and B → A, which means that the learned relationship will be A||B. This relationship has low diagnostic value, because it applies regardless of the event order, which means that both AB and BA are valid event sequences. Importantly, this relationship cannot help

Figure 3: A Petri Net representing three concurrent processes in a DMP with events repeated in a loop.

addition of these transitions has the effect of adding repeated events to the observed discrete event sequences. This results in a possibly infinite set of event sequences that can be generated by this Petri Net, or equivalently by the DMP. With respect to the event relationship tables discussed in the last section, this leads to the following observation. The dimensions of the event relationship table (N × N) will be smaller than the number of observations of the PLC output signal (M). This is the case because the event relationship table contains only the unique events in the discrete event sequence. Thus we observe that the size of the event relationship table is bounded by the number of unique events in the DMP and not by the length of the observed discrete event sequences.

Learning Sequences of Events
To improve anomaly detection in the presence of repeated events, we expand the concept of an event from a single atomic event to a complex event that consists of short sequences of atomic events that repeat with a high frequency within the discrete event sequences. For example, suppose that three distinct discrete events occur in a sequence: A, B, and C. Suppose further that events B and C repeat in a loop as BCBCBC, such that the observed sequence is ABCBCBC. Then instead of using three single events, a better approach is to learn that the basic repeated loop is BC, and that there are two events in this sequence: A and BC, the latter of which is a complex event.

Learning the smallest set of event subsequences that describe a discrete event sequence is generally an NP-hard problem [15]. For this reason, this paper focuses on finding short event subsequences that occur more than once, and uses the remaining single events to complete the set of patterns that describe the discrete event sequences. Obtaining a computationally feasible solution to the problem is aided
to specifically identify the sequence ABA as normal. Thus,         by two assumptions:
the algorithm will fail to detect AB and BA as abnormal se-
quences (a miss and a false negative in detection, respec-           • If events are repeated in a loop, the events may appear
tively). Such failures will result in an increased missed de-          more than once, and may appear contiguously in a se-
tection rate. To address this issue the section to follow ex-          quence.
tends the developed algorithm to improve its performance             • Events that are not repeated in a loop appear as unique
for the case of repeated events in a discrete event sequence.          events in the sequence.

3.2   Algorithm II: Anomaly Detection with Event                   The first assumption is important, and it follows from the
                                                                   stated goal of separating the discrete event sequences by
      Repetition
                                                                   product (manufactured item). If events are allowed to repeat
Real-life DMPs are often much more complicated than the            in a non-contiguous fashion, this would imply that multi-
concurrent process DMP presented in the last section. This         ple products are entering the discrete manufacturing process
is because events may be repeated within any of the DMP            during the given event sequence.
processes. Consider the example DMP for phone manufac-
turing in [13]. This DMP executes multiple concurrent pro-         Finding Repeated Event Sequences
cesses, and is completed with an event loop that checks the        Build on the previously introduced notation, the set of all
quality of the final product. Such repetition of events re-        contiguous subsequences within σ is W . The set of subse-
sults in entries of the event relationship table that have low     quences that occur with a frequency of at least 2 are part of
diagnostic value.                                                  the set Whf . The set of unique events that are not part of a
   In this section, we improve the previously proposed algo-       subsequence and occur only once is We . The union of sets
rithm so that it is capable of identifying event loops within      Whf and We is the set P which contains the set of all unique
each of the concurrent processes. To develop this algorithm,       events and high frequency subsequences that describe a dis-
this section proposes a more complex Petri Net model,              crete event sequence. Given this notation and assumptions,
shown in Fig. 3, that contains both concurrent processes           event subsequences repeated with high frequency can be
and events repeated in a loop within each process. In con-         found as shown in Alg. 2.
trast to the model in Sec. 3.1, this Petri Net is not published.       This algorithm begins with a sequence of events σ =
To the best of the authors’ knowledge, this model is more          t1 , t2 , t3 , . . . , tn which has K unique events, where K is
complex than published DMP models with event loops.                smaller than the sequence length (K ≤ n). In step 1
   Here, a new transition (D, G, or J) is inserted into each       the algorithm generates all subsequences w0 with length
process such that some events are repeated in a loop. The          len(w0 ) ∈ [2, K] and places these subsequences in a set
W. W has a size of (n − 1) + (n − 2) + · · · + (n − K + 1) = (K − 1)n − K(K − 1)/2 subsequences. Step 2 finds the subset Whf whose component subsequences have a frequency of occurrence in σ of at least 2. Steps 3 through 5 are a loop that removes the high-frequency subsequences from the original event sequence σ. This loop returns σ̂, which contains all single events that do not belong to high-frequency patterns. Step 6 places all events in σ̂ in We. Lastly, step 7 returns the union of We and Whf, P.

        A    B    C    DE
  A          →
  B     ←         ||   →
  C          ||        →
  DE         ←    ←    *

Table 2: A simple relationship table that includes event patterns repeated in a loop.
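The steps just described can be sketched in Python. This is an illustrative reading only, not the authors' implementation: Alg. 2 does not fully specify how overlapping repeated subsequences are prioritized, so the sketch below extracts the shortest, most frequent pattern first.

```python
from collections import Counter

def find_patterns(sigma):
    """Illustrative sketch of Alg. 2: extract contiguous subsequences
    that occur at least twice in sigma (W_hf), then treat whatever
    remains as single events (W_e); P is the union of the two sets."""
    K = len(set(sigma))                       # number of unique events
    W_hf, sigma_hat = set(), sigma
    while True:
        n = len(sigma_hat)
        # step 1: contiguous subsequences of length 2..K of what is left
        counts = Counter(sigma_hat[i:i + L]
                         for L in range(2, K + 1)
                         for i in range(n - L + 1))
        # step 2: keep only subsequences that occur at least twice
        repeated = [w for w, c in counts.items() if c >= 2]
        if not repeated:
            break
        # assumption: prefer the shortest, then the most frequent, pattern
        best = min(repeated, key=lambda w: (len(w), -counts[w]))
        W_hf.add(best)
        # steps 3-5: remove the pattern's occurrences from the sequence
        sigma_hat = sigma_hat.replace(best, "")
    W_e = set(sigma_hat)                      # step 6: leftover single events
    return W_hf, W_e                          # step 7: P = W_hf ∪ W_e

print(find_patterns("ABCBCBC"))  # recovers the loop BC and the single event A
```

On the running example ABCBCBC this returns Whf = {BC} and We = {A}, matching the complex-event decomposition described above.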
Algorithm 2 Find the set of high-frequency patterns/single events, P, that covers a sequence of events
Input: Event sequence σ containing K unique events
Output: P = We ∪ Whf
 1: W ← all contiguous subsequences w′ of σ with len(w′) ∈ [2, K]
 2: Whf ← w′ ∈ W with frequency in σ ≥ 2
    LOOP Process:
 3: for each wi ∈ Whf do
 4:    σ̂ ← remove wi from σ̂ (with σ̂ initialized to σ)
 5: end for
 6: place all events in σ̂ in We
 7: return P = We ∪ Whf

Build a Pattern Relationship Table
Using the set P, it is now possible to build a relationship table as shown in Section 3.1. Because the events here may be subsequences of events, the new table is referred to as a pattern relationship table. The word pattern specifically denotes the fact that both unique events and high-frequency subsequences occur in P. Patterns that repeat in a loop are denoted by a new event relationship type, *. The procedure to build the relationship table is shown in Alg. 3. An example of a relationship table that encodes the events A, B, C, and DE is shown in Table 2.

Algorithm 3 Create Pattern Relationship Table
Input: σ, P
Output: a pattern relationship table
 1: create a (len(P) + 1) × (len(P) + 1) dimensional table
    LOOP Process:
 2: while σ is not empty do
 3:    find patterns p1 and p2 ∈ P s.t. p1 and p2 are consecutive and are at the beginning of σ
 4:    place → in row p1 and column p2
 5:    place ← in column p1 and row p2
 6:    if ← and → are both in a cell then
 7:       replace them with ||
 8:    end if
 9:    if pi for i ∈ [1, 2] repeats in a loop then
10:       place ∗ in row pi and column pi
11:    end if
12:    remove p1 from σ
13: end while
14: return pattern relationship table

Note here that this pattern relationship table contains four base elements: A, B, C, and DE, while the number of unique elements in the sequence is five. Furthermore, the length of the discrete event sequences generated by this table can be arbitrary. Thus, this table illustrates that the approach presented in this paper creates compact representations of the discrete event sequences.

4   Experimental Results
This section describes an empirical verification of the proposed algorithms in simulation. In addition, we discuss a suitable evaluation error metric, the minimum number of experiments to perform, and the cross-validation method used to ensure generalizability of the results.

4.1   Choosing the Performance Metric
Anomaly detection is a classification problem with binary classes: normal (negative) or abnormal (positive). Anomaly detection algorithms are typically described by their sensitivity and precision. Sensitivity is the number of times that the positive class was correctly identified, divided by the number of all positive class instances (the true positive rate). Precision is the number of truly anomalous cases identified, relative to the number of all cases identified as anomalous (positive) by the algorithm.

Precision and sensitivity can be combined in a single statistic called the F-score, defined as the harmonic mean of precision and sensitivity. The F-score allows a simple one-dimensional comparison to be made between algorithms. In this paper, the F-score is used to compare the presented algorithms with a known anomaly detection algorithm based on a prefix tree, which serves as the baseline method [6].

4.2   Choosing the Number of Experiments
Typically, algorithm evaluations assume that the training data set is complete, which means that it contains a thorough enumeration of all possible normal and abnormal cases. Given this assumption, an algorithm's error rates are representative of its ability to capture the problem domain. However, in the case of anomaly detection algorithms for DMPs, this assumption may not hold, for several reasons. First, events in a DMP can be stochastic in duration. Second, DMPs are designed to be highly reliable, providing few opportunities to observe anomalies. Third, the deployment cost of DMPs is tightly coupled to deployment time, which means that training data set collection must be as short as possible. We should also note that some DMPs with loops in their execution logic would produce an infinite number of valid sequences, so exhaustive enumeration of all of them would be impossible.

The high reliability of DMPs, coupled with the desired short deployment times, means that anomaly detection algorithms designed for DMPs typically have to work with insufficiently sampled training data sets. The stochastic duration of events in the DMP means that a much larger number of possible discrete event sequences can be generated than one would expect from the number of events in the DMP.
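As a toy illustration of this growth, which is not an experiment from the paper: two concurrent three-event processes already admit 20 valid interleavings of only six events. A short Python check (the process contents ABC and XYZ are hypothetical):

```python
from itertools import permutations

def interleavings(p, q):
    """All ways to interleave two concurrent processes while preserving
    the internal event order of each process (toy illustration)."""
    out = set()
    # each 0/1 mask decides which process supplies the next event
    for mask in set(permutations([0] * len(p) + [1] * len(q))):
        its = (iter(p), iter(q))
        out.add("".join(next(its[m]) for m in mask))
    return out

seqs = interleavings("ABC", "XYZ")
print(len(seqs))  # → 20
```

The same count follows from the binomial coefficient C(6, 3) = 20; the enumeration merely makes the combinatorial growth concrete.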
Prior work has modeled the stochasticity in the event durations of Petri Nets using exponential, Gaussian, or triangular distributions [16]. In this paper, an exponential distribution is used to describe the event durations, and the parameter of the exponential distribution is denoted by λ. Given the stochasticity in a process, it is instructive to calculate the number of possible discrete event sequences that could be generated.
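The effect of stochastic durations can also be sketched with a small simulation. The sketch below is illustrative only; the two-process structure, the rate λ = 1, and the seeds are assumptions for the example, not the paper's experimental setup.

```python
import random

def simulate_sequence(processes, lam=1.0, seed=None):
    """Illustrative sketch: run concurrent processes whose event
    durations are Exponential(lam) and return the observed discrete
    event sequence, ordered by event completion time."""
    rng = random.Random(seed)
    completions = []
    for proc in processes:              # each process runs its events in order
        t = 0.0
        for event in proc:
            t += rng.expovariate(lam)   # stochastic event duration
            completions.append((t, event))
    return "".join(event for _, event in sorted(completions))

# Two concurrent two-event processes: D precedes E, and F precedes G.
observed = {simulate_sequence([("D", "E"), ("F", "G")], seed=s) for s in range(200)}
print(sorted(observed))  # several distinct valid interleavings
```

Every generated sequence respects the internal order of each process, yet repeated runs produce several distinct interleavings, which is why the number of observable sequences grows much faster than the number of events.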
Here we use the real DMP shown in Fig. 2. In this DMP, the variation in the discrete event sequences is produced by the four concurrent processes. These processes start with transition C and end with transition L. Then, the number of possible generated sequences is equal to the number of feasible permutations of DEFGHIJK. The total number of permutations of these 8 events is 8!. Among all the permutations, D must come before E, F must come before G, H must come before I, and J must come before K. The probability of this ordering is 0.5 ∗ 0.5 ∗ 0.5 ∗ 0.5 = 0.0625, and the number of possible sequences is 8! ∗ 0.0625 = 2,520.

This means that any indexing method, for example prefix trees (the baseline method of this paper), must be trained on all 2,520 unique event sequences to achieve the best possible performance (F-score). In practice, when evaluating the performance of algorithms, many more data sequences (simulated experiments) must be collected, because each new sequence is not necessarily unique. For this reason, the experimental results in Sec. 4.4 show a succession of training data set sizes ranging up to 10,000 training sequences. This succession shows the F-score improvement as a function of the training data set size.

4.3   Cross-Validation/Experiment Method
Lastly, it is important to ensure that the algorithm evaluation is not biased toward any single training data set. That is, the experimental method should ensure that the algorithm performance statistics do not depend on a single data set, but rather represent generalized statistics that might be observed on a new test data set.

In machine learning, such generalized statistics are customarily determined using a technique called cross-validation (CV). In CV, models are trained iteratively on partitions of the available data, and tested on a held-out data set. Both the training and testing data sets are changed for each iteration, and the performance statistics are averaged over all iterations. This paper uses CV, and therefore the rates reported in Sec. 4.4 are averaged across the number of tested partitions.

For example, suppose that 1,100 normal discrete event sequences are generated in simulation, and 100 abnormal discrete event sequences are also generated in simulation. Anomalies are generated from normal sequences by: introducing an erroneous event transition (e.g., normal: A → B, abnormal: B → A), removing an element of the discrete event sequence, and introducing a repeated event. These types of anomalies are placed in 33, 33, and 34 discrete event sequences, respectively.

During CV, 100 normal sequences are set aside for testing, and the remaining 1,000 normal sequences are split into 10 partitions. For each of the 10 partitions, each algorithm is trained, and its performance is tested using the 100 normal and 100 abnormal held-out discrete event sequences. For each partition, the F-score of all algorithms under testing is calculated. After the 10th iteration, the F-scores are averaged and the average score is returned.

Figure 4: Experimental results for a DMP composed of concurrent sub-processes each with unique events.

4.4   Experimental Results
This section presents the performance of the proposed algorithms as compared to the performance of a baseline method. Here, the baseline method is the prefix tree method, which was chosen because of its relation to prior published work.

Algorithm performance with unique events in concurrent processes
The first set of experiments demonstrates the performance of the developed algorithms when the simulated DMP has concurrent processes, but no repeated events (Fig. 2). The experiments are performed on training sets of up to 10,000 discrete event sequences using the method of cross-validation with 10 partitions. The results of this experiment are shown in Fig. 4. In this figure, the pattern relationship table algorithm is represented by an orange line, the event relationship algorithm by a blue line, and the prefix method by a green line. Each point on the plot represents an averaged F-score for the given method and the given number of discrete event sequences in the total training data set. Around each line, the shaded region represents two standard deviations in the F-score across the cross-validation partitions.

This plot shows that the baseline method's F-score eventually approaches that of the pattern relationship table algorithm, but only as the training data set size approaches 10,000 sequences. In contrast, the algorithms presented in this paper achieve their maximal F-score with only 110 (1.1%) of the training discrete event sequences required by the baseline method. The reason for this is that the methods presented here are able to learn event transitions even in partially unique sequences, while the prefix method requires all possible event sequences to be observed and stored in a prefix tree.

Lastly, note that the pattern relationship table has the higher F-score even after all 10,000 training sequences are observed. The reason for this is that the pattern relationship table has learned subsequences of events that occur with high frequency, and therefore has learned event transitions that are of more importance than single events.

Algorithm performance with repeated events in concurrent processes
The second set of experiments showcases the performance of the developed algorithms when the simulated DMP has
concurrent processes each with one repeated event (Fig. 3). Here, we use the same experimental parameters as in the previous section. The results of the experiment are shown in Fig. 5.

Figure 5: Experimental results for a DMP composed of concurrent sub-processes with events repeated in a loop in each process.

Once again, the pattern algorithm and the event relationship table algorithm both achieve their respective maximal F-scores with fewer than 1,000 (10%) sequences in the training data set. In contrast, the prefix tree method, even after 10,000 training data sequences, still has an F-score that is lower than the F-score of the event relationship table algorithm. These experiments show that when events are repeated in a loop, the variable length of the resulting discrete event sequences is a limitation for indexing methods like the prefix tree, most likely because of the size of the required training data set.

5   Conclusions
In this paper, we propose a novel approach for anomaly detection in discrete manufacturing processes (DMPs). The approach is to learn models of the normal operation of a DMP in the form of relationship tables between pairs of events. These tables compactly describe the sequences of discrete events that can be observed in a DMP. The relationship tables can then be used to determine whether a new sequence of events occurring in the DMP is anomalous or not.

The algorithms proposed herein are tested in simulation on two types of DMPs: DMPs with concurrent processes that have only unique events per process, and DMPs with concurrent processes and repeated events within each process. The performance of our algorithms is compared to a baseline method similar to that proposed in previously published work. The results show that the presented algorithms are capable of achieving a high F-score even for relatively small training data sets. Furthermore, the presented models are compact, reducing the space complexity of the DMP model from polynomial to linear.

References
[1] W. van der Aalst, T. Weijters, and L. Maruster. Workflow mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering, 16(9):1128–1142, Sept 2004.
[2] A. K. Alves de Medeiros, B. F. van Dongen, W. M. P. van der Aalst, and A. J. M. M. Weijters. Process mining: Extending the alpha-algorithm to mine short loops. Technical report, BETA Working Paper Series, 2004.
[3] Lijie Wen, Wil M. P. van der Aalst, Jianmin Wang, and Jiaguang Sun. Mining process models with non-free-choice constructs. Data Mining and Knowledge Discovery, 15(2):145–180, 2007.
[4] A. J. M. M. Weijters, Wil M. P. van der Aalst, and A. K. Alves de Medeiros. Process mining with the heuristics miner-algorithm. Technische Universiteit Eindhoven, Tech. Rep. WP, 166:1–34, 2006.
[5] Sicco Verwer. Efficient Identification of Timed Automata: Theory and Practice. PhD thesis, Delft University of Technology, 2010.
[6] Oliver Niggemann, Benno Stein, Alexander Maier, Asmir Vodenčarevič, and Hans Kleine Büning. Learning behavior models for hybrid timed systems. In AAAI Conference on Artificial Intelligence, AAAI'12, 2012.
[7] Christos G. Cassandras and Stephane Lafortune. Introduction to Discrete Event Systems. Springer Science & Business Media, 2009.
[8] Varun Chandola, Arindam Banerjee, and Vipin Kumar. Anomaly detection for discrete sequences: A survey. IEEE Transactions on Knowledge and Data Engineering, 24(5):823–839, 2012.
[9] Jochen De Weerdt, Manu De Backer, Jan Vanthienen, and Bart Baesens. A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs. Information Systems, 37(7):654–676, 2012.
[10] Wil M. P. van der Aalst. The application of Petri nets to workflow management. Journal of Circuits, Systems, and Computers, 8(1):21–66, 1998.
[11] Wil M. P. van der Aalst. Verification of workflow nets. In International Conference on Application and Theory of Petri Nets, pages 407–426. Springer, 1997.
[12] Wil van der Aalst. Process discovery: An introduction. In Process Mining, pages 163–194. Springer Berlin Heidelberg, 2016.
[13] Lijie Wen, Jianmin Wang, Wil M. P. van der Aalst, Biqing Huang, and Jiaguang Sun. A novel approach for process mining based on event types. Journal of Intelligent Information Systems, 32(2):163–190, Apr 2009.
[14] Dejan Gradišar and Gašper Mušič. Petri-net modelling of an assembly process system. In Proceedings of the 7th International Ph.D. Workshop: Young Generation Viewpoint, volume 7, page 16. Institute of Information Theory and Automation, 2006.
[15] Danny Hermelin, Dror Rawitz, Romeo Rizzi, and Stéphane Vialette. The minimum substring cover problem. In Christos Kaklamanis and Martin Skutella, editors, Approximation and Online Algorithms, pages 170–183, Berlin, Heidelberg, 2008. Springer Berlin Heidelberg.
[16] Subhash Chandra, Muhammad Al Salamah, and Vakkar Ali. Stochastic simulation of assembly line for optimal sequence using Petri nets (PN). 11:26–33, 2014.