<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>VisuELs: Visualization of Event Logs (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gaël Bernard</string-name>
          <email>gael.bernard@utoronto.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Periklis Andritsos</string-name>
          <email>periklis.andritsos@utoronto.ca</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Toronto, Faculty of Information</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <abstract>
        <p>We propose a technique to transform event logs of any size into compact visualizations that we call VisuELs (Visualization of Event Logs). VisuELs are particularly useful in the exploratory phase of a process mining project to extract key insights about an event log (e.g., average trace length, top activities, patterns of behaviour). New VisuELs can be generated using Python or a web-based tool without fine-tuning any parameters.</p>
      </abstract>
      <kwd-group>
        <title>Index Terms</title>
        <kwd>process mining</kwd>
        <kwd>sampling</kwd>
        <kwd>visualization</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>II. VISUELS</title>
      <p>VisuELs aim to be compact and readable, regardless of the complexity and size of the input log or the length of its traces. To achieve this, we show only a few representative traces. As in the dotted chart, each row shows one representative trace, and each (coloured) square represents an activity, with the colour encoding the activity type. In Fig. 1, we show a VisuEL of the Sepsis event log, which is composed of 1 049 cases. Despite the relative simplicity of the representation, a VisuEL pursues four ambitious goals. VisuELs should: (G1) summarize the log; (G2) be easy to interpret; (G3) be easy to build; and (G4) be comparable. Next, we present five features that contribute to fulfilling these goals.</p>
      <p>Downsizing Scale. To choose the number of representatives to display in a VisuEL, we propose the following downsizing scale: <inline-formula><tex-math>\lceil \log_{1.5}(s) \rceil</tex-math></inline-formula>, where <italic>s</italic> is the number of cases in the original event log. For example, a log of 10 000 traces is summarized by 23 representatives. Even extremely large or small logs fit in a grid of reasonable size; e.g., <inline-formula><tex-math>\lceil \log_{1.5}(10^{9}) \rceil = 52</tex-math></inline-formula> and <inline-formula><tex-math>\lceil \log_{1.5}(3) \rceil = 3</tex-math></inline-formula>. Just as we bound the vertical extent of a VisuEL, we limit its horizontal size by showing at most 20 activities per trace. As can be seen in Fig. 1, suspension points (an ellipsis) mark traces that are longer than this limit. These size limits make it possible to visualize logs of any size in a readable way (G1) and to compare them (G4).</p>
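      <p>The downsizing scale and the 20-activity cap can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not VisuEL's code: the names <monospace>n_representatives</monospace> and <monospace>truncate</monospace> are hypothetical, the <monospace>"..."</monospace> marker stands in for the suspension points, and the guard that keeps at least one row is our addition (the formula alone gives 0 rows when s = 1).</p>
      <preformat preformat-type="code">
```python
import math

MAX_ACTIVITIES = 20  # horizontal limit stated in the text


def n_representatives(n_cases: int) -> int:
    """Rows in a VisuEL: ceil(log_1.5(n_cases))."""
    # The max(1, ...) guard for a single-case log is our own assumption.
    return max(1, math.ceil(math.log(n_cases, 1.5)))


def truncate(trace, limit=MAX_ACTIVITIES):
    """Keep the first `limit` activities; mark longer traces with '...'."""
    if len(trace) > limit:
        return list(trace[:limit]) + ["..."]
    return list(trace)


for size in (3, 1049, 10_000, 10**9):
    # prints 3, 18, 23 and 52 rows respectively
    print(size, "cases ->", n_representatives(size), "rows")
```
      </preformat>
      <p>With these definitions, a log of 1 049 cases, such as the Sepsis log, would be summarized by 18 rows.</p>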
      <p>
        Sampling. To select the traces that appear in a VisuEL, we borrow the iterative c-min sampling technique, which has been shown to produce the most representative downsized event logs in terms of earth mover’s distance from the original event log [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Using this technique, we ensure that a VisuEL fairly represents the input log (G1).
      </p>
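      <p>The c-min technique is specified in [<xref ref-type="bibr" rid="ref6">6</xref>] and is not reproduced here. To convey the underlying idea of picking traces that stay close to the whole log, the following hypothetical sketch uses a simpler, swapped-in heuristic: a greedy k-medoids selection that, at each step, adds the variant that most reduces the frequency-weighted Levenshtein distance from every trace to its nearest representative. The function names are illustrative, and this is not the c-min algorithm.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two activity sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def greedy_representatives(traces, k):
    """Hypothetical greedy k-medoids stand-in (NOT the c-min technique)."""
    variants = Counter(tuple(t) for t in traces)  # variant -> frequency
    candidates = list(variants)
    # Distance from each variant to its closest representative chosen so far.
    nearest = {v: float("inf") for v in candidates}
    selected = []
    for _ in range(min(k, len(candidates))):
        def total_cost(c):
            # Frequency-weighted distance of the whole log if c were added.
            return sum(n * min(nearest[v], levenshtein(v, c))
                       for v, n in variants.items())
        choice = min((c for c in candidates if c not in selected), key=total_cost)
        selected.append(choice)
        for v in candidates:
            nearest[v] = min(nearest[v], levenshtein(v, choice))
    return [list(s) for s in selected]
```
      </preformat>
      <p>In a parameter-free setting, k would come from the downsizing scale rather than being chosen by hand.</p>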
      <p>Colours and Legend. For readability (G2), we colour only the top 5 activities and use a neutral gray for all other activities. We also provide an option to produce several VisuELs with a single shared legend. This way, distinct VisuELs use the same colours for the same activities, which makes them easier to compare (G4). The advantage of the shared legend is illustrated in the second use case, where we visualize clusters of traces.</p>
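      <p>A shared legend amounts to computing one activity-to-colour mapping over all the logs being compared and reusing it for every VisuEL. The sketch below assumes that "top" means most frequent across all logs combined; the palette, the gray value, and the function names are hypothetical choices for illustration.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical palette: one colour per top activity.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3", "#e7298a", "#66a61e"]
NEUTRAL_GRAY = "#bbbbbb"


def shared_legend(logs, top_n=5):
    """Map the most frequent activities over ALL logs to fixed colours."""
    counts = Counter(act for log in logs for trace in log for act in trace)
    top = [act for act, _ in counts.most_common(min(top_n, len(PALETTE)))]
    return {act: PALETTE[i] for i, act in enumerate(top)}


def colour_of(activity, legend):
    """Activities outside the legend are drawn in the neutral gray."""
    return legend.get(activity, NEUTRAL_GRAY)
```
      </preformat>
      <p>Because the mapping is computed once, the same activity keeps the same colour in every cluster's VisuEL.</p>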
      <p>Ordering. The traces are sorted by similarity to make them easier to read (G2). To achieve this, we compute the Levenshtein distance between each pair of traces and then apply an approximation algorithm for the travelling salesman problem to find an ordering that approximately minimizes the total distance between consecutive traces. As a result, similar traces appear next to each other, which facilitates the identification of patterns.</p>
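      <p>The paper does not state which travelling-salesman approximation VisuEL uses. The sketch below substitutes the nearest-neighbour heuristic, one of the simplest TSP approximations, on top of a standard dynamic-programming Levenshtein distance over activity labels; <monospace>order_traces</monospace> is a hypothetical name, and starting from the first trace is an arbitrary choice.</p>
      <preformat preformat-type="code">
```python
def levenshtein(a, b):
    """Edit distance between two activity sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def order_traces(traces):
    """Order traces with the nearest-neighbour heuristic (a simple TSP approximation)."""
    if not traces:
        return []
    order = [0]  # arbitrary start: the first trace
    remaining = list(range(1, len(traces)))
    while remaining:
        last = traces[order[-1]]
        # Greedily append the unvisited trace closest to the last one placed.
        nxt = min(remaining, key=lambda i: levenshtein(last, traces[i]))
        remaining.remove(nxt)
        order.append(nxt)
    return [traces[i] for i in order]
```
      </preformat>
      <p>Nearest-neighbour gives no optimality guarantee; a 2-opt pass could refine the ordering at extra cost.</p>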
      <p>Parameter Free. To make the creation of VisuELs effortless (G3), we ensure that VisuELs can be created without fine-tuning any parameters.</p>
    </sec>
    <sec id="sec-2">
      <title>III. USE CASES</title>
      <p>We showcase VisuELs using two use cases.</p>
    </sec>
    <sec id="sec-3">
      <sec id="sec-3-1">
        <title>A. Logs Gallery</title>
        <p>We transformed 18 mainstream process mining datasets into VisuELs. Due to space constraints, only one of them is shown in Fig. 1; the others are available online<sup>1</sup>. The 18 VisuELs provide a clear overview of the datasets, from which we can extract insights such as the occurrence of loops of size 1 (BPI 2017), traces that often start with the same set of activities (BPI 2012), a broad set of distinct activities (BPI2018), a few variants that appear many times (BPI2020 1), or short traces (BPI2020 5).</p>
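        <p>Several of the patterns read off the gallery can be cross-checked numerically. The following sketch is not part of VisuEL; it computes a few summary statistics for a non-empty log represented as a list of activity sequences, and the function and key names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from statistics import mean


def log_summary(traces):
    """Hypothetical helper: simple statistics for a (non-empty) list of traces."""
    variants = Counter(tuple(t) for t in traces)
    n = len(traces)
    return {
        "n_traces": n,
        "avg_length": mean(len(t) for t in traces),
        "n_activities": len({a for t in traces for a in t}),
        "n_variants": len(variants),
        # Share of traces that follow the single most frequent variant.
        "top_variant_share": variants.most_common(1)[0][1] / n,
        # Share of traces with a loop of size 1 (same activity twice in a row).
        "self_loop_share": sum(any(a == b for a, b in zip(t, t[1:])) for t in traces) / n,
    }
```
        </preformat>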
      </sec>
      <sec id="sec-3-2">
        <title>B. Clusters of traces</title>
        <p>We used n-grams and k-means to discover 12 clusters of similar traces in the Permit Log of the BPI Challenge 2020. The goal was to highlight the ability of VisuELs to summarize clustering results. In Fig. 2, we show the original Permit Log and 4 of the clusters; all the clusters are available online<sup>1</sup>. We used a shared legend to ease their comparison. Overall, we can extract valuable insights from the VisuELs in Fig. 2. First, clusters 4 and 5 seem to be relatively structured. Second, unlike the other clusters, cluster 5 does not contain the activity ‘declaration submitted by employee’. Third, cluster 8 looks very chaotic and contains long traces. Fourth, in cluster 9, the activity ‘declaration submitted by employee’ occurs 3 times per trace, a behaviour specific to this cluster. We argue that such observations may be difficult to make if one has to switch between various views for each cluster.</p>
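        <p>The clustering setup is described only as "n-grams and k-means". A self-contained sketch under stated assumptions: traces are encoded as bags of activity bigrams (n = 2 is our choice), and a plain Lloyd's k-means with a deterministic initialisation (the first k vectors) stands in for whatever implementation was actually used. All names are hypothetical.</p>
        <preformat preformat-type="code">
```python
from collections import Counter


def ngram_profile(trace, n=2):
    """Bag of activity n-grams (bigrams by default) for one trace."""
    return Counter(tuple(trace[i:i + n]) for i in range(len(trace) - n + 1))


def vectorize(traces, n=2):
    """Turn traces into count vectors over a shared, sorted n-gram vocabulary."""
    profiles = [ngram_profile(t, n) for t in traces]
    vocab = sorted({g for p in profiles for g in p})
    return [[p[g] for g in vocab] for p in profiles]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's k-means; the initial centroids are the first k vectors."""
    k = min(k, len(vectors))
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its closest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```
        </preformat>
        <p>The resulting labels split the log into sub-logs, each of which can then be rendered as its own VisuEL with a shared legend.</p>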
      </sec>
    </sec>
    <sec id="sec-4">
      <title>IV. ARCHITECTURE AND SCALABILITY</title>
      <p>
        VisuEL is written in Python and produces scalable vector graphics (SVG). It can read several input formats, including XES files [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and PM4Py objects [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We also provide a web-based version of the tool. The source code, the web tool, and an introductory video are available online<sup>1</sup>.
      </p>
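      <p>Because a VisuEL is an SVG grid of coloured squares, rendering can be sketched with Python's standard <monospace>xml.etree.ElementTree</monospace> module. This hypothetical <monospace>render_svg</monospace> function is not VisuEL's renderer; the cell size, gap, and gray fallback are assumptions, and each row is expected to be a list of activity labels that has already been sampled, ordered, and truncated.</p>
      <preformat preformat-type="code">
```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def render_svg(rows, legend, cell=12, gap=2, gray="#bbbbbb"):
    """Hypothetical renderer: one SVG rect per activity, one row per trace."""
    step = cell + gap
    width = max((len(r) for r in rows), default=0) * step
    height = len(rows) * step
    svg = ET.Element("svg", xmlns=SVG_NS, width=str(width), height=str(height))
    for y, row in enumerate(rows):
        for x, activity in enumerate(row):
            # Activities outside the legend fall back to the neutral gray.
            ET.SubElement(svg, "rect", x=str(x * step), y=str(y * step),
                          width=str(cell), height=str(cell),
                          fill=legend.get(activity, gray))
    return ET.tostring(svg, encoding="unicode")
```
      </preformat>
      <p>The returned string can be written directly to a <monospace>.svg</monospace> file or embedded in a web page.</p>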
      <p>VisuELs are fast to generate, even for large event logs. In our use cases, the longest build time was for the ‘BPI 2018’ dataset, composed of 2.5M events: it took 42 seconds on a machine with 16 GB of RAM, 4 CPUs, and a 2.8 GHz processor. The time can be further reduced</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Sani</surname>
          </string-name>
          ,
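          <string-name>
            <given-names>S. J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>The impact of biased sampling of event logs on the performance of process discovery</article-title>
          ,”
          <source>Computing</source>
          , pp.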
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Zerbato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Soffer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Weber</surname>
          </string-name>
          , “
          <article-title>Initial insights into exploratory process mining practices</article-title>
          ,” in
          <source>Business Process Management Forum</source>
          , A. Polyvyanyy,
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Wynn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Van Looy</surname>
          </string-name>
          , and M. Reichert, Eds. Cham: Springer International Publishing,
          <year>2021</year>
          , pp.
          <fpage>145</fpage>
          -
          <lpage>161</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C. W.</given-names>
            <surname>Günther</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Fuzzy mining: Adaptive process simplification based on multi-perspective metrics</article-title>
          ,” in
          <source>International Conference on Business Process Management</source>
          . Springer,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Song</surname>
          </string-name>
          and
          <string-name>
            <given-names>W. M.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Supporting process mining by showing events at a glance</article-title>
          ,” in
          <source>Proceedings of the 17th Annual Workshop on Information Technologies and Systems (WITS)</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>139</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jalali</surname>
          </string-name>
          , “
          <article-title>Reflections on the use of chord diagrams in social network visualization in process mining</article-title>
          ,” in
          <source>2016 IEEE Tenth International Conference on Research Challenges in Information Science (RCIS)</source>
          . IEEE,
          <year>2016</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bernard</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Andritsos</surname>
          </string-name>
          , “
          <article-title>Selecting representative sample traces from large event logs</article-title>
          ,” in
          <source>International Conference on Process Mining</source>
          . Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7] “
          <article-title>IEEE standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams</article-title>
          ,”
          <source>IEEE Std 1849-2016</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Berti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>van Zelst</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. M. P.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , “
          <article-title>Process mining for Python (PM4Py): Bridging the gap between process- and data science</article-title>
          ,”
          <source>CoRR</source>
          , vol. abs/1905.06169,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>