<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Python-JXES: Python Implementation of JSON Serialization for XES Standard</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maxim Vidgof</string-name>
          <email>maxim.vidgof@wu.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vienna University of Economics and Business (WU Wien)</institution>
          ,
          <addr-line>Welthandelsplatz 1, 1020 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>18</lpage>
      <abstract>
        <p>Process mining requires event logs. XES is a widely accepted format for storing and exchanging event log data; however, it only has an XML serialization, leading to sub-optimal storage and time requirements in certain scenarios. JXES is a recently proposed serialization format for XES based on JSON, a more lightweight data interchange format. This paper presents Python-JXES, a Python implementation of JXES. Its read and write performance is evaluated against a certified state-of-the-art tool, and file sizes of JXES and XES serializations of open-source event logs are compared. JXES achieves up to 33% storage savings and up to 73% faster read speeds. Python-JXES can be used to facilitate efficient event log storage, streaming process mining and process mining on IoT devices.</p>
      </abstract>
      <kwd-group>
        <kwd>JXES</kwd>
        <kwd>XES</kwd>
        <kwd>Event log format</kwd>
        <kwd>Event data</kwd>
        <kwd>Process mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
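      <p>CEUR, ceur-ws.org</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>Metadata</th><th>Value</th></tr>
          </thead>
          <tbody>
            <tr><td>Tool name</td><td>Python-JXES</td></tr>
            <tr><td>Version</td><td>0.1</td></tr>
            <tr><td>License</td><td>Apache 2.0</td></tr>
            <tr><td>Language</td><td>Python</td></tr>
            <tr><td>Operating systems</td><td>Microsoft Windows, GNU/Linux</td></tr>
            <tr><td>Download/Demo URL</td><td>https://pypi.org/project/jxes/</td></tr>
            <tr><td>Documentation URL</td><td>https://github.com/MaxVidgof/python-jxes</td></tr>
            <tr><td>Source code</td><td>https://github.com/MaxVidgof/python-jxes</td></tr>
            <tr><td>Screencast video</td><td>https://youtu.be/8adiYqeczAs</td></tr>
          </tbody>
        </table>
      </table-wrap>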
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Exchanging event logs is crucial for research and practice of Process Mining (PM) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. XES is a
widely accepted XML-based standard for event log serialization that facilitates this exchange.
Despite the flexibility and tool support of XML, JSON is gaining popularity as a
more lightweight data interchange format. This is reflected in newer standards for
event log data, which have started offering JSON serialization: OCEL [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] offers JSON serialization in addition
to XML and SQLite, and some OCED reference implementations [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] also rely on OCEL-JSON
for serializing OCED data.
      </p>
      <p>https://complex.wu.ac.at/nm/vidgof (M. Vidgof)</p>
      <p>© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        JSON serialization for XES has already been proposed by Narayana, Khalifa &amp; van der Aalst [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
but has not received the attention it deserves and has only been implemented as a ProM plugin.
This paper presents a Python implementation of JXES and showcases its benefits by comparing
it to a certified state-of-the-art XES implementation. It shows how JXES can be beneficial in
scenarios such as event log storage, Streaming Process Mining and PM on IoT devices.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Background</title>
      <sec id="sec-3-1">
        <title>2.1. XES Standard</title>
        <p>
          The XES standard 1849-2023 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] defines a format for storing event logs and event streams. A log
contains records of executions of a business process. It consists of traces, each corresponding
to one case, and traces consist of atomic events. A log can also consist only of events without
traces; such a log is called a stream. Information about a log, a trace or an event is stored in
attributes. The XES standard allows attributes to be nested but leaves this feature optional for
implementations. Attributes can be elementary (integer, string, date and time, etc.) or composite
(lists). Global attributes declare attributes that are available for every event or trace in the
log. Classifiers make events comparable to each other and are represented by an ordered list of
attribute keys. XES supports extensions that attach semantics to the described elements
and defines a set of standard extensions.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.2. JXES</title>
        <p>
          JXES is a JSON format for event logs that adheres to the XES principles [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. JXES uses the XES meta-model
and defines a JSON serialization for each of its components, so that all information an
XES log can contain can be stored in a JSON file. A Java implementation exists in the form of a ProM plugin.
        </p>
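        <p>The log structure described above can be pictured with a few plain Python data classes. This is only a sketch: the class and field names (Log, Trace, Event, attributes) are illustrative assumptions, not PM4Py objects and not the key names defined by the JXES specification.</p>
        <preformat><![CDATA[
```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    # Attributes of one atomic event, keyed by attribute name.
    attributes: dict = field(default_factory=dict)


@dataclass
class Trace:
    # Attributes of the case plus its ordered list of events.
    attributes: dict = field(default_factory=dict)
    events: list = field(default_factory=list)


@dataclass
class Log:
    # Log-level attributes plus the traces; a stream would hold events only.
    attributes: dict = field(default_factory=dict)
    traces: list = field(default_factory=list)


# Hypothetical example data, chosen only for illustration.
log = Log(
    attributes={"concept:name": "example log"},
    traces=[
        Trace(
            attributes={"concept:name": "case-1"},
            events=[
                Event({"concept:name": "register"}),
                Event({"concept:name": "approve"}),
            ],
        )
    ],
)

# asdict() converts nested data classes to dicts and lists, which json can dump.
as_json = json.dumps(asdict(log))
restored = json.loads(as_json)
print(restored["traces"][0]["events"][1]["attributes"]["concept:name"])
```
]]></preformat>
        <p>Because every level is an ordinary dictionary or list, the same structure maps onto JSON objects and arrays without an intermediate schema.</p>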
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Implementation</title>
      <p>
        Python-JXES defines a set of tools to convert PM4Py [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] event logs and event streams to JSON
objects, store them as JXES (.jxes) files, and load them from JXES files. It uses
PM4Py’s legacy log object for internal representation because it follows the XES structure more closely
than the new tabular format. PM4Py provides tools for bidirectional
conversion between the two formats (legacy and tabular), so all PM4Py log representations
are supported.
      </p>
      <p>JSON offers a smaller set of data types than XML, so time and ID attribute types
are represented as strings in JXES. Time is serialized and de-serialized according to ISO 8601
using a regular expression.</p>
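      <p>A minimal sketch of regex-based ISO 8601 handling is given below. The pattern and the function names parse_timestamp and format_timestamp are assumptions made for illustration; they are not taken from the Python-JXES source and cover only full timestamps with a time zone designator.</p>
      <preformat><![CDATA[
```python
import re
from datetime import datetime, timedelta, timezone

# Assumed pattern: date, 'T', time, optional fractional seconds,
# then either 'Z' or a +hh:mm / -hh:mm offset.
ISO_8601 = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,6}))?(Z|[+-]\d{2}:\d{2})$"
)


def parse_timestamp(text):
    """Turn an ISO 8601 string into a timezone-aware datetime."""
    match = ISO_8601.match(text)
    if match is None:
        raise ValueError(f"not an ISO 8601 timestamp: {text!r}")
    year, month, day, hour, minute, second, fraction, zone = match.groups()
    # Pad the fraction to microseconds, e.g. "123" becomes 123000.
    microsecond = int((fraction or "0").ljust(6, "0"))
    if zone == "Z":
        tz = timezone.utc
    else:
        sign = 1 if zone[0] == "+" else -1
        offset = timedelta(hours=int(zone[1:3]), minutes=int(zone[4:6]))
        tz = timezone(sign * offset)
    return datetime(
        int(year), int(month), int(day),
        int(hour), int(minute), int(second),
        microsecond, tzinfo=tz,
    )


def format_timestamp(value):
    """Serialize a datetime as an ISO 8601 string for storage in JSON."""
    return value.isoformat()


ts = parse_timestamp("2024-01-02T03:04:05.123+02:00")
print(format_timestamp(ts))
```
]]></preformat>
      <p>Since JSON has no date type, the string produced by format_timestamp is what would be stored, and parse_timestamp restores the typed value on import.</p>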
      <p>
        While this implementation strives to conform strictly to the XES standard as defined in
Clause 8.1 and supports all features described in Section 2.1, some discrepancies exist. First, as
the standard only defines an XML serialization, conforming log instances are also defined only
in terms of XML, making all other serialization formats technically non-conforming. Second,
the log attribute xes.version is assigned the string value "1849-2023", although it should be of type
xs:decimal (i.e., a floating-point number in JSON) as per Clause 5.1.2 of the standard. Finally, because the original JXES
proposal included containers described in the XES 2.0 standard [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Python-JXES still supports them even though they were removed
from the 1849-2023 version of the standard. This
feature can be made optional in future versions of the tool to ensure strict conformance.
      </p>
      <p>The implementation and basic examples are publicly available on PyPI<sup>2</sup> and GitHub<sup>3</sup>.</p>
    </sec>
    <sec id="sec-5">
      <title>4. Evaluation</title>
      <p>
        This section evaluates the implementation. The size of JXES files is compared
to that of the standard XML serialization. Read and write speeds of Python-JXES are compared to those of the
state-of-the-art tool PM4Py [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which is certified by the XES Working Group<sup>4</sup>.
      </p>
      <p>The evaluation was performed as follows. First, an XES event log was read by PM4Py
5 times for each importer variant<sup>5</sup> except rustxes. Then, this log was serialized back to XES
using PM4Py, also 5 times for each of the two exporter variants. Following that, the in-memory
log was serialized as JXES 5 times. Finally, the JXES log was imported 5 times. To
prevent caching effects, the cache was flushed before each of the 5 iterations of every test. For every
combination of file, tool and variant, the median of the 5 runs was used. All evaluation results
can be found in a separate GitHub repository<sup>6</sup>.</p>
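      <p>The measurement procedure can be sketched as a small timing helper. The function name median_runtime and its before_each hook are hypothetical and are not taken from the evaluation repository; cache flushing is operating-system specific and is left to the caller here.</p>
      <preformat><![CDATA[
```python
import json
import statistics
import time


def median_runtime(operation, runs=5, before_each=None):
    """Return the median wall-clock time in seconds of `operation` over `runs` calls."""
    durations = []
    for _ in range(runs):
        if before_each is not None:
            # Place for preparation such as a cache flush (not implemented here).
            before_each()
        start = time.perf_counter()
        operation()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Hypothetical workload standing in for an import or export call.
payload = [{"concept:name": f"event-{i}"} for i in range(1000)]
print(median_runtime(lambda: json.dumps(payload)))
```
]]></preformat>
      <p>Taking the median rather than the mean keeps a single slow run, for example one disturbed by background activity, from dominating the reported value.</p>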
      <p>[Figure panels: (a) XES logs smaller than 150 MB; (b) XES logs larger than 150 MB]</p>
      <sec>
        <title>4.1. Setup</title>
        <p>All tests ran on a laptop with an Intel® Core™ i7-1260P CPU, 32 GB DDR4 RAM and a Corsair
MP600 CORE XT NVMe SSD, running Ubuntu 22.04, Python 3.11 and PM4Py version 2.7.11.13.
Open-source event logs such as BPI Challenge 2011-2013, 2015, 2017-2019, as well as the Italian
helpdesk log and the Road traffic fines log, were used for evaluation.</p>
        <p><sup>2</sup> https://pypi.org/project/jxes/</p>
        <p><sup>3</sup> https://github.com/MaxVidgof/python-jxes</p>
        <p><sup>4</sup> https://www.tf-pm.org/resources/xes-standard/for-vendors/tool-support</p>
        <p><sup>5</sup> https://github.com/pm4py/pm4py-core/tree/release/pm4py/objects/log/importer/xes/variants</p>
        <p><sup>6</sup> https://github.com/MaxVidgof/python-jxes-evaluation</p>
      </sec>
      <sec>
        <title>4.2. Size</title>
      </sec>
      <sec id="sec-5-1">
        <title>4.3. Read speed</title>
      </sec>
      <sec id="sec-5-2">
        <title>4.4. Write speed</title>
        <p>Write benchmarks show similar results.
Serializing event logs in JXES is faster than
the best PM4Py XES serialization by at least
28%. Median write performance improvement
is 40%, and maximum improvement almost
reaches 47%.</p>
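        <p>The arithmetic behind such percentages can be sketched as follows, assuming that improvement is measured relative to the baseline (XES) time; the paper does not spell out the formula, and the timings below are made up purely for illustration.</p>
        <preformat><![CDATA[
```python
def relative_improvement(baseline_seconds, candidate_seconds):
    """Percentage by which the candidate is faster than the baseline."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


# Hypothetical timings: 10 s for the baseline export, 6 s for the candidate.
print(relative_improvement(10.0, 6.0))  # 40.0
```
]]></preformat>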
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Discussion</title>
      <p>
        JXES saves space because far fewer characters are needed. First,
JSON removes the need to specify data types explicitly, as opposed to XML. Second,
there is no need for opening and closing tags in JSON. Because of this, an event
&lt;event&gt;&lt;string key="concept:name" value="example"/&gt;&lt;/event&gt; can be written as
{"concept:name":"example"}, leading to a significant space reduction. This is important for at
least the following scenarios:
      </p>
      <list list-type="order">
        <list-item>
          <p>Storing large amounts of historical data. This might not seem critical for smaller
event logs ranging between tens of kilobytes and tens of megabytes in size. However, for
real-life logs reaching tens of gigabytes in size, this becomes relevant.</p>
        </list-item>
        <list-item>
          <p>Data transfer. When logs have to be transferred, e.g., between the system collecting
them and the system analyzing them, data size is also crucial. This includes
Streaming Process Mining, where smaller sizes may also reduce latency because it
takes less time to transmit an event completely.</p>
        </list-item>
        <list-item>
          <p>Streaming Process Mining. In previous work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], JXES was suggested as the event
serialization format. This implementation now enables such an approach.</p>
        </list-item>
        <list-item>
          <p>Constrained devices and IoT. There are approaches that perform Process Mining
directly on IoT devices. Such small devices profit from a more lightweight
data format, which takes both less space to store and less time to import and export.</p>
        </list-item>
      </list>
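      <p>The size difference for the single event quoted above can be checked with the standard library. This is a sketch only: the helper event_to_dict is a hypothetical name and handles just flat attributes whose values are kept as strings.</p>
      <preformat><![CDATA[
```python
import json
import xml.etree.ElementTree as ET

# The XES event from the text, as an XML string.
xes_event = '<event><string key="concept:name" value="example"/></event>'


def event_to_dict(xml_text):
    """Map each attribute element of an XES event to a key/value pair."""
    element = ET.fromstring(xml_text)
    return {child.get("key"): child.get("value") for child in element}


# Compact separators avoid the spaces json.dumps inserts by default.
jxes_event = json.dumps(event_to_dict(xes_event), separators=(",", ":"))

saving = 1 - len(jxes_event) / len(xes_event)
print(jxes_event)
print(f"{len(xes_event)} vs {len(jxes_event)} characters, saving {saving:.0%}")
```
]]></preformat>
      <p>The saving for one tiny event is not representative of whole logs, where log-level metadata and longer values change the ratio; the measured figures in the evaluation are the ones to rely on.</p>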
      <p>It must be noted that binary serialization formats can offer even more compression.
For instance, pickle<sup>7</sup> requires even less space: for BPIC 2018, 1.1 GB for pickle versus 1.3 GB for JXES
versus 1.9 GB for XES. However, in contrast to both XES and JXES, pickle is a binary
format that is not human-readable and only allows sharing data between Python programs,
leaving out a significant fraction of the PM ecosystem. Similarly, file compression can reduce
file sizes significantly; however, it might still be impractical for streaming or IoT scenarios
because (de-)compression requires additional time and computational resources. In addition,
compressed files cannot be read without decompression. It is worth pointing out, though,
that compressed JXES files are still about 10% smaller than compressed XES files containing the
same event log.</p>
      <p>While JXES shows significant improvements in read and write speeds, it must be noted that,
in contrast to XML, Python’s default implementation of JSON does not support line-by-line
reading, which could further improve reading of large event logs or streams. However,
non-standard implementations for incremental JSON parsing do exist, and exploring them is
left for future work.</p>
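      <p>One way to read JSON values one at a time with the standard library alone is JSONDecoder.raw_decode. The sketch below assumes that events arrive as separate JSON values concatenated in one text buffer; that layout is an assumption for illustration and is not defined by JXES or implemented in Python-JXES.</p>
      <preformat><![CDATA[
```python
import json


def iter_json_values(text):
    """Yield consecutive JSON values found in `text`, one at a time."""
    decoder = json.JSONDecoder()
    position = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while position < length and text[position].isspace():
            position += 1
        if position >= length:
            return
        value, position = decoder.raw_decode(text, position)
        yield value


# Hypothetical buffer holding two events, one per line.
buffer = '{"concept:name":"register"}\n{"concept:name":"approve"}\n'
for event in iter_json_values(buffer):
    print(event["concept:name"])
```
]]></preformat>
      <p>This still requires the buffer to be in memory; truly incremental parsing of a single large JSON document needs a dedicated streaming parser, which is the direction named as future work.</p>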
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion</title>
      <p>XES is a standard XML-based serialization format for event logs. As JSON-based formats are
becoming increasingly adopted, JXES, a JSON serialization format for XES, was proposed. This
paper presents a Python implementation of JXES and evaluates it against state-of-the-art XES
tools on open-source event logs, showcasing significant improvements in storage requirements
and read and write performance.</p>
      <p><sup>7</sup> https://docs.python.org/3/library/pickle.html</p>
      <sec id="sec-7-1">
        <title>6.1. Future work</title>
        <p>Future work is aimed at evaluating JXES in the scenarios where it can be most beneficial, such
as Streaming Process Mining and PM on IoT devices. Evaluation in regular PM scenarios
would also be beneficial. Finally, the implementation might receive new features such as incremental
JSON parsing.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name><given-names>W. M.</given-names> <surname>van der Aalst</surname></string-name>,
          <source>Process Mining: Data Science in Action, Second Edition</source>, Springer,
          <year>2016</year>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name><given-names>A.</given-names> <surname>Berti</surname></string-name>,
          <string-name><given-names>I.</given-names> <surname>Koren</surname></string-name>,
          <string-name><given-names>J. N.</given-names> <surname>Adams</surname></string-name>,
          <string-name><given-names>G.</given-names> <surname>Park</surname></string-name>,
          <string-name><given-names>B.</given-names> <surname>Knopp</surname></string-name>,
          <string-name><given-names>N.</given-names> <surname>Graves</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Rafiei</surname></string-name>,
          <string-name><given-names>L.</given-names> <surname>Liß</surname></string-name>,
          <string-name><given-names>L. T.</given-names> <surname>genannt Unterberg</surname></string-name>,
          <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>,
          <string-name><given-names>C. T.</given-names> <surname>Schwanen</surname></string-name>,
          <string-name><given-names>M.</given-names> <surname>Pegoraro</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>OCEL (object-centric event log) 2.0 specification</article-title>,
          <source>CoRR</source> abs/2403.01975 (<year>2024</year>).
          URL: https://doi.org/10.48550/arXiv.2403.01975. doi:10.48550/ARXIV.2403.01975. arXiv:2403.01975.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name><given-names>D.</given-names> <surname>Calegari</surname></string-name>,
          <string-name><given-names>A.</given-names> <surname>Delgado</surname></string-name>,
          <article-title>A model-driven engineering perspective for the object-centric event data (OCED) metamodel</article-title>,
          in: J. D. Weerdt, L. Pufahl (Eds.),
          <source>Business Process Management Workshops - BPM 2023 International Workshops, Utrecht, The Netherlands, September 11-15, 2023, Revised Selected Papers</source>,
          volume <volume>492</volume> of Lecture Notes in Business Information Processing, Springer,
          <year>2023</year>, pp. <fpage>508</fpage>-<lpage>520</lpage>.
          URL: https://doi.org/10.1007/978-3-031-50974-2_38. doi:10.1007/978-3-031-50974-2_38.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name><given-names>M. B. S.</given-names> <surname>Narayana</surname></string-name>,
          <string-name><given-names>H.</given-names> <surname>Khalifa</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>JXES: JSON support for the XES event log standard</article-title>,
          <source>CoRR</source> abs/2009.06363 (<year>2020</year>).
          URL: https://arxiv.org/abs/2009.06363. arXiv:2009.06363.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <article-title>IEEE standard for extensible event stream (XES) for achieving interoperability in event logs and event streams</article-title>,
          <source>IEEE Std 1849-2023 (Revision of IEEE Std 1849-2016)</source>
          (<year>2023</year>) <fpage>1</fpage>-<lpage>55</lpage>.
          doi:10.1109/IEEESTD.2023.10267858.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name><given-names>A.</given-names> <surname>Berti</surname></string-name>,
          <string-name><given-names>S. J.</given-names> <surname>van Zelst</surname></string-name>,
          <string-name><given-names>D.</given-names> <surname>Schuster</surname></string-name>,
          <article-title>PM4Py: A process mining library for Python</article-title>,
          <source>Softw. Impacts</source> <volume>17</volume> (<year>2023</year>) 100556.
          URL: https://doi.org/10.1016/j.simpa.2023.100556. doi:10.1016/J.SIMPA.2023.100556.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name><given-names>H. M. W.</given-names> <surname>Verbeek</surname></string-name>,
          <string-name><given-names>J. C. A. M.</given-names> <surname>Buijs</surname></string-name>,
          <string-name><given-names>B. F.</given-names> <surname>van Dongen</surname></string-name>,
          <string-name><given-names>W. M. P.</given-names> <surname>van der Aalst</surname></string-name>,
          <article-title>XES, XESame, and ProM 6</article-title>,
          in: P. Soffer, E. Proper (Eds.),
          <source>Information Systems Evolution - CAiSE Forum 2010, Hammamet, Tunisia, June 7-9, 2010, Selected Extended Papers</source>,
          volume <volume>72</volume> of Lecture Notes in Business Information Processing, Springer,
          <year>2010</year>, pp. <fpage>60</fpage>-<lpage>75</lpage>.
          URL: https://doi.org/10.1007/978-3-642-17722-4_5. doi:10.1007/978-3-642-17722-4_5.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name><given-names>M.</given-names> <surname>Vidgof</surname></string-name>,
          <article-title>Towards process mining on Kafka event streams (short paper)</article-title>,
          in: S. Böhm, D. Lübke (Eds.),
          <source>Proceedings of the 16th ZEUS Workshop, Ulm, Germany, February 29-March 1, 2024</source>,
          volume <volume>3673</volume> of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2024</year>, pp. <fpage>1</fpage>-<lpage>8</lpage>.
          URL: https://ceur-ws.org/Vol-3673/paper1.pdf.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>