<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>OpyenXES: A Complete Python Library for the eXtensible Event Stream Standard</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hernan Valdivieso</string-name>
          <email>hfvaldivieso@uc.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wai Lam Jonathan Lee</string-name>
          <email>walee@uc.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jorge Munoz-Gama</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marcos Sepulveda</string-name>
          <email>marcos@ing.puc.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Department, School of Engineering, Pontificia Universidad Católica de Chile</institution>
          ,
          <addr-line>Santiago</addr-line>
          ,
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Significance to the BPM Field</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>There has been a spectacular growth in the availability of event data. To transport, store, and exchange such data, the eXtensible Event Stream (XES) has become the acknowledged standard thanks to its simplicity, flexibility, extensibility, and expressivity. Currently, the OpenXES library is the popular open-source Java implementation of the XES standard. However, despite the growing popularity of Python as the core programming language for data science, a complete open-source implementation of the standard in that language has been missing. This paper presents OpyenXES, a complete and open-source implementation of the XES standard in Python. It opens up the rich portfolio of Python packages for data science to researchers and practitioners in the field of business process management.</p>
      </abstract>
      <kwd-group>
        <kwd>event log</kwd>
        <kwd>XES</kwd>
        <kwd>Python</kwd>
        <kwd>open-source</kwd>
        <kwd>data science</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Significance to the BPM Field</title>
      <p>
        XES standard is such an important issue that one of the recent achievements of
the XES working group is the approval of the XES certification proposal. With
this, software can be certified for its level of support for the XES standard. At
the same time, the popularity of Python as a programming language for data
science has been rising in recent years [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This is due to the open-source
nature of the community, the collection of packages oriented towards data
science (e.g., NumPy, SciPy, pandas, and matplotlib), and its high productivity
for prototyping and building small, reusable systems. In fact, the code
accompanying many recent research papers is written in Python for precisely these
reasons. However, despite the large collection of useful data science tools in
the Python environment, this adoption trend is largely unseen in the field of
business process management. We believe one of the key reasons is the lack of
support for handling event data in the XES format. While there have been some
previous attempts to fill this gap, e.g., simple Python scripts that support parts
of the XES standard, or complex suites that are difficult to use in
standalone prototyping, there has yet to be a full implementation of the standard. In
R, there is similar support for XES files with the bupaR packages [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However,
these packages do not support the full XES standard.
      </p>
      <p>In this work we take a significant step in that direction by presenting
OpyenXES, a complete open-source Python library for the XES standard. The
remainder of the paper is structured as follows: Section 2 presents the library
and the principles behind it. Section 3 illustrates the applicability of the library
with a set of different examples. Section 4 presents the links to access the library,
the source code, and the documentation. Finally, Section 5 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>OpyenXES: The Library</title>
      <p>OpyenXES is a Python library that supports the creation, analysis, and storage
of event data conforming to the XES standard. OpyenXES was created with the
following principles in mind:</p>
      <p>▷ Python: OpyenXES was implemented in Python for Python. It is designed
to be used like any other Python code, with typical Python syntax, e.g.,
iterating over the traces of a log using 'for', or adding a new event using 'append'.
Moreover, it is registered in the Python Package Index (PyPI), so it can be
installed with pip.</p>
      <p>
        ▷ Complete: Unlike other Python scripts for XES, OpyenXES includes all the
elements defined in the XES standard [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], such as 'classifiers' and
'extensions'. Figure 1 shows an overview of the main components of the library.
Software using OpyenXES should have the same XES certification level as
current software using OpenXES.
      </p>
      <p>
        ▷ Open: OpyenXES was created following the Open Science principle, like
other related tools such as OpenXES or ProM. Therefore, both the library
and the source code are publicly available (cf. Section 4). This opens the
door for the data science community to fork, correct, and improve the library,
e.g., by proposing more efficient DB-based implementations for the storage and
processing of XES logs [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        ▷ OpenXES Compatible: The state-of-the-art implementation of XES is the
OpenXES open-source Java library [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. To ease the transition from
one library to the other (in either direction), OpyenXES preserves
both the naming and the package structure of the more mature OpenXES (cf. Figure
1). A researcher familiar with one library should be able to implement
programs using the other with little effort. Note that the name itself
(OpyenXES) is an homage to OpenXES and to its creators and maintainers for their
selfless work.
      </p>
      <p>▷ Combinable: One of the main strengths of Python is its data science and
machine learning libraries (NumPy, SciPy, pandas, matplotlib, Jupyter,
scikit-learn, ...). OpyenXES was designed to be combined with such libraries
in order to realize its potential as a data science tool.</p>
      <p>The following set of examples illustrates the maturity, usefulness, and
applicability of OpyenXES for process-oriented data science and its combination with
other popular Python libraries. Due to space limitations, the source code of all
the examples is provided at
https://github.com/opyenxes/OpyenXes/tree/master/example.</p>
      <p>▷ Log Anonymization: An event log is iterated through to remove any
information about the resources involved in the cases and the events.</p>
      <p>▷ CSV to XES: Event data stored in a comma-separated values file (.csv) is
converted into an XES event log file, including the transformation of dates into
an XES-compatible format.</p>
      <p>▷ Random Log: A log is created from scratch using random values.</p>
      <p>▷ Filter Variants: A new event log is generated by preserving only one
trace from each group of traces with the same sequence of activities, i.e., only the
different variants of the original event log.</p>
      <p>
        ▷ Alpha Algorithm: The classic Alpha Algorithm for process
discovery [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is easily implemented in Python using OpyenXES to generate
the footprints.
      </p>
      <p>▷ Reporting Statistics: An event log is iterated through to obtain statistics
about the number of activities performed by each resource, and the results
are plotted using the matplotlib library.</p>
      <p>▷ Trace Clustering: OpyenXES is combined with the popular Python machine
learning library scikit-learn to cluster the traces using the k-means
algorithm.</p>
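      <p>The Filter Variants example listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code from the repository: the log is modelled as a plain Python list of traces, each trace being a list of dicts, as a stand-in for the OpyenXES log objects. The key concept:name is the standard XES attribute for the activity name, and filter_variants is a hypothetical helper name.</p>
      <preformat>
```python
def filter_variants(log):
    """Keep the first trace seen for each distinct sequence of activities."""
    seen = set()
    kept = []
    for trace in log:
        # A variant is identified by the ordered activity names of a trace.
        variant = tuple(event["concept:name"] for event in trace)
        if variant not in seen:
            seen.add(variant)
            kept.append(trace)
    return kept


# Toy log (stand-in data): the first and third traces share one variant.
log = [
    [{"concept:name": "register"}, {"concept:name": "check"}, {"concept:name": "pay"}],
    [{"concept:name": "register"}, {"concept:name": "pay"}],
    [{"concept:name": "register"}, {"concept:name": "check"}, {"concept:name": "pay"}],
]
variants = filter_variants(log)
print(len(variants))
```
      </preformat>
      <p>Because the sketch only iterates with 'for' and collects with 'append', the same loop shape should carry over to any log object that behaves like a list of traces.</p>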
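      <p>The footprint step mentioned in the Alpha Algorithm example above can be illustrated with a short sketch. It assumes that each trace has already been reduced to a list of activity names (plain strings rather than OpyenXES events), and the function name footprint and the relation labels are hypothetical choices made for this illustration. Two activities are related by "causes" when one directly follows the other in some trace but never the reverse, by "parallel" when both orders occur, and are "unrelated" when neither occurs.</p>
      <preformat>
```python
from itertools import product


def footprint(log):
    """Return the footprint as a dict keyed by ordered pairs of activities."""
    activities = sorted({name for trace in log for name in trace})
    # Directly-follows pairs: (a, b) when b immediately follows a in some trace.
    follows = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    matrix = {}
    for a, b in product(activities, repeat=2):
        forward, backward = (a, b) in follows, (b, a) in follows
        if forward and backward:
            matrix[a, b] = "parallel"
        elif forward:
            matrix[a, b] = "causes"
        elif backward:
            matrix[a, b] = "caused_by"
        else:
            matrix[a, b] = "unrelated"
    return matrix


# Toy log (stand-in data): three traces over the activities a to e.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]
fp = footprint(log)
print(fp["a", "b"], fp["b", "c"], fp["b", "e"])
```
      </preformat>
      <p>Storing the footprint as a dict keyed by pairs keeps lookups simple; a nested table or a matrix would serve equally well.</p>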
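      <p>Before k-means can be applied in the Trace Clustering example above, each trace needs a numeric representation. The paper does not state which encoding the example uses; the sketch below assumes one common choice, a vector of activity counts over a shared alphabet. Traces are again modelled as plain lists of activity names, and activity_vectors is a hypothetical helper name.</p>
      <preformat>
```python
from collections import Counter


def activity_vectors(log):
    """Encode each trace as a list of activity counts over a shared alphabet."""
    alphabet = sorted({name for trace in log for name in trace})
    vectors = []
    for trace in log:
        counts = Counter(trace)
        # Counter returns 0 for activities that do not occur in the trace.
        vectors.append([counts[name] for name in alphabet])
    return alphabet, vectors


# Toy log (stand-in data): three traces over the activities a to d.
log = [["a", "b", "b", "d"], ["a", "c", "d"], ["a", "b", "d"]]
alphabet, vectors = activity_vectors(log)
print(alphabet)
print(vectors)
```
      </preformat>
      <p>The rows of the resulting list could then be passed to a clustering routine such as the KMeans estimator of scikit-learn, with one row per trace.</p>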
      <p>These examples only hint, in an instructive way, at the full potential of
OpyenXES and Python for process-oriented data science, opening the door to more
complex operations such as k-anonymity, tool-specific format transformation,
Python-specific libraries for CSV or JSON processing, event log simulation,
domain-specific filters, other process mining algorithms, or the combination with
well-known Python data science libraries.</p>
      <p>The library is available as a public GitHub repository under the organization
opyenxes (https://github.com/opyenxes), so that it can be forked,
extended, or improved by the community. The source code is also available in
the repository. The library is fully documented at
http://opyenxes.readthedocs.io/en/latest/?badge=latest (cf. Figure 2). A screencast
illustrating the main features of OpyenXES is available at
www.processmininguc.com/tools. Finally, examples showcasing the use of the library are available at
https://github.com/opyenxes/OpyenXes/tree/2018-bpm-demo/example.</p>
    </sec>
    <sec id="sec-3">
      <title>Conclusions</title>
      <p>This paper presented OpyenXES, a Python library for the eXtensible Event
Stream (XES), a format used for the interchange of event log data between tools
and application domains. The library includes all the elements of the standard
(such as classifiers or extensions), it is open-source and open to the community,
and it follows the same naming and package architecture as the
state-of-the-art Java library, easing the transition from one library to the other. We believe
that OpyenXES could spark the interest of a new set of data scientists in
BPM and process mining research.</p>
      <p>Acknowledgments. This work is partially supported by CONICYT-PCHA /
Doctorado Nacional / 2017-21170612, Vicerrectoría de Investigación de la
Pontificia Universidad Católica de Chile / Concurso Investigación Pregrado 2017,
and by the Departamento de Ciencias de la Computación UC /
Fond-DCC2017-0001. The authors would like to thank the members of the IEEE Task
Force on Process Mining XES Working Group and the creators and maintainers
of the OpenXES Java library for their selfless efforts.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. OpenXES, http://www.xes-standard.org/openxes/start</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>XES</given-names>
            <surname>Web</surname>
          </string-name>
          , http://www.xes-standard.org
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <article-title>3. IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams</article-title>
          .
          <source>IEEE Std</source>
          <year>1849</year>
          -2016 pp.
          <volume>1</volume>
          –
          <issue>50</issue>
          (Nov
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining - Data Science in Action. Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Janssenswillen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Depaire</surname>
          </string-name>
          , B.:
          <article-title>bupaR: Business process analysis in R. In: Proceedings of the BPM Demo Track and BPM Dissertation Award co-located with (BPM 2017)</article-title>
          .
          <article-title>(</article-title>
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Puget</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>What Language is Best for Machine Learning and Data Science (</article-title>
          <year>2016</year>
          ), https://www.ibm.com/developerworks/community/blogs/jfp/entry/What_Language_Is_Best_For_Machine_Learning_And_Data_Science
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Syamsiyah</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>DB-XES: enabling process discovery in the large</article-title>
          .
          <source>In: SIMPDA. CEUR Workshop Proceedings</source>
          , vol.
          <volume>1757</volume>
          , pp.
          <volume>63</volume>
          –
          <fpage>77</fpage>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Verbeek</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Buijs</surname>
            ,
            <given-names>J.C.A.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>ProM 6: The process mining toolkit</article-title>
          .
          <source>In: Proceedings of the Business Process Management 2010 Demonstration Track</source>
          , Hoboken, NJ, USA, September
          <volume>14</volume>
          -
          <issue>16</issue>
          ,
          <year>2010</year>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>