<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>ELoader: A Web Application for Event Log Selection and Preparation for Neural Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Henryk Mustroph</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michel Kunkler</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefanie Rinderle-Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technical University of Munich, TUM School of Computation, Information and Technology</institution>
          ,
          <addr-line>Garching</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>One essential step toward reproducible and comparable results in deep learning, while also supporting custom neural network designs, is unified data selection and preparation. For neural network-based process monitoring applications, where the underlying data are primarily event logs, only a few Python libraries support unified event-log selection and preparation, and no practical tool provides these functionalities in an intuitive manner. Therefore, we present ELoader, a prototype web application that enables users to select an event log, prepare it, and download the resulting training, validation, and test sets as Python pickle files bundled in a .zip package for direct use in neural networks.</p>
      </abstract>
      <kwd-group>
        <kwd>Event Log</kwd>
        <kwd>Data Selection and Preparation</kwd>
        <kwd>Neural Network</kwd>
        <kwd>Web Application</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        In deep learning applications, reproducibility of results and fair comparison can be ensured
only when the same data are used and the data preparation process is consistent. This also
applies to process monitoring tasks, such as predictive process monitoring (PPM), where custom
neural networks (NNs) trained on event logs are compared against each other [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Therefore, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
recognize that one of the future research goals of PPM is to focus not only on standard evaluation
metrics but also on standardized event log selection and preparation. In this context, data
selection refers to using the same event logs across comparable approaches. Data preparation is
the umbrella term for preprocessing (e.g., cleaning, feature extraction, normalization), encoding
(e.g., numerical and categorical), and splitting into training, validation, and test sets. Consistent
data preparation is essential for evaluating an NN’s performance across different approaches.
For example, while most PPM approaches rely on common event logs, such as those from the
Business Process Intelligence Challenges (BPICs), some still use other logs, including private or
non-shareable ones [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. This variety increases bias and complicates fair comparison across
methods [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Existing work has addressed unifying event log selection and preparation in
process monitoring by developing Python libraries or extensions. One example is VERONA [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
a Python library designed to support reproducibility and comparability in NN–based process
monitoring, which provides users with event log selection and preprocessing functionalities.
Additionally, there is a Scikit-learn library extension for process mining [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. However, while
Python libraries offer flexibility and customization for data preparation, they require time to
read the documentation and careful environment setup to ensure function calls work correctly.
In contrast, practical tools such as web applications hide this complexity and require less effort
to set up. Nirdizati [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a web-based toolkit for PPM that lets users upload, prepare, and
directly train machine learning models on event logs. It targets practitioners building low-code
prediction models and dashboards for decision making, rather than selecting open-source event
logs and downloading the prepared datasets for custom NN input.
      </p>
      <p>At present, there is no practical tool that enables easy event log selection and preparation.
Such a tool would allow custom NNs to be implemented, trained, and tested without the need to
develop a dedicated data preparation pipeline, and it would also support the use of standardized
datasets for comparing diferent process monitoring approaches, such as in PPM. This work
presents ELoader, a web application prototype that simplifies the preparation of open-source
event logs for training NNs. Users can select an event log and define various preprocessing
parameters, e.g., which event attributes should be encoded, or which event attributes should
additionally be obtained via feature engineering. Thereafter, the event log is encoded and split
into a training, validation, and test set. The user then receives a single .zip package that
contains the training, validation, and test sets, each stored as a pickled PyTorch dataset. The
tool is intended for practitioners and researchers who aim to build custom or reimplement
existing NNs for process monitoring applications without the need to implement their own
data preparation pipeline.</p>
    </sec>
    <sec id="sec-3">
      <title>2. ELoader Web Application</title>
      <p>This section starts with the functionalities of ELoader, followed by its architecture,
implementation details, design, and a description of the user interface.</p>
      <sec id="sec-3-1">
        <title>2.1. Functionality</title>
        <p>
          The functionality of ELoader is split into event log selection, preparation, and splitting.
Selection: The application is linked to a directory containing all event logs for data selection.
We aim to provide a web application that converts openly accessible and widely used event logs
for NN-based process monitoring tasks, thereby simplifying and unifying event log selection.
Preparation: For data preparation, we use the same procedure as described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We apply
feature engineering on the timestamp value by introducing a case elapsed time attribute,
representing the time elapsed since the first event in the case, an event elapsed time
attribute, representing the time since the last event within the same case (with the value set to 0
for the first event), a day of the week attribute, and a time of day attribute. The latter two
features are incorporated due to the potential influence of periodic trends on the future course
of a process. For example, in a company that operates only on weekdays, when an activity is
completed on Friday evening, the next activity is unlikely to occur before Monday. Then, we
apply standard scaling to all continuous event attributes, except for the raw timestamp, and
encode missing values as 0. Following [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], we also apply input padding to facilitate batch training.
Each case is padded with zeros at the beginning to a fixed length, the so-called window size,
determined by the maximum case length in the event log (excluding the top 1.5% of the longest
cases) plus the minimum suffix size. For every categorical event attribute with K unique category
classes, we add an additional NA (not available) class for missing values and an unknown class.
For the event label attribute, we additionally add an end-of-sequence (EOS) category class. We
then apply index encoding to each categorical event attribute. The user can specify a minimum
suffix size (i.e., the length of the target event sequence). Each case is then transformed into a
list of prefix–suffix samples stored as one concatenated tensor list, starting with a prefix of
length one and increasing until the prefix length reaches the case length. The corresponding suffix
consists of the remaining events of the case, followed by EOS tokens as needed to ensure it is at
least the minimum suffix size. The suffix length thus decreases from case length − 1 down to 0, with
EOS tokens used to pad the suffix whenever the actual number of remaining events is smaller than the
minimum suffix size. For all other event attributes, categorical (other than the event label) and
continuous, the values from the last valid event in the suffix are copied forward.
        </p>
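        <p>The timestamp-derived features above can be sketched in a few lines of Python. The
following is a minimal illustration using only the standard library; the function name and the
dictionary keys are ours for illustration and not part of ELoader:</p>
        <preformat>
```python
from datetime import datetime

def timestamp_features(case_timestamps):
    """Derive the four timestamp features for one case.

    case_timestamps: datetime objects of the case's events, in order.
    Returns one feature dict per event.
    """
    features = []
    first = case_timestamps[0]
    for i, ts in enumerate(case_timestamps):
        prev = case_timestamps[i - 1] if i else ts
        features.append({
            # time since the first event of the case, in seconds
            "case_elapsed_time": (ts - first).total_seconds(),
            # time since the previous event (0 for the first event)
            "event_elapsed_time": (ts - prev).total_seconds(),
            # 0 = Monday, ..., 6 = Sunday
            "day_of_week": ts.weekday(),
            # seconds since midnight
            "time_of_day": ts.hour * 3600 + ts.minute * 60 + ts.second,
        })
    return features
```
        </preformat>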
        <p>
          Suppose a case consists of the events [e1, e2, e3, e4], and the minimum suffix size is set to 2.
The resulting prefix–suffix samples are: prefix [e1] with suffix [e2, e3, e4], prefix [e1, e2] with suffix
[e3, e4], prefix [e1, e2, e3] with suffix [e4, EOS], and prefix [e1, e2, e3, e4] with suffix [EOS, EOS].
This type of case encoding can be used for full-sequence training (i.e., suffix prediction) as in
[
          <xref ref-type="bibr" rid="ref3 ref7">7, 3</xref>
          ], as well as for next-activity training (i.e., next-activity prediction) as in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
Split: The data are then split into training, validation, and test sets according to user-defined
percentages, ensuring a random yet balanced distribution. When the same event log and split
percentages are used, the resulting sets remain identical.
        </p>
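        <p>The prefix–suffix construction described above can be sketched as follows (a minimal
illustration; the function name and the EOS sentinel string are ours, not ELoader's API):</p>
        <preformat>
```python
def prefix_suffix_samples(case, min_suffix_size, eos="EOS"):
    """Split one case into prefix-suffix samples.

    The prefix grows from length 1 up to the full case length; the
    suffix holds the remaining events, padded with EOS tokens so it
    is never shorter than min_suffix_size.
    """
    samples = []
    for prefix_len in range(1, len(case) + 1):
        prefix = case[:prefix_len]
        suffix = case[prefix_len:]
        # pad with EOS tokens up to the minimum suffix size
        padding = [eos] * max(0, min_suffix_size - len(suffix))
        samples.append((prefix, suffix + padding))
    return samples
```
        </preformat>
        <p>With a minimum suffix size of 2, a four-event case yields four samples whose suffix
lengths are 3, 2, 2, and 2, as described above.</p>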
      </sec>
      <sec id="sec-3-2">
        <title>2.2. Architecture and Implementation</title>
        <p>Figure 1 depicts the architecture of the ELoader web application. Users access the user
interface (UI), which is implemented in the frontend using the React.js JavaScript library and
Material UI. Communication between the frontend and backend is handled via a REST API. The
backend is implemented in Python and is organized into two packages: the Communication
package, which manages communication with the frontend using the FastAPI library, and
the EL_Functionality package, which handles all event log preparation functionality
using PyTorch. Each .zip package returned by the backend contains a training, validation, and
test set, each stored as a Python pickle file [1] (.pkl). Pickle files are encoded byte streams of
Python objects, in our case holding PyTorch tensors [2], and can easily be stored, read, and
decoded using just a few lines of Python code.
[1] https://docs.python.org/3/library/pickle.html, accessed on 2025-09-23
[2] https://docs.pytorch.org/docs/stable/tensors.html, accessed on 2025-09-23</p>
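        <p>Writing and reading such a pickle file indeed takes only a few lines; a minimal sketch
using the standard library (the helper names and the file path are ours for illustration):</p>
        <preformat>
```python
import pickle

def save_pickle(obj, path):
    # encode a Python object as a byte stream and write it to disk
    with open(path, "wb") as f:
        pickle.dump(obj, f)

def load_pickle(path):
    # read the byte stream back and decode it into a Python object
    with open(path, "rb") as f:
        return pickle.load(f)
```
        </preformat>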
        <p>Each set is represented as an object of our custom EventLogDataset class, which provides
three main attributes: all_categories, encoder_decoder, and tensor_list. The all_categories
attribute stores a tuple of two lists with the same structure, one for categorical event
attributes and one for continuous event attributes. Each list contains one tuple per event
attribute, consisting of a string with the attribute name, an integer specifying the number of
labels (always 1 for continuous attributes), and a dictionary mapping each category class to its
corresponding index ID (empty for continuous attributes). The encoder_decoder attribute contains
configuration information such as the window size, the minimum suffix size, and the parameters
for standardization and de-standardization of continuous event attributes, which can be accessed,
for example, via dataset.encoder_decoder.continuous_encoders['case_elapsed_time'] for the
case elapsed time. The tensor_list attribute contains a tuple of three elements: the
first holds the tensors for categorical event attribute values, the second holds the tensors
for continuous event attribute values, and the third stores the case IDs corresponding to
those tensors. Each event attribute has its own tensor, a matrix of shape
number of samples × window size, while the list of case IDs has length equal to the number of
samples.</p>
        <p>Both the encoded data and the decoded plain data can be accessed through the
EventLogDataset object serialized in the pickle files. An example is provided in the main.py file
included in every .zip package.</p>
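        <p>To illustrate the attribute layout described above, the following simplified stand-in
mirrors the structure of EventLogDataset; plain Python objects replace PyTorch tensors, and the
class is an illustrative sketch rather than ELoader's actual implementation:</p>
        <preformat>
```python
from dataclasses import dataclass, field

@dataclass
class EventLogDatasetSketch:
    # (name, number_of_labels, class_to_index) tuples per attribute,
    # grouped as (categorical_attributes, continuous_attributes)
    all_categories: tuple = ((), ())
    # configuration such as window size, minimum suffix size,
    # and (de-)standardization parameters
    encoder_decoder: dict = field(default_factory=dict)
    # (categorical_tensors, continuous_tensors, case_ids)
    tensor_list: tuple = ((), (), ())

# example contents: one categorical and one continuous attribute
categorical = [("event_label", 3, {"A": 0, "B": 1, "EOS": 2})]
continuous = [("case_elapsed_time", 1, {})]
ds = EventLogDatasetSketch(all_categories=(categorical, continuous))
```
        </preformat>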
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Design and User Interface</title>
        <p>Figure 2 shows the main page of ELoader. In the first step, the user can select an event log from
the list to be prepared. Once an event log is chosen, the boxes for step two, data preparation,
open. The first box displays all individual event log–specific case, event label, timestamp, date
format, and time feature attribute names. These values are required for the data preparation
functionalities and cannot be changed by the user. The second box contains all custom input
fields, which are pre-filled with default values but can be modified by the user. Here, the user
can set the validation set size and the test set size (values between 0 and 1). Additionally, the
user can choose the minimum suffix size (a value between 1 and 10). Finally, the user can select
which categorical and continuous event attributes should be included in the training, validation,
and test sets; only valid event attribute names are allowed. When the user has filled in all
data preparation fields, the Start Encoding button can be pressed to begin the preprocessing,
encoding, and splitting. When the process is finished, the training, validation, and test sets are
downloaded in a .zip package.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Discussion and Future Work</title>
      <p>
        We tested the functionality of ELoader on multiple datasets, including several BPIC event logs and the
Helpdesk and Sepsis event logs. As expected, the computation time and resource consumption
for data preparation increase with the log size, which also affects the size of the resulting .zip
package. Furthermore, we used the underlying ELoader data preparation functionality in our
ICPM 2025 paper Probabilistic Suffix Prediction of Business Processes, where it was applied to
three different models: our own suffix prediction model [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the next-activity/suffix prediction
model of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and the remaining-time prediction model of [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These experiments demonstrate
that ELoader is sufficiently generic to be applied to various PPM tasks. However, ELoader is not
yet complete, and several additional functionalities are planned for future development, such as
temporal data splitting as described in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Moreover, we intend to gather further requirements
from practitioners, researchers, and potential users during the ICPM demonstration to guide the
extension and improvement of the tool. Nevertheless, the first prototype already enables easy
and efficient event log selection, preparation, and loading for direct input into NNs for PPM.
This supports more reproducible and comparable evaluations of different NN–based process
monitoring methods.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, we used ChatGPT for grammar and spelling checking,
paraphrasing, and rewording. After using this tool, we reviewed and edited the content as
needed and take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] P. Ceravolo, M. Comuzzi, J. De Weerdt, C. Di Francescomarino, F. M. Maggi, Predictive process monitoring: concepts, challenges, and future research directions, Process Science 1 (2024) 2.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] N. Mehdiyev, M. Majlatow, P. Fettke, Augmenting post-hoc explanations for predictive process monitoring with uncertainty quantification via conformalized monte carlo dropout, Data Knowl. Eng. 156 (2025) 102402. doi:10.1016/J.DATAK.2024.102402.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] B. Wuyts, S. K. L. M. vanden Broucke, J. De Weerdt, SuTraN: an encoder-decoder transformer for full-context-aware suffix prediction of business processes, in: ICPM, 2024, pp. 17–24. doi:10.1109/ICPM63005.2024.10680671.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] P. Gamallo-Fernandez, E. Rama-Maneiro, J. C. Vidal, M. Lama, VERONA: A python library for benchmarking deep learning in business process monitoring, SoftwareX 26 (2024) 101734. doi:10.1016/J.SOFTX.2024.101734.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] R. S. Oyamada, G. M. Tavares, S. B. Junior, P. Ceravolo, A scikit-learn extension dedicated to process mining purposes, in: Demonstration Track CoopIS, 2023, pp. 11–15. URL: https://ceur-ws.org/Vol-3552/paper-3.pdf.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] W. Rizzi, C. Di Francescomarino, C. Ghidini, F. M. Maggi, Nirdizati: an advanced predictive process monitoring toolkit, J. Intell. Inf. Syst. 63 (2025) 259–291. doi:10.1007/S10844-024-00890-9.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] H. Mustroph, M. Kunkler, S. Rinderle-Ma, An uncertainty-aware ED-LSTM for probabilistic suffix prediction, CoRR abs/2505.21339 (2025). doi:10.48550/ARXIV.2505.21339.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] M. Camargo, M. Dumas, O. G. Rojas, Learning accurate LSTM models of business processes, in: BPM, 2019, pp. 286–302. doi:10.1007/978-3-030-26619-6_19.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] H. Weytjens, J. De Weerdt, Learning uncertainty with artificial neural networks for predictive process monitoring, Appl. Soft Comput. 125 (2022) 109134. doi:10.1016/J.ASOC.2022.109134.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] H. Weytjens, J. De Weerdt, Creating unbiased public benchmark datasets with data leakage prevention for predictive process monitoring, in: BPM Workshops, volume 436, 2021, pp. 18–29. doi:10.1007/978-3-030-94343-1_2.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>