<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Retrograde Process Analysis in Running Legacy Applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marius Breitmayer</string-name>
          <email>marius.breitmayer@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lisa Arnold</string-name>
          <email>lisa.arnold@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Manfred Reichert</string-name>
          <email>manfred.reichert@uni-ulm.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Databases and Information Systems, Ulm University</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <fpage>11</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>Process mining algorithms are highly dependent on the existence and quality of event logs. In many cases, however, software systems (e.g., legacy systems) do not leverage workflow engines capable of producing high-quality event logs for process mining algorithms. As a result, the application of process mining algorithms is drastically hampered for such legacy systems. The generation of suitable event data from running legacy software systems, therefore, would foster approaches such as process mining, data-based process documentation, and process-oriented software migration of legacy systems. This paper discusses the need for dedicated event log generation approaches in this context.</p>
      </abstract>
      <kwd-group>
        <kwd>legacy systems</kwd>
        <kwd>process mining</kwd>
        <kwd>code analysis</kwd>
        <kwd>event log</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Software applications are implemented to address the needs of users, use cases,
and business processes. However, the majority of common software systems (e.g.,
legacy systems or individual software solutions) have not been designed with the
goal of providing high-quality process-related event logs that allow for
comprehensive process analyses and visualizations with modern process mining tools.
Relevant questions emerging in legacy software modernization projects include,
for example, how the process implemented by the legacy software system is
structured (Process Discovery) or to what extent its execution deviates from a
predefined to-be process (Conformance Checking). Currently, there exist three
basic approaches to obtain process models:
      </p>
      <list list-type="order">
        <list-item>
          <p>
            Log analysis uses existing logs (e.g., event logs) to reconstruct the
implemented process based on audit or workflow data. Consequently, the quality
of the resulting process model depends directly on both the existence
and quality of corresponding event logs [
            <xref ref-type="bibr" rid="ref2 ref3">2,3</xref>
            ]. However, the vast majority of
individual applications and legacy systems are unable to provide
appropriate event logs. Moreover, even database-centric applications typically
do not provide transaction-level audit data. As a result, such systems have so far
offered no effective entry point for process mining.
          </p>
        </list-item>
        <list-item>
          <p>
            Interviews may be conducted to discover the desired process model as
perceived by key users and process owners [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ]. Additionally, data models
may be parsed to identify effects of processes on corresponding data.
Analyzing such data models enables assumptions on the underlying processes.
This approach, however, is very time-consuming and prone to both
misunderstandings and misconceptions. In addition, interviews do not ensure
completeness of the relevant processes and their various aspects, as they
often neglect exceptions or specific process perspectives (e.g., data, time).
          </p>
        </list-item>
        <list-item>
          <p>
            Pattern recognition attempts to identify typical process patterns in
various data pools using algorithms from the field of artificial intelligence [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
The algorithms require a deep analysis and learning phase prior to their
application to the raw data. This is a time-consuming, cost-intensive, and
imprecise approach, which is therefore rarely pursued.
          </p>
        </list-item>
      </list>
      <p>In the context of legacy systems, however, none of the presented approaches is
easily applicable. All three approaches have in common that the business
processes (and event logs) implemented by the legacy software systems need to be
represented accurately. Since most individual software solutions do not
necessarily use process engines capable of delivering suitable process data, alternative
approaches are required. One approach to tackle this challenge is to observe
process participants during process execution and to record their interactions
with the software system, resulting in a fine-grained documentation.</p>
      <p>Section 2 describes the proposed solution approach. Section 3 discusses
related work. Finally, Section 4 provides a summary and outlook.</p>
    </sec>
    <sec id="sec-2">
      <title>Solution Approach</title>
      <p>A human-centered business process can be defined as a sequence of user
interactions with a software application, where each interaction is subject-bound (i.e.,
part of the same transaction). In legacy systems, such processes can be initiated
and terminated by suitable actions (e.g., pre-defined key combinations or menu
items). When such actions are added to an event stream together with the associated
application object (e.g., an order identified by its unique order number),
process mining tools obtain process-related event logs as input. The collected
event data may then constitute the basis for a plethora of use cases, such as
process documentation, process mining, and process-oriented cost estimations
for modernizing legacy software systems (i.e., software migration). We aim to
create different logging variants for existing legacy production systems:</p>
      <list list-type="order">
        <list-item>
          <p>Dedicated recording documents existing processes by assigning related
program components. Users may determine the start and end of the recording
using predefined key combinations, thus precisely delimiting all activities
that constitute the recorded process (or the considered process part).</p>
        </list-item>
        <list-item>
          <p>Silent recording tracks the entire usage of the application from the first
login until closing the application. A decision can be made as to whether
this should be done for all sessions or only for selected user sessions (e.g.,
only sessions of users from a certain department). Furthermore, it may be
configured which information should be stored (e.g., to ensure compliance
with data protection requirements).</p>
        </list-item>
      </list>
      <p>To minimize the performance effects of these recordings on running applications,
we rely on existing logging mechanisms of the application infrastructure.</p>
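      <p>The difference between the two logging variants can be illustrated with a minimal
sketch. It is not the implementation used in the project: the names Event and Recorder,
their fields, and the choice of event attributes (case identifier, activity, user,
timestamp) are assumptions made purely for illustration. Dedicated recording only stores
events between an explicit start and stop, whereas silent recording stores every event,
optionally restricted to selected users.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Event:
    case_id: str         # application object, e.g. a unique order number
    activity: str        # user interaction, e.g. a menu item or form name
    user: str
    timestamp: datetime


@dataclass
class Recorder:
    """Hypothetical recorder covering dedicated and silent recording."""
    silent: bool = False                   # True: record the whole session
    allowed_users: Optional[set] = None    # silent mode only; None = all users
    active: bool = field(default=False, init=False)
    events: list = field(default_factory=list, init=False)

    def start(self) -> None:
        """Dedicated recording: the user marks the start of the process."""
        self.active = True

    def stop(self) -> None:
        """Dedicated recording: the user marks the end of the process."""
        self.active = False

    def record(self, case_id, activity, user, timestamp=None) -> bool:
        """Store the event if the current mode permits it; report the outcome."""
        if self.silent:
            if self.allowed_users is not None and user not in self.allowed_users:
                return False
        elif not self.active:
            return False
        ts = timestamp or datetime.now(timezone.utc)
        self.events.append(Event(case_id, activity, user, ts))
        return True
```
]]></preformat>
      <p>In this sketch, a dedicated Recorder ignores all interactions until start() is called,
while Recorder(silent=True, allowed_users={"bob"}) records only the sessions of the
listed user.</p>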
      <p>
        For Oracle applications using a WebLogic Server, for example, Oracle
Diagnostic Logging (ODL) offers extensive possibilities to manage application
information via the administration console. Among others, Oracle logger classes (e.g.,
Application Development Framework) may use this information through ODL
handlers [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In Single Page Applications (e.g., with the Oracle JavaScript Extension
Toolkit, JET), the primary object is known; however, the context between
multiple process steps may get lost due to the loose coupling of user sessions and
services. Even applications based on Oracle Forms allow adding appropriate
message calls for each PL/SQL unit.
      </p>
      <p>By using existing system logging functionality, the recording quality can be
significantly increased compared to purely mining the data model, as user interactions
can be unambiguously linked to the process, program code, and associated data.</p>
      <p>Fig. 1 depicts the approach. In a first step, we identify relevant objects using
information from the database and the source code of the application. However,
especially in databases of legacy systems, assumptions such as good
normalization or even the existence of foreign key constraints are often not applicable.
The reason for this is that in many cases the logic is represented in the source
code of applications rather than the database. By combining knowledge from
the database (e.g., create, read, update, and delete operations) and
corresponding source code (e.g., code fragments corresponding to such operations), we are
able to tackle this issue. After having identified process-relevant objects in both
source code and database, we correlate them and add code tracking capabilities
to the legacy system using, for example, the possibilities mentioned previously.
This then enables the generation of event logs from either dedicated or silent
recording. These event logs may then be used during analysis.</p>
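      <p>The correlation of database operations with source code fragments can be sketched as
follows. This is a deliberately simplified illustration under stated assumptions, not the
analysis used in the project: the functions tables_touched and relevant_units are
hypothetical, the code units are assumed to be available as plain text containing SQL
statements, and a regular expression stands in for a real SQL or PL/SQL parser.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re

# Simplified pattern: an SQL keyword followed by a table name.
# Assumption: a real analysis would use a proper SQL/PL/SQL parser instead.
CRUD_PATTERN = re.compile(
    r"\b(INSERT\s+INTO|UPDATE|DELETE\s+FROM|FROM)\s+([A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)

KEYWORD_TO_OPERATION = {
    "INSERT INTO": "create",
    "FROM": "read",
    "UPDATE": "update",
    "DELETE FROM": "delete",
}


def tables_touched(source: str) -> dict:
    """Map each table name (lower case) to the CRUD operations found in source."""
    touched = {}
    for match in CRUD_PATTERN.finditer(source):
        keyword = " ".join(match.group(1).upper().split())
        table = match.group(2).lower()
        touched.setdefault(table, set()).add(KEYWORD_TO_OPERATION[keyword])
    return touched


def relevant_units(units: dict, relevant_tables: set) -> dict:
    """Select code units that operate on process-relevant tables.

    units maps a unit name to its source text; the result maps each selected
    unit to the relevant tables it touches and the operations it performs.
    These units are the candidates for added tracking calls.
    """
    result = {}
    for name, source in units.items():
        hits = {
            table: operations
            for table, operations in tables_touched(source).items()
            if table in relevant_tables
        }
        if hits:
            result[name] = hits
    return result
```
]]></preformat>
      <p>Given a set of process-relevant tables (e.g., a table holding orders), relevant_units
returns the code units in which tracking calls would have to be placed, together with the
kind of operation each unit performs on the object.</p>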
      <p>
        When analyzing event logs generated from such legacy systems, we gain an
insight that the three approaches described in Section 1 are unable
to provide: if certain entries are missing from the event stream when it is compared
with the source code, this indicates that the process steps involved,
although implemented and present, were never executed during the recording period.
This information is essential when removing technical debt and modernizing legacy systems [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
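      <p>This comparison amounts to a set difference between the instrumented code units and
the activities observed in the event log. The following minimal sketch illustrates the
idea; the function unused_steps and the event attribute "activity" are hypothetical
names, and the sketch assumes that each tracked code unit logs its own name as the
activity.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def unused_steps(instrumented_units, event_log):
    """Return instrumented code units that never occur in the event log.

    instrumented_units: iterable of unit names that carry tracking calls.
    event_log: iterable of events, each a dict with an "activity" key.
    """
    observed = {event["activity"] for event in event_log}
    return sorted(set(instrumented_units) - observed)
```
]]></preformat>
      <p>Units reported by this function are only candidates for removal, since their absence
may also result from a recording period that was too short.</p>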
      <p>[Fig. 1: figure not reproduced. Recoverable labels: Process Visualization, Meta Data, Data Synchronization, Repository, Data, Cube.]</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>
        This work is related to the research areas of process mining, event log generation,
and code analysis. Process mining [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides techniques to discover business
process models from event logs [
        <xref ref-type="bibr" rid="ref12 ref16">12,16</xref>
        ], to evaluate conformance between process
event logs and models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and to enhance processes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Existing process
discovery approaches mainly focus on the control flow perspective while the data
perspective is mostly neglected [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. The latter is of particular interest for
meaningful process analysis and improvements (e.g., legacy system migration to new
software architectures).
      </p>
      <p>
        Event log generation is concerned with the generation of event logs from
various sources. In [
        <xref ref-type="bibr" rid="ref11 ref4">4,11</xref>
        ], approaches to record user activities based on desktop
actions (e.g., for robotic process automation) are presented. Our approach is
also able to correlate such desktop actions with the corresponding source code
fragments and database operations, allowing for a more detailed event log
generation. The case study presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] discusses the generation of event logs
from a real-world data warehouse of a large U.S. health system. While some
challenges (e.g., correlating events) may also arise in the context of legacy
systems, we plan to minimize required domain expert interviews by automatically
extracting domain knowledge from the source code.
      </p>
      <p>
        Code analysis comprises traditional analysis (e.g., style checking or data flow
analysis [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]) and profiling (e.g., CEGAR [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and BMC [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]) which, combined with
process knowledge, yield great potential for software improvement and migration.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Outlook</title>
      <p>
        This paper emphasizes the need for research on the recording
of high-quality event data in legacy systems. Such recording not only enables the
application of existing process mining algorithms, but also additional use cases such
as data-driven process documentation, the facilitation of software
migration projects, or cost reduction through process-driven development. Note that
corresponding work is also relevant in the context of robotic process automation
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>Acknowledgments. This work is part of the SoftProc project, funded by the
KMU Innovativ Program of the Federal Ministry of Education and Research,
Germany (F.No. 01IS20027A).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          :
          <article-title>Process discovery: Capturing the invisible</article-title>
          .
          <source>IEEE Computational Intelligence Magazine</source>
          <volume>5</volume>
          (
          <issue>1</issue>
          ),
          <fpage>28</fpage>
          -
          <lpage>41</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          : Process Mining: Data Science in Action. Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>van der Aalst</surname>
            ,
            <given-names>W.M.P.</given-names>
          </string-name>
          , et al.:
          <article-title>Process mining manifesto</article-title>
          .
          <source>In: Int'l Conf on BPM'11</source>
          . pp.
          <fpage>169</fpage>
          -
          <lpage>194</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Agostinelli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupia</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marrella</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mecella</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Automated generation of executable RPA scripts from user interface logs</article-title>
          .
          <source>In: Business Process Management: Blockchain and Robotic Process Automation Forum</source>
          . pp.
          <fpage>116</fpage>
          -
          <lpage>131</lpage>
          . Springer International Publishing (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Biere</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimatti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>E.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strichman</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Bounded model checking</article-title>
          . Carnegie Mellon University (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Carmona</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weidlich</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          : Conformance Checking. Springer (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Clarke</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grumberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jha</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Veith</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Counterexample-guided abstraction refinement</article-title>
          . In:
          <string-name>
            <surname>Emerson</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sistla</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          (eds.)
          <source>Computer Aided Verification</source>
          . pp.
          <fpage>154</fpage>
          -
          <lpage>169</lpage>
          . Springer (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Cunningham</surname>
            ,
            <given-names>W.:</given-names>
          </string-name>
          <article-title>The WyCash portfolio management system</article-title>
          .
          <source>SIGPLAN OOPS Mess</source>
          .
          <volume>4</volume>
          (
          <issue>2</issue>
          ) (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>La Rosa</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mendling</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reijers</surname>
            ,
            <given-names>H.A.</given-names>
          </string-name>
          :
          <source>Fundamentals of Business Process Management</source>
          . Springer, 2nd edn. (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Khedker</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanyal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Karkare</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Data Flow Analysis: Theory and Practice</article-title>
          . CRC Press, Inc., USA, 1st edn. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Linn</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmermann</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Werth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Desktop activity mining - a new level of detail in mining business processes</article-title>
          .
          <source>In: Workshops der INFORMATIK 2018 - Architekturen, Prozesse, Sicherheit und Nachhaltigkeit</source>
          . pp.
          <fpage>245</fpage>
          -
          <lpage>258</lpage>
          . Köllen Druck+Verlag GmbH (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Peña</surname>
            ,
            <given-names>M.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bayona-Oré</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Process mining and automatic process discovery</article-title>
          .
          <source>In: 2018 7th International Conference On Software Process Improvement (CIMPS)</source>
          .
          <source>IEEE</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Reichert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Process and data: Two sides of the same coin?</article-title>
          <source>In: 20th Int'l Conf on Cooperative Information Systems (CoopIS'12)</source>
          . pp.
          <fpage>2</fpage>
          -
          <lpage>19</lpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Remy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pufahl</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachs</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Böttinger</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weske</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Event log generation in a health system: A case study</article-title>
          .
          <source>In: Business Process Management</source>
          . pp.
          <fpage>505</fpage>
          -
          <lpage>522</lpage>
          . Springer International Publishing (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Vesterli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <source>Oracle ADF Survival Guide</source>
          . Apress, Berkeley, CA, 1st edn. (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>De Weerdt</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Backer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vanthienen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baesens</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A multi-dimensional quality assessment of state-of-the-art process discovery algorithms using real-life event logs</article-title>
          .
          <source>Inf Sys</source>
          <volume>37</volume>
          (
          <issue>7</issue>
          ),
          <fpage>654</fpage>
          -
          <lpage>676</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Wewerka</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reichert</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Robotic process automation - a systematic mapping study and classification framework</article-title>
          .
          <source>Enterprise Information Systems</source>
          (
          <year>2022</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>