=Paper= {{Paper |id=Vol-3299/Paper13 |storemode=property |title=Interactive Process Identification and Selection from SAP ERP (Extended Abstract) |pdfUrl=https://ceur-ws.org/Vol-3299/Paper13.pdf |volume=Vol-3299 |authors=Julian Weber,Gyunam Park,Majid Rafiei,Wil van der Aalst |dblpUrl=https://dblp.org/rec/conf/icpm/WeberPRA22 }} ==Interactive Process Identification and Selection from SAP ERP (Extended Abstract)== https://ceur-ws.org/Vol-3299/Paper13.pdf
Interactive Process Identification and Selection from
SAP ERP (Extended Abstract)
Julian Weber, Alessandro Berti (1,2,*), Gyunam Park (1), Majid Rafiei (1) and Wil van der Aalst (1,2)
(1) Process and Data Science Group @ RWTH Aachen, Aachen, Germany
(2) Fraunhofer Institute of Technology (FIT), Sankt Augustin, Germany


Abstract
SAP ERP is one of the most popular information systems supporting various organizational processes,
e.g., O2C and P2P. However, the number of processes and the amount of data contained in SAP ERP are
enormous. Thus, identifying the processes contained in a specific SAP instance and creating a list of
related tables are significant challenges. Ultimately, one needs to extract an event log from SAP ERP
for process mining purposes. This demo paper shows the tool Interactive SAP Explorer, which tackles the
process identification and selection problem by encoding the relational structure of SAP ERP in a labeled
property graph. Our approach allows asking complex process-related queries along with advanced
representations of the relational structure.

Keywords
ETL, SAP, Object-Centric Process Mining




1. Introduction
Process mining is a branch of data science that provides methods for the analysis of event data
recorded by information systems such as ERP and CRM systems. An essential step for such
analyses is extracting an event log from the information systems. Moreover, this is one of the
most time-consuming steps in most process mining projects. Thus, a successful extraction is a
key to any process mining initiative.
   The SAP ERP system is a popular choice for managing business processes such as order-to-
cash (O2C, management of orders from the customers) and procure-to-pay (P2P, management
of orders to the suppliers). Despite its widespread adoption, the extraction of event logs from
an SAP ERP system remains ad-hoc. Commercial vendors such as Celonis provide extraction
tools, focusing on the most common processes, e.g., O2C and P2P. However, other processes,
such as inventory management, financial planning, accounting, and production control, receive
less attention, which leads to few process mining projects on such processes.
   It is challenging to extract event logs from SAP ERP systems due to their complexity, e.g., a
typical SAP system contains 800,000 tables with numerous relationships among them. This inevitably
requires domain knowledge from the process experts of the organization. To this end, the expert needs
to 1) identify the process to analyze, 2) select the tables containing relevant data of the
process, and 3) design query statements, e.g., using SQL.

ICPM 2022 Doctoral Consortium and Tool Demonstration Track
* Corresponding author.
julian.weber1@rwth-aachen.de (J. Weber); a.berti@pads.rwth-aachen.de (A. Berti);
gnpark@pads.rwth-aachen.de (G. Park); majid.rafiei@pads.rwth-aachen.de (M. Rafiei);
wvdaalst@pads.rwth-aachen.de (W. v. d. Aalst)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   In this demo paper, we present the tool Interactive SAP Explorer to support the domain expert
in the first two steps, i.e., process identification and table selection. Given a core document class
of the organization as user input, the tool identifies the most relevant process and the
underlying tables. For instance, if the user inputs the purchase order document, then the
most relevant process is the P2P process, and the underlying tables are as follows: EBAN for
purchase requisitions, EKKO for purchase orders, EKBE for goods/invoice receipts, RBKP/RSEG
for invoice processing, BKPF/BSEG for payments, and CDHDR/CDPOS for changes in documents.
   The tool first encodes the relational structure of SAP in a labeled property graph stored
in a graph database. Then, a web interface permits the exploration of the
relational structure of the SAP instance, the identification of the most important processes,
and the creation of a list of tables for extraction. The list of tables is eventually passed to
another component of the tool, previously introduced in [1], which creates an
object-centric event log from such a list of tables. The tool improves the prototype proposed in
[1] with better performance, customization, and exploration possibilities, particularly in the
process identification and selection phases.
   The rest of this extended abstract is organized as follows. Section 2 describes the functioning
of the extractor. Section 3 points to the availability of the tool. Section 4 discusses the maturity
of the tool. Finally, Section 5 concludes the paper.


2. Innovations and Features
This section explains 1) process identification and selection, which is the main contribution of
this paper, and 2) process extraction, which uses the output of this paper to produce event logs.
First, the process identification and selection is implemented as follows:

    • The elements of the relational structure of SAP that are important for the definition of a
      set of classes related to a given process are imported inside a graph database (Neo4J).
          – A graph database permits a faster exploration of the neighboring entities to a given
            concept because the connections are referenced inside the node object.
          – The chosen graph database (Neo4J) provides efficient implementations of layout
            algorithms, which can be executed on a large number of nodes/edges to provide
            an understandable graphical representation of the relational structure in SAP.
    • Then, the identification process can be started. The first step is to identify a document type
      of interest (for example, purchase orders or sales orders). This is directly connected,
      in the relational structure of SAP, to a set of tables (purchase orders are connected to the
      tables EKKO, EKPO, EKPA, EKET, EKKN).
    • The next step is expanding the aforementioned set of tables. Starting from the initial set
      of tables, we identify the tables connected to the initial tables via the relational structure.
      The union of these tables contains the set of events regarding a process in SAP. For
      example, by expanding the tables related to the purchase orders document type, we get
      a set of tables including purchase requisitions (EBAN), goods/invoice receipts (EKBE),
      accounting documents (BKPF), and other tables containing the events of the P2P process
      in SAP.
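The expansion step described in the list above can be illustrated with a small sketch. It is not the tool's implementation, which runs on Neo4J; it uses a plain Python dictionary as a stand-in for the graph. The table names come from the text, while the links in `RELATED_TABLES`, the function `expand_tables`, and the `max_hops` limit are assumptions made only for illustration.

```python
from collections import deque

# Hypothetical, heavily simplified excerpt of the relational structure:
# each table maps to tables it is linked to. The links are illustrative only.
RELATED_TABLES = {
    "EKKO": ["EKPO", "EBAN", "EKBE"],
    "EKPO": ["EKKO", "EKET", "EKKN"],
    "EKBE": ["EKKO", "RBKP", "BKPF"],
    "RBKP": ["RSEG", "BKPF"],
    "BKPF": ["BSEG"],
}

# Tables directly attached to a document type (table names taken from the text).
DOCUMENT_TYPE_TABLES = {
    "purchase order": ["EKKO", "EKPO", "EKPA", "EKET", "EKKN"],
}


def expand_tables(seed_tables, related, max_hops=1):
    """Return the seed tables plus all tables reachable within max_hops links."""
    selected = set(seed_tables)
    queue = deque((table, 0) for table in seed_tables)
    while queue:
        table, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in related.get(table, []):
            if neighbor not in selected:
                selected.add(neighbor)
                queue.append((neighbor, hops + 1))
    return selected


seeds = DOCUMENT_TYPE_TABLES["purchase order"]
one_hop = expand_tables(seeds, RELATED_TABLES, max_hops=1)
two_hops = expand_tables(seeds, RELATED_TABLES, max_hops=2)
print(sorted(one_hop))
print(sorted(two_hops))
```

With these illustrative links, one hop adds EBAN and EKBE to the initial set, and a second hop adds RBKP and BKPF. How far the actual tool expands, and along which relationships, is decided interactively by the user and is not fixed by this sketch.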

  The process extraction component, which uses the approach described in [1], aims to extract
an object-centric event log out of the SAP system based on the relevant tables identified in the
previous step. There is no need to specify any SQL query.

    • A pre-processing step is performed to restrict the extraction to the desired configuration.
    • The extraction of the object-centric event log is performed, with an output following the
      OCEL specification http://www.ocel-standard.org/.
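To give an impression of what an object-centric event log looks like, the following sketch builds a minimal dictionary in the style of the OCEL 1.0 JSON serialization. It is not the extractor's code: the input rows, their column names, and the function `rows_to_ocel` are hypothetical, and the key names reflect our reading of the OCEL 1.0 format and should be checked against the specification. Sections such as the global event and object defaults are omitted, so the result is not claimed to be a complete, valid log.

```python
import json
from datetime import datetime

# Hypothetical rows, as they might look after reading purchase-order data;
# column names and values are invented for illustration.
rows = [
    {"doc": "4500000001", "item": "00010", "activity": "Goods Receipt",
     "time": datetime(2022, 1, 14, 15, 5)},
    {"doc": "4500000001", "item": "00010", "activity": "Create Purchase Order",
     "time": datetime(2022, 1, 10, 9, 30)},
]


def rows_to_ocel(rows):
    """Build a minimal OCEL-1.0-style dictionary from flat rows."""
    events, objects = {}, {}
    ordered = sorted(rows, key=lambda r: r["time"])
    for i, row in enumerate(ordered, start=1):
        order_id = f"order_{row['doc']}"
        item_id = f"item_{row['doc']}_{row['item']}"
        objects.setdefault(order_id, {"ocel:type": "order", "ocel:ovmap": {}})
        objects.setdefault(item_id, {"ocel:type": "item", "ocel:ovmap": {}})
        events[f"e{i}"] = {
            "ocel:activity": row["activity"],
            "ocel:timestamp": row["time"].isoformat(),
            # One event may reference several objects of different types.
            "ocel:omap": [order_id, item_id],
            "ocel:vmap": {},
        }
    object_types = sorted({o["ocel:type"] for o in objects.values()})
    return {
        "ocel:global-log": {
            "ocel:version": "1.0",
            "ocel:ordering": "timestamp",
            "ocel:attribute-names": [],
            "ocel:object-types": object_types,
        },
        "ocel:events": events,
        "ocel:objects": objects,
    }


log = rows_to_ocel(rows)
print(json.dumps(log, indent=2))
```

The point of the sketch is that each event lists all related objects (here an order and an order item) instead of being assigned to a single case, which is what distinguishes an object-centric log from a traditional one.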


3. Availability of the Application
The source code of the different components of the tool is available in the following
repositories:

    • Layer of web services that can be run on IIS: this component can be downloaded at
      https://github.com/Javert899/interactive-extractor-from-sap-main/tree/main/Backend-C%23/SAPExtractorAPI.
    • Angular web application: this component can be downloaded at
      https://github.com/Javert899/interactive-extractor-from-sap-main/tree/main/Frontend/InteractiveSAPExtractor.
    • Python web services for the extraction of the object-centric event log: this component can be
      downloaded at
      https://github.com/Javert899/sap-extractor.

Note that there is a dependency on non-open-source UI components, which need to be licensed to
a single user. Therefore, the application is not directly runnable from the aforementioned source
repositories. Also, the extractor requires an SAP ECC instance backed by
an Oracle database and an installation of the Neo4J graph database, which is released under
a proprietary license. The authors can provide access to the compiled version of the project
upon request. A videocast of the application is available at
https://www.youtube.com/watch?v=Wi2xuUS0YSY.


4. Maturity
The existing version of the tool can connect only to an SAP ECC instance backed by an Oracle
database. Although this is a popular option, it limits the applicability of the extractor
in a generic setting. The extractor needs several components to run. This makes the architecture
complicated and highly dependent on the functioning of existing queries/connectors
on different versions of the software.
   Our extractor overcomes the following limitations of existing SAP extractors: they are
process-specific, they rely on traditional event logs, and they suffer from convergence/divergence
issues. However, limitations remain, including the fairly basic definition of the
activity/timestamp concepts. The choice of a graph database to navigate the relational structure of SAP is
advantageous in terms of performance. After the selection of a set of tables, the extraction of
an object-centric event log is left to the Python component, which executes many SQL queries
to load the information needed in memory. Therefore, the extractor is limited by the amount of
memory of the client.
   The challenges are on both the theoretical and practical side. Theoretically, the selection of
the activity concept is still a challenge. Practically, supporting different editions of SAP with
different underlying databases, and an in-memory approach to compose the object-centric event
log are still open challenges.


5. Conclusion
This demo paper presents an interactive extractor of object-centric event logs from SAP ERP,
which is composed of two components: process identification and selection (the novelty of this
paper) and process extraction (using [1]). While the tool’s code is open-source, it relies on some
components released under a proprietary license. Section 4 discusses some limitations of the tool
with the current architecture, which compromise its applicability in an enterprise setting.


6. Acknowledgments
We thank the Alexander von Humboldt (AvH) Stiftung for supporting our research. Funded by
the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s
Excellence Strategy–EXC-2023 Internet of Production – 390621612.


References
[1] A. Berti, G. Park, M. Rafiei, W. M. P. van der Aalst, An event data extraction approach from
    SAP ERP for process mining, in: J. Munoz-Gama, X. Lu (Eds.), Process Mining Workshops -
    ICPM 2021 International Workshops, Eindhoven, The Netherlands, October 31 - November 4,
    2021, Revised Selected Papers, volume 433 of Lecture Notes in Business Information Processing,
    Springer, 2021, pp. 255–267. doi:10.1007/978-3-030-98581-3_19.