=Paper=
{{Paper
|id=None
|storemode=property
|title=MLP Networks Applied to the Problem of Prediction of Runtime of SAP BW Queries
|pdfUrl=https://ceur-ws.org/Vol-960/poster1.pdf
|volume=Vol-960
}}
==MLP Networks Applied to the Problem of Prediction of Runtime of SAP BW Queries==
<pdf width="1500px">https://ceur-ws.org/Vol-960/poster1.pdf</pdf>
<pre>
          MLP Networks Applied to the Problem of Prediction of
                    Runtime of SAP BW Queries
                      Tatiana Escovedo1, Tarsila Tavares2, Rubens Melo3, Marley M.B.R. Vellasco4


Abstract.1 The SAP BW is a BI tool used daily by about 8000                       their query in advance, in order to decide the best schedule for
employees of a big oil company in Brazil, running monthly about                   execution of the query. This article aims to investigate the
150,000 queries to assist in the analysis inherent in their                       application of neural networks in the problem of prediction of the
professional activities. A query is created to meet a need for                    performance of BW queries along the day. Section 2 presents a
specific business analysis and its response time is directly affected
                                                                                  summary on Neural Networks, the tools Weka and SAP BW.
by the use of BW server by other users. The main problem today is
that there is no way to estimate the execution time in advance, for               Section 3, in turn, explains the problem in study and section 4
the user to decide the best time to execute the query he needs for                describes the proposed solution. The following section 5, will
his work. This article proposes a solution to this problem by                     present the results and section 6 will evaluate them, pointing out
developing a prediction classification model for the performance of               some possible future work. Finally, section 7 concludes this work.
BW queries at certain times of the day using Multilayer Perceptron
(MLP) neural network and the Weka tool [WEKA, 2011].
                                                                                  2        BASIC CONCEPTS
1           INTRODUCTION                                                          The purpose of this section is to present briefly the basic concepts
                                                                                  related to neural networks and the tools Weka and SAP BW used
The constant changes in the market have prompted the need for                     in this work. These concepts will provide the theoretical
more timely and accurate integrated information from different                    background to the issues raised in this study.
departments in order to accelerate the process of decision making
in the organizations. An analysis of historical integrated data can
provide indicators of business growth or danger. This fact also                   2.1      Neural Networks
prompted the emergence of data warehouse technology.                              Artificial neural networks are used in the area of intelligent
   Data Warehousing is not a product but a strategy that recognizes               systems and computational intelligence [Jang et al, 1997]
the need to store data in separate information systems and                        [ZADECH, 1992]. A neural network is a parallel and distributed
consolidate them in order to support various professionals of a                   system composed of simple processing units. Its goal is to store
company in making decisions quickly and effectively. A data                       experimental knowledge and make it available for use. Artificial
warehouse is a subject oriented, integrated, time variant and                     neural networks are similar to the human brain, because knowledge
nonvolatile collection of data, which aims to support the process of              is acquired through a learning process and the strength of
decision making [Inmon, 1992].                                                    connections between neurons, or synaptic weights are used to store
   To meet its needs for analytical information, the company uses                 the acquired knowledge [Haykin, 1999]. The Multilayer Perceptron
the SAP BW since 2004. Through so-called BW queries, users can                    (MLP) is a type of neural network widely used for its ease of
perform various analysis of business information. Data extracted                  implementation and for being considered a universal approximator
from many sources is integrated and stored in the data warehouse                  [Hornik et al, 1989]. MLP networks have powerful computing
implemented with the BW, and then accessed through queries                        power due to the insertion of intermediate layers that enable the
which give useful information for the daily work of thousands of                  solution of non-linearly separable problems. Thus, MLP networks
users. However, a significant problem is that many queries take                   have at least three layers: the input layer, the intermediate or
considerable time to perform, because they work with a large                      hidden layer and the output layer. A network with one hidden layer
volume of data and also because they dispute server time with                     can implement any continuous function, with two intermediate
other users. Currently, users have no way to estimate the runtime of              layers, can approximate any mathematical function [CIBENKO,
                                                                                  1989].
1, 2, 3
     Department of Electrical Engineering - Pontifical Catholic University
  of Rio de Janeiro (PUC-Rio), Brazil
4
  Department of Computer Science - Pontifical Catholic University of Rio          2.2      Weka
  de Janeiro (PUC-Rio), Brazil
                                                                                  Weka [WEKA, 2011] is a free software for data mining. Its
e-mail: 1 tatiana@inf.puc-rio.br, 2tarsilabello@gmail.com,                        implemented features and techniques are described in detail in
3
  rubens@inf.puc-rio.br, 4 marley@ele.puc-rio.br
                                                                                  [Witten & Frank, 2005], the implementers of the tool. One of the
                                                                                  uses of the tool is the extraction of classifiers in databases. A


                                                                             48
classifier (or classification model) is used to identify the class to
which a specific observation in a database belongs from its
characteristics (attributes). This tool is used in this work to classify
BW queries into categories according to the scheduled time of
execution.


2.3       SAP BW
SAP BW is part of the solution of SAP Business Intelligence
software; whose main purpose is to store in a single location (data
warehouse) separate from the transactional environment, data from
different sources, facilitating the provision of analytical
information for users in an integrated and uniform way. Because it
                                                                                                  Figure 2.2. Executing a BW Query
is integrated with SAP ERP R/3 it does not need interface files,
which provides highly reliable transfer and maintenance of the                     This section presented briefly the basic concepts related to
quality of the information. The tool encourages the generation of               neural networks and the tools Weka and SAP BW, which are the
queries (queries) by the users, by means of user-friendly tools on              basics necessary to follow the issues raised in this article. The next
the web and / or Excel [McDonald et al, 2006]. In this                          section presents the real problem addressed in this work.
environment, information is stored in a structured manner to
facilitate queries and analysis, thus supporting the decision making
and management. The BW software is complementary to the R/3.                    3         PROBLEM DESCRITION
While the R/3 is configured to optimally run transactions from day
                                                                                The SAP BW BI tool, as previously mentioned, is part of the day to
to day business (eg buy, sell, manufacturing) BW is optimally
                                                                                day work of about 8000 users of the company, using queries to
configured to allow analysis (eg how are my indicators going, how
                                                                                assist in the analysis inherent in their professional activities.
good are the sales of a product group to a customer, how is the
                                                                                Monthly, about 150 thousand executions of more than 8000
stock of a particular product evolving, etc). The information can
                                                                                business user queries in the production environment are recorded.
be extracted from SAP BW by creating queries using the Query
                                                                                    A query is created to meet a specific need for business analysis,
Designer tool, which is also part of the suite SAP BW [PALEKAR
                                                                                and can contain several input filters and several rows and columns
et al, 2010]. This tool has a simple interface and can be easily used
                                                                                that can be customized according to user navigation. Thus, some
by the business users to build queries to support their analysis.
                                                                                queries work with millions of records and take a long time to run.
Figure 2.1 shows the construction of a query with the tool query
                                                                                Besides relying on adequate filters, the query response time is also
designer.
                                                                                directly affected by the concurrent use of the BW server by other
                                                                                users. Thus, the response time of a single query using the same
                                                                                filters can be very different for two runs in different times.
                                                                                    The main problem today is that there is no way to estimate the
                                                                                execution time in advance, so that the user can decide whether or
                                                                                not to execute the query in the moment. Without this information,
                                                                                many users start a query and when they check that it is taking too
                                                                                long, they stop it. However, the query may continue to run on the
                                                                                server, overloading the machine unnecessarily.
                                                                                    If before executing the query the user had its forecast of
                                                                                execution time, he could decide whether to run it now or postpone
                                                                                it to a time that has faster response avoiding unnecessarily burden
                                                                                for the server with an application that will not be completed.
                                                                                    The execution time of all the queries from BW for each of the
                                                                                users is recorded by the BW Statistics BW SAP software another
                                                                                tool that is also part of the SAP suite. We intend to use this
                                                                                historical basis for predicting the execution time of queries based
                   Figure 2.1. Creating a BW Query                              on previous runs. Valuable information is available from the BW
                                                                                Statistics, such as the user who executed the query, the total
    After the creation of the queries either by business users or by            execution time, date and time of execution and number of lines
IT staff of the company they are published and made available to                returned.
the users so that they can extract the necessary data for their                     There are two main benefits expected from implementing a
analyses. The execution of queries can be performed by the                      solution to predict the execution time of queries from BW. First,
Business Explorer (BEx), a tool that is also part of the suite SAP              users knowing in advance the predicted performance of a particular
BW. This tool is integrated with Microsoft Excel, the environment               query may avoid its unproductive execution and the indefinite
in which most business users are already familiar. Alternatively,               awaiting for the information he or she needs. Second, by possibly
the query may be executed through the web interface. Figure 2.2                 not executing a query classified as long for the moment, the user
illustrates the execution of a query using the web interface.                   will relieve the server from an unproductive burden.


                                                                           49
4         SOLUTION DESCRITPION                                                Finally, the column information "TIMEALL" represents the query
                                                                              execution time in minutes. This information is rounded to integer
To execute a BW query, the user enters the desired input filters.             and then mapped into categories 0-4 numerical classification
Depending on the existing filters in the query, the same query may            representing the query, as follows:
be specified for different periods of time. For example, for a period            0: Very fast - up to 2 minutes
of one month or one complete year. Therefore, the filters play an                1: Fast - 2 to 5 minutes
important role in determining the size of the result set of the query.           2: Medium - 5 to 10 minutes
Hence, to minimize the problem of different running times of the                 3: Slow - 10 to 20 minutes
same query caused by the difference of input filters which may                   4: Very slow - more than 20 minutes
imply in different sizes of the result, we have chosen for this work             In addition to these conversions, the database passed through a
only queries that are always executed using the same filter. Thus,            cleanup that eliminated inconsistent values, eg, negative execution
differences in execution time are caused only by the occupation of            time. No incomplete information was found in the database. The
the server, not by higher or lower amount of records returned by              resulting database, consisting of 10,893 records, was divided into
the query. Thus, the queries chosen for this study were:                      training set (2/3 or 7262 records) and test set (1/3 or 3631 records),
• Query 1: P_5_CO_P_COOM07M_00014 - Gross                                     both chosen randomly. Then the training and testing files for Weka
     Administrative Expenditure                                               were created, as detailed in the next subsection.
•     Query 2: P_5_PS_P_PSCO02M_0029 – CAPEX
•     Query 3: P_5_MM_P_MMPU14M_0002 - Total Purchasing                       4.2       Weka Configurations and Execution of the
   The following subsections will detail the process of selecting                             Simulation
variables, handling of the historical database, setting adequate
                                                                              The training and testing files were generated from the database as
configurations of Weka and execution of simulations. The database
                                                                              shown in Figure 4.2. We used the Multilayer Perceptron (MLP)
used was extracted from BW Statistics for the years 2010 and                  network topology and the BackPropagation (BP) learning
2011.                                                                         algorithm.

4.1       Variables choice and the database
The BW Statistics tool provides valuable information regarding the
execution of queries. For this study, the information considered
relevant in the first analysis is illustrated by Figure 4.1.


                                                                                               Figure 4.2. Creation of Files for Weka

                                                                                 Various configurations were tried in Weka in order to achieve
                                                                              the best classification rate in the samples. Figure 4.3 illustrates a
                                                                              neural network generated by Weka.
                                                                                 We used nine different settings, illustrated by Table 4.1:
                                                                                 I: Without normalization of the input attributes
           Figure 4.1. Relevant Information of BW Statistics
                                                                                 II: With normalization of the input attributes and varying the
                                                                              neurons in the hidden layer as follows:
   The column "Query" refers to the technical name of the query.
                                                                                 a) 4 neurons
As only three types of queries were used in this study, this field has
                                                                                 b) 5 neurons
been mapped to identifiers 1,2 and 3, representing the query used.
                                                                                 c) 6 neurons
The column "User" provides information on which user performed
                                                                                 III: With normalization of the input attributes and varying the
the query. As we consider that queries are executed the same way,
                                                                              number of training epochs as follows:
regardless of the user, we do not consider this information relevant
                                                                                 a) 1 epoch
to this study. The information "0TCTUTIME" represents the time
                                                                                 b) 100 epochs
of execution of the query. This information is mapped into classes
                                                                                 c) 1000 epochs
0-23 representing the hour of the day when the query was
                                                                                 IV: With normalization of the input attributes and using a
performed. The column "0CALDAY" represents the date of
                                                                              validation set. V: With normalization of input attributes and binary
execution of the query. As we consider the day of the week as the
                                                                              encoding of input attributes.
most relevant information instead of a specific date, this
                                                                                 V: With normalization of input attributes and binary encoding
information was mapped into categories 0-5, where 0 represents
                                                                              of input attributes.
Saturday or Sunday and the other values from Monday to Friday.


                                                                         50
                                                                                 There is a strong indication that for this particular problem and
                                                                             considering the data used, the number of training epochs was a
                                                                             decisive factor in the results, since as we reduce this number
                                                                             (settings IIIa and IIIb), the results become far worse. The variation
                                                                             in the number of processors in the hidden layer can also have some
                                                                             correlation, because when we start to increase this number
                                                                             (configurations IIb and IIb), the results begin to deteriorate,
                                                                             indicating that four processors in the hidden layer represent a good
                                                                             number for this problem.
                                                                                 The use of a validation set (configuration IV), which is usually a
                                                                             good practice, gave much worse results. In contrast, we expected a
                                                                             worse outcome when we do not use the normalization of the input
                                                                             attributes (configuration I), but the results were very similar (and
           Figure 4.3. Neural Network generated by WEKA                      even slightly better) than most of the settings that used the
                                                                             normalization.
                   Table 4.1. Weka Configurations

                                                                             6          EVALUATING THE SOLUTION
                                                                             After performing the simulations and analysis of the results some
                                                                             suggestions of future work as well as some open questions aiming
                                                                             at improvements were found. These points are presented below.


                                                                             6.1        Treatment of the Database
   This section described the proposed solution to the problem of
                                                                             Examining the training base, after the simulations, we found that it
predicting the execution time of BW queries. The next section                showed some discrepancies that were not previously treated. We
presents the results.
                                                                             cite as an example, the number of occurrences of the second query
                                                                             (61% of training base), which is much larger than the number of
5        RESULTS                                                             occurrences of the query 1 and 3, as illustrated in Figure 6.1.

After the execution of nine simulations as detailed in the previous
section, different results were obtained. Table 5.1 summarizes
these results obtained in different simulations. The results were
compared using the following criteria (all in %):
   Correct classification
   Incorrect classification
   Mean absolute error MAE
   Root mean squared error RMSE
   Relative absolute error RAE
   Root relative squared error RRSE
   As we can see in Table 5.1, the configuration that showed the
best results was the configuration IIIc. This configuration used                   Figure 6.1. Distribution of Training Base per Query (1, 2 and 3)
normalization of input attributes, four neurons in the hidden layer
and 1000 epochs of training. It did not use the validation set and              We also observed by Figure 6.2 a discrepancy in the class of
nor binary encoding of the input attributes. In contrast, the setting        runtime: class 0 (very fast) represents 68% of the training base. In
that showed the worst results was the setting IIIa. This                     further work, minimization of such differences should be sought by
configuration also used normalization of input attributes and four           means of homogenization techniques of the database.
neurons in the hidden layer, but used only one training epoch, it
also did not use neither the validation set and nor the binary
encoding of the input attributes.

                    Table 5.1. Summary of Results


                                                                                     Figure 6.2. Distribution of Training Base per Class (0 to 4)


                                                                        51
                                                                               7         CONCLUSION
   Another point observed from the analysis of the training base is
the largest concentration of execution of queries between 11h30                This paper investigated the application of neural networks in the
and 0h (UTC), as illustrated in Figure 6.3. In future work, the use            problem of prediction of execution time of BW queries in a big
of hour intervals instead of the hour variable may possibly give               organization where this represents a huge problem. Section 2
                                                                               presented a summary on Neural Networks, the tool Weka and SAP
more accurate results.
                                                                               BW. Section 3, in turn, detailed the problem in question and
                                                                               section 4 describes the proposed solution. Next, Section 5
                                                                               presented the results and section 6 evaluated them, pointing out
                                                                               some possible future work.
                                                                                   The classification model utilized in this work was built using
                                                                               MLP networks and the BackPropagation algorithm. The setting
                                                                               that showed the best results used normalization of input attributes,
                                                                               four neurons in the hidden layer and 1000 epochs of training. It
                                                                               used neither validation set nor binary encoding of the input
                                                                               attributes. This configuration classified correctly 94.3% of the
                                                                               samples in the expected classes and proved to be the best
                                                                               configuration of this research.
                                                                                   Finally it must be pointed out that the implementation of such a
                                                                               mechanism to predict the execution time of BW queries is
 Figure 6.3. Distribution of Training Base per Execution Time (0 to 23)        extremely useful to the company, not only to provide greater
                                                                               visibility to the users of the expected time of their queries, but also
                                                                               to avoid burdening the server with queries that are interrupted by
6.2       Use of other queries
                                                                               the users because they are taking longer than expected. Thus, we
The aim of this paper is to investigate the application of neural              intend to deepen this research using the suggestions for future work
networks in the problem of predicting the response time of BW                  outlined above.
queries. Only three types of queries were used in this study. New
studies are required to confirm the results obtained, using a wider
variety of queries. Moreover, as mentioned earlier, the                        REFERENCES
parameterization of queries implies in different user input filters
                                                                               [INMON, 1992] Inmon, W. H., Building the Data Warehouse. 1st Edition.
and, depending on the values used, the same type of query can
                                                                               Wiley and Sons, 1992.
return a very different number of records from one execution to
another, influencing directly in its running time. Therefore, in               [HAYKIN, 1999] Haykin, S., Redes neurais - Princípios e prática, 2a
future, we should not only use a larger number of queries, but also            edição, Editora Bookman, 1999.
study a way to handle the customization of the filter input queries.
One possible approach is to tackle the filter issue mentioned in               [JANG ET AL, 1997] Jang, J.S.R, Sun, C.T., Mizutani, E., Neuro-fuzzy and
section 4. The idea is to perform the classification of a query into           soft computing: a computacional approach to learning and machine
different categories considering also the possibilities of filters and         intelligence. Prentice-Hall, Upper Sadler River, New Jersey, USA, 1997.
record for each execution, which filters were used and what were
                                                                               [ZADEH, 1992] Zadeh, L., Fuzzy logic, neural networks and soft
their values.                                                                  computing. Proceedings of the 2nd International Conference on Fuzzy
                                                                               Logic and Neural Networks, Izuka, Japan, PP. 13-14, 1992.

6.3       Integration of another system with Weka                              [CYBENKO, 1989] Cybenko, G., Approximations by superpositions of
                                                                               sigmoidal functions. Math. Control, Signals, Systems, 2:303-314, 1989.
Another interesting future work and extremely useful for the
company would be the integration of the estimate generated by                  [HORNIK et al, 1989] Hornik, K., Stinchcombe, M., White, H., Multilayer
Weka with a system that the user could use to check the predicted              Feedforward Networks are Universal Approximators, Neural Networks,
time of execution of a query "now" versus a future time (eg, within            Vol. 2, pp. 359-366, 1989.
15 min, within 1 hour, in 4 hours). Thus, the user could decide the
best time to execute the query and reduce the occupation of the                [WITTEN & FRANK, 2005] WITTEN, I. H. e FRANK, E., Data Mining:
server with unproductive and inconsequent queries.                             Practical Machine Learning Tools and Techniques, Morgan Kaufmann
                                                                               Publishers, 2nd edition, 2005.

                                                                               [WEKA,         2011]        Site       da       ferramenta      Weka:
6.4       Other methods                                                        http://www.cs.waikato.ac.nz/ml/Weka/. Acessado em dezembro de 2011.
For future research, we intend to evaluate other methods for
                                                                               [McDONALD et al, 2006] McDonald; Wilmsmeier, Dixon, Inmon,
solving the problem of forecasting performance of BW queries, for              Mastering the SAP Business Information Warehouse, Second Edition ed.
example, fuzzy logic and neuro-fuzzy models. The idea is, after its            Wiley, 2006.
implementation, to compare the results in order to raise issues not
yet perceived for the problem at hand.                                         [PALEKAR et al, 2010] Amol Palekar, Bharat Patel, and Shreekant
   This section presented some suggestions for future work within              Shiralkar, A Practical Guide to SAP Netweaver Business Warehouse 7.0.
the same theme explored in this article. The following section                 SAP PRESS, 2010.
concludes this paper.


                                                                          52

</pre>