=Paper= {{Paper |id=Vol-1355/paper3 |storemode=property |title=Business Rule Mining from Spreadsheets |pdfUrl=https://ceur-ws.org/Vol-1355/paper5.pdf |volume=Vol-1355 |dblpUrl=https://dblp.org/rec/conf/icse/Roy15 }} ==Business Rule Mining from Spreadsheets== https://ceur-ws.org/Vol-1355/paper5.pdf
                  Business Rule Mining from Spreadsheets
                                                                     Sohon Roy
                                                   Dept. of Software & Computer Technology
                                                         Delft University of Technology
                                                               Delft, Netherlands
                                                               S.Roy-1@tudelft.nl


Abstract—Business rules represent the knowledge that guides the operations of a business
organization. They are implemented in software applications used by organizations, and the
activity of extracting them from software is known as business rule mining. It has various
purposes, amongst which migration and generating documentation are the most common. However,
apart from conventional software, organizations also use spreadsheets for a large part of
their operations and decision-making activities. Therefore we believe that spreadsheets are
also rich in business rules. We thus propose to develop an automated system for extracting
business rules from spreadsheets in a human-comprehensible natural language format. This
position paper describes our motivation, the problem description, related work, and the
challenges we foresee.

Index Terms—End-user computing, Business rule mining, Spreadsheets, Knowledge mining.

I. INTRODUCTION & MOTIVATION

In her book, B. von Halle writes that, according to the Business Rules Group¹, a business
rule is “a statement that defines or constrains some aspect of the business. It is intended
to assert business structure or to control or influence the behavior of the business” [1].
Thus business rules are rules that unambiguously determine the actions or results necessary
for desirable operation of a business. Therefore, in the context of software applications,
it can be stated that business rules hold the knowledge [1] that is implemented in the form
of programming instructions, whether as a conditional statement like IF-THEN-ELSE or as an
expression like AREA=3.14*(RADIUS)^2. Thus for most practical purposes, business rule mining
from software applications is essentially the mining of knowledge. Apart from conventional
software, all types of organizations also depend heavily on the use of spreadsheets [3, 4].
Due to their wide use at all levels of company operations, the domain knowledge that gets
embedded in spreadsheets is too valuable a resource to be left untapped [5]. Therefore we
want to facilitate the extraction of business knowledge from spreadsheets through a process
of automated business rule mining. Business rule mining is an activity that is also invoked
during migration of legacy software systems into systems that are considered modern, like
SOA, modular software, or object oriented software [1, 2]; but we want to apply the
technique to spreadsheets. The potential benefits of that are as follows.

¹ An independent organization, formerly part of the users group Guidance of Users of
Integrated Data-Processing Equipment (GUIDE) of IBM corporation, acknowledged as pioneers
of the business rule approach. www.businessrulesgroup.org

1) High Level Analysis of Spreadsheets – Extracting business rules enables generation of
documentation for spreadsheets at a higher abstraction level than the spreadsheets
themselves. This facilitates the following:
   a) Comprehension – It becomes easier for end-users, who are typically not programmers,
   to understand the structure and operation of large and complex spreadsheets, helping
   them work with or modify such spreadsheets efficiently and with fewer errors.
   b) Comparison – Comparing spreadsheets becomes possible in order to estimate whether
   they implement the same or similar functionalities, or are even identical in behavior
   and differ only in data values. The latter cannot be done, for example, by an
   application that compares spreadsheets at the data and formula level.
   c) Validation – Organizations using a set of well-formed, predefined business rules can
   validate whether the spreadsheets created by their employees accurately implement those
   rules or whether there are errors at the logical level.
2) Understanding of Organizational Business Rationale – Some organizations may not have
their business strategies well laid out in business rule format; yet vital business
knowledge of experts working in the company is hidden in spreadsheets. Extracting this
knowledge would help to form a clear picture of how that organization works and of its
structure.
3) Support for Migration – IT architects need to understand the business logic when
migrating functionalities and computations implemented in spreadsheets into conventional
software. Furthermore, business analysts need to ensure that the IT architects understood
it correctly. This can be achieved through knowledge extraction, and an automated process
would largely help in this regard.
4) Safe Re-use and Replication of Spreadsheets – Often spreadsheets are created on an
ad-hoc basis by experts in an organization to implement their unique strategies for certain
scenarios. Over time such spreadsheets grow in size and complexity and are used by several
employees for similar scenarios but with different data sets. Invariably the users are
forced to employ the method of copy-paste to replicate the original spreadsheet and
customize it according to their needs by manipulating data and formulas. However this
process is
extremely error-prone [6]. It is probably safer to re-generate spreadsheets from scratch
using the blueprint or structure of the original spreadsheet instead of copy-pasting.
Automated business rule extraction can facilitate such blueprint formation and thus make
replications of spreadsheets safer.

II. GOAL AND APPROACH

Our goal is to devise an algorithm and subsequently an application that will automatically
extract business rules from spreadsheets. Based on the successful implementation of such an
application, our research questions will be as follows.

RQ1: How accurate will the automatically extracted business rules be compared to those
extracted manually by domain experts and spreadsheet users?

RQ2: How efficient is the automatic extraction process compared to manually extracting
business rules from spreadsheets?

Towards answering these research questions, we will employ user studies and controlled
experiments, in which we will compare the results of automatic and manual extraction of
business rules from spreadsheets.

III. PROBLEM ILLUSTRATION

Fig. 1. Spreadsheet for calculation of revenues

Typical spreadsheets implement business rules to calculate results. For example, in Fig. 1
the cell E19 contains the formula SUM(E13:E18). From this formula our algorithm has to
infer the business rule “Total earned revenue = Admissions+…+Other earned revenue”. Mapping
E13:E18 to Admissions…Other earned revenue is straightforward. However, there is more to
determine, as the Total Earned Revenue is divided into columns for Last Year, Current Year,
etc. Thus the mapping becomes two dimensional. Furthermore, a parser will reach three blank
rows and an auxiliary header row (actuals, budget, etc.) before it reaches the “Year”
column header row. Making things even more challenging, the whole structure is repeated in
vertical blocks, viz. Earned Revenue and Private Sector Revenue. When mapping the rule
“Total private sector revenue=…” the parser will encounter formulas in the 19th row instead
of reaching the column headers! Thus the same formula, repeated both vertically (in blocks)
and horizontally (in year columns) yet semantically distinct in each place, is a
considerable challenge.

IV. RELATED WORK

Mittermeir et al. proposed an approach for finding high level structures in spreadsheets
through logical and semantic classification of cells [7]. Abraham et al. worked on header
and unit inference, where units imply values or cell contents and the headers are column
headers or the labels [8]. Chatvichienchai proposed a method for meta-data extraction from
spreadsheets [9], where meta-data are the various labels and also the data that are
analogous to primary keys of databases. These works are generally oriented towards the
purpose of error reduction in spreadsheets and are not motivated from the business rule
standpoint. Hermans et al. developed a method for extracting class diagrams from
spreadsheets [10]. Our business rule extraction algorithm will draw its foundation from the
class diagram extraction algorithm and improve upon its limitations.

V. CONCLUDING REMARKS

To summarize, this paper proposes an application for business rule mining from spreadsheets
and the research questions RQ1 and RQ2. Such an application will facilitate high level
analysis of spreadsheets, understanding of organizational business strategies, support for
migration, and better re-use of spreadsheets. However, due to their inherent flexibility,
spreadsheets do not impose any fixed structural uniformity with regard to layout. This
makes the mapping between data and labels difficult, and that will be a key challenge to
overcome.

REFERENCES

[1] B. von Halle, Business Rules Applied: Building Better Systems Using the Business Rule
    Approach, Wiley Computer Publishing, 2002.
[2] T. Morgan, Business Rules and Information Systems: Aligning IT with Business Goals,
    Addison-Wesley, 2002.
[3] L. Bradley, K. McDaid, Using bayesian statistical methods to determine the level of
    error in large spreadsheets, Proc. of ICSE ’09, Companion Volume, 2009, pp. 351–354.
[4] C. Scaffidi, M. Shaw, B. A. Myers, Estimating the numbers of end users and end user
    programmers, Proc. of VL/HCC ’05, 2005, pp. 207–214.
[5] F. Hermans, Gathering domain knowledge from spreadsheets, Proc. of ESEC/FSE ’09
    Doctoral Symposium, 2009, pp. 37–38.
[6] F. Hermans, B. Sedee, M. Pinzger, A. van Deursen, Data Clone Detection and
    Visualization in Spreadsheets, Proc. of ICSE ’13, 2013, pp. 292–301.
[7] R. Mittermeir, M. Clermont, Finding High-Level Structures in Spreadsheet, Proc. of
    WCRE ’02, 2002, pp. 221–232.
[8] R. Abraham, M. Erwig, Header and Unit Inference for Spreadsheets Through Spatial
    Analyses, Proc. of VL/HCC ’04, 2004, pp. 165–172.
[9] S. Chatvichienchai, Spreadsheet Metadata Extraction: A Layout-Based Approach, Database
    and Expert Systems Applications, Lecture Notes in Computer Science, Volume 7446, 2012,
    pp. 147–160.
[10] F. Hermans, M. Pinzger, A. van Deursen, Automatically Extracting Class Diagrams from
    Spreadsheets, Proc. of ECOOP ’10, 2010, pp. 52–75.
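
As a rough illustration of the "straightforward" part of the mapping described in Section III (turning SUM(E13:E18) in cell E19 into a labelled rule), the following minimal Python sketch may help. It is not the algorithm proposed in this paper. The toy sheet, the assumption that row labels live in column A, the placeholder labels for rows 14 to 17, and the helper names `row_label` and `rule_from_sum` are all hypothetical; only the formula, the cell addresses, and the three named labels come from the text. The sketch deliberately ignores the harder issues raised above (column headers, blank rows, auxiliary header rows, repeated blocks).

```python
import re

# Hypothetical toy sheet (cell address -> content). It does NOT reproduce Fig. 1.
# Assumption: row labels are stored in column A. Rows 14-17 use placeholder labels
# because the text elides the real ones.
SHEET = {
    "A13": "Admissions",
    "A14": "(placeholder row 14)",
    "A15": "(placeholder row 15)",
    "A16": "(placeholder row 16)",
    "A17": "(placeholder row 17)",
    "A18": "Other earned revenue",
    "A19": "Total earned revenue",
    "E19": "SUM(E13:E18)",
}

# Matches a SUM over one contiguous range, with or without a leading "=".
SUM_RANGE = re.compile(r"^=?SUM\(([A-Z]+)(\d+):([A-Z]+)(\d+)\)$")


def row_label(sheet, row, label_col="A"):
    """Return the text in the assumed label column for the given row."""
    return sheet.get(f"{label_col}{row}", f"<no label for row {row}>")


def rule_from_sum(sheet, address):
    """Turn a single-column SUM formula into a 'Total = a + b + ...' string."""
    match = SUM_RANGE.match(sheet[address])
    if match is None:
        raise ValueError(f"{address} does not hold a simple SUM over a range")
    col_start, row_start, col_end, row_end = match.groups()
    if col_start != col_end:
        raise ValueError("only single-column ranges are handled in this sketch")
    target_row = int(re.search(r"\d+", address).group())
    operands = [
        row_label(sheet, row) for row in range(int(row_start), int(row_end) + 1)
    ]
    return f"{row_label(sheet, target_row)} = " + " + ".join(operands)


print(rule_from_sum(SHEET, "E19"))
```

The sketch resolves only the vertical (row label) dimension. Resolving the "Year" column header for E19 would require walking upward past blank rows and the auxiliary header row, which is exactly where the layout variability discussed in Section III makes a fixed lookup rule unreliable.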