=Paper= {{Paper |id=Vol-1355/paper8 |storemode=property |title=Toward Reverse Engineering of VBA Based Excel Spreadsheet Applications |pdfUrl=https://ceur-ws.org/Vol-1355/paper10.pdf |volume=Vol-1355 |dblpUrl=https://dblp.org/rec/conf/icse/AmalfitanoASFT15 }} ==Toward Reverse Engineering of VBA Based Excel Spreadsheet Applications== https://ceur-ws.org/Vol-1355/paper10.pdf
  Toward Reverse Engineering of VBA Based Excel
             Spreadsheet Applications
     Domenico Amalfitano, Nicola Amatucci, Vincenzo De Simone, Anna Rita Fasolino, Porfirio Tramontana
                            Department of Electrical Engineering and Information Technologies
                                             University of Naples Federico II
                                                Via Claudio 21, Naples, Italy
                Email: domenico.amafitano@unina.it, nicola.amatucci@unina.it, vincenzo.desimone2@unina.it,
                                         anna.fasolino@unina.it, ptramont@unina.it


   Abstract—Modern spreadsheet systems can be used to imple-         will present a technique for abstracting conceptual data models
ment complex spreadsheet applications including data sheets,         by analyzing spreadsheets data while in section III we will
customized user forms and executable procedures written in           present a tool supporting the abstraction of views representing
a scripting language. These applications are often developed
by practitioners that do not follow any software engineering         the relationships between VBA procedures, user forms and
practice and do not produce any design documentation. Thus,          spreadsheet cells. Finally, in section IV some conclusions and
spreadsheet applications may be very difficult to be maintained or   future works will be presented.
restructured. In this position paper we present in a nutshell two
reverse engineering techniques and a tool that we are currently               II. DATA M ODEL R EVERSE E NGINEERING
realizing for the abstraction of conceptual data models and
business logic models.                                                  The first technique presented in this paper regards the
                                                                     abstraction of conceptual data models from the analysis of the
                      I. I NTRODUCTION                               structure and the information included in an Excel spreadsheet
   Spreadsheets are interactive software applications for ma-        by means of heuristic rules. This technique is based on
nipulation and storage of data. In a spreadsheet data are            heuristic rules automatically applicable on a spreadsheet. By
organized in worksheets, any of which is represented by a            means of these rules, set of candidate classes with attributes,
matrix of cells each containing either data or formulas that         relationships between them and the corresponding cardinalities
are automatically calculated at any variation of data. Modern        are abstracted on the basis of the structure and of the properties
spreadsheet systems (e.g. Microsoft Excel) are integrated with       of spreadsheets and of their components, such as sheets, cells,
scripting languages (e.g., Microsoft Visual Basic for Applica-       cell headers, etc.. In particular, the heuristics consider cells
tions). An end user can extend the presentation layer of the         labels and data by looking for repeated data, synonyms and
application with Visual Basic for Application (VBA in the            group of cells organized as arrays or matrixes.
following) by defining new User Forms and can develop new               The considered rules have been extensively presented in [1]
business functions by means of Procedures. The execution of          and [2] where they have been used with success to abstract the
these procedures is event-driven: they can be attached to events     conceptual data model underlying some complex spreadsheet
related to any element of the Excel Object Model. Whenever           applications used in the automotive context as informative sys-
one of these events occurs, the attached event handler code          tems. The abstracted data model represented a useful starting
is executed. VBA can also be used to programmatically                point in the process of migration of an existing spreadsheet-
access and modify the underlying Excel Object Model, i.e.            based informative system towards a Web application. Some
the hierarchy of objects contained in Excel representing all its     of the rules were derived from works found in literature such
accessible resources.                                                as in [3] and [4]. These works are focused on spreadsheets
   Very often developers create complex spreadsheet appli-           used as calculation sheets. The proposed techniques are based
cations without any documentation at design level making             on the analysis of formulas to abstract relationships between
each task related to their maintenance very hard. In order to        candidate classes. Instead, we focused on spreadsheets used as
reconstruct design models of an existing spreadsheet applica-        informative systems and proposed further rules based on the
tion reverse engineering techniques and tools are needed. The        analysis of cell properties and values.
research community has devoted great attention to analysis of           Figure 1 shows an example of a possible application of
formulas in a spreadsheet, while they leave out the analysis of      some of the proposed rules. First of all, the sheet of the
the spreadsheets embedded code and the static and dynamic            spreadsheet shown in Figure 1 may be abstracted as a UML
relationships between the code and the spreadsheet’s cells.          class. Moreover, the two distinct rectangular areas separated
   In this paper we will present in a nutshell two reverse           by a blank column and composed of labels and data may be
engineering techniques and a tool supporting the second              abstracted as other two classes. Two composition relationships
technique allowing the abstraction of design models of an            between these two classes and the class representing the sheet
existing spreadsheet application. In details, in section II we       may be abstracted, too. The first rows of the two areas have
cells with bold texts and different background colors: they                     is composed of a single workbook with 4 worksheets, a
may be abstracted as attributes of the classes representing the                 VBProject including 6 code modules (that implement a total
areas. The texts of these cells may be abstracted as names of                   of 10 procedures) and a user form with 12 controls. In the
the attributes of the corresponding classes.                                    right part of the figure there are some detailed information
                                                                                provided by the EXACT tool about a procedure selected by
                                                                                the user and a graph showing the dependencies between this
                                                                                procedure and the other application components. In particular,
                                                                                the considered procedure writes in 9 different groups of cells,
                                                                                reads the values of a group of cells and calls two procedures.




Fig. 1. An example of application of heuristic rules to abstract a conceptual
data model of a spreadsheet
                                                                                              Fig. 2. Structural View of an Excel Spreadsheet

  III. B USINESS L OGIC M ODEL R EVERSE E NGINEERING
                                                                                           IV. C ONCLUSIONS AND F UTURE W ORKS
   The second contribution presented in this position paper is                     In this position paper we have presented in a very concise
related to a reverse engineering technique that we have realized                way two reverse engineering techniques and a tool abstracting
to abstract models of the business logic of a spreadsheet                       conceptual data models and business logic models of Excel
application by statically analyzing sheets, user forms, VBA                     spreadsheets, taking into account both the data structure,
procedures and their inter-relationships. The technique is com-                 the VBA code procedures, the User Forms and their inter-
pletely supported by an interactive Excel add-in called EXACT                   relationships. In future work, we plan to extend the EXACT
(EXcel Application Comprehension Tool) that we have real-                       tool by introducing the data model reverse engineering tech-
ized and that is available at https://github.com/reverse-unina/                 nique proposed in section II with the aim to make possible
EXACT. It provides features for the extraction of information                   comprehension processes of Excel spreadsheet applications.
and the abstraction of several views of an existing spreadsheet
application.                                                                                                 R EFERENCES
   The tool extracts information from a spreadsheet by ex-                      [1] D. Amalfitano, A. R. Fasolino, P. Tramontana, V. D. Simone, G. D.
ploiting the features offered by the Office Primary Interop                         Mare, and S. Scala, “Information extraction from legacy spreadsheet-
Assemblies for the Excel Application that exposed the Mi-                           based information system - an experience in the automotive context,”
                                                                                    in DATA 2014 - Proceedings of 3rd International Conference on
crosoft Excel Object Library and by statically analyzing the                        Data Management Technologies and Applications, Vienna, Austria,
source VBA code. The tool is able to reconstruct the set                            29-31 August, 2014, 2014, pp. 389–398. [Online]. Available: http:
of elements composing the spreadsheet including sheets, user                        //dx.doi.org/10.5220/0005139603890398
                                                                                [2] D. Amalfitano, A. Fasolino, V. Maggio, P. Tramontana, G. Di Mare,
forms, event handlers, classes and procedures. Furthermore the                      F. Ferrara, and S. Scala, “Migrating legacy spreadsheets-based systems to
tool recovers information related to the relationships between                      web mvc architecture: An industrial case study,” in Software Maintenance,
these elements such as procedure calls, relationships between                       Reengineering and Reverse Engineering (CSMR-WCRE), 2014 Software
                                                                                    Evolution Week - IEEE Conference on, Feb 2014, pp. 387–390.
events and the corresponding handlers, and dependencies                         [3] R. Abraham and M. Erwig, “Inferring templates from spreadsheets,”
between procedures and cells.                                                       in Proceedings of the 28th International Conference on Software
   The EXACT tool provides several features of software                             Engineering, ser. ICSE ’06. New York, NY, USA: ACM, 2006, pp. 182–
                                                                                    191. [Online]. Available: http://doi.acm.org/10.1145/1134285.1134312
visualization. It offers multiple views at different levels of                  [4] F. Hermans, M. Pinzger, and A. van Deursen, “Automatically extracting
detail and provides cross-referencing functions for switching                       class diagrams from spreadsheets,” in Proceedings of the 24th European
between these views. For example, the left part of Figure                           Conference on Object-oriented Programming, ser. ECOOP’10. Berlin,
                                                                                    Heidelberg: Springer-Verlag, 2010, pp. 52–75. [Online]. Available:
2 exemplifies the structural view proposed by EXACT for                             http://dl.acm.org/citation.cfm?id=1883978.1883984
an Excel spreadsheet. The figure shows that the spreadsheet