=Paper= {{Paper |id=Vol-1209/paper15 |storemode=property |title=Using a Visual Language to Create Better Spreadsheets |pdfUrl=https://ceur-ws.org/Vol-1209/paper_16.pdf |volume=Vol-1209 }} ==Using a Visual Language to Create Better Spreadsheets== https://ceur-ws.org/Vol-1209/paper_16.pdf
     Using a Visual Language to Create Better Spreadsheets

                             Bas Jansen                                            Felienne Hermans
                   Delft University of Technology                              Delft University of Technology
                       b.jansen@tudelft.nl                                      f.f.j.hermans@tudelft.nl




ABSTRACT                                                            2.     PROBLEM DEFINITION
It is known that spreadsheets are error-prone. It is very           The success of spreadsheets can be partially explained by
difficult for users to get an overview of the design of the         their easy-to-use interface. However, it is this same interface
spreadsheet, and this is causing errors. Furthermore users          that is responsible for some of the problems that are associ-
are not always aware of the best way to structure a spread-         ated with spreadsheets. If you write a document or a pro-
sheet and just start modeling. To address this we will build        gram you can scroll up and down to get an overall overview
a visual language to develop spreadsheet models. This en-           of the object you are creating. Also you can use tools (like
ables users to visualize the design of their spreadsheets. A        an outline view, table of contents generator, or dependency
spreadsheet generator will create the spreadsheet based on          graph) within the software to get a better overview. In Excel
the specifications made with our visual language. During            users can easily enter data and formulas in cells, but as soon
this process, best practices for structuring spreadsheets are       as they hit the enter-button a spreadsheet will only show
automatically incorporated. There will be a bidirectional           the result of the formula and not the formula itself. In a for-
link between the model and the associated spreadsheet.              mula, references are made to other cells in the spreadsheet.
                                                                    It is not possible to immediately see these references. You
1.   INTRODUCTION                                                   could say the underlying design of the spreadsheet is hidden
Spreadsheets are extensively used by companies. Informa-            ‘behind’ the spreadsheet itself. The hidden design makes it
tion embedded in these spreadsheets often forms the basis           difficult to understand a spreadsheet and to get an overview
for business decisions [7]. The quality of these spreadsheets       of the design. This is causing errors [8].
impacts the quality of the decisions. It is known that spread-
sheets are error-prone [9]. A poorly structured spreadsheet         A part of the design of a spreadsheet is the way the informa-
is often the cause of errors. Because of the nature of the          tion is structured within the sheet. There are best practices
spreadsheet user interface it is difficult to keep an overview of   for the structure of a spreadsheet that can be found in lit-
the underlying design. And without a clear design the struc-        erature. A commonly found model is to split input, model
ture of the spreadsheet gets messy. Also users do not always        and output. This works quite well for some of the problems
possess the knowledge to structure a spreadsheet properly.          that people want to solve with spreadsheets. However, it is
                                                                    not the best model in every situation. Spreadsheets are also
In order to address this, we present a research plan to de-         often used for what-if questions: “What is going to happen
velop an alternative user interface that enables the user to        if I change this?” In this situation it would be better to
design a spreadsheet using a visual language. Based on              have input and output closely together. Putting input and
the instructions made with this language the spreadsheet is         output closely together will lead to a completely different
generated automatically. The transformation between the             structure of the spreadsheet than implementation of the in-
model represented in the visual language and the automati-          put, model, output principle. The optimal way to structure
cally generated spreadsheet is bidirectional. Changes in the        a spreadsheet depends on the kind of problem that you want
model are propagated to the spreadsheet and vice versa.             to solve with the spreadsheet.
The visual language will help the user to keep an overall
overview of the spreadsheet design. And even more impor-            In practice, users are not making a conscious choice of how
tant, because the spreadsheet is generated automatically,           they structure their spreadsheets. They want to solve a
well-known design patterns can be incorporated to structure         problem as quickly as possible and just start entering the
the spreadsheet.                                                    data and formulas without giving the structure a lot of
                                                                    thought. When the complexity of spreadsheets increases
                                                                    over time they end up with a messy model. At that point it
                                                                    is difficult to change the underlying structure and the risk
                                                                    of errors is imminent.

                                                                    This brings us to the two problems we want to focus on in
                                                                    our research:


                                                                         1. The design of a spreadsheet is hidden behind the spread-
                                                                   are allowed in the spreadsheet. How should we, for example,
                                                                   handle a change in the spreadsheet that violates the model?

                                                                   4.     HYPOTHESIS
                                                                   There are several hypotheses underpinning the proposed so-
                                                                   lution that will be evaluated during the research:


                                                                        1. Users are more likely to use a visual language than a
                                                                           written language as an alternative to develop a spread-
                                                                           sheet.
                                                                        2. The representation of a spreadsheet in the visual lan-
                                                                           guage will help users to get an overall overview of the
                                                                           spreadsheet.
                                                                        3. If the user has a better overall overview of the spread-
                                                                           sheet, the spreadsheet will contain fewer errors.
                                                                        4. The automatic spreadsheet generator will use common
                                                                           design patterns to structure the data in a spreadsheet.
                                                                           This will improve the underlying design of the spread-
                                                                           sheets.
                                                                        5. A better structured spreadsheet contains fewer errors.
      Figure 1: Model and associated spreadsheet
                                                                   5.     APPROACH
       sheet itself, making it very difficult for an user to get   A spreadsheet model consists of formulas and operations on
       an overview of the design                                   data. These formulas and operations are the constructs of
                                                                   our visual language. To develop this visual language we
     2. Users are not aware of the best way to structure a         have to research what constructs should be included. With
        spreadsheet and just start modeling. The model works,      our language it is possible to model the majority of questions
        but is poorly structured and error-prone.                  that are solved with spreadsheets. This implies that we need
                                                                   to get an understanding of the different questions that users
3.     PROPOSED SOLUTION                                           try to solve with spreadsheets. Finally we will also explore
To address the above mentioned problems, we will develop           if the constructs we find are depending on the domain in
an alternative user interface for the development of a spread-     which the spreadsheet is used or if they are used regardless
sheet. The basis for this user interface is a visual language      of the domain.
(see also Figure 1). One of the success factors of spread-
sheets is their flexibility and ease of use. If users have to      To answer these questions we will use the EUSES spread-
learn a specific programming language before they can start        sheet corpus [6]. This corpus contains over 4000 real world
developing a spreadsheet, we expect that the adoption of           spreadsheets from domains such as finance, biology, and ed-
this alternative user interface will be very low. However, if      ucation. We will analyze what kind of formulas are used in
we can develop a visual language that is easy to understand,       spreadsheets and how they are combined. This will be trans-
works intuitively and at the same time makes use of a drag         lated to the constructs that are needed to build the visual
and drop interface, we expect a higher adoption.                   language. The spreadsheets from the EUSES corpus will be
                                                                   complemented with real-life spreadsheets collected from our
It will be possible for the user to develop a spreadsheet          industrial partners.
with the visual language in one screen and seeing the as-
sociated spreadsheet in another screen. The link between           To automatically generate a spreadsheet and to structure it
the model and the associated spreadsheet should be bidirec-        in the most optimal way, we will inventory the best prac-
tional. Changes made in the model should be propagated to          tices in spreadsheet design. Besides, we will research if ad-
the associated spreadsheet and vice versa.                         ditional design patterns are needed to cover the majority
                                                                   of questions. We will carry out a literature study to get
To implement this solution we face many challenges. It it is       an overview of the commonly used and known design pat-
not in the scope of this paper to address them all, but we         terns. Furthermore, we search for additional design patterns
would like to highlight two of them in more detail. First of       by analyzing the spreadsheets in the EUSES corpus and the
all, we face the challenge of the scalability of a visual lan-     spreadsheets collected from our industrial partners.
guage. Real-life spreadsheets are often complex models and
it is difficult to present such a model visually in a clear way.   Based on the knowledge gained about the required con-
Careful attention should be paid to the level of details that      structs for the visual language and the best practices to
are presented in the visual language. Second, if we want to        structure a spreadsheet, we develop a prototype of a ‘graph-
make the link between the model en the spreadsheet bidi-           ical spreadsheet generator’. The spreadsheet generator will
rectional, we have to think about what kind of operations          generate the spreadsheet based on the specifications made
with the visual language and uses a suitable design pattern       models in spreadsheets themselves. Because of this, users do
to structure the data. At this point, we also investigate if      not have to familiarize themselves with a new programming
it is possible to automatically generate a graphical repre-       environment. Furthermore, the authors have presented a
sentation (using the syntax of the visual language) from an       technique to perform co-evolution of the ClassSheet model
existing spreadsheet. This is needed to synchronize the vi-       and the related spreadsheet. Modifications to the model are
sual model with the spreadsheet and allow the users to make       automatically propagated to the spreadsheet.
changes in both the model and the spreadsheet.
                                                                  The main difference between the ClassSheet approach and
6.   EVALUATION                                                   ours is the introduction of a visual language that does not
To validate our hypotheses, we carry out two different kinds      use the tabular two-dimensional layout of spreadsheet de-
of evaluations. First, we will evaluate the impact of the         sign. We agree that users should not be required to fa-
visual language on the behavior of the users. Is it true that     miliarize themselves with a new programming environment
they have a better overall overview of the spreadsheet and        before they can develop a spreadsheet. Therefore the visual
do they prefer a visual language over a written language?         language will be used in the same environment as the asso-
These two questions can be answered with case studies [11].       ciated spreadsheet. However, if the model is represented in
In the case studies we will ask users to develop a real-life      a spreadsheet-like layout, it is still difficult for users to get a
spreadsheet with the new interface and afterwards interview       good overview of the design of the model. That is the reason
them about their experiences.                                     why we develop a language that specifies and visualizes the
                                                                  model at the same time.
Furthermore, we want to know if the spreadsheets that are
developed with our visual language contain less errors and        Also, our visual language embraces object-oriented princi-
if this is caused by a better structure or because the end-       ples, but does not expect the users to be aware of these
user has a better overview/understanding of the spreadsheet       principles. The overall goal of our research is to apply soft-
or both. To evaluate this, controlled experiments will be         ware engineering principles to the design of spreadsheets to
performed. Two sets of participants are asked to develop          improve the overall quality of spreadsheets. However, the
a certain spreadsheet model. One group will use the visual        spreadsheet user - who is not a professional programmer -
language, the other group the classical spreadsheet interface.    should be able to develop the spreadsheet without being re-
The two resulting sets of spreadsheets will be compared with      quired to have knowledge of these principles.
each other concerning the number of errors and development
time. Besides the experiments, we interview the participants      Furthermore, our visual language can be used to generate
to get a better insight of the users experience with the visual   the associated spreadsheet. This does not imply that the
language.                                                         user is restricted in influencing the layout of this spread-
                                                                  sheet (as is the case with the ClassSheet approach). It was
7.   RELATED WORK                                                 estimated that 95% of U.S. Firms uses spreadsheets for fi-
                                                                  nancial reporting [10] and layout is an important factor for
Already in 2001 Burnett et al [2] developed Forms/3, a gen-
                                                                  effective reporting.
eral purpose visual programming language. Main rationale
to develop this language was to remove spreadsheet limi-
                                                                  Finally, the current ClassSheets approach enables the co-
tations without sacrificing consistency with the spreadsheet
                                                                  evolution of the spreadsheet model and the spreadsheet data.
paradigm. Two principles in particular guided the develop-
                                                                  At the theoretical level, the evolution of the instance of the
ment process: directness and immediate visual feedback.
                                                                  model and the co-evolution of the model itself has been real-
                                                                  ized [3]. In our research, we focus on bidirectional transfor-
In our approach we are less concerned with the limitations
                                                                  mations and integrate them in the prototype of the spread-
of modern spreadsheet languages. We want to improve the
                                                                  sheet generator.
overall quality of spreadsheets by introducing a visual lan-
guage that supports users by visualizing the design of their
spreadsheet and help them to better structure their data.         8.     EXPECTED CONTRIBUTION
However, directness and especially immediate visual feed-         This research will lead to the following contributions:
back are also two valuable guiding principles in our research.
                                                                       1. A classification of the type of questions that end-users
Engels and Erwig [5] have described an automatic transfor-                try to solve with spreadsheets.
mation process to generate a spreadsheet from a so called
ClassSheet. The development of ClassSheets is a further                2. Better understanding of the kind of formulas that are
elaboration of the work on spreadsheet templates [1]. With                used in spreadsheets and the way these formulas are
ClassSheets, it is possible to model spreadsheets according to            combined to solve questions.
domain-related business object structures. The ClassSheet              3. Best practices for the design/structure of spreadsheets.
represents both the structure and relationships of the in-
volved (business) objects and classes and the computational            4. A visual language to model spreadsheets.
details of how attributes are related and derived from each
                                                                       5. Methods to automatically generate a spreadsheet from
other. ClassSheets help to reduce the semantic distance be-
                                                                          the visual language.
tween a problem domain and a spreadsheet application.
                                                                       6. Methods to automatically generate a graphical repre-
Cunha et al. [4] have further improved the concept of Class-              sentation (using the syntax of the visual language) of
Sheets. They have embedded the ClassSheets spreadsheet                    a spreadsheet.
     7. A prototype of an alternative user interface for the         In Proceedings of the 20th IEEE/ACM international
        development of spreadsheets that is based on a visual        Conference on Automated software engineering, pages
        language.                                                    124–133. ACM, 2005.
                                                                 [6] M. Fisher and G. Rothermel. The euses spreadsheet
9.     REFERENCES                                                    corpus: a shared resource for supporting
 [1] R. Abraham, M. Erwig, S. Kollmansberger, and                    experimentation with spreadsheet dependability
     E. Seifert. Visual specifications of correct                    mechanisms. ACM SIGSOFT Software Engineering
     spreadsheets. In Visual Languages and Human-Centric             Notes, 30(4):1–5, 2005.
     Computing, 2005 IEEE Symposium on, pages 189–196.           [7] F. Hermans, M. Pinzger, and A. van Deursen.
     IEEE, 2005.                                                     Supporting professional spreadsheet users by
 [2] M. M. Burnett, J. W. Atwood, R. W. Djang,                       generating leveled dataflow diagrams. In Proceedings
     J. Reichwein, H. J. Gottfried, and S. Yang. Forms/3:            of the 33rd International Conference on Software
     A first-order visual language to explore the boundaries         Engineering, pages 451–460. ACM, 2011.
     of the spreadsheet paradigm. Journal of functional          [8] F. Hermans, B. Sedee, M. Pinzger, and A. v. Deursen.
     programming, 11(2):155–206, 2001.                               Data clone detection and visualization in spreadsheets.
 [3] J. Cunha, J. P. Fernandes, J. Mendes, H. Pacheco,               In Proceedings of the 2013 International Conference
     and J. Saraiva. Bidirectional transformation of                 on Software Engineering, pages 292–301. IEEE Press,
     model-driven spreadsheets. In Theory and Practice of            2013.
     Model Transformations, pages 105–120. Springer,             [9] R. R. Panko. What we know about spreadsheet errors.
     2012.                                                           Journal of Organizational and End User Computing
 [4] J. Cunha, J. Mendes, J. Saraiva, and J. P. Fernandes.           (JOEUC), 10(2):15–21, 1998.
     Embedding and evolution of spreadsheet models in           [10] R. R. Panko and N. Ordway. Sarbanes-oxley: What
     spreadsheet systems. In Visual Languages and                    about all the spreadsheets? arXiv preprint
     Human-Centric Computing (VL/HCC), 2011 IEEE                     arXiv:0804.0797, 2008.
     Symposium on, pages 179–186. IEEE, 2011.                   [11] R. K. Yin. Case study research: Design and methods,
 [5] G. Engels and M. Erwig. Classsheets: automatic                  volume 5. sage, 2009.
     generation of spreadsheet applications from
     object-oriented specifications.