<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using a Visual Language to Create Better Spreadsheets</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bas Jansen</string-name>
          <email>b.jansen@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felienne Hermans</string-name>
          <email>f.f.j.hermans@tudelft.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Delft University of Technology</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>It is known that spreadsheets are error-prone. It is very di cult for users to get an overview of the design of the spreadsheet, and this is causing errors. Furthermore users are not always aware of the best way to structure a spreadsheet and just start modeling. To address this we will build a visual language to develop spreadsheet models. This enables users to visualize the design of their spreadsheets. A spreadsheet generator will create the spreadsheet based on the speci cations made with our visual language. During this process, best practices for structuring spreadsheets are automatically incorporated. There will be a bidirectional link between the model and the associated spreadsheet.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>In order to address this, we present a research plan to
develop an alternative user interface that enables the user to
design a spreadsheet using a visual language. Based on
the instructions made with this language the spreadsheet is
generated automatically. The transformation between the
model represented in the visual language and the
automatically generated spreadsheet is bidirectional. Changes in the
model are propagated to the spreadsheet and vice versa.
The visual language will help the user to keep an overall
overview of the spreadsheet design. And even more
important, because the spreadsheet is generated automatically,
well-known design patterns can be incorporated to structure
the spreadsheet.</p>
    </sec>
    <sec id="sec-2">
      <title>2. PROBLEM DEFINITION</title>
      <p>
        The success of spreadsheets can be partially explained by
their easy-to-use interface. However, it is this same interface
that is responsible for some of the problems that are
associated with spreadsheets. If you write a document or a
program you can scroll up and down to get an overall overview
of the object you are creating. Also you can use tools (like
an outline view, table of contents generator, or dependency
graph) within the software to get a better overview. In Excel
users can easily enter data and formulas in cells, but as soon
as they hit the enter-button a spreadsheet will only show
the result of the formula and not the formula itself. In a
formula, references are made to other cells in the spreadsheet.
It is not possible to immediately see these references. You
could say the underlying design of the spreadsheet is hidden
`behind' the spreadsheet itself. The hidden design makes it
di cult to understand a spreadsheet and to get an overview
of the design. This is causing errors [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>A part of the design of a spreadsheet is the way the
information is structured within the sheet. There are best practices
for the structure of a spreadsheet that can be found in
literature. A commonly found model is to split input, model
and output. This works quite well for some of the problems
that people want to solve with spreadsheets. However, it is
not the best model in every situation. Spreadsheets are also
often used for what-if questions: \What is going to happen
if I change this?" In this situation it would be better to
have input and output closely together. Putting input and
output closely together will lead to a completely di erent
structure of the spreadsheet than implementation of the
input, model, output principle. The optimal way to structure
a spreadsheet depends on the kind of problem that you want
to solve with the spreadsheet.</p>
      <p>In practice, users are not making a conscious choice of how
they structure their spreadsheets. They want to solve a
problem as quickly as possible and just start entering the
data and formulas without giving the structure a lot of
thought. When the complexity of spreadsheets increases
over time they end up with a messy model. At that point it
is di cult to change the underlying structure and the risk
of errors is imminent.</p>
      <p>This brings us to the two problems we want to focus on in
our research:
1. The design of a spreadsheet is hidden behind the
spreadsheet itself, making it very di cult for an user to get
an overview of the design
2. Users are not aware of the best way to structure a
spreadsheet and just start modeling. The model works,
but is poorly structured and error-prone.</p>
    </sec>
    <sec id="sec-3">
      <title>3. PROPOSED SOLUTION</title>
      <p>To address the above mentioned problems, we will develop
an alternative user interface for the development of a
spreadsheet. The basis for this user interface is a visual language
(see also Figure 1). One of the success factors of
spreadsheets is their exibility and ease of use. If users have to
learn a speci c programming language before they can start
developing a spreadsheet, we expect that the adoption of
this alternative user interface will be very low. However, if
we can develop a visual language that is easy to understand,
works intuitively and at the same time makes use of a drag
and drop interface, we expect a higher adoption.
It will be possible for the user to develop a spreadsheet
with the visual language in one screen and seeing the
associated spreadsheet in another screen. The link between
the model and the associated spreadsheet should be
bidirectional. Changes made in the model should be propagated to
the associated spreadsheet and vice versa.</p>
      <p>To implement this solution we face many challenges. It it is
not in the scope of this paper to address them all, but we
would like to highlight two of them in more detail. First of
all, we face the challenge of the scalability of a visual
language. Real-life spreadsheets are often complex models and
it is di cult to present such a model visually in a clear way.
Careful attention should be paid to the level of details that
are presented in the visual language. Second, if we want to
make the link between the model en the spreadsheet
bidirectional, we have to think about what kind of operations
are allowed in the spreadsheet. How should we, for example,
handle a change in the spreadsheet that violates the model?</p>
    </sec>
    <sec id="sec-4">
      <title>4. HYPOTHESIS</title>
      <p>There are several hypotheses underpinning the proposed
solution that will be evaluated during the research:
1. Users are more likely to use a visual language than a
written language as an alternative to develop a
spreadsheet.
2. The representation of a spreadsheet in the visual
language will help users to get an overall overview of the
spreadsheet.
3. If the user has a better overall overview of the
spreadsheet, the spreadsheet will contain fewer errors.
4. The automatic spreadsheet generator will use common
design patterns to structure the data in a spreadsheet.
This will improve the underlying design of the
spreadsheets.</p>
      <p>5. A better structured spreadsheet contains fewer errors.</p>
    </sec>
    <sec id="sec-5">
      <title>5. APPROACH</title>
      <p>A spreadsheet model consists of formulas and operations on
data. These formulas and operations are the constructs of
our visual language. To develop this visual language we
have to research what constructs should be included. With
our language it is possible to model the majority of questions
that are solved with spreadsheets. This implies that we need
to get an understanding of the di erent questions that users
try to solve with spreadsheets. Finally we will also explore
if the constructs we nd are depending on the domain in
which the spreadsheet is used or if they are used regardless
of the domain.</p>
      <p>
        To answer these questions we will use the EUSES
spreadsheet corpus [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. This corpus contains over 4000 real world
spreadsheets from domains such as nance, biology, and
education. We will analyze what kind of formulas are used in
spreadsheets and how they are combined. This will be
translated to the constructs that are needed to build the visual
language. The spreadsheets from the EUSES corpus will be
complemented with real-life spreadsheets collected from our
industrial partners.
      </p>
      <p>To automatically generate a spreadsheet and to structure it
in the most optimal way, we will inventory the best
practices in spreadsheet design. Besides, we will research if
additional design patterns are needed to cover the majority
of questions. We will carry out a literature study to get
an overview of the commonly used and known design
patterns. Furthermore, we search for additional design patterns
by analyzing the spreadsheets in the EUSES corpus and the
spreadsheets collected from our industrial partners.
Based on the knowledge gained about the required
constructs for the visual language and the best practices to
structure a spreadsheet, we develop a prototype of a
`graphical spreadsheet generator'. The spreadsheet generator will
generate the spreadsheet based on the speci cations made
with the visual language and uses a suitable design pattern
to structure the data. At this point, we also investigate if
it is possible to automatically generate a graphical
representation (using the syntax of the visual language) from an
existing spreadsheet. This is needed to synchronize the
visual model with the spreadsheet and allow the users to make
changes in both the model and the spreadsheet.</p>
    </sec>
    <sec id="sec-6">
      <title>6. EVALUATION</title>
      <p>
        To validate our hypotheses, we carry out two di erent kinds
of evaluations. First, we will evaluate the impact of the
visual language on the behavior of the users. Is it true that
they have a better overall overview of the spreadsheet and
do they prefer a visual language over a written language?
These two questions can be answered with case studies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
In the case studies we will ask users to develop a real-life
spreadsheet with the new interface and afterwards interview
them about their experiences.
      </p>
      <p>Furthermore, we want to know if the spreadsheets that are
developed with our visual language contain less errors and
if this is caused by a better structure or because the
enduser has a better overview/understanding of the spreadsheet
or both. To evaluate this, controlled experiments will be
performed. Two sets of participants are asked to develop
a certain spreadsheet model. One group will use the visual
language, the other group the classical spreadsheet interface.
The two resulting sets of spreadsheets will be compared with
each other concerning the number of errors and development
time. Besides the experiments, we interview the participants
to get a better insight of the users experience with the visual
language.</p>
    </sec>
    <sec id="sec-7">
      <title>7. RELATED WORK</title>
      <p>
        Already in 2001 Burnett et al [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] developed Forms/3, a
general purpose visual programming language. Main rationale
to develop this language was to remove spreadsheet
limitations without sacri cing consistency with the spreadsheet
paradigm. Two principles in particular guided the
development process: directness and immediate visual feedback.
In our approach we are less concerned with the limitations
of modern spreadsheet languages. We want to improve the
overall quality of spreadsheets by introducing a visual
language that supports users by visualizing the design of their
spreadsheet and help them to better structure their data.
However, directness and especially immediate visual
feedback are also two valuable guiding principles in our research.
Engels and Erwig [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] have described an automatic
transformation process to generate a spreadsheet from a so called
ClassSheet. The development of ClassSheets is a further
elaboration of the work on spreadsheet templates [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. With
ClassSheets, it is possible to model spreadsheets according to
domain-related business object structures. The ClassSheet
represents both the structure and relationships of the
involved (business) objects and classes and the computational
details of how attributes are related and derived from each
other. ClassSheets help to reduce the semantic distance
between a problem domain and a spreadsheet application.
Cunha et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] have further improved the concept of
ClassSheets. They have embedded the ClassSheets spreadsheet
models in spreadsheets themselves. Because of this, users do
not have to familiarize themselves with a new programming
environment. Furthermore, the authors have presented a
technique to perform co-evolution of the ClassSheet model
and the related spreadsheet. Modi cations to the model are
automatically propagated to the spreadsheet.
      </p>
      <p>The main di erence between the ClassSheet approach and
ours is the introduction of a visual language that does not
use the tabular two-dimensional layout of spreadsheet
design. We agree that users should not be required to
familiarize themselves with a new programming environment
before they can develop a spreadsheet. Therefore the visual
language will be used in the same environment as the
associated spreadsheet. However, if the model is represented in
a spreadsheet-like layout, it is still di cult for users to get a
good overview of the design of the model. That is the reason
why we develop a language that speci es and visualizes the
model at the same time.</p>
      <p>Also, our visual language embraces object-oriented
principles, but does not expect the users to be aware of these
principles. The overall goal of our research is to apply
software engineering principles to the design of spreadsheets to
improve the overall quality of spreadsheets. However, the
spreadsheet user - who is not a professional programmer
should be able to develop the spreadsheet without being
required to have knowledge of these principles.</p>
      <p>
        Furthermore, our visual language can be used to generate
the associated spreadsheet. This does not imply that the
user is restricted in in uencing the layout of this
spreadsheet (as is the case with the ClassSheet approach). It was
estimated that 95% of U.S. Firms uses spreadsheets for
nancial reporting [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and layout is an important factor for
e ective reporting.
      </p>
      <p>
        Finally, the current ClassSheets approach enables the
coevolution of the spreadsheet model and the spreadsheet data.
At the theoretical level, the evolution of the instance of the
model and the co-evolution of the model itself has been
realized [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In our research, we focus on bidirectional
transformations and integrate them in the prototype of the
spreadsheet generator.
      </p>
    </sec>
    <sec id="sec-8">
      <title>8. EXPECTED CONTRIBUTION</title>
      <p>This research will lead to the following contributions:
1. A classi cation of the type of questions that end-users
try to solve with spreadsheets.
2. Better understanding of the kind of formulas that are
used in spreadsheets and the way these formulas are
combined to solve questions.
3. Best practices for the design/structure of spreadsheets.
4. A visual language to model spreadsheets.
5. Methods to automatically generate a spreadsheet from
the visual language.
6. Methods to automatically generate a graphical
representation (using the syntax of the visual language) of
a spreadsheet.
7. A prototype of an alternative user interface for the
development of spreadsheets that is based on a visual
language.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Abraham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Erwig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kollmansberger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Seifert</surname>
          </string-name>
          .
          <article-title>Visual speci cations of correct spreadsheets</article-title>
          .
          <source>In Visual Languages and Human-Centric Computing</source>
          ,
          <source>2005 IEEE Symposium on</source>
          , pages
          <volume>189</volume>
          {
          <fpage>196</fpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>M. M. Burnett</surname>
            ,
            <given-names>J. W.</given-names>
          </string-name>
          <string-name>
            <surname>Atwood</surname>
            ,
            <given-names>R. W.</given-names>
          </string-name>
          <string-name>
            <surname>Djang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Reichwein</surname>
            ,
            <given-names>H. J.</given-names>
          </string-name>
          <string-name>
            <surname>Gottfried</surname>
            , and
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Forms/3: A rst-order visual language to explore the boundaries of the spreadsheet paradigm</article-title>
          .
          <source>Journal of functional programming</source>
          ,
          <volume>11</volume>
          (
          <issue>2</issue>
          ):
          <volume>155</volume>
          {
          <fpage>206</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Pacheco</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Saraiva</surname>
          </string-name>
          .
          <article-title>Bidirectional transformation of model-driven spreadsheets</article-title>
          .
          <source>In Theory and Practice of Model Transformations</source>
          , pages
          <volume>105</volume>
          {
          <fpage>120</fpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Saraiva</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          .
          <article-title>Embedding and evolution of spreadsheet models in spreadsheet systems</article-title>
          .
          <source>In Visual Languages and Human-Centric Computing (VL/HCC)</source>
          ,
          <source>2011 IEEE Symposium on</source>
          , pages
          <volume>179</volume>
          {
          <fpage>186</fpage>
          . IEEE,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Engels</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Erwig</surname>
          </string-name>
          .
          <article-title>Classsheets: automatic generation of spreadsheet applications from object-oriented speci cations</article-title>
          .
          <source>In Proceedings of the 20th IEEE/ACM international Conference on Automated software engineering</source>
          , pages
          <volume>124</volume>
          {
          <fpage>133</fpage>
          . ACM,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fisher</surname>
          </string-name>
          and
          <string-name>
            <surname>G. Rothermel.</surname>
          </string-name>
          <article-title>The euses spreadsheet corpus: a shared resource for supporting experimentation with spreadsheet dependability mechanisms</article-title>
          .
          <source>ACM SIGSOFT Software Engineering Notes</source>
          ,
          <volume>30</volume>
          (
          <issue>4</issue>
          ):1{
          <issue>5</issue>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hermans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinzger</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A. van Deursen.</surname>
          </string-name>
          <article-title>Supporting professional spreadsheet users by generating leveled data ow diagrams</article-title>
          .
          <source>In Proceedings of the 33rd International Conference on Software Engineering</source>
          , pages
          <volume>451</volume>
          {
          <fpage>460</fpage>
          . ACM,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hermans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Sedee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinzger</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A. v.</given-names>
            <surname>Deursen</surname>
          </string-name>
          .
          <article-title>Data clone detection and visualization in spreadsheets</article-title>
          .
          <source>In Proceedings of the 2013 International Conference on Software Engineering</source>
          , pages
          <volume>292</volume>
          {
          <fpage>301</fpage>
          . IEEE Press,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Panko</surname>
          </string-name>
          .
          <article-title>What we know about spreadsheet errors</article-title>
          .
          <source>Journal of Organizational and End User Computing (JOEUC)</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <volume>15</volume>
          {
          <fpage>21</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Panko</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Ordway</surname>
          </string-name>
          .
          <article-title>Sarbanes-oxley: What about all the spreadsheets</article-title>
          ?
          <source>arXiv preprint arXiv:0804.0797</source>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Yin</surname>
          </string-name>
          .
          <source>Case study research: Design and methods</source>
          , volume
          <volume>5</volume>
          . sage,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>