<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Spreadsheet Cell-Meaning Model for Testing</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Kulesz</string-name>
          <email>daniel.kulesz@informatik.uni-stuttgart.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Software Technology University of Stuttgart</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Most attempts to test spreadsheets use the spreadsheetinternal model to automatically detect input, intermediate and output cells. In this paper, we discuss an example which shows why this approach is problematic even for simple spreadsheets. Following this, we derive a number of requirements for more feasible spreadsheet cell-meaning models and describe a rst prototype we have designed.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Input cells: Non-formula cells referenced by formula
cells
Intermediate cells: Formula cells referenced by other
cells and referencing other cells themselves
Result cells: Formula cells not referenced by other cells
but referencing other cells themselves
We will refer to this model as the `naive model` throughout
this paper. While this model is certainly correct from a
technical point of view, looking at it from the perspective of
a spreadsheet user's domain can lead to imminent con icts
in a number of cases. In this paper we discuss these cases and
propose an explicit model which is more di cult to extract
but which we believe is better suited for testing spreadsheets.
Since the model is concerned with the type of cells from the
perspective of what the cells mean to users, we could refer
to it as \cell-type model". Unfortunately, the term
\celltype" is usually already used for describing the data type of
a spreadsheet cell's contents. To avoid confusion, we use the
notion of \cell-meaning model" instead.</p>
    </sec>
    <sec id="sec-2">
      <title>2. ISSUES WITH THE NAIVE MODEL</title>
      <p>Figure 1 shows a spreadsheet for managing grades of a study
course. The spreadsheet is lled with data by a course
instructor and later passed to a secretary for transferring the
grades to a di erent system. Furthermore, the spreadsheet
is used for statistical purposes by the manager of the study
course this exam belongs to. We want to use this
spreadsheet as a showcase with counter examples, arguing why the
naive model can lead to a biased perception of the actual
input cells and output cells of a spreadsheet:</p>
      <p>The total points (D7 to D12) could be output values
for a secretary who has to process these grades
further (e.g. write letters to students), but these cells are
referenced by the grade cells (E7 to E12). Thus, the
cells would be regarded as intermediate cells by naive
models and not as output cells.</p>
      <p>The study program manager might not be interested
at all in the total points of the particular students but
only in the failure rate (B15). Thus, he would not see
the grades as output cells.</p>
      <p>Spreadsheet authors sometimes use defensive
programming techniques and introduce checks themselves. The
plausibility column (G) is such an example: It checks
whether any of the grades for Task 1 or Task 2 are
outside acceptable limits (zero to maximum points for
the task). The naive model would treat the cells in the
plausibility column (G7 to G12) as output as well.
The second worksheet named `grading key` is
referenced by VLOOKUP functions in the grade column's
cells (E7 to E12). The naive model would therefore
interpret these referenced cells as input cells.
However, none of the users of this spreadsheet is supposed
to change the contents of these cells as they contain
merely static data.</p>
    </sec>
    <sec id="sec-3">
      <title>3. REQUIREMENTS</title>
      <p>We are convinced that biased perceptions can be reduced if
a spreadsheet is tested using a model explicitly designed for
this purpose. From the discussion in the previous section,
we derive the following requirements for such a model:
User-speci able: It must be possible for users to
specify the cells themselves. If automatic extraction is
used, users must be able to change it.</p>
      <p>Support for views: Since users have di erent
perceptions and needs of the same spreadsheet, we either
need one testing model with di erent views or it must
be allowed that more than one testing model instance
per spreadsheet exists.</p>
      <p>Input cell types: The model must separate input
cells at least into two types: data cells (which contain
data that rarely changes or is fed from another system)
and actual decision variables which are supposed to be
manipulated by the user (of this model).</p>
      <p>Output cell types: Apart from intermediate cells,
the model must support cells which provide the data
with nal results (which the user is looking for) as well
as support for plausibility and other additional cells.
Apart from these rather theoretical requirements, we
identied two major practical requirements for the success of the
implementation of such a model:</p>
      <p>Understandability: It must be easy for spreadsheet
users to understand the model with no or little
training, so that users can identify cells correctly in the
sense of the model.</p>
      <p>
        Acceptance: Even if the model would be easily
understandable for spreadsheet users, its bene ts must be
striking so that spreadsheet users will be motivated to
take the e ort connected with using the model.
(Basically this requires a proper attention investment model
as described by Blackwell and Burnett [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]).
      </p>
      <p>It seems infeasible to expect spreadsheet users to identify all
cell-meanings manually, especially for huge spreadsheets. As
already discussed, fully automated cell-meaning detection
seems impossible | but it might be bene cial to consider
assisting users by proposing cell-meanings based on
autodetection techniques.</p>
      <p>
        A promising starting point could be the work of Hermans
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] which tries to identify plausibility cells automatically by
inspecting result cells for two additional constraints: the
formula starting with the IF operation and containing at least
one branch which can result in a string. While this
certainly works in many cases (including our small example),
we have already seen spreadsheets which use numeric
outputs for plausibility cells so this approach would fail. Yet,
since such cases are pretty rare, asking users to just validate
auto-detected cell-meanings instead of asking them to
specify cell-meanings themselves might result in lower overall
e ort and thus higher acceptance.
      </p>
    </sec>
    <sec id="sec-4">
      <title>4. PROTOTYPE</title>
      <p>We prototyped a cell-meaning model which takes into
account the requirements stated in the previous section. The
model is illustrated in Figure 2 as a standard UML class
diagram.</p>
      <p>We provide a partial implementation of our prototype in our
tool `Spreadsheet Inspection Framework` which is available
as open source software from GitHub1. The implementation
allows users to manually mark cells and use them later for
specifying test cases, but does not support all proposed
cellmeaning types yet and lacks auto-detection capabilities.</p>
    </sec>
    <sec id="sec-5">
      <title>5. FUTURE WORK</title>
      <p>To assess the feasibility of the model, further research is
required. A crucial aspect for evaluation will be the question
whether the cell types can be communicated clearly to users,
so users will tag existing spreadsheet cells according to our
proposed model.</p>
      <p>Another important aspect will be the acceptance of the model.
Since we believe that acceptance might be very low without
a reasonable level of automated assistance, it seems
worthwhile to address this point rst.</p>
      <p>Although we explained the theoretical limitations of the
naive model in this work, it must be explored whether the
additional manual e ort connected with applying our model
yields enough bene ts in terms of its ability to detect faults
in spreadsheets more accurately.
1https://github.com/kuleszdl/Spreadsheet-InspectionFramework</p>
    </sec>
    <sec id="sec-6">
      <title>6. ACKNOWLEDGEMENT</title>
      <p>We would like to thank Zahra Karimi, Kornelia Kuhle, Mandy
Northover, Jochen Ludewig and Stefan Wagner for their
constructive feedback on earlier versions of this position paper.
Furthermore, we received many good hints and comments
from the reviewers for which we are very thankful.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Blackwell</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Burnett</surname>
          </string-name>
          .
          <article-title>Applying attention investment to end-user programming</article-title>
          .
          <source>In Human Centric Computing Languages and Environments</source>
          ,
          <year>2002</year>
          . Proceedings. IEEE 2002 Symposia on, pages
          <volume>28</volume>
          {
          <fpage>30</fpage>
          . IEEE,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Blackwell</surname>
          </string-name>
          .
          <article-title>First steps in programming: A rationale for attention investment models</article-title>
          .
          <source>In Human Centric Computing Languages and Environments</source>
          ,
          <year>2002</year>
          .
          <source>Proceedings. IEEE 2002 Symposia on, pages</source>
          <volume>2</volume>
          {
          <fpage>10</fpage>
          . IEEE,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>M. Fisher</surname>
            <given-names>II</given-names>
          </string-name>
          , G. Rothermel,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brown</surname>
          </string-name>
          , M. Cao,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cook</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Burnett</surname>
          </string-name>
          .
          <article-title>Integrating automated test generation into the wysiwyt spreadsheet testing methodology</article-title>
          .
          <source>ACM Transactions on Software Engineering and Methodology (TOSEM)</source>
          ,
          <volume>15</volume>
          (
          <issue>2</issue>
          ):
          <volume>150</volume>
          {
          <fpage>194</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hermans</surname>
          </string-name>
          .
          <article-title>Improving spreadsheet test practices</article-title>
          .
          <source>Center for Advanced Studies on Collaborative Research</source>
          , CASCON,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baharloo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Williamson</surname>
          </string-name>
          .
          <article-title>Toward an integrated framework for declarative and interactive spreadsheet debugging</article-title>
          .
          <source>In Proceedings of the 8th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE)</source>
          , pages
          <fpage>117</fpage>
          {
          <fpage>124</fpage>
          .
          <string-name>
            <surname>SciTePress</surname>
          </string-name>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hofer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wotawa</surname>
          </string-name>
          .
          <article-title>Avoiding, nding and xing spreadsheet errors{a survey of automated approaches for spreadsheet qa</article-title>
          .
          <source>Journal of Systems and Software</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sca di</surname>
          </string-name>
          , M. Shaw, and
          <string-name>
            <given-names>B.</given-names>
            <surname>Myers</surname>
          </string-name>
          .
          <article-title>Estimating the numbers of end users and end user programmers</article-title>
          .
          <source>In Visual Languages and Human-Centric Computing</source>
          ,
          <source>2005 IEEE Symposium on</source>
          , pages
          <volume>207</volume>
          {
          <fpage>214</fpage>
          . IEEE,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>