<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Tool-supported fault localization in spreadsheets: Limitations of current evaluation practice</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Birgit Hofer</string-name>
          <email>bhofer@ist.tugraz.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dietmar Jannach</string-name>
          <email>dietmar.jannach@udo.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas Schmitz</string-name>
          <email>thomas.schmitz@udo.edu</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kostyantyn Shchekotykhin</string-name>
          <email>kostya@ifit.uni-klu.ac.at</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franz Wotawa</string-name>
          <email>wotawa@ist.tugraz.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graz University of Technology</institution>
          ,
          <addr-line>8010 Graz</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University Klagenfurt</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>TU Dortmund</institution>
          ,
          <addr-line>44221 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In recent years, researchers have developed a number of techniques to assist the user in locating a fault within a spreadsheet. The evaluation of these approaches is often based on spreadsheets into which artificial errors are injected. In this position paper, we summarize different shortcomings of these forms of evaluation and sketch possible remedies, including the development of a publicly available spreadsheet corpus for benchmarking as well as user and field studies to assess the true value of the proposed techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Locating the reason why a given spreadsheet program
does not compute the expected outcomes can be a tedious
task. In recent years, researchers have developed a
number of methods that support the user in the fault localization
and correction (debugging) process. The techniques range
from the visualization of suspicious cells or regions of the
spreadsheet, through the application of established practices from
software engineering such as spectrum-based fault localization
(SFL) or slicing, to declarative and constraint-based
reasoning techniques [
        <xref ref-type="bibr" rid="ref1 ref11 ref12 ref16 ref3 ref6 ref7 ref9">1, 3, 6, 7, 9, 11, 12, 16</xref>
        ].
      </p>
      <p>
        However, a number of challenges are common to all
these approaches. Unlike in other computer science sub-areas,
such as natural language processing, information retrieval,
or automated planning and scheduling, no standard
benchmarks exist for spreadsheet debugging methods. The
absence of commonly used benchmarks prevents the direct
comparison of spreadsheet debugging approaches.
Furthermore, fault localization and debugging for spreadsheets
require the design of a user-debugger interface. An important
question in this context is: what input or interaction can
realistically be expected from the user? Finally, the main
question to be answered is whether automated
debugging techniques actually help the developer, as discussed
in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for imperative programs.
      </p>
      <p>In this position paper, we discuss some limitations of the
current research practice in the field and outline potential
ways to improve the research practice in the future.</p>
    </sec>
    <sec id="sec-2">
      <title>2. LACK OF BENCHMARK PROBLEMS</title>
      <p>
        To demonstrate the usefulness of a new debugging
technique, we need spreadsheets containing faults. Since no
public set of such spreadsheets exists, researchers often
create their own suite of benchmark problems, e.g., by
applying mutation operators to existing correct spreadsheets [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
Unfortunately, these problems are only rarely made
publicly available. This makes a comparative evaluation of
approaches difficult, and it is often unclear whether the proposed
technique is applicable to a wider class of spreadsheets.
      </p>
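      <p>To make the idea of fault injection concrete, the following Python fragment is a minimal sketch of how faulty variants could be derived from a correct spreadsheet. It assumes that a spreadsheet is represented as a dictionary mapping cell names to formula strings, and it implements only one hypothetical mutation, the replacement of a single arithmetic operator. The names <monospace>mutate_operator</monospace> and <monospace>all_mutants</monospace> are illustrative and are not taken from [<xref ref-type="bibr" rid="ref2">2</xref>] or from any existing tool.</p>
      <preformat preformat-type="code">
```python
import re

# Assumption of this sketch: the only mutation is swapping one arithmetic operator.
SWAPS = {"+": "-", "-": "+", "*": "/", "/": "*"}
OPERATOR = re.compile(r"[-+*/]")


def mutate_operator(formula, index):
    """Return a copy of `formula` whose index-th arithmetic operator is swapped."""
    positions = [m.start() for m in OPERATOR.finditer(formula)]
    if index >= len(positions):
        raise IndexError("formula has no operator at that index")
    pos = positions[index]
    return formula[:pos] + SWAPS[formula[pos]] + formula[pos + 1:]


def all_mutants(sheet):
    """Yield (cell, mutated_sheet) pairs, each with exactly one injected fault."""
    for cell, formula in sheet.items():
        if not formula.startswith("="):
            continue  # constants are left untouched in this sketch
        for i in range(len(OPERATOR.findall(formula))):
            mutant = dict(sheet)  # shallow copy; the original stays correct
            mutant[cell] = mutate_operator(formula, i)
            yield cell, mutant


# Hypothetical four-cell spreadsheet used only for illustration.
sheet = {"A1": "10", "A2": "20", "A3": "=A1+A2", "A4": "=A3*2-A1"}
mutants = list(all_mutants(sheet))
for cell, mutant in mutants:
    print(cell, mutant[cell])
```
      </preformat>
      <p>Each mutant differs from the original in exactly one cell, so the location of the injected fault is known and can serve as ground truth. Note that this simple pattern would also treat a unary minus, as in <monospace>=-A1</monospace>, as an operator, which is one way in which such artificial faults may deviate from mistakes that users actually make.</p>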
      <p>
        In some papers, spreadsheets from the EUSES corpus<sup>1</sup>
are used for evaluations. As no information exists about the
intended semantics of these spreadsheets, mutations are
applied in order to obtain faulty versions of the spreadsheets.
The spreadsheets in this corpus are, however, quite diverse,
e.g., with respect to their size or the types of formulas
used. Often only a subset of the documents is used in the
evaluations, and the selection of the subset is not well justified.
Even when the benchmark problems are publicly shared, like
the ones used in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], they may have special characteristics
that favor a certain method, e.g., they may
contain only a single fault or use only certain functions or cell
data types.
      </p>
      <p>
        A corpus of diverse benchmark problems is strongly needed
for spreadsheet debugging to make different research
approaches more comparable and to be able to identify
shortcomings of existing approaches. Such a corpus could be
built incrementally by researchers sharing their real-world
and artificial benchmark problems. In addition, since it is
not always clear whether typical spreadsheet mutation operators
truly correspond to mistakes developers make, insights and
practices from the Information Systems field should be
better integrated into our research. This in particular includes
the use of spreadsheet construction exercises in laboratory
settings that help us identify which kinds of mistakes users
make and what their debugging strategies are, see, e.g., [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
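        <fn><label>1</label><p><ext-link ext-link-type="uri" xlink:href="http://esquared.unl.edu/wikka.php?wakka=EUSESSpreadsheetCorpus">http://esquared.unl.edu/wikka.php?wakka=EUSESSpreadsheetCorpus</ext-link></p></fn>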
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. USABILITY AND USER ACCEPTANCE</title>
      <p>
        Spreadsheet debugging research is often based on offline
experimental designs, e.g., by measuring how many of the
injected faults are successfully located with a given
technique, see, e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In some cases, plug-ins for spreadsheet
environments are developed, as in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. As with
plug-ins used for other purposes, e.g., spreadsheet testing,
the usability of these plug-ins for end users is seldom the
focus of the research. The proposed plug-ins typically
require various types of input from the user at different stages
of the debugging process. Some of these inputs have to be
provided at the beginning of the process and some can be
requested by the debugger during fault localization. Typical
inputs of a debugger include statements about the
correctness of values/formulas in individual cells [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], information
about expected values for certain cells [
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ], specification of
multiple test cases [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], etc.
      </p>
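      <p>The offline design described at the start of this section can be summarized in a few lines of Python. The sketch below is an illustration under assumptions, not the protocol of any cited study: it assumes that each faulty spreadsheet contains one injected fault and that the technique under evaluation returns a ranked list of suspicious cells. The function names, the cut-off <monospace>k</monospace>, and the example data are hypothetical.</p>
      <preformat preformat-type="code">
```python
def rank_of_fault(ranking, faulty_cell):
    """1-based position of the faulty cell in the ranking, or None if absent."""
    if faulty_cell in ranking:
        return ranking.index(faulty_cell) + 1
    return None


def evaluate(results, k):
    """Summarize (ranking, faulty_cell) pairs, one pair per faulty spreadsheet."""
    if not results:
        raise ValueError("at least one benchmark result is required")
    ranks = [rank_of_fault(ranking, fault) for ranking, fault in results]
    found = [r for r in ranks if r is not None]
    hits = [r for r in found if k >= r]
    return {
        # share of spreadsheets whose fault appears anywhere in the ranking
        "located": len(found) / len(results),
        # share of spreadsheets whose fault appears among the first k cells
        "top_k": len(hits) / len(results),
        # average position of the fault, over the cases where it was found
        "mean_rank": sum(found) / len(found) if found else None,
    }


# Hypothetical outcomes for four faulty spreadsheets.
results = [
    (["B3", "A4", "C1"], "A4"),        # fault ranked second
    (["D2", "D5"], "D2"),              # fault ranked first
    (["A1", "A2"], "C7"),              # fault not reported at all
    (["E1", "E2", "E3", "E4"], "E4"),  # fault ranked fourth
]
summary = evaluate(results, k=2)
print(summary)
```
      </preformat>
      <p>Measures of this kind indicate how often and how early a faulty cell is reported, but they say nothing about whether a user could supply the required inputs or would benefit from the resulting ranking.</p>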
      <p>In many cases, it remains unclear whether an average
spreadsheet developer will be willing or able to provide these inputs,
since concepts like test cases do not exist in the spreadsheet
paradigm. Therefore, researchers have to ensure that a
developer interprets the requests from the debugger correctly
and provides appropriate inputs as expected by the
debugger. One additional problem in that context is that user
inputs, e.g., the test case specifications, are usually
considered to be reliable, and most existing approaches have no
built-in means to deal with errors in the inputs.</p>
      <p>
        Overall, we argue that offline experimental evaluations
should be paired with user studies whenever possible, as
done, e.g., in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] or [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Such studies should help us validate
whether our approaches are based on realistic assumptions
and are acceptable at least for ambitious users after some
training. At the same time, observations of the users'
behavior during debugging can be used to learn about their
problem-solving strategies and to evaluate whether the tool
actually helped to find a fault.
      </p>
      <p>Again, insights and practices from the fields of both
Information Systems and Human-Computer Interaction should
be the basis for these forms of experiments.</p>
    </sec>
    <sec id="sec-4">
      <title>4. FIELD RESEARCH</title>
      <p>
        In addition to user studies in laboratory environments,
research on real spreadsheets as suggested in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is required
to determine potential differences between the experimental
usage of the proposed debugging methods and the everyday
use of such tools in companies or institutes. Error rates and
types found in practice could differ from what is observed in
user studies whose participants in many cases are students.
In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], e.g., a construction exercise with business managers
was conducted to determine error rates. In addition, the user
acceptance of fault localization tools could vary strongly
because professional users have different expectations with
respect to the tools they use. To ensure usability for real
users, existing spreadsheets can be examined and
users can be surveyed with questionnaires, as done, e.g., in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. CONCLUSIONS</title>
      <p>A number of proposals have been made in the recent
literature to assist the user in the process of locating faults in a
given spreadsheet. In this position paper, we have identified
some limitations of current research practice regarding the
comparability and reproducibility of the results. As
possible remedies for these shortcomings, we advocate the
development of a corpus of benchmark problems and the increased
adoption of user studies of various types as an evaluation
instrument. As experimental settings differ from real-life use, we
additionally propose to use field studies to obtain insights
into how debugging methods are used in companies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Abraham</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Erwig</surname>
          </string-name>
          .
          <article-title>GoalDebug: A Spreadsheet Debugger for End Users</article-title>
          .
          <source>In Proc. ICSE 2007</source>
          , pages
          <fpage>251</fpage>
          –
          <lpage>260</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Abraham</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Erwig</surname>
          </string-name>
          .
          <article-title>Mutation Operators for Spreadsheets</article-title>
          .
          <source>IEEE Trans. on Softw. Eng.</source>
          ,
          <volume>35</volume>
          (
          <issue>1</issue>
          ):
          <fpage>94</fpage>
          –
          <lpage>108</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Abreu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Riboira</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wotawa</surname>
          </string-name>
          .
          <article-title>Constraint-based debugging of spreadsheets</article-title>
          .
          <source>In Proc. CibSE'12</source>
          , pages
          <fpage>1</fpage>
          –
          <lpage>14</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Brown</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Gould</surname>
          </string-name>
          .
          <article-title>An Experimental Study of People Creating Spreadsheets</article-title>
          .
          <source>ACM TOIS</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>258</fpage>
          –
          <lpage>272</lpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Chambers</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Erwig</surname>
          </string-name>
          .
          <article-title>Automatic Detection of Dimension Errors in Spreadsheets</article-title>
          .
          <source>J. Vis. Lang. &amp; Comp.</source>
          ,
          <volume>20</volume>
          (
          <issue>4</issue>
          ):
          <fpage>269</fpage>
          –
          <lpage>283</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Cunha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Fernandes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Saraiva</surname>
          </string-name>
          .
          <article-title>Towards a catalog of spreadsheet smells</article-title>
          .
          <source>In Proc. ICCSA'12</source>
          , pages
          <fpage>202</fpage>
          –
          <lpage>216</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hermans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinzger</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>van Deursen</surname>
          </string-name>
          .
          <article-title>Supporting Professional Spreadsheet Users by Generating Leveled Dataflow Diagrams</article-title>
          .
          <source>In Proc. ICSE 2011</source>
          , pages
          <fpage>451</fpage>
          –
          <lpage>460</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hermans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinzger</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>van Deursen</surname>
          </string-name>
          .
          <article-title>Detecting and Visualizing Inter-Worksheet Smells in Spreadsheets</article-title>
          .
          <source>In Proc. ICSE 2012</source>
          , pages
          <fpage>441</fpage>
          –
          <lpage>451</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hermans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinzger</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>van Deursen</surname>
          </string-name>
          .
          <article-title>Detecting Code Smells in Spreadsheet Formulas</article-title>
          .
          <source>In Proc. ICSM 2012</source>
          , pages
          <fpage>409</fpage>
          –
          <lpage>418</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>B.</given-names>
            <surname>Hofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Riboira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wotawa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Abreu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Getzner</surname>
          </string-name>
          .
          <article-title>On the Empirical Evaluation of Fault Localization Techniques for Spreadsheets</article-title>
          .
          <source>In Proc. FASE 2013</source>
          , pages
          <fpage>68</fpage>
          –
          <lpage>82</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          .
          <article-title>Model-based diagnosis of spreadsheet programs - A constraint-based debugging approach</article-title>
          .
          <source>Autom. Softw. Eng.</source>
          , to appear,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Jannach</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Schmitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Hofer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Wotawa</surname>
          </string-name>
          .
          <article-title>Avoiding, finding and fixing spreadsheet errors - a survey of automated approaches for spreadsheet QA</article-title>
          .
          <source>Journal of Systems and Software</source>
          , to appear,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Karlsson</surname>
          </string-name>
          .
          <article-title>Using two heads in practice</article-title>
          .
          <source>In Proc. WEUSE 2008</source>
          , pages
          <fpage>43</fpage>
          –
          <lpage>47</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Parnin</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Orso</surname>
          </string-name>
          .
          <article-title>Are Automated Debugging Techniques Actually Helping Programmers?</article-title>
          <source>In Proc. ISSTA 2011</source>
          , pages
          <fpage>199</fpage>
          –
          <lpage>209</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S. G.</given-names>
            <surname>Powell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. R.</given-names>
            <surname>Baker</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Lawson</surname>
          </string-name>
          .
          <article-title>A critical review of the literature on spreadsheet errors</article-title>
          .
          <source>Decision Support Systems</source>
          ,
          <volume>46</volume>
          (
          <issue>1</issue>
          ):
          <fpage>128</fpage>
          –
          <lpage>138</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Reichwein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Rothermel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Burnett</surname>
          </string-name>
          .
          <article-title>Slicing Spreadsheets: An Integrated Methodology for Spreadsheet Testing and Debugging</article-title>
          .
          <source>In Proc. DSL 1999</source>
          , pages
          <fpage>25</fpage>
          –
          <lpage>38</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>