=Paper=
{{Paper
|id=Vol-1209/paper5
|storemode=property
|title=On the Usage of Dependency-based Models for Spreadsheet Debugging
|pdfUrl=https://ceur-ws.org/Vol-1209/paper_5.pdf
|volume=Vol-1209
}}
==On the Usage of Dependency-based Models for Spreadsheet Debugging==
On the Usage of Dependency-based Models
for Spreadsheet Debugging
Birgit Hofer Franz Wotawa
Graz University of Technology Graz University of Technology
Inffeldgasse 16b/II Inffeldgasse 16b/II
8010 Graz, Austria 8010 Graz, Austria
bhofer@ist.tugraz.at wotawa@ist.tugraz.at
ABSTRACT reason for this may be their inaccuracy. Dependency-based
Locating faults in spreadsheets can be difficult. Therefore, models compute a significantly higher number of diagnoses
tools supporting the localization of faults are needed. Model- than value-based models. In this paper, we propose a novel
based software debugging (MBSD) is a promising fault local- type of dependency-based model which uses equivalence in-
ization technique. This paper presents a novel dependency- stead of the implication to model the dependency relation
based model that can be used in MBSD. This model allows between cells. This allows improvements on the diagnostic
improvements of the diagnostic accuracy while keeping the accuracy while keeping the computation times short.
computation times short. In an empirical evaluation, we
show that dependency-based models of spreadsheets whose In order to demonstrate the differences between the value-
value-based models are often not solvable in an acceptable based, the original dependency-based, and our improved
amount of time can be solved in less than one second. Fur- dependency-based model, we make use of a running ex-
thermore, we show that the amount of diagnoses obtained ample. This example is a simplified version of the “home-
through dependency-based models is reduced by 15 % on work/budgetone”spreadsheet taken from the EUSES spread-
average when using the novel model instead of the origi- sheet corpus [6]. We manually injected a fault into the
nal dependency-based model. The short computation time spreadsheet in cell D5. Figure 1a shows the normal (or value)
and the improved diagnostic accuracy enable the usage of view of this faulty spreadsheet variant. Figure 1b shows the
model-based debugging for spreadsheets in practice. formula view of the same spreadsheet. The output cells1 of
the spreadsheet are shaded in gray. The faulty cell D5 is
Categories and Subject Descriptors highlighted with a red box. The fault manifests in the value
H.4.1 [Information Systems Applications]: Office Au- of the output cell D7. The expected value for this cell is
tomation—Spreadsheets; D.2.5 [Software Engineering]: 78,6 %.
Testing and Debugging—Debugging aids
Keywords
Spreadsheets, Debugging, Model-based Fault Localization
1. INTRODUCTION
Even for small spreadsheets, the localization of faults can
be time consuming and frustrating. Thus, approaches sup- (a) Normal view
porting fault localization in spreadsheets are needed. Some
fault localization techniques developed for the software en-
gineering discipline have been adapted to the spreadsheet
domain, for example [1, 9, 10, 8]. Model-based software de-
bugging (MBSD) [16, 18, 13] is one of these techniques. So
far, researchers have only focused on methods that use value-
based models [3, 4, 12, 11]. Value-based models compute a
small set of possible explanations (i.e., diagnoses) for an ob-
served misbehavior. This small set of diagnoses is helpful (b) Formula view
for users when debugging. Unfortunately, value-based mod-
els have high computation times and they do not scale: the Figure 1: Running example
underlying solving mechanisms have problems when dealing
with variables with large domains and real numbers. In an
When using model based debugging, the faulty cell of this
empirical evaluation, Außerlechner et al. [5] showed the limi-
example spreadsheet can be detected independent of the un-
tations of different constraint solvers and SMT (satisfiability
derlying model. There are however differences with respect
modulo theories) solvers when using these models.
to runtime and solution size. The value-based model and
our novel dependency-based model identify three cells that
To the best of our knowledge, dependency-based models
1
have not been used for localizing faults in spreadsheets. The An output cell is a cell that is not referenced by any other cell.
could explain the observed misbehavior, while the original Input: Output:
dependency-based model identifies six cells as possible ex-
B2 == 1000 D3 == 20.6
planations. When using the constraint solver Minion [7],
both dependency-based models require only one third of the C2 == 1500 B7 == 0.75
computation time compared with the value-based model. B3 == 20 C7 == 0.81
... D7 == 0.786
This work is related to the work of Mayer and Stumpt-
ner [15] and Wotawa [17]. Mayer and Stumptner evaluated
models for model-based debugging in the software domain. Formula constraints:
Wotawa discussed the relationship of model-based debug- AB(cellD2 ) ∨ D2 == B2 + C2
ging and program slicing. AB(cellD3 ) ∨ D3 == D4/D2
AB(cellB4 ) ∨ B4 == B3 × B2
In the remainder of this paper, we will explain the differ-
AB(cellC4 ) ∨ C4 == C3 × C2
ent types of models and show the model representations for
the above described running example (see Section 2). In AB(cellD4 ) ∨ D4 == B4 + C4
Section 3, we will empirically compare the different mod- AB(cellD5 ) ∨ D5 == B5
els with respect to efficiency and effectiveness. The novel ...
dependency-based model reduces the number of computed AB(cellD7 ) ∨ D7 == D6/D4
diagnoses by 15 % compared to original dependency-based
model. Furthermore, the empirical evaluation shows that
the dependency-based models can be solved in less than one Solving this constraint system leads to three possible solu-
second even for those spreadsheets whose value-based mod- tions: Either cell D5, D6 or D7 must contain the fault.
els require more than 20 minutes solving time.
2.2 Original dependency-based models
2. MODEL-BASED DEBUGGING When using dependency-based models, only the information
In model-based software debugging (MBSD), the cells of a about whether the computed values are correct is propa-
spreadsheet and the given observations (i.e., the test case2 ) gated. Therefore, all cell values are represented as Boolean
instead of Integer or Real values. All variables representing
are converted into constraints. As the given test case is a input cells are initialized with true. All variables represent-
failing test case, this constraint system results in a contradic- ing correct output cells are also initialized with true. The
tion. In order to determine which cells could resolve this con- variables representing erroneous output cells are initialized
tradiction, MBSD introduces abnormal variables (AB) for with false. Instead of using the concrete formulas in the
each cell. These abnormal variables represent the “health” constraints, only the correctness relation is modeled. If the
state of the cells. If a cell c is not abnormal, the formula of formula of cell c is correct and the input values of a formula
are correct then cell c must compute a correct value:
the cell must be correct [18]: ^ ′
¬AB(c) → constraint(c). (1) AB(cellc ) ∨ c →c (3)
c′ ∈ρ(c)
This logic expression can be transformed to
where ρ(c) is the set of all cells that are referenced in c. De-
AB(c) ∨ constraint(c). (2) tails about this modeling for software written in an imper-
ative language can be found e.g. in [17]. The dependency-
Having such a constraint system, we are able to use a con- based constraints for our running example are as follows:
straint or SMT solver to determine which abnormal variables
have to be set to true to eliminate the contradiction. We
refer the interested reader to [18, 14] for more information Input: Output:
about the principles of MBSD. MBSD can be performed B2 == true D3 == true
using either dependency-based or value-based models. We
C2 == true B7 == true
discuss these models and our novel dependency-based model
in the following subsections. B3 == true C7 == true
... D7 == false
2.1 Value-based models
When using value-based models, the values of the cells are Formula constraints:
propagated. A value-based constraint system contains (i) the
input cells and their values, (ii) the output cells and their ex- AB(cellD2 ) ∨ (B2 ∧ C2 → D2)
pected values, and (iii) all formulas concatenated with their AB(cellD3 ) ∨ (D2 ∧ D4 → D3)
abnormal variable. The constraint representation handles AB(cellB4 ) ∨ (B2 ∧ B3 → B4)
the formulas as equations instead of assignments. This al- AB(cellC4 ) ∨ (C2 ∧ C3 → C4)
lows to draw conclusions on the input from the output of a AB(cellD4 ) ∨ (B4 ∧ C4 → D4)
formula. Such a value-based model for spreadsheets is pro- AB(cellD5 ) ∨ (B5 → D5)
posed by Abreu et al. [3, 4]. The running example from
...
Figure 1b is converted into the following constraints:
2
AB(cellD7 ) ∨ (D4 ∧ D6 → D7)
A test case is a tuple (I, O), where I are the values for the input
cells and O the expected values for the output cells. A test case
is a failing test case if at least one computed output value differs Solving this constraint system leads to six possible solu-
from the expected value. tions: Either cell B4, C4, D4, D5, D6 or D7 must contain
the fault. This dependency-based model computes more relational, and logic operators such as multiplication, divi-
diagnoses because of the implication. In the value-based sion, and equality over Boolean and Integers.
model, the cells B4, C4, and D4 can be excluded from the
set of possible diagnoses because B4 and C4 are used to com- The evaluation was performed on an Intel Core2 Duo proces-
pute D4, and D4 is used to compute D3, which is known to sor (2.67 GHz) with 4 GB RAM and Windows 7 as operating
compute the correct value. Unfortunately, this information system. We used the Minion version 0.15. The computation
gets lost when using the implication because the implication time is the average time over 10 runs.
allows conclusions only from the input to the output but
not vice versa. This problem will be solved with the novel We evaluated the models by means of spreadsheets from the
dependency-based model that is explained in the following publicly available Integer spreadsheet corpus3 [5]. This cor-
subsection. pus contains 33 different spreadsheets (12 artificially created
spreadsheets and 21 real-life spreadsheets), e.g., a spread-
2.3 Novel dependency-based models sheet that calculates the lowest price combination on a shop-
In order to eliminate the previously described weakness of ping list or the winner of Wimbleton 2012. These spread-
dependency-based models, we use bi-implication (equiva- sheets contain both arithmetical and logical operators as well
lence) instead of the implication. The formula constraints as the functions SUM and IF. The spreadsheets contain on
for our running example from Figure 1b are as follows: average 39 formula cells, the largest spreadsheet contains 233
formulas. Faulty versions of the spreadsheets (containing
single, double and triple faults) were created by randomly
selecting formulas and applying mutation operators [2] on
AB(cellD2 ) ∨ (B2 ∧ C2 ↔ D2) them. The corpus contains in total 220 mutants. In the
AB(cellD3 ) ∨ (D2 ∧ D4 ↔ D3) empirical evaluation, we used only the spreadsheets which
contain a single fault, i.e. 94 spreadsheets.
AB(cellB4 ) ∨ (B2 ∧ B3 ↔ B4)
AB(cellC4 ) ∨ (C2 ∧ C3 ↔ C4) Table 1 compares the three types of models with respect
AB(cellD4 ) ∨ (B4 ∧ C4 ↔ D4) to fault localization capabilities and runtimes. The fault
localization capabilities are expressed by means of the num-
AB(cellD5 ) ∨ (B5 ↔ D5) ber of cells that are single fault diagnoses. The runtime
... is measured by means of Minion’s average solving time in
AB(cellD7 ) ∨ (D4 ∧ D6 ↔ D7) milliseconds. The spreadsheets are divided into two sub-
groups: spreadsheets whose value-based models are solved
by Minion in less then 20 minutes and spreadsheets whose
Solving this constraint system leads to the same 3 diagnoses value-based models could not be solved within 20 minutes
as when using a value-based model. (i.e. 31 from 94 spreadsheets). For these 31 spreadsheets,
the dependency-based models are solved in less than one sec-
ond. These runtime results indicate that dependency-based
The bi-implication cannot be used in case of coincidental models are better suited for debugging large spreadsheets
correctness. Coincidental correctness might occur for exam- than value-based models.
ple in the following situations:
Table 1: Evaluation results
• usage of conditional function (e.g., the IF-function),
• abstraction function like MIN, MAX, COUNT, Single fault Solving time
Model
• usage of Boolean, diagnoses (in ms)
63 spreadsheets
• multiplication with a 0-value, and
Value-based 4.0 56818.8
• power with 0 or 1 as base number or 0 as exponent. Original dep.-based 13.2 32.0
Novel dep.-based 11.0 31.6
Please note, that this list gives only examples. It is not 31 spreadsheets
a complete list, because the size of the list depends on the Value-based - > 20 minutes
functions that are supported by the concrete spreadsheet en-
vironment (e.g. Microsoft Excel, iWorks’Number, OpenOf- Original dep.-based 45.0 187.4
fice’s Calc). All formulas where coincidental correctness Novel dep.-based 38.6 164.8
might happen still have to be modeled with the implication
instead of the bi-implication. Considering the diagnostic accuracy, the value-based model
yields better results. It computes only one third of the di-
3. EMPIRICAL EVALUATION agnoses of the original dependency-based model. The im-
This section consists of two major parts: the empirical setup proved dependency-based model computes on average 15 %
(discussing the prototype implementation, the used plat- less diagnoses than the original dependency-based model.
form, and the evaluated spreadsheet corpora) and the re-
sults showing that dependency-based models are able to Table 2 gives an overview of the reduction that can be achieved
compute diagnoses within a fraction of a second even for when using the novel instead of the original dependency-
spreadsheets whose value-based models require more than based model. The reduction is expressed by means of the
20 minutes of solving time. In addition, this empirical eval- following metric:
uation shows that the number of diagnoses obtained by the
novel dependency-based model is reduced by 15 % on aver-
age compared to the original dependency-based model.
|Diagnoses in the novel model|
Reduction = 1 − . (4)
We developed a prototype in Java for performing the em- |Diagnoses in the original model|
pirical evaluation. This prototype uses Minion [7] as a con-
straint solver. Minion is an out-of-the-box, open source
3
constraint solver and offers support for almost all arithmetic, https://dl.dropbox.com/u/38372651/Spreadsheets/Integer Spreadsheets.zip
Table 2: Summary of the achieved reduction when using the 5. REFERENCES
novel model instead of the original dependency-based model [1] R. Abraham and M. Erwig. GoalDebug: A
Spreadsheet Debugger for End Users. In Proceedings
Reduction Number of spreadsheets of the 29th International Conference on Software
0% 64 Engineering (ICSE 2007), pages 251–260, 2007.
]0 %;10 %] 0 [2] R. Abraham and M. Erwig. Mutation Operators for
Spreadsheets. IEEE Transactions on Software
]10 %;20 %] 1 Engineering, 35(1):94–108, 2009.
]20 %;30 %] 1 [3] R. Abreu, A. Riboira, and F. Wotawa.
]30 %;40 %] 2 Constraint-based debugging of spreadsheets. In
]40 %;50 %] 5 CibSE’12, pages 1–14, 2012.
]50 %;60 %] 0 [4] R. Abreu, A. Riboira, and F. Wotawa. Debugging of
]60 %;70 %] 4 spreadsheets: A CSP-based approach. In 3th IEEE
]70 %;80 %] 2 Int. Workshop on Program Debugging, 2012.
]80 %;90 %] 7 [5] S. Außerlechner, S. Fruhmann, W. Wieser, B. Hofer,
R. Spörk, C. Mühlbacher, and F. Wotawa. The right
]90 %;100 %] 8 choice matters! SMT solving substantially improves
model-based debugging of spreadsheets. In Proceedings
of the 13th International Conference on Quality
For 64 spreadsheets, no reduction in the number of diagnoses Software (QSIC’13), pages 139–148. IEEE, 2013.
was achieved when using the novel dependency-based model [6] M. Fisher and G. Rothermel. The EUSES Spreadsheet
instead of the original model. However, for 15 spreadsheets, Corpus: A shared resource for supporting
a reduction of more than 80 % was achieved. experimentation with spreadsheet dependability
mechanisms. SIGSOFT Software Engineering Notes,
30(4):1–5, 2005.
4. DISCUSSION AND CONCLUSIONS [7] I. P. Gent, C. Jefferson, and I. Miguel. Minion: A fast,
Locating faulty formulas in spreadsheets can be time con- scalable, constraint solver. In Proceedings of the 7th
suming. This paper addresses the fault localization prob- European Conference on Artificial Intelligence (ECAI
lem by means of model-based diagnosis. Our most impor- 2006), pages 98–102, 2006.
tant contribution is the introduction of a novel dependency-
based model. This novel dependency-based model improves [8] B. Hofer, A. Perez, R. Abreu, and F. Wotawa. On the
previous work in two ways: (1) Compared to the original empirical evaluation of similarity coefficients for
dependency-based model, it reduces the amount of diagnoses spreadsheets fault localization. Automated Software
that have to be manually investigated by 15 %. (2) Com- Engineering, pages 1–28, 2014.
pared to the value-based model, it reduces the required solv- [9] B. Hofer, A. Riboira, F. Wotawa, R. Abreu, and
ing time and allows the computation of diagnoses in real- E. Getzner. On the Empirical Evaluation of Fault
time where the value-based model cannot compute solutions Localization Techniques for Spreadsheets. In
within 20 minutes. The savings in computation time can be Proceedings of the 16th International Conference on
explained by the reduction of the domain: The dependency- Fundamental Approaches to Software Engineering
based model requires only Boolean variables instead of In- (FASE 2013), pages 68–82, Rome, Italy, 2013.
tegers and Real numbers. [10] D. Jannach, A. Baharloo, and D. Williamson. Toward
an integrated framework for declarative and
The reduction of the domain comes with additional advan- interactive spreadsheet debugging. In Proceedings of
tages: (1) An arbitrary solver can be used, because all solvers the 8th International Conference on Evaluation of
support at least Boolean variables. Even spreadsheets con- Novel Approaches to Software Engineering (ENASE
taining Real numbers can be debugged with any solver when 2013), pages 117–124, 2013.
using dependency-based models. (2) The user does not need [11] D. Jannach and U. Engler. Toward model-based
to indicate concrete values for the erroneous output vari- debugging of spreadsheet programs. In JCKBSE 2010,
ables. The information that an output cell computes the pages 252–264, Kaunas, Lithuania, 2010.
wrong value is sufficient. [12] D. Jannach and T. Schmitz. Model-based diagnosis of
spreadsheet programs: a constraint-based debugging
In the description of the model, we listed all types of coinci- approach. Automated Software Engineering, pages
dental correctness occurring in the spreadsheets used in the 1–40, 2014.
empirical evaluation. This list is not exhaustive. For using [13] C. Mateis, M. Stumptner, D. Wieland, and
this model in practice, the list has to be extended. F. Wotawa. Model-based debugging of Java programs.
In Proceedings of AADEBUG, 2000.
We are convinced that the model presented improves the [14] W. Mayer and M. Stumptner. Model-based debugging
state of the art in model-based diagnosis. Further work in- – state of the art and future challenges. Electronic
cludes a user study and the adaptation to other types of Notes in Theoretical Computer Science, 174(4):61–82,
programs, e.g. programs written in imperative or object- May 2007.
oriented languages. [15] W. Mayer and M. Stumptner. Evaluating models for
model-based debugging. In Proceedings of the
International Conference on Automated Software
Acknowledgments Engineering (ASE), 2008.
The research herein is partially conducted within the compe- [16] R. Reiter. A Theory of Diagnosis from First
tence network Softnet Austria II (www.soft-net.at, COMET Principles. Artificial Intelligence, 32(1):57–95, 1987.
K-Projekt) and funded by the Austrian Federal Ministry of [17] F. Wotawa. On the Relationship between
Economy, Family and Youth (bmwfj), the province of Styria, Model-Based Debugging and Program Slicing.
the Steirische Wirtschaftsförderungsgesellschaft mbH. (SFG), Artificial Intelligence, 135:125–143, February 2002.
and the city of Vienna in terms of the center for innovation [18] F. Wotawa, M. Nica, and I.-D. Moraru. Automated
and technology (ZIT). debugging based on a constraint model of the program
and a test case. The journal of logic and algebraic
programming, 81(4), 2012.