<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>QR-Augmented Spectrum-based Fault Localization*</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alexandre Perez</string-name>
          <email>alexandre.perez@fe.up.pt</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rui Abreu</string-name>
          <email>rui@computer.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>In Memoriam: Danny Bobrow.</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IST, University of Lisbon and INESC-ID</institution>
          ,
          <country country="PT">Portugal</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Porto and HASLab</institution>
          ,
          <addr-line>INESC-TEC</addr-line>
          ,
          <country country="PT">Portugal</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>ion comes at the cost of missing certain faults, such as errors of omission, and failing to provide enough contextual information to explain why components are considered suspicious. We propose an approach, named Q-SFL, that leverages qualitative reasoning to augment the information made available to SFL techniques. It qualitatively partitions system components, and treats each qualitative state as a new SFL component to be used when diagnosing. Our empirical evaluation shows that augmenting SFL with qualitative components can improve diagnostic accuracy in 54% of the considered real-world subjects. *This is a preliminary version of the paper titled “Leveraging Qualitative Reasoning to Improve SFL”, to be presented at IJCAIECAI 2018.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        SFL [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] was shown to be a lightweight, yet effective,
technique for locating faults in a system. It consists of keeping
a record of which components are involved in each system
execution and subsequently ranking those components
according to their similarity to failing executions. The
intuition being that a faulty component is very likely to be
involved in failing executions and not as likely to be covered
in nominal ones. Over the years, many extensions were
proposed to improve SFL’s applicability and effectiveness, such
as exploring which similarity coefficients yield better fault
localization results [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and handling multiple or intermittent
faults [3].
      </p>
      <p>Despite the developments and achievements in SFL
research, we are unable to find many accounts of successful
transitions of this technology into the industry at large. We
argue that this is motivated largely by the issues raised by 4
in their 4 user study of automated debugging techniques [4;
5]. Namely, the authors found that there is significant
interest drop-off after users inspect a small number of
components from the ranked list of potential faults. This issue is
exacerbated as the scale of the system increases. Another
issue pointed out by 4 is the fact that many SFL studies
assume perfect fault understanding — that is, these studies
expect that once users inspect a faulty component, they will
correctly identify it as such —, which does not always hold
in practice [6].</p>
      <p>This paper proposes an approach that inspects the state
of system components, with the intent of augmenting
reports generated by SFL techniques and hence providing
more diagnostic information. Recording the state of
individual components in each execution quickly becomes
intractable, even for a lightweight approach as SFL.
Therefore, we leverage Qualitative Reasoning (QR), which
provides a way of describing a set of values by their discrete,
behavioral qualities, to enable the reasoning about a
system’s behavior without exact quantitative information [7;
8]. Precise numerical quantities are avoided and replaced
by qualitative descriptions — such as, for instance: high,
low, zero, increasing or decreasing.</p>
      <p>We apply QR to the SFL analysis, in an approach named
Q-SFL, enabling the introduction of quantitative landmarks
that will partition the domains of relevant components into
a set of qualitative descriptions, and insert new SFL
components for each of these descriptions. As behavioral
qualities are now considered as components, their involvement
in system executions is therefore recorded and ranked
according to their similarity to observed failures, enriching the
SFL report as a result. This can have benefits in fault
comprehension — because qualitative properties are considered
besides merely recording involvement — and even improve
diagnostic report accuracy — whenever a qualitative state
is more correlated with failing behavior than its enclosing
system component.</p>
      <p>We perform an empirical evaluation of Q-SFL with
realworld faults from the Defects4J [9] catalog of faulty
software programs. Results show that Q-SFL has the potential
to improve the accuracy of SFL reports —with 54% of
considered subjects exhibiting a lower effort to diagnose faults.
Although the results are promising, we discuss several
matters that need further research —namely, uncovering a
landmarking strategy that exhibits consistently better results and
studying to what extent fault comprehension is improved.</p>
      <p>This paper’s contributions are:
• An approach, named Q-SFL, inspired by qualitative
reasoning (QR) research, to augment program spectra
used by SFL techniques by partitioning system
components into a set of qualitative states which are treated as
SFL components.
• Empirical evidence that QR-enhanced spectra can
reduce the effort to diagnose real software bugs in 54%
of considered subjects.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>This section briefly summarizes the concepts upon which
our approach is based on.</p>
      <sec id="sec-2-1">
        <title>2.1 Spectrum-based Fault Localization (SFL)</title>
        <p>In SFL, the following is given:
• A finite set C = fc1; :::; cM g of M system
components;
• A finite set T = ft1; :::; tN g of N transactions, which
can be seen as records of a system execution;
• An error vector e = fe1; :::; eN g of transaction
outcomes, where ei = 1 if ti failed, ei = 0 otherwise;
• A M N activity matrix A, where Aij = 1 if
component cj is involved in transaction ti, Aij = 0 otherwise.
The goal of SFL is to pinpoint which (sets of) components
are more likely to have caused the system to fail. Earlier
approaches to SFL measure the similarity between a
component’s involvement in transactions and the error vector [1;
2]. Later on, spectrum-based reasoning (SR) was
introduced [3], leveraging a Bayesian reasoning framework to
diagnose, even when multiple, intermittent faults are present.
The two main steps of SR are candidate generation and
candidate ranking:</p>
      </sec>
      <sec id="sec-2-2">
        <title>Candidate Generation</title>
        <p>
          The first step in SR is to generate a set D = fd1; :::; dkg
of diagnosis candidates. A diagnosis candidate dk C is
valid if every failed transaction involved at least one
component from dk. Candidate dk is minimal if no valid
candidate is contained in dk. We are only interested in
minimal candidates, as they can subsume all others.
Heuristic approaches to finding these minimal hitting sets include
STACCATO [
          <xref ref-type="bibr" rid="ref11">10</xref>
          ], SAFARI [
          <xref ref-type="bibr" rid="ref12">11</xref>
          ] and MHS2 [
          <xref ref-type="bibr" rid="ref13">12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Candidate Ranking</title>
        <p>
          For each candidate dk, their fault probability is calculated
using the Naïve Bayes rule1 [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]
        </p>
        <p>Pr(dk j (A; e)) = Pr(dk)</p>
        <p>Y
i 2 1::N</p>
        <p>Pr((Ai; ei) j dk)</p>
        <p>Pr(Ai)
(1)
Let Ai be short for fAij j1 2 1::M g, representing
component activity in ith transaction. Pr(Ai) is a normalizing term
that is identical for all candidates. Let pl denote the prior
probability2 that a component cl is at fault. The prior
probability for a candidate dk is given by</p>
        <p>Pr(dk) = Y pl Y (1 pl) (2)
l 2 dk</p>
        <p>l 2 Cndk
8
&gt;
&gt;
Pr((Ai; ei) j dk) = &lt;l 2 (dk\AYi)
gl</p>
        <p>Pr(Ai; ei j dk) is used to bias the prior probability taking
observations into account. Let gl (referred to as component
goodness) denote the probability that cl performs nominally
Y</p>
        <p>if ei = 0
gl
otherwise
(3)
&gt;1
&gt;
:</p>
        <p>l 2 (dk\Ai)
1Probabilities are calculated assuming conditional
independence throughout the process.</p>
        <p>
          2In the case of software diagnosis, one can approximate pl as
1=1000, i.e., 1 fault for each 1000 lines of code [
          <xref ref-type="bibr" rid="ref15">14</xref>
          ].
        </p>
        <p>Q1Q1: Solid</p>
        <p>Q1+</p>
        <p>Q2: Liquid</p>
        <p>Q2Q2+</p>
        <p>Q3Q3: Gas</p>
        <p>Q3+
Absolute zero</p>
        <p>Freezing/Melting</p>
        <p>Point</p>
        <p>Boiling/Condensation</p>
        <p>Point</p>
        <p>Infinity</p>
        <p>
          In cases where values for gl are not available, they can be
estimated by maximizing Pr((A; e) j dk) under parameters
fgl j l 2 dkg [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ].
        </p>
        <p>To measure the accuracy of SFL approaches, the cost of
diagnosis (Cd) metric is often used [4]. It measures the
number of candidates to be inspected until the real fault is
reached, assuming candidates are inspected by descending
order of probability. A Cd of 0 indicates an ideal diagnosis
where the real fault is at the top of the ranked list of
candidates.
QR creates a discrete representation of the continuous
world [7; 8; 15], enabling the reasoning of space, time, and
quantity with merely a small amount of information. It is
motivated by the fact that humans are able to draw
conclusions about the physical world around them with limited
information, without the need of solving complex differential
equations.</p>
        <p>
          Figure 1 provides an example of a potential discretization
of the water temperature into three qualitative values: Q1,
Q2 and Q3. Our representation resolution — the
granularity of the information detail — coincides with that of the
three physical states of matter that water can assume: solid,
liquid, and gas. Note that the established resolution will
ultimately define the granularity of the conclusions one can
draw from QR. To define the qualitative states, one needs
to establish landmarks. Landmarks are constant
quantitative values that establish a point of comparison [
          <xref ref-type="bibr" rid="ref17">16</xref>
          ]. In
this example, we know that if the water is in the liquid
state (Q2), then its temperature is somewhere between
landmark L2 — corresponding to 0°C, the freezing point of
water — and landmark L3 — 100°C, its boiling point.
Similarly, we can derive that ice (Q1) temperature assumes a
value between the absolute zero (L1) and the melting point
(L2); and that water vapor (Q3) ranges between the
condensation point (L3) and positive infinity (L4).
        </p>
        <p>
          QR also supports the representation of derivatives
between two quantities. They are usually represented with
’+’ and ’-’ signs, denoting value increases and decreases,
respectively. This enables the use of sign algebra to
reason about direct influence and proportionality between two
qualitative values. Derivatives also enable envisionments.
An envisionment establishes a set of transitions between
qualitative states [
          <xref ref-type="bibr" rid="ref16">15</xref>
          ], essentially modeling the abstracted
world. A possible transition in our example’s envisionment
is the following: given that we observe Q2+ — that is, we
observe that the liquid water’s temperature is rising —, then
we know that the only possible following states are Q2
(continues in the liquid state) and Q3 (condensates into vapor),
but never Q1 (freezes into ice).
        </p>
        <p>Summarily, with a QR framework, we establish a way to
(i) represent quantities through discrete states, (ii) provide
a way to compare values between these states, (iii) enable
derivations and sign algebra, and (iv) model envisionments
detailing possible transitions between states. With such a
framework, we can model, plan, simulate and reason about
a multitude of intricate problems in an abstract way.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Approach</title>
      <p>This section motivates the need to augment standard
spectrum-based fault localization approaches with more
contextual information, and details our Q-SFL approach to
achieve it by qualitatively partitioning SFL components.
3.1</p>
      <sec id="sec-3-1">
        <title>Limitations of Spectrum-based Analyses</title>
        <p>
          SFL faces several issues preventing it from widespread
adoption and use. Not the least of which is the lack of
contextual information, essential for understanding why
diagnostic candidates are considered suspicious. This has been
pointed out in the software fault localization literature [4].
As SFL reasons about failures at the spectrum level, it only
has access to whether a component was involved or not
in each system transaction. While this enables a flexible,
lightweight analysis, the necessary abstraction can impose
a tradeoff both in accuracy and comprehension. Although
there have been efforts to incorporate more data into the
diagnostic process — by modeling component behavior that
considers the system’s state and previous diagnoses [
          <xref ref-type="bibr" rid="ref18">17</xref>
          ]; or
by leveraging prediction models trained from issue
tracking data [
          <xref ref-type="bibr" rid="ref19">18</xref>
          ] — they were focused on conditioning the fault
probability of existing diagnostic candidates, increasing
accuracy but not necessarily increasing the ability to
comprehend the diagnostic report.
        </p>
        <p>
          Aside from comprehension, and because SFL only
reasons about component involvement in failing transactions,
omission errors — such as bound checks — also become
difficult to diagnose [
          <xref ref-type="bibr" rid="ref20">19</xref>
          ]. The abstract nature of the
spectra that is fed into current SFL frameworks also leads to
the formation of ambiguity groups and facilitates the
occurrence of coincidental correctness. An ambiguity group is a
group of components with identical involvement in all
transactions [
          <xref ref-type="bibr" rid="ref21">20</xref>
          ]. Since they exhibit the same coverage pattern,
no component within an ambiguity group can be uniquely
identified as the root cause of failure, potentially hindering
accuracy. Coincidental correctness refers to the event when
no failure is detected, even though a fault has been executed
[
          <xref ref-type="bibr" rid="ref22">21</xref>
          ]. Depending on the component granularity selected for
the analysis, coincidental correctness can happen at a
frequent rate. In particular, when two tests share the same
coverage path, but produce different outcomes, it becomes
significantly harder to distinguish them without further
contextual information. Coincidental correctness can potentially
lead to exonerating real faults as they are observed to
behave nominally.
3.2
        </p>
        <p>Q-SFL
We argue that all of the issues described above can be at least
attenuated if we supplement the SFL framework with more
contextual information about the system under analysis to
perform the diagnosis. Our Q-SFL approach consists of
partitioning several SFL components into multiple,
meaningful, qualitatively distinct subcomponents, to be used in the
fault localization. We leverage the QR concept of domain
A
c1
partitioning to inspect existing components during each
system execution and assign them a qualitative state. Each of
these qualitative states is then considered as a separate SFL
component whose involvement per transaction is recorded
and fed into the SFL framework for diagnosis. Note that
we use software diagnosis to help describe Q-SFL and, later
on, to evaluate it. Despite this, the Q-SFL concept applies
to other SFL use cases where one can inspect the state of
components, as is the case of electronic circuit diagnosis,
and others.</p>
        <p>Figure 2 depicts how QR can be employed to enhance
program spectra. Figure 2a shows a 4 transaction by 4
component spectrum, along with resulting diagnostic scores
from applying reasoning-based SFL. Candidate generation,
following the methodology described in Section 2.1 yields
two candidate diagnoses — components c1 and c3 can
independently explain the observed failures as both cover all
failing test cases. For this example, suppose that c1 is the
faulty component. Since c1 is involved in more passing
transactions, the SFL framework will assign it a lower fault
probability than c3. To improve the accuracy of the SFL
framework, one needs more contextual information about
component executions.</p>
        <p>We envision three different types of landmarking
strategies that can be employed to define qualitative state
boundaries: (i) manual landmarking, where the system’s
developers manually define what are the possible qualitative states
for a given component; (ii) static landmarking, where
landmarks depend on the type of a component; and (iii) dynamic
landmarking, where a component’s value is inspected at
runtime, and partitioned into a set of categories. Examples
of dynamic strategies will be presented in Section 4.</p>
        <p>Figure 2b depicts the QR-augmented spectrum, where
components representing qualitative partitions of both c1
and c3 are added to the original spectrum. An example of
such partitioning using static landmarking: if c1 represents
a software procedure that contains a numeric parameter i,
we can create two qualitative components c01 and c010 that
represent invocations of c1 with i 0 and i &lt; 0,
respectively. This is a sign-based static partitioning strategy. Note
that the original components c1 and c3 are not removed from
the QR-augmented spectrum, as partitions may not provide
further fault isolation.</p>
        <p>If we are to diagnose the new spectrum from Figure 2b,
component c010 is now the top-ranked diagnostic candidate.
This QR-augmented spectrum avoids spurious inspections
of component c3, and provides additional contextual
information about the fault, namely that i &lt; 0 is often observed
in failing transactions.</p>
        <p>By landmarking data units associated with SFL
components so that they are assigned a qualitative state at runtime,
we are providing more context to the diagnostic process, and
in some cases, consequently reducing the diagnostic effort.
Such partitioning is also of crucial importance towards
minimizing the impact and frequency of ambiguity grouping and
coincidental correctness, as new, distinct components are
added to the system’s spectrum.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>To evaluate our approach, we compare the cost of
diagnosing a collection of faulty software programs using regular
spectra against using QR-augmented spectra.
4.1</p>
      <sec id="sec-4-1">
        <title>Methodology</title>
        <p>We have sourced experimental subjects from the Defects4J3
(D4J) database. D4J is a catalog of 395 real, reproducible
software bugs from 6 open-source projects — namely
JFreeChart, Google Closure compiler, Apache Commons
Lang, Apache Commons Math, Mockito, and Joda-Time.
For each bug, a developer-written, fault-revealing test suite
is made available.</p>
        <p>We run the fault-revealing test suite of each buggy D4J
subject, gathering method-level coverage and test outcomes,
to construct its spectrum. Besides coverage, we also record
the value of all primitive-type arguments and return values
for every method call. This enables us to experiment with
different qualitative partitioning strategies in an offline
manner.</p>
        <p>Using the recorded argument and return value data, we
create multiple (automated) partitioning models resulting in
several Q-SFL variants. A static partitioning variant using
automated sign partitioning based on the variable’s type, as
described in Section 3.2, was considered. For dynamic
partitioning, several clustering and classification algorithms4
were considered: k-NN, linear classification, logistic
regression, decision trees, random forest, and x-means clustering
Test outcomes are used as the class labels in the case of
supervised models. Note that we are not using the
aforementioned models for prediction, but rather as a partitioning
scheme based on observed values. Hence, we do not break
our data into training and test sets, as is customary in
prediction scenarios. Because we use automated, domain
independent partitioning, only primitive types are considered in
the evaluation.</p>
        <p>To evaluate a QR-enhanced spectrum against its
respective original spectrum, we first reduce the Q-SFL
diagnostic report to method components. This reduction is done
by considering the highest fault probability of any
subcomponent belonging to each method, to effectively be able to
compare method-level diagnostic effort between the two
approaches. A change in diagnostic effort is measured using
Cd = Cd(Original)</p>
        <p>Cd(QR-Enhanced)
(4)
3Defects4J 1.1.0 is available at https://github.com/
rjust/defects4j (accessed June 2018).</p>
        <p>
          4We chose popular classification algorithms implemented in
the Scikit-learn package. X-means, as implemented in the
pyclustering package, was selected as it can automatically
decide the optimal number of clusters to use [
          <xref ref-type="bibr" rid="ref23">22</xref>
          ].
where Cd is the cost of diagnosis, as explained in
Section 2.1. A positive Cd means that the faulty component
has risen in the ranking reported by SFL techniques when
QR is used, yielding a lowered cost of diagnosing.
4.2
        </p>
      </sec>
      <sec id="sec-4-2">
        <title>Results and Discussion</title>
        <p>We were able to automatically partition the faulty method
in 167 D4J subjects. The remaining D4J subjects were
discarded because (i) the faulty method does not contain
parameters nor does it return a value; because (ii) the faulty
method only contains non-primitive, non-null,
complextyped parameters, which cannot be handled by the set of
partitioning strategies described in Section 4.1; or because
(iii) the aforementioned partitioning strategies were unable
to create qualitative states whose coverage differs from their
enclosing method.</p>
        <p>Our first research question is
RQ1: Does augmenting spectra with qualitative
components improve their diagnosability?
In RQ1, we are concerned with finding if there exist.
qualitative partitionings able to improve the fault localization
ranking to the extent that faulty components are inspected
earlier — thus decreasing developer wasted effort in a
debugging task. Hence, for each D4J subject, we choose
as the landmarking strategy to consider in the evaluation
the one that is able to create the largest set of distinct,
non-ambiguous qualitative components out of the faulty
method(s). The breakdown of selected partitioning
strategies per subject is as follows: Sign partitioning: 102 (61%
of subjects); X-means: 25 (15%); k-NN: 8 (5%); Linear
Regression: 1 (1%); Logistic Regression: 4 (2%);
Decision Tree: 11 (7%); Random Forest: 16 (10%). Our
sign-partitioning default strategy was used to qualitatively
enhance the majority of considered subjects, while other
strategies such as linear classification and logistic
regression were rarely selected. We believe the reason that
supervised learning approaches — which were fed test case
outcomes as the target class label — only exhibited superior
performance in 40 subjects (24%) is due to the fact that the
number of failing tests in test suites is often much smaller
than the amount of passing tests, weakening the resulting
partitioning model.</p>
        <p>Figure 3 shows a scatter plot with the Cd of all subjects
under analysis. Shown in a red background are the 15
subjects (9%) with a negative Cd — meaning that the report
has suffered a decrease in accuracy after augmenting the
spectra. The majority of these subjects belong to the
Closure project. The 62 subjects (37%) with Cd = 0, where
the faulty component has remained in the same position of
the ranking, are shown in a white background. Lastly, 90
subjects (54%) that exhibited a positive Cd — cases where
QR-enhanced spectra improved diagnosability — are shown
in green. All in all, Q-SFL is at least as effective as the
original approach in 92% of scenarios.</p>
        <p>Table 1 presents statistics computed to assess whether the
observed metrics yield statistically significant results.
QRenhanced spectra exhibits an overall lower effort to diagnose
when compared to the original spectra, with less variance.
To assess significance, we first performed the Shapiro-Wilk
test for normality of effort data in both the original
spectra case and QR case. With 99% confidence, the test’s
reΔCd ≥0
ΔCd &gt;0</p>
        <p>Closure
Commons Lang
Commons Math
JFreeChart
JodaTime
Mockito
160
sults tell us that the distributions are not normal. Given
that Cd is not normally distributed and that each observation
is paired, we use the non-parametrical statistical hypothesis
test Wilcoxon signed-rank. Our null-hypothesis is that the
median difference between the two observations (i.e., Cd)
is zero. The fifth row from Table 1 shows the resulting Z
statistic and p-value of Wilcoxon’s test. With 99%
confidence, we refute the null-hypothesis.</p>
        <p>RQ1? Yes, augmenting faulty spectra with new
components resulting from qualitative landmarking of method
parameter (and method return) values yields a statistically
significant improved diagnostic report.</p>
        <p>To be able to answer RQ1, we have selected for each
subject the strategy with the highest number of qualitative
partitions targeting the faulty method, as we were only concerned
with the existence of a partitioning strategy that would
improve diagnosability. However, in practice, it is not realistic
to know a-priori what the faulty method is5. Hence, our
second research question is</p>
        <p>RQ2: Is there a particular automated landmarking
strategy that consistently shows improved
diagnosability?</p>
        <p>
          Figure 4 shows a breakdown of the number of subjects
that fall into the Cd &lt; 0, Cd = 0 and Cd &gt; 0
cate5Although some effort has been put forth to hierarchically
debug programs using SFL [
          <xref ref-type="bibr" rid="ref24">23</xref>
          ].
gories for every partitioning strategy considered in this
evaluation. This bar plot tells us that no single strategy achieves
the same number of positive Cd scenarios as the partition
cardinality selection scheme employed to answer RQ1 and
to produce Figure 3. Furthermore, strategies that were
often picked by that criterion (namely, sign partitioning and
X-means strategies) also show an increased number of
negative Cd scenarios when compared to others. This leads us
to conclude that no single strategy (out of the ones that were
analyzed) is able to consistently show improved diagnoses.
        </p>
        <p>RQ2? No, at least for the automated landmarking
strategies considered in the evaluation, there is no evidence that
a single automated strategy can consistently outperform the
original spectra. However, since Q-SFL can improve
diagnosability, as per the answer to RQ1, we presume that
manual or more complex, context-aware, automated white-box
strategies — which can perform static and dynamic source
code analysis — are much more suited to outperform the
original spectra due to more effective and more informed
partitioning.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Related Work</title>
      <p>There have been previous forays into enhancing the
diagnostic report of automated fault localization techniques to
either improve their accuracy or comprehension of the
failing component.</p>
      <p>
        SFL approaches to debugging typically present their
report to users as a list of suspicious components that is sorted
according to the likelihood of being faulty. 1 have proposed
a visual way of depicting the results of a similarity-based
software SFL diagnosis, color-coding each component
according to their suspiciousness score [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. 24 expand on the
visual concept by leveraging tree-based visualizations that
innately exploit the tree-like structure of Java code,
naturally aggregating neighboring components and aiding
exploration of suspicious code regions [
        <xref ref-type="bibr" rid="ref25">24</xref>
        ]. Another approach to
improve the comprehension of faults was proposed by 25,
called Whyline, which allows the users to obtain evidence
about the program’s execution before forming an
explanation of the cause by providing the ability to ask “why did”
and “why didn’t” questions about program output [
        <xref ref-type="bibr" rid="ref26">25</xref>
        ].
      </p>
      <p>
        26 have proposed an extension to SFL to improve
comprehension. It uses integration coverage data, by way of
capturing method invocation pairs, to guide the fault
localization process. By calculating the fault likelihood of
component pairs, the authors are able to generate roadmaps for
component investigation, guiding users through likely faulty
paths and increasing the amount of contextual cues [
        <xref ref-type="bibr" rid="ref27">26</xref>
        ].
      </p>
      <p>
        Advancements in bug prediction [
        <xref ref-type="bibr" rid="ref28">27</xref>
        ] have enabled its
use within automated fault localization processes. 28
propose an ensemble approach to fault localization that exploits
information from versioning systems, bug tracking
repositories and structured information retrieval from the source
code [
        <xref ref-type="bibr" rid="ref29">28</xref>
        ]. 17 rely on kernel density estimation models of
component behavior and previous diagnoses to better
estimate the component goodness parameter in spectrum-based
reasoning [
        <xref ref-type="bibr" rid="ref18">17</xref>
        ]. 18 also modify the traditional
spectrumbased reasoning framework by leveraging a fault prediction
model trained with historical information from the project’s
versioning system and bug tracker to compute the prior
probability distribution of diagnostic candidates [
        <xref ref-type="bibr" rid="ref19">18</xref>
        ].
      </p>
      <p>
        Augmenting fault-localization via slicing has also been
proposed. 29 have proposed the use of dynamic backward
slices — comprised of statements that directly or indirectly
effect the computation of the output value through data- or
control-dependency chains — as components in
similaritybased SFL [
        <xref ref-type="bibr" rid="ref30">29</xref>
        ]. 30 propose an approach that leverages a
model-based slicing-hitting-set-computation — which
computes the dynamic slices of all faulty variables in all failed
test cases, derives minimal diagnostic candidates from the
slices and computes fault probabilities for each statement
based on number of the diagnoses that contain it [
        <xref ref-type="bibr" rid="ref31">30</xref>
        ].
6
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>This paper proposes a new approach to spectrum-based fault
localization that leverages qualitative reasoning (QR). The
Q-SFL approach splits components form the software
system under analysis into a set of qualitative states through
the creation of qualitative landmarks that partition a
component’s domain. These qualitative states are then
considered as SFL components to be ranked using traditional
faultlocalization methodologies. Since we treat these qualitative
states as components, our diagnostic reports not only
recommend likely fault locations, but also provide an insight
on what behaviors the faulty components assume when
failures are detected, facilitating the comprehension of the fault.</p>
      <p>We evaluate the approach on subjects from the Defects4J
catalog of real faults from medium and large-sized open
source software projects. Results show that spectra which
were augmented using qualitative partitioning of method
parameters shows a (statistically significant) improvement in
the diagnostic accuracy in 54% of scenarios. However, we
also found no evidence of automated partitioning strategies
that were consistently better than the original spectra,
meaning that more intricate, context-aware partitioning strategies
will likely be necessary for practical applications of the
approach.</p>
      <p>This work lays the first stone in a series of efforts to
more deeply integrate reasoning-based AI approaches into
spectrum-based fault localization. It paves the way for
further efforts by the fault localization research community,
namely by:
1. Improving automated landmarking by expanding its
application to complex non-primitive objects and by
exploring ensembles of multiple strategies.
2. Conducting a systematic user study investigating the
extent that qualitative domain partitioning aids fault
understanding.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This material is based upon work supported by the
scholarship number SFRH/BD/95339/2013 from Fundação para a
Ciência e Tecnologia (FCT).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>James</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Jones</surname>
          </string-name>
          , Mary Jean Harrold, and John T. Stasko.
          <article-title>Visualization of test information to assist fault localization</article-title>
          .
          <source>In ICSE'02</source>
          , pages
          <fpage>467</fpage>
          -
          <lpage>477</lpage>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Lucia</surname>
            , David Lo,
            <given-names>Lingxiao</given-names>
          </string-name>
          <string-name>
            <surname>Jiang</surname>
            , Ferdian Thung, and
            <given-names>Aditya</given-names>
          </string-name>
          <string-name>
            <surname>Budi</surname>
          </string-name>
          .
          <article-title>Extended comprehensive study of association measures for fault localization</article-title>
          .
          <source>Journal of Software: Evolution and Process</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>172</fpage>
          -
          <lpage>219</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>In</surname>
            <given-names>ASE</given-names>
          </string-name>
          '
          <volume>09</volume>
          , pages
          <fpage>88</fpage>
          -
          <lpage>99</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Chris</given-names>
            <surname>Parnin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Orso</surname>
          </string-name>
          .
          <source>Are automated debugging techniques actually helping programmers? In ISSTA'11</source>
          , pages
          <fpage>199</fpage>
          -
          <lpage>209</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Aaron</given-names>
            <surname>Ang</surname>
          </string-name>
          , Alexandre Perez, Arie van Deursen,
          <string-name>
            <given-names>and Rui</given-names>
            <surname>Abreu</surname>
          </string-name>
          .
          <article-title>Revisiting the practical use of automated software fault localization techniques</article-title>
          .
          <source>In IWPD'17</source>
          , pages
          <fpage>175</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Carlos</given-names>
            <surname>Gouveia</surname>
          </string-name>
          , José Campos, and
          <string-name>
            <given-names>Rui</given-names>
            <surname>Abreu</surname>
          </string-name>
          .
          <article-title>Using HTML5 visualizations in software fault localization</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>In</surname>
            <given-names>VISSOFT</given-names>
          </string-name>
          '
          <volume>13</volume>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Kenneth D.</given-names>
            <surname>Forbus</surname>
          </string-name>
          .
          <article-title>Qualitative reasoning</article-title>
          .
          <source>In The Computer Science and Engineering Handbook</source>
          , pages
          <fpage>715</fpage>
          -
          <lpage>733</lpage>
          .
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Brian C.</given-names>
            <surname>Williams</surname>
          </string-name>
          and Johan de Kleer.
          <article-title>Qualitative reasoning about physical systems: A return to roots</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>51</volume>
          (
          <issue>1-3</issue>
          ):
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>René</given-names>
            <surname>Just</surname>
          </string-name>
          , Darioush Jalali, and
          <string-name>
            <given-names>Michael D.</given-names>
            <surname>Ernst</surname>
          </string-name>
          .
          <article-title>Defects4j: a database of existing faults to enable controlled testing studies for java programs</article-title>
          .
          <source>In ISSTA'14</source>
          , pages
          <fpage>437</fpage>
          -
          <lpage>440</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Abreu and Arjan J. C. van Gemund</surname>
          </string-name>
          .
          <article-title>A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis</article-title>
          .
          <source>In SARA'09</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Alexander</surname>
            <given-names>Feldman</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gregory M. Provan</surname>
          </string-name>
          , and
          <string-name>
            <surname>Arjan J. C. van Gemund</surname>
          </string-name>
          .
          <article-title>Computing minimal diagnoses by greedy stochastic search</article-title>
          .
          <source>In AAAI'08</source>
          , pages
          <fpage>911</fpage>
          -
          <lpage>918</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Nuno</given-names>
            <surname>Cardoso</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rui</given-names>
            <surname>Abreu</surname>
          </string-name>
          .
          <article-title>MHS2: A map-reduce heuristic-driven minimal hitting set search algorithm</article-title>
          .
          <source>In MUSEPAT'13</source>
          , pages
          <fpage>25</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Rui</surname>
            <given-names>Abreu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Peter</given-names>
            <surname>Zoeteweij</surname>
          </string-name>
          , and
          <string-name>
            <surname>Arjan J. C. van Gemund</surname>
          </string-name>
          .
          <article-title>A new bayesian approach to multiple intermittent fault diagnosis</article-title>
          .
          <source>In IJCAI'09</source>
          , pages
          <fpage>653</fpage>
          -
          <lpage>658</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <surname>John</surname>
            <given-names>Carey</given-names>
          </string-name>
          , Neil Gross, Marcia Stepanek, and
          <string-name>
            <given-names>Otis</given-names>
            <surname>Port</surname>
          </string-name>
          .
          <article-title>Software hell</article-title>
          .
          <source>In Business Week</source>
          , pages
          <fpage>391</fpage>
          -
          <lpage>411</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Johan de Kleer</surname>
          </string-name>
          .
          <article-title>Multiple representations of knowledge in a mechanics problem-solver</article-title>
          .
          <source>In IJCAI'77</source>
          , pages
          <fpage>299</fpage>
          -
          <lpage>304</lpage>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Kuipers</surname>
          </string-name>
          .
          <article-title>Qualitative simulation</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>29</volume>
          (
          <issue>3</issue>
          ):
          <fpage>289</fpage>
          -
          <lpage>338</lpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Nuno</given-names>
            <surname>Cardoso</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rui</given-names>
            <surname>Abreu</surname>
          </string-name>
          .
          <article-title>A kernel density estimate-based approach to component goodness modeling</article-title>
          .
          <source>In AAAI'13</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Amir</surname>
            <given-names>Elmishali</given-names>
          </string-name>
          , Roni Stern, and
          <string-name>
            <given-names>Meir</given-names>
            <surname>Kalech</surname>
          </string-name>
          .
          <article-title>Dataaugmented software diagnosis</article-title>
          .
          <source>In AAAI'16</source>
          , pages
          <fpage>4003</fpage>
          -
          <lpage>4009</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Xiaofeng</surname>
            <given-names>Xu</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Vidroha</given-names>
            <surname>Debroy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Eric</given-names>
            <surname>Wong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Donghui</given-names>
            <surname>Guo</surname>
          </string-name>
          .
          <article-title>Ties within fault localization rankings: Exposing and addressing the problem</article-title>
          .
          <source>International Journal of Software Engineering and Knowledge Engineering</source>
          ,
          <volume>21</volume>
          (
          <issue>6</issue>
          ):
          <fpage>803</fpage>
          -
          <lpage>827</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>G. N.</given-names>
            <surname>Stenbakken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Souders</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. W.</given-names>
            <surname>Stewart</surname>
          </string-name>
          .
          <article-title>Ambiguity groups and testability</article-title>
          .
          <source>IEEE Transactions on Instrumentation and Measurement</source>
          ,
          <volume>38</volume>
          (
          <issue>5</issue>
          ):
          <fpage>941</fpage>
          -
          <lpage>947</lpage>
          ,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Debra</surname>
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Richardson</surname>
            and
            <given-names>Margaret C.</given-names>
          </string-name>
          <string-name>
            <surname>Thompson</surname>
          </string-name>
          .
          <article-title>An analysis of test data selection criteria using the RELAY model of fault detection</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          ,
          <volume>19</volume>
          (
          <issue>6</issue>
          ):
          <fpage>533</fpage>
          -
          <lpage>553</lpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Dan</given-names>
            <surname>Pelleg</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew W.</given-names>
            <surname>Moore</surname>
          </string-name>
          .
          <article-title>X-means: Extending k-means with efficient estimation of the number of clusters</article-title>
          .
          <source>In ICML'00</source>
          , pages
          <fpage>727</fpage>
          -
          <lpage>734</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Alexandre</surname>
            <given-names>Perez</given-names>
          </string-name>
          , Rui Abreu, and
          <string-name>
            <given-names>André</given-names>
            <surname>Riboira</surname>
          </string-name>
          .
          <article-title>A dynamic code coverage approach to maximize fault localization efficiency</article-title>
          .
          <source>Journal of Systems and Software</source>
          ,
          <volume>90</volume>
          :
          <fpage>18</fpage>
          -
          <lpage>28</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [24]
          <string-name>
            <surname>José</surname>
            <given-names>Campos</given-names>
          </string-name>
          , André Riboira, Alexandre Perez, and Rui Abreu.
          <article-title>GZoltar: an eclipse plug-in for testing and debugging</article-title>
          .
          <source>In ASE'12</source>
          , pages
          <fpage>378</fpage>
          -
          <lpage>381</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [25]
          <article-title>Andrew Jensen Ko and Brad A. Myers. Designing the whyline: a debugging interface for asking questions about program behavior</article-title>
          .
          <source>In CHI'04</source>
          , pages
          <fpage>151</fpage>
          -
          <lpage>158</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [26]
          <string-name>
            <surname>Higor Amario de Souza</surname>
          </string-name>
          and
          <article-title>Marcos Lordello Chaim</article-title>
          .
          <article-title>Adding context to fault localization with integration coverage</article-title>
          .
          <source>In ASE'13</source>
          , pages
          <fpage>628</fpage>
          -
          <lpage>633</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Marco D'Ambros</surname>
            ,
            <given-names>Michele</given-names>
          </string-name>
          <string-name>
            <surname>Lanza</surname>
            , and
            <given-names>Romain</given-names>
          </string-name>
          <string-name>
            <surname>Robbes</surname>
          </string-name>
          .
          <article-title>An extensive comparison of bug prediction approaches</article-title>
          .
          <source>In MSR'10</source>
          , pages
          <fpage>31</fpage>
          -
          <lpage>41</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Shaowei</given-names>
            <surname>Wang</surname>
          </string-name>
          and
          <string-name>
            <given-names>David</given-names>
            <surname>Lo</surname>
          </string-name>
          .
          <article-title>Version history, similar report, and structure: putting them together for improved bug localization</article-title>
          .
          <source>In ICPC'14</source>
          , pages
          <fpage>53</fpage>
          -
          <lpage>63</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [29]
          <string-name>
            <surname>Xiaoguang</surname>
            <given-names>Mao</given-names>
          </string-name>
          , Yan Lei, Ziying Dai, Yuhua Qi, and
          <string-name>
            <given-names>Chengsong</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Slice-based statistical fault localization</article-title>
          .
          <source>Journal of Systems and Software</source>
          ,
          <volume>89</volume>
          :
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Birgit</given-names>
            <surname>Hofer</surname>
          </string-name>
          and
          <string-name>
            <given-names>Franz</given-names>
            <surname>Wotawa</surname>
          </string-name>
          .
          <article-title>Spectrum enhanced dynamic slicing for better fault localization</article-title>
          .
          <source>In ECAI'12</source>
          , pages
          <fpage>420</fpage>
          -
          <lpage>425</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>