QR-Augmented Spectrum-based Fault Localization *

Alexandre Perez (alexandre.perez@fe.up.pt), University of Porto and HASLab, INESC TEC, Portugal

Rui Abreu, IST, INESC-ID, University of Lisbon, Portugal

* Danny Bobrow

Spectrum-based fault localization (SFL) correlates a system's components with observed failures. By reasoning about coverage, SFL allows for a lightweight way of pinpointing faults. This abstraction comes at the cost of missing certain faults, such as errors of omission, and failing to provide enough contextual information to explain why components are considered suspicious. We propose an approach, named Q-SFL, that leverages qualitative reasoning to augment the information made available to SFL techniques. It qualitatively partitions system components, and treats each qualitative state as a new SFL component to be used when diagnosing. Our empirical evaluation shows that augmenting SFL with qualitative components can improve diagnostic accuracy in 54% of the considered real-world subjects.

Introduction

SFL [1] has been shown to be a lightweight yet effective technique for locating faults in a system. It consists of recording which components are involved in each system execution and subsequently ranking those components according to their similarity to failing executions. The intuition is that a faulty component is very likely to be involved in failing executions and less likely to be covered in nominal ones. Over the years, many extensions have been proposed to improve SFL's applicability and effectiveness, such as exploring which similarity coefficients yield better fault localization results [2] and handling multiple or intermittent faults [3].

Despite the developments and achievements in SFL research, we are unable to find many accounts of successful transitions of this technology into industry at large. We argue that this is largely due to the issues raised by Parnin and Orso in their user study of automated debugging techniques [4; 5]. Namely, the authors found that users' interest drops off significantly after they inspect a small number of components from the ranked list of potential faults. This issue is exacerbated as the scale of the system increases. Another issue pointed out by Parnin and Orso is that many SFL studies assume perfect fault understanding (that is, these studies expect that once users inspect a faulty component, they will correctly identify it as such), which does not always hold in practice [6].

This paper proposes an approach that inspects the state of system components, with the intent of augmenting reports generated by SFL techniques and hence providing more diagnostic information. Recording the exact state of individual components in each execution quickly becomes intractable, even for a lightweight approach such as SFL. Therefore, we leverage Qualitative Reasoning (QR), which describes a set of values by their discrete, behavioral qualities, enabling reasoning about a system's behavior without exact quantitative information [7; 8]. Precise numerical quantities are avoided and replaced by qualitative descriptions such as high, low, zero, increasing, or decreasing.

We apply QR to SFL analysis in an approach named Q-SFL, introducing quantitative landmarks that partition the domains of relevant components into a set of qualitative descriptions, and inserting a new SFL component for each of these descriptions. As behavioral qualities are now considered as components, their involvement in system executions is recorded and ranked according to their similarity to observed failures, enriching the SFL report as a result. This can benefit fault comprehension (because qualitative properties are considered besides mere involvement) and even improve diagnostic report accuracy (whenever a qualitative state is more correlated with failing behavior than its enclosing system component).

We perform an empirical evaluation of Q-SFL with real-world faults from the Defects4J [9] catalog of faulty software programs. Results show that Q-SFL has the potential to improve the accuracy of SFL reports, with 54% of considered subjects exhibiting a lower effort to diagnose faults. Although the results are promising, we discuss several matters that need further research, namely uncovering a landmarking strategy that exhibits consistently better results and studying to what extent fault comprehension is improved. This paper's contributions are:

• An approach, named Q-SFL, inspired by qualitative reasoning (QR) research, to augment program spectra used by SFL techniques by partitioning system components into a set of qualitative states which are treated as SFL components.
• Empirical evidence that QR-enhanced spectra can reduce the effort to diagnose real software bugs in 54% of considered subjects.

Background

This section briefly summarizes the concepts on which our approach is based.

Spectrum-based Fault Localization (SFL)

In SFL, the following is given:

• A finite set $C = \{c_1, \ldots, c_M\}$ of $M$ system components;
• A finite set $T = \{t_1, \ldots, t_N\}$ of $N$ transactions, which can be seen as records of a system execution;
• An error vector $e = \{e_1, \ldots, e_N\}$ of transaction outcomes, where $e_i = 1$ if $t_i$ failed, and $e_i = 0$ otherwise;
• An $N \times M$ activity matrix $A$, where $A_{ij} = 1$ if component $c_j$ is involved in transaction $t_i$, and $A_{ij} = 0$ otherwise.
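To make the notation concrete, the following Python sketch stores a spectrum as an $N \times M$ list of lists plus an error vector, using the values of the example spectrum in Figure 2a. It then scores each component with the Ochiai coefficient, one commonly used similarity coefficient chosen here purely for illustration (the text does not prescribe a particular coefficient); the helper name `ochiai` and all variable names are our own.

```python
import math

# Activity matrix A (N x M): rows are transactions t1..t4, columns are
# components c1..c4. Values are those of the example spectrum in Figure 2a.
A = [
    [1, 0, 1, 0],  # t1
    [1, 0, 1, 0],  # t2
    [1, 1, 1, 1],  # t3
    [1, 1, 0, 0],  # t4
]
e = [1, 1, 0, 0]  # error vector: 1 = failed, 0 = passed


def ochiai(A, e, j):
    """Ochiai similarity between component j's activity column and e."""
    a_ef = sum(1 for row, ei in zip(A, e) if row[j] and ei)      # involved, failed
    a_nf = sum(1 for row, ei in zip(A, e) if not row[j] and ei)  # not involved, failed
    a_ep = sum(1 for row, ei in zip(A, e) if row[j] and not ei)  # involved, passed
    denom = math.sqrt((a_ef + a_nf) * (a_ef + a_ep))
    return a_ef / denom if denom else 0.0


scores = [ochiai(A, e, j) for j in range(len(A[0]))]
# Component indices sorted from most to least suspicious (stable for ties).
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

With these values, $c_3$ is ranked first and $c_1$ second, because $c_1$ is also involved in both passing transactions.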

The goal of SFL is to pinpoint which (sets of) components are more likely to have caused the system to fail. Earlier approaches to SFL measure the similarity between a component's involvement in transactions and the error vector [1; 2]. Later on, spectrum-based reasoning (SR) was introduced [3], leveraging a Bayesian reasoning framework to diagnose, even when multiple, intermittent faults are present. The two main steps of SR are candidate generation and candidate ranking:

Candidate Generation

The first step in SR is to generate a set $D = \{d_1, \ldots, d_k\}$ of diagnosis candidates. A diagnosis candidate $d_k \subseteq C$ is valid if every failed transaction involved at least one component from $d_k$. Candidate $d_k$ is minimal if no other valid candidate is contained in $d_k$.

We are only interested in minimal candidates, as they subsume all others. Heuristic approaches to finding these minimal hitting sets include STACCATO [10], SAFARI [11], and MHS2 [12].
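Candidate generation can be illustrated with an exhaustive search for minimal hitting sets. The sketch below is a brute-force stand-in for heuristics such as STACCATO, SAFARI, or MHS2 and is practical only for toy spectra; the function name `minimal_candidates` is hypothetical and components are indexed from 0.

```python
from itertools import combinations

# Spectrum of Figure 2a (rows: transactions t1..t4; columns: components c1..c4).
A = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
e = [1, 1, 0, 0]


def minimal_candidates(A, e):
    """Enumerate minimal diagnosis candidates by exhaustive search.

    A candidate is valid if it shares at least one component with every failed
    transaction; it is minimal if no smaller valid candidate is contained in it.
    Exponential in the number of components: only for tiny examples.
    """
    m = len(A[0])
    # Components involved in each failed transaction (the sets to be "hit").
    conflicts = [{j for j in range(m) if row[j]} for row, ei in zip(A, e) if ei]
    found = []
    for size in range(1, m + 1):
        for combo in combinations(range(m), size):
            cand = set(combo)
            if any(f <= cand for f in found):
                continue  # contains a smaller valid candidate, so not minimal
            if all(cand & c for c in conflicts):
                found.append(cand)
    return found


print(minimal_candidates(A, e))
```

For the Figure 2a spectrum it returns `[{0}, {2}]`, i.e., the single-component candidates $\{c_1\}$ and $\{c_3\}$.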

Candidate Ranking

For each candidate $d_k$, its fault probability is calculated using the Naïve Bayes rule¹ [13]:

$$\Pr(d_k \mid (A, e)) = \Pr(d_k) \cdot \prod_{i \in 1..N} \frac{\Pr((A_i, e_i) \mid d_k)}{\Pr(A_i)} \tag{1}$$

Let $A_i$ be short for $\{A_{ij} \mid j \in 1..M\}$, representing component activity in the $i$-th transaction. $\Pr(A_i)$ is a normalizing term that is identical for all candidates. Let $p_l$ denote the prior probability² that a component $c_l$ is at fault. The prior probability for a candidate $d_k$ is given by

$$\Pr(d_k) = \prod_{l \in d_k} p_l \cdot \prod_{l \in C \setminus d_k} (1 - p_l) \tag{2}$$

$\Pr((A_i, e_i) \mid d_k)$ is used to bias the prior probability, taking observations into account. Let $g_l$ (referred to as component goodness) denote the probability that $c_l$ performs nominally:

$$\Pr((A_i, e_i) \mid d_k) = \begin{cases} \prod_{l \in (d_k \cap A_i)} g_l & \text{if } e_i = 0 \\ 1 - \prod_{l \in (d_k \cap A_i)} g_l & \text{otherwise} \end{cases} \tag{3}$$

In cases where values for $g_l$ are not available, they can be estimated by maximizing $\Pr((A, e) \mid d_k)$ under parameters $\{g_l \mid l \in d_k\}$ [13].

¹ Probabilities are calculated assuming conditional independence throughout the process.
² In the case of software diagnosis, one can approximate $p_l$ as 1/1000, i.e., 1 fault for each 1000 lines of code [14].
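Candidate ranking with Eqs. (1) to (3) can be sketched as follows. The sketch assumes a uniform prior $p_l = 1/1000$ and, as a simplification, approximates the maximum-likelihood estimate of each $g_l$ with a coarse grid search instead of a proper numerical optimizer. Because $\Pr(A_i)$ is identical for all candidates, posteriors are simply normalized over the candidate set. The helper names `likelihood` and `rank` are hypothetical.

```python
from itertools import product

# Spectrum of Figure 2a and its two minimal candidates, {c1} and {c3} (0-based).
A = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
e = [1, 1, 0, 0]
candidates = [{0}, {2}]


def likelihood(A, e, comps, goodness):
    """Pr((A, e) | d) as the product of Eq. (3) over all transactions."""
    lik = 1.0
    for row, ei in zip(A, e):
        g = 1.0
        for j in comps:
            if row[j]:  # only components of d that are active in this transaction
                g *= goodness[j]
        lik *= g if ei == 0 else 1.0 - g
    return lik


def rank(A, e, candidates, p=1e-3, steps=99):
    """Posterior of each candidate, normalized over the candidate set."""
    m = len(A[0])
    grid = [(k + 1) / (steps + 1) for k in range(steps)]  # goodness values in (0, 1)
    scores = []
    for cand in candidates:
        comps = sorted(cand)
        prior = p ** len(comps) * (1 - p) ** (m - len(comps))  # Eq. (2), uniform p_l
        # Grid search stands in for the maximum-likelihood estimate of each g_l.
        best = max(
            likelihood(A, e, comps, dict(zip(comps, values)))
            for values in product(grid, repeat=len(comps))
        )
        scores.append(prior * best)
    total = sum(scores)  # normalizing over candidates cancels the Pr(A_i) terms
    return [s / total for s in scores]


print([round(x, 2) for x in rank(A, e, candidates)])
```

On the Figure 2a spectrum this yields approximately 0.30 for $\{c_1\}$ and 0.70 for $\{c_3\}$, in line with the probabilities reported for that example.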

To measure the accuracy of SFL approaches, the cost of diagnosis ($C_d$) metric is often used [4]. It measures the number of candidates to be inspected until the real fault is reached, assuming candidates are inspected in descending order of probability. A $C_d$ of 0 indicates an ideal diagnosis, where the real fault is at the top of the ranked list of candidates.
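The cost of diagnosis can be computed directly from a ranked report. In this sketch, `cost_of_diagnosis` is a hypothetical helper, and the treatment of ties (each candidate tied with the fault counts as half an inspection) is our assumption, since the text does not specify one.

```python
def cost_of_diagnosis(probs, fault):
    """Number of candidates inspected before the real fault, when candidates
    are inspected in descending order of probability.

    Tie handling is an assumption of this sketch: candidates tied with the
    fault count as half an inspection each (an average-case position).
    """
    p = probs[fault]
    higher = sum(1 for c, q in probs.items() if c != fault and q > p)
    tied = sum(1 for c, q in probs.items() if c != fault and q == p)
    return higher + tied / 2


# Figure 2a, assuming c1 is the faulty component: c3 is inspected first.
print(cost_of_diagnosis({"c1": 0.30, "c3": 0.70}, "c1"))
```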

Qualitative Reasoning (QR)

QR creates a discrete representation of the continuous world [7; 8; 15], enabling reasoning about space, time, and quantity with only a small amount of information. It is motivated by the fact that humans are able to draw conclusions about the physical world around them with limited information, without needing to solve complex differential equations.

Figure 1 provides an example of a possible discretization of water temperature into three qualitative values: Q1, Q2, and Q3. Our representation resolution (the granularity of the information detail) coincides with the three physical states of matter that water can assume: solid, liquid, and gas. Note that the established resolution ultimately defines the granularity of the conclusions one can draw from QR. To define the qualitative states, one needs to establish landmarks. Landmarks are constant quantitative values that establish a point of comparison [16]. In this example, we know that if water is in the liquid state (Q2), its temperature is somewhere between landmark L2 (0°C, the freezing point of water) and landmark L3 (100°C, its boiling point). Similarly, we can derive that the temperature of ice (Q1) lies between absolute zero (L1) and the melting point (L2), and that the temperature of water vapor (Q3) ranges between the condensation point (L3) and positive infinity (L4).
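The landmark-based discretization of Figure 1 amounts to locating a value between consecutive landmarks. The sketch below uses Python's `bisect` module; the names `LANDMARKS`, `STATES`, and `qualitative_state` are our own, and assigning a value lying exactly on a landmark to the upper state is an arbitrary choice of this sketch.

```python
from bisect import bisect_right
import math

# Landmarks of Figure 1 in degrees Celsius: L1 = absolute zero, L2 = freezing
# point, L3 = boiling point, L4 = +infinity.
LANDMARKS = [-273.15, 0.0, 100.0, math.inf]
STATES = ["Q1", "Q2", "Q3"]  # ice, liquid water, vapor


def qualitative_state(temp_c):
    """Map a quantitative temperature to the qualitative state between landmarks.

    Intervals are taken as [L_k, L_{k+1}), so a value equal to a landmark is
    placed in the upper state (an arbitrary choice of this sketch).
    """
    if not LANDMARKS[0] <= temp_c < LANDMARKS[-1]:
        raise ValueError("temperature outside the modeled domain")
    return STATES[bisect_right(LANDMARKS, temp_c) - 1]


print([qualitative_state(t) for t in (-10.0, 25.0, 120.0)])
```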

QR also supports the representation of derivatives of quantities. These are usually represented with '+' and '-' signs, denoting increasing and decreasing values, respectively. This enables the use of sign algebra to reason about direct influence and proportionality between two qualitative values. Derivatives also enable envisionments. An envisionment establishes a set of transitions between qualitative states [15], essentially modeling the abstracted world. A possible transition in our example's envisionment is the following: given that we observe Q2+ (that is, we observe that the liquid water's temperature is rising), we know that the only possible following states are Q2 (the water remains liquid) and Q3 (it vaporizes), but never Q1 (it freezes into ice).
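The envisionment rule from the example (a rising quantity may stay in its state or move to the next state up, but never skip a state or move down) can be written as a successor function over ordered states. This is a minimal sketch: `successors` is a hypothetical helper, and it handles only '+' and '-' derivative signs, ignoring steady (zero-derivative) quantities.

```python
STATES = ["Q1", "Q2", "Q3"]  # ordered qualitative states: ice, liquid, vapor


def successors(state, sign):
    """Possible next qualitative states given the sign of the derivative.

    A rising ('+') quantity may stay or move to the next state up; a falling
    ('-') one may stay or move one state down. States are never skipped.
    """
    i = STATES.index(state)
    if sign == "+":
        return {state} | ({STATES[i + 1]} if i + 1 < len(STATES) else set())
    if sign == "-":
        return {state} | ({STATES[i - 1]} if i > 0 else set())
    raise ValueError("sign must be '+' or '-'")


print(sorted(successors("Q2", "+")))
```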

In summary, with a QR framework, we establish a way to (i) represent quantities through discrete states, (ii) compare values between these states, (iii) use derivatives and sign algebra, and (iv) model envisionments detailing possible transitions between states. With such a framework, we can model, plan, simulate, and reason about a multitude of intricate problems in an abstract way.

Approach

This section motivates the need to augment standard spectrum-based fault localization approaches with more contextual information, and details our Q-SFL approach to achieve it by qualitatively partitioning SFL components.

Limitations of Spectrum-based Analyses

SFL faces several issues that prevent its widespread adoption and use. Chief among them is the lack of contextual information, which is essential for understanding why diagnostic candidates are considered suspicious. This has been pointed out in the software fault localization literature [4].

As SFL reasons about failures at the spectrum level, it only has access to whether a component was involved or not in each system transaction. While this enables a flexible, lightweight analysis, the necessary abstraction imposes a tradeoff in both accuracy and comprehension. Although there have been efforts to incorporate more data into the diagnostic process, either by modeling component behavior that considers the system's state and previous diagnoses [17] or by leveraging prediction models trained on issue tracking data [18], they focused on conditioning the fault probability of existing diagnostic candidates, increasing accuracy but not necessarily the ability to comprehend the diagnostic report. Aside from comprehension, because SFL only reasons about component involvement in failing transactions, errors of omission (such as missing bound checks) are also difficult to diagnose [19].

The abstract nature of the spectra fed into current SFL frameworks also leads to the formation of ambiguity groups and facilitates the occurrence of coincidental correctness. An ambiguity group is a group of components with identical involvement in all transactions [20]. Since they exhibit the same coverage pattern, no component within an ambiguity group can be uniquely identified as the root cause of failure, potentially hindering accuracy. Coincidental correctness refers to the event where no failure is detected even though a fault has been executed [21]. Depending on the component granularity selected for the analysis, coincidental correctness can happen frequently. In particular, when two tests share the same coverage path but produce different outcomes, it becomes significantly harder to distinguish them without further contextual information. Coincidental correctness can potentially lead to exonerating real faults, as they are observed to behave nominally.
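Ambiguity groups can be detected by grouping components whose activity columns are identical. The sketch below uses a small hypothetical spectrum in which components `m1` and `m2` are always executed together; the function name `ambiguity_groups` is our own.

```python
from collections import defaultdict


def ambiguity_groups(A, names):
    """Group components whose activity column is identical in every transaction."""
    groups = defaultdict(list)
    for j, name in enumerate(names):
        column = tuple(row[j] for row in A)  # involvement pattern of component j
        groups[column].append(name)
    return [g for g in groups.values() if len(g) > 1]


# Hypothetical spectrum (rows: transactions; columns: m1, m2, m3).
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 0, 1],
]
print(ambiguity_groups(A, ["m1", "m2", "m3"]))
```

No ranking computed from this spectrum alone can separate `m1` from `m2`; adding qualitative components with different involvement patterns is one way to break such ties.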

Q-SFL

We argue that all of the issues described above can at least be attenuated if we supplement the SFL framework with more contextual information about the system under analysis. Our Q-SFL approach consists of partitioning several SFL components into multiple, meaningful, qualitatively distinct subcomponents, to be used in fault localization. We leverage the QR concept of domain partitioning to inspect existing components during each system execution and assign them a qualitative state. Each of these qualitative states is then considered as a separate SFL component whose involvement per transaction is recorded and fed into the SFL framework for diagnosis. Note that we use software diagnosis to describe Q-SFL and, later on, to evaluate it. Nevertheless, the Q-SFL concept applies to other SFL use cases where one can inspect the state of components, such as electronic circuit diagnosis, among others.

(a) Regular spectrum.

A      t1  t2  t3  t4
c1      1   1   1   1
c2      0   0   1   1
c3      1   1   1   0
c4      0   0   1   0
e       1   1   0   0

Pr({c1} | (A, e)) = 0.30
Pr({c3} | (A, e)) = 0.70

(b) QR-augmented spectrum.

A      t1  t2  t3  t4
c1      1   1   1   1
c1'     0   0   1   1
c1''    1   1   0   0
c2      0   0   1   1
c3      1   1   1   0
c3'     1   0   0   0
c3''    0   1   1   0
c4      0   0   1   0
e       1   1   0   0

Figure 2 depicts how QR can be employed to enhance program spectra. Figure 2a shows a 4-transaction by 4-component spectrum, along with the diagnostic scores that result from applying reasoning-based SFL. Candidate generation, following the methodology described in Section 2.1, yields two candidate diagnoses: components $c_1$ and $c_3$ can each independently explain the observed failures, as both are involved in every failing transaction. For this example, suppose that $c_1$ is the faulty component. Since $c_1$ is involved in more passing transactions, the SFL framework assigns it a lower fault probability than $c_3$. To improve the accuracy of the SFL framework, one needs more contextual information about component executions.

We envision three different types of landmarking strategies that can be employed to define qualitative state boundaries: (i) manual landmarking, where the system's developers manually define what are the possible qualitative states for a given component; (ii) static landmarking, where landmarks depend on the type of a component; and (iii) dynamic landmarking, where a component's value is inspected at runtime, and partitioned into a set of categories. Examples of dynamic strategies will be presented in Section 4.

Figure 2b depicts the QR-augmented spectrum, where components representing qualitative partitions of both $c_1$ and $c_3$ are added to the original spectrum. As an example of such partitioning using static landmarking: if $c_1$ represents a software procedure that contains a numeric parameter $i$, we can create two qualitative components, $c_1'$ and $c_1''$, that represent invocations of $c_1$ with $i \geq 0$ and $i < 0$, respectively. This is a sign-based static partitioning strategy. Note that the original components $c_1$ and $c_3$ are not removed from the QR-augmented spectrum, as partitions may not provide further fault isolation.

If we diagnose the new spectrum from Figure 2b, component $c_1''$ is now the top-ranked diagnostic candidate. This QR-augmented spectrum avoids spurious inspections of component $c_3$ and provides additional contextual information about the fault, namely that $i < 0$ is observed in failing transactions.
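The sign-based static partitioning of $c_1$ can be sketched as an operation that appends two qualitative columns to the activity matrix while keeping the original one. The recorded values of $i$ are hypothetical, chosen so that the result matches the $c_1'$ and $c_1''$ rows of Figure 2b; only $c_1$ is partitioned, and for brevity the components are ranked with the Ochiai similarity coefficient rather than the reasoning-based ranking used for Figure 2. The helper `sign_partition` is our own.

```python
import math

# Regular spectrum of Figure 2a (rows: t1..t4; columns: c1..c4).
A = [[1, 0, 1, 0], [1, 0, 1, 0], [1, 1, 1, 1], [1, 1, 0, 0]]
e = [1, 1, 0, 0]
names = ["c1", "c2", "c3", "c4"]

# Hypothetical values of c1's parameter i: one list of invocations per transaction.
i_values = [[-3], [-1], [2], [5, 7]]


def sign_partition(A, names, col, values):
    """Append qualitative components for column `col`: i >= 0 and i < 0.

    The original column is kept, as in Q-SFL. A transaction involves a
    qualitative component if any invocation in it falls into that partition.
    """
    base = names[col]
    new_A = []
    for row, vals in zip(A, values):
        nonneg = int(bool(row[col]) and any(v >= 0 for v in vals))
        neg = int(bool(row[col]) and any(v < 0 for v in vals))
        new_A.append(row + [nonneg, neg])
    return new_A, names + [base + "'", base + "''"]


def ochiai(column, e):
    """Ochiai similarity (a stand-in for reasoning-based ranking)."""
    a_ef = sum(1 for a, ei in zip(column, e) if a and ei)
    a_nf = sum(1 for a, ei in zip(column, e) if not a and ei)
    a_ep = sum(1 for a, ei in zip(column, e) if a and not ei)
    denom = math.sqrt((a_ef + a_nf) * (a_ef + a_ep))
    return a_ef / denom if denom else 0.0


QA, qnames = sign_partition(A, names, 0, i_values)
scores = {n: ochiai([row[j] for row in QA], e) for j, n in enumerate(qnames)}
print(max(scores, key=scores.get))
```

Under these assumptions, $c_1''$ (invocations with $i < 0$) obtains the highest score, reproducing the effect described above.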

By landmarking data units associated with SFL components so that they are assigned a qualitative state at runtime, we are providing more context to the diagnostic process, and in some cases, consequently reducing the diagnostic effort. Such partitioning is also of crucial importance towards minimizing the impact and frequency of ambiguity grouping and coincidental correctness, as new, distinct components are added to the system's spectrum.

Evaluation

To evaluate our approach, we compare the cost of diagnosing a collection of faulty software programs using regular spectra against using QR-augmented spectra.

Methodology

We have sourced experimental subjects from the Defects4J³ (D4J) database. D4J is a catalog of 395 real, reproducible software bugs from 6 open-source projects, namely JFreeChart, Google Closure compiler, Apache Commons Lang, Apache Commons Math, Mockito, and Joda-Time. For each bug, a developer-written, fault-revealing test suite is made available.

We run the fault-revealing test suite of each buggy D4J subject, gathering method-level coverage and test outcomes, to construct its spectrum. Besides coverage, we also record the value of all primitive-type arguments and return values for every method call. This enables us to experiment with different qualitative partitioning strategies in an offline manner.

Using the recorded argument and return value data, we create multiple (automated) partitioning models, resulting in several Q-SFL variants. We considered a static partitioning variant that uses automated sign partitioning based on the variable's type, as described in Section 3.2. For dynamic partitioning, several clustering and classification algorithms⁴ were considered: k-NN, linear classification, logistic regression, decision trees, random forest, and x-means clustering. Test outcomes are used as the class labels in the case of supervised models. Note that we are not using the aforementioned models for prediction, but rather as a partitioning scheme based on observed values. Hence, we do not split our data into training and test sets, as is customary in prediction scenarios. Because we use automated, domain-independent partitioning, only primitive types are considered in the evaluation.
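As an illustration of a supervised, outcome-driven landmark, the following sketch finds the single threshold on a parameter that best separates invocations from failing and passing tests, i.e., the split a depth-1 decision tree (a decision stump) would choose under Gini impurity. It is a simplified stand-in for the decision-tree strategy, not the Scikit-learn configuration used in the evaluation; the helper names and the recorded values are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (1 = failing test)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def stump_landmark(values, outcomes):
    """Threshold on one parameter that best separates failing from passing
    outcomes, i.e. the split a depth-1 decision tree would choose under Gini."""
    pairs = sorted(zip(values, outcomes))
    best_score, best_threshold = float("inf"), None
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold fits strictly between equal values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [o for _, o in pairs[:k]]
        right = [o for _, o in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_threshold = score, threshold
    return best_threshold


# Hypothetical recorded argument values, each labelled with the outcome of the
# test in which the invocation occurred.
values = [-3, -1, 2, 5, 7]
outcomes = [1, 1, 0, 0, 0]
print(stump_landmark(values, outcomes))
```

For these values the landmark is 0.5, producing the qualitative states $i < 0.5$ and $i \geq 0.5$; as in the evaluation, the model is only used to partition observed values, not to predict outcomes.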

To evaluate a QR-enhanced spectrum against its respective original spectrum, we first reduce the Q-SFL diagnostic report to method components. This reduction is done by considering the highest fault probability of any subcomponent belonging to each method, to effectively be able to compare method-level diagnostic effort between the two approaches. A change in diagnostic effort is measured using

$$\Delta C_d = C_d(\text{Original}) - C_d(\text{QR-Enhanced}) \tag{4}$$

where $C_d$ is the cost of diagnosis, as explained in Section 2.1. A positive $\Delta C_d$ means that the faulty component has risen in the ranking reported by SFL techniques when QR is used, lowering the cost of diagnosis.
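The method-level reduction and Eq. (4) can be sketched as follows. The mapping from qualitative subcomponents to their enclosing methods and the Q-SFL probabilities are made up for illustration, and this simplified, hypothetical `cost_of_diagnosis` counts only candidates ranked strictly above the fault (an optimistic tie-breaking assumption).

```python
def reduce_to_methods(report, owner):
    """Collapse a Q-SFL report to method level: each method takes the highest
    probability among its own entry and its qualitative subcomponents."""
    reduced = {}
    for comp, prob in report.items():
        method = owner.get(comp, comp)  # plain method components map to themselves
        reduced[method] = max(reduced.get(method, 0.0), prob)
    return reduced


def cost_of_diagnosis(probs, fault):
    """Candidates ranked strictly above the fault (ties ignored for brevity)."""
    return sum(1 for c, q in probs.items() if c != fault and q > probs[fault])


owner = {"c1'": "c1", "c1''": "c1", "c3'": "c3", "c3''": "c3"}
original = {"c1": 0.30, "c3": 0.70}  # Figure 2a
qr_report = {"c1": 0.05, "c1''": 0.80, "c3": 0.12, "c3'": 0.01, "c3''": 0.02}  # made up

qr_methods = reduce_to_methods(qr_report, owner)
delta = cost_of_diagnosis(original, "c1") - cost_of_diagnosis(qr_methods, "c1")  # Eq. (4)
print(qr_methods, delta)
```

Here the faulty method $c_1$ moves from second to first place, so $\Delta C_d = 1$.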

Results and Discussion

We were able to automatically partition the faulty method in 167 D4J subjects. The remaining D4J subjects were discarded because (i) the faulty method has no parameters and does not return a value; (ii) the faulty method only has non-primitive, non-null, complex-typed parameters, which cannot be handled by the partitioning strategies described in Section 4.1; or (iii) these partitioning strategies were unable to create qualitative states whose coverage differs from that of their enclosing method.

Our first research question is RQ1: Does augmenting spectra with qualitative components improve their diagnosability?

In RQ1, we are concerned with finding whether there exist qualitative partitionings able to improve the fault localization ranking to the extent that faulty components are inspected earlier, thus reducing wasted developer effort in a debugging task. Hence, for each D4J subject, we choose as the landmarking strategy the one that creates the largest set of distinct, non-ambiguous qualitative components out of the faulty method(s). The breakdown of selected partitioning strategies per subject is as follows: Sign partitioning: 102 (61% of subjects); X-means: 25 (15%); k-NN: 8 (5%); Linear Regression: 1 (1%); Logistic Regression: 4 (2%); Decision Tree: 11 (7%); Random Forest: 16 (10%). Our default sign-partitioning strategy was used to qualitatively enhance the majority of considered subjects, while other strategies, such as linear classification and logistic regression, were rarely selected. We believe that supervised learning approaches (which were fed test case outcomes as the target class label) were selected in only 40 subjects (24%) because test suites often contain far fewer failing tests than passing ones, which weakens the resulting partitioning model.

Figure 3 shows a scatter plot of the $\Delta C_d$ of all subjects under analysis. Shown on a red background are the 15 subjects (9%) with a negative $\Delta C_d$, meaning that the report's accuracy decreased after augmenting the spectra; most of these subjects belong to the Closure project. The 62 subjects (37%) with $\Delta C_d = 0$, where the faulty component remained in the same ranking position, are shown on a white background. Lastly, the 90 subjects (54%) that exhibited a positive $\Delta C_d$ (cases where QR-enhanced spectra improved diagnosability) are shown in green. All in all, Q-SFL is at least as effective as the original approach in 152 of the 167 subjects (91%).

Table 1 presents statistics computed to assess whether the observed metrics yield statistically significant results. QR-enhanced spectra exhibit an overall lower effort to diagnose than the original spectra, with less variance. To assess significance, we first performed the Shapiro-Wilk test for normality of effort data in both the original spectra case and the QR case. With 99% confidence, the test's results tell us that the distributions are not normal. Given that $C_d$ is not normally distributed and that each observation is paired, we use the non-parametric Wilcoxon signed-rank test. Our null hypothesis is that the median difference between the two observations (i.e., $\Delta C_d$) is zero. The fifth row of Table 1 shows the resulting Z statistic and p-value of Wilcoxon's test. With 99% confidence, we reject the null hypothesis.

RQ1? Yes, augmenting faulty spectra with new components resulting from qualitative landmarking of method parameter (and method return) values yields a statistically significant improvement in the diagnostic report.

Table 1: Statistical tests.

                          Original spectra             QR-enhanced spectra
Mean $C_d$                60.28                        37.56
Median $C_d$              6.00                         2.50
$C_d$ variance            2.10×10^4                    1.56×10^4
Shapiro-Wilk              W = 0.46, p = 2.20×10^-22    W = 0.32, p = 1.10×10^-24
Wilcoxon signed-rank      Z = 5.45, p = 5.10×10^-10 (paired test over both spectra)

To answer RQ1, we selected for each subject the strategy with the highest number of qualitative partitions targeting the faulty method, as we were only concerned with the existence of a partitioning strategy that would improve diagnosability. However, in practice, it is not realistic to know a priori what the faulty method is⁵. Hence, our second research question is RQ2: Is there a particular automated landmarking strategy that consistently shows improved diagnosability?

⁵ Although some effort has been put forth to hierarchically debug programs using SFL [23].

Figure 4 shows a breakdown of the number of subjects that fall into the $\Delta C_d < 0$, $\Delta C_d = 0$, and $\Delta C_d > 0$ categories for every partitioning strategy considered in this evaluation. This bar plot shows that no single strategy achieves as many positive-$\Delta C_d$ scenarios as the partition-cardinality selection scheme used to answer RQ1 and to produce Figure 3. Furthermore, the strategies that were often picked by that criterion (namely, sign partitioning and X-means) also show more negative-$\Delta C_d$ scenarios than the others. We therefore conclude that no single strategy (out of the ones analyzed) consistently improves diagnoses.

RQ2? No. At least for the automated landmarking strategies considered in the evaluation, there is no evidence that a single automated strategy can consistently outperform the original spectra. However, since Q-SFL can improve diagnosability, as per the answer to RQ1, we conjecture that manual or more complex, context-aware, automated white-box strategies (which can perform static and dynamic source code analysis) are better suited to outperform the original spectra, thanks to more effective and better-informed partitioning.

Related Work

There have been previous forays into enhancing the diagnostic reports of automated fault localization techniques to improve either their accuracy or the comprehension of the failing component.

SFL approaches to debugging typically present their report to users as a list of suspicious components sorted by their likelihood of being faulty. Jones et al. proposed a visual way of depicting the results of a similarity-based SFL diagnosis, color-coding each component according to its suspiciousness score [1]. Campos et al. expand on the visual concept with tree-based visualizations that exploit the tree-like structure of Java code, naturally aggregating neighboring components and aiding the exploration of suspicious code regions [24]. Another approach to improving fault comprehension is Whyline, proposed by Ko and Myers, which lets users gather evidence about the program's execution before forming an explanation of the cause, by allowing them to ask "why did" and "why didn't" questions about program output [25].

de Souza and Chaim proposed an extension to SFL to improve comprehension. It uses integration coverage data, capturing method invocation pairs, to guide the fault localization process. By calculating the fault likelihood of component pairs, the authors generate roadmaps for component investigation, guiding users through likely faulty paths and increasing the amount of contextual cues [26].

Advancements in bug prediction [27] have enabled its use within automated fault localization processes. Wang and Lo propose an ensemble approach to fault localization that exploits information from versioning systems, bug tracking repositories, and structured information retrieval from the source code [28]. Cardoso and Abreu rely on kernel density estimation models of component behavior and previous diagnoses to better estimate the component goodness parameter in spectrum-based reasoning [17]. Elmishali et al. also modify the traditional spectrum-based reasoning framework, using a fault prediction model trained on historical information from the project's versioning system and bug tracker to compute the prior probability distribution of diagnostic candidates [18].

Augmenting fault localization via slicing has also been proposed. Mao et al. proposed using dynamic backward slices (comprising the statements that directly or indirectly affect the computation of the output value through data- or control-dependency chains) as components in similarity-based SFL [29]. Hofer and Wotawa propose an approach based on model-based slicing-hitting-set computation, which computes the dynamic slices of all faulty variables in all failed test cases, derives minimal diagnostic candidates from the slices, and computes a fault probability for each statement based on the number of diagnoses that contain it [30].

Conclusion

This paper proposes a new approach to spectrum-based fault localization that leverages qualitative reasoning (QR). The Q-SFL approach splits components from the software system under analysis into a set of qualitative states through the creation of qualitative landmarks that partition a component's domain. These qualitative states are then considered as SFL components to be ranked using traditional fault localization methodologies. Since we treat these qualitative states as components, our diagnostic reports not only recommend likely fault locations, but also provide insight into what behaviors the faulty components exhibit when failures are detected, facilitating the comprehension of the fault.

We evaluate the approach on subjects from the Defects4J catalog of real faults from medium and large-sized open-source software projects. Results show that spectra augmented through qualitative partitioning of method parameters show a (statistically significant) improvement in diagnostic accuracy in 54% of scenarios. However, we also found no evidence of an automated partitioning strategy that is consistently better than the original spectra, meaning that more intricate, context-aware partitioning strategies will likely be necessary for practical applications of the approach.

This work lays the first stone in a series of efforts to more deeply integrate reasoning-based AI approaches into spectrum-based fault localization. It paves the way for further efforts by the fault localization research community, namely by:

1. Improving automated landmarking by expanding its application to complex, non-primitive objects and by exploring ensembles of multiple strategies.
2. Conducting a systematic user study investigating the extent to which qualitative domain partitioning aids fault understanding.

Figure 1: Example of a possible qualitative discretization of water temperature.

Figure 2: Example of coverage partitioning via QR.

Figure 3: Difference in $C_d$ between original and QR-enhanced spectra per subject.

³ Defects4J 1.1.0 is available at https://github.com/rjust/defects4j (accessed June 2018).
⁴ We chose popular classification algorithms implemented in the Scikit-learn package. X-means, as implemented in the pyclustering package, was selected as it can automatically decide the optimal number of clusters to use [22].

Acknowledgments

This material is based upon work supported by the scholarship number SFRH/BD/95339/2013 from Fundação para a Ciência e Tecnologia (FCT).

References

[1] James A. Jones, Mary Jean Harrold, John T. Stasko. Visualization of test information to assist fault localization. ICSE'02, 2002.
[2] Lucia, David Lo, Lingxiao Jiang, Ferdian Thung, Aditya Budi. Extended comprehensive study of association measures for fault localization. Journal of Software: Evolution and Process 26(2), 2014.
[3] Rui Abreu, Peter Zoeteweij, Arjan J. C. van Gemund. Spectrum-based multiple fault localization. ASE'09, 2009.
[4] Chris Parnin, Alessandro Orso. Are automated debugging techniques actually helping programmers? ISSTA'11, 2011.
[5] Aaron Ang, Alexandre Perez, Arie van Deursen, Rui Abreu. Revisiting the practical use of automated software fault localization techniques. IWPD'17, 2017.
[6] Carlos Gouveia, José Campos, Rui Abreu. Using HTML5 visualizations in software fault localization. VISSOFT'13, 2013.
[7] Kenneth D. Forbus. Qualitative reasoning. The Computer Science and Engineering Handbook, 1997.
[8] Brian C. Williams, Johan de Kleer. Qualitative reasoning about physical systems: A return to roots. Artificial Intelligence 51(1-3), 1991.
[9] René Just, Darioush Jalali, Michael D. Ernst. Defects4J: a database of existing faults to enable controlled testing studies for Java programs. ISSTA'14, 2014.
[10] Rui Abreu, Arjan J. C. van Gemund. A low-cost approximate minimal hitting set algorithm and its application to model-based diagnosis. SARA'09, 2009.
[11] Alexander Feldman, Gregory M. Provan, Arjan J. C. van Gemund. Computing minimal diagnoses by greedy stochastic search. AAAI'08, 2008.
[12] Nuno Cardoso, Rui Abreu. MHS2: A map-reduce heuristic-driven minimal hitting set search algorithm. MUSEPAT'13, 2013.
[13] Rui Abreu, Peter Zoeteweij, Arjan J. C. van Gemund. A new Bayesian approach to multiple intermittent fault diagnosis. IJCAI'09, 2009.
[14] John Carey, Neil Gross, Marcia Stepanek, Otis Port. Software hell. Business Week, 1999.
[15] Johan de Kleer. Multiple representations of knowledge in a mechanics problem-solver. IJCAI'77, 1977.
[16] Benjamin Kuipers. Qualitative simulation. Artificial Intelligence 29(3), 1986.
[17] Nuno Cardoso, Rui Abreu. A kernel density estimate-based approach to component goodness modeling. AAAI'13, 2013.
[18] Amir Elmishali, Roni Stern, Meir Kalech. Data-augmented software diagnosis. AAAI'16, 2016.
[19] Xiaofeng Xu, Vidroha Debroy, W. Eric Wong, Donghui Guo. Ties within fault localization rankings: Exposing and addressing the problem. International Journal of Software Engineering and Knowledge Engineering 21(6), 2011.
[20] G. N. Stenbakken, T. M. Souders, G. W. Stewart. Ambiguity groups and testability. IEEE Transactions on Instrumentation and Measurement 38(5), 1989.
[21] Debra J. Richardson, Margaret C. Thompson. An analysis of test data selection criteria using the RELAY model of fault detection. IEEE Transactions on Software Engineering 19(6), 1993.
[22] Dan Pelleg, Andrew W. Moore. X-means: Extending k-means with efficient estimation of the number of clusters. ICML'00, 2000.
[23] Alexandre Perez, Rui Abreu, André Riboira. A dynamic code coverage approach to maximize fault localization efficiency. Journal of Systems and Software 90, 2014.
[24] José Campos, André Riboira, Alexandre Perez, Rui Abreu. GZoltar: an Eclipse plug-in for testing and debugging. ASE'12, 2012.
[25] Andrew Jensen Ko, Brad A. Myers. Designing the Whyline: a debugging interface for asking questions about program behavior. CHI'04, 2004.
[26] Higor Amario de Souza, Marcos Lordello Chaim. Adding context to fault localization with integration coverage. ASE'13, 2013.
[27] Marco D'Ambros, Michele Lanza, Romain Robbes. An extensive comparison of bug prediction approaches. MSR'10, 2010.
[28] Shaowei Wang, David Lo. Version history, similar report, and structure: putting them together for improved bug localization. ICPC'14, 2014.
[29] Xiaoguang Mao, Yan Lei, Ziying Dai, Yuhua Qi, Chengsong Wang. Slice-based statistical fault localization. Journal of Systems and Software 89, 2014.
[30] Birgit Hofer, Franz Wotawa. Spectrum enhanced dynamic slicing for better fault localization. ECAI'12, 2012.