=Paper=
{{Paper
|id=Vol-1519/paper3
|storemode=property
|title=Correctness of Semantic Code Smell Detection Tools
|pdfUrl=https://ceur-ws.org/Vol-1519/paper3.pdf
|volume=Vol-1519
|dblpUrl=https://dblp.org/rec/conf/apsec/MathurR15
}}
==Correctness of Semantic Code Smell Detection Tools==
Neeraj Mathur∗ and Y Raghu Reddy†
∗† Software Engineering Research Center,
International Institute of Information Technology, Hyderabad (IIIT-H), India
∗ neeraj.mathur@research.iiit.ac.in, † raghu.reddy@iiit.ac.in
Abstract—Refactoring is a set of techniques used to enhance the quality of code by restructuring existing code/design without changing its behavior. Refactoring tools can be used to detect specific code smells, propose relevant refactorings, and in some cases automate the refactoring process. However, usage of refactoring tools in industry is still relatively low. One of the major reasons is the veracity of the detected code smells, especially smells that aren't purely syntactic in nature. We conduct an empirical study on some refactoring tools and evaluate the correctness of the code smells they identify. We analyze the level of confidence users have in the code smells detected by the tools and discuss some issues with such tools.

Index Terms—Correctness, Detection, Maintenance, Refactoring, Semantic code smells.

I. INTRODUCTION

Refactoring improves various qualities of code/design such as maintainability (understandability and readability), extensibility, etc. by changing the structure of the code/design without changing the overall behaviour of the system. It was first introduced by Opdyke and Johnson [11] and later popularized by Martin Fowler [7]. Fowler categorized various types of refactorings in terms of their applicability and suggested various refactorings for code smells (bad indicators in code) within classes and between classes. Over the years, other researchers have added to the knowledge base on code smells [16]. Any refactoring done to address specific code smells requires one to test the refactored code with respect to preservation of the original behaviour. This is primarily done by writing test cases before the refactoring is done and testing the refactored code against the test cases. As a result, validating the correctness of a detected code smell and the automation of its refactoring is very difficult; explicit manual intervention may be needed.

Many code smell detection tools support a (semi) automatic refactoring process [4], [8]. iPlasma, JDeodorant, RefactorJ, etc. are some of the tools that can be used for detection of code smells and application of specific refactoring techniques in an automated or semi-automated manner. As noted in our previous work, each of these tools can detect only certain types of code smells automatically [8]. Also, there is no standardized approach for detecting such code smells, and hence tools follow their own approach [9], inevitably leading to different sets of code smells being detected for the same piece of code. Most code smell detection techniques depend on static analysis and code metrics and do not consider factors like system size, language structure, and context. In other words, any design-based refactoring requires the tool to understand the actual semantic intent of the code itself. For example, Long method is one such code smell that requires the tool to understand the context of the method before an Extract method refactoring can be performed automatically while preserving the original semantic intent.

Despite the known benefits of refactoring tools, their usage is not widespread due to users' lack of trust in the tools' code smell detection capability, the learning curve involved, and the inability of users to understand the code smell detection results [4], [6], [14]. In this paper, we study the correctness of the code smells detected by multiple open source tools like JDeodorant, InCode, etc. and the lack of trust of users in their detection capability through an empirical study of open source projects like JHotDraw (www.jhotdraw.org) and GanttProject (www.ganttproject.biz). We believe that this lack of trust is proportional to the correctness of the tools' detection ability. Most code smell detection tools correctly detect code smells that require small-scale resolutions. However, correctness is an issue when design-based refactorings or semantic intent are considered. Hence, we restrict our focus to tools that detect code smells that require the tool to understand the semantic intent (for example, Feature Envy, Long Methods, Shotgun Surgery, God class, etc.). From now on, we refer to such code smells as "semantic code smells". We cross validate our study results on GanttProject with the results of the study conducted by Fontana et al. [6] on GanttProject (v1.11.1). Additionally, we used our own dummy code with a few induced semantic code smells to check the correctness of the tools. We focus on the following in this paper:

• Correctness of the identified code smells among the chosen tools
• Deviation in confidence levels of developers in open source code smell detection tools that detect semantic code smells

The rest of the paper is structured as follows: Section II provides a brief overview of the code smells discussed in this paper and section III presents some related work. In section IV, we detail the study design. Sections V and VI present the results and the analysis of the results. Based on the study, we provide some guidelines for increasing the correctness of detecting some code smells in section VII. Finally, in section VIII we discuss some limitations of our work.
II. CODE SMELLS

Complexity-related metrics like coupling are commonly used in tools to detect certain semantic code smells. Threshold values are established for various metrics and the code smells are identified based on the threshold values. Code smells like Feature envy, long methods, god class, etc. are widely studied in the literature. In our study, we primarily target the following code smells:

• Feature envy: A method is more interested in some other class than the one in which it is defined.
• Long methods: A method that is too long (measured in terms of lines of code or other metrics), possibly leading to low cohesion and high coupling.
• God class: A God class performs too much work on its own, delegating only minor details to a set of trivial classes.
• Large Class: A class with too many instance variables or methods. It may become a God class.
• Shotgun Surgery: A change in one class necessitates a change in many other classes.
• Refused Bequest: A sub-class not using its inherited functionality.

III. RELATED WORK

Fontana et al. [6] showed a comparative analysis of code smells detected by various refactoring tools and their support of (semi) automatic refactoring. The study analyzes the differences between code smell detection tools. In our previous work [8], we analyzed various Java based refactoring tools with respect to their usability and reasoned about the automation of various code smells.

Pinto et al. [14] investigated data from StackOverflow to find out the barriers to adoption of code smell detection tools. They listed the adoption/usability issues users talk about in the StackOverflow forum. Olbrich et al. [10] performed an empirical study of God Class and Brain Class to evaluate whether detected smells are really smells. From their empirical study they concluded that if the results are normalized by the size of the system, the smell results become the opposite: the detected smell classes were in fact less prone to changes and errors.

Ferme et al. [5] conducted a study of God Class and Data Class to find out whether all detected smells are real smells. They proposed filters to be used to reduce or refine the detection rules for these smells. This paper extends our previous work and complements the work done by other authors by considering semantic code smells. In [13], Palomba et al. studied developers' perception of bad smells. Their study depicts the gap between theory and practice, i.e., what is believed to be a problem (theory) and what is actually a problem (practice). Their study provides insights on characteristics of bad smells not yet explored sufficiently.

Ouni et al. [12] proposed a search based refactoring approach to preserve the domain semantics of a program when refactoring is decided/implemented automatically. They argued that a refactoring might be syntactically correct and have the right behaviour, but model the domain semantics incorrectly.

IV. STUDY DESIGN

The objective of our study is to analyze the correctness of tools relevant to semantic code smells by performing a study with human subjects with prior refactoring experience. Tools like JDeodorant, PMD, InCode, iPlasma, and Stench Blossom, and two large systems, JHotDraw and GanttProject, that have been widely studied in refactoring research were considered. Choosing these tools and systems helped us in cross validating our work with prior work done by us [8] and other researchers using similar systems/tools.

Initially, 35 human subjects volunteered to be part of the study. The exact hypothesis of the study was not disclosed to the subjects to avoid biasing the study results. The subjects were only informed of the specific tasks that needed to be done, i.e. to assess certain refactoring tools with respect to their ability to detect code smells based on a given criteria and fill in a template.

All the human subjects had varying levels of prior knowledge about code smells and tool based refactorings. However, they had not worked on the specific tools used for this study. The subjects had varied experience (13% of subjects had 1-5 years of experience, 31% had 5-10 years of experience and the rest had less than a year of experience). The detailed statistics of the subjects are available at [3]. We asked them to focus on specific code smells like Feature Envy, Long Methods, Shotgun Surgery, God class, etc., list down all these smells, and record their comments/rationale for detecting these as code smells. The template [1] had a column that asked them the semantic meaning of the detected code smells. In addition, we asked them to provide reasoning for not agreeing with the refactorings detected by the tools used. The subjects were given a three-week period to perform the activity and fill in the templates. After evaluation of the templates, results from 32 subjects were taken into consideration; the other three did not fill in the templates completely. To cross-check the correctness of the tools with respect to their semantic code smell detection, in addition to the two open-source systems, we instrumented one of our own projects. It was interesting to note that most of the tools were not detecting code smells that seemed obvious from our perspective.

A. Subject Systems

In our study, we chose two open source systems:

• GanttProject - a free project management app where users can create tasks, create a project baseline, and organize tasks in a work breakdown structure
• JHotDraw - a Java GUI framework for technical and structured graphics. Its design relies heavily on some well-known design patterns.

The subject systems are available for use under open source licenses and are fairly well documented. In addition to being part of the Qualitas Corpus [15], they are widely studied in refactoring research and referenced in the literature. Table I provides details of these systems.
TABLE I: Characteristics of subject systems

                               JHotDraw    GanttProject
Version                        5.4         2.7.1891
Total Lines of Code            32435       46698
Number of classes              368         1230
Number of methods              3341        5917
Weighted methods per class     6990        8540
Number of static methods       280         235

B. Code Smell Detection Tools

There are several commercial and open source code smell detection tools. Some are research prototypes meant for detection of specific code smells, while others identify a wide range of code smells. In addition to detection, some tools refactor the code smells automatically, while others are semi-automated. Semi-automated refactoring tools propose refactorings that require human intervention and can be automated to a certain extent by changing the rules of detection. Some of these are more or less like recommender systems that recommend certain types of refactorings but leave it to the developers to refactor or ignore the recommendations.

Some tools are integrated with the IDE itself, while others are written as plugins that can be installed on a need basis. For example, Checkstyle, PMD, etc. are some well known Eclipse based plugins. For our study, we focused on tools widely studied in the literature, their semantic code smell detection ability, and their usage in industry. The list of tools and the detected code smells relevant to our study are shown in table II.

V. EXPERIMENT RESULTS

Table III provides a cumulative summary of code smells detected and disagreements (weighted average of results from 32 subjects) by our human subjects. To show the disparity in results, we compare our results for GanttProject with Fontana et al.'s results [6]. For reference, the detailed list of smells detected by our human subjects is available at [2].

Feature Envy: The number of Feature Envy methods detected by different tools varies significantly. Some tools consider any three or more calls to methods of another class as a code smell and hence give rise to a large number of false positive code smells. In such cases, when it is just counting numbers, it becomes tedious to filter out the actual smells. The degree of disagreement was found to be 9/44 for JDeodorant and 3/4 for inCode for the JHotDraw project. Disagreements by the human subjects were prevalent across all tools for detected feature envy smells. We also observed a significant difference between the results of Fontana's [6] study and our JDeodorant results. Their study reveals that over a period of time, from GanttProject v1.10 to v1.11.1, Feature Envy methods reduced from 18 (in v1.10) to 2 (in v1.11.1), whereas in our study of GanttProject v2.7, there were 113 Feature Envy code smells. As can be seen from the results, the number of detected smells is significantly different from the numbers given in their work.

God Class: The number of God classes detected by JDeodorant for GanttProject in Fontana et al.'s [6] study was 22 (v1.11.1), whereas in v2.7 it is 127. In inCode the number of god classes was significantly lower, reduced from 13 (v1.11.1) to 4 (v2.7). Unlike JDeodorant and inCode, the results from iPlasma increased: from 13 (v1.11.1) to 42 (v2.7). This inconsistency between tools reduces the confidence level in the results.

The degree of disagreement with JDeodorant code smells for the JHotDraw project was 10 out of 56. Our human subjects observed that tools were considering data model classes (getters and setters) and parser classes as smells. Usually these classes are needed and are necessary evils. As a result, there is a need to build some sort of intelligence/metrics to detect these kinds of classes, which can safely be ignored to reduce the false positives.

Large Class: This code smell is detected by code size and is subject to the threshold limit of LOC set by the user. Classes that contain project utility methods can grow in size with a lot of small utility methods, and usually developers make these classes singleton objects. Refactoring such smells requires one to be able to understand the intent of the statements before an extract method or extract class refactoring is applied.

Refused Bequest: Some tools considered interface methods and abstract methods that are normally overridden by the respective concrete classes as a code smell, resulting in a lot of false positives. Compared to Fontana et al.'s [6] study, inCode detected 6 (in v2.7) whereas it was 0 (in v1.11.1). The degree of disagreement with the detected code smells was 2 in inCode and 1 in iPlasma for GanttProject. For JHotDraw it was 1 (for inCode) and 2 (for iPlasma).

Shotgun Surgery: The results of our study for shotgun surgery on v2.7 were similar to the ones for GanttProject (v1.11.1). However, from our study we can conclude that the way shotgun surgery is detected differs from tool to tool rather than from version to version. The degree of disagreement with the detected code smells was 2 for GanttProject using inCode and 7 using iPlasma.

Long Method: This code smell is related to the number of lines of code present in the method and is subject to the threshold limit set by the user for detecting the code smell. As per the disagreements documented by our subjects, methods containing large switch statements should not be counted as long methods. iPlasma had a complex detection mechanism that considers such aspects while detecting Long Method code smells. On the contrary, JDeodorant listed fairly small methods as long method code smells. The degree of disagreement with the code smells detected for GanttProject was 12 in JDeodorant, 27 in Stench Blossom and 8 in PMD. For the JHotDraw project, it was 10 in JDeodorant, 12 in Stench Blossom, 11 in iPlasma and 8 in PMD.

The detections from the JDeodorant tool were significantly higher compared to the 57 long methods (v1.11.1) for GanttProject reported in Fontana et al.'s [6] study. Interestingly, we noticed that the long methods were reduced from 160 (in v1.10) to 57 (in v1.11.1), whereas in our study the count was still high, i.e., 221. This is because, as a software system evolves, it is expected that long methods will grow over a period of time.
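To illustrate the kind of disagreement our subjects reported for Long Method, consider the following minimal sketch. It is our own hypothetical example (not taken from the subject systems): the method exceeds typical line-count thresholds only because of a large, uniform switch block, which the subjects argued should not be flagged as a Long Method.

// Hypothetical example: a method that is "long" only because of a large,
// uniform switch block. Purely line-count based detectors may flag it as a
// Long Method even though extracting the cases would add little value.
public class HttpStatusDescriber {
    public String describe(int statusCode) {
        switch (statusCode) {
            case 200: return "OK";
            case 201: return "Created";
            case 204: return "No Content";
            case 301: return "Moved Permanently";
            case 304: return "Not Modified";
            case 400: return "Bad Request";
            case 401: return "Unauthorized";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            case 409: return "Conflict";
            case 500: return "Internal Server Error";
            case 502: return "Bad Gateway";
            case 503: return "Service Unavailable";
            default:  return "Unknown status " + statusCode;
        }
    }
}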
TABLE II: Code smell detection tools

Tool: JDeodorant (vv. 3.5 and 3.6), [EP], Java
Code smells: FE, GC, LM, TC
Detail: An Eclipse plug-in that automatically identifies four code smells in Java programs. It ranks the refactorings according to their impact on the design and automatically applies the most effective refactoring.

Tool: Stench Blossom (v. 3.3), [EP], Java
Code smells: FE, LM, LC, MC
Detail: Provides a high-level overview of the smells in the code. It is an Eclipse plugin with three different views that progressively offer more visualized information about the smells in the code.

Tool: InCode [SA], C, C++, Java
Code smells: BM, FE, GC, IC, RB, SS
Detail: Supports the analysis of a system at the architectural and code level. It allows for detection of more than 20 design flaws and code smells.

Tool: iPlasma [SA], C++, Java
Code smells: BM, FE, GC, IC, SS, RB, LM, SG
Detail: Can be used for quality assessment of object-oriented systems and supports all phases of analysis, from model extraction up to high-level metrics based analysis or detection of code duplication.

Tool: PMD, [EP or SA], Java
Code smells: LC, LM
Detail: Scans Java source code and looks for potential bugs such as dead code, empty try/catch/finally/switch statements, unused local variables and parameters, and duplicate code.

Legend: Feature Envy (FE), Refused Bequest (RB), God Class (GC), Long Method (LM), Lazy Class (LC), Intensive Coupling (IC), Shotgun Surgery (SS), Speculative Generality (SG), Dispersed Coupling (DC), Brain Method (BM). Type: Standalone Application (SA), Eclipse Plug-in (EP).
TABLE III: Code smells detected ($) and disagreements (#)

JHotDraw
Code Smell   jDeodorant   inCode    iPlasma   Stench Blossom   PMD
             $    #       $    #    $    #    $     #          $    #
FE           44   9       4    3    35   4    28    10         56   9
GC           56   10      14   15   22   2    -     -          34   10
LC           -    -       -    -    -    -    22    5          41   5
RB           -    -       4    1    2    2    -     -          -    -
SS           -    -       10   3    13   6    -     -          -    -
LM           90   10      -    -    94   11   113   12         73   8

GanttProject
FE           113  12      11   2    42   13   53    28         38   4
GC           127  8       4    3    24   3    -     -          37   17
LC           -    -       -    -    -    -    9     -          40   9
RB           -    -       6    2    3    1    -     -          -    -
SS           -    -       7    2    42   7    -     -          -    -
LM           221  12      -    -    -    -    55    27         36   8

VI. DISCUSSION

The major challenge in assessing the correctness of detected smells is the knowledge possessed by the human subjects in regard to the code smells and the behavior of the code itself. Since the subjects (users) were not familiar with the tools chosen for the study, they complained about the time consumed in understanding and using the tools. At times, the focus seemed to be more on the user interface of the tool rather than on the detected code smells. The authors had to go back to the subjects to get additional clarification about the comments written in the templates regarding specific code smells.

Some tools require explicit specification of rules for detecting a specific code smell (for example, PMD), so selecting rules from the entire list of rules was tedious and time-consuming. Most of the users seemed to struggle with the setup and configuration of the tools. A few tools like JDeodorant had high memory utilization and performance issues, so re-compiling after every change and re-running the detection process was laborious.

Contrary to our belief that the same code smell must be identified in the same way by different tools, the disagreement in the correctness of the detected code smells between the various tools for the same type of smell was quite high (as shown in table III). Additionally, the results did not complement the results provided in prior research [6]. In other words, the correctness of the detected smells was not accurate with respect to the semantics of the code written.

To validate the results, we had to further cross-check the tools with some dummy examples. For instance, the dummy code (shown in Listing 1) was detected as a Feature Envy smell by some of the tools (for example, JDeodorant). If the 'doCommit' method is moved to any of the classes (A, B and C in this example), we must pass the other two classes as reference parameters, which in turn increases coupling. Moreover, semantically it makes sense to call all commit methods in the 'doCommit' method itself.

An interesting observation can be made from the tools that detected Refused Bequest code smell instances. For example, code resembling the dummy code (shown in Listing 2) reveals that there is no behavior written in the base method, and just because it was being overridden without any invocation of the base method, it was detected as a smell. In other words, the authors could intuitively conclude that even such code smells are primarily being treated as syntactic, where the tool is just looking for redundant names in the superclass and subclass. Ideally, the tool should check if there is any meaningful behavior attached to the base method, and only then should it be detected as a smell.

The dummy example (shown in Listing 3) was not detected as feature envy except by the Stench Blossom tool. The probable reason for the non-detection is the declaration of the phone object inside the getMobilePhoneNumber method. Logically it may not be detected as a code smell, but from a semantic perspective getMobilePhoneNumber should be part of the "Phone" class. This issue poses a question of correct semantic code smell detection by tools.

The dummy example (shown in Listing 4) has a shotgun surgery code smell. The example shows a common mistake that users make while creating database connections and querying tables: users tend to create individual connection and command objects in each of the methods, as shown in the example. If, for instance, a connection timeout policy changes, it makes more sense semantically to create a utility method that takes the SQL as an argument and returns a result set (e.g., "DBUtility.getList"); in this method we would open the connection and create the SQL statements. InCode and iPlasma did not detect this code smell.
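A minimal sketch of the kind of utility method described above is shown below. The "DBUtility.getList" name comes from the discussion; the implementation details, such as the connection URL and credentials, are our own placeholders for illustration.

// Sketch of the "DBUtility.getList" helper suggested above: connection
// handling and statement creation live in one place, so a change such as a
// new timeout policy no longer ripples through every query method.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public final class DBUtility {
    private static final String DB_URL = "jdbc:mysql://localhost/test"; // placeholder
    private static final String USER = "user";                          // placeholder
    private static final String PASS = "pass";                          // placeholder

    public static List<Object[]> getList(String sql) throws SQLException {
        List<Object[]> rows = new ArrayList<>();
        // Open the connection and create the statement in one place only.
        try (Connection conn = DriverManager.getConnection(DB_URL, USER, PASS);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            int columns = rs.getMetaData().getColumnCount();
            while (rs.next()) {
                Object[] row = new Object[columns];
                for (int i = 0; i < columns; i++) {
                    row[i] = rs.getObject(i + 1); // JDBC columns are 1-based
                }
                rows.add(row);
            }
        }
        return rows;
    }
}

// Callers such as getEmployeeList()/getCustomerList() then reduce to:
//   return DBUtility.getList("SELECT * FROM Employees");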
Although several tools detect code smells, they do not consider the semantic intent of the code and hence end up with a lot of false positives. Reducing false positives is the first step towards increasing the confidence levels of users and proportionately increasing the usage of refactoring tools.

Listing 1: Feature Envy

public class Main {
    private final A a = new A();
    private final B b = new B();
    private final C c = new C();
    public void doCommit() {
        a.commit();
        b.commit();
        c.commit();
    }
}
class A {
    public void commit() {
        // do something
    }
}
class B {
    public void commit() {
        // do something
    }
}
class C {
    public void commit() {
        // do something
    }
}

Listing 2: Refused Bequest

public class Base {
    protected void m1() { }
}
class Inherited extends Base {
    protected void m1() { /* do something */ }
}

Listing 3: Feature Envy

public class Phone {
    private final String unformattedNumber;
    public Phone(String unformattedNumber) {
        this.unformattedNumber = unformattedNumber;
    }
    public String getAreaCode() {
        return unformattedNumber.substring(0, 3);
    }
    public String getPrefix() {
        return unformattedNumber.substring(3, 6);
    }
    public String getNumber() {
        return unformattedNumber.substring(6, 10);
    }
}
class Customer {
    public String getMobilePhoneNumber() {
        Phone m_Phone = new Phone("111-123-2345");
        return "(" + m_Phone.getAreaCode() + ") "
            + m_Phone.getPrefix() + "-"
            + m_Phone.getNumber();
    }
}

Listing 4: Shotgun Surgery

public List getEmployeeList() {
    Connection conn = null;
    Statement stmt = null;
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    stmt = conn.createStatement();
    String sql = "SELECT * FROM Employees";
    ResultSet rs = stmt.executeQuery(sql);
    return rs.toList(); // pseudocode for materializing the ResultSet
}
public List getCustomerList() {
    Connection conn = null;
    Statement stmt = null;
    Class.forName("com.mysql.jdbc.Driver");
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    stmt = conn.createStatement();
    String sql = "SELECT * FROM Customers";
    ResultSet rs = stmt.executeQuery(sql);
    return rs.toList(); // pseudocode for materializing the ResultSet
}

VII. CODE SMELL DETECTION PRE-CHECKS

Based on our study, we recommend some pre-checks specific to particular code smells to improve the correctness of detection (a sketch of one such pre-check follows the lists below):

Feature Envy:
• Check whether the method references methods from multiple classes, and whether moving the method would increase coupling and references to the target class.
• Check if mutual work, like disposing of objects or committing multiple transactions, is accomplished in the method, i.e., whether it is semantically performing a cumulative task.
• Take domain knowledge and system size into account before detecting the smell; use semantic/information retrieval techniques to identify domain concepts.

Long Class / God class:
• Ignore utility classes, parsers, compilers, interpreters and exception handlers.
• Ignore Java beans, data models and log file classes.
• Ignore transaction manager classes.
• Normalize the results with the system size.

Long Method:
• Check whether large switch blocks are written in the method and whether multiple code blocks with different functionality can be extracted.

Refused Parent Bequest:
• Ignore interface methods; they are meant to be overridden.
• Check if the base method has any meaningful behaviour attached to it.
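As an illustration of how such a pre-check could be wired in front of an existing detector, the following is a minimal, hypothetical sketch; the class and method names are our own and do not come from any of the studied tools. It filters out candidate classes that are plain data holders (mostly getters and setters) before God/Large Class detection is applied. For brevity the sketch inspects loaded classes via reflection, whereas a real detector would apply the same rule to its static code model.

import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical pre-check: skip classes that are essentially data models
// (mostly getters and setters) before running a God Class / Large Class
// detector, to cut down on false positives.
public final class GodClassPreCheck {

    // Returns true if the class should be handed to the detector at all.
    public static boolean worthAnalyzing(Class<?> candidate) {
        if (candidate.isInterface()) {
            return false; // no implementation to grow "god-like"
        }
        int accessors = 0;
        int others = 0;
        for (Method m : candidate.getDeclaredMethods()) {
            if (Modifier.isStatic(m.getModifiers())) {
                others++;
                continue;
            }
            String name = m.getName();
            boolean getter = name.startsWith("get") && m.getParameterCount() == 0;
            boolean setter = name.startsWith("set") && m.getParameterCount() == 1;
            if (getter || setter) {
                accessors++;
            } else {
                others++;
            }
        }
        int total = accessors + others;
        if (total == 0) {
            return false; // nothing to analyze
        }
        // Mostly-accessor classes are treated as data models and skipped.
        return others > total * 0.2;
    }
}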
VIII. LIMITATIONS & FUTURE WORK

Our initial study has provided evidence of disagreement by our human subjects with the code smells detected by the tools. We presented sample code for incorrectly detected code smells and for semantic smells that were not detected by the tools. The authors strongly felt the need to take the semantic intent of the code into consideration while detecting smells and proposing (semi) automatic refactorings.

The disagreement with detected code smells correlates to the confidence levels of users. We saw that the results from all the users were not the same for the same code smell detection tools, so the accuracy of these results can always be questioned. The results of Fontana et al. were used for cross-validation of our work. For the degree of disagreement with the detected code smells, we took an average of the overall disagreement reported by the users. But code smell disagreement is subjective until a standardized method for detection and measurement is proposed.

As future work we would like to extend our experiment with more industry-level users. We would like to share the disagreements recorded by our human subjects with the actual developers of the systems to validate our findings. Towards understanding the correctness of semantic code smell detection, we would like to compare the detection logic of the tools.

REFERENCES

[1] Code smell evaluation template. http://bit.ly/1WBfZPy. [Online; accessed 30-June-2015].
[2] Code smells detected by human subjects & their disagreements. http://bit.ly/1BNJ6KS.
[3] Detailed profile of our human subjects. http://bit.ly/1KiuEvY. [Online; accessed 30-June-2015].
[4] D. Campbell and M. Miller. Designing refactoring tools for developers. In Proceedings of the 2nd Workshop on Refactoring Tools, WRT '08, pages 9:1–9:2, New York, NY, USA, 2008. ACM.
[5] V. Ferme, A. Marino, and F. A. Fontana. Is it a real code smell to be removed or not? In International Workshop on Refactoring & Testing (RefTest), co-located event with XP 2013 Conference, 2013.
[6] F. Fontana, E. Mariani, A. Morniroli, R. Sormani, and A. Tonello. An experience report on using code smells detection tools. In Software Testing, Verification and Validation Workshops (ICSTW), 2011 IEEE Fourth International Conference on, pages 450–457, 2011.
[7] M. Fowler. Refactoring: improving the design of existing code. Pearson Education India, 1999.
[8] J. Mahmood and Y. Reddy. Automated refactorings in java using intellij idea to extract and propogate constants. In Advance Computing Conference (IACC), 2014 IEEE International, pages 1406–1414, 2014.
[9] M. Mäntylä and C. Lassenius. Subjective evaluation of software evolvability using code smells: An empirical study. Empirical Software Engineering, 11(3):395–431, 2006.
[10] S. Olbrich, D. Cruzes, and D. I. Sjoberg. Are all code smells harmful? a study of god classes and brain classes in the evolution of three open source systems. In Software Maintenance (ICSM), 2010 IEEE International Conference on, pages 1–10, 2010.
[11] W. F. Opdyke and R. E. Johnson. Refactoring: An aid in designing application frameworks and evolving object-oriented systems. In Symposium on Object-Oriented Programming Emphasizing Practical Applications, September 1990.
[12] A. Ouni, M. Kessentini, H. Sahraoui, and M. Hamdi. Search-based refactoring: Towards semantics preservation. In Software Maintenance (ICSM), 2012 28th IEEE International Conference on, pages 347–356, Sept 2012.
[13] F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, and A. De Lucia. Do they really smell bad? a study on developers' perception of bad code smells. In Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on, pages 101–110, Sept 2014.
[14] G. H. Pinto and F. Kamei. What programmers say about refactoring tools?: An empirical investigation of stack overflow. In Proceedings of the 2013 ACM Workshop on Workshop on Refactoring Tools, WRT '13, pages 33–36, New York, NY, USA, 2013. ACM.
[15] E. Tempero, C. Anslow, J. Dietrich, T. Han, J. Li, M. Lumpe, H. Melton, and J. Noble. The qualitas corpus: A curated collection of java code for empirical studies. In Software Engineering Conference (APSEC), 2010 17th Asia Pacific, pages 336–345, 2010.
[16] W. C. Wake. Refactoring Workbook. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition, 2003.