<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Correctness of Semantic Code Smell Detection Tools</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Neeraj Mathur and Y Raghu Reddy</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>17</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>Refactoring is a set of techniques used to enhance the quality of code by restructuring existing code/design without changing its behavior. Refactoring tools can be used to detect specific code smells, propose relevant refactorings, and in some cases automate the refactoring process. However, the usage of refactoring tools in industry is still relatively low. One of the major reasons is doubt about the veracity of the detected code smells, especially smells that are not purely syntactic in nature. We conduct an empirical study on several refactoring tools and evaluate the correctness of the code smells they identify. We analyze the level of confidence users have in the code smells detected by the tools and discuss some issues with such tools.</p>
      </abstract>
      <kwd-group>
        <kwd>Correctness</kwd>
        <kwd>Detection</kwd>
        <kwd>Maintenance</kwd>
        <kwd>Refactoring</kwd>
        <kwd>Semantic code smells</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>I. INTRODUCTION</title>
      <p>
        Refactoring improves various qualities of code/design, such as
maintainability (understandability and readability) and
extensibility, by changing the structure of the code/design without
changing the overall behaviour of the system. It was first
introduced by Opdyke and Johnson [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] and later popularized
by Martin Fowler [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Fowler categorized various types of
refactorings in terms of their applicability and suggested
various refactorings for code smells (bad indicators in code) within
classes and between classes. Over the years, other researchers
have added to the knowledge base on code smells [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Any
refactoring done to address specific code smells requires one
to test the refactored code with respect to preservation of the
original behaviour. This is primarily done by writing test cases
before the refactoring is done and testing the refactored code
against the test cases. As a result, validating the correctness
of a detected code smell and automating its refactoring
is difficult, and explicit manual intervention may be needed.
      </p>
      <p>
        Many code smell detection tools support a (semi-)automatic
refactoring process [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. iPlasma, JDeodorant, RefactorJ,
etc. are some of the tools that can be used to detect
code smells and apply specific refactoring techniques
in an automated or semi-automated manner. As noted in our
previous work, each of these tools can automatically detect only certain types
of code smells [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Also, there is no standardized
approach for detecting such code smells and hence tools follow
their own approach [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], inevitably leading to different sets of
code smells being detected for the same piece of code. Most
code smell detection techniques depend on static analysis and
code metrics and do not consider factors like system size,
language structure and context. However, design-based
refactorings require the tool to understand the actual
semantic intent of the code itself. For example, Long Method
is one such code smell: the tool must understand the
context of the method before an Extract Method refactoring
can be performed automatically while preserving the original
semantic intent.
      </p>
      <p>
        Despite the known benefits of refactoring tools, their usage
is not widespread due to users’ lack of trust in the tools’
code smell detection capability, the learning curve involved, and
the inability of users to understand the code smell detection
results [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In this paper, we study the correctness
of the code smells detected by multiple open source tools
such as JDeodorant and InCode, and users’ lack of trust
in their detection capability, through an empirical study of
open source projects such as JHotDraw (www.jhotdraw.org) and
GanttProject (www.ganttproject.biz). We believe that users’ lack of
trust is inversely related to the correctness of the tools’ detection
ability. Most code smell detection tools correctly detect code smells that
require small-scale resolutions. However, correctness
is an issue when design-based refactorings or semantic intent
are considered. Hence, we restrict our focus to tools that
detect code smells that require the tool to understand the
semantic intent (for example, Feature Envy, Long Method,
Shotgun Surgery, and God Class). From now on, we refer to
such code smells as “semantic code smells”. We cross-validate
our results on GanttProject against the results of the study
conducted by Fontana et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on GanttProject (v1.11.1).
Additionally, we used our own dummy code with a few
induced semantic code smells to check the correctness of the
tools. We focus on the following in this paper:
      </p>
      <p>Correctness of the identified code smells among the
chosen tools</p>
      <p>Deviation in the confidence levels of developers in open
source code smell detection tools that detect semantic
code smells</p>
      <p>The rest of the paper is structured as follows: Section II
provides a brief overview of the code smells discussed in this
paper, and Section III presents related work. In Section IV,
we detail the study design. Sections V and VI present the results
and their analysis. Based on the study, we provide
some guidelines for increasing the correctness of detecting
some code smells in Section VII. Finally, in Section VIII, we
discuss some limitations of our work.</p>
      <p>II. CODE SMELLS</p>
      <p>Complexity-related metrics such as coupling are commonly
used in tools to detect certain semantic code smells. Threshold
values are established for various metrics, and code smells
are identified based on these threshold values. Code smells such as
Feature Envy, Long Method, and God Class are widely studied
in the literature. In our study, we primarily target the following
code smells:</p>
      <p>Feature Envy: A method is more interested in some other
class than in the one in which it is defined.</p>
      <p>Long Method: A method that is too long (measured in
terms of lines of code or other metrics), possibly leading
to low cohesion and high coupling.</p>
      <p>God Class: A God Class performs too much work on
its own, delegating only minor details to a set of trivial
classes.</p>
      <p>Large Class: A class with too many instance variables or
methods. It may become a God Class.</p>
      <p>Shotgun Surgery: A change in one class necessitates a
change in many other classes.</p>
      <p>Refused Bequest: A subclass does not use its inherited
functionality.</p>
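      <p>A minimal Java sketch of the threshold-based detection described at the start of this section follows. It flags Long Method candidates whose lines of code exceed a user-chosen threshold. The class ThresholdDetector, the MethodMetrics holder, the sample method names, and the threshold values are hypothetical; real tools compute such metrics from the parsed code and often combine several metrics and thresholds.</p>
      <preformat>
```java
// Hypothetical threshold-based detector: a smell is reported when a metric
// exceeds a user-chosen threshold. Metric values are supplied by hand here;
// a real tool would compute them from the parsed source.
public class ThresholdDetector {

    static final class MethodMetrics {
        final String name;
        final int linesOfCode;

        MethodMetrics(String name, int linesOfCode) {
            this.name = name;
            this.linesOfCode = linesOfCode;
        }
    }

    // Returns the names of methods whose LOC exceeds the threshold (Long Method candidates).
    static String[] longMethods(MethodMetrics[] methods, int locThreshold) {
        int count = 0;
        for (MethodMetrics m : methods) {
            if (m.linesOfCode > locThreshold) {
                count++;
            }
        }
        String[] result = new String[count];
        int i = 0;
        for (MethodMetrics m : methods) {
            if (m.linesOfCode > locThreshold) {
                result[i++] = m.name;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MethodMetrics[] methods = {
            new MethodMetrics("parseHeader", 12),
            new MethodMetrics("renderChart", 85),
            new MethodMetrics("exportCsv", 40)
        };
        // Same code, two thresholds, two different smell lists.
        System.out.println(String.join(",", longMethods(methods, 30))); // renderChart,exportCsv
        System.out.println(String.join(",", longMethods(methods, 50))); // renderChart
    }
}
```
      </preformat>
      <p>Changing only the threshold changes the reported list, which is one reason why different tools, or different configurations of the same tool, can report different sets of smells for the same code.</p>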
    </sec>
    <sec id="sec-2">
      <title>III. RELATED WORK</title>
      <p>
        Fontana et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] presented a comparative analysis of
code smells detected by various refactoring tools and of their
support for (semi-)automatic refactoring. The study analyzes the
differences between code smell detection tools. In our previous
work [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], we analyzed various Java-based refactoring tools with
respect to their usability and reasoned about the automation of
refactorings for various code smells.
      </p>
      <p>
        Pinto et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] investigated data from StackOverflow to
find the barriers to the adoption of code smell detection tools.
They listed the adoption and usability issues that users
discuss on StackOverflow. Olbrich et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] performed an empirical
study of God Class and Brain Class to evaluate whether the detected
smells are really smells. They concluded that if the results are
normalized by the size of the system, the findings are reversed:
the classes detected as smells were in fact less change-prone
and less error-prone.
      </p>
      <p>
        Ferme et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] conducted a study of God Class and
Data Class to determine whether all detected smells are real smells.
They proposed filters to reduce or refine
the detection rules for these smells. This paper extends our
previous work and complements the work done by other
authors by considering semantic code smells. In [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], Palomba
et al. studied developers’ perception of bad smells. Their study
reveals a gap between theory and practice, i.e., between what is believed
to be a problem (theory) and what is actually a problem
(practice). Their study provides insights into characteristics of
bad smells that have not yet been sufficiently explored.
      </p>
      <p>
        Ouni et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] proposed a search-based refactoring approach
to preserve the domain semantics of a program when refactoring
is decided/implemented automatically. They argued that a
refactoring might be syntactically correct and have the right behaviour,
but incorrectly model the domain semantics.
      </p>
    </sec>
    <sec id="sec-3">
      <title>IV. STUDY DESIGN</title>
      <p>
        The objective of our study is to analyze the correctness of
tools relevant to semantic code smells by performing a study
with human subjects with prior refactoring experience. We considered
the tools JDeodorant, PMD, InCode, iPlasma, and Stench Blossom,
and two large systems, JHotDraw and GanttProject, that
have been widely studied in refactoring research.
Choosing these tools and systems helped us cross-validate
our work with prior work done by us [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and other
researchers using similar systems/tools.
      </p>
      <p>Initially, 35 human subjects volunteered to be part of the
study. The subjects were not informed of the exact hypothesis of
the study, to avoid biasing the results. They
were only informed of the specific tasks that needed to be
done, i.e., to assess certain refactoring tools with respect to
their ability to detect code smells based on given criteria
and to fill in a template.</p>
      <p>
        The human subjects had varying levels of prior
knowledge about code smells and tool-based refactorings. However,
they had not worked with the specific tools used in this study.
The subjects had varied experience: 13% had 1-5
years of experience, 31% had 5-10 years of experience, and the
remaining 56% had less than a year of experience. Detailed statistics
on the subjects are available at [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. We asked them to focus
on specific code smells such as Feature Envy, Long Method,
Shotgun Surgery, and God Class, to list all detected instances of these smells,
and to record their comments/rationale for considering these as code
smells. The template [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] had a column that asked them for the
semantic meaning of the detected code smells. In addition,
we asked them to provide reasoning for not agreeing with
the refactorings detected by the tools. The subjects were
given a three-week period to perform the activity and fill in the
templates. After evaluation of the templates, results from 32
subjects were taken into consideration; the other three did not
fill in the templates completely. To cross-check the correctness
of the tools with respect to their semantic code smell detection,
in addition to the two open source systems, we instrumented
one of our own projects with semantic code smells. Interestingly, most of
the tools did not detect code smells that seemed obvious
from our perspective.
      </p>
      <sec id="sec-3-1">
        <title>A. Subject Systems</title>
        <p>In our study, we chose two open source systems:</p>
        <p>GanttProject - a free project management application in which users can create tasks, create a project baseline, and organize tasks in a work breakdown structure.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <p>JHotDraw - a Java GUI framework for technical and
structured graphics. Its design relies heavily on some
well-known design patterns.</p>
      <p>
        The subject systems are available for use under open source
licenses and are fairly well documented. In addition to being
a part of Qualitas Corpus [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], they are widely studied in
refactoring research and referenced in the literature. Table I
provides details of these systems.
      </p>
      <sec id="sec-4-1">
        <title>B. Code Smell Detection Tools</title>
        <p>There are several commercial and open source code smell
detection tools. Some are research prototypes meant for
detection of specific code smells while others identify a wide range
of code smells. In addition to detection, some tools refactor the
code smells automatically, while others are semi-automated.
Semi-automated refactoring tools propose refactorings that
require human intervention and can be automated to a certain
extent by changing the detection rules. Some of these are more
or less like recommender systems that recommend certain types
of refactorings but leave it to the developers to apply or
ignore the recommendations.</p>
        <p>Some tools are integrated with the IDE itself, while others
are written as plugins that can be installed as needed.
For example, Checkstyle and PMD are well-known tools available
as Eclipse plugins. For our study, we focused on
tools that are widely studied in the literature, their semantic code smell
detection ability, and their usage in industry. The list of tools
and the detected code smells relevant to our study is shown
in Table II.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>V. EXPERIMENT RESULTS</title>
      <p>
        Table III provides a cumulative summary of the code smells
detected and the disagreements of our human subjects (weighted
average of the results from the 32 subjects). To show the disparity
in results, we compare our results for the GanttProject with
Fontana et al.’s results [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For reference, the detailed list of
smells detected by our human subjects is available at [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Feature Envy: The number of Feature Envy methods
detected by different tools varies significantly. Some tools
consider any three or more calls to methods of another class
a code smell and hence report a large number of false
positives. When a tool merely counts calls in this way,
filtering out the actual smells becomes tedious.</p>
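      <p>The call-counting rule described above can be sketched in Java as follows. The sketch assumes each method is summarized as an array of the class names its calls target; the names FeatureEnvyHeuristic, naiveEnviedClass, and refinedEnviedClass, and the refinement itself (also requiring more foreign than local calls), are illustrative assumptions rather than the rule of any particular tool.</p>
      <preformat>
```java
public class FeatureEnvyHeuristic {

    // Counts how many entries in callTargets name the given class.
    static int countCalls(String[] callTargets, String className) {
        int n = 0;
        for (String t : callTargets) {
            if (t.equals(className)) {
                n++;
            }
        }
        return n;
    }

    // Naive rule: report the first foreign class that receives at least
    // minForeignCalls calls from the method; null means "no smell".
    static String naiveEnviedClass(String ownClass, String[] callTargets, int minForeignCalls) {
        for (String t : callTargets) {
            if (t.equals(ownClass)) {
                continue;
            }
            if (countCalls(callTargets, t) >= minForeignCalls) {
                return t;
            }
        }
        return null;
    }

    // Assumed refinement: additionally require that the foreign class is used
    // more often than the method's own class.
    static String refinedEnviedClass(String ownClass, String[] callTargets, int minForeignCalls) {
        int own = countCalls(callTargets, ownClass);
        for (String t : callTargets) {
            if (t.equals(ownClass)) {
                continue;
            }
            int foreign = countCalls(callTargets, t);
            if (foreign >= minForeignCalls) {
                if (foreign > own) {
                    return t;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical method in Customer: three calls to Phone, four to Customer itself.
        String[] targets = {"Phone", "Phone", "Phone", "Customer", "Customer", "Customer", "Customer"};
        System.out.println(naiveEnviedClass("Customer", targets, 3));   // Phone
        System.out.println(refinedEnviedClass("Customer", targets, 3)); // null
    }
}
```
      </preformat>
      <p>With the naive rule, any method that makes three calls to another class is reported, even when it works mostly with its own class; the refined variant suppresses that case.</p>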
      <p>The degree of disagreement was found to be 9/44 for
JDeodorant and 3/4 for inCode for the JHotDraw project.</p>
      <p>
        Disagreements by the human subjects were prevalent across
all tools for detected Feature Envy smells. We also observed
a significant difference between the results of Fontana et al.’s [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
study and our JDeodorant results. Their study reports that the
number of Feature Envy methods in GanttProject decreased from
18 (v1.10) to 2 (v1.11.1), whereas in our study of GanttProject v2.7
there were 113 Feature Envy code smells. As can be seen from the
results, the number of detected smells is significantly different
from the numbers given in their work.
      </p>
      <p>
        God Class: The number of God classes detected
by JDeodorant for GanttProject in Fontana et al.’s [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] study
was 22 (v1.11.1), whereas in v2.7 it is 127. In inCode, the
number of God classes decreased significantly, from
13 (v1.11.1) to 4 (v2.7). As with JDeodorant, and unlike inCode, the
results from iPlasma increased: from 13 (v1.11.1) to 42 (v2.7).
      </p>
      <p>This inconsistency between tools reduces confidence in
the results.</p>
      <p>The degree of disagreement with the JDeodorant code smells for
the JHotDraw project was 10 out of 56. Our human subjects
observed that the tools considered data model classes
(getters and setters) and parser classes as smells. Such
classes are usually needed; they are necessary evils. As a result, there is
a need to build some form of intelligence or metrics to detect
these kinds of classes, which can safely be ignored, in order to reduce
false positives.</p>
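      <p>One possible shape for such a filter is sketched below. It assumes a class is summarized by its name and the names of its declared methods, treats a class whose methods are all accessors as a data model, and excludes classes whose names end in "Parser". Both rules and all names (GodClassFilter, reportGodClass, maxMethods, the sample classes) are hypothetical heuristics for illustration only; name-based rules are easy to fool, and a real tool would need richer information.</p>
      <preformat>
```java
public class GodClassFilter {

    // Hypothetical heuristic: a data-model (bean) class declares only accessors,
    // i.e. every method name starts with "get", "set" or "is".
    static boolean isDataModel(String[] methodNames) {
        if (methodNames.length == 0) {
            return false;
        }
        for (String m : methodNames) {
            boolean accessor = m.startsWith("get") || m.startsWith("set") || m.startsWith("is");
            if (!accessor) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical name-based exclusion for parser classes.
    static boolean isParser(String className) {
        return className.endsWith("Parser");
    }

    // Report a God Class candidate only if it exceeds the size threshold
    // and is neither a data-model class nor a parser.
    static boolean reportGodClass(String className, String[] methodNames, int maxMethods) {
        if (maxMethods >= methodNames.length) {
            return false; // small enough: not a candidate
        }
        if (isDataModel(methodNames)) {
            return false;
        }
        return !isParser(className);
    }

    public static void main(String[] args) {
        String[] beanMethods = {"getName", "setName", "getId", "setId", "isActive"};
        String[] mixedMethods = {"load", "save", "render", "validate", "getName"};
        System.out.println(reportGodClass("TaskBean", beanMethods, 4));     // false (data model)
        System.out.println(reportGodClass("XmlParser", mixedMethods, 4));   // false (parser)
        System.out.println(reportGodClass("TaskManager", mixedMethods, 4)); // true
    }
}
```
      </preformat>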
      <p>Large Class: This code smell is detected based on code size
and depends on the LOC threshold set by the user.</p>
      <p>Classes that contain project utility methods can grow in size
with a lot of small utility methods, and developers usually implement
these classes as singletons. Refactoring such smells
requires one to understand the intent of the statements
before an Extract Method or Extract Class refactoring is applied.</p>
      <p>Refused Bequest: Some tools considered interface methods
and abstract methods, which are normally overridden by the
respective concrete classes, as code smells, resulting in a lot
of false positives.</p>
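      <p>The following Java sketch uses the standard reflection API (Class.getSuperclass, Class.getDeclaredMethod, and Modifier.isAbstract) to skip overrides of abstract and interface methods before reporting Refused Bequest. The sample hierarchy (Shape, Named, Square) is hypothetical. Reflection cannot see method bodies, so checking whether a concrete base method has meaningful behaviour would additionally require source or bytecode analysis, which the sketch does not attempt.</p>
      <preformat>
```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class RefusedBequestPrecheck {

    // Hypothetical sample hierarchy used only for this illustration.
    abstract static class Shape {
        abstract double area();               // meant to be overridden
        String describe() { return "shape"; } // concrete inherited behaviour
    }

    interface Named {
        String name();
    }

    static class Square extends Shape implements Named {
        double area() { return 4.0; }             // implements an abstract method
        public String name() { return "square"; } // implements an interface method
        String describe() { return "square"; }    // replaces concrete behaviour
    }

    // True only if m overrides a concrete (non-abstract) method declared in the
    // direct superclass; abstract and interface methods are skipped.
    static boolean overridesConcreteBaseMethod(Class sub, Method m) {
        Class base = sub.getSuperclass();
        if (base == null) {
            return false;
        }
        try {
            Method baseMethod = base.getDeclaredMethod(m.getName(), m.getParameterTypes());
            return !Modifier.isAbstract(baseMethod.getModifiers());
        } catch (NoSuchMethodException e) {
            return false; // not declared in the superclass, e.g. an interface method
        }
    }

    public static void main(String[] args) {
        for (Method m : Square.class.getDeclaredMethods()) {
            if (overridesConcreteBaseMethod(Square.class, m)) {
                System.out.println(m.getName()); // describe
            }
        }
    }
}
```
      </preformat>
      <p>Only describe is left as a Refused Bequest candidate; whether it is a real smell still depends on what the base implementation does.</p>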
      <p>
        Compared to Fontana et al.’s [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] study, inCode detected
6 Refused Bequest smells in v2.7, whereas it detected 0 in v1.11.1. The degree of
disagreement with the detected code smells was 2 for inCode and
1 for iPlasma for GanttProject. For JHotDraw, it was 1 (for
inCode) and 2 (for iPlasma).
      </p>
      <p>Shotgun Surgery: The results of our study for Shotgun
Surgery on GanttProject v2.7 were similar to those for
v1.11.1. However, our study suggests that
the way Shotgun Surgery is detected differs from tool to tool
rather than between different versions
of the same tool. The degree of disagreement with the detected
code smells was 2 for GanttProject using inCode and 7 using
iPlasma.</p>
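      <p>A minimal sketch of caller-based Shotgun Surgery detection follows, loosely modelled on counting the distinct methods and classes that call a given method (metrics often referred to as changing methods and changing classes). The input format ("ClassName.methodName" strings), the class names, and the thresholds are hypothetical; the point is that different choices of metric and threshold can lead different tools to different results.</p>
      <preformat>
```java
import java.util.Arrays;

public class ShotgunSurgeryMetrics {

    // Distinct calling methods; callers use a hypothetical "ClassName.methodName" format.
    static long changingMethods(String[] callers) {
        return Arrays.stream(callers).distinct().count();
    }

    // Distinct calling classes (the part before the first '.').
    static long changingClasses(String[] callers) {
        return Arrays.stream(callers)
                .map(c -> c.substring(0, c.indexOf('.')))
                .distinct()
                .count();
    }

    // Candidate if both counts exceed user-chosen thresholds (placeholders, not tool defaults).
    static boolean isCandidate(String[] callers, int maxMethods, int maxClasses) {
        if (changingMethods(callers) > maxMethods) {
            return changingClasses(callers) > maxClasses;
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical callers of one method; a change to it may ripple into all of them.
        String[] callers = {
            "ReportView.refresh", "ReportView.refresh", "ChartPanel.paint",
            "Exporter.toCsv", "Exporter.toPdf"
        };
        System.out.println(changingMethods(callers));   // 4
        System.out.println(changingClasses(callers));   // 3
        System.out.println(isCandidate(callers, 3, 2)); // true
    }
}
```
      </preformat>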
      <p>Long Method: This code smell is related to the number of
lines of code in the method and is subject to the
threshold set by the user for detecting the code smell. According
to the disagreements documented by our subjects, methods
containing large switch statements should not be counted as
long methods. iPlasma had a more sophisticated detection mechanism
that takes such cases into account when detecting the Long
Method code smell. In contrast, JDeodorant listed fairly
small methods as instances of the Long Method code smell.</p>
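      <p>The disagreement about switch statements can be turned into a rough pre-check. The sketch below counts the non-blank lines of a method body while skipping lines that belong to a switch block, so a method dominated by a large switch is not reported as long. It is a text-based approximation (an assumption, not iPlasma's or any other tool's algorithm); braces inside string literals or comments would confuse it, and a real implementation would work on the syntax tree. The class name LongMethodPrecheck is hypothetical.</p>
      <preformat>
```java
public class LongMethodPrecheck {

    // Counts non-blank lines of a method body, skipping every line of a switch block.
    static int effectiveLength(String body) {
        int count = 0;
        boolean inSwitch = false;
        boolean opened = false; // has the switch's opening brace been seen yet?
        int depth = 0;          // brace depth relative to the start of the switch
        for (String line : body.split("\n")) {
            String t = line.trim();
            if (t.isEmpty()) {
                continue;
            }
            if (!inSwitch) {
                if (t.startsWith("switch (") || t.startsWith("switch(")) {
                    inSwitch = true;
                    opened = false;
                    depth = 0;
                } else {
                    count++;
                    continue;
                }
            }
            // Inside a switch: track braces, but do not count the line.
            for (char ch : t.toCharArray()) {
                if (ch == '{') {
                    depth++;
                    opened = true;
                } else if (ch == '}') {
                    depth--;
                }
            }
            if (opened) {
                if (depth == 0) {
                    inSwitch = false; // the switch block has been closed
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String body = String.join("\n",
                "int x = read();",
                "switch (x) {",
                "    case 1: a(); break;",
                "    case 2: b(); break;",
                "    default: c();",
                "}",
                "return x;");
        System.out.println(effectiveLength(body)); // 2 (out of 7 raw lines)
    }
}
```
      </preformat>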
      <p>The degree of disagreement with the code smells detected for
GanttProject was 12 for JDeodorant, 27 for Stench Blossom, and
8 for PMD. For the JHotDraw project, it was 10 for JDeodorant, 12
for Stench Blossom, 11 for iPlasma, and 8 for PMD.</p>
      <p>
        The number of Long Method detections by JDeodorant for GanttProject
v2.7 (221) was significantly higher than the 57 (v1.11.1) reported in Fontana et
al.’s [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] study. Interestingly, in their study the number of long methods
decreased from 160 (v1.10) to 57 (v1.11.1), whereas in our study it
was still high, i.e., 221. This is expected, since long methods tend to
grow over time as software evolves.
      </p>
      <p>Table II (partially recovered): JDeodorant (vv. 3.5 and 3.6), [EP], Java; Stench Blossom (v. 3.3), [EP], Java; InCode, [SA], C, C++, Java.</p>
      <p>The major challenge in assessing the correctness of detected
smells is the knowledge the human subjects possess
regarding the code smells and the behavior of the code
itself. Since the subjects (users) were not familiar with the
tools chosen for the study, they complained about the time
needed to understand and use the tools. At times, the
focus seemed to be more on the user interface of a tool than
on the detected code smells. The authors had to go back to
the subjects to get additional clarification about the comments
they wrote in the templates regarding specific code smells.</p>
      <p>Some tools (for example, PMD) require explicit specification of rules
for detecting a specific code smell, so selecting
rules from the entire list of rules was tedious and
time-consuming. Most of the users seemed to struggle with
setting up and configuring the tools. A few tools, such as
JDeodorant, had high memory utilization and performance
issues, so recompiling after every change and re-running
the detection process was laborious.</p>
      <p>
        Contrary to our belief that the same code smell must be
identified in the same way by different tools, the disagreement
about the correctness of the detected code smells between the
various tools for the same type of smell was high
(as shown in Table III). Additionally, the results were not
consistent with the results reported in prior research [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In
other words, many of the detected smells were not correct
with respect to the semantics of the code.
      </p>
      <p>To validate the results, we further cross-checked the
tools with some dummy examples. For instance, the dummy
code shown in Listing 1 was detected as a Feature Envy
smell by some of the tools (for example, JDeodorant). If the
‘doCommit’ method is moved to any of the classes (A, B,
or C in this example), we must pass the other two objects
as reference parameters, which in turn increases coupling.</p>
      <p>Moreover, semantically it makes sense to call all commit
methods in the ‘doCommit’ method itself.</p>
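      <p>To make the coupling argument concrete, the sketch below shows one hypothetical outcome of moving ‘doCommit’ from Main into class A, as a Feature Envy refactoring would suggest. The class name MovedCommitSketch and the logging are illustrative only. After the move, A depends on B and C, and the caller still has to create and pass them, so the move adds dependencies instead of removing them.</p>
      <preformat>
```java
public class MovedCommitSketch {

    // Records the order of commits so the behaviour can be checked.
    static final StringBuilder log = new StringBuilder();

    static class B {
        void commit() { log.append("B;"); }
    }

    static class C {
        void commit() { log.append("C;"); }
    }

    // Hypothetical result of moving doCommit from Main into A:
    // A now needs references to B and C, which it did not need before.
    static class A {
        void commit() { log.append("A;"); }

        void doCommit(B b, C c) {
            commit();
            b.commit();
            c.commit();
        }
    }

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        C c = new C();
        // The caller keeps its dependencies on B and C (it must pass them in),
        // and A gains two new ones.
        a.doCommit(b, c);
        System.out.println(log); // A;B;C;
    }
}
```
      </preformat>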
      <p>An interesting observation can be made from the tools that
detected Refused Bequest code smell instances. For example,
code resembling the dummy code in Listing 2 was detected as a
smell even though no behavior is written in the base method; it
was flagged simply because the method was overridden without any
invocation of the base method. This suggests that even such code smells
are treated primarily as syntactic, where the
tool just looks for identical method names in the superclass
and subclass. Ideally, the tool should check whether there is any
meaningful behavior attached to the base method, and only
then report it as a smell.</p>
      <p>The dummy example in Listing 3 was not detected
as Feature Envy by any tool except Stench Blossom. The probable
reason for the non-detection is that the phone object is declared
inside the getMobilePhoneNumber method. From a purely syntactic
perspective this may not be a code smell, but from a semantic
perspective getMobilePhoneNumber should be part of the
“Phone” class. This raises the question of whether tools can
correctly detect semantic code smells.</p>
      <p>The dummy example in Listing 4 contains a Shotgun Surgery
code smell. It shows a common mistake that developers make while
creating database connections and querying tables: they create
individual connection and statement objects in each method, as
shown in the example. If connection handling has to change (for
example, to cope with a connection timeout), every such method must
be changed. Semantically, it makes sense to create a utility method,
“DBUtility.getList”, that takes the SQL query as an argument, opens
the connection, creates the statement, and returns the result set.
InCode and iPlasma did not detect this code smell.</p>
      <p>Although several tools detect code smells, they do not
consider the semantic intent of the code and hence end up
with a lot of false positives. Reducing false positives is the
first step towards increasing the confidence levels of users and
proportionately increasing the usage of refactoring tools.</p>
      <p>Listing 1: Feature Envy</p>
      <preformat>public class Main {
    private final A a = new A();
    private final B b = new B();
    private final C c = new C();

    // Detected as Feature Envy by some tools (e.g., JDeodorant)
    public void doCommit() {
        a.commit();
        b.commit();
        c.commit();
    }
}

public class A {
    public void commit() {
        // do something
    }
}

public class B {
    public void commit() {
        // do something
    }
}

public class C {
    public void commit() {
        // do something
    }
}</preformat>
      <p>Listing 2: Refused Bequest</p>
      <preformat>public class Base {
    protected void m1() { }  // no behavior in the base method
}

public class Inherited extends Base {
    @Override
    protected void m1() {
        // do something
    }
}</preformat>
      <p>Listing 3: Feature Envy</p>
      <preformat>public class Phone {
    private final String unformattedNumber;

    public Phone(String unformattedNumber) {
        this.unformattedNumber = unformattedNumber;
    }

    public String getAreaCode() {
        return unformattedNumber.substring(0, 3);
    }

    public String getPrefix() {
        return unformattedNumber.substring(3, 6);
    }

    public String getNumber() {
        return unformattedNumber.substring(6, 10);
    }
}

public class Customer {
    public String getMobilePhoneNumber() {
        // Digits only, so the fixed substring offsets in Phone line up
        Phone m_Phone = new Phone("1111232345");
        return "(" + m_Phone.getAreaCode() + ") "
            + m_Phone.getPrefix() + "-"
            + m_Phone.getNumber();
    }
}</preformat>
      <p>Listing 4: Shotgun Surgery</p>
      <preformat>public List&lt;Employee&gt; getEmployeeList() throws Exception {
    Connection conn = null;
    Statement stmt = null;
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    stmt = conn.createStatement();
    String sql;
    sql = "SELECT * FROM Employees";
    ResultSet rs = stmt.executeQuery(sql);
    return toEmployeeList(rs); // hypothetical row-mapping helper (not shown)
}

public List&lt;Customer&gt; getCustomerList() throws Exception {
    Connection conn = null;
    Statement stmt = null;
    Class.forName("com.mysql.jdbc.Driver");
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    stmt = conn.createStatement();
    String sql;
    sql = "SELECT * FROM Customers";
    ResultSet rs = stmt.executeQuery(sql);
    return toCustomerList(rs); // hypothetical row-mapping helper (not shown)
}</preformat>
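      <p>A sketch of the utility method suggested above for Listing 4 follows. Instead of returning an open ResultSet (which would either leave the connection open or be unusable once it is closed), it passes each row to a callback, so the connection and statement are created and closed in a single place. The names DBUtility, forEachRow, and RowHandler and the connection settings are hypothetical; only the java.sql calls are standard JDBC.</p>
      <preformat>
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class DBUtility {

    // Hypothetical connection settings; in practice these would come from configuration.
    static String dbUrl = "jdbc:mysql://localhost/example";
    static String user = "user";
    static String pass = "secret";
    static int loginTimeoutSeconds = 5;

    // Callback invoked once per row, so callers map rows without managing the connection.
    public interface RowHandler {
        void handle(ResultSet row) throws SQLException;
    }

    private DBUtility() {
    }

    // The only place that opens a connection and a statement. A change to connection
    // handling (here, the login timeout) is made once instead of in every query method.
    public static void forEachRow(String sql, RowHandler handler) throws SQLException {
        DriverManager.setLoginTimeout(loginTimeoutSeconds); // global JDBC setting
        try (Connection conn = DriverManager.getConnection(dbUrl, user, pass);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                handler.handle(rs);
            }
        } // connection, statement and result set are closed here, even on error
    }
}
```
      </preformat>
      <p>A method such as getEmployeeList could then call DBUtility.forEachRow("SELECT * FROM Employees", row -> ...) and only map rows, so a later change to connection handling touches one method instead of every query method.</p>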
    </sec>
    <sec id="sec-6">
      <title>VII. CODE SMELL DETECTION PRE-CHECKS</title>
      <p>Based on our study, we recommend some pre-checks specific to particular code smells to improve the correctness of detection:</p>
      <sec id="sec-6-1">
        <title>Feature Envy:</title>
        <p>Check whether the method references methods from multiple classes.
Check whether moving the method would increase coupling and references to
the target class.</p>
        <p>Check whether the method accomplishes collective work, such as disposing
of objects or committing multiple transactions, i.e., whether it
semantically performs a cumulative task.</p>
        <p>Take domain knowledge and system size into account
before reporting a smell; use semantic/information retrieval
techniques to identify domain concepts.</p>
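        <p>The first two pre-checks can be approximated as follows: a Feature Envy candidate is kept only if a single foreign class receives more than half of the method's foreign calls, so a method that distributes a cumulative task over several classes, such as ‘doCommit’ in Listing 1, is not reported. The class name FeatureEnvyPrecheck, the call-target representation, and the more-than-half rule are hypothetical choices for illustration.</p>
        <preformat>
```java
public class FeatureEnvyPrecheck {

    // Counts how many entries in callTargets name the given class.
    static int countCalls(String[] callTargets, String className) {
        int n = 0;
        for (String t : callTargets) {
            if (t.equals(className)) {
                n++;
            }
        }
        return n;
    }

    // Keep the candidate only if one foreign class receives more than half of
    // the method's foreign calls; spreading calls over several classes suggests
    // a cumulative task that moving the method would likely not improve.
    static boolean keepCandidate(String ownClass, String[] callTargets) {
        int foreignTotal = 0;
        int maxPerClass = 0;
        for (String t : callTargets) {
            if (t.equals(ownClass)) {
                continue;
            }
            foreignTotal++;
            maxPerClass = Math.max(maxPerClass, countCalls(callTargets, t));
        }
        return maxPerClass * 2 > foreignTotal;
    }

    public static void main(String[] args) {
        // Main.doCommit (Listing 1): one call each to A, B and C.
        System.out.println(keepCandidate("Main", new String[] {"A", "B", "C"})); // false
        // Customer.getMobilePhoneNumber (Listing 3): all calls go to Phone.
        System.out.println(keepCandidate("Customer",
                new String[] {"Phone", "Phone", "Phone", "Phone"}));           // true
    }
}
```
        </preformat>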
      </sec>
      <sec id="sec-6-2">
        <title>Large Class/God Class:</title>
        <p>Ignore utility classes, parsers, compilers, interpreters, and
exception handlers.</p>
        <p>Ignore Java beans, data models, log file classes, and
transaction manager classes.</p>
        <p>Normalize results by system size.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Long Method:</title>
        <p>Check whether large switch blocks are written in the method.</p>
        <p>Check whether multiple code blocks with different
functionality can be extracted.</p>
      </sec>
      <sec id="sec-6-4">
        <title>Refused Parent Bequest:</title>
        <p>Ignore interface methods; they are meant to be
overridden.</p>
        <p>Check whether the base method has any meaningful behaviour
attached to it.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>VIII. LIMITATIONS &amp; FUTURE WORK</title>
      <p>Our initial study provides evidence that our human subjects
disagreed with many of the code smells detected by the tools. We
presented sample code for incorrectly detected
code smells and for semantic smells that were not detected by the
tools. The authors strongly feel the need to take the semantic
intent of the code into consideration when detecting smells
and proposing (semi-)automatic refactorings.</p>
      <p>The disagreement with detected code smells correlates with
the confidence levels of users. We saw that the results from
different users were not the same for the same code smell
detection tool, so the accuracy of these results can always
be questioned. The results of Fontana et al. were used for
cross-validation of our work. For the degree of disagreement
with the detected code smells, we took an average of the overall
disagreement reported by the users. However, code smell disagreement
remains subjective until a standardized method for detection and
measurement is proposed.</p>
      <p>As future work, we would like to extend our experiment
with more industry-level users. We would like to share the
disagreements identified by our human subjects with the actual
developers of the systems to validate our findings. To better
understand the correctness of semantic code smell detection, we
would also like to compare the detection logic of the tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <article-title>Code smell evaluation template</article-title>
          . http://bit.ly/1WBfZPy. [Online; accessed 30-June-2015].
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <article-title>Code smells detected by human subjects &amp; their disagreements</article-title>
          . http://bit.ly/1BNJ6KS.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <article-title>Detailed profile of our human subjects</article-title>
          . http://bit.ly/1KiuEvY. [Online; accessed 30-June-2015].
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Campbell</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Miller</surname>
          </string-name>
          .
          <article-title>Designing refactoring tools for developers</article-title>
          .
          <source>In Proceedings of the 2nd Workshop on Refactoring Tools, WRT '08</source>
          , pages 9:1-9:2
          , New York, NY, USA,
          <year>2008</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Ferme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F. A.</given-names>
            <surname>Fontana</surname>
          </string-name>
          .
          <article-title>Is it a real code smell to be removed or not?</article-title>
          In International Workshop on Refactoring &amp; Testing (RefTest), co-located event with XP 2013 Conference
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>F.</given-names>
            <surname>Fontana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mariani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Morniroli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sormani</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Tonello</surname>
          </string-name>
          .
          <article-title>An experience report on using code smells detection tools</article-title>
          .
          <source>In 2011 IEEE Fourth International Conference on Software Testing, Verification and Validation Workshops (ICSTW)</source>
          , pages
          <fpage>450</fpage>
          -
          <lpage>457</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Fowler</surname>
          </string-name>
          .
          <article-title>Refactoring: improving the design of existing code</article-title>
          .
          <source>Pearson Education India</source>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Mahmood</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Reddy</surname>
          </string-name>
          .
          <article-title>Automated refactorings in Java using IntelliJ IDEA to extract and propagate constants</article-title>
          .
          <source>In 2014 IEEE International Advance Computing Conference (IACC)</source>
          , pages
          <fpage>1406</fpage>
          -
          <lpage>1414</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Mäntylä</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lassenius</surname>
          </string-name>
          .
          <article-title>Subjective evaluation of software evolvability using code smells: An empirical study</article-title>
          .
          <source>Empirical Software Engineering</source>
          ,
          <volume>11</volume>
          (
          <issue>3</issue>
          ):
          <fpage>395</fpage>
          -
          <lpage>431</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Olbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cruzes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. I.</given-names>
            <surname>Sjoberg</surname>
          </string-name>
          .
          <article-title>Are all code smells harmful? A study of god classes and brain classes in the evolution of three open source systems</article-title>
          .
          <source>In 2010 IEEE International Conference on Software Maintenance (ICSM)</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>W. F.</given-names>
            <surname>Opdyke</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. E.</given-names>
            <surname>Johnson</surname>
          </string-name>
          .
          <article-title>Refactoring: An aid in designing application frameworks and evolving object-oriented systems</article-title>
          .
          <source>In Symposium on Object-Oriented Programming Emphasizing Practical Applications</source>
          , September
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ouni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kessentini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sahraoui</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Hamdi</surname>
          </string-name>
          .
          <article-title>Search-based refactoring: Towards semantics preservation</article-title>
          .
          <source>In 2012 28th IEEE International Conference on Software Maintenance (ICSM)</source>
          , pages
          <fpage>347</fpage>
          -
          <lpage>356</lpage>
          , Sept.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Palomba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bavota</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Penta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Oliveto</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>De Lucia</surname>
          </string-name>
          .
          <article-title>Do they really smell bad? A study on developers' perception of bad code smells</article-title>
          .
          <source>In 2014 IEEE International Conference on Software Maintenance and Evolution (ICSME)</source>
          , pages
          <fpage>101</fpage>
          -
          <lpage>110</lpage>
          , Sept.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G. H.</given-names>
            <surname>Pinto</surname>
          </string-name>
          and
          <string-name>
            <given-names>F.</given-names>
            <surname>Kamei</surname>
          </string-name>
          .
          <article-title>What programmers say about refactoring tools?: An empirical investigation of Stack Overflow</article-title>
          .
          <source>In Proceedings of the 2013 ACM Workshop on Workshop on Refactoring Tools, WRT '13</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>36</lpage>
          , New York, NY, USA,
          <year>2013</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tempero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Anslow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Dietrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lumpe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Melton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Noble</surname>
          </string-name>
          .
          <article-title>The Qualitas Corpus: A curated collection of Java code for empirical studies</article-title>
          .
          <source>In 2010 17th Asia Pacific Software Engineering Conference (APSEC)</source>
          , pages
          <fpage>336</fpage>
          -
          <lpage>345</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>W. C.</given-names>
            <surname>Wake</surname>
          </string-name>
          .
          <source>Refactoring Workbook</source>
          . Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1st edition,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>