=Paper=
{{Paper
|id=Vol-1519/paper3
|storemode=property
|title=Correctness of Semantic Code Smell Detection Tools
|pdfUrl=https://ceur-ws.org/Vol-1519/paper3.pdf
|volume=Vol-1519
|dblpUrl=https://dblp.org/rec/conf/apsec/MathurR15
}}
==Correctness of Semantic Code Smell Detection Tools==
<pdf width="1500px">https://ceur-ws.org/Vol-1519/paper3.pdf</pdf>
<pre>
                    Correctness of Semantic Code Smell Detection Tools

                           Neeraj Mathur∗ and Y Raghu Reddy†
                        ∗† Software Engineering Research Center,
        International Institute of Information Technology, Hyderabad (IIIT-H), India
               ∗ neeraj.mathur@research.iiit.ac.in, † raghu.reddy@iiit.ac.in

   Abstract—Refactoring is a set of techniques used to enhance the quality of code by
restructuring existing code/design without changing its behavior. Refactoring tools can be used
to detect specific code smells, propose relevant refactorings, and in some cases automate the
refactoring process. However, usage of refactoring tools in industry is still relatively low. One
of the major reasons is the veracity of the detected code smells, especially smells that aren’t
purely syntactic in nature. We conduct an empirical study on some refactoring tools and evaluate
the correctness of the code smells they identify. We analyze the level of confidence users have in
the code smells detected by the tools and discuss some issues with such tools.
   Index Terms—Correctness, Detection, Maintenance, Refactoring, Semantic code smells.

I. INTRODUCTION

   Refactoring improves various qualities of code/design like maintainability (understandability
and readability), extensibility, etc. by changing the structure of the code/design without
changing the overall behaviour of the system. It was first introduced by Opdyke and Johnson [11]
and later popularized by Martin Fowler [7]. Fowler categorized various types of refactorings in
terms of their applicability and suggested various refactorings for code smells (bad indicators in
code) within classes and between classes. Over the years, other researchers have added to the
knowledge base on code smells [16]. Any refactoring done to address specific code smells requires
one to test the refactored code with respect to preservation of the original behaviour. This is
primarily done by writing test cases before the refactoring is done and testing the refactored
code against the test cases. As a result, validating the correctness of a detected code smell and
the automation of its refactoring is very difficult. Explicit manual intervention may be needed.

   Many code smell detection tools support a (semi) automatic refactoring process [4], [8].
iPlasma, JDeodorant, RefactorJ, etc. are some of the tools that can be used for detection of code
smells and application of specific refactoring techniques in an automated or semi-automated
manner. As noted in our previous work, each of these tools can detect only certain types of code
smells automatically [8]. Also, there is no standardized approach for detecting such code smells
and hence tools follow their own approach [9], inevitably leading to different sets of code smells
being detected for the same piece of code. Most code smell detection techniques depend on static
analysis and code metrics and do not consider factors like system size, language structure and
context. In other words, any design-based refactorings require the tool to understand the actual
semantic intent of the code itself. For example, Long method is one such code smell that requires
the tool to understand the context of the method before an Extract method refactoring can be
performed automatically while preserving the original semantic intent.

   Despite the known benefits of refactoring tools, their usage is not widespread due to users’
lack of trust in the tools’ code smell detection capability, the learning curve involved, and the
inability of users to understand the code smell detection results [4], [6], [14]. In this paper,
we study the correctness of the detected code smells of multiple open source tools like
JDeodorant, InCode, etc. and the lack of trust of users in their detection capability through an
empirical study of open source projects like JHotDraw (www.jhotdraw.org) and GanttProject
(www.ganttproject.biz). We believe that users’ trust is proportional to the correctness of the
tools’ detection ability. Most code smell detection tools correctly detect code smells that
require small-scale resolutions. However, correctness is an issue when design-based refactorings
or semantic intent is considered. Hence, we restrict our focus to tools that detect code smells
that require the tool to understand the semantic intent (for example, Feature Envy, Long Methods,
Shotgun Surgery, God class, etc.). From now on, we refer to such code smells as “semantic code
smells”. We cross-validate our study results on GanttProject with the results of the study
conducted by Fontana et al. [6] on GanttProject (v1.11.1). Additionally, we used our own dummy
code with a few induced semantic code smells to check for correctness of the tools. We focus on
the following in this paper:

   • Correctness of the identified code smells among the chosen tools
   • Deviation in confidence levels of developers in open source code smell detection tools that
     detect semantic code smells

   The rest of the paper is structured as follows: Section II provides a brief overview of the
code smells discussed in this paper and Section III presents some related work. In Section IV, we
detail the study design. Sections V and VI present the results and the analysis of the results.
Based on the study, we provide some guidelines for increasing the correctness of detecting some
code smells in Section VII. Finally, in Section VIII we discuss some limitations of our work.
      3rd International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2015)                                 17
II. CODE SMELLS

   Complexity related metrics like coupling are commonly used in tools to detect certain semantic
code smells. Threshold values are established for various metrics and the code smells are
identified based on the threshold values. Code smells like Feature envy, long methods, god class,
etc. are widely studied in the literature. In our study, we primarily target the following code
smells:

   • Feature envy: A method is more interested in some other class than the one in which it is
     defined.
   • Long methods: A method that is too long (measured in terms of lines of code or other
     metrics), possibly leading to low cohesion and high coupling.
   • God class: A God class performs too much work on its own, delegating only minor details to a
     set of trivial classes.
   • Large Class: A class with too many instance variables or methods. It may become a God class.
   • Shotgun Surgery: A change in one class necessitates a change in many other classes.
   • Refused Bequest: A sub-class not using its inherited functionality.

III. RELATED WORK

   Fontana et al. [6] presented a comparative analysis of code smells detected by various
refactoring tools and their support of (semi) automatic refactoring. The study analyzes the
differences between code smell detection tools. In our previous work [8], we analyzed various Java
based refactoring tools with respect to their usability and reasoned about the automation of
various code smells.
   Pinto et al. [14] investigated data from StackOverflow to find out the barriers for adoption of
code detection tools. They listed the adoption and usability issues that users discuss on
StackOverflow. Olbrich et al. [10] performed an empirical study of God Class and Brain Class to
evaluate whether detected smells are really smells. From their empirical study they concluded that
if the results are normalized by the size of the system, the smell results become the opposite. In
fact, the detected smell classes were less prone to changes and errors.
   Ferme et al. [5] conducted a study of God Class and Data Class to find out whether all detected
smells are real smells. They proposed filters to be used to reduce or refine the detection rules
of these smells. This paper extends our previous work and complements the work done by other
authors by considering semantic code smells. In [13], Palomba et al. studied developers’
perception of bad smells. Their study depicts a gap between theory and practice, i.e., what is
believed to be a problem (theory) and what is actually a problem (practice). Their study provides
insights into characteristics of bad smells not yet explored sufficiently.
   Ouni et al. [12] proposed a search based refactoring approach to preserve the domain semantics
of a program when refactoring is decided/implemented automatically. They argued that a refactoring
might be syntactically correct and have the right behaviour, but model the domain semantics
incorrectly.

IV. STUDY DESIGN

   The objective of our study is to analyze the correctness of tools relevant to semantic code
smells by performing a study with human subjects with prior refactoring experience. Tools like
JDeodorant, PMD, InCode, iPlasma, and Stench Blossom, and two large systems, JHotDraw and
GanttProject, that have been widely studied in refactoring research were considered. Choosing
these tools and systems helped us in cross-validating our work with prior work done by us [8] and
other researchers using similar systems/tools.

   Initially, 35 human subjects volunteered to be part of the study. The exact hypothesis of the
study was not disclosed to the subjects to avoid biasing the study results. The subjects were only
informed of the specific tasks that needed to be done, i.e. to assess certain refactoring tools
with respect to their ability to detect code smells based on given criteria and fill in a
template.

   All the human subjects had varying levels of prior knowledge about code smells and tool based
refactorings. However, they had not worked on the specific tools used for this study. The subjects
had varied experience (13% of the subjects had 1-5 years of experience, 31% had 5-10 years of
experience and the rest had less than a year of experience). The detailed statistics of the
subjects are available at [3]. We asked them to focus on specific code smells like Feature Envy,
Long Methods, Shotgun Surgery, God class, etc., list all these smells, and record their
comments/rationale for detecting these as code smells. The template [1] had a column that asked
them the semantic meaning of the detected code smells. In addition, we asked them to provide
reasoning for not agreeing with the refactorings detected by the tools used. The subjects were
given a three-week period to perform the activity and fill in the templates. After evaluation of
the templates, results from 32 subjects were taken into consideration. The other three did not
fill in the templates completely. To cross-check the correctness of the tools with respect to
their semantic code smell detection, in addition to the two open-source systems, we instrumented
one of our own projects. It was interesting to note that most of the tools were not detecting code
smells that seemed obvious from our perspective.

A. Subject Systems

   In our study, we chose two open source systems:

   • GanttProject - a free project management app where users can create tasks, create a project
     baseline, and organize tasks in a work breakdown structure
   • JHotDraw - a Java GUI framework for technical and structured graphics. Its design relies
     heavily on some well-known design patterns.

   The subject systems are available for use under an open source license and are fairly well
documented. In addition to being a part of the Qualitas Corpus [15], these are widely studied in
refactoring research and referenced in the literature. Table I provides details of these systems.
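
   Section II notes that tools flag smells by comparing metric values against thresholds. A
minimal sketch of that idea follows, assuming a lines-of-code limit for Long Method and a
call-count rule for Feature Envy. The class name, method names, and threshold values are
illustrative assumptions and are not taken from any of the tools studied.

```java
/** Illustrative threshold checks; names and limits are assumptions, not any tool's actual rules. */
public class ThresholdSmellSketch {

    // Assumed limits; the tools in the study let the user configure such values.
    public static final int LONG_METHOD_LOC_LIMIT = 30;
    public static final int FOREIGN_CALL_LIMIT = 3;

    /** Long Method: flagged when the method body exceeds the LOC limit. */
    public static boolean isLongMethod(int linesOfCode) {
        return linesOfCode > LONG_METHOD_LOC_LIMIT;
    }

    /**
     * Feature Envy candidate: the method calls another class at least
     * FOREIGN_CALL_LIMIT times and more often than it calls its own class.
     */
    public static boolean isFeatureEnvyCandidate(int callsToOwnClass, int callsToOtherClass) {
        return callsToOtherClass >= FOREIGN_CALL_LIMIT && callsToOtherClass > callsToOwnClass;
    }

    public static void main(String[] args) {
        System.out.println(isLongMethod(45));              // true: 45 > 30
        System.out.println(isFeatureEnvyCandidate(1, 3));  // true
        System.out.println(isFeatureEnvyCandidate(5, 3));  // false: own class is used more
    }
}
```

   A purely numeric check of this kind cannot tell whether the flagged code is semantically
misplaced, which is the gap the study examines.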
TABLE I: Characteristics of subject systems

                               JHotDraw   GanttProject
Version                           5.4       2.7.1891
Total Lines of Code             32435        46698
Number of classes                 368         1230
Number of methods                3341         5917
Weighted methods per class       6990         8540
Number of static methods          280          235

B. Code Smell Detection Tools

   There are several commercial and open source code smell detection tools. Some are research
prototypes meant for detection of specific code smells while others identify a wide range of code
smells. In addition to detection, some tools refactor the code smells automatically, while others
are semi-automated. Semi-automated refactoring tools propose refactorings that require human
intervention and can be automated to a certain extent by changing rules of detection. Some of
these are more or less like recommender systems that recommend certain types of refactorings but
leave it to the developers to refactor or ignore the recommendations.
   Some tools are integrated with the IDE itself, while others are written as plugins that can be
installed on a need basis. For example, Checkstyle, PMD, etc. are some well-known Eclipse based
plugins. For our study, we focused on tools widely studied in the literature, their semantic code
smell detection ability and usage in the industry. The list of tools and the detected code smells
relevant to our study are shown in Table II.

V. EXPERIMENT RESULTS

   Table III provides a cumulative summary of code smells detected and disagreements (weighted
average of the results from 32 subjects) by our human subjects. To show the disparity in results,
we compare our results for the GanttProject with Fontana et al.’s results [6]. For reference, the
detailed list of smells detected by our human subjects is available at [2].
   Feature Envy: The number of Feature Envy methods detected by different tools varies
significantly. Some tools consider any three or more calls to methods of another class as a code
smell and hence give rise to a large number of false positive code smells. In such cases, when a
tool is just counting numbers, it becomes tedious to filter out the actual smells.
   The degree of disagreement was found to be 9/44 for JDeodorant and 3/4 for inCode for the
JHotDraw project. Disagreements by the human subjects were prevalent across all tools for detected
feature envy smells. We also observed a significant difference between the results of Fontana et
al.’s [6] study and our JDeodorant results. Their study reveals that over a period of time, from
GanttProject v1.10 to v1.11.1, Feature Envy methods reduced from 18 (in v1.10) to 2 (in v1.11.1),
whereas in our study of GanttProject v2.7, there were 113 Feature Envy code smells. As can be seen
from the results, the number of detected smells is significantly different from the numbers given
in their work.
   God Class: The number of God classes detected by JDeodorant for GanttProject in Fontana et
al.’s [6] study was 22 (v1.11.1), whereas in v2.7 it is 127. In inCode the number of god classes
was significantly lower: reduced from 13 (v1.11.1) to 4 (v2.7). Unlike JDeodorant and inCode, the
results from iPlasma increased: from 13 (v1.11.1) to 42 (v2.7). This inconsistency between tools
reduces the confidence in the results.
   The degree of disagreement with the JDeodorant code smells for the JHotDraw project was 10 out
of 56. Our human subjects observed that tools were considering the data model classes (getters and
setters) and parser classes as smells. Usually these classes are needed and are necessary evils.
As a result, there is a need to build some sort of intelligence/metrics to detect these kinds of
classes, which can safely be ignored to reduce the false positives.

   Large Class: This code smell is detected by code size and is subject to the threshold limit of
LOC set by the user. Classes that contain project utility methods can grow in size with a lot of
small utility methods, and developers usually make these classes singleton objects. Refactoring
such smells requires one to be able to understand the intent of the statements before an extract
method or extract class refactoring is applied.

   Refused Bequest: Some tools considered interface methods and abstract methods that are normally
overridden by the respective concrete classes as a code smell, resulting in a lot of false
positives.

   Compared to Fontana et al.’s [6] study, inCode detected 6 (in v2.7) whereas it was 0 (in
v1.11.1). The degree of disagreement with the detected code smells was 2 in inCode and 1 in
iPlasma for GanttProject. For JHotDraw it was 1 (for inCode) and 2 (for iPlasma).
   Shotgun Surgery: The results of our study for shotgun surgery on v2.7 were similar to the ones
for GanttProject (v1.11.1). However, from our study we can conclude that the way shotgun surgery
is detected differs from tool to tool rather than between different versions of the same tool. The
degree of disagreement with the detected code smells was 2 for GanttProject using inCode and 7
using iPlasma.
   Long Method: This code smell is related to the number of lines of code present in the method
and is subject to the threshold limit set by the user for detecting the code smell. As per the
disagreements documented by our subjects, methods containing large switch statements should not be
counted as long methods. iPlasma had a complex detection mechanism that considers such cases while
detecting Long Method code smells. On the contrary, JDeodorant listed fairly small methods as long
method code smells.
   The degree of disagreement with the code smells detected for GanttProject was 12 in JDeodorant,
27 in Stench Blossom and 8 in PMD. For the JHotDraw project, it was 10 in JDeodorant, 12 in Stench
Blossom, 11 in iPlasma and 8 in PMD.
   The detections from the JDeodorant tool were significantly higher than the 57 (v1.11.1)
reported for GanttProject in Fontana et al.’s [6] study. Interestingly, we noticed that the long
methods were reduced from 160 (in v1.10) to 57 (in v1.11.1), whereas in our study the count was
still high, i.e. 221. This is because, as software evolves, it is expected that long methods will
grow over a period of time.
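
   The degrees of disagreement reported in this section can be read as rates. A small sketch
follows that converts the counts quoted in the text for JHotDraw (9 of 44 and 3 of 4 for Feature
Envy, 10 of 56 for God Class) into percentages. The class and method names are hypothetical, and
treating each figure as "disagreements over detections" is our reading of the reported numbers.

```java
import java.util.Locale;

/** Illustrative helper; the class and method names are assumptions. */
public class DisagreementRate {

    /** Share of detected smells that the subjects disagreed with. */
    public static double rate(int disagreed, int detected) {
        if (detected <= 0) {
            throw new IllegalArgumentException("detected must be positive");
        }
        return (double) disagreed / detected;
    }

    private static String line(String label, int disagreed, int detected) {
        return String.format(Locale.ROOT, "%s: %.1f%%", label, 100 * rate(disagreed, detected));
    }

    public static void main(String[] args) {
        // Counts as quoted in the text for the JHotDraw project.
        System.out.println(line("Feature Envy, JDeodorant", 9, 44));  // 20.5%
        System.out.println(line("Feature Envy, inCode", 3, 4));       // 75.0%
        System.out.println(line("God Class, JDeodorant", 10, 56));    // 17.9%
    }
}
```

   Expressed this way, the small inCode result set stands out: few Feature Envy methods were
detected, but most of them were disputed by the subjects.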
TABLE II: Code smell detection tools

Tool                   Code smells               Detail
JDeodorant             FE, GC, LM, TC            This is an Eclipse plug-in that automatically identifies
(vv. 3.5 and 3.6),                               four code smells in Java programs. It ranks the
[EP], Java                                       refactorings according to their impact on the design and
                                                 automatically applies the most effective refactoring.
Stench Blossom         FE, LM, LC, MC            This tool provides a high-level overview of the smells in
(v. 3.3),                                        the code. It is an Eclipse plugin with three different
[EP], Java                                       views that progressively offer more visualized
                                                 information about the smells in the code.
InCode [SA],           BM, FE, GC, IC, RB, SS    This tool supports the analysis of a system at
C, C++, Java                                     architectural and code level. It allows for detection of
                                                 more than 20 design flaws and code smells.
iPlasma [SA],          BM, FE, GC, IC, SS,       It can be used for quality assessment of object-oriented
C++, Java              RB, LM, SG                systems and supports all phases of analysis: from model
                                                 extraction up to high-level metrics based analysis, or
                                                 detection of code duplication.
PMD,                   LC, LM                    Scans Java source code and looks for potential bugs such
[EP or SA], Java                                 as dead code, empty try/catch/finally/switch statements,
                                                 unused local variables, parameters and duplicate code.

Feature Envy (FE), Refused Bequest (RB), God Class (GC), Long Method (LM), Lazy Class (LC),
Intensive Coupling (IC), Shotgun Surgery (SS), Speculative Generality (SG), Dispersed Coupling
(DC), Brain Method (BM).
Type: Standalone Application (SA), Eclipse Plug-in (EP)


TABLE III: Code Smell Detected & Disagreement

Code    JDeodorant      inCode       iPlasma    Stench Blossom     PMD
Smell       $      #      $      #      $      #      $      #      $      #
                                  JHotDraw
FE         44      9      4      3     35      4     28     10     56      9
GC         56     10     14     15     22      2      -      -     34     10
LC          -      -      -      -      -      -     22      5     41      5
RB          -      -      4      1      2      2      -      -      -      -
SS          -      -     10      3     13      6      -      -      -      -
LM         90     10      -      -     94     11    113     12     73      8
                                GanttProject
FE        113     12     11      2     42     13     53     28     38      4
GC        127      8      4      3     24      3      -      -     37     17
LC          -      -      -      -      -      -      9      -     40      9
RB          -      -      6      2      3      1      -      -      -      -
SS          -      -      7      2     42      7      -      -      -      -
LM        221     12      -      -      -      -     55     27     36      8

$ - Detected, # - Disagree

VI. DISCUSSION

   The major challenge in assessing correctness of detected smells is the knowledge possessed by
human subjects in regard to the code smells and the behavior of the code itself. Since the
subjects (users) were not familiar with the tools chosen for the study, they complained about the
time consumed in understanding and using the tools. At times, the focus seemed to be more on the
user interface of the tool rather than the detected code smells. The authors had to go back to the
subjects to get additional clarification about the comments written in the templates regarding
specific code smells.
   Some tools require explicit specification of rules (for example, PMD) for detecting a specific
code smell. So, selecting rules from the entire list of rules was tedious and time-consuming. Most
of the users seemed to struggle with the setup and configuration of the tools. A few tools like
JDeodorant had high memory utilization and performance issues. So re-compilation after every
change and re-running the detection process was laborious.
   Contrary to our belief that the same code smell must be identified in the same way by different
tools, the disagreement in the correctness of the detected code smells between the various tools
for the same type of smell was pretty high (as shown in Table III). Additionally, the results were
not consistent with the results provided in prior research [6]. In other words, the detected
smells were not accurate with respect to the semantics of the code written. To validate the
results, we had to further cross-check the tools with some dummy examples. For instance, the dummy
code (shown in Listing 1) was detected as a Feature Envy smell in some of the tools (for example,
JDeodorant). If the ’doCommit’ method is moved to any of the classes (A, B and C in this example),
we must pass the other two classes as reference parameters, which in turn increases the coupling.
Moreover, semantically it makes sense to call all commit methods in the ’doCommit’ method itself.

   An interesting observation can be made from the tools that detected Refused Bequest code smell
instances. For example, code resembling the dummy code (shown in Listing 2) reveals that there is
no behavior written in the base method, and just because it was being overridden without any
invocation of the base method, it was detected as a smell. In other words, the authors could
intuitively conclude that even such code smells are primarily being treated as syntactic, wherein
the tool is just looking for redundant names in the superclass and subclass. Ideally, the tool
should check if there is any meaningful behavior attached to the base method and only then should
it be detected as a smell.
   The dummy example (shown in Listing 3) was not detected as feature envy except by the Stench
Blossom tool. The probable reason for non-detection is the declaration of the phone object inside
the getMobilePhoneNumber method. Logically it may not be detected as a code smell, but from a
semantic perspective getMobilePhoneNumber should be part of the ”phone” class. This issue poses a
question of correct semantic code smell detection by tools.
   The dummy example (shown in Listing 4) has the shotgun surgery code smell. The example shows a
common mistake that users commit while creating a database connection and querying tables. Users
tend to create individual connections and command objects in each of the methods as shown in the
example. If a connection timeout occurs, semantically it makes sense to create a utility method
that takes SQL as
an argument and returns the result set, for example
"DBUtility.getList"; this method would open the connection and create
the SQL statements. The InCode and iPlasma tools did not detect this
code smell.
   Although several tools detect code smells, they do not consider the
semantic intent of the code and hence end up with a lot of false
positives. Reducing false positives is the first step towards
increasing the confidence levels of users and proportionately
increasing the usage of refactoring tools.

Listing 1: Feature Envy
public class Main {
    private final A a = new A();
    private final B b = new B();
    private final C c = new C();

    // Commits several objects in one place (a cumulative task).
    public void doCommit() {
        a.commit();
        b.commit();
        c.commit();
    }
}
public class A {
    public void commit() {
        // do something
    }
}
public class B {
    public void commit() {
        // do something
    }
}
public class C {
    public void commit() {
        // do something
    }
}

Listing 2: Refused Bequest
public class Base {
    protected void m1() { }
}
public class Inherited extends Base {
    @Override
    protected void m1() {
        // do something
    }
}

Listing 3: Feature Envy
public class Phone {
    private final String unformattedNumber;

    public Phone(String unformattedNumber) {
        this.unformattedNumber = unformattedNumber;
    }
    public String getAreaCode() {
        return unformattedNumber.substring(0, 3);
    }
    public String getPrefix() {
        return unformattedNumber.substring(3, 6);
    }
    public String getNumber() {
        return unformattedNumber.substring(6, 10);
    }
}
public class Customer {
    public String getMobilePhoneNumber() {
        // Ten digits without separators, as the substring offsets expect.
        Phone m_Phone = new Phone("1111232345");
        return "(" + m_Phone.getAreaCode() + ") "
            + m_Phone.getPrefix() + "-"
            + m_Phone.getNumber();
    }
}

Listing 4: Shotgun Surgery
public List<Employee> getEmployeeList() {
    Connection conn = null;
    Statement stmt = null;
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    stmt = conn.createStatement();
    String sql;
    sql = "SELECT * FROM Employees";
    ResultSet rs = stmt.executeQuery(sql);
    // Pseudocode: ResultSet has no toList method; rows are mapped here.
    return rs.toList<Employee>();
}
public List<Customer> getCustomerList() {
    Connection conn = null;
    Statement stmt = null;
    Class.forName("com.mysql.jdbc.Driver");
    conn = DriverManager.getConnection(DB_URL, USER, PASS);
    stmt = conn.createStatement();
    String sql;
    sql = "SELECT * FROM Customers";
    ResultSet rs = stmt.executeQuery(sql);
    // Pseudocode, as above.
    return rs.toList<Customer>();
}

              VII. CODE SMELL DETECTION PRE-CHECKS

   Based on our study, we recommend some pre-checks specific to
particular code smells to improve the correctness of detection:
Feature Envy:
   • Check whether the method references methods from multiple
     classes, and whether moving it would increase coupling and
     references to the target class.
   • Check whether shared work, such as disposing objects or
     committing multiple transactions, is accomplished in the method,
     i.e., whether it is semantically performing a cumulative task.
   • Take domain knowledge and system size into account before
     detecting the smell; use semantic/information retrieval
     techniques to identify domain concepts.
Long Class/God Class:
   • Ignore utility classes, parsers, compilers, interpreters, and
     exception handlers.
   • Ignore Java beans, data models, and log file classes.
   • Ignore transaction manager classes.
   • Normalize results by system size.
Long Method:
   • Check whether the method contains large switch blocks, and
     whether multiple code blocks with different functionality can be
     extracted.
Refused Parent Bequest:
   • Ignore interface methods; they are meant to be overridden.
   • Check whether the base method has any meaningful behaviour
     attached to it.

              VIII. LIMITATIONS & FUTURE WORK

   Our initial study has provided evidence that our human subjects
disagree with the code smells detected by the tools. We presented
sample code for incorrectly detected code smells and for semantic
smells that were not detected by the tools. The authors strongly feel
the need to take the semantic intent of the code into consideration
while detecting smells and proposing (semi-)automatic refactorings.
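   For the Feature Envy case of Listing 3, one possible refactoring is
to move the formatting logic into the Phone class, as the discussion
of that listing suggests. The following is a minimal sketch under
assumptions, not the authors' code: the class names PhoneSketch and
CustomerSketch, the method name toFormattedString, and the ten-digit
input without separators are all hypothetical.

```java
/** Phone as in Listing 3, with the formatting moved next to the data it uses. */
final class PhoneSketch {
    // Assumption: exactly ten digits without separators, e.g. "1111232345".
    private final String unformattedNumber;

    PhoneSketch(String unformattedNumber) {
        this.unformattedNumber = unformattedNumber;
    }

    String getAreaCode() { return unformattedNumber.substring(0, 3); }
    String getPrefix()   { return unformattedNumber.substring(3, 6); }
    String getNumber()   { return unformattedNumber.substring(6, 10); }

    /** Hypothetical method name; it does not appear in the original listing. */
    String toFormattedString() {
        return "(" + getAreaCode() + ") " + getPrefix() + "-" + getNumber();
    }
}

/** Customer now delegates instead of assembling the phone's parts itself. */
final class CustomerSketch {
    private final PhoneSketch mobilePhone = new PhoneSketch("1111232345");

    String getMobilePhoneNumber() {
        return mobilePhone.toFormattedString();
    }

    public static void main(String[] args) {
        // prints (111) 123-2345
        System.out.println(new CustomerSketch().getMobilePhoneNumber());
    }
}
```

   In this arrangement the customer class no longer calls three
accessors of another class to build one string, which is the
dependency a semantic Feature Envy check would be expected to flag.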
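   The pre-checks of Section VII can be read as filters applied to a
tool's raw candidates before they are reported. The sketch below
illustrates that reading for two of the smells (God Class and Refused
Bequest). It is not the detection logic of any of the studied tools,
and every name in it (SmellKind, SmellCandidate, SmellPreChecks, and
the boolean flags) is a hypothetical stand-in for information a real
tool would have to compute.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical smell kinds, named after the pre-check headings. */
enum SmellKind { FEATURE_ENVY, GOD_CLASS, LONG_METHOD, REFUSED_BEQUEST }

/** Hypothetical description of one candidate reported by a detection tool. */
final class SmellCandidate {
    final SmellKind kind;
    final String element;            // class or method that was flagged
    final boolean utilityOrBean;     // utility class, parser, Java bean, data model, ...
    final boolean interfaceMethod;   // overridden method is declared in an interface
    final boolean baseHasBehaviour;  // base method has meaningful behaviour

    SmellCandidate(SmellKind kind, String element, boolean utilityOrBean,
                   boolean interfaceMethod, boolean baseHasBehaviour) {
        this.kind = kind;
        this.element = element;
        this.utilityOrBean = utilityOrBean;
        this.interfaceMethod = interfaceMethod;
        this.baseHasBehaviour = baseHasBehaviour;
    }
}

final class SmellPreChecks {
    /** Returns true if the candidate survives the pre-checks for its kind. */
    static boolean passes(SmellCandidate c) {
        switch (c.kind) {
            case GOD_CLASS:
                // Ignore utility classes, beans, data models and similar.
                return !c.utilityOrBean;
            case REFUSED_BEQUEST:
                // Ignore interface methods and base methods without behaviour.
                return !c.interfaceMethod && c.baseHasBehaviour;
            default:
                // No pre-check is sketched here for the other kinds.
                return true;
        }
    }

    /** Keeps only the candidates that pass their pre-checks. */
    static List<SmellCandidate> filter(List<SmellCandidate> candidates) {
        List<SmellCandidate> kept = new ArrayList<>();
        for (SmellCandidate c : candidates) {
            if (passes(c)) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

   Under these assumptions, a God Class candidate that is a Java bean
and a Refused Bequest candidate whose base method is empty (as in
Listing 2) are both dropped, while an ordinary oversized class is kept.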
   The disagreement in detected code smells correlates with the
confidence levels of users. We saw that the results from all the users
were not the same for the same code smell detection tools, so the
accuracy of these results can always be questioned. The results of
Fontana et al. were used for cross-validation of our work. For the
degree of disagreement with the detected code smells, we took an
average of the overall disagreement reported by the users. However,
code smell disagreement is subjective until a standardized method for
detection and measurement is proposed.
   As future work, we would like to extend our experiment with more
industry-level users. We would like to share the disagreement detected
by our human subjects with the actual developers of the system to
validate our findings. Towards understanding the correctness of
semantic code smell detection, we would like to compare the detection
logic used by the tools.

                            REFERENCES

 [1] Code smell evaluation template. http://bit.ly/1WBfZPy. [Online;
     accessed 30-June-2015].
 [2] Code smells detected by human subjects & their disagreements.
     http://bit.ly/1BNJ6KS.
 [3] Detailed profile of our human subjects. http://bit.ly/1KiuEvY.
     [Online; accessed 30-June-2015].
 [4] D. Campbell and M. Miller. Designing refactoring tools for
     developers. In Proceedings of the 2nd Workshop on Refactoring
     Tools, WRT '08, pages 9:1–9:2, New York, NY, USA, 2008. ACM.
 [5] V. Ferme, A. Marino, and F. A. Fontana. Is it a real code smell to
     be removed or not? In International Workshop on Refactoring &
     Testing (RefTest), co-located event with XP 2013 Conference, 2013.
 [6] F. Fontana, E. Mariani, A. Morniroli, R. Sormani, and A. Tonello.
     An experience report on using code smells detection tools. In
     Software Testing, Verification and Validation Workshops (ICSTW),
     2011 IEEE Fourth International Conference on, pages 450–457, 2011.
 [7] M. Fowler. Refactoring: improving the design of existing code.
     Pearson Education India, 1999.
 [8] J. Mahmood and Y. Reddy. Automated refactorings in java using
     intellij idea to extract and propogate constants. In Advance
     Computing Conference (IACC), 2014 IEEE International, pages
     1406–1414, 2014.
 [9] M. Mäntylä and C. Lassenius. Subjective evaluation of software
     evolvability using code smells: An empirical study. Empirical
     Software Engineering, 11(3):395–431, 2006.
[10] S. Olbrich, D. Cruzes, and D. I. Sjoberg. Are all code smells
     harmful? a study of god classes and brain classes in the evolution
     of three open source systems. In Software Maintenance (ICSM), 2010
     IEEE International Conference on, pages 1–10, 2010.
[11] W. F. Opdyke and R. E. Johnson. Refactoring: An aid in designing
     application frameworks and evolving object-oriented systems. In
     Symposium on Object-Oriented Programming Emphasizing Practical
     Applications, September 1990.
[12] A. Ouni, M. Kessentini, H. Sahraoui, and M. Hamdi. Search-based
     refactoring: Towards semantics preservation. In Software
     Maintenance (ICSM), 2012 28th IEEE International Conference on,
     pages 347–356, Sept 2012.
[13] F. Palomba, G. Bavota, M. Di Penta, R. Oliveto, and A. De Lucia. Do
     they really smell bad? a study on developers' perception of bad
     code smells. In Software Maintenance and Evolution (ICSME), 2014
     IEEE International Conference on, pages 101–110, Sept 2014.
[14] G. H. Pinto and F. Kamei. What programmers say about refactoring
     tools?: An empirical investigation of stack overflow. In
     Proceedings of the 2013 ACM Workshop on Workshop on Refactoring
     Tools, WRT '13, pages 33–36, New York, NY, USA, 2013. ACM.
[15] E. Tempero, C. Anslow, J. Dietrich, T. Han, J. Li, M. Lumpe,
     H. Melton, and J. Noble. The qualitas corpus: A curated collection
     of java code for empirical studies. In Software Engineering
     Conference (APSEC), 2010 17th Asia Pacific, pages 336–345, 2010.
[16] W. C. Wake. Refactoring Workbook. Addison-Wesley Longman Publishing
     Co., Inc., Boston, MA, USA, 1st edition, 2003.



</pre>