<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Test Refactoring: a Research Agenda</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <institution>University of Antwerp</institution>
          ,
          <addr-line>Middelheimlaan 1, 2020 Antwerp</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Research on software testing generally focuses on the effectiveness of test suites to detect bugs. The quality of the test code in terms of maintainability remains mostly ignored. However, just like production code, test code can suffer from code smells that imply refactoring opportunities. In this paper, we summarize the state-of-the-art in the field of test refactoring. We show that there is a gap in the tool support, and propose future work which aims to fill this gap.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Refactoring is “the process of changing a software
system in such a way that it does not alter the
external behaviour of the code yet improves its internal
structure” [Fow09]. If applied correctly, refactoring
improves the design of software, makes software
easier to understand, helps to find faults, and helps to
develop a program faster [Fow09].</p>
      <p>In most organizations, the test code is the final
“quality gate” for an application, allowing or
denying the move from development to release. With this
role comes a large responsibility: the success of an
application, and possibly the organization, rests on the
quality of the software product [Dus02]. Therefore,
it is critical that the test code itself is of high quality.</p>
    </sec>
    <sec id="sec-2">
      <title>Methods, such as code coverage analysis and mutation testing, help developers assess the e↵ectiveness of the</title>
      <p>tests suite. Yet, there is no metric or method to
measure the quality of the test code in terms of readability
and maintainability.</p>
      <p>One indication of the quality of test code could
be the presence of test smells. Similar to how
production code can su↵er from code smells, these test
specific smells can indicate problems with the test
code in terms of maintainability [VDMvdBK01].
However, refactoring test smells can be tricky, as there
is no reliable method to verify if a refactored test
suite preserves its external behaviour. Several
studies point out the peculiarities of test code
refactoring [VDMvdBK01, VDM02, Pip02, Fow09]. However,
none of them provided an operative method to
guarantee that such refactoring was preserving the behaviour
of the test.</p>
      <p>The rest of the paper is organized as follows. In
section 2 we will summerize the related work on test
smells and test refactoring, which shows test smells to
be an important issue. Section 3 we will go over the
existing test refactoring tools, showing there is a gap
in the current tool support. We will propose our future
work which aims to fill the gap in existing tool support
in section 4. In section 5 we define a theoretical model
for defining test behaviour, which will form the basis
of our proposed future work. We conlude in section 6.
2</p>
      <sec id="sec-2-1">
        <title>Related Work</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>The term test smell was first introduced by van</title>
    </sec>
    <sec id="sec-4">
      <title>Deursen et al. in 2001 as a name for any symptom</title>
      <p>in the test code of a program that possibly indicates
a deeper problem. In their paper, they defined a first
set of eleven common test smells and a set of specific
refactorings which solve those smells [VDMvdBK01].</p>
    </sec>
    <sec id="sec-5">
      <title>Meszaros expanded the list of test smells in 2007, mak</title>
      <p>ing a further distinction between test smells, behaviour
smells, and project smells [Mes07]. Greiler et al.
defined five new test smells specifically related to test
fixtures in 2013 [GvDS13].</p>
      <p>Several studies have investigated the impact test
smells have on the quality of the code. Van Rompaey
et al. performed a case study in 2006 in which they
investigated two test smells (General Fixture and
Eager Test ). They concluded that all tests which
suffer from these smells have a negative e↵ect on the
maintainability of the system [VRDBD06]. In 2012,</p>
    </sec>
    <sec id="sec-6">
      <title>Bavota et al. performed an experiment with master</title>
      <p>students in which they studied eight test smells
(Mystery Guest, General Fixture, Eager Test, Lazy Test,
Assertion Roulette, Indirect Testing, Sensitive
Equality, and Test Code Duplication). This study
provided the first empirical evidence of the negative
impact test smells have on maintainability [BQO+12].</p>
    </sec>
    <sec id="sec-7">
      <title>In 2015, they continued their research and performed</title>
      <p>the experiment with a larger group, containing more
students as well as developers from industry. They
conclude that test smells represent a potential
danger to the maintainability of production code and test
suites [BQO+15].</p>
    </sec>
    <sec id="sec-8">
      <title>In 2016, Tufano et al. investigated the nature of test</title>
      <p>smells. They conducted a large-scale empirical study
over the commit history of 152 open source projects.</p>
    </sec>
    <sec id="sec-9">
      <title>They found that test smells a↵ect the project since</title>
      <p>their creation and that they have a very high
survivability. This shows the importance of identifying test
smells early, preferably in the IDE before the commit.</p>
    </sec>
    <sec id="sec-10">
      <title>They also performed a survey with 19 developers which</title>
      <p>looked into their perception of test smells and design
issues. They showed that developers are not able to
identify the presence of test smells in their code, nor do
developers perceive them as actual design problems.</p>
    </sec>
    <sec id="sec-11">
      <title>This highlights the importance of investing e↵ort in</title>
      <p>the development of tools to identify and refactor test
smells [TPB+16].
3</p>
      <sec id="sec-11-1">
        <title>Tool Support</title>
        <sec id="sec-11-1-1">
          <title>Test Smell Detection</title>
        <p>There are many tools that can automatically detect code smells, for example the JDeodorant Eclipse plugin and the inFusion tool [FMM+11]. Test smells, however, are very different from code smells, and these tools are not able to detect them. Tool support for handling test smells and refactoring test code is limited.</p>
          <p>In 2008, Breugelmans et al. presented TestQ, a
tool which can statically detect and visualize 12 test
smells [BVR08]. TestQ enables developers to quickly
identify test smell hot spots, indicating which tests
need refactoring. However, the lack of integration in
development environments and the overall slow
performance make TestQ unlikely to be useful in rapid
code-test-refactor cycles [BVR08].</p>
        <p>In 2013, Greiler et al. presented a tool which can automatically detect test smells in fixtures [GvDS13]. Their tool, called TestHound, provides reports on test smells and recommendations for refactoring the smelly test code. They performed a case study in which developers were asked to use the tool and were interviewed afterwards. They show that developers find that the tool helps them to understand, reflect on, and adjust test code. However, their tool is limited to smells related to test fixtures. Furthermore, they only report the occurrences of the different fixture-related test smells in the code. They do not give one single metric that represents the overall quality of the test code. During the interviews, one developer said that the different smells should be integrated in one high-level metric: “This would give us an overall assessment, so that if you make some improvements you should see it in the metric.” [GvDS13].</p>
      </sec>
      <sec id="sec-12-1">
        <title>Defining Test Behaviour</title>
        <p>Refactoring of the production code can be done with
little risk using the test suite as a safeguard. Since
there is no safeguard when refactoring test code, there
is a need for tool support that can verify if a
refactored test suite preserves its behaviour pre- and
postrefactoring. Previous research on this topic has been
performed by Parsai et al. in 2015 [PMSD15]. They
propose the use of mutation testing to verify the test
behaviour. However, mutation testing requires the test
suite to be ran for each mutant, which can be hundreds
of times, making it unlikely to be useful in practice.</p>
      </sec>
    </sec>
    <sec id="sec-13">
      <title>Furthermore, while mutation testing gives an indication of the test behaviour, it cannot fully guarantee that the behaviour is preserved.</title>
      <p>4</p>
      <sec id="sec-13-1">
        <title>Research Plan</title>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>As we have shown, there is a lack of tool support when it comes to test refactoring. We plan on creating a tool that will help developers during this process. We present our future work in terms of a research agenda:</title>
      <sec id="sec-14-1">
        <title>Test Smell Detection</title>
        <p>• Objective - Create a tool that is able to detect
test smells. More specifically, the tool should
be able to detect all test smells defined by van</p>
      </sec>
    </sec>
    <sec id="sec-15">
      <title>Deursen, Meszaros, and Greiler [VDMvdBK01,</title>
      <p>Mes07, GvDS13]. This tool should also be able to
create a metric that represents the overall quality
of the test code in terms of maintainability.
• Approach - Breugelmans et al. proposed methods
for detecting all the original test smells (defined
by van Deursen et al.) [BVR08]. We will use these
methods in our tool. For the other test smells
(defined by Meszaros and Greiler et al.), we will use
a similar approach in order to define detection
methods ourselves. The metric that represents
the overall quality of the test code can be
calculated based on the amount of test smells present
in the test code.
stored as a sequence of operations. When
encounting an assert, a node which represents the assert is
added to the TBT. All child nodes of the assert are
also added, replacing variables with their stored value.</p>
      <sec id="sec-15-1">
        <title>Running Example</title>
      </sec>
      <sec id="sec-15-2">
        <title>Defining Test Behaviour</title>
        <p>• Validation - Verification of correctness will be As an example to illustrates the approach, we use the
made using a dataset consisting of a set of real following simple production code:
open-source software projects. We can compare
the tool with TestHound for fixture related test 1 c l a s s R e c t a n g l e {
smells and with TestQ for the other test smells. 2 p u b l i c :
Smells not covered by either TestHound or TestQ 43 Rinetc tgaentgHl ee(i g) t;h ( ) ;
will require manual verification. 5 i n t getWidth ( ) ;
6 v o i d s e t H e i g t h ( i n t h ) ;
7 v o i d setWidth ( i n t w) ;
8 p r i v a t e :
• Objective - Define test behaviour such that de- 9 i n t h e i g t h ;
velopers can verify if the test code is behaviour 10 i n t width ;
preserving between pre- and post- refactoring. 1121 } ;</p>
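        <p>To make the metric from the Approach item concrete, the following is a minimal sketch of how an overall quality score could be computed from smell counts. The SmellCounts type, the smell names, and the weights are our own illustrative assumptions, not part of TestQ, TestHound, or any existing tool:</p>
        <preformat><![CDATA[
#include <algorithm>
#include <map>
#include <string>

// Hypothetical output of the detection phase: how often each test
// smell occurs in the test suite.
using SmellCounts = std::map<std::string, int>;

// A weighted smell density, normalised by the number of test cases so
// that large suites are not penalised for their size alone. The
// weights are illustrative assumptions, not empirically validated.
double testQualityMetric(const SmellCounts& counts, int numberOfTests) {
    static const std::map<std::string, double> weights = {
        {"AssertionRoulette", 1.0},
        {"EagerTest", 2.0},
        {"GeneralFixture", 2.0},
        {"MysteryGuest", 1.5},
    };
    double penalty = 0.0;
    for (const auto& smell : counts) {
        auto w = weights.find(smell.first);
        // Smells without an explicit weight count with weight 1.
        penalty += (w != weights.end() ? w->second : 1.0) * smell.second;
    }
    // Map the smell density into (0, 1]; 1.0 means no smells detected.
    return 1.0 / (1.0 + penalty / std::max(1, numberOfTests));
}
]]></preformat>
        <p>With a single score like this, an improvement after refactoring shows up directly as an increase of the metric, matching the developer request quoted in section 3.1.</p>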
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Defining Test Behaviour</title>
        <p>• Objective - Define test behaviour such that developers can verify whether the test code is behaviour preserving between pre- and post-refactoring.</p>
        <p>• Approach - The production code should be deterministic, and thus the same set of inputs should always result in the same set of outputs. We will analyse the code in order to map all entry and exit points from test code to production code and link them with the corresponding assertions. This will result in the construction of a Test Behaviour Tree (TBT), which defines the behaviour of the test. Comparison of TBTs will allow for validating behaviour preservation between pre- and post-refactoring; a sketch of this comparison is given after this list. Section 5 will explain this concept in more detail.</p>
        <p>• Validation - We will run the algorithm on the dataset of commits used for verifying the test quality metric. We can do an initial check using coverage metrics and mutation testing. When these metrics change pre- and post-refactoring, we know for certain that the test behaviour changed. When these metrics remain constant, we will have to manually verify whether the refactoring is behaviour preserving.</p>
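        <p>As a concrete illustration of the comparison step, the following is a minimal sketch, assuming a hypothetical TBTNode type; it treats behaviour as preserved exactly when the two trees are structurally identical:</p>
        <preformat><![CDATA[
#include <memory>
#include <string>
#include <vector>

// Hypothetical TBT node: a label (e.g. "assert", "==", "5", or a
// production-code call) plus its ordered children.
struct TBTNode {
    std::string label;
    std::vector<std::unique_ptr<TBTNode>> children;
};

// Two TBTs describe the same test behaviour when they are structurally
// identical: same labels, same number of children, same order.
bool sameBehaviour(const TBTNode& a, const TBTNode& b) {
    if (a.label != b.label || a.children.size() != b.children.size())
        return false;
    for (std::size_t i = 0; i < a.children.size(); ++i)
        if (!sameBehaviour(*a.children[i], *b.children[i]))
            return false;
    return true;
}
]]></preformat>
        <p>Whether the order of independent asserts should matter is a design decision; an order-insensitive comparison would additionally accept refactorings that merely reorder unrelated asserts.</p>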
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Theoretical Model for Defining Test Behaviour</title>
      <p>In order to determine test behaviour, a Test Behaviour Tree (TBT) can be constructed from the Abstract Syntax Tree (AST). This can be done by simply traversing the AST once. During this pass of the AST, all variables and objects need to be stored with their value. All subsequent operations on variables are then performed on the stored value. If a variable is initialized with a function call to production code, it can be stored as that call. Operations on that variable will then be stored as a sequence of operations. When encountering an assert, a node which represents the assert is added to the TBT. All child nodes of the assert are also added, replacing variables with their stored value. A minimal sketch of such a traversal is given below.</p>
      <sec id="sec-5-1">
        <title>5.1 Running Example</title>
        <p>As an example to illustrate the approach, we use the following simple production code:</p>
        <preformat>
class Rectangle {
public:
    Rectangle();
    int getHeigth();
    int getWidth();
    void setHeigth(int h);
    void setWidth(int w);
private:
    int heigth;
    int width;
};

Rectangle::Rectangle() {}
int Rectangle::getHeigth() { return heigth; }
int Rectangle::getWidth() { return width; }
void Rectangle::setHeigth(int h) { heigth = h; }
void Rectangle::setWidth(int w) { width = w; }
</preformat>
        <p>It defines a class Rectangle which has two private data members heigth and width, as well as getters and setters for these data members. Note that even though this is a toy example, there is no technical difference between simple getters and setters and large algorithmic functions, as the production code is considered a ’black box’. There would be no difference if the getters did some advanced mathematical calculations, read from a file, or contacted a networked database.</p>
        <p>We will start with a simple test for this production code:</p>
        <preformat>
Rectangle r = Rectangle();
r.setWidth(5);
r.setHeigth(10);
assert(5 == r.getWidth());
assert(10 == r.getHeigth());
</preformat>
        <p>This test will result in the Test Behaviour Tree shown in figure 1. As shown, the TBT has one root node which has a child for every assert. Each assert node has the full comparison as a child, where variables are replaced with their value. Since the call on the rectangle object is considered a call to production code, the sequence of operations is appended as a child rather than a single value, because we consider production code as a ’black box’. We can safely assume this, since the production code should be deterministic (otherwise you could not write tests for it) and should not change when refactoring test code.</p>
        <p>[Figure 1: The Test Behaviour Tree of the example test. The root has one child per assert; each assert node holds the == comparison with the expected value (5 or 10) on one side and, on the other, the sequence of production-code calls (the Rectangle constructor, setWidth, setHeigth, and finally getWidth or getHeigth) represented as FunctionCallNode and FunctionMemberCallNode entries.]</p>
      </sec>
      <sec id="sec-19-1">
        <title>Variable Refactorings</title>
        <p>One way to refactor this test would be to replace the
’magic numbers’ in the with variables. This would
greatly increase maintainability, as consistency
between input and expected output would be
guaranteed. Because variables are replaced with their value
in our approach, the following refactored test code will
result in the exact same TBT:
1 i n t x = 5 ;
2 i n t y = 1 0 ;
3 R e c t a n g l e r = R e c t a n g l e ( ) ;
4 r . setWidth ( x ) ;
5 r . s e t H e i g t h ( y ) ;
6 a s s e r t ( x == r . getWidth ( ) ) ;
7 a s s e r t ( y == r . g e t H e i g t h ( ) ) ;</p>
      </sec>
    </sec>
    <sec id="sec-20">
      <title>Similarly, the common refactoring where a variable is renamed can be performed without changing the TBT. The following code also generates the same TBT:</title>
      <p>1 i n t testWidth = 5 ;
2 i n t t e s t H e i g t h = 1 0 ;
3 R e c t a n g l e t e s t R e c t a n g l e = R e c t a n g l e ( ) ;
4 t e s t R e c t a n g l e . setWidth ( testWidth ) ;
5 t e s t R e c t a n g l e . s e t H e i g t h ( t e s t H e i g t h ) ;
6 a s s e r t ( testWidth == t e s t R e c t a n g l e . getWidth ( ) ) ;
7 a s s e r t ( t e s t H e i g t h == t e s t R e c t a n g l e . g e t H e i g t h ( )
) ;</p>
      <p>These refactorings did not change behaviour, which
is why we get the same resulting TBT. If you would
change the value of testWidth or testHeigth, the
behaviour of the test would change as you would be
testing di↵erent input - output pairs. This change in
behaviour would be detected easily detected by our
approach, as the values in the TBT would change
accordingly, resulting in a di↵erent TBT.</p>
      <sec id="sec-20-1">
        <title>Expression Refactorings</title>
      </sec>
    </sec>
    <sec id="sec-21">
      <title>Detecting a change in input - output pairs is more im</title>
      <p>portant when the test code contains some arithmetic
operations. Sometimes it is necessary to make a
calculation in the test code to use as an oracle. When it
comes to these kind of expressions in the AST, it is
possible to simply evaluate them during traversal of the
AST. The values of all variables are stored upto that
point in the program, and the result can be stored as
the new value for the corresponding variable.
Therefore, the following code still generates the same TBT,
as the behaviour did not change since the values for
testWidth and testHeigth still evaluate to 5 and 10
respectively 1):</p>
      <p>1Note that it would be bad practice to write this test, but
we use it here simply to showcase the approach.
1 i n t testWidth = 1 ; and rewrite our test to:
2 i n t t e s t H e i g t h = ((++ testWidth ) ⇤ 2) + ( (
3 testWteisdttWh i=dtht+es+t)W⇤ id3th) +++;2 ; 21 ii nn tt tteessttWH ei digt hth==sesteutuppDDataata( 1( 2) ); ;
4 R e c t a n g l e t e s t R e c t a n g l e = R e c t a n g l e ( ) ; 3 R e c t a n g l e t e s t R e c t a n g l e = R e c t a n g l e ( ) ;
5 t e s t R e c t a n g l e . setWidth ( testWidth ) ; 4 t e s t R e c t a n g l e . setWidth ( testWidth ) ;
6 t e s t R e c t a n g l e . s e t H e i g t h ( t e s t H e i g t h ) ; 5 t e s t R e c t a n g l e . s e t H e i g t h ( t e s t H e i g t h ) ;
7 a s s e r t ( testWidth == t e s t R e c t a n g l e . getWidth ( ) ) ; 6 a s s e r t ( testWidth == t e s t R e c t a n g l e . getWidth ( ) ) ;
8 a s s e r t ( t e s t H e i g t h == t e s t R e c t a n g l e . g e t H e i g t h ( ) 7 a s s e r t ( t e s t H e i g t h == t e s t R e c t a n g l e . g e t H e i g t h ( )
) ; ) ;</p>
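        <p>A minimal sketch of this evaluation step is given below, assuming a simplified expression node. Note that side-effecting operators such as the ++ in the example above would additionally have to write the updated value back into the stored environment, which this sketch omits for brevity:</p>
        <preformat><![CDATA[
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Simplified expression node for the sketch: a literal, a variable
// reference, or a binary operator with two operands.
struct Expr {
    std::string op;    // "lit", "var", "+", "*"
    int value = 0;     // used when op == "lit"
    std::string name;  // used when op == "var"
    std::shared_ptr<Expr> lhs, rhs;
};

// Evaluate a side-effect-free arithmetic expression during the AST
// pass, using the values stored for all variables up to this point.
int evaluate(const Expr& e, const std::map<std::string, int>& env) {
    if (e.op == "lit") return e.value;
    if (e.op == "var") return env.at(e.name);
    int l = evaluate(*e.lhs, env);
    int r = evaluate(*e.rhs, env);
    if (e.op == "+") return l + r;
    if (e.op == "*") return l * r;
    throw std::runtime_error("unsupported operator: " + e.op);
}
]]></preformat>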
      <sec id="sec-21-1">
        <title>Function Refactorings</title>
        <p>Another common refactoring is to extract part of the
test code to a function. As an example, we could define
the following functions:
1 i n t setupWidth ( i n t x ) {
2 r e t u r n x / 2 ;
3 }
4
5 i n t s e t u p H e i g t h ( i n t y ) {
6 r e t u r n y ⇤ 2 ;
7 }</p>
        <p>and rewrite our test to:
1 i n t testWidth = setupWidth ( 1 0 ) ;
2 i n t t e s t H e i g t h = s e t u p H e i g t h ( 5 ) ;
3 R e c t a n g l e t e s t R e c t a n g l e = R e c t a n g l e ( ) ;
4 t e s t R e c t a n g l e . setWidth ( testWidth ) ;
5 t e s t R e c t a n g l e . s e t H e i g t h ( t e s t H e i g t h ) ;
6 a s s e r t ( testWidth == t e s t R e c t a n g l e . getWidth ( ) ) ;
7 a s s e r t ( t e s t H e i g t h == t e s t R e c t a n g l e . g e t H e i g t h ( )
) ;</p>
      </sec>
    </sec>
    <sec id="sec-22">
      <title>If these functions are marked as part of the pro</title>
      <p>duction code, they will be treated as ’black box’
functions. This is not desirable, since then the TBT will
change while behaviour is preserved. Therefore, these
functions need to be evaluated similarly to expressions.</p>
    </sec>
    <sec id="sec-23">
      <title>Again this is perfectly possible since we have the val</title>
      <p>ues of all variables at each point in the program. Upon
evaluation, the values for testWidth and testHeigth
still result in 5 and 10 respectively, and thus the TBT
would be unchanged.</p>
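        <p>To illustrate, the following is a minimal sketch of evaluating such a helper call during traversal, reusing the simplified expression node from the sketch in section 5.3; the Helper type and the names are our own assumptions:</p>
        <preformat><![CDATA[
#include <map>
#include <memory>
#include <string>

// Simplified expression node, as in the sketch of section 5.3.
struct Expr {
    std::string op;  // "lit", "var", "+", "*", "/"
    int value = 0;
    std::string name;
    std::shared_ptr<Expr> lhs, rhs;
};

int evaluate(const Expr& e, const std::map<std::string, int>& env) {
    if (e.op == "lit") return e.value;
    if (e.op == "var") return env.at(e.name);
    int l = evaluate(*e.lhs, env);
    int r = evaluate(*e.rhs, env);
    if (e.op == "+") return l + r;
    if (e.op == "/") return l / r;
    return l * r;  // "*" is the only remaining operator used here
}

// A test-local helper with one parameter and a single-expression body,
// like setupWidth and setupHeigth above.
struct Helper {
    std::string parameter;
    std::shared_ptr<Expr> body;
};

// A call to such a helper is evaluated by binding the argument to the
// parameter and evaluating the body, exactly like a plain expression.
int evaluateHelperCall(const Helper& h, int argument) {
    return evaluate(*h.body, {{h.parameter, argument}});
}
]]></preformat>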
      <sec id="sec-23-1">
        <title>Conditionals and Loops</title>
        <p>Upto now, our examples did not contain any
conditionals or loops, since they are not desirable in test
code. However, sometimes they could appear in test
code, in which case they can be evaluated similarly to
expressions and function calls. For example, we could
define the following function:
1 i n t setupData ( i n t i ) {
2 i f ( i == 1) {
3 r e t u r n 5 ;
4 } e l s e {
5 i f ( i == 2) {
6 r e t u r n 5 + 5 ;
7 }
8 }
9 r e t u r n 0 ;
10 }</p>
      </sec>
    </sec>
    <sec id="sec-24">
      <title>Again, the values for testWidth and testHeigth still</title>
      <p>evaluate to 5 and 10 respectively, resulting in the same</p>
    </sec>
    <sec id="sec-25">
      <title>TBT. When conditionals or loops are used in combina</title>
      <p>tion with calls to production code, it would be handled
similarly to how the testRectangle object is handled.</p>
    </sec>
    <sec id="sec-26">
      <title>The sequence of operations would be kept, including the conditional or loop, similarly to how they would be represented in AST form.</title>
      <p>6</p>
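        <p>The following minimal sketch shows how such a conditional can be evaluated during traversal by interpreting a simplified if/return body like the one in setupData; the Stmt node type is our own assumption:</p>
        <preformat><![CDATA[
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <vector>

// Simplified statement node for the sketch: either "return value;" or
// "if (param == literal) { then } else { else }", which is all the
// setupData example above needs.
struct Stmt {
    std::string kind;     // "if" or "return"
    std::string param;    // condition: param == literal
    int literal = 0;
    int returnValue = 0;  // used when kind == "return"
    std::vector<std::shared_ptr<Stmt>> thenBranch, elseBranch;
};

// Walk the body with the argument values known at this point in the
// test, take the branch the condition selects, and stop at the first
// return statement that is reached.
std::optional<int> evaluateBody(const std::vector<std::shared_ptr<Stmt>>& body,
                                const std::map<std::string, int>& args) {
    for (const auto& s : body) {
        if (s->kind == "return") return s->returnValue;
        const auto& branch =
            (args.at(s->param) == s->literal) ? s->thenBranch : s->elseBranch;
        if (auto result = evaluateBody(branch, args)) return result;
    }
    return std::nullopt;  // fell through without returning
}
]]></preformat>
        <p>Evaluating the body of setupData this way yields 5 for the argument 1 and 10 for the argument 2, so the stored values, and therefore the TBT, stay the same. A loop can be handled analogously by repeatedly evaluating its body while its condition evaluates to true.</p>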
      <sec id="sec-26-1">
        <title>Conclusion</title>
      </sec>
    </sec>
    <sec id="sec-27">
      <title>We have presented an overview of the research done in</title>
      <p>the field of test smells and test refactoring. Research
has indicated that test smells have a negative impact
on maintainability and therefore need to be refactored.
We have shown that there is a lack of tool support to
aid developers with test refactoring. We also provided
a theoretical model that defines test behaviour, in the
form of Test Behaviour Trees, which can be used to
compare test behaviour pre- and post-refactoring. We
plan to create a tool for test refactoring which can
detect test code smells, evaluate the test quality, and
assure behaviour is preserved after test refactoring using
our theoretical model. We currently have a working
prototype for the latter. Our final tool will help
developers decide when and where to refactor the test code,
as well as help them perform the refactorings correctly,
allowing developers to improve their test suite quickly
and with confidence.
[BQO+12]
[BQO+15]</p>
    </sec>
    <sec id="sec-28">
      <title>Gabriele Bavota, Abdallah Qusef,</title>
    </sec>
    <sec id="sec-29">
      <title>Rocco Oliveto, Andrea De Lucia, and</title>
    </sec>
    <sec id="sec-30">
      <title>David Binkley. An empirical anal</title>
      <p>ysis of the distribution of unit test
smells and their impact on software
maintenance. In Software
Maintenance (ICSM), 2012 28th IEEE
International Conference on, pages 56–65.
IEEE, 2012.</p>
    </sec>
    <sec id="sec-31">
      <title>Gabriele Bavota, Abdallah Qusef,</title>
    </sec>
    <sec id="sec-32">
      <title>Rocco Oliveto, Andrea De Lucia, and</title>
      <p>Dave Binkley. Are test smells really
harmful? an empirical study.
Empirical Software Engineering, 20(4):1052–
1094, 2015.
[FMM+11]
[TPB+16]</p>
    </sec>
    <sec id="sec-33">
      <title>Manuel Breugelmans and Bart</title>
    </sec>
    <sec id="sec-34">
      <title>Van Rompaey. Testq: Exploring structural and maintenance characteristics of unit test suites. In</title>
      <p>WASDeTT-1: 1st International
Workshop on Advanced Software
Development Tools and Techniques,
2008.</p>
    </sec>
    <sec id="sec-35">
      <title>Elfriede Dustin. E↵ective Soft</title>
      <p>ware Testing: 50 Ways to Improve
Your Software Testing.
Addison</p>
    </sec>
    <sec id="sec-36">
      <title>Wesley Longman Publishing Co., Inc.,</title>
      <p>Boston, MA, USA, 2002.</p>
      <p>Francesca Arcelli Fontana, Elia
Mariani, Andrea Mornioli, Raul Sormani,
and Alberto Tonello. An experience
report on using code smells
detection tools. In Software Testing,
Verification and Validation Workshops
(ICSTW), 2011 IEEE Fourth
International Conference on, pages 450–
457. IEEE, 2011.</p>
      <p>Martin Fowler. Refactoring:
improving the design of existing code.
Pearson Education India, 2009.</p>
    </sec>
    <sec id="sec-37">
      <title>Michaela Greiler, Arie van Deursen, and Margaret-Anne Storey. Automated detection of test fixture strategies and smells. In 2013 IEEE Sixth</title>
      <p>International Conference on Software
Testing, Verification and Validation,
pages 322–331. IEEE, 2013.</p>
      <p>Gerard Meszaros. xUnit test patterns:
Refactoring test code. Pearson
Education, 2007.</p>
    </sec>
    <sec id="sec-38">
      <title>Jens Uwe Pipka. Refactoring in a test</title>
      <p>first-world. In Proc. Third Intl Conf.
eXtreme Programming and Flexible
Processes in Software Eng, 2002.</p>
    </sec>
    <sec id="sec-39">
      <title>Ali Parsai, Alessandro Murgia, Quin</title>
      <p>ten David Soetens, and Serge
Demeyer. Mutation testing as a safety
net for test code refactoring. In
Scientific Workshop Proceedings of the
XP2015, page 8. ACM, 2015.</p>
    </sec>
    <sec id="sec-40">
      <title>Michele Tufano, Fabio Palomba,</title>
    </sec>
    <sec id="sec-41">
      <title>Gabriele Bavota, Massimiliano</title>
    </sec>
    <sec id="sec-42">
      <title>Di Penta, Rocco Oliveto, Andrea</title>
    </sec>
    <sec id="sec-43">
      <title>De Lucia, and Denys Poshyvanyk.</title>
    </sec>
    <sec id="sec-44">
      <title>An empirical investigation into the</title>
      <p>[VDM02]</p>
    </sec>
    <sec id="sec-45">
      <title>Arie Van Deursen and Leon Moonen.</title>
    </sec>
    <sec id="sec-46">
      <title>The video store revisited–thoughts on refactoring and testing. In Proc. 3rd</title>
      <p>Intl Conf. eXtreme Programming and
Flexible Processes in Software
Engineering, pages 71–76. Citeseer, 2002.
[VRDBD06]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <title>References</title>
      <ref id="ref-bqo12"><label>[BQO+12]</label><mixed-citation>Gabriele Bavota, Abdallah Qusef, Rocco Oliveto, Andrea De Lucia, and David Binkley. An empirical analysis of the distribution of unit test smells and their impact on software maintenance. In Software Maintenance (ICSM), 2012 28th IEEE International Conference on, pages 56–65. IEEE, 2012.</mixed-citation></ref>
      <ref id="ref-bqo15"><label>[BQO+15]</label><mixed-citation>Gabriele Bavota, Abdallah Qusef, Rocco Oliveto, Andrea De Lucia, and Dave Binkley. Are test smells really harmful? An empirical study. Empirical Software Engineering, 20(4):1052–1094, 2015.</mixed-citation></ref>
      <ref id="ref-bvr08"><label>[BVR08]</label><mixed-citation>Manuel Breugelmans and Bart Van Rompaey. TestQ: Exploring structural and maintenance characteristics of unit test suites. In WASDeTT-1: 1st International Workshop on Advanced Software Development Tools and Techniques, 2008.</mixed-citation></ref>
      <ref id="ref-dus02"><label>[Dus02]</label><mixed-citation>Elfriede Dustin. Effective Software Testing: 50 Ways to Improve Your Software Testing. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2002.</mixed-citation></ref>
      <ref id="ref-fmm11"><label>[FMM+11]</label><mixed-citation>Francesca Arcelli Fontana, Elia Mariani, Andrea Mornioli, Raul Sormani, and Alberto Tonello. An experience report on using code smells detection tools. In Software Testing, Verification and Validation Workshops (ICSTW), 2011 IEEE Fourth International Conference on, pages 450–457. IEEE, 2011.</mixed-citation></ref>
      <ref id="ref-fow09"><label>[Fow09]</label><mixed-citation>Martin Fowler. Refactoring: Improving the Design of Existing Code. Pearson Education India, 2009.</mixed-citation></ref>
      <ref id="ref-gvds13"><label>[GvDS13]</label><mixed-citation>Michaela Greiler, Arie van Deursen, and Margaret-Anne Storey. Automated detection of test fixture strategies and smells. In 2013 IEEE Sixth International Conference on Software Testing, Verification and Validation, pages 322–331. IEEE, 2013.</mixed-citation></ref>
      <ref id="ref-mes07"><label>[Mes07]</label><mixed-citation>Gerard Meszaros. xUnit Test Patterns: Refactoring Test Code. Pearson Education, 2007.</mixed-citation></ref>
      <ref id="ref-pip02"><label>[Pip02]</label><mixed-citation>Jens Uwe Pipka. Refactoring in a test first-world. In Proc. Third Int’l Conf. eXtreme Programming and Flexible Processes in Software Engineering, 2002.</mixed-citation></ref>
      <ref id="ref-pmsd15"><label>[PMSD15]</label><mixed-citation>Ali Parsai, Alessandro Murgia, Quinten David Soetens, and Serge Demeyer. Mutation testing as a safety net for test code refactoring. In Scientific Workshop Proceedings of the XP2015, page 8. ACM, 2015.</mixed-citation></ref>
      <ref id="ref-tpb16"><label>[TPB+16]</label><mixed-citation>Michele Tufano, Fabio Palomba, Gabriele Bavota, Massimiliano Di Penta, Rocco Oliveto, Andrea De Lucia, and Denys Poshyvanyk. An empirical investigation into the nature of test smells. In Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering (ASE 2016), pages 4–15. ACM, 2016.</mixed-citation></ref>
      <ref id="ref-vdm02"><label>[VDM02]</label><mixed-citation>Arie Van Deursen and Leon Moonen. The video store revisited – thoughts on refactoring and testing. In Proc. 3rd Int’l Conf. eXtreme Programming and Flexible Processes in Software Engineering, pages 71–76. Citeseer, 2002.</mixed-citation></ref>
      <ref id="ref-vdmvdbk01"><label>[VDMvdBK01]</label><mixed-citation>Arie van Deursen, Leon Moonen, Alex van den Bergh, and Gerard Kok. Refactoring test code. In Proceedings of the 2nd International Conference on Extreme Programming and Flexible Processes in Software Engineering (XP2001), pages 92–95, 2001.</mixed-citation></ref>
      <ref id="ref-vrdbd06"><label>[VRDBD06]</label><mixed-citation>Bart Van Rompaey, Bart Du Bois, and Serge Demeyer. Characterizing the relative significance of a test smell. In 22nd IEEE International Conference on Software Maintenance (ICSM 2006). IEEE, 2006.</mixed-citation></ref>
    </ref-list>
  </back>
</article>