
Towards Automated Refactoring of Code Clones in Object-Oriented Programming Languages

Simon Baars

simon.j.baars@gmail.com

Ana Oprescu

A.M.Oprescu@uva.nl

University of Amsterdam, Amsterdam, Netherlands

2017

Duplication in source code can have a major negative impact on the maintainability of source code, as it creates implicit dependencies between fragments of code. Such implicit dependencies often cause bugs and increase maintenance efforts. In this study, we look into the opportunities to automatically refactor these duplication problems for object-oriented programming languages. We propose a method to detect clones that are suitable for refactoring. This method focuses on the context and scope of clones, ensuring our refactoring improves the design and does not create side effects. Our intermediate results indicate that more than half of the duplication in code is related to each other through inheritance, making it easier to refactor these clones in a clean way. About 40 percent of the duplication can be refactored through method extraction, while other clones require other refactoring techniques or further transformations. Future measurements will provide further insight into what clones should be refactored to improve the design of software systems.

Duplication in source code is often seen as one of the most harmful types of technical debt. In his book "Refactoring" [Fow99], Martin Fowler claims that "If you see the same code structure in more than one place, you can be sure that your program will be better if you find a way to unify them." Bruntink et al. [BVDVET05] show that code clones can contribute up to 25% of the code size, which has a negative impact on maintainability.

Refactoring is used to improve the quality-related attributes of a codebase (maintainability, performance, etc.) without changing the functionality. Many methods were introduced to aid the process of refactoring [Fow99, Wak04], and are integrated into most modern IDEs. However, most of these methods still require a manual assessment of where and when to apply them. This means refactoring is either a significant part of the development process [LST78, MT04], or does not happen at all [MVD+03]. For a large part, proper refactoring requires domain knowledge. However, there are also refactoring opportunities that are rather trivial and repetitive to execute. Our goal is to investigate to what extent code clones can be automatically refactored.

A survey by Roy et al. [RC07] describes various ways in which clones can be identified. Most clone detection tools focus on finding clones that align with these definitions. In this paper, we outline challenges with these clone type definitions when considered in a refactoring context. We next propose solutions to these problems that would enable the detection of clones that can and should be refactored, rather than fragments of code that are just similar.

We focus mainly on the Java programming language, because refactoring opportunities depend on paradigm-specific and language-specific aspects [CYI+11]. However, most practices currently featured in our work will also be applicable to other object-oriented languages, like C# and Python, because these languages share many similarities regarding refactoring opportunities.

Our end goal is to improve upon the current state-of-the-art in clone research [FZ15, Alw17] by building a clone refactoring tool that can analyze the context of code clones to get a profile of how a clone can be refactored. This tool then automatically applies refactorings to a large percentage of clones found. The design decisions for this tool are made on the basis of data gathered from a large corpus of software systems together with our own experience and findings from literature.

1.1 Research questions

We have formalized the following research questions in order to improve upon the state-of-the-art in code duplication refactoring:

RQ1. How can we define clone types such that they can be automatically refactored?

RQ2. What are the discriminating factors to decide when a clone should be refactored?

RQ3. To what extent can we automate the process of refactoring clones?

For RQ1 we look into current clone definitions and clone detection methods and assess their suitability for refactoring purposes. For RQ2 we look into what thresholds we should use to identify clones that, when refactored, improve the design of the system. RQ3 is currently work in progress.

2 Background

As code clones are seen as one of the most harmful types of technical debt, they have been studied extensively. A survey by Roy et al. [RC07] states definitions of important concepts in code clone research. For instance, a "clone pair" is defined as a set of two code portions/fragments which are identical or similar to each other; a "clone class" as the union of all clone pairs; and a "clone instance" as a single code portion/fragment that is part of either a clone pair or clone class.

2.1 Advantages of clone classes over clone pairs

Regarding clone detection, the literature varies on whether clone pairs or clone classes should be considered for detection. We decided to focus on clone classes, because of their advantages for refactoring. Clone pairs do not provide a general overview of all entities containing the clones, with all their related issues and characteristics [FZZ12]. Although clone classes are harder to manage, they provide all information needed to plan a suitable refactoring strategy, since this way all instances of a clone are considered. Another issue results from grouping clones by pairs: the number of clone references increases according to the binomial coefficient formula (two clones form a pair, three clones form three pairs, four clones form six pairs, and so on), which causes heavy information redundancy [FZZ12].

2.2 Clone types

In their 2007 survey, Roy et al. [RC07] define several types of clones:

Type 1: Identical code fragments except for variations in whitespace (may also be variations in layout) and comments.

Type 2: Structurally/syntactically identical fragments except for variations in identifiers, literals, types, layout and comments.

Type 3: Copied fragments with further modifications. Statements can be changed, added or removed in addition to variations in identifiers, literals, types, layout, and comments.

A higher type of clone is harder to detect. It is also harder to refactor, as more transformations are required. For higher clone types it also becomes more disputable whether they actually indicate a harmful anti-pattern (as not every clone is harmful [JX10, KG08]).

There also exists a type 4 clone, denoting functionally equal code. We decided not to consider these clones in this study, because of the serious challenges in their detection and refactoring.

2.3 Related work in clone refactoring tools

The Duplicated Code Refactoring Advisor (DCRA) looks into refactoring opportunities for clone pairs [FZZ12, FZ15]. DCRA only focuses on refactoring clone pairs, with the authors arguing that clone pairs are much easier to manage when considered singularly. As intermediate steps, the authors measure a corpus of Java systems for some clone-related properties of the systems, like the relation (in terms of inheritance) between code fragments in a clone pair. We further look into these measurements in Sec. 6.4.1.

A tool named Aries [HKK+04, HKI08] focuses on the detection of refactorable clones. They do this based on the relation between clone instances through inheritance, similar to Fontana et al. [FZZ12]. This tool only proposes a refactoring opportunity and does not provide help in the process of applying the refactoring.

We investigated several research efforts that look into code clone refactoring [Alw17, CKS18, KN01]. However, all of these tools only support a subset of all harmful clones that are found. Also, these tools are limited to suggesting refactoring opportunities, rather than actually applying refactorings where suitable. Finally, all published approaches have limitations, such as false positives in their clone detection [CKS18] or being limited to clone pairs [HKI08].

3 Addressing problems with clone type definitions

For each of the type 1-3 clones [RC07] (further explained in Sec. 2.2) we list our solutions to their shortcomings, to increase the chance that we can refactor the clone while improving the design.

4 Shortcomings of clone types

Clone types 1-3 [RC07] allow reasoning about the duplication in a software system. Clones by these definitions can be detected relatively easily and efficiently. This has allowed for large scale analyses of duplication [LHMI07]. However, these clone type definitions have shortcomings which make the clones detected in correspondence with these definitions less valuable for (automated) refactoring purposes.

In this section, we discuss the shortcomings of the different clone type definitions. Because of these shortcomings, clones found by these definitions often require additional judgment as to whether they should and can be refactored.

4.1 Type 1 clones

Type 1 clones are identical clone fragments except for variations in whitespace and comments [RC07]. This allows for the detection of clones that are the result of copying and pasting existing code, along with other reasons why duplicates might get into a codebase.

Most clone detection tools [KKI02, SYCI17, RC08, SR16, SR14] implement type 1 clones as textual equality between code fragments (except for whitespace and comments). Although textually equal, method calls can still refer to different methods, type declarations can still refer to different types and variables can be of a different type. In such cases, refactoring opportunities could be invalidated. This can make type 1 clones less suitable for refactoring purposes, as they require additional judgment regarding the refactorability of such a clone. When aiming to automatically refactor clones, applying refactorings to type 1 clones is bound to be error-prone and can result in an uncompilable project or a difference in functionality.
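To make this concrete, the following sketch shows two fragments that a purely textual type 1 detector would report as identical, although they resolve to different types. All names (`FruitModule`, `TechModule`, `Apple`, `describe`) are hypothetical and chosen only for illustration; the nested classes stand in for same-named types that live in different packages.

```java
// Illustrative only: every name in this block is hypothetical.
final class FruitModule {
    static final class Apple {
        String name() { return "fruit"; }
    }

    static String describe() {
        Apple apple = new Apple();   // resolves to FruitModule.Apple
        return apple.name();
    }
}

final class TechModule {
    static final class Apple {
        String name() { return "company"; }
    }

    static String describe() {
        Apple apple = new Apple();   // resolves to TechModule.Apple
        return apple.name();
    }
}

final class TextualCloneDemo {
    public static void main(String[] args) {
        // The two describe() bodies are textually equal (comments aside),
        // yet they return different values.
        System.out.println(FruitModule.describe());
        System.out.println(TechModule.describe());
    }
}
```

Merging the two `describe` bodies into one shared method would force a choice between the two `Apple` types and would change the behavior of one of the callers.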

Because of this, not all type 1 clones may be subject to refactoring. In the section on Type 1R clones we describe an alternative approach towards detecting type 1 clones, which results in only clones that can be refactored.

4.2 Type 2 clones

Type 2 clones are structurally/syntactically identical fragments except for variations in identifiers, literals, types, layout and comments [RC07]. This definition allows for reasoning about code fragments that were copied and pasted, and then slightly modified. For refactoring purposes, this definition is unsuitable: if we allow any change in identifiers, literals, and types, we cannot distinguish between different variables, different types and different method calls anymore. This could render two methods that have entirely different functionality as clones. Merging such clones can be harmful instead of helpful.

The example in Fig. 1 shows a type 2 clone that poses no harm to the design of the system. Both methods are, except for their matching structure, completely different in functionality. They operate on different types, call different methods, return different things, etc. Having such a method flagged as a clone does not provide much useful information.
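A hypothetical pair in the same spirit (not the listing from Fig. 1) is sketched below. The two methods share their structure but differ in identifiers, literals and types, so a type 2 detector would pair them even though they have nothing to do with each other. The names `totalLength` and `totalPrice` are assumptions made for this illustration.

```java
import java.util.List;

// Illustrative only: two structurally identical but unrelated methods.
final class Type2CloneDemo {
    static int totalLength(List<String> words) {
        int total = 0;
        for (String word : words) {
            total += word.length();
        }
        return total;
    }

    static double totalPrice(List<Double> prices) {
        double sum = 0.0;
        for (Double price : prices) {
            sum += price.doubleValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("ab", "c")));
        System.out.println(totalPrice(List.of(1.5, 2.0)));
    }
}
```

A shared abstraction for these two methods would have to be generic over both the element type and the accumulated type, which is unlikely to improve the design.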

When looking at refactoring, type 2 clones can be difficult to refactor. For instance, if we have variability in types, the code can describe operations on two completely dissimilar types. Type 2 clones do not differentiate between primitives and reference types, which further undermines the usefulness of clones detected by this definition.

4.3 Type 3 clones

Type 3 clones are copied fragments with further modification (having added, removed or changed statements) [RC07]. Detection of clones by this definition can be hard, as it may be hard to detect whether a fragment was copied in the first place if it was severely changed. Because of this, most clone detection implementations of type 3 clones work on the basis of a similarity threshold [RC08, RK19, JMSG07, SYCI17]. This similarity threshold has been implemented in different ways: textual similarity (for instance using Levenshtein distance) [LM11], token-level similarity [SSS+16] or statement-level similarity [KS17].

Having a definition that allows for any change in code poses serious challenges for refactoring. A Levenshtein distance of one can already change the meaning of a code fragment significantly, for instance, if the name of a type differs by a character (and thus refers to a different type).

4.4 Refactoring-oriented clone types

To resolve the shortcomings of clone types as outlined in the previous section, we propose alternative definitions for clone types directed at detecting clones that can and should be refactored. We have named these clones T1R (type 1R), T2R and T3R clones. These definitions address problems of the corresponding literature definitions. The "R" stands for refactoring-oriented.

4.4.1 Type 1R clones

To solve the issues identified in Sec. 4.1, we introduce an alternative definition: cloned fragments have to be both textually and functionally equal. Therefore, T1R clones are a subset of type 1 clones.

We check functional equality of two fragments by comparing the equality of the fully qualified identifier (FQI) for referenced types, methods and variables. If an identifier is fully qualified, it means we specify the full location of its declaration (e.g. com.sb.fruit.Apple for an Apple object). For method calls, we also compare the equality of the FQI of the type of each of its arguments, to differentiate between overloaded method variants.

4.4.2 Type 2R clones

To solve the issues identified in Sec. 4.2, we introduce an alternative definition. All rules that apply to T1R clones also apply to T2R clones. Additionally, T2R clones allow variability in literals, variables and method calls. Furthermore, T2R clones allow variability in method names and class/enum/interface names.
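Because T2R clones inherit the T1R rules, the FQI comparison of Sec. 4.4.1 underlies both definitions. A minimal sketch of such a comparison for method calls is given below, assuming the symbols have already been resolved. The `ResolvedCall` type, its fields and the `com.sb.fruit.Basket` and `com.sb.tech.Apple` names are hypothetical; only `com.sb.fruit.Apple` comes from the text above. This is not how CloneRefactor necessarily represents resolved calls.

```java
import java.util.List;

// Hypothetical value type for an already-resolved method call.
final class ResolvedCall {
    final String declaringType;        // FQI of the type declaring the method
    final String methodName;
    final List<String> argumentTypes;  // FQI of each argument type

    ResolvedCall(String declaringType, String methodName, List<String> argumentTypes) {
        this.declaringType = declaringType;
        this.methodName = methodName;
        this.argumentTypes = List.copyOf(argumentTypes);
    }

    // Builds an identifier that also distinguishes overloaded variants.
    String fqi() {
        return declaringType + "." + methodName
                + "(" + String.join(", ", argumentTypes) + ")";
    }
}

final class FqiComparison {
    // Two calls are considered functionally equal when their FQIs match.
    static boolean sameTarget(ResolvedCall a, ResolvedCall b) {
        return a.fqi().equals(b.fqi());
    }

    public static void main(String[] args) {
        ResolvedCall addFruit = new ResolvedCall(
                "com.sb.fruit.Basket", "add", List.of("com.sb.fruit.Apple"));
        ResolvedCall addDevice = new ResolvedCall(
                "com.sb.fruit.Basket", "add", List.of("com.sb.tech.Apple"));
        // Textually both calls read basket.add(apple), but they target
        // different overloads.
        System.out.println(sameTarget(addFruit, addDevice));
    }
}
```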

When refactoring two fragments that differ by literals, called methods or used variables, we are faced with a design tradeoff. When replacing the cloned fragments by a new method, we need an extra argument for each literal, called method or used variable that differs. This can be done as long as the type of used literals/variables and the signature of called methods are equal.
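The tradeoff can be illustrated with a small before/after sketch. The classes `BannerBefore` and `BannerAfter` and their methods are hypothetical examples, not taken from the corpus: two fragments differ in a single string literal, and the extracted method receives that literal as an extra parameter.

```java
// Before: two cloned fragments that differ only in one string literal.
final class BannerBefore {
    static String header() {
        StringBuilder sb = new StringBuilder();
        sb.append("== ");
        sb.append("Report");
        sb.append(" ==");
        return sb.toString();
    }

    static String footer() {
        StringBuilder sb = new StringBuilder();
        sb.append("== ");
        sb.append("End");
        sb.append(" ==");
        return sb.toString();
    }
}

// After: the clone is extracted; the differing literal becomes a parameter.
final class BannerAfter {
    private static String banner(String title) {
        StringBuilder sb = new StringBuilder();
        sb.append("== ");
        sb.append(title);
        sb.append(" ==");
        return sb.toString();
    }

    static String header() { return banner("Report"); }

    static String footer() { return banner("End"); }
}
```

With one differing literal the extracted method stays readable. Each additional difference adds another parameter, which is why the variability between fragments needs to be bounded.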

To limit the negative impact of this tradeoff on the design of the system, we formalized a threshold for the variability between fragments. The formula by which this threshold is calculated is displayed in equation 1. In this formula, "different expressions" refers to the number of literals, variables and method calls that differ from other clone instances in a clone class. We divide this by the total number of tokens in the clone instance. Based on this threshold, we decide whether a clone should be considered for refactoring.

\[
\text{T2R Variability} = \frac{\text{Different expressions}}{\text{Total tokens}} \times 100 \qquad (1)
\]

4.4.3 Type 3R clones

Type 3 clones allow any change in statements (added, removed and changed statements). When looking at how we can refactor a statement that is not included by one clone instance but is in another, we find that we require a conditional block to make up for the difference in statements. This is a tradeoff, as an added conditional block increases the complexity of the system. Because of that, we defined T3R clones in such a way that they are directed towards finding clones that are worth this tradeoff.
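The conditional block mentioned above can be sketched as follows. The example is a deliberately tiny, hypothetical one (`GapBefore`, `GapAfter` and the `emphasize` flag are assumptions for illustration): one clone instance contains a statement that the other lacks, and the merged method guards that statement with a flag.

```java
// Before: the second method has one extra statement in the middle.
final class GapBefore {
    static String plain(String raw) {
        String value = raw.trim();
        return "[" + value + "]";
    }

    static String emphasized(String raw) {
        String value = raw.trim();
        value = value + "!";          // the gap: absent from plain()
        return "[" + value + "]";
    }
}

// After: one method, with the gap wrapped in a conditional block.
final class GapAfter {
    static String format(String raw, boolean emphasize) {
        String value = raw.trim();
        if (emphasize) {              // added branch: the cost of merging
            value = value + "!";
        }
        return "[" + value + "]";
    }
}
```

The merged method removes the duplication at the price of one extra branch and one extra parameter, which is the tradeoff the T3R rules are meant to keep in check.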

All rules that apply to T2R clones also apply to T3R clones. Additionally, T3R clones allow a gap between two clone classes of statements that are not cloned. The following rules apply to this gap:

The difference in statements must bridge a gap between two clones that were valid by the original thresholds. This entails that, different from type 3 clones, the difference in statements cannot be at the beginning or the end of a cloned block. It is rather somewhere within, as it must bridge two existing clones.

The size of the gap between two clones is limited by a threshold. This threshold is calculated by taking the percentage of the number of statements in the gap over the number of statements that both clones that are being bridged span. This is displayed in equation 2.

The gap may not span a partial block. To make sure that the T3R clone can be refactored, we do not allow the gap to span a part of a block, for instance, the declaration and a part of the body of a for-loop. The reason for this is that it is not possible to wrap a partially spanned block in a single conditional statement. We could, however, use multiple conditional blocks (one for each block spanned), but due to the detrimental effect on the design of the code (as each conditional block adds a certain complexity), we decided not to allow this for T3R clones.

\[
\text{T3R Gap Size} = \frac{\text{Statements in gap}}{\text{Statements in clones}} \times 100 \qquad (2)
\]
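Both thresholds (equations 1 and 2) are plain percentages. A minimal sketch of how they could be computed is shown below. The class and method names are hypothetical, and the maximum percentage passed to `withinLimit` is a placeholder supplied by the caller, since no concrete limit is stated here.

```java
// Hypothetical helper; names and the caller-supplied limit are assumptions.
final class CloneThresholds {
    private CloneThresholds() {}

    // Equation 1: differing expressions as a percentage of all tokens.
    static double t2rVariability(int differentExpressions, int totalTokens) {
        requirePositive(totalTokens, "totalTokens");
        return 100.0 * differentExpressions / totalTokens;
    }

    // Equation 2: gap statements as a percentage of the statements
    // spanned by the two clones that the gap bridges.
    static double t3rGapSize(int statementsInGap, int statementsInClones) {
        requirePositive(statementsInClones, "statementsInClones");
        return 100.0 * statementsInGap / statementsInClones;
    }

    // A clone is kept only if its percentage does not exceed the limit.
    static boolean withinLimit(double percentage, double maxPercentage) {
        return percentage <= maxPercentage;
    }

    private static void requirePositive(int value, String name) {
        if (value <= 0) {
            throw new IllegalArgumentException(name + " must be positive");
        }
    }

    public static void main(String[] args) {
        System.out.println(t2rVariability(2, 40));
        System.out.println(t3rGapSize(1, 8));
    }
}
```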

4.4.4 The challenge of detecting these clones

To detect each type of clone, we need to parse the FQI of all types, method calls, and variables. This comes with challenges, regarding both performance and implementation. To trace the declarations of variables, methods, and types, we might need to follow cross-file references. The referenced types/variables/methods might not even be part of the project, but rather of an external library or the standard libraries of the programming language. All these factors need to be considered for the referenced entity to be found, on the basis of which an FQI can be created.

5 Clone Detection

As duplication in source code is a serious problem in many software systems, many tools have been proposed to detect various types of code clones [SK16, SR14]. However, these tools were not yet assessed in terms of automatically refactoring clones. In this section, we first assess a set of modern clone detection tools for their applicability to this domain. Next, we introduce our own tool geared towards automatic clone refactoring, CloneRefactor.

5.1 Survey on Clone Detection Tools

We conducted a short survey on (recent) clone detection tools that we could use to analyze refactoring possibilities. The results of our survey are displayed in Table 1. We chose a set of tools that are open source and can analyze a popular object-oriented programming language. Next, we formulated the following four criteria by which we analyze these tools:

1. Should find clones in any context. Some tools only find clones in specific contexts, such as only method level clones. We want to perform an analysis of all clones in projects to get a complete overview.

2. Finds clone classes in control projects. We assembled a number of control projects to assess the validity of clone detection tools.

3. Can analyze resolved symbols. When detecting the types proposed in Sec. 4.4, it is important that we can analyze resolved symbols (for instance a type reference). The rationale for this is further explained in Sec. 4.4.4.

4. Extensive detection configuration. Detecting our clone definitions, as proposed in Sec. 4.4, requires some understanding of the meaning of tokens in the source code (whether a certain token is a type, variable, etc.). The tool should recognize such structures, in order for us to configure our clone type definitions in the tool.

Apart from these criteria, we found that the output of these clone detection tools cannot be post-processed to find the clone types proposed in Sec. 4.4. This is mainly because the clones of T2R are not a subset of type 2 clones and will thus require an analysis of the entire system.

5.2 CloneRefactor

None of the state-of-the-art tools we identified implement all our criteria, so we decided to implement our own clone detection tool: CloneRefactor¹. In this tool, we implemented both the literature clone types [RC07] and our refactoring-oriented clone types as described in Sec. 4.4.

Our tool is based on the JavaParser library [SvBT18]. This library can parse Java source code to an AST representation. This AST can then be analyzed, modified and eventually written back to source code. This allows us to perform AST-based clone detection and apply transformations to the AST based on the clones found.

¹ CloneRefactor is available on GitHub: https://github.com/SimonBaars/CloneRefactor

CloneRefactor walks the AST and collects each declaration and statement (similar to the Scorpio clone detection tool [HMK13]). It then builds a graph representation on the basis of this AST, in which each declaration and statement becomes a node. This graph representation maps the following relations for each declaration and statement:

The declaration/statement preceding it

The declaration/statement following it

The previous statement/declaration that is cloned
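The three relations above could be represented along the following lines. This is only a sketch under strong assumptions and not CloneRefactor's actual data structure: statements are plain strings, and exact text equality stands in for whatever comparison the selected clone type prescribes. The names `StatementNode` and `CloneGraphSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node: one declaration or statement in the graph.
final class StatementNode {
    final String text;
    StatementNode previous;       // the declaration/statement preceding it
    StatementNode next;           // the declaration/statement following it
    StatementNode previousClone;  // the previous node that is a clone of it

    StatementNode(String text) {
        this.text = text;
    }
}

final class CloneGraphSketch {
    // Links each statement to its neighbours and to the most recent
    // earlier statement with the same text (assumed clone criterion).
    static List<StatementNode> build(List<String> statements) {
        Map<String, StatementNode> lastSeen = new HashMap<>();
        List<StatementNode> nodes = new ArrayList<>();
        StatementNode previous = null;
        for (String statement : statements) {
            StatementNode node = new StatementNode(statement);
            node.previous = previous;
            if (previous != null) {
                previous.next = node;
            }
            node.previousClone = lastSeen.get(statement);
            lastSeen.put(statement, node);
            nodes.add(node);
            previous = node;
        }
        return nodes;
    }
}
```

Following `previousClone` links from a node enumerates earlier occurrences of the same statement, and following `next` on each occurrence in lockstep shows how far a cloned sequence extends.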

Whether a node is considered a clone depends on the clone type that is being analyzed. On the basis of this graph, we detect clone classes. We compare clone classes against thresholds and remove the clone classes that do not pass the test.

6 Experiments

This section outlines the conducted experiments to determine refactoring opportunities for code clones. We show the corpus on which we have performed our measurements in Sec. 6.1. Next, we look into the thresholds that determine whether duplicated fragments are considered a clone in Sec. 6.2. We then show a comparison between the different proposed clone types in Sec. 6.3. Afterward, we perform an analysis of the context of code clones in Sec. 6.4. Finally, we map refactoring opportunities for clones in Sec. 6.5.

All experiments were conducted using CloneRefactor, which contains scripts for running large scale clone detection over the indicated corpus.

6.1 The corpus

For our experiments, we use a large corpus of open source projects [AS13]. This corpus has been assembled to contain projects of relatively high quality. Also, any duplicate projects and files were removed from this corpus. This results in a variety of Java projects that reflect the quality of average open source Java systems and are useful to perform measurements on. We filtered this corpus further to only include projects that use Maven, a build tool mainly used to manage dependencies in the Java ecosystem. We then filtered the corpus further to only include projects for which all external dependencies are available, as CloneRefactor requires all dependencies of a project in order to accurately resolve all its symbols. This resulted in 1,343 projects of varying sizes, averaging 980 source lines of code (omitting whitespace and comments) per project.

6.2 Thresholds

Thresholds aid in the process of deciding whether a clone should be refactored. Many clone detection tools focus on the number of lines [KKI02, SR16], the number of tokens [RC08, SSS+16, RK19] and/or the number of nodes (declarations/statements) [HMK13] to decide whether code fragments should be considered clones of each other. A comparison of clone detection tools by Bellon et al. [BKA+07] shows that most clone detection tools require a minimum of 6 duplicated lines of code to consider code fragments a clone. The Scorpio clone detection tool uses this same number as the minimum number of nodes [HMK13]. Many token-based tools use a threshold of 50 tokens for fragments to be considered a clone [SSS+16].

We would argue that going with this "magic number 6" eliminates a lot of harmful clones that should be refactored. For instance, a single 100-token statement will not be considered by such a threshold, although it can still be harmful to the system design when cloned. Because of that, we decided to perform our measurements using thresholds that will include most clones that should be refactored, while eliminating most of the noise. With "noise" we mean duplication that has no relevance for refactoring, like a single token that is duplicated elsewhere.

To find such a threshold, we measured the number of clone classes found for a certain number of tokens. We then looked at the number of nodes (statements/declarations) that these clones span. The resulting chart is displayed in Fig. 2.

[Figure 2. Legend entries: 2 nodes, 3 nodes, 4 nodes, 5 nodes, 6 nodes]

In this chart, we see that when seeking clones with a minimum of 10 tokens, there are more clones that span two statements than clones that span one statement. We manually assessed these clones, and mainly the two-statement clones contain many clones that we classified as "should be refactored". Because of this, we decided to go with a minimum threshold of 10 tokens for our experiments.

6.3 Clone types

In this section we display the differences between clone types 1-3 [RC07] and types 1R-3R as proposed in Sec. 4.4. When running our clone detection script over the corpus, we get the results displayed in Fig. 3.

In this figure, the number of cloned nodes per clone type is displayed. The difference between T1R and T1 is small (10.9%), because most often textually equal code is also functionally equal. The difference between T2R and T2 is bigger (34.7%) because the T2R definition is more strict. T3R and T3 are similar to T2R and T2 because our dataset does not contain many gapped clones for the thresholds used.

We also measured the duration of finding clones by the different clone types. Fig. 4 shows the duration of detecting all clones in the corpus using CloneRefactor for different clone types. Although this data is partly dependent on our implementation of the clone types, there is a notable difference between the refactoring-oriented clone types and the literature clone types. The reason for this is further explained in Sec. 4.4.4.

6.4 Context Analysis of Clones

To be able to refactor code clones, it is important to consider the context of the clone. We define the following aspects of the clone as its context:

1. Relation: The relation of clone instances among each other through inheritance.

2. Location: Where a clone instance occurs in the code.

3. Contents: The statements/declarations of a clone instance.

We perform experiments on each of these aspects, defining categories and measuring these categories over the corpus.

Fig. 5 shows an abstract representation of clone classes and clone instances. The relation of clones through inheritance is measured for each clone class. The location and contents of clones are measured for each clone instance.

All data shown in this section is measured using the T1R clone definition. We have performed the same measurements for the other type definitions and found that they follow similar trends. Because of that, we decided not to show them further in this section.

6.4.1 Relations Between Clone Instances

When merging code clones in object-oriented languages, it is important to consider the inheritance relation between clone instances. This relation has a big impact on how a clone should be refactored.

Fontana et al. [FZ15] describe measurements on 50 open source projects on the relation of clone instances to each other. To do this, they first define several categories for the relation between clone instances in object-oriented languages. These categories are as follows:

1. Same method: All instances of the clone class are in the same method.

2. Same class: All instances of the clone class are in the same class.

3. Superclass: All instances of the clone class are child or parent of each other.

4. Ancestor class: All instances of the clone class are superclasses except for the direct superclass.

5. Sibling class: All instances of the clone class have the same parent class.

6. First cousin class: All instances of the clone class have the same grandparent class.

7. Common hierarchy class: All instances of the clone class belong to the same inheritance hierarchy, but do not belong to any of the other categories.

8. Same external superclass: All instances of the clone class have the same superclass, but this superclass is not included in the project but part of a library.

9. Unrelated class: There is at least one instance in the clone class that is not in the same hierarchy.
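To illustrate how such categories can be decided from class relations, the sketch below classifies a clone class by the classes that own its instances, using Java reflection. It is a simplification under stated assumptions: it covers only three of the nine categories (same class, superclass for exactly two classes, sibling class) and reports everything else as `OTHER`. It is neither the DCRA nor the CloneRefactor implementation, and all names in it are hypothetical.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical example hierarchy.
class RelFruit {}
class RelApple extends RelFruit {}
class RelPear extends RelFruit {}
class RelCar {}

final class RelationSketch {
    enum Relation { SAME_CLASS, SUPERCLASS, SIBLING, OTHER }

    // owners: for each clone instance, the class that contains it.
    static Relation classify(List<Class<?>> owners) {
        Set<Class<?>> distinct = new LinkedHashSet<>(owners);
        if (distinct.size() == 1) {
            return Relation.SAME_CLASS;
        }
        if (distinct.size() == 2) {
            Iterator<Class<?>> it = distinct.iterator();
            Class<?> first = it.next();
            Class<?> second = it.next();
            if (first.getSuperclass() == second || second.getSuperclass() == first) {
                return Relation.SUPERCLASS;
            }
        }
        Set<Class<?>> parents = new HashSet<>();
        for (Class<?> owner : distinct) {
            parents.add(owner.getSuperclass());
        }
        // Sharing only java.lang.Object does not count as being siblings.
        boolean sharedParent = parents.size() == 1
                && !parents.contains(Object.class)
                && !parents.contains(null);
        return sharedParent ? Relation.SIBLING : Relation.OTHER;
    }
}
```

Because the whole clone class is classified at once, a single instance outside the shared parent is enough to move the result out of the sibling category.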

We use a similar setup to that used by Fontana et al. (Table 3 of Fontana et al. [FZ15]). Fontana et al. measure clones using their own tool (DCRA). As explained in Sec. 5.1, we chose to implement our own tool, CloneRefactor. Therefore, the setup for our measurements differs from Fontana et al. as follows:

We consider clone classes rather than clone pairs. The rationale for this is given in Sec. 2.1.

We use different thresholds regarding when a clone should be considered. Fontana et al. seek clones that span a minimum of 7 source lines of code (SLOC). We seek clones with a minimum size of 6 statements/declarations. This is explained in detail in Sec. 6.2.

We seek duplicates by statement/declaration rather than SLOC. This makes our analysis depend less on the coding style (in terms of newline usage) of the author of the software project.

We test a broader range of projects. Fontana et al. use a set of 50 relatively large projects. We use the corpus as explained in Sec. 6.1, which contains a diverse set of projects (diverse both in volume and code quality).

Table 2 contains our results regarding the relations between clone instances.

The most notable difference when comparing it to the results of Fontana et al. [FZ15] is that in our results the largest group of clones is unrelated (34.44%), while for them it was only 15.70%. This is likely due to the fact that we consider clone classes rather than clone pairs, and mark the clone class "Unrelated" even if just one of the clone instances is outside a hierarchy. It could also be that the corpus which we use, as it has generally smaller projects, uses more classes from outside the project (which are marked "Unrelated" if they do not have a common external superclass). About a third of all clone classes have all instances in the same class, which is generally easy to refactor. In third place come the clones that are in the same method, which are similarly easy to refactor.

6.4.2 Clone instance location

After mapping the relations between individual clones, we looked at the location of individual clone instances. A paper by Lozano et al. [LWN07] discusses the harmfulness of cloning. The authors argue that 98% of clones are produced at method level. However, this claim is based on a small dataset and on human copy-paste behavior rather than static code analysis. We validated this claim over our corpus, using the following categories:

1. Method/Constructor Level: A clone instance that does not exceed the boundaries of a single method or constructor (optionally including the declaration of the method or constructor itself).

2. Class Level: A clone instance in a class, that exceeds the boundaries of a single method or contains something else in the class (like field declarations, other methods, etc.).

3. Interface/Enumeration Level: A clone that is (a part of) an interface or enumeration.

The results are shown in Table 3. Our results indicate that, quite significantly, most clones are found at method level. The number of clones found in interfaces and enumerations is very low.

Finally, we looked at the contents of individual clone instances: what combination of declarations and statements they span. We selected the following categories to be relevant for refactoring:

1. Full Method/Class/Interface/Enumeration: A clone that spans a full class, method, constructor, interface or enumeration, including its declaration.

2. Partial Method/Constructor: A clone that spans a method partially.

3. Several Methods: A clone that spans over two or more methods, either fully or partially.

4. Only Fields: A clone that spans only global variables.

5. Includes Fields/Constructor: A clone that spans fields/constructors, but can also span other statements or declarations.

6. Method/Class/Interface/Enumeration Declaration: A clone that contains the declaration (usually the first line) of a class, method, interface or enumeration.

The results for these categories are displayed in Table 4.

Unsurprisingly, most clones span a part of a method. The most used refactoring technique for clones that span part of a method is "Extract Method". Because of that, we focus our research efforts on refactoring such clones.

The most used technique to refactor clones is method extraction (creating a new method on the basis of the contents of clones). However, method extraction cannot be applied in all cases. In these instances, more conditions may apply to be able to conduct a refactoring, if beneficial at all.

We measured the number of clones that can be refactored through method extraction (without additional transformations being required). Our results are displayed in Table 5. In this table we use the following categories:

Can be extracted: This clone is a fragment of code that can directly be extracted to a method. Then, based on the relation between the clone instances, further refactoring techniques can be used to refactor the extracted methods (for instance "Pull Up Method" for clones in sibling classes).

Complex control flow: This clone contains break, continue or return statements, obstructing the possibility of method extraction.

Spans part of a block: This clone spans only a part of a statement, so it cannot be extracted as a complete unit.

Is not a partial method: If the clone does not fall in the "Partial Method/Constructor" category of Table 4, the "Extract Method" refactoring technique cannot be applied.
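The difference between the "Can be extracted" and "Complex control flow" categories can be shown with a small, hand-written Java example. All names in it (`ExtractMethodSketch`, `totalBefore`, `describe`, and so on) are hypothetical and chosen only for illustration; the example is not taken from the corpus.

```java
/** Illustrative only: every name in this class is hypothetical. */
final class ExtractMethodSketch {

    // Before: the statements that compute `sum` are cloned in both methods.
    static int totalBefore(int[] prices) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        return sum;
    }

    static double averageBefore(int[] prices) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        return prices.length == 0 ? 0.0 : (double) sum / prices.length;
    }

    // After: the clone covers whole statements, yields one value, and contains no
    // break/continue/return, so it can move into a new method unchanged.
    static int sum(int[] prices) {
        int sum = 0;
        for (int p : prices) {
            sum += p;
        }
        return sum;
    }

    static int totalAfter(int[] prices) {
        return sum(prices);
    }

    static double averageAfter(int[] prices) {
        int total = sum(prices);
        return prices.length == 0 ? 0.0 : (double) total / prices.length;
    }

    // "Complex control flow": the loop contains a return that exits describe().
    // Moved into a helper as-is, that return would only exit the helper.
    static String describe(int[] values) {
        for (int v : values) {
            if (v < 0) {
                return "has negative";
            }
        }
        return "count=" + values.length;
    }

    public static void main(String[] args) {
        int[] prices = {3, 5, 10};
        System.out.println(totalBefore(prices) == totalAfter(prices));
        System.out.println(averageBefore(prices) == averageAfter(prices));
        System.out.println(describe(new int[] {1, -2}));
    }
}
```

If the loop in `describe` were cloned elsewhere, extracting it would need an additional transformation, such as returning a flag from the helper, which is why such clones are counted separately.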

From Table 5, we can see that 41% of the clones can directly be refactored through method extraction (and possibly other refactoring techniques based on the relation of the clone instances). For the other clones, other techniques or transformations will be required.

7 Threats to validity

We noticed that, when doing measurements on a corpus of this size, the thresholds that we use for the clone detection have a big impact on the results. There does not seem to be one golden set of thresholds: some thresholds work in some situations but fail in others. We have chosen thresholds that, according to our experiments and assessments, seemed optimal. However, by using these, our results may contain some "noise" in the form of clones that should not be considered for refactoring.

8 Conclusion and next steps

In this research we made three novel contributions:

1. We proposed a method with which we can detect clones that can/should be refactored.
2. We mapped the context of clones in a large corpus of open source systems.
3. We mapped the opportunities to perform the "Extract Method" refactoring technique on clones in this corpus.

We have looked into existing definitions for different types of clones [RC07] and proposed solutions for problems that these types have with regards to automated refactoring. We propose that fully qualified identifiers (FQIs) of method call signatures and type references should be considered instead of their plain text representation, to ensure refactorability. Furthermore, we propose that one should define a threshold for variability in variables, literals, and method calls, in order to limit the number of parameters that the refactored method will have.
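A minimal sketch can show why plain text comparison is not enough and how a variability threshold could be applied. It assumes that each token already carries a resolved FQI where one exists, and it measures variability as the percentage of differing tokens. Both choices, along with the names `FqiComparisonSketch`, `Token` and `withinVariability`, are assumptions for illustration and not the definition used in this study.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: compare two fragments by text and by fully qualified identifier. */
final class FqiComparisonSketch {

    /** A token with its source text and, where resolvable, its FQI (otherwise null). */
    static final class Token {
        final String text;
        final String fqi;

        Token(String text, String fqi) {
            this.text = text;
            this.fqi = fqi;
        }

        /** Compare by FQI when available, by plain text otherwise. */
        String key() {
            return fqi != null ? fqi : text;
        }
    }

    static boolean sameText(List<Token> a, List<Token> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).text.equals(b.get(i).text)) {
                return false;
            }
        }
        return true;
    }

    static int differingTokens(List<Token> a, List<Token> b) {
        if (a.size() != b.size()) {
            throw new IllegalArgumentException("fragments must have equal length");
        }
        int count = 0;
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).key().equals(b.get(i).key())) {
                count++;
            }
        }
        return count;
    }

    /** Assumed measure: share of differing tokens, as a percentage of all tokens. */
    static boolean withinVariability(List<Token> a, List<Token> b, double maxPercent) {
        return 100.0 * differingTokens(a, b) / a.size() <= maxPercent;
    }

    public static void main(String[] args) {
        // Both fragments read "List items = null" but refer to different List types.
        List<Token> left = Arrays.asList(new Token("List", "java.util.List"),
                new Token("items", null), new Token("=", null), new Token("null", null));
        List<Token> right = Arrays.asList(new Token("List", "java.awt.List"),
                new Token("items", null), new Token("=", null), new Token("null", null));
        System.out.println(sameText(left, right));
        System.out.println(differingTokens(left, right));
        System.out.println(withinVariability(left, right, 10.0));
    }
}
```

The two fragments are textually identical, yet merging them would change which `List` type one of them refers to. Comparing FQIs surfaces that difference before a refactoring is attempted.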

We analyzed the context of different kinds of clones and prioritized their refactoring. Firstly, we looked at the inheritance relation of clone instances in a clone class. We found that a little more than a third of all clone classes are flagged as unrelated, which means that they have at least one instance that has no relation through inheritance with the other instances. For about a third of the clone classes all of their instances are in the same class.
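How two clone instances relate through inheritance can be sketched with Java reflection. The relation names below (`SAME_CLASS`, `SUPERCLASS`, `SIBLING`, `UNRELATED`) and the small class hierarchy are assumptions for illustration; they simplify the categories used in this study, and `java.lang.Object` is deliberately not counted as a shared parent.

```java
/** Hypothetical sketch: relate the classes that contain two clone instances. */
final class CloneRelationSketch {

    enum Relation { SAME_CLASS, SUPERCLASS, SIBLING, UNRELATED }

    // A tiny example hierarchy, used only to exercise relate().
    static class Shape { }
    static class Circle extends Shape { }
    static class Square extends Shape { }
    static class Logger { }

    static Relation relate(Class<?> a, Class<?> b) {
        if (a.equals(b)) {
            return Relation.SAME_CLASS;
        }
        Class<?> superA = a.getSuperclass();
        Class<?> superB = b.getSuperclass();
        // One class is the direct superclass of the other.
        if (b.equals(superA) || a.equals(superB)) {
            return Relation.SUPERCLASS;
        }
        // Same direct superclass; Object is excluded because every class extends it.
        if (superA != null && superA.equals(superB) && !superA.equals(Object.class)) {
            return Relation.SIBLING;
        }
        return Relation.UNRELATED;
    }

    public static void main(String[] args) {
        System.out.println(relate(Circle.class, Circle.class));
        System.out.println(relate(Circle.class, Shape.class));
        System.out.println(relate(Circle.class, Square.class));
        System.out.println(relate(Circle.class, Logger.class));
    }
}
```

A static analysis tool would resolve these relations from source code rather than from loaded classes, but the decision logic would be similar.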

Secondly, we looked at the location of clone instances. Most clone instances (79 percent) are found at method level. Because of that, we concluded that our main refactoring focus should be on method-level clones. A common way to refactor such clones is to extract a new method on the basis of the contents of the clone. However, method extraction cannot be applied in all cases. According to our experiments, about 40 percent of the clones can be refactored by extracting them to a new method.

Our next step is to implement the "Extract Method" refactoring for the identified automatic refactoring opportunities. On the basis of the resulting code, we can perform experiments that compare the maintainability index of the refactored code to that of the original code. The maintainability index of a system is an aggregation of various metrics that test a system's maintainability (like volume, complexity, duplication, etc.). If we can automatically apply refactorings to the identified clones, this maintainability index can be used to select more optimal threshold values.

Acknowledgements

We would like to thank the Software Improvement Group (SIG) for their continuous support in this project.

References

[Alw17] Asif Alwaqfi. A Refactoring Technique for Large Groups of Software Clones. PhD thesis, Concordia University, 2017.

[AS13] Miltiadis Allamanis and Charles Sutton. Mining Source Code Repositories at Massive Scale using Language Modeling. In The 10th Working Conference on Mining Software Repositories, pages 207-216. IEEE, 2013.

[BKA+07] Stefan Bellon, Rainer Koschke, Giulio Antoniol, Jens Krinke, and Ettore Merlo. Comparison and evaluation of clone detection tools. IEEE Transactions on software engineering, 33(9):577-591, 2007.

[CKS18] Zhiyuan Chen, Young-Woo Kwon, and Myoungkyu Song. Clone refactoring inspection by summarizing clone refactorings and detecting inconsistent changes during software evolution. Journal of Software: Evolution and Process, 30(10):e1951, 2018.

[CR11] James R Cordy and Chanchal K Roy. The nicad clone detector. In 2011 IEEE 19th International Conference on Program Comprehension, pages 219-220. IEEE, 2011.

[CYI+11] Eunjong Choi, Norihiro Yoshida, Takashi Ishio, Katsuro Inoue, and Tateki Sano. Extracting code clones for refactoring using combinations of clone metrics. In Proceedings of the 5th International Workshop on Software Clones, pages 7-13. ACM, 2011.

[Fow99] Martin Fowler. Refactoring: improving the design of existing code. Addison-Wesley Professional, 1999.

[FZ15] Francesca Arcelli Fontana and Marco Zanoni. A duplicated code refactoring advisor. In International Conference on Agile Software Development, pages 3-14. Springer, 2015.

[FZZ12] Francesca Arcelli Fontana, Marco Zanoni, and Francesco Zanoni. Duplicated Code Refactoring Advisor (DCRA): a tool aimed at suggesting the best refactoring techniques of Java code clones. PhD thesis, Università degli Studi di Milano-Bicocca, 2012.

[HKI08] Yoshiki Higo, Shinji Kusumoto, and Katsuro Inoue. A metric-based approach to identifying refactoring opportunities for merging code clones in a java software system. Journal of Software Maintenance and Evolution: Research and Practice, 20(6):435-461, 2008.

[HKK+04] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, and K Words. Aries: Refactoring support environment based on code clone analysis. In IASTED Conf. on Software Engineering and Applications, pages 222-229, 2004.

[HMK13] Yoshiki Higo, Hiroaki Murakami, and Shinji Kusumoto. Revisiting capability of pdg-based clone detection. Technical report, Citeseer, 2013.

[JMSG07] Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, and Stephane Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In Proceedings of the 29th international conference on Software Engineering, pages 96-105. IEEE Computer Society, 2007.

[JX10] Stan Jarzabek and Yinxing Xue. Are clones harmful for maintenance? In Proceedings of the 4th International Workshop on Software Clones, IWSC '10, pages 73-74, New York, NY, USA, 2010. ACM.

[KG08] Cory J Kapser and Michael W Godfrey. "Cloning considered harmful" considered harmful: patterns of cloning in software. Empirical Software Engineering, 13(6):645, 2008.

[KKI02] Toshihiro Kamiya, Shinji Kusumoto, and Katsuro Inoue. Ccfinder: a multilinguistic token-based code clone detection system for large scale source code. IEEE Transactions on Software Engineering, 28(7):654-670, 2002.

[KN01] Georges Golomingi Koni-N'Sapu. A scenario based approach for refactoring duplicated code in object oriented systems. Master's thesis, University of Bern, 2001.

[KS17] Kamalpriya and Paramvir Singh. Enhancing program dependency graph based clone detection using approximate subgraph matching. In 2017 IEEE 11th International Workshop on Software Clones (IWSC), pages 1-7. IEEE, 2017.

[LHMI07] Simone Livieri, Yoshiki Higo, Makoto Matushita, and Katsuro Inoue. Very-large scale code clone analysis and visualization of open source programs using distributed ccfinder: D-ccfinder. In 29th International Conference on Software Engineering (ICSE'07), pages 106-115. IEEE, 2007.

[LM11] Thierry Lavoie and Ettore Merlo. Automated type-3 clone oracle using levenshtein metric. In Proceedings of the 5th international workshop on software clones, pages 34-40. ACM, 2011.

[LST78] Bennet P Lientz, E. Burton Swanson, and Gail E Tompkins. Characteristics of application software maintenance. Communications of the ACM, 21(6):466-471, 1978.

[LWN07] Angela Lozano, Michel Wermelinger, and Bashar Nuseibeh. Evaluating the harmfulness of cloning: A change based experiment. In Fourth International Workshop on Mining Software Repositories (MSR'07: ICSE Workshops 2007), pages 18-18. IEEE, 2007.

[MT04] Tom Mens and Tom Tourwé. A survey of software refactoring. IEEE Transactions on software engineering, 30(2):126-139, 2004.

[MVD+03] Tom Mens, Arie Van Deursen, et al. Refactoring: Emerging trends and open problems. In Proceedings First International Workshop on REFactoring: Achievements, Challenges, Effects (REFACE), 2003.

[RC07] Chanchal Kumar Roy and James R Cordy. A survey on software clone detection research. Queen's School of Computing TR, 541(115):64-68, 2007.

[RC08] Chanchal K Roy and James R Cordy. Nicad: Accurate detection of near-miss intentional clones using flexible pretty-printing and code normalization. In 2008 16th IEEE international conference on program comprehension, pages 172-181. IEEE, 2008.

[RCK09] Chanchal K Roy, James R Cordy, and Rainer Koschke. Comparison and evaluation of code clone detection techniques and tools: A qualitative approach. Science of computer programming, 74(7):470-495, 2009.

[RK19] Chaiyong Ragkhitwetsagul and Jens Krinke. Siamese: scalable and incremental code clone search via multiple code representations. Empirical Software Engineering, pages 1-49, 2019.

[SFL+18] Vaibhav Saini, Farima Farmahinifarahani, Yadong Lu, Pierre Baldi, and Cristina V Lopes. Oreo: Detection of clones in the twilight zone. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 354-365. ACM, 2018.

[SK16] Abdullah Sheneamer and Jugal Kalita. A survey of software clone detection techniques. International Journal of Computer Applications, 137(10):1-21, 2016.

[SR14] Jeffrey Svajlenko and Chanchal K Roy. Evaluating modern clone detection tools. In 2014 IEEE International Conference on Software Maintenance and Evolution, pages 321-330. IEEE, 2014.

[SR16] Jeffrey Svajlenko and Chanchal K Roy. Bigcloneeval: A clone detection tool evaluation framework with bigclonebench. In 2016 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 596-600. IEEE, 2016.

[SSS+16] Hitesh Sajnani, Vaibhav Saini, Jeffrey Svajlenko, Chanchal K Roy, and Cristina V Lopes. Sourcerercc: scaling code clone detection to big-code. In 2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE), pages 1157-1168. IEEE, 2016.

[SvBT18] Nicholas Smith, Danny van Bruggen, and Federico Tomassetti. Javaparser, 05 2018.

[SYCI17] Yuichi Semura, Norihiro Yoshida, Eunjong Choi, and Katsuro Inoue. Ccfindersw: Clone detection tool with flexible multilingual tokenization. In 2017 24th Asia-Pacific Software Engineering

[Wak04] William C Wake. Refactoring workbook. Addison-Wesley Professional, 2004.