<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Refactorable Clones by Slicing Program Dependence Graphs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ammar Hamid</string-name>
          <email>ammarhamid84@gmail.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vadim Zaytsev</string-name>
        </contrib>
      </contrib-group>
      <abstract>
        <p>Code duplication in a program can make understanding and maintenance di cult. The problem can be reduced by detecting duplicated code, refactoring it into a separate procedure, and replacing all the clones by appropriate calls to the new procedure. In this paper, we report on a con rmatory replication of a tool that was used to detect such refactorable clones based on program dependence graphs and program slicing.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Motivation</title>
      <p>
        With the discussion about the extent to which code clones are harmful for
software readability, maintainability and ultimately quality, still ongoing, there is
still signi cant evidence on cost increases being caused by code duplication in
at least some scenarios [
        <xref ref-type="bibr" rid="ref18 ref5">5,18</xref>
        ]. For simplicity, we intend to adopt that view and
look a step further. Once clones are identi ed, ideally we would like to provide
advanced support for programmers or maintenance engineers to remove them
| that is, to use refactorings [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to \de-clone" source code by merging identical
code fragments and parametrising similar ones [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        The sheer number of code clone detection techniques and tools is immensely
overwhelming [
        <xref ref-type="bibr" rid="ref13 ref15 ref16">15,16,13</xref>
        ]. In section 2, we will give a very brief overview of the
eld and explain terminology needed to understand the rest of the paper. One
of the promising family of methods which is not too complex for a nal Master's
project yet also not too much of a beaten track in code clone research, is
graphbased. Given two programs, we automatically build a graph-like structure with
known properties, employ some slicing and/or matching and based on that can
diagnose them with duplication.
      </p>
      <p>
        Eventually we have converged to a relatively well-known paper of
Raghavan Komondoor and Susan Horwitz [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and dedicated ourselves to replicating it.
Some details about that project can be found in section 3, but in general they
propose to use program dependence graphs [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] (PDG) and program slicing [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
The authors of the original study were able to nd isomorphic subgraphs of
the PDG by implementing a program slicing technique using a combination of
backward slicing and forward slicing. Basically they search for sets of
syntactically equivalent node pairs and perform backward slicing from each pair with a
single forward slicing pass for matching predicates nodes. The theoretical
foundation behind this method mostly lies in plan calculus [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and an advanced
graph partitioning algorithm [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ] and essentially allows to detect clones
semantically, regardless of various refactorings that may have been applied to some of
the copies but not to others. This leads to reporting only those clones that can
indeed be refactored | as we show on Table 1.
      </p>
      <p>We modi ed the original study in several ways: some were forced upon us by
technicalities, for others we had our own reasons | all explained in section 4.
We report our results, compare them to the original study and try to explain
the di erences in section 5 and conclude the paper with section 6.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        Several studies show that 7{23% of the source code for large programs is
duplicated code [
        <xref ref-type="bibr" rid="ref2 ref9">2,9</xref>
        ]. Within a project, code duplication will increase code size,
and make maintenance di cult. Fixing a bug within a project with a lot of
duplicated code is becoming a challenge because it is necessary to make sure that
the x is applied to all of the duplicated instances. Lague et al [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] studied the
development of a large software system over multiple releases and found that
programmers often missed some copies of the duplicated code when performing
modi cation. Similar results have been observed by Geiger et al [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and
Thummalapenta et al [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] who observe the already expected negative impact of clone
co-evolution on software maintenance e ort.
      </p>
      <p>
        In code duplication studies we usually distinguish among the following types
of clones [
        <xref ref-type="bibr" rid="ref13 ref15">13,15</xref>
        ]:
{ Exact clones (type 1) | identical duplicates with some variations allowed
in whitespace and comments;
{ Parametrised clones (type 2) | variations are allowed in identi er names,
literals, even variable types;
{ Near miss clones (type 3) | statements are allowed to be changed, added
or removed up to some extent;
{ Semantic clones (type 4) | same computation with a di erent syntax and
possibly even di erent algorithms;
{ Structural clones | higher level similarities, conceptually bottom-up-detected
implementation patterns;
{ Artefact clones | function clones and le clones;
{ Model clones | duplicates over artefacts other than code;
{ Contextual clones | code fragments deemed duplicate due to their usage
patterns.
      </p>
      <p>Out of these, type 2 and type 3 are the most well-researched ones, with model
clones quickly getting more and more attention every year.</p>
      <p>
        Techniques and tools can be roughly classi ed into these groups [
        <xref ref-type="bibr" rid="ref13 ref16">13,16</xref>
        ] (in
the parenthesis we show a software artefact category in the terms of
parsing-ina-broad-sense megamodel [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]):
      </p>
      <sec id="sec-2-1">
        <title>Procedure 1 Rewritten Procedure 1</title>
        <p>int foo(void) {
bool w = false;
int t = 10;
return new_procedure_bar();
int foo(void) {
++ int i = 1;
bool z = true;
int t = 10;
++ int j = i + 1;
++ int n;
++ for (n=0; n&lt;10; n++) {
++ j = j + 5;
++
}
int k = i + j - 1;
return k;</p>
        <p>{ Text based (Str) such as Simian | blazingly fast methods usually looking for
exact clones, quite often in a language-independent, -parametric or -agnostic
manner;
{ Token based (TTk, Lex) such as CCFinder | somewhat more sophisticated
lexical tools;
{ Tree based (Ptr, Cst, Ast) such as Deckard | looking for clones in parse
trees, su x trees or abstract syntax trees;
{ Graph based (Ast, Dia) such as Duplix | making decisions based on control
ow graphs, data dependency graphs, program dependence graphs or partite
sets and vertices;
{ Model based (Dia) such as ConQAT | metamodel-speci c representation,
usually graph-like;
{ Metrics based such as Covet | using metrics, ngerprinting and/or
clustering to work on text or ASTs;
{ Hybrid such as CloneMiner | independent component analysis, some
variants of semantic indexing and longest subsequence methods that require
reasoning over trees, memory states, vector spaces, etc.</p>
        <p>
          Following the original paper [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], we use CodeSurfer3, a commercial tool that
can be used to generate program dependence graphs (PDGs) from C programs.
It provides an API that can be used from Scheme programs [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In general, PDG
nodes represent program statements and predicates, while PDG edges
represent data and control dependencies. PDG provides an abstraction that ignores
arbitrary sequencing choices made by a programmer, and instead captures the
important dependences among program components. Essentially, a program
dependence graph is built starting from a control ow graph (CFG) with statements
as nodes and possible transitions among them as edges, which is then analysed
for dominance to form an acyclic post-dominator graph | the two are merged
into a control dependence graph. A program dependence graph is formed from
the control dependence graph by enhancing it with additional edges for all data
dependencies, an example is given on Figure 1. The resulting complex
structure is graph-like with nodes of several kinds (regions, statements, entry/exit
points) and edges of several kinds (data/control dominance, possibly labelled)
| there are algorithmic variations which are not important for understanding
the current paper. Such a PDG is remarkable in a sense that it captures many
structural aspects of a program and still allows to abstract from concrete details
such as variable names and precise positioning of the code. For a larger/smaller
scale, related methods are used such as system dependence graphs (SDGs) or
execution dependence graphs (EDGs) [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>
          The last bit of background needed for understanding this paper is program
slicing [
          <xref ref-type="bibr" rid="ref19 ref3">19,3</xref>
          ], which is a well-known technique for obtaining a \view" of a
program with only those statements that are relevant for the chosen variable. In
terms of PDG we can have two query types in program slicing [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Backward
slicing from node x means nding all the nodes that in uence the value of node
3 CodeSurfer, http://www.grammatech.com/research/technologies/codesurfer.
x. Forward slicing from node y means nding all the nodes that are in uenced
by node y. This is an important technique to lter out any statements that are
irrelevant for clone detection.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Original study</title>
      <p>
        The main research question asked by Komondoor and Horwitz is the following:
can we nd code clones of type 3 (non-contiguous, reordered, intertwined), which
are refactorable into new procedures? [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
3.1
      </p>
      <p>Approach
To detect clones in a program, we represent each procedure using its PDG. In
PDG, vertex represents program statement or predicate, and edge represents
data or control dependences. The algorithm performs four steps (described in
the following subsections):
{ Step 1: Find relevant procedures
{ Step 2: Find pair of vertices with equivalent syntactic structure
{ Step 3: Find clones
{ Step 4: Group clones</p>
      <p>Find relevant procedures. We are only interested in nding clones for
procedures that are reachable from the main program execution. Only then we
can safely remove unreachable procedures from our program and just not detect
clones of them. We do this by getting a system initialisation vertex and
forwardslicing with data and control ow. This will return all PDGs (including user
de ned and system PDGs) that are reachable from the main program execution.
From that result, we further lter those PDGs to nd only the user de ned ones,
ignoring system libraries.</p>
      <p>Find pairs of vertices with equivalent syntactic structure. We scan
all PDGs from the previous step to nd vertices that have the type expression
(e.g. int a = b + 1). From those expression vertices, we try to match their
syntactic structure with each other. To nd two expressions with equivalent
syntactic structures, we make use of Abstract Syntax Tree (AST). This way,
we ignore variable names, literal values, and focus only on the structure, e.g.
int a = b + 1 is equivalent with int k = l + 1, where both expression has
the same type, which is int ).</p>
      <p>Find clones. From a pair of equivalent structure expressions, we back-slice to
nd their predecessors and compare them with each other. If the AST structures
of their predecessors are the same then we store it in the collection of clones
found. Because of this step, we can nd non-contiguous, reordered, intertwined
and refactorable clones. Refactorable clones in this case mean that the found
clones are meaningful and it should be possible to move it into a new procedure
without changing their semantic.</p>
      <p>Group clones. This is the step where we make sense of the found clones
before displaying them. For example, when using CodeSurfer, the vertex for a
while-loop doesn't really show that it is a while loop but rather showing its
predicate, e.g. while(i&lt;10) will show as a control-point vertex i&lt;10. Therefore,
it is important that the found clones are mapped back to the actual program
text representation and grouped together before displaying them. It is important
that the programmer can understand and take action on the reported clones.</p>
      <p>Experimental setup. The authors of the original paper used CodeSurfer
version 1.8 to generate PDGs and wrote Scheme program of 6123 lines that
access CodeSurfer API to work with the generated PDGs. They also had to
have a C implementation of 4380 lines to do the processing of those PDG to
actually nd clones.</p>
      <p>They were able to nd isomorphic subgraphs of the PDG by implementing
a program slicing technique that used a combination of backward slicing and
forward slicing. They applied it to some open-source software written in C (tail,
sort, bison) and demonstrated the capability of slicing to detect non-contagious
code clones. We will show the actual numbers later when we compare the results
with the replication.</p>
    </sec>
    <sec id="sec-4">
      <title>Changes to the original study</title>
      <p>The motivation of this replication study is to be able to validate algorithm and
results of the original study. Once validated, we would like to publish our code
and intermediate results into a public repository so that it is easier for any future
researchers to either re-validate our results or to extend our program.</p>
      <p>We had to use CodeSurfer 2.3 instead of 1.8 used in the original study: just
to get it running was already a challenge impossible to overcome | we would
eventually need to do a sandboxing of some 2001 version of OS, which would
then need to be properly licensed (CodeSurfer is not open source, but we have
applied for the academic license and got both 1.8 and 2.3 to experiment with).</p>
      <p>Porting the existing code (kindly provided to use by Raghavan Komondoor)
to the new version of CodeSurfer was also ruled out as a viable option: the API
changed too much, and actually covered many things with standard calls that
needed to be programmed in full when working with version 1.8. In the end, we
reimplemented the algorithm from scratch, using both the original paper and the
code behind it as guides. Our implementation has 536 LOC of Scheme, which is
huge improvement against the 6123 LOC of the original study. The improvement
is mostly not ours to claim, but CodeSurfer API's. For post-processing of clones,
we wrote a Ruby script, which was again shorter: 161 LOC versus the original
4380 LOC, partly due to the improved API, but partly also due to the language
choice (the original post-processing was done in C++). Actually, given a bit
more time, it should have been possible to avoid post-processing entirely, or
rather to implement in all in Scheme. The code is available online for anybody
to do this | http://github.com/ammarhamid/clone-detection | we accept
pull requests.</p>
      <p>There are several other important changes from the original paper that we
need to explain. As mentioned above, we only detect clones within the reachable
procedures, excluding any unused procedures that are not reachable from main
execution. This makes the result more accurate, since dead code is out of our
consideration.</p>
      <p>
        Furthermore, we only use backward slicing and no forward slicing to detect
clones. Let us have a look at the example on Table 2. According to the original
paper, only statements indicated by ++ will be reported as clones while statement
marked with ** is excluded. The main argument according to the original paper
is that fp3 is used inside a loop but the loop predicate itself is not matching
(for loop and the rst while loop predicate doesn't match) { or a so called cross
loop [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>However, we argue that we should still report the statement marked with **
as a clone together with the fact that their loop predicate doesn't match. For a
software developer it would mean one could still refactor this into two separate
procedures, instead of a single procedure proposed by the original paper (Table 3
and Table 4). Therefore, we consider that forward slicing is only necessary to
de ne refactoring strategy and not for detecting the clone itself.</p>
      <sec id="sec-4-1">
        <title>Fragment 1 Fragment 2</title>
        <p>... ...
** fp3 = lookaheadset + tokensetsize; ** fp3 = base + tokensetsize;
for(i = lookaheads(state); ...</p>
        <p>i &lt; k; i++) { if(rp) {
++ fp1 = LA + i * tokensetsize; while((j = *rp++) &gt; 0) {
++ fp2 = lookaheadset; ...
++ while (fp2 &lt; fp3) { ++ fp1 = base;
++ *fp2++ |= *fp1++; ++ fp2 = F + j * tokensetsize;
} ++ while(fp1 &lt; fp3) {
} ++ *fp1++ |= *fp2++;
int location(int base, int size) { return base + size; }
To be as close to the original paper as possible, we used the GNU git repositories4
to locate versions that were released around 2001: CoreUtils 4.5.2 (for tail and
sort) and Bison 1.29 (for bison).</p>
        <p>Table 5 shows the comparison of the sizes of the three programs (in number of
LOC and in number of nodes), and the running times for the algorithm between
the original and replication study. Figure 2 shows the comparison of the result
in details between the original and replication study.</p>
        <p>We do not have a solid explanation for the di erences observed, but we can
hypothesise on some issues:</p>
        <p>Altered algorithm. We did use a slightly di erent algorithm (only
reachable code; no forward slicing) to detect clones. However, we have also tried
running it exactly as it was intended originally, and the di erences were rather
minor and could not explain some of the drastic di erences.</p>
        <p>Manual inspection was performed to ensure that the clones detected by
our tool are indeed clones and are indeed refactorable. It was possible to review
all clones from tail and sort and cover a random selection of clones for bison
| no false positives were found.</p>
        <p>Bison running time in the original study is suspiciously short, which does
not re ect the explosive performance behaviour that we have observed in our
implementation. This could indicate a bug in one of the implementations, or
point to a drastically di erent (optimised, distributed) algorithm used for the
actual run of the original experiment. It could also be a simple reporting mistake
(e.g., \one and a half hours" reported instead of actual \one and a half days").</p>
        <p>Size of some clones reported for tail and bison is longer than most
functions (group 70+), which means either a mistake or some unreported
procedure used in the original experiment to combine several subsequent full-function
clones into one.</p>
        <p>Testing a program of 10 KLOC is always harder than testing a program of
1 KLOC, especially if both programs are algorithmically heavy yet the shorter
one relies on a more advanced API. More investigation is needed to see which of
these factors were at play and which results are closer to the truth.
4 http://git.savannah.gnu.org/cgit/
25
20
e
n
o
l
c
ian 15
s
e
d
o
n
fro 10
e
b
m
u
N
5
0
110
100
90
leon 80
ca 70
n
i
se 60
d
fon 50
o
reb 40
uNm 30
20
10
0
1600
1400
e 1200
n
o
l
c
a 1000
n
i
s
eod 800
n
f
o
re 600
b
m
uN 400
200
0</p>
        <p>tail
Original</p>
        <p>Replication
5–9
10–19
20–29
50–59
60–69</p>
        <p>70+</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>We have departed on a quest to nd refactorable semantic clones and have
conducted a replication of a paper that did it with PDG and program slicing.
Our results are statistically somewhat di erent from the results of the original
study, but we can conclude nevertheless that the algorithm described there,
works. So, the fusion of PDG and slicing is suitable for Type 3 clone detection.</p>
      <p>As a side product, we have noticed how signi cantly CodeSurfer has improved
over the years: the amount of code we needed to write to achieve the same
objectives, is ten times less than what had to be done 13 years ago, with almost
no postprocessing of the obtained results needed.</p>
      <p>As for quantitative di erences, unfortunately we could not compare them
in detail since we lack the original data, and we failed in getting the code
operational (it would require an old version of CodeSurfer operating on an old
system, preferably with performance comparable to the machine used for the
original experiment). However, we do present some evidence of correctness in
the form of manually reviewed code clones that we reported. We can also
conclude that the clones are indeed refactorable | this has been evaluated through
manual inspection of the tool reports.</p>
      <p>Both the code and the intermediate results of our experiments have been
shared as open source: http://github.com/ammarhamid/clone-detection, to
make it easier to revalidate, replicate, and extend. We hope our clone detector is
a suitable tool to use for future work. Possible future extensions should include
detecting interprocedural clones as well, which would allow detection of type
4 clones and refactorings such as inlining variables and extracting methods.
Intuitively, it would be more useful to provide results over bigger related code
fragments | however, the practical consequences remain to be seen.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>We would like to thank Raghavan Komondoor for sharing his code: we ended up not
using it directly, but having it at hand helped us to understand some details and make
a better comparison. We would also like to thank David Vitek from GrammaTech for
helping us obtaining an academic license for CodeSurfer and trying to work us through
the changes from 1.8 to 2.3 | again, we ended up not opting for a migration, but
quickly estimating that to be too much of an undertaking, was a part of this project's
success. Last but not least, our thanks go to the participants of SATToSE 2014 for all
the discussions we have had in L'Aquila.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>H.</given-names>
            <surname>Abelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Dybvig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Haynes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Rozas</surname>
          </string-name>
          , I. Adams,
          <string-name>
            <surname>N. I.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kohlbecker</surname>
          </string-name>
          , J. Steele,
          <string-name>
            <surname>G. L.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Bartley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Halstead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Oxley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. J.</given-names>
            <surname>Sussman</surname>
          </string-name>
          , G. Brooks,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hanson</surname>
          </string-name>
          , K. M. Pitman, and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wand</surname>
          </string-name>
          .
          <source>Revised5 Report on the Algorithmic Language Scheme. Higher-Order and Symbolic Computation</source>
          ,
          <volume>11</volume>
          (
          <issue>1</issue>
          ):7{
          <fpage>105</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B. S.</given-names>
            <surname>Baker</surname>
          </string-name>
          .
          <article-title>On Finding Duplication and Near-Duplication in Large Software Systems</article-title>
          . In WCRE, pages
          <volume>86</volume>
          {
          <fpage>95</fpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Beck</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Eichmann</surname>
          </string-name>
          .
          <article-title>Program and Interface Slicing for Reverse Engineering</article-title>
          . In ICSE, pages
          <volume>509</volume>
          {
          <fpage>518</fpage>
          . IEEE,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>M.</given-names>
            <surname>Fowler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Beck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Brant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Opdyke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Roberts</surname>
          </string-name>
          .
          <article-title>Refactoring: Improving the Design or Existing Code</article-title>
          .
          <string-name>
            <surname>Addison-Wesley Professional</surname>
          </string-name>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Geiger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fluri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. C.</given-names>
            <surname>Gall</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pinzger</surname>
          </string-name>
          .
          <article-title>Relation of Code Clones and Change Couplings</article-title>
          .
          <source>In FASE</source>
          , pages
          <volume>411</volume>
          {
          <fpage>425</fpage>
          . Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.</given-names>
            <surname>Horwitz</surname>
          </string-name>
          .
          <article-title>Identifying the Semantic and Textual Di erences Between Two Versions of a Program</article-title>
          .
          <source>In B. N. Fischer, editor, PLDI</source>
          , pages
          <volume>234</volume>
          {
          <fpage>245</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>S.</given-names>
            <surname>Horwitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. W.</given-names>
            <surname>Reps</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Binkley</surname>
          </string-name>
          .
          <article-title>Interprocedural Slicing Using Dependence Graphs</article-title>
          .
          <source>ACM ToPLaS</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ):
          <volume>26</volume>
          {
          <fpage>60</fpage>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>R.</given-names>
            <surname>Komondoor</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Horwitz</surname>
          </string-name>
          .
          <article-title>Using Slicing to Identify Duplication in Source Code</article-title>
          .
          <source>In SAS</source>
          , pages
          <volume>40</volume>
          {
          <fpage>56</fpage>
          . Springer,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>K.</given-names>
            <surname>Kontogiannis</surname>
          </string-name>
          , R. de Mori, E. Merlo,
          <string-name>
            <given-names>M.</given-names>
            <surname>Galler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Pattern Matching for Clone and Concept Detection</article-title>
          . ASE,
          <volume>3</volume>
          (
          <issue>1</issue>
          /2):
          <volume>77</volume>
          {
          <fpage>108</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. B. Lague,
          <string-name>
            <given-names>D.</given-names>
            <surname>Proulx</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mayrand</surname>
          </string-name>
          , E. Merlo, and
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Hudepohl</surname>
          </string-name>
          .
          <article-title>Assessing the Bene ts of Incorporating Function Clone Detection in a Development Process</article-title>
          .
          <source>In ICSM</source>
          , pages
          <volume>314</volume>
          {
          <fpage>321</fpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>K. J. Ottenstein</surname>
            and
            <given-names>L. M.</given-names>
          </string-name>
          <string-name>
            <surname>Ottenstein</surname>
          </string-name>
          .
          <article-title>The Program Dependence Graph in a Software Development Environment</article-title>
          .
          <source>In SDE</source>
          , pages
          <volume>177</volume>
          {
          <fpage>184</fpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>J. Qiu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          <string-name>
            <surname>Su</surname>
            , and
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Ma</surname>
          </string-name>
          .
          <article-title>Library Functions Identi cation in Binary Code by Using Graph Isomorphism Testings</article-title>
          . In Y.-G. Gueheneuc,
          <string-name>
            <given-names>B.</given-names>
            <surname>Adams</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <surname>A</surname>
          </string-name>
          . Serebrenik, editors,
          <source>SANER</source>
          , pages
          <volume>261</volume>
          {
          <fpage>270</fpage>
          . IEEE, Mar.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>D.</given-names>
            <surname>Rattan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Bhatia</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Singh</surname>
          </string-name>
          .
          <article-title>Software Clone Detection: A Systematic Review</article-title>
          .
          <source>Information &amp; Software Technology</source>
          ,
          <volume>55</volume>
          (
          <issue>7</issue>
          ):
          <volume>1165</volume>
          {
          <fpage>1199</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14. C. Rich and
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Waters</surname>
          </string-name>
          .
          <source>The Programmer's Apprentice. ACM</source>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>C. K. Roy</surname>
            ,
            <given-names>J. R.</given-names>
          </string-name>
          <string-name>
            <surname>Cordy</surname>
            , and
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Koschke</surname>
          </string-name>
          .
          <article-title>Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach</article-title>
          . SCP,
          <volume>74</volume>
          (
          <issue>7</issue>
          ):
          <volume>470</volume>
          {
          <fpage>495</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>C. K. Roy</surname>
            ,
            <given-names>M. F.</given-names>
          </string-name>
          <string-name>
            <surname>Zibran</surname>
            , and
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Koschke</surname>
          </string-name>
          .
          <article-title>The Vision of Software Clone Management: Past, Present and Future</article-title>
          . In S. Demeyer,
          <string-name>
            <given-names>D.</given-names>
            <surname>Binkley</surname>
          </string-name>
          , and F. Ricca, editors,
          <source>CSMR-WCRE</source>
          , pages
          <volume>18</volume>
          {
          <fpage>33</fpage>
          . IEEE,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>R.</given-names>
            <surname>Tairas</surname>
          </string-name>
          .
          <article-title>Clone Detection and Refactoring</article-title>
          .
          <source>In OOPSLA</source>
          , pages
          <volume>780</volume>
          {
          <fpage>781</fpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <given-names>S.</given-names>
            <surname>Thummalapenta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cerulo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Aversano</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Penta</surname>
          </string-name>
          .
          <article-title>An Empirical Study on the Maintenance of Source Code Clones</article-title>
          . EMSE,
          <volume>15</volume>
          (
          <issue>1</issue>
          ):1{
          <fpage>34</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <given-names>M.</given-names>
            <surname>Weiser</surname>
          </string-name>
          . Program Slicing.
          <source>IEEE TSE</source>
          ,
          <volume>10</volume>
          (
          <issue>4</issue>
          ):
          <volume>352</volume>
          {
          <fpage>357</fpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>V.</given-names>
            <surname>Zaytsev</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. H.</given-names>
            <surname>Bagge</surname>
          </string-name>
          .
          <article-title>Parsing in a Broad Sense</article-title>
          . In J. Dingel,
          <string-name>
            <given-names>W.</given-names>
            <surname>Schulte</surname>
          </string-name>
          , I. Ramos, S. Abraha~o, and E. Insfran, editors,
          <source>MoDELS</source>
          , volume
          <volume>8767</volume>
          <source>of LNCS</source>
          , pages
          <volume>50</volume>
          {
          <fpage>67</fpage>
          . Springer, Oct.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Implementing a PDG Library in Rascal</article-title>
          .
          <source>Master's thesis</source>
          , Universiteit van Amsterdam, The Netherlands, Sept.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>