<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Design Pattern Detection using Software Metrics and Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Aoyama Media Laboratory Tokyo</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. Computer Science and Engineering Waseda University Tokyo</institution>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>-The understandability, maintainability, and reusability of object-oriented programs could be improved by automatically detecting well-known design patterns in programs. Many existing detection techniques are based on static analysis and use strict conditions composed of class structure data. Hence, it is difficult for them to detect design patterns in which the class structures are similar. Moreover, it is difficult for them to deal with diversity in design pattern applications. We propose a design pattern detection technique using metrics and machine learning. Our technique judges candidates for the roles that compose the design patterns by using machine learning and measurements of metrics, and it detects design patterns by analyzing the relations between candidates. It suppresses false negatives and distinguishes patterns in which the class structures are similar. We conducted experiments that showed that our technique was more accurate than two previous techniques.</p>
      </abstract>
      <kwd-group>
        <kwd>component</kwd>
        <kwd>Object-oriented software</kwd>
        <kwd>Design pattern</kwd>
        <kwd>Software metrics</kwd>
        <kwd>Machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>INTRODUCTION</p>
      <p>
        Design patterns (hereafter, patterns) are defined as
descriptions of communicating classes that form a common
solution to a common design problem. Gang of Four (GoF)
patterns [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] are representative patterns for object-oriented
software. Patterns are composed of classes that describe the
roles and abilities of objects. For example, Figure 1 shows
one GoF pattern named the State pattern. This pattern
is composed of roles named Context, State, and
ConcreteState. The use of patterns enables software
development with high maintainability, high reusability, and
improved understandability, and it facilitates smooth
communications between developers.
      </p>
      <p>Programs implemented by a third party and open source
software may take a lot of time to understand, and patterns
may be applied without explicit class names, comments, or
attached documents in existing programs. Thus, pattern
detection improves the understandability of programs.
However, manually detecting patterns in existing programs
is inefficient, and patterns may be overlooked.</p>
      <p>Many studies on using automatic pattern detection to
solve the above problems have used static analysis. However,
static analysis has difficulty identifying patterns in which
class structures are similar and patterns with few features. In
addition, there is still a possibility that software developers
might overlook patterns if they use strict conditions like the
class structure analysis, and if the applied patterns vary from
the intended conditions even a little.</p>
      <p>
        We propose a pattern detection technique that uses
software metrics (hereafter, metrics) and machine learning.
Although our technique can be classified as a type of static
analysis, unlike previous detection techniques it detects
patterns by using identifying elements derived by machine
learning based on measurement of metrics without using
strict condition descriptions (class structural data, etc.). A
metric is a quantitative measure of a software property that
can be used to evaluate software development. For example,
one such metric, number of methods (NOM), refers to the
number of methods in a class [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Moreover, by using
machine learning, we can in some cases obtain previously
unknown identifying elements from combinations of metrics.
To cover a diverse range of pattern applications, our method
uses a variety of learning data because the results of our
technique may depend on the kind and number of learning
data used during the machine learning process. Finally, we
conducted experiments comparing our technique with two
previous techniques and found that our approach was the
most accurate of the three.
      </p>
      <p>II.</p>
      <p>PREVIOUS DESIGN PATTERN DETECTION TECHNIQUES</p>
      <p>AND THEIR PROBLEMS</p>
      <p>
        Most of the existing detection techniques use static
analysis [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ][
        <xref ref-type="bibr" rid="ref5">4</xref>
        ]. These techniques chiefly analyze information
such as class structures that satisfy certain conditions. If they
vary from the intended strict conditions even a little, or two
or more roles are assigned in a class, there is a possibility
that developers might overlook patterns.
      </p>
      <p>
        There is a technique that detects patterns based on the
degrees of similarity between graphs of pattern structure and
graphs of programs to be detected [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ]. However,
distinguishing the State pattern from the Strategy
pattern is difficult because their class structures are similar
(see Figure 1 and 2). Unlike this method, we distinguish the
patterns to which the structure is similar by the identification
of the roles from the quantity and the ratio of metrics by the
machine learning. In addition, this technique [
        <xref ref-type="bibr" rid="ref4">3</xref>
        ] is available
to the public as a web-based tool.
      </p>
      <p>
        There is a technique that outputs pattern candidates based
on features derived from metric measurements [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ]. However,
it requires manual confirmation; this technique can roughly
identify pattern candidates, but the final choice depends on
the developer's skill. Our technique detects patterns without
manual filtering by metrics and machine learning but also by
analyzing class structure information. Moreover, this
technique uses general metrics concerning an object-oriented
system without using metrics for each role. Our technique
uses metrics that specialize in each role.
      </p>
      <p>
        Another existing technique improves precision by
filtering detection results using machine learning. This
technique inputs measurements of the classes and methods of
each pattern [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ]. However, it uses the existing static
analytical approach, whereas our technique instead uses
machine learning throughout the entire process.
      </p>
      <p>
        One current technique analyzes programs both before and
after patterns are applied [
        <xref ref-type="bibr" rid="ref8">7</xref>
        ]. This method requires a revision
history of the programs used. Our technique detects patterns
by analyzing only the current programs.
      </p>
      <p>
        Yet another approach detects patterns from the class
structure and behavior of a system after classifying its
patterns [
        <xref ref-type="bibr" rid="ref9">8</xref>
        ][
        <xref ref-type="bibr" rid="ref10">9</xref>
        ]. It is difficult to use, however, when patterns
are applied more than once and when pattern application is
diverse. In contrast, our technique copes well with both of
these challenges.
      </p>
      <p>
        Other detection techniques use dynamic analysis. These
methods identify patterns by referring to the execution route
information of programs [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ][
        <xref ref-type="bibr" rid="ref12">11</xref>
        ]. However, it is difficult to
analyze the entire execution route and use fragmentary class
sets in an analysis. Additionally, the results of dynamic
analysis depend on the representativeness of the execution
sequences.
      </p>
      <p>
        Some detection techniques use a multilayered
(multiphase) approach [
        <xref ref-type="bibr" rid="ref13">12</xref>
        ][
        <xref ref-type="bibr" rid="ref14">13</xref>
        ]. Lucia et al. use a two-phase,
static analysis approach [
        <xref ref-type="bibr" rid="ref13">12</xref>
        ]. This method has difficulty,
however, in detecting creational and behavioral patterns
because it analyzes pattern structures and source code level
conditions. Guéhéneuc et al. use “DeMIMA,” an approach
that consists of three layers: two layers to recover an abstract
model of the source code, including binary class
relationships, and a third layer to identify patterns in the
abstract model. However, distinguishing the State pattern
from the Strategy pattern is difficult because their
structures are identical. Our technique can detect patterns in
all categories and distinguish the State pattern from the
Strategy pattern using metrics and machine learning.
      </p>
      <p>
        Finally, one existing technique detects patterns using
formal OWL (Web Ontology Language) definitions [
        <xref ref-type="bibr" rid="ref15">14</xref>
        ].
However, false negatives arise because this technique does
not accommodate the diversity in pattern applications. The
technique [
        <xref ref-type="bibr" rid="ref15">14</xref>
        ] is available to the public via the web as an
Eclipse plug-in.
      </p>
      <p>
        We suppress false negatives by using metrics and
machine learning to accommodate diverse pattern
applications and to distinguish patterns in which the class
structures are similar. It should be noted that only techniques
[
        <xref ref-type="bibr" rid="ref4">3</xref>
        ], [
        <xref ref-type="bibr" rid="ref15">14</xref>
        ] discussed above have been released as publicly
accessible tools.
      </p>
      <p>III.</p>
    </sec>
    <sec id="sec-2">
      <title>OUR TECHNIQUE</title>
      <p>Our technique is composed of a learning phase and a
detection phase. The learning phase is composed of three
processes, and the detection phase is composed of two
processes, as shown in Figure 3. Each process is described
below, with pattern specialists and developers included as
the parties concerned. Pattern specialists mean persons that
have the knowledge about the patterns. Developers mean
persons that maintain the object-oriented software. Our
technique currently uses Java as the program language.
[Learning Phase]</p>
      <sec id="sec-2-1">
        <title>P1. Define Patterns</title>
        <p>Pattern specialists determine the detectable patterns and
define the structures and roles composing these patterns.</p>
      </sec>
      <sec id="sec-2-2">
        <title>P2. Decide Metrics</title>
        <p>Pattern specialists determine useful metrics to judge the
roles defined in P1 by using the Goal Question Metric
decision technique.</p>
      </sec>
      <sec id="sec-2-3">
        <title>P3. Machine Learning</title>
        <p>Pattern specialists input programs applied patterns into
the metrics measurement system, and obtain measurements
for each role. And specialists input these measurements into
the machine learning simulator to learn. After machine
learning they verify the judgment for each role, and if
the verification results are not good, they return to P2 and
revise the metrics.
[Detection Phase]</p>
      </sec>
      <sec id="sec-2-4">
        <title>P4. Role Candidate Judgment</title>
        <p>Developers input programs to be detected into the
metrics measurement system, and obtain measurements for
each class. And developers input these measurements into
the machine learning simulator. Machine learning simulator
identifies role candidates.</p>
      </sec>
      <sec id="sec-2-5">
        <title>P5. Pattern Detection</title>
        <p>Developers input role candidates judged in P4 to the
pattern detection system by using the pattern structure
definitions defined in P1. This system detects patterns
automatically. The structure definitions correspond to the
letters P, R, and E of section III-B.</p>
        <p>A.</p>
      </sec>
      <sec id="sec-2-6">
        <title>Learning Phase</title>
      </sec>
      <sec id="sec-2-7">
        <title>P1. Define Patterns</title>
        <p>Currently, our technique considers five GoF patterns
(Singleton, TemplateMethod, Adapter, State,
and Strategy) and 12 roles. The GoF patterns are grouped
into creational patterns, structural patterns, and behavioral
patterns. Our technique uses these patterns to cover all these
groups.</p>
      </sec>
      <sec id="sec-2-8">
        <title>P2. Decide Metrics</title>
        <p>
          Pattern specialists decide on useful metrics to judge roles
by using the Goal Question Metric decision technique [
          <xref ref-type="bibr" rid="ref15">14</xref>
          ]
(hereafter, GQM). GQM is a top-down approach used to
clarify relations between goals and metrics.
        </p>
        <p>We experimented with judging roles by using general
metrics without GQM. However, the machine learning
results were unsatisfactory because the measurements of
some metrics were irregular. Consequently, we chose GQM
so that the machine learning could function appropriately by
stable metrics in each role. With our technique, the pattern
specialists set as a goal the accurate judgment of each role.
To achieve this goal they defined a set of questions to be
evaluated. Next, they decided on useful metrics to help
answer the questions they had established. The pattern
specialists decide metrics to identify roles by the quantity
and the ratio of measurements. Therefore, they decide
questions by paying attention to the attributes and operations
of the roles by reading the description of the pattern
definition. They decide simple metrics concerning the static
aspect like structure first to improve the recall. However, the
lack of questions might occur because GQM is qualitative.
Therefore, if the machine learning results were unsatisfactory
by irregular measurements of metrics, the procedure loops
back to P2 to reconsider metrics also concerning behavior.
Moreover, it will be possible to apply GQM to roles with
new patterns in the future.</p>
        <p>For example, Figure 4 illustrates the goal of making a
judgment about the AbstractClass role in the
TemplateMethod pattern. AbstractClass roles have
abstract methods or methods using written logic that use
abstract methods as shown in Figure 5. The
AbstractClass role can be distinguished by the ratio of
ConcreteClass</p>
        <p>Target
the number of methods to the number of abstract methods
because with this role the former exceeds the latter.
Therefore, the number of abstract methods (NOAM) and
number of methods (NOM) are useful metrics for judging
this role. Table I shows the results of applying GQM to all
roles. The details of metrics are described in Table II of
paragraph IV-A.</p>
        <p>
          The previous technique [
          <xref ref-type="bibr" rid="ref6">5</xref>
          ] uses GQM, too. In this
technique, the goal is set as “Recover design patterns”. And
this technique uses general metrics concerning an
objectoriented system without deciding metrics at each role. On the
other hand, our technique uses metrics that specialize in each
role.
        </p>
        <p>P3.</p>
      </sec>
      <sec id="sec-2-9">
        <title>Machine Learning</title>
        <p>Machine learning is a technique that analyzes sample
data by computer and acquires useful rules with which to
make forecasts about unknown data. We used the machine
learning so as to be able to evaluate patterns with a variety of
application forms. Machine learning suppresses false
negatives and achieves extensive detection.</p>
        <p>
          Our technique uses a neural network [
          <xref ref-type="bibr" rid="ref17">16</xref>
          ] algorithm. A
support vector machine [
          <xref ref-type="bibr" rid="ref17">16</xref>
          ] could also be used to distinguish
a pattern of two groups by using linear input elements.
However, we chose a neural network because it outputs the
values to all roles, taking into consideration the dependency
among the different metrics. Therefore, it can deal with cases
in which one class plays two or more roles.
        </p>
        <p>A neural network is composed of an input layer, hidden
layers, and an output layer, as shown in Figure 6, and each
layer is composed of elements called units. Values are given
a weight when they move from unit to unit, and a judgment
rule is acquired by changing the weights. A typical algorithm
for adjusting weights is back propagation. Back propagation
calculates an error margin between output result y and the
correct answer T, and it sequentially adjusts weights from the
layer nearest the output to the input layer, as shown in Figure
7. These weights are adjusted until the output error margin of
the network reaches a certain value.</p>
        <p>
          Our technique uses a hierarchical neural network
simulator [
          <xref ref-type="bibr" rid="ref18">17</xref>
          ]. This simulator uses back propagation. The
hierarchy number in the neural network is set to three, the
number of units in the input layer and the hidden layer are set
to the number of decided metrics, and the number of units of
the output layer is set to the number of roles being judged.
The input consists of the metric measurements of each role in
a program to which patterns have already been applied, and
the output is an expected role. Pattern specialists obtain
measurements for each role by using metrics measurement
system. And, specialists input these measurements into the
machine learning simulator to learn. The learning repetitions
cease when the error margin curve of the simulator
converges. The specialists verify the convergence of the
error margin curve manually at present. After machine
learning they verify the judgment for each role, and if the
verification results are not good, they return to P2 and revise
the metrics.
        </p>
      </sec>
      <sec id="sec-2-10">
        <title>Detection Phase</title>
      </sec>
      <sec id="sec-2-11">
        <title>P4. Role Candidate Judgment</title>
        <p>Developers input programs to be detected into the
metrics measurement system, and obtain measurements for
each class. And developers input these measurements to the
machine learning simulator. This simulator outputs values
between 0–1 to all roles to be judged. The output values are
normalized such that the sum of all values becomes 1. These
output values are called role agreement values. A larger role
agreement value means that the role candidate is more likely
correct. The reciprocal of the number of roles to be detected
is set as a threshold, and the role agreement values that are
higher than the threshold are taken to be role candidates. The
threshold is 1/12 (i.e., 0.0834) because we treat 12 roles at
present. The sum of the output values is different at each
input in the neural net work. Therefore, to use a common
threshold for all the output, our technique normalizes the
output value.</p>
        <p>For example, Figure 8 shows the role candidate judgment
results with NOM of 3 and NOAM of 2 and other metrics of
0; the output value of AbstractClass is the highest value.
By regularizing the values of Figure 8, the roles are judged to
be AbstractClass and Target.</p>
      </sec>
      <sec id="sec-2-12">
        <title>P5. Pattern Detection</title>
        <p>Developers input role candidates judged in P4 into the
pattern detection system by using the pattern structure
definitions defined in P1. And, this system detects patterns
by matching the direction of the relations between role
candidates and the roles of pattern in programs. The
matching moves sequentially from the role candidate with
the highest agreement value to that with the lowest value.
The pattern detection system searches all combinations of
role candidates that accord with the pattern structures. The
pattern detection system detects patterns when the directions
of relations between role candidates accord with the pattern
structure and when the role candidates accord with roles at
both ends of the relations. Moreover, the relation agreement
values reflect the kind of relation.</p>
        <p>Currently, our method deals with inheritance, interface
implementation, and aggregation relations. The kind of
relations will increase as more patterns get added in the
future. The relation agreement value is 1.0 when the kind
agrees with the relation of the pattern, and it is 0.5 when the
kind does not agree. If the relation agreement value is 0 then
the kind of relation does not agree, the pattern agreement
value might become 0, and these classes will not be detected
as patterns. In such cases, a problem similar to those of the
previous detection techniques will occur because the
difference in the kind of relation is not recognized.</p>
        <p>The pattern agreement value is calculated from the role
agreement values and the relation agreement values. The
pattern to be detected is denoted as P, the role set that
composes the pattern is denoted as R, and the relation set is
denoted as E. Moreover, the program that is the target of
detection is defined as P’, the set of classes comprising the
role candidates is R’, and the set of relations between
elements of R' is denoted as E’. The role agreement value is
denoted as Role, and the relation agreement is denoted as Rel.
Role means the function which is input the element of R and
the one of R' , and . Rel means the function which is input the
element of E and the one of E'. The product of the average of
two roles at both ends of the relation and Rel is denoted as
Com, and the average of Com is denoted as Pat. Moreover,
the average of two Roles is calculated when Com is
calculated, and the average value of Com is calculated to
adjust Pat and Role to values from 0 to 1 when Pat is
calculated. If the directions of the relations do not agree, Rel
is assumed to be 0.</p>
        <sec id="sec-2-12-1">
          <title>Role(rm , rn)  The role agreement value</title>
        </sec>
        <sec id="sec-2-12-2">
          <title>R el(ep , eq )  The relation agreement value</title>
          <p>rm  R, rn  R
e p  E, eq  E
Com(ep , eq )  Role(ra , rb)  Role(rc , rd )  R el(ep , eq )</p>
          <p>2
　　　　　　　　　　ra , rc  R, rb, rd  R, ep  (ra , rc ), ep  (rb, rd )
1
Pat(P, P) 

(e p , eq )  E  E R el(e p , eq )  0
</p>
          <p>Figure 9 shows an example of detecting the
TemplateMethod pattern. In this example, it is assumed
that class SampleA has the highest role agreement value for
an AbstractClass. The pattern agreement value between
the program Samples and the TemplateMethod pattern is
calculated with the following algorithm.
epE,eqE Com(e p , eq )
P  TemplateMethod  (R, E)　
R  {AbstractClass, ConcreteClass}　
E  {AbstractClass　 ConcreteClass}
Samples  (R, E)
R  {SampleA, SampleB, SampleC}
E  {SampleA  SampleB, SampleA ◇SampleC}
(   : inheritance, ◇ : aggregation)
Role( AbstractClass, SampleA)  0.82　Role(ConcreteClass, SampleB)  0.45　
Role(ConcreteClass, SampleC)  0.57
R el( AbstractClass　 ConcreteClass, SampleA  SampleB)  1.0　　
R el( AbstractClass　 ConcreteClass, SampleA ◇SampleC)  0.5</p>
        </sec>
      </sec>
      <sec id="sec-2-13">
        <title>Com(AbstractClass　 ConcreteClass, SampleA  SampleB)  0.82  0.45 1.0  0.635</title>
        <p>2</p>
      </sec>
      <sec id="sec-2-14">
        <title>Com(AbstractClass　 ConcreteClass, SampleA ◇SampleC)  0.82  0.57  0.5  0.348</title>
        <p>2
Pat(TemplateMethod , Samples)  1  0.635  0.348  0.492</p>
        <p>2</p>
        <p>In the program shown in Figure 9, the pattern agreement
value of the TemplateMethod pattern was calculated to
be 0.492. Pattern agreement values are normalized from 0 to
1, just like role agreement values. Our technique uses the
same threshold among of pattern agreement value as role
agreement value because a lot of classes are detected as the
pattern that composed of the only class like the singleton
pattern. Classes with a pattern agreement value that exceeds
the threshold are output as the detection result. The
reciprocal of the number of roles for detection is taken to be
the threshold, similar to the case of role candidate judgment,
and pattern agreement values that are higher than the
threshold are output as the detection result.</p>
        <p>In Figure 9, SampleA, SampleB, and SampleC were
detected as TemplateMethod patterns. Moreover,
SampleA and SampleB, SampleA and SampleC can
also be considered to match the TemplateMethod
pattern. In this case, the relation of
“SampleA   SampleB” is more similar to a
TemplateMethod pattern than the relation of
“SampleA ◇ SampleC” because its agreement value of
the former pair is 0.635 while that of the latter pair is only
0.348.</p>
        <p>IV.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>EVALUATION AND DISCUSSION</title>
      <p>We determined whether the machine learning simulator
derived identifying elements of the roles after learning.
Moreover, we compared our technique with two previous
techniques to verify the precision and recall of our approach
and to confirm whether it could match its detected patterns
with similar structures and diverse patterns.</p>
      <p>A.</p>
      <sec id="sec-3-1">
        <title>Verification of Role Candidate Judgement</title>
        <p>
          We used cross-validation to verify the role candidate
judgment. In cross-validation, data are divided into n groups,
and a test to verify a role candidate judgment is executed
such that the testing data are one data group and the learning
data are n-1 data groups. We executed the test five times by
dividing the data into five groups. In this paper, programs
such as programs in the reference [
          <xref ref-type="bibr" rid="ref19">18</xref>
          ], etc., are called small
scale, whereas programs in practical use are called large
scale. We used the set of programs where patterns are
applied in small-scale programs (60 in total) 1 [
          <xref ref-type="bibr" rid="ref19">18</xref>
          ][
          <xref ref-type="bibr" rid="ref20">19</xref>
          ] and
large-scale programs (158 in total from the Java library
1.6.0_13 [20], JUnit 4.5 [21], and Spring Framework 2.5
RC2 [22]) as experimental data. We judged manually and
qualitatively whether the patterns were appropriately applied
in this set of programs.
        </p>
        <p>Table II shows the metrics that were chosen for the
small-scale and large-scale programs. We used different
metrics depending on the magnitude of the programs. For
instance, we chose the metric called number of methods
generating instance (NMGI) for small-scale programs
because the method for transit states in the
ConcreteState role in the State pattern generates
other ConcreteState roles in small-scale programs. We
guessed that the difference appeared in ratios of metrics
about State and Strategy, so we used the same metrics for the
large-scale programs without NMGI. Because State pattern
treats the states in State role and treats the actions of the
states in the Context role. On the other hand Strategy pattern
encapsulates the processing of each algorithm into a Strategy
role, and Context processing becomes simpler compared
with that of State pattern.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>1 All small-scale code:</title>
      <p>http://www.washi.cs.waseda.ac.jp/ja/paper/uchiyama/dp.html</p>
      <p>We focused our attention on recall because the purpose
of our technique was detection covering diverse pattern
applications. Recall indicates the degree to which detection
results are free of leakage, whereas precision shows how free
of disagreement these result are. The data in Table III was
used to calculate recall. wr, xr, yr, and zr are numbers of roles,
and wp, xp, yp, and zp are numbers of patterns. Recall was
calculated from the data in Table III by the following
expressions.</p>
      <p>Recall of role candidate judgment: Rer  wr
wr  xr</p>
      <p>Table IV shows the average recall for each role. Role
candidates must be judged accurately because the State
pattern and Strategy pattern have the same class structure.
Therefore, our technique regards the roles of the patterns
other than State and Strategy patterns as role
candidates when the role agreement value was above the
threshold, whereas our technique regards the roles of State
and Strategy patterns as role candidates when the role
agreement value was above the threshold and the roles of
both patterns were distinguished State pattern from Strategy
pattern.</p>
      <p>As shown in Table IV, the recalls for the large-scale
programs were lower than those for the small-scale programs.
Accurate judgment of large-scale programs was more
difficult because these programs possessed attributes and
operations that were unnecessary for pattern composition.
Therefore, it will be necessary to collect a significant amount
of learning data to adequately cover a variety of large-scale
programs.</p>
      <p>The results in Table IV pertain to instances where the
State and Strategy patterns could be distinguished. The
Context role had high recall, but State and
ConcreteState roles had especially low recalls for
largescale programs. However, the candidates for the State role
were output with high recall when the threshold was
exceeded. Therefore, the State pattern can be distinguished
by initiating searching from the Context role in P5, and
this improves recall.</p>
      <sec id="sec-4-1">
        <title>Pattern Detection Results</title>
        <p>Our technique detects patterns using test data in both the
small-scale and large-scale programs, and this result is
evaluated. We used 40 sets of programs where patterns are
applied in small-scale programs and 126 sets of programs
where patterns are applied in large-scale programs as
learning data. We judged manually and qualitatively whether
the patterns were appropriately applied in the detection
results. Table V shows precision and recall of the detected
patterns. Precision and recall were calculated from the data
in Table III by the following expressions:</p>
        <p>Recall of pattern detection : Re p  wp
wp  x p
Precision of pattern detection : Prp  wp
wp  y p</p>
        <p>Small-scale and large-scale programs shared a common
point in that they both had recalls that were higher than
precisions. However, there were many non-agreements about
the State patterns and Strategy patterns in the
largescale programs. Recall was 90% or more with the
smallscale programs, but it dropped as low as 60% with the
largescale programs.</p>
        <p>The large-scale programs resulted in especially low recall
for the Adapter pattern. Table IV shows the cause: The
recall of the role candidate judgment for the Adapter
pattern was not high enough. It is necessary to show that all
roles that compose patterns have agreement values that are
above the threshold so that patterns will be detected. There
were many cases in which neither of the roles that composed
patterns was judged as a role candidate for the Adapter
pattern. It will be necessary to return to P2 and choose new
metrics. The State pattern was identified by searching
from the Context role, for instance, in the State pattern
detection in the large-scale programs, and the recall of the
pattern detection was higher than the recall of role candidate
judgment. Table V shows holistically that our technique
suppresses false negatives because the recall is high.</p>
      </sec>
      <sec id="sec-4-2">
        <title>C. Experiment Comparing Previous Detection Techniques</title>
        <p>
          We experimentally compared our technique with
previous detection techniques [
          <xref ref-type="bibr" rid="ref4">3</xref>
          ][
          <xref ref-type="bibr" rid="ref15">14</xref>
          ]. These previous
techniques have been publicly released, and they consider
three or more patterns addressed by our own technique. Both
target Java programs, as does our own. The technique
proposed by Tsantails’s technique [
          <xref ref-type="bibr" rid="ref4">3</xref>
          ](hereafter, TSAN) has
four patterns in common with ours (Singleton,
TemplateMethod, Adapter and State/Strategy).
Because this technique cannot distinguish the State pattern
from the Strategy pattern, these are detected as one
pattern. Dietrich’s technique [
          <xref ref-type="bibr" rid="ref15">14</xref>
          ] (hereafter, DIET) has three
patterns in common (Singleton, TemplateMethod,
Adapter) with our own. TSAN detects patterns based on
the degree of similarity between the graphs of the pattern
structure and graphs of the programs to be detected, whereas
DIET detects patterns by using formal OWL (Web Ontology
Language) definitions. Patterns were detected and evaluated
with the small-scale and large-scale test data. Moreover, the
test data and learning data were different.
        </p>
        <p>Figure 10 shows the recall and precision graphs for our
technique and TSAN, and Figure 11 shows the
corresponding graphs for our technique and DIET. We
ranked the detection results of our technique with the pattern
agreement values. Next, we calculated recall and precision
according to the ranking and plotted them. Recall and
precision were calculated from the data in Table III by using
the expressions of paragraph IV -B. In the results of TSAN
and DIET, we alternately plotted results because these
previous detection techniques output no value to rank. In the
recall and precision graphs higher values are better.</p>
        <p>Figure 10 and 11 show particularly good results for all
techniques when small-scale programs was examined. This
is because small-scale programs do not include unnecessary
attributes and operations in the composition of patterns.</p>
        <p>Table VI and VII show the average F measure for each
plot of Figure 10 and 11. The F measure is calculated with
recall and precision calculated by the above-mentioned
expression as follows.</p>
        <p>F  measure 
2  Prp  Re p</p>
        <p>Prp  Re p</p>
        <p>A large F measure means higher accuracy, and these
tables show that our technique had a larger F measure than
the previous techniques had.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Distinction between State pattern and Strategy pattern</title>
        <p>Our technique distinguished State pattern Strategy
pattern. Table VIII is an excerpt of the metrics measurements
for the Context role in State pattern and Strategy
pattern that were distinguished by the experiment on the
large-scale programs. State pattern treats the states in
State role and treats the actions of the states in the
Context role. Strategy pattern encapsulates the
processing of each algorithm into a Strategy role, and
Context processing becomes simpler compared with that
of State pattern. Table VIII shows 45 fields and 204
methods as the largest in Context role in State pattern
(18 and 31 respectively in Context role of Strategy
pattern). Therefore, the complexity of Context role of both
patterns appears in the number of fields and the number of
methods, and these are distinguishing elements. Figure 10
shows that our technique is especially good because State
pattern and Strategy pattern could not be distinguished
with TSAN.</p>
      </sec>
      <sec id="sec-4-4">
        <title>Detection of Subspecies of Patterns</title>
        <p>Figure 11 shows that the recall of DIET is low in the case
of large-scale programs because this technique doesn't
accommodate the diversity in pattern applications.
Additionally, large-scale programs not only contain many
attributes and operations in the composition of patterns but
also subspecies of patterns.</p>
        <p>Our technique detected subspecies of patterns. For
example, our technique detected the source code of the
Singleton pattern that used the boolean variable as shown
in Figure 12. This Singleton pattern was not detected in
TSAN or DIET. However, unlike the previous techniques,
our technique is affected by false positives because it is a
gradual detection using metrics and machine learning instead
of strict conditions. False positives of the Singleton
pattern especially stood out because Singleton pattern is
composed of only one role. It will be necessary to use
metrics that are specialized to one or a few roles to make
judgments about patterns composed of one role like the
Singleton pattern (P4).</p>
        <p>Therefore, our technique is superior to previous one
because the curve of our technique is above the previous in
Figures 10 and 11.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSION AND FUTURE WORK</title>
      <p>We devised a pattern detection technique using metrics
and machine learning. Role candidates are judged using
machine learning that relies on measured metrics, and
patterns are detected from the relations between classes. We
worked on the problems associated with overlooking patterns
and distinguishing patterns in which the class structures are
similar.</p>
      <p>We demonstrated that our technique was superior to two
previous detection techniques by experimentally
distinguishing patterns in which the class structures are
similar. Moreover, subspecies of patterns were detected, so
we could deal with a very diverse set of pattern applications.
However, our technique was more susceptible to false
positives because it does not use strict conditions such as
those used by the previous techniques.</p>
      <p>We have several goals for our future work. First, we plan
to add more patterns to be detected. Our technique can
currently cope with five patterns. However, we predict it will
be possible to detect other patterns if we can decide upon
metrics to identify them. It is also necessary to collect more
learning data to cover the diversity in pattern applications.
Moreover, we plan to more narrowly adapt the metrics to
each role by returning to step P2 because results might
depend on the data. This process would lead to the
enhancement of recall and precision.</p>
      <p>Second, we currently qualitatively and manually judge
whether to return to step P2 and to apply GQM again; hence,
in the future, we should find an appropriate automatic
judgment method.</p>
      <p>Third, we plan to prove the validity of the expressions
and the parameters of agreement values and thresholds. We
consider that it is possible to reduce the false positive rate by
deciding on the optimum thresholds for role agreement
values and pattern agreement values.</p>
      <p>Finally, we plan to determine the learning number of
times automatically and examine the correlation of the
learning number of times and precision.
[21] JUnit.org. Resources for Test Driven Development.</p>
      <p>http://www.junit.org/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Gamma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Helm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Johnson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Vlissides</surname>
          </string-name>
          . Design Patterns:
          <article-title>Elements of Reusable Object-Oriented Software</article-title>
          .
          <source>Addison-Wesley</source>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. Kidd</given-names>
            <surname>Object-Oriented Software</surname>
          </string-name>
          Metrics.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Prentice</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tsantalis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chatzigeorgiou</surname>
          </string-name>
          , G. Stephanides, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Halkidis</surname>
          </string-name>
          .
          <article-title>Design Pattern Detection Using Similarity Scoring</article-title>
          .
          <source>IEEE Trans. Software Engineering</source>
          , Vol.
          <volume>32</volume>
          , No.
          <volume>11</volume>
          , pp.
          <fpage>896</fpage>
          -
          <lpage>909</lpage>
          2006.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Blewitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bundy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Stark</surname>
          </string-name>
          .
          <article-title>Automatic Verification of Design Patterns in Java</article-title>
          .
          <source>In Proceedings of the 20th International Conference on Automated Software Engineering</source>
          , pp.
          <fpage>224</fpage>
          -
          <lpage>232</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kim</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Boldyreff</surname>
          </string-name>
          .
          <article-title>A Method to Recover Design Patterns Using Software Product Metrics</article-title>
          .
          <source>In Proceedings of the 6th International Conference on Software Reuse: Advances in Software Reusability</source>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>335</lpage>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ferenc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Beszedes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fulop</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Lele</surname>
          </string-name>
          .
          <article-title>Design Pattern Mining Enhanced by Machine Learning</article-title>
          .
          <source>21st IEEE International Conference on Software Maintenance</source>
          , pp.
          <fpage>295</fpage>
          -
          <lpage>304</lpage>
          2005.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Washizaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Fukaya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kubo</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fukazawa. Detecting Design Patterns Using</surname>
          </string-name>
          <article-title>Source Code of Before Applying Design Patterns</article-title>
          .
          <source>8th IEEE/ACIS International Conference on Computer and Information Science</source>
          , pp.
          <fpage>933</fpage>
          -
          <lpage>938</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Shi</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Olsson</surname>
          </string-name>
          .
          <article-title>Reverse Engineering of Design Patterns from Java Source Code</article-title>
          .
          <source>21st IEEE/ACM International Conference on Automated Software Engineering</source>
          , pp.
          <fpage>123</fpage>
          -
          <lpage>134</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Youn</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Automatic Detection of Design Pattern for Reverse Engineering</article-title>
          .
          <source>5th ACIS International Conference on Software Engineering Research, Management and Applications</source>
          , pp.
          <fpage>577</fpage>
          -
          <lpage>583</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wendehals</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Orso</surname>
          </string-name>
          .
          <article-title>Recognizing Behavioral Patterns at Runtime Using Finite Automata</article-title>
          .
          <source>In 4th ICSE 2006 Workshop on Dynamic Analysis</source>
          , pp.
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Katada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sakamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Kobayashi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Saeki</surname>
          </string-name>
          .
          <article-title>Design Pattern Detection by Using Meta Patterns</article-title>
          .
          <source>IEICE Transactions</source>
          , Vol. 91-D, No.
          <issue>4</issue>
          , pp.
          <fpage>933</fpage>
          -
          <lpage>944</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Lucia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Deufemia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gravino</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Risi</surname>
          </string-name>
          .
          <article-title>Design pattern recovery through visual language parsing and source code analysis</article-title>
          .
          <source>Journal of Systems and Software</source>
          , Vol.
          <volume>82</volume>
          (
          <issue>7</issue>
          ), pp.
          <fpage>1177</fpage>
          -
          <lpage>1193</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guéhéneuc</surname>
          </string-name>
          and
          <string-name>
            <surname>G. Antoniol.</surname>
          </string-name>
          <article-title>DeMIMA: A Multilayered Approach for Design Pattern Identification</article-title>
          .
          <source>IEEE Trans. Software Engineering</source>
          . Vol.
          <volume>34</volume>
          , No.
          <issue>5</issue>
          , pp.
          <fpage>667</fpage>
          -
          <lpage>684</lpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Dietrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Elgar</surname>
          </string-name>
          .
          <article-title>Towards a Web of Patterns</article-title>
          .
          <source>In Proceedings of First International Workshop Semantic Web Enabled Software Engineering</source>
          , pp.
          <fpage>117</fpage>
          -
          <lpage>132</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V. R.</given-names>
            <surname>Basili</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.M.</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>A Methodology for Collecting Valid Software Engineering Data</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          , Vol.
          <volume>10</volume>
          , No.
          <issue>6</issue>
          , pp.
          <fpage>728</fpage>
          -
          <lpage>738</lpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Segaran. Programming Collective Intelligence. O'Reilly</surname>
          </string-name>
          .
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Hirano</surname>
          </string-name>
          .
          <article-title>Neural network implemented with C++ and</article-title>
          <string-name>
            <given-names>Java. Personal</given-names>
            <surname>Media</surname>
          </string-name>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>H.</given-names>
            <surname>Yuki</surname>
          </string-name>
          .
          <article-title>An introduction to design pattern to study by Java</article-title>
          . http://www.hyuki.com/dp/
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Tanaka</surname>
          </string-name>
          . Hello World with Java! http://www.hellohiro.com/pattern/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>