Design Pattern Detection using Software Metrics and Machine Learning


                   Satoru Uchiyama                                                       Atsuto Kubo
                  Hironori Washizaki                                              Aoyama Media Laboratory
                  Yoshiaki Fukazawa                                                    Tokyo, Japan
       Dept. Computer Science and Engineering                                         kubo@nii.ac.jp
                 Waseda University
                    Tokyo, Japan
          s.uchiyama1104@toki.waseda.jp
               washizaki@waseda.jp
                fukazawa@waseda.jp


Abstract—The       understandability,     maintainability,  and        Many studies on using automatic pattern detection to
reusability of object-oriented programs could be improved by       solve the above problems have used static analysis. However,
automatically detecting well-known design patterns in              static analysis has difficulty identifying patterns in which
programs. Many existing detection techniques are based on          class structures are similar and patterns with few features. In
static analysis and use strict conditions composed of class        addition, there is still a possibility that software developers
structure data. Hence, it is difficult for them to detect design   might overlook patterns if they use strict conditions like the
patterns in which the class structures are similar. Moreover, it   class structure analysis, and if the applied patterns vary from
is difficult for them to deal with diversity in design pattern     the intended conditions even a little.
applications. We propose a design pattern detection technique
                                                                       We propose a pattern detection technique that uses
using metrics and machine learning. Our technique judges
                                                                   software metrics (hereafter, metrics) and machine learning.
candidates for the roles that compose the design patterns by
using machine learning and measurements of metrics, and it
                                                                   Although our technique can be classified as a type of static
detects design patterns by analyzing the relations between         analysis, unlike previous detection techniques it detects
candidates. It suppresses false negatives and distinguishes        patterns by using identifying elements derived by machine
patterns in which the class structures are similar. We             learning based on measurement of metrics without using
conducted experiments that showed that our technique was           strict condition descriptions (class structural data, etc.). A
more accurate than two previous techniques.                        metric is a quantitative measure of a software property that
                                                                   can be used to evaluate software development. For example,
    Keywords—component; Object-oriented software, Design           one such metric, number of methods (NOM), refers to the
pattern, Software metrics, Machine learning                        number of methods in a class [2]. Moreover, by using
                                                                   machine learning, we can in some cases obtain previously
                       I. INTRODUCTION                             unknown identifying elements from combinations of metrics.
                                                                   To cover a diverse range of pattern applications, our method
     Design patterns (hereafter, patterns) are defined as          uses a variety of learning data because the results of our
descriptions of communicating classes that form a common           technique may depend on the kind and number of learning
solution to a common design problem. Gang of Four (GoF)            data used during the machine learning process. Finally, we
patterns [1] are representative patterns for object-oriented       conducted experiments comparing our technique with two
software. Patterns are composed of classes that describe the       previous techniques and found that our approach was the
roles and abilities of objects. For example, Figure 1 shows        most accurate of the three.
one GoF pattern named the State pattern. This pattern
is composed of roles named Context, State, and                      II.    PREVIOUS DESIGN PATTERN DETECTION TECHNIQUES
ConcreteState. The use of patterns enables software                                    AND THEIR PROBLEMS
development with high maintainability, high reusability, and           Most of the existing detection techniques use static
improved understandability, and it facilitates smooth              analysis [3][4]. These techniques chiefly analyze information
communications between developers.                                 such as class structures that satisfy certain conditions. If they
     Programs implemented by a third party and open source         vary from the intended strict conditions even a little, or two
software may take a lot of time to understand, and patterns        or more roles are assigned in a class, there is a possibility
may be applied without explicit class names, comments, or          that developers might overlook patterns.
attached documents in existing programs. Thus, pattern                 There is a technique that detects patterns based on the
detection improves the understandability of programs.              degrees of similarity between graphs of pattern structure and
However, manually detecting patterns in existing programs          graphs of programs to be detected [3]. However,
is inefficient, and patterns may be overlooked.
                                                                       Other detection techniques use dynamic analysis. These
                                                                   methods identify patterns by referring to the execution route
                                                                   information of programs [10][11]. However, it is difficult to
                                                                   analyze the entire execution route and use fragmentary class
                                                                   sets in an analysis. Additionally, the results of dynamic
                                                                   analysis depend on the representativeness of the execution
                                                                   sequences.
                                                                       Some detection techniques use a multilayered
                                                                   (multiphase) approach [12][13]. Lucia et al. use a two-phase,
                                                                   static analysis approach [12]. This method has difficulty,
                     Figure 1. State pattern
                                                                   however, in detecting creational and behavioral patterns
                                                                   because it analyzes pattern structures and source code level
                                                                   conditions. Guéhéneuc et al. use “DeMIMA,” an approach
                                                                   that consists of three layers: two layers to recover an abstract
                                                                   model of the source code, including binary class
                                                                   relationships, and a third layer to identify patterns in the
                                                                   abstract model. However, distinguishing the State pattern
                                                                   from the Strategy pattern is difficult because their
                                                                   structures are identical. Our technique can detect patterns in
                                                                   all categories and distinguish the State pattern from the
                                                                   Strategy pattern using metrics and machine learning.
                                                                       Finally, one existing technique detects patterns using
                                                                   formal OWL (Web Ontology Language) definitions [14].
                    Figure 2. Strategy pattern                     However, false negatives arise because this technique does
                                                                   not accommodate the diversity in pattern applications. The
distinguishing the State pattern from the Strategy                 technique [14] is available to the public via the web as an
pattern is difficult because their class structures are similar    Eclipse plug-in.
(see Figure 1 and 2). Unlike this method, we distinguish the           We suppress false negatives by using metrics and
patterns to which the structure is similar by the identification   machine learning to accommodate diverse pattern
of the roles from the quantity and the ratio of metrics by the     applications and to distinguish patterns in which the class
machine learning. In addition, this technique [3] is available     structures are similar. It should be noted that only techniques
to the public as a web-based tool.                                 [3], [14] discussed above have been released as publicly
     There is a technique that outputs pattern candidates based    accessible tools.
on features derived from metric measurements [5]. However,
it requires manual confirmation; this technique can roughly                            III. OUR TECHNIQUE
identify pattern candidates, but the final choice depends on           Our technique is composed of a learning phase and a
the developer's skill. Our technique detects patterns without      detection phase. The learning phase is composed of three
manual filtering by metrics and machine learning but also by       processes, and the detection phase is composed of two
analyzing class structure information. Moreover, this              processes, as shown in Figure 3. Each process is described
technique uses general metrics concerning an object-oriented       below, with pattern specialists and developers included as
system without using metrics for each role. Our technique          the parties concerned. Pattern specialists mean persons that
uses metrics that specialize in each role.                         have the knowledge about the patterns. Developers mean
     Another existing technique improves precision by              persons that maintain the object-oriented software. Our
filtering detection results using machine learning. This           technique currently uses Java as the program language.
technique inputs measurements of the classes and methods of        [Learning Phase]
each pattern [6]. However, it uses the existing static             P1. Define Patterns
analytical approach, whereas our technique instead uses                Pattern specialists determine the detectable patterns and
machine learning throughout the entire process.                    define the structures and roles composing these patterns.
     One current technique analyzes programs both before and       P2. Decide Metrics
after patterns are applied [7]. This method requires a revision        Pattern specialists determine useful metrics to judge the
history of the programs used. Our technique detects patterns       roles defined in P1 by using the Goal Question Metric
by analyzing only the current programs.                            decision technique.
     Yet another approach detects patterns from the class          P3. Machine Learning
structure and behavior of a system after classifying its               Pattern specialists input programs applied patterns into
patterns [8][9]. It is difficult to use, however, when patterns    the metrics measurement system, and obtain measurements
are applied more than once and when pattern application is         for each role. And specialists input these measurements into
diverse. In contrast, our technique copes well with both of        the machine learning simulator to learn. After machine
these challenges.                                                  learning they verify the judgment for each role, and if
                                                  Figure 3. Process of our technique

the verification results are not good, they return to P2 and         specialists decide metrics to identify roles by the quantity
revise the metrics.                                                  and the ratio of measurements. Therefore, they decide
[Detection Phase]                                                    questions by paying attention to the attributes and operations
P4. Role Candidate Judgment                                          of the roles by reading the description of the pattern
    Developers input programs to be detected into the                definition. They decide simple metrics concerning the static
metrics measurement system, and obtain measurements for              aspect like structure first to improve the recall. However, the
each class. And developers input these measurements into             lack of questions might occur because GQM is qualitative.
the machine learning simulator. Machine learning simulator           Therefore, if the machine learning results were unsatisfactory
identifies role candidates.                                          by irregular measurements of metrics, the procedure loops
P5. Pattern Detection                                                back to P2 to reconsider metrics also concerning behavior.
    Developers input role candidates judged in P4 to the             Moreover, it will be possible to apply GQM to roles with
pattern detection system by using the pattern structure              new patterns in the future.
definitions defined in P1. This system detects patterns                  For example, Figure 4 illustrates the goal of making a
automatically. The structure definitions correspond to the           judgment about the AbstractClass role in the
letters P, R, and E of section III-B.                                TemplateMethod pattern. AbstractClass roles have
A. Learning Phase                                                    abstract methods or methods using written logic that use
                                                                     abstract methods as shown in Figure 5. The
P1. Define Patterns                                                  AbstractClass role can be distinguished by the ratio of
    Currently, our technique considers five GoF patterns
(Singleton, TemplateMethod, Adapter, State,
and Strategy) and 12 roles. The GoF patterns are grouped
into creational patterns, structural patterns, and behavioral
patterns. Our technique uses these patterns to cover all these
groups.
P2. Decide Metrics
    Pattern specialists decide on useful metrics to judge roles
by using the Goal Question Metric decision technique [14]                       Figure 4. Example of GQM（AbstractClass role）
(hereafter, GQM). GQM is a top-down approach used to
clarify relations between goals and metrics.
    We experimented with judging roles by using general
metrics without GQM. However, the machine learning
results were unsatisfactory because the measurements of
some metrics were irregular. Consequently, we chose GQM
so that the machine learning could function appropriately by
stable metrics in each role. With our technique, the pattern
specialists set as a goal the accurate judgment of each role.
To achieve this goal they defined a set of questions to be
evaluated. Next, they decided on useful metrics to help
answer the questions they had established. The pattern                         Figure 5. Example of source code (AbstractClass role)
                                               TABLE I.      RESULTS OF APRLING GQM

          Pattern            Role               Goal                                Question                    Metric
                                                              Is the static field defined?                    NSF
                                        Identification of
         Singleton   Singleton                                Is the constructor called from other class?     NOPC
                                        Singleton role
                                                              Is the method that return singleton instance?   NSM
                                        Identification of     Are abstract methods defined?                   NOAM
                     AbstractClass
         Template                       AbstractClass role    Is the template method defined?                 NOM
          Method                        Identification of
                     ConcreteClass      ConcreteClass role    Is the override method defined?                 NORM
                                        Identification of     Are abstract methods defined?                   NOAM
                     Target
                                        Target role           Is the class defined as an interface?           NOI
                                                              Are override methods defined?                   NORM
                                        Identification of
          Adaper     Adapter            Adapter role                                                          NOF
                                                              Is Adaptee field defined?
                                                                                                              NOOF
                                        Identification of     Are methods used by Adapter role defined?       NOM
                     Adaptee
                                        Adaptee role          Is the class referred by other classes?         NCOF
                                                              Are methods to set states defined?              NOM
                                        Identification of
                     Context                                                                                  NOF
                                        Context role          Is State field defined?
                                                                                                              NOOF
                                                              Are abstract methods defined?                   NOAM
                                        Identification of
           State     State                                    Is the class defined as an interface?           NOI
                                        State role
                                                              Is the class referred by other classes?         NCOF
                                                              Is the override method defined?                 NORM
                     Concrete           Identification of
                                        ConcreteState role    Is the method that describes change state       NOM
                     State
                                                              defined?                                        NMGI
                                                              Are methods to set states defined?              NOM
                                        Identification of
                     Context                                                                                  NOF
                                        Context role          Is Strategy field defined?
                                                                                                              NOOF
                                                              Are abstract methods defined?                   NOAM
         Strategy                       Identification of
                     Strategy                                 Is the class defined as an interface?           NOI
                                        Strategy role
                                                              Is the class referred by other classes?         NCOF
                                        Identification of
                     Concrete
                                        ConcreteStrategy      Is the override method defined?                 NORM
                     Strategy           role

the number of methods to the number of abstract methods             a pattern of two groups by using linear input elements.
because with this role the former exceeds the latter.               However, we chose a neural network because it outputs the
Therefore, the number of abstract methods (NOAM) and                values to all roles, taking into consideration the dependency
number of methods (NOM) are useful metrics for judging              among the different metrics. Therefore, it can deal with cases
this role. Table I shows the results of applying GQM to all         in which one class plays two or more roles.
roles. The details of metrics are described in Table II of              A neural network is composed of an input layer, hidden
paragraph IV-A.                                                     layers, and an output layer, as shown in Figure 6, and each
    The previous technique [5] uses GQM, too. In this               layer is composed of elements called units. Values are given
technique, the goal is set as “Recover design patterns”. And        a weight when they move from unit to unit, and a judgment
this technique uses general metrics concerning an object-           rule is acquired by changing the weights. A typical algorithm
oriented system without deciding metrics at each role. On the       for adjusting weights is back propagation. Back propagation
other hand, our technique uses metrics that specialize in each      calculates an error margin between output result y and the
role.                                                               correct answer T, and it sequentially adjusts weights from the
P3. Machine Learning                                                layer nearest the output to the input layer, as shown in Figure
    Machine learning is a technique that analyzes sample            7. These weights are adjusted until the output error margin of
data by computer and acquires useful rules with which to            the network reaches a certain value.
make forecasts about unknown data. We used the machine                  Our technique uses a hierarchical neural network
learning so as to be able to evaluate patterns with a variety of    simulator [17]. This simulator uses back propagation. The
application forms. Machine learning suppresses false                hierarchy number in the neural network is set to three, the
negatives and achieves extensive detection.                         number of units in the input layer and the hidden layer are set
    Our technique uses a neural network [16] algorithm. A           to the number of decided metrics, and the number of units of
support vector machine [16] could also be used to distinguish       the output layer is set to the number of roles being judged.
                                                                  is set as a threshold, and the role agreement values that are
                                                                  higher than the threshold are taken to be role candidates. The
                                                                  threshold is 1/12 (i.e., 0.0834) because we treat 12 roles at
                                                                  present. The sum of the output values is different at each
                                                                  input in the neural net work. Therefore, to use a common
                                                                  threshold for all the output, our technique normalizes the
                                                                  output value.
                                                                      For example, Figure 8 shows the role candidate judgment
                                                                  results with NOM of 3 and NOAM of 2 and other metrics of
                                                                  0; the output value of AbstractClass is the highest value.
                                                                  By regularizing the values of Figure 8, the roles are judged to
                                                                  be AbstractClass and Target.
                                                                  P5. Pattern Detection
                                                                      Developers input role candidates judged in P4 into the
                                                                  pattern detection system by using the pattern structure
                    Figure 6. Neural network
                                                                  definitions defined in P1. And, this system detects patterns
                                                                  by matching the direction of the relations between role
                                                                  candidates and the roles of pattern in programs. The
                                                                  matching moves sequentially from the role candidate with
                                                                  the highest agreement value to that with the lowest value.
                                                                  The pattern detection system searches all combinations of
                                                                  role candidates that accord with the pattern structures. The
                                                                  pattern detection system detects patterns when the directions
                                                                  of relations between role candidates accord with the pattern
                                                                  structure and when the role candidates accord with roles at
                                                                  both ends of the relations. Moreover, the relation agreement
                                                                  values reflect the kind of relation.
                                                                      Currently, our method deals with inheritance, interface
                                                                  implementation, and aggregation relations. The kind of
                                                                  relations will increase as more patterns get added in the
                   Figure 7. Back propagation                     future. The relation agreement value is 1.0 when the kind
                                                                  agrees with the relation of the pattern, and it is 0.5 when the
The input consists of the metric measurements of each role in     kind does not agree. If the relation agreement value is 0 then
a program to which patterns have already been applied, and        the kind of relation does not agree, the pattern agreement
the output is an expected role. Pattern specialists obtain        value might become 0, and these classes will not be detected
measurements for each role by using metrics measurement           as patterns. In such cases, a problem similar to those of the
system. And, specialists input these measurements into the        previous detection techniques will occur because the
machine learning simulator to learn. The learning repetitions     difference in the kind of relation is not recognized.
cease when the error margin curve of the simulator                    The pattern agreement value is calculated from the role
converges. The specialists verify the convergence of the          agreement values and the relation agreement values. The
error margin curve manually at present. After machine             pattern to be detected is denoted as P, the role set that
learning they verify the judgment for each role, and if the       composes the pattern is denoted as R, and the relation set is
verification results are not good, they return to P2 and revise   denoted as E. Moreover, the program that is the target of
the metrics.                                                      detection is defined as P’, the set of classes comprising the
                                                                  role candidates is R’, and the set of relations between
                                                                  elements of R' is denoted as E’. The role agreement value is
B. Detection Phase                                                denoted as Role, and the relation agreement is denoted as Rel.
P4. Role Candidate Judgment                                       Role means the function which is input the element of R and
    Developers input programs to be detected into the             the one of R' , and . Rel means the function which is input the
metrics measurement system, and obtain measurements for           element of E and the one of E'. The product of the average of
each class. And developers input these measurements to the        two roles at both ends of the relation and Rel is denoted as
machine learning simulator. This simulator outputs values         Com, and the average of Com is denoted as Pat. Moreover,
between 0–1 to all roles to be judged. The output values are      the average of two Roles is calculated when Com is
normalized such that the sum of all values becomes 1. These       calculated, and the average value of Com is calculated to
output values are called role agreement values. A larger role     adjust Pat and Role to values from 0 to 1 when Pat is
agreement value means that the role candidate is more likely      calculated. If the directions of the relations do not agree, Rel
correct. The reciprocal of the number of roles to be detected     is assumed to be 0.
                                                                              Figure 8. Example of machine learning output


                                                              Figure 9. Example of pattern detection (TemplateMethod pattern)

P  ( R, E )                       P   ( R , E )                                                       Samples  ( R, E)
R  {r1 , r2 , , ri }            R   {r1, r2 , , rk }                                               R  {SampleA, SampleB, SampleC}
E  {e1 , e 2 ,  , e j }  R  R           E   {e1 , e 2 ,  , el}  R   R                        E  {SampleA   SampleB, SampleA ◇SampleC}

Role (rm , rn)  The role agreement value                            rm  R, rn  R 
                                                                                                           (   : inheritance, ◇ : aggregation)
                                                                                                           Role ( AbstractCl ass, SampleA)  0.82       Role (ConcreteCl ass, SampleB )  0.45
R el (e p , eq )  The relation agreement value                               e p  E, e q  E          Role (ConcreteCl ass, SampleC )  0.57
                    Role (ra , rb )  Role (rc , rd )                                                    R el ( AbstractCl ass  ConcreteCl ass, SampleA   SampleB )  1.0
Com(e p , eq )                                         R el (e p , eq )
                                     2
                                                                                                           R el ( AbstractCl ass  ConcreteCl ass, SampleA ◇SampleC )  0.5
                                 ra , rc  R, rb , rd  R , e p  (ra , rc ), ep  (rb , rd )
                                                                                                                                                                      0.82  0.45
                                          1                                                                Com( AbstractCl ass  ConcreteCl ass, SampleA   SampleB )           1.0  0.635
Pat ( P, P ) 
                                                                 
                                                                        e E , e E 
                                                                          p     q
                                                                                        Com(e p , e q )                                                                   2
                                                                                                                                                                        0.82  0.57
                  (e p , e q )  E  E  R el (e p , e q )  0                                         Com( AbstractCl ass  ConcreteCl ass, SampleA ◇SampleC )               0.5  0.348
                                                                                                                                                                           2

                                                                                                           Pat (TemplateMethod , Samples )   0.635  0.348  0.492
    Figure 9 shows an example of detecting the                                                                                              1
                                                                                                                                            2
TemplateMethod pattern. In this example, it is assumed
                                                                                                               In the program shown in Figure 9, the pattern agreement
that class SampleA has the highest role agreement value for
                                                                                                           value of the TemplateMethod pattern was calculated to
an AbstractClass. The pattern agreement value between                                                      be 0.492. Pattern agreement values are normalized from 0 to
the program Samples and the TemplateMethod pattern is                                                      1, just like role agreement values. Our technique uses the
calculated with the following algorithm.                                                                   same threshold among of pattern agreement value as role
                                                                                                           agreement value because a lot of classes are detected as the
P  TemplateMe thod  ( R, E )                                                                             pattern that composed of the only class like the singleton
R  { AbstractCl ass, ConcreteCl ass}                                                                      pattern. Classes with a pattern agreement value that exceeds
E  { AbstractCl ass  ConcreteCl ass}                                                                    the threshold are output as the detection result. The
                                                                                                           reciprocal of the number of roles for detection is taken to be
the threshold, similar to the case of role candidate judgment,       We focused our attention on recall because the purpose
and pattern agreement values that are higher than the            of our technique was detection covering diverse pattern
threshold are output as the detection result.                    applications. Recall indicates the degree to which detection
    In Figure 9, SampleA, SampleB, and SampleC were              results are free of leakage, whereas precision shows how free
detected as TemplateMethod patterns. Moreover,                   of disagreement these result are. The data in Table III was
SampleA and SampleB, SampleA and SampleC can                     used to calculate recall. wr, xr, yr, and zr are numbers of roles,
also be considered to match the TemplateMethod                   and wp, xp, yp, and zp are numbers of patterns. Recall was
pattern.     In     this      case,     the     relation    of   calculated from the data in Table III by the following
“SampleA   SampleB” is more similar to a                       expressions.
                                                                                                                           wr
TemplateMethod pattern than the relation of                           Recall of role candidate judgment:        Re r 
                                                                                                                         wr  xr
“SampleA ◇ SampleC” because its agreement value of
                                                                     Table IV shows the average recall for each role. Role
the former pair is 0.635 while that of the latter pair is only   candidates must be judged accurately because the State
0.348.
                                                                 pattern and Strategy pattern have the same class structure.
             IV. EVALUATION AND DISCUSSION                       Therefore, our technique regards the roles of the patterns
                                                                 other than State and Strategy patterns as role
    We determined whether the machine learning simulator         candidates when the role agreement value was above the
derived identifying elements of the roles after learning.
                                                                 threshold, whereas our technique regards the roles of State
Moreover, we compared our technique with two previous
techniques to verify the precision and recall of our approach    and Strategy patterns as role candidates when the role
and to confirm whether it could match its detected patterns      agreement value was above the threshold and the roles of
with similar structures and diverse patterns.                    both patterns were distinguished State pattern from Strategy
                                                                 pattern.
A. Verification of Role Candidate Judgement                          As shown in Table IV, the recalls for the large-scale
    We used cross-validation to verify the role candidate        programs were lower than those for the small-scale programs.
judgment. In cross-validation, data are divided into n groups,   Accurate judgment of large-scale programs was more
and a test to verify a role candidate judgment is executed       difficult because these programs possessed attributes and
such that the testing data are one data group and the learning   operations that were unnecessary for pattern composition.
data are n-1 data groups. We executed the test five times by     Therefore, it will be necessary to collect a significant amount
dividing the data into five groups. In this paper, programs      of learning data to adequately cover a variety of large-scale
such as programs in the reference [18], etc., are called small   programs.
scale, whereas programs in practical use are called large            The results in Table IV pertain to instances where the
scale. We used the set of programs where patterns are            State and Strategy patterns could be distinguished. The
applied in small-scale programs (60 in total) 1 [18][19] and     Context role had high recall, but State and
large-scale programs (158 in total from the Java library         ConcreteState roles had especially low recalls for large-
1.6.0_13 [20], JUnit 4.5 [21], and Spring Framework 2.5          scale programs. However, the candidates for the State role
RC2 [22]) as experimental data. We judged manually and           were output with high recall when the threshold was
qualitatively whether the patterns were appropriately applied    exceeded. Therefore, the State pattern can be distinguished
in this set of programs.                                         by initiating searching from the Context role in P5, and
    Table II shows the metrics that were chosen for the          this improves recall.
small-scale and large-scale programs. We used different
metrics depending on the magnitude of the programs. For
instance, we chose the metric called number of methods                            TABLE II.      CHOSEN METRICS
generating instance (NMGI) for small-scale programs                   Abbreviation                        Content
because the method for transit states in the                             NOF               Number of fields
ConcreteState role in the State pattern generates                        NSF               Number of static fields
other ConcreteState roles in small-scale programs. We                    NOM               Number of methods
guessed that the difference appeared in ratios of metrics                NSM               Number of static methods
about State and Strategy, so we used the same metrics for the            NOI               Number of interfaces
large-scale programs without NMGI. Because State pattern                NOAM               Number of abstract methods
treats the states in State role and treats the actions of the           NORM               Number of overridden methods
states in the Context role. On the other hand Strategy pattern          NOPC               Number of private constructors
encapsulates the processing of each algorithm into a Strategy                              Number of constructors with argument
role, and Context processing becomes simpler compared                    NOTC
                                                                                           of object type
with that of State pattern.                                              NOOF              Number of object fields
                                                                                           Number of other classes with field of
                                                                         NCOF
                                                                                           own type
1
    All small-scale code:                                                                  Number of methods to generate
                                                                         NMGI
http://www.washi.cs.waseda.ac.jp/ja/paper/uchiyama/dp.html                                 instances
               TABLE III.      INTERSECTION PROCESSION                     TABLE V.        PRECISION AND RECALL RATIO OF PATTERN DETECTION

                            detected                not detected                           Number of
                                                                                                                Precision (%)           Recall (%)
                             wr, wp                     xr, xp                              test data
     correct
                         (true positive)          (false negative)                      Small-     Large-     Small-     Large-     Small-     Large-
                              yr, yp                    zr, zp               Pattern     scale      scale      scale      scale      scale      scale
    incorrect                                                                           programs   programs   programs   programs   programs   programs
                        (false positive)          (true negative)
                                                                            Singleton      6          6        60.0       63.6      100.0      100.0
  TABLE IV.       RECALL OF CANDIDATE ROLE JUDGMENT ( AVERAGE)              Template
                                                                                           6          7        85.7       71.4      100.0       83.3
                                                                             Method
                                              Average recall (%)
                                           Small-scale   Large-scale         Adapter       4          7       100.0      100.0      90.0        60.0
   Pattern              Role                                                  State        2          6       50.0        40.0      100.0       66.6
                                           programs          programs
  Singleton          Singleton               100.0                  84.7    Strategy       2          6       66.7        30.8      100.0       80.0
  Template         AbstractClass             100.0                  88.6
   Method          ConcreteClass             100.0                  58.5   metrics. The State pattern was identified by searching
                      Target                 90.0                   75.2   from the Context role, for instance, in the State pattern
   Adapter            Adapter                100.0                  66.7   detection in the large-scale programs, and the recall of the
                     Adaptee                 90.0                   60.9   pattern detection was higher than the recall of role candidate
                      Context                60.0                   70.0   judgment. Table V shows holistically that our technique
    State              State                 60.0                   46.7   suppresses false negatives because the recall is high.
                   ConcreteState             82.0                   46.6
                      Context                80.0                   55.3
                                                                           C. Experiment Comparing Previous Detection Techniques
  Strategy           Strategy                100.0                  76.7       We experimentally compared our technique with
                  ConcreteStrategy           100.0                  72.4   previous detection techniques [3][14]. These previous
                                                                           techniques have been publicly released, and they consider
                                                                           three or more patterns addressed by our own technique. Both
B. Pattern Detection Results                                               target Java programs, as does our own. The technique
    Our technique detects patterns using test data in both the             proposed by Tsantails’s technique [3](hereafter, TSAN) has
small-scale and large-scale programs, and this result is                   four patterns in common with ours (Singleton,
evaluated. We used 40 sets of programs where patterns are                  TemplateMethod, Adapter and State/Strategy).
applied in small-scale programs and 126 sets of programs                   Because this technique cannot distinguish the State pattern
where patterns are applied in large-scale programs as                      from the Strategy pattern, these are detected as one
learning data. We judged manually and qualitatively whether                pattern. Dietrich’s technique [14] (hereafter, DIET) has three
the patterns were appropriately applied in the detection                   patterns in common (Singleton, TemplateMethod,
results. Table V shows precision and recall of the detected                Adapter) with our own. TSAN detects patterns based on
patterns. Precision and recall were calculated from the data               the degree of similarity between the graphs of the pattern
in Table III by the following expressions:                                 structure and graphs of the programs to be detected, whereas
                                                                           DIET detects patterns by using formal OWL (Web Ontology
                                                            wp
       Recall of pattern detection :            Re p                      Language) definitions. Patterns were detected and evaluated
                                                         wp  x p          with the small-scale and large-scale test data. Moreover, the
                                                          wp               test data and learning data were different.
      Precision of pattern detection :          Pr p 
                                                         wp  y p              Figure 10 shows the recall and precision graphs for our
    Small-scale and large-scale programs shared a common                   technique and TSAN, and Figure 11 shows the
point in that they both had recalls that were higher than                  corresponding graphs for our technique and DIET. We
precisions. However, there were many non-agreements about                  ranked the detection results of our technique with the pattern
the State patterns and Strategy patterns in the large-                     agreement values. Next, we calculated recall and precision
scale programs. Recall was 90% or more with the small-                     according to the ranking and plotted them. Recall and
scale programs, but it dropped as low as 60% with the large-               precision were calculated from the data in Table III by using
scale programs.                                                            the expressions of paragraph IV -B. In the results of TSAN
    The large-scale programs resulted in especially low recall             and DIET, we alternately plotted results because these
for the Adapter pattern. Table IV shows the cause: The                     previous detection techniques output no value to rank. In the
recall of the role candidate judgment for the Adapter                      recall and precision graphs higher values are better.
pattern was not high enough. It is necessary to show that all                  Figure 10 and 11 show particularly good results for all
roles that compose patterns have agreement values that are                 techniques when small-scale programs was examined. This
above the threshold so that patterns will be detected. There               is because small-scale programs do not include unnecessary
were many cases in which neither of the roles that composed                attributes and operations in the composition of patterns.
patterns was judged as a role candidate for the Adapter                        Table VI and VII show the average F measure for each
pattern. It will be necessary to return to P2 and choose new               plot of Figure 10 and 11. The F measure is calculated with
                                                                                of State pattern. Table VIII shows 45 fields and 204
                                                                                methods as the largest in Context role in State pattern
                                                                                (18 and 31 respectively in Context role of Strategy
                                                                                pattern). Therefore, the complexity of Context role of both
                                                                                patterns appears in the number of fields and the number of
                                                                                methods, and these are distinguishing elements. Figure 10
                                                                                shows that our technique is especially good because State
                                                                                pattern and Strategy pattern could not be distinguished
                                                                                with TSAN.
                                                                                Detection of Subspecies of Patterns
                                                                                    Figure 11 shows that the recall of DIET is low in the case
    Figure 10. Recall-precision graph of detection results (vs. TSAN)           of large-scale programs because this technique doesn't
                                                                                accommodate the diversity in pattern applications.
                                                                                Additionally, large-scale programs not only contain many
                                                                                attributes and operations in the composition of patterns but
                                                                                also subspecies of patterns.
                                                                                    Our technique detected subspecies of patterns. For
                                                                                example, our technique detected the source code of the
                                                                                Singleton pattern that used the boolean variable as shown
                                                                                in Figure 12. This Singleton pattern was not detected in
                                                                                TSAN or DIET. However, unlike the previous techniques,
                                                                                our technique is affected by false positives because it is a
                                                                                gradual detection using metrics and machine learning instead
                                                                                of strict conditions. False positives of the Singleton
                                                                                pattern especially stood out because Singleton pattern is
    Figure 11. Recallprecision graph of detection results (vs. DIET)           composed of only one role. It will be necessary to use
                                                                                metrics that are specialized to one or a few roles to make
       TABLE VI.       THE AVERAGE OF F MEASURE (VS. TSAN)                      judgments about patterns composed of one role like the
                                                                                Singleton pattern (P4).
                         Small-scale programs            Large-scale programs
                                                                                    Therefore, our technique is superior to previous one
Our technique                    0.67                            0.56
Previous technique                                                              because the curve of our technique is above the previous in
                                  0.39                           0.36           Figures 10 and 11.
(TSAN)

        TABLE VII.      THE AVERAGE OF F MEASURE (VS. DIET)                       TABLE VIII.     MEASUREMENTS OF THE CONTEXT ROLE’S METRICS

                         Small-scale programs            Large-scale programs      Pattern - Role         Number of fields         Number of methods
Our technique                    0.69                            0.55                                          12                         58
Previous technique                                                                State - Context              45                        204
                                  0.50                           0.35
(DIET)
                                                                                                               11                         72
                                                                                                               18                         31
recall and precision calculated by the above-mentioned
                                                                                 Strategy - Context             3                         16
expression as follows.
                                                                                                                3                         5
                                       2  Pr p  Re p
                       F  measure 
                                         Pr p  Re p
    A large F measure means higher accuracy, and these
tables show that our technique had a larger F measure than
the previous techniques had.
Distinction between State pattern and Strategy pattern
    Our technique distinguished State pattern Strategy
pattern. Table VIII is an excerpt of the metrics measurements
for the Context role in State pattern and Strategy
pattern that were distinguished by the experiment on the
large-scale programs. State pattern treats the states in
State role and treats the actions of the states in the
Context role. Strategy pattern encapsulates the
                                                                                Figure 12. Example of diversity in pattern application (Singleton pattern)
processing of each algorithm into a Strategy role, and
Context processing becomes simpler compared with that
            V. CONCLUSION AND FUTURE WORK                           [6] R. Ferenc, A. Beszedes, L. Fulop, and J. Lele. Design Pattern
                                                                        Mining Enhanced by Machine Learning. 21st IEEE
    We devised a pattern detection technique using metrics              International Conference on Software Maintenance, pp. 295-
and machine learning. Role candidates are judged using                  304 2005.
machine learning that relies on measured metrics, and
patterns are detected from the relations between classes. We        [7] H. Washizaki, K. Fukaya, A. Kubo, and Y. Fukazawa.
worked on the problems associated with overlooking patterns             Detecting Design Patterns Using Source Code of Before
and distinguishing patterns in which the class structures are           Applying Design Patterns. 8th IEEE/ACIS International
                                                                        Conference on Computer and Information Science, pp. 933-
similar.
                                                                        938, 2009.
    We demonstrated that our technique was superior to two
previous     detection     techniques     by experimentally         [8] N. Shi and R.A. Olsson. Reverse Engineering of Design
distinguishing patterns in which the class structures are               Patterns from Java Source Code. 21st IEEE/ACM
similar. Moreover, subspecies of patterns were detected, so             International Conference on Automated Software Engineering,
we could deal with a very diverse set of pattern applications.          pp. 123-134, 2006.
However, our technique was more susceptible to false                [9] H. Lee, H. Youn, and E. Lee. Automatic Detection of Design
positives because it does not use strict conditions such as             Pattern for Reverse Engineering. 5th ACIS International
those used by the previous techniques.                                  Conference on Software Engineering Research, Management
    We have several goals for our future work. First, we plan           and Applications, pp. 577-583, 2007.
to add more patterns to be detected. Our technique can              [10] L. Wendehals and A. Orso. Recognizing Behavioral Patterns
currently cope with five patterns. However, we predict it will           at Runtime Using Finite Automata. In 4th ICSE 2006
be possible to detect other patterns if we can decide upon               Workshop on Dynamic Analysis, pp. 33–40, 2006.
metrics to identify them. It is also necessary to collect more      [11] S. Hayashi, J. Katada, R. Sakamoto, T. Kobayashi, and M.
learning data to cover the diversity in pattern applications.            Saeki. Design Pattern Detection by Using Meta Patterns.
Moreover, we plan to more narrowly adapt the metrics to                  IEICE Transactions, Vol. 91-D, No. 4, pp. 933–944, 2008.
each role by returning to step P2 because results might
                                                                    [12] A. Lucia, V. Deufemia, C. Gravino and M. Risi. Design
depend on the data. This process would lead to the
                                                                         pattern recovery through visual language parsing and source
enhancement of recall and precision.                                     code analysis. Journal of Systems and Software, Vol.82 (7),
    Second, we currently qualitatively and manually judge                pp. 1177-1193, 2009.
whether to return to step P2 and to apply GQM again; hence,
in the future, we should find an appropriate automatic              [13] Y. Guéhéneuc and G. Antoniol. DeMIMA: A Multilayered
judgment method.                                                         Approach for Design Pattern Identification. IEEE Trans.
                                                                         Software Engineering. Vol.34, No. 5, pp. 667–684, 2008.
    Third, we plan to prove the validity of the expressions
and the parameters of agreement values and thresholds. We           [14] J. Dietrich, C. Elgar. Towards a Web of Patterns. In
consider that it is possible to reduce the false positive rate by        Proceedings of First International Workshop Semantic Web
deciding on the optimum thresholds for role agreement                    Enabled Software Engineering, pp. 117-132, 2005.
values and pattern agreement values.                                [15] V. R. Basili and D.M. Weiss. A Methodology for Collecting
    Finally, we plan to determine the learning number of                 Valid Software Engineering Data. IEEE Transactions on
times automatically and examine the correlation of the                   Software Engineering, Vol. 10, No. 6, pp. 728–738, 1984.
learning number of times and precision.                             [16] T. Segaran. Programming Collective Intelligence. O’Reilly.
                                                                         2007.
                         REFERENCES
                                                                    [17] H. Hirano. Neural network implemented with C++ and Java.
[1] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design              Personal Media. 2008.
    Patterns: Elements of Reusable Object-Oriented Software.
    Addison-Wesley, 1994.                                           [18] H. Yuki. An introduction to design pattern to study by Java.
                                                                         http://www.hyuki.com/dp/
[2] M. Lorenz and J. Kidd Object-Oriented Software Metrics.
    Prentice Hall, 1994.                                            [19] H. Tanaka. Hello World with Java!
                                                                         http://www.hellohiro.com/pattern/
[3] N. Tsantalis, A. Chatzigeorgiou, G. Stephanides, and S.
    Halkidis. Design Pattern Detection Using Similarity Scoring.    [20] Oracle Technology Network for Java Developers.
    IEEE Trans. Software Engineering, Vol.32, No.11, pp. 896-            http://www.oracle.com/technetwork/java/index.html
    909 2006.                                                       [21] JUnit.org. Resources for Test Driven Development.
[4] A. Blewitt, A. Bundy, and L. Stark. Automatic Verification of        http://www.junit.org/
    Design Patterns in Java. In Proceedings of the 20th             [22] SpringSource.org. Spring Source.
    International Conference on Automated Software Engineering,          http://www.springsource.org/
    pp. 224–232, 2005.
[5] H. Kim and C. Boldyreff. A Method to Recover Design
    Patterns Using Software Product Metrics. In Proceedings of
    the 6th International Conference on Software Reuse:
    Advances in Software Reusability, pp. 318-335, 2000.