=Paper= {{Paper |id=Vol-2273/QuASoQ-08 |storemode=property |title=Use-Relationship Based Classification for Software Components |pdfUrl=https://ceur-ws.org/Vol-2273/QuASoQ-08.pdf |volume=Vol-2273 |authors=Reishi Yokomori,Norihiro Yoshida,Masami Noro,Katsuro Inoue |dblpUrl=https://dblp.org/rec/conf/apsec/YokomoriYNI18 }} ==Use-Relationship Based Classification for Software Components== https://ceur-ws.org/Vol-2273/QuASoQ-08.pdf
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)



                  Use-Relationship Based Classification
                       for Software Components
            Reishi Yokomori                 Norihiro Yoshida                    Masami Noro                    Katsuro Inoue
          Nanzan University                 Nagoya University                 Nanzan University               Osaka University
           Nagoya, Japan                     Nagoya, Japan                     Nagoya, Japan                   Osaka, Japan
      yokomori@se.nanzan-u.ac.jp             yoshida@ertl.jp                yoshie@nanzan-u.ac.jp          inoue@ist.osaka-u.ac.jp



   Abstract—In recent years, the maintenance period of the                remember its functions and role. This type of rework needs
software system is increasing. The size of the software system            time, however, it makes their understanding deeper. However,
has grown, and the number of classes and the relationship                 as the number of files and classes increases through a long
between classes are also increasingly complicated. If we can
categorize software components based on information such as               period of maintenance, the frequency of the rework in under-
functions and roles, we believe that these classified components          standing increases, and the content to be understood becomes
can be understood together, and are useful for understanding              also complicated. The effort required for understanding the
the system. In this paper, we proposed a classification method            internal structure by developers participating later will be
for software components based on similarity of use relation. For          much greater than the effort expected by the original developer.
each component, a set of components used by the component was
analyzed. And then, for each pair of components, the distance was         By analyzing the source code by the statement unit, a lot
calculated from the coincidence of the two sets. A distance matrix        of methods for extracting code clone [3], extracting coding-
was created and components were classified by hierarchical                pattern [4], and extracting coding-rule [5] have been proposed.
cluster analysis. We applied this method to jlGui consisting of           As the scale of the software system increases, the relationship
70 components. 8 clusters of 36 components were extracted from            between statements becomes complicated, and the analysis
the 70 components. Characteristics of the extracted clusters were
evaluated, and the content of each cluster was introduced as a            cost gradually increases, so we would like to propose a method
case study. In 7 clusters out of the 8 clusters, components of the        with a coarser granularity, based on the anlysis by the class
cluster were strongly similar with each other from the viewpoint          unit.
of their functions. Through these experiments, we confirmed that             In the evaluation experiment for the component retrieval
our method is effective for classifying components of the target          system SPARS-J [2], it was effective to guide the component
software, and is useful for understanding them.
   Index Terms—component, use-relations, hierarchical cluster
                                                                          to be browsed next based on its use relationships, and it was
analysis                                                                  popular among developers. If we can categorize software com-
                                                                          ponents based on their functions and roles by obtaining infor-
                      I. I NTRODUCTION                                    mation from analysis of use-relationships, we can understand
                                                                          these classified components together and this classification
   In recent years, the maintenance period of the software                would be useful for understanding the target software. For
system has increased, by also experiencing the entry of new               example, when realizing a specific feature, it is often necessary
developers. In order to participate in the maintenance of such            to use certain components together. We compare use-relations
a software system, it is the first important task to become ac-           of two components (-A, -B), that means components that
customed to the environment surrounding the software system               component-A uses and components that component-B uses are
[1], and it is necessary to understand the configuration of the           compared. In the case where there are many common compo-
target software. In the process of understanding, the developer           nents in the two component set, we consider that component-A
first understands the diagram and documents representing the              and -B are supposed to implement the same specific function
composition of the software and then reads the source code of             or very similar function. We think that understanding of these
each component of the software in order to grasp the details              components can be performed efficiently by also considering
of each subsystem and the overall image.                                  components that are commonly used by these components. By
   In reading the source code, the developer browses the source           being able to understand efficiently, we believe that developers
code of another component one by one on an editor or a                    devote much effort to other work and can carry out with high
source code browser like SPARS-J [2], by referring to the                 quality. As a result, we think it can improve productivity and
package hierarchy and the name of another class appearing in              quality of maintenance.
the source code, and so on. In this way, the source codes                    In this paper, we proposed a classification method for
of a lot of components are read strategically in order to                 software components based on similarity of use relation. In
understand efficiently. In fact, when the developer encounters a          this method, for each component, a set of components used
component similar to the components the developer confirmed               by the component was analyzed. And then, for each pair of
once, it often happens that reworking occurs in order to                  components, the distance between the two components was



      Copyright © 2018 for this paper by its authors.                59
                  6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

calculated from the coincidence of the two sets. In this way,           Graph [2]. In the component graph (V, E) , a node v ∈ V
a distance matrix of the target software components was cre-            represents a software component, and a directed edge exy ∈ E
ated and software components were classified by hierarchical            from node x to y represents a use relation meaning that
cluster analysis.                                                       component x uses component y. This use relation indicates
   We implemented a prototype for the proposed method, and              that a component realizes its own function by utilizing features
its analysis target is a software system described in Java. The         of other components. If you need to understand how features
prototype system treats the Java source files as a component            in a particular component are specifically implemented, you
and excludes components defined outside the software, such              can use use-relations to understand which parts of the software
as components in libraries. Our system generates a distance             you need to reference together.
matrix from the analysis result of Classycle’s analyzer [6], and
hierarchical cluster analysis is executed by R. We conducted            B. Classification methods using metrics that represent the
two evaluation experiments. In the first experiment, we applied         characteristics of software components
it to the actual software system of open source project and                For a lot of components extracted from multiple software
evaluated the similarity of the components in the obtained              systems, Kobori suggested a method for extracting the same
cluster from several viewpoints. As a result, many of the               or similar components as the target components [8]. In this
components in the cluster showed strong similarity in terms of          method, metrics representing the characteristics of software
their functionality, and more than half of similar components           components are obtained from the source code of each compo-
were joined as clusters. In the second experiment, we intro-            nent. These metrics are the number of methods and the number
duced how the components in the clusters were combined, as              of tokens that represent the scale of the component, and
a case study. Some clusters were composed of a collection               the cyclomatic complexity that represents the complexity of
of components that have a simple structure and a simple use             control structure of the component, and so on. In the method,
relationship, and some clusters were composed of a collection           components that have the same or similar values in these
of components that realize multiple functions by using several          metrics are extracted. Through experiments, they showed that
other components. In the case study, knowing the components             extracted components are mainly copied components or copied
commonly used by the components in the cluster was effective            and modified components from the original ones. This method
in understanding the contents of the components in the cluster.         is a method assumed for using in the software component
Through these experiments, we introduce that the obtained               retrieval system. In the software component retrieval system,
clusters have sufficient accuracy to be used for software               it collects a lot of software system. The method could help to
understanding, and our method is effective to classify and              extract the almost identical (copied or copied and modified)
understand the components of the target software.                       components across a lot of software systems. Our method is to
   The main contributions of our research are as follows:               classify components in one software system, based on the fact
   • We have hypothesized that “When two components use                 that components are using a lot of common components. The
      many common components, these two components are                  purpose of our method is to obtain a set of components using
      implemented for the same or similar purpose”, and con-            similar functions and to obtain a set of components whose role
      firmed it by experiment.                                          and features in the software are similar.
   • We showed that the proposed method can combine most
                                                                                          III. P ROPOSED M ETHOD
      of the functionally related components as clusters.
   • We showed that the obtained clusters have sufficient                  When implementing a feature in a new component, we also
      accuracy for software understanding, and our method is            use the features realized by other existing components as nec-
      effective to classify and understand the components of            essary. From this, it can be considered that which components
      the target software.                                              are used by a certain component can also be information for
   • We showed that the components in the cluster could be              estimating the features realized in the component. In some
      used as examples when adding new components that use              cases, when using a certain feature, it is necessary to use
      the same components.                                              some other components together. In such a case, there is
                                                                        a commonality in the set of using components among the
   Section II introduces the component graph as a background.
                                                                        components that use the feature. Based on the fact that the
Section III introduces our method and implementation. Section
                                                                        using components are similar, we thought that these compo-
IV conducts two experiments. Finally, Section V discusses the
                                                                        nents are similar in function and role. Therefore, we make a
results and improved ways and introduces related works.
                                                                        hypothesis that “When two components use many common
                      II. BACKGROUND                                    components, these two components are implemented for the
                                                                        same or similar purpose”, and confirm it by experiments. We
A. Component Graph and Use Relation between Components                  propose a method for classifying components based on the
  In general, a component is a modular part of a system that            commonality of components that are used by the components.
encapsulates its content and whose manifestation is replace-            We thought that this method can extract several components
able within its environment [7]. We model software systems              groups that are recognizable as collections of components that
by using a weighted directed graph, called a Component                  are similar in function or role. Through evaluation experiments



     Copyright © 2018 for this paper by its authors.               60
                          6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

to an open source project, we introduce characteristics of the                                               This value has a range of 0 to 1, and the higher this value,
obtained component groups and components in the obtained                                                  the more likely that LA and LB are similar. And then, we
component groups have a strong relationship.                                                              define the distance of components A and B dist(LA , LB ) as
                                                                                                          following; (For representation, the distance was 10 timed, and
A. Implementation of the analysis tool                                                                    it does not have other special meaning.)
   We implemented a classification tool for Java source files of
a target software system. The system constructs a component                                                         dist(LA , LB ) = (1 − sim(LA , LB )) ∗ 10
graph for Java source files of the target software system and
produces a distance matrix used by the hierarchical cluster                                               C. About Linkage Criterion for Hierarchical Clustering
analysis on R. For extracting use relations, we used Classycle’s                                             In the hierarchical clustering, there are candidates to com-
analyzer [6], that is an analyzing tool for java class and pack-                                          pute the distance for a cluster consists of multiple objects, so
age dependencies. In the analysis of the Classycle’s analyzer,                                            the user needs to decide on the linkage criterion to use. For
we get the following as use relations: inheritance of class,                                              our study, we decided to use the linkage criterions provided by
declaration of variables, a creation of instances, method calls,                                          R, since it was intended to create a prototype of an analytical
and reference of fields, and inner classes and use relations                                              environment. The linkage criterions are roughly divided into
about inner classes are merged into the major class in the                                                two methods: methods to recalculate a distance directly from
Java source files. Figure 1 is an overview of the system, and                                             distances of multiple objects, such as a single linkage, a
the following is the analysis procedure:                                                                  complete linkage, and UPGMA, and methods to recalculate
                                                                                                          a distance from the center or the center of gravity of the
                     džƚƌĂĐƚĞĚ                                                                           cluster, such as Ward, and centroid method. From them, we
   dĂƌŐĞƚ   ^ƚĞƉϭ      hƐĞ                                              ^ƚĞƉϲ     'ƌŽƵƉƐŽĨ
 ƐLJƐƚĞŵ͛Ɛ            ZĞůĂƚŝŽŶƐ                                                     ^ŝŵŝůĂƌ
                                                                                                          decided to use the former methods. In general clustering, it is
                                                       ĞŶĚƌŽŐƌĂŵ
  ͘ũĂƌ&ŝůĞ ůĂƐƐLJĐůĞ ;͘džŵůͿ                                                      ŽŵƉŽŶĞŶƚƐ              considered that the latter methods can be utilized because a set
                ^ƚĞƉϮ                                                Z
                                                                                                          of vectors is used as input and hierarchical clustering is done
                                                         ^ƚĞƉϱ
                                         ^ĞƚƐŽĨ                                                         by calculating the distance between vectors. However, in our
                                      ŽŵƉŽŶĞŶƚƐ
                ŽŵƉŽŶĞŶƚ            ƚŚĂƚĂƌĞƵƐĞĚ             ŝƐƚĂŶĐĞ                                 analysis, only the distances between components are given as
  KƵƌƚŽŽů                                                        ŵĂƚƌŝdž            ϬϬ͘ϱ͙͘Ϭ͘ϯ
                  'ƌĂƉŚ                  ďLJĞĂĐŚ
                                                                  ;͘ĐƐǀͿ
                                                                                 Ϭ͘ϱϬ͙͘Ϭ͘Ϯ         input, so it is meaningless to consider the center of the cluster.
                              ^ƚĞƉϯ    ĐŽŵƉŽŶĞŶƚ        ^ƚĞƉϰ                    ͙͙͙͘͘͘͘͘͘͘

  KƚŚĞƌƚŽŽůƐ
                                                                                 Ϭ͘ϯϬ͘Ϯ͙͘Ϭ            We introduce three types of results, based on the single link-
  ŽƌďLJŚĂŶĚ                                                                                              age method, the complete linkage method, and the UPGMA, as
                                                                                                          a preliminary experiment, and examine which linkage criterion
                                                                                                          should be used. The target system is GoGui that is a graphical
                   Fig. 1. Overview of the Proposed Method                                                interface program for playing Go and is composed of 181
                                                                                                          Java source files. Figure 2 is a dendrogram of the hierarchical
  1.         Classycle’s analyzer [6] provides the use relations of                                       clustering using the single linkage method. In the case of the
             the target system.                                                                           single linkage method, if a component has a similarity to any
  2.         Build a component graph based on the use-relations.                                          component in the cluster, the component is also merged into
  3.         For each component, a set of components used by                                              the cluster. So a very large cluster is easily formed, however,
             the component is obtained from the graph.                                                    similarities between components in a cluster would not be
  4.         For each pair of components, a distance between                                              strong. Figure 3 is a dendrogram of the hierarchical clustering
             the two components is calculated from a degree of                                            using the complete linkage method. In the case of the complete
             coincidence of the two sets. A distance matrix for                                           linkage method, a component is merged into the cluster only
             the target software components is produced.                                                  if the component has similarities with all components in the
  5.         By using the distance matrix, hierarchical cluster                                           cluster. A lot of small clusters are formed, however, similarities
             analysis is performed on R.                                                                  between components in each cluster would be strong.
  6.         The dendrogram is obtained by R. We decide a                                                    Figure 4 is a dendrogram of the hierarchical clustering using
             threshold value and clusters under the threshold value                                       the UPGMA. We can confirm a lot of small clusters as in
             are recognized as groups of similar components.                                              the case of Figure 3, and we can also confirm that these
                                                                                                          clusters are merged into a large cluster as in the case of Figure
B. Distance between two Components                                                                        2. In this way, the dendrogram obtained by UPGMA is a
  Consider two components A and B. LA represents a set of                                                 balanced result. By adjusting a threshold height for getting
components that are used by A, and LB represents a set of                                                 clusters, we can obtain very similar clusters obtained by the
components that are used by B, respectively.                                                              other two approaches. In this study, we use the UPGMA as a
                                                                                                          linkage criterion in the hierarchical clustering. The problem of
  At first, we define the similarity of components A and B
                                                                                                          deciding the threshold height for getting cluster is considered
sim(LA , LB ) by using the Jaccard coefficient.
                                                                                                          to be a problem to be tackled in the future, and we will show
                                                 |LA ∩ LB |                                               how the results change by the upper or lower thresholds in
                        sim(LA , LB ) =                                                                   the experiments. As another linkage criterion, we can also
                                                 |LA ∪ LB |



       Copyright © 2018 for this paper by its authors.                                               61
                  6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)




                                                                                Fig. 4. A Dendrogram of GoGui Components (UPGMA)
         Fig. 2. A Dendrogram of GoGui Components (SLM)

                                                                                  component, as a case study. We will show the results
                                                                                  classified by our method were realistic ones.
                                                                        A. About Criterion for similarity
                                                                           At first, we prepared several situations that we can imagine
                                                                        when the roles and objectives of the components are similar.
                                                                           1) Many of the components that have common process-
                                                                               ing to achieve similar functionality are derived from
                                                                               the same class or implement the same interface. The
                                                                               processing performed by these components and the
                                                                               APIs provided by them is often quite similar. These
                                                                               components share similar code fragments, so they would
         Fig. 3. A Dendrogram of GoGui Components (CLM)                        be comprehensible at once.
                                                                           2) Components that realize different objectives for the same
                                                                               target should be considered as a group associated with
consider a method to recalculate a group of using components                   the target. Common processing would be performed
for each cluster and recalculate the distance between clusters.                as pre- or post-processing. So these components share
This also would be a problem to be tackled in the future,                      similar code fragments and would be comprehensible at
however, we consider that it contributes to the improvement                    once to understand how to use the target.
of accuracy and the overall trend does not change significantly.           3) Components that are considered to realize a large fea-
                                                                               ture are also considered as a functional group when
                     IV. E XPERIMENTS                                          observing from a vantage point. The targets of these
   For the experiments, we applied our method to jlGui Ver.3.0.                components are often slightly different, but it is possible
It is a graphical music player which supports Java Sound                       to verify the relationship between them. These compo-
capabilities and consists of 70 Java source files. We obtained                 nents will help you to get a rough understanding of what
a dendrogram of hierarchical clustering using UPGMA. By                        features are provided for a particular element.
showing the results of the following two experiments, it is                4) If the same design pattern is deployed in multiple
shown that the obtained clusters were composed of compo-                       locations in a software system, you can see that there
nents with strong similarity.                                                  are multiple components that have a specific role on the
   1      Can we say that the components in the resulting                      design pattern in the software. We believe that using the
          cluster are strongly related components?                             same design pattern has a common intention, so these
          Initially, we prepared several situations that we can                components are considered to be comprehensible to
          imagine when the roles and objectives of the com-                    understand a part of software design. In addition, these
          ponents are similar. Considering some conditions                     components sometimes share similar code fragments to
          that similar components satisfy, we treated them as                  perform the common processing.
          criteria to show similarity. For each criterion, we              In such cases, we thought that the components used by
          examined how many components in the cluster meets             these components had a commonality. Besides, the similarity
          the criteria. It corresponded to the accuracy. Then,          of components may be confirmed by the similarity of package
          we examined how many similar components exist                 classification and their names. Based on those, seven criteria
          outside the cluster. It corresponded to the recall.           were set to confirm that the two components are similar, and
   2      What kind of clusters have been obtained?                     they were used for calculating the precision and the recall.
          For some clusters, we introduced the collection of               C1      The two components were derived from the same
          components in the clusters and the role of each                          class or implemented the same interface.



     Copyright © 2018 for this paper by its authors.               62
                    6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

  C2      The two components belonged to the same package.                   number of components in the cluster. In cluster #1-#7, the
  C3      The two components used the same word as a part                    high percentage was shown on most criteria, and you can
          of the name.                                                       confirm strong relationships among components in the clusters.
  C4      The two components shared similar code fragments.                  For these components, the structure, location, and origin of
  C5      When comparing the components used by the two                      the name were very similar and provides similar methods,
          components, the proportion of matching components                  so these components can be recognized as functional groups.
          was over 50%.                                                      Some of these components shared similar code fragments,
  C6      The two components were thought to constitute a                    others do not. In experiment 2, we detailed the contents of
          part of a specific functional group.                               these clusters. On the other hand, we could not confirm any
  C7      There were several methods with the same signature                 strong relationship between the components in cluster #8.
          in the two components.                                             The cluster #8 consisted of two components, one was the
                                                                             delegation component for the music playlist and the other was
B. About precision and recall                                                the component to search music files. Since the two components
   In order to calculate the precision, we randomly extracted a              used the Playlist and Playlistitem in common, there was
component from each cluster and treated it as component A.                   a common point that the two components manipulate the
Next, at the given criterion, components related to component                playlist. However, we could not confirm any relevance other
A were searched from the components in the cluster. We                       than that.
calculated the number of related components in the cluster
divided by the number of components in the cluster and we                                                TABLE II
regarded it as the precision for the criteria. If there is no related                           R ECALL FOR JL G UI C LUSTERS
component without component A, it is represented by ×.                                 NC    C1     C2     C3     C4     C5      C6     C7
   The recall was calculated only for items where related                         #1    5    4/4    4/6    4/5    ×      5/5    5/5    4/5
components existed. At the given criteria, components related                     #2    4    4/5    4/7    4/7    3/8    4/8    4/7    4/7
                                                                                  #3    9    4/8   7/21   7/10    ×     7/12    6/10   8/10
to component A were searched from all components in the
                                                                                  #4    4    4/8   4/10   4/10    4/4   4/10    4/9    4/13
software. We calculated the number of related components                          #5    5    4/8   5/10   5/10    3/3   5/10    5/9    5/13
in the cluster divided by the number of related components                        #6    3    3/8    ×      3/5    3/7    3/6    3/5    3/3
in the software and we regarded it as the recall for the                          #7    4    ×      2/4    2/6    ×     4/17    3/7     ×
                                                                                  #8    2    ×      ×      ×      ×      ×       ×      ×
criteria. If there were no related components in the cluster
without component A, we express it by ×. For both cases, the
component A was also counted for the number of components.                      Next, the Table I is a summary of the recall for each criterion
                                                                             for each cluster. For each cluster, the table shows the number
C. Experiment 1 : Can we say that the components in the
                                                                             of components in the cluster (NC) and the recall for each
resulting cluster are strongly related components?
                                                                             criterion, represented by the number of similar components
   1) Overview: For 70 jlGui components, we performed                        in the cluster and the number of similar components in the
the hierarchical clustering based on UPGMA. Figure 5 is a                    software. From the result, we confirmed that most of the
dendrogram of the analysis, and we cut the dendrogram by a                   related components were collected into clusters #1. Clusters
red line on Figure 5 and got the eight sets of components                    #4 and #5 seemed to be integrated because the correct set of
joining under the red line as clusters. These clusters were                  components were almost identical for them. Approximately
numbered as shown in Figure 5, and we investigated the                       half of the related components were collected into clusters #2,
precision and recall for each criterion for each cluster.                    #3 and #6. There was no big difference between the criteria,
                                                                             and the overall trend was high. Overall, we found that many
                              TABLE I                                        of the components considered to be similar were included in
                T HE P RECISION FOR JL G UI C LUSTERS
                                                                             the clusters. However, as in the case of #7 and #8, there were
              NC    C1    C2    C3    C4     C5    C6    C7                  cases where no strong association was found.
         #1    5    4/5   4/5   4/5   ×      5/5   5/5   4/5
         #2    4    4/4   4/4   4/4   3/4    4/4   4/4   4/4                 D. Experiment 2 : What kind of clusters have been obtained?
         #3    9    4/9   7/9   7/9   ×      7/9   6/9   8/9
                                                                                As a case study, we will explain components in the cluster
         #4    4    4/4   4/4   4/4   4/4    4/4   4/4   4/4
         #5    5    4/5   5/5   5/5   3/5    5/5   5/5   5/5                 #1-#7 of the Figure 5. First, we introduced the components
         #6    3    3/3   ×     3/3   3/3    3/3   3/3   3/3                 obtained by the red lines in the Figure 5, and explained about
         #7    4    ×     2/4   2/4   ×      4/4   3/4   ×                   components that were commonly used. We also introduced
         #8    2    ×     ×     ×     ×      ×     ×     ×
                                                                             which components were newly added when moving the line
                                                                             upward. Through these introductions, we will show that the
  The Table I is a summary of the precision. For each cluster,               results classified by this method are realistic.
the table shows the number of components in the cluster                         1) Cluster #1: 5 components: Four of the five components
(NC) and the precision for each criterion, represented by                    were components such as OggVorbisInfo, which handled
the number of similar components in the cluster and the                      information for several compression types, such as Ogg Voris,



       Copyright © 2018 for this paper by its authors.                  63
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)




                                                               ٬                                                   ٬
                                                                                                        ٬



                                                                                                             ٬
                                                                                                   ٬


                                                                     ٬
                                                 ٬‫ڭ‬                                           ٬



                                                Fig. 5. A Dendrogram for jlGui (UPGMA)



Mpeg, Flac, APE etc. and these were derived from TagInfo.                 components were not used in common. If you move the red
The fifth component was EmptyDialog. This also used Tag-                  line upward, cluster #4 and #5 were integrated. Components in
Info, and was a part of the dialog related components. These              Cluster #4 only used PreferenceItem, not other components.
five components were clustered because they used TagInfo                     5) Cluster #5: 5 components: As in the case of cluster #4,
in common. And any other components were not used in                      five components were components like Preferences, which
common. If you move the red line upward, Playlistitem                     handled the preferences. These components were also derived
representing each playlist item will be joined to the cluster.            from PreferenceItem, and these components were clustered
   2) Cluster #2: 4 components: Four components were com-                 because these components used PreferenceItem. And any
ponents such as OggVorbisDialog, which handled dialogs for                other components were not used in common. If you move
several compression types, such as Ogg Voris, Mpeg, Flac,                 the red line upward, cluster #4 and #5 were integrated.
APE etc. and these were derived from TagInfoDialog, and                   Components in Cluster #5 used PreferenceItem and several
used a corresponding component in Cluster #1. These four                  other components and this was a difference.
components were clustered because they used TagInfoDialog                    6) Cluster #6: 3 components: Three components were
in common. And any other components were not used in                      components that realized user interfaces, such as PlayerUI,
common. If you move the red line upward, TagInfoFactory,                  PlaylistUI and EqualizerUI. These components were clus-
that was a factory class for TagInfo and TagInfoDialog joined             tered because these components used 8-10 GUI components in
to the cluster #2, and finally, cluster #1 and #2 were integrated.        common, including Loader, Skin and components in Cluster
   3) Cluster #3: 9 components: Nine components were sev-                 #3. If you move the red line upward, Skin and Standalone-
eral types of GUI components, such as icons, bars, sliders,               Player joined to the cluster #6, because these components
popups, buttons, and so on. These components were derived                 managed a core feature using several GUI components.
from the swing components in the standard library for GUI.                   7) Cluster #7: 4 components: First, BasePlaylist and
These components were clustered because they used Absolute-               PlaylistFactory joined the cluster #7. These two components
Constraints in common. And any other components were not                  used PlayList and Config, and realized playlist-related fea-
used in common. AbsoluteConstraints was a component for                   tures. After that, SkinLoader and FileUtil joined the cluster
managing information about size and position on the screen.               #7. The latter two components didn’t use PlayList, however,
There were 15 components using AbsoluteConstraints, and                   they used Config and other classes in common. If you move
they were scattered across Cluster #3, #6 and #8. If you                  the red line upward, components using Config joined to the
move the red line upward, ActiveJLabel joined the cluster                 cluster. There were 15 components using Config, which were
#3 because it also used AbsoluteConstraints.                              scattered across Cluster #5, #6 and #7.
   4) Cluster #4: 4 components: Four components were com-                    In this way, most of the components in the cluster were
ponents like VisualizationPreference, which handled the pref-             functionally related components, and these components were
erences of some items. These components were derived from                 divided into two large categories. The components of the first
PreferenceItem, and these components were clustered be-                   category had a simple structure and a simple use-relation and
cause these components used PreferenceItem. And any other                 realized a single function by derivation. The purpose of these



      Copyright © 2018 for this paper by its authors.                64
                    6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

components was consistent with the purpose of commonly                    Classification would be affected by how use-relations to the
used components. The components of the second category                    external components are treated in the calculation of similarity.
used several components and realized multiple functions. The              So we would like to confirm whether consideration of external
purposes of the commonly used components were integrated                  components is effective or not, and propose a refined method.
and it becomes the purposes of the components. When we have                  Moreover, for Cluster #1 and #2, and for #4 and #5, an
to realize another screen for the software, these components              integrated cluster may be appropriate categorization when we
might be useful as examples. In this way, understanding the               consider functional groups in a broad sense. In some cases,
commonly-using components was essential to understand the                 the result would be improved by moving the threshold height
roles of the components in the cluster.                                   up and increasing the size of the cluster. On the other hand,
   Table III lists the jlGui components in order of the number            in other cases, the result would be improved by moving the
of components that use the component. Most of the compo-                  threshold height down and selecting components carefully. We
nents in the list were mentioned in our case study, so we                 would like to confirm the appropriate threshold trend.
thought the classification result was appropriate as a whole.
                                                                          C. Usages of the Proposed Method
                           TABLE III
   T HE N UMBER OF C OMPONENTS U SING E ACH JL G UI C OMPONENT
                                                                             First of all, as a method of using the proposed method, we
                                                                          would like to use it as a method for the efficient grasp of
                          Components                       Used by        components and inside of the software. Next, we also believe
     javazoom.jlgui.player.amp.skin.AbsoluteConstraints      15           that some contributions can be made to maintenance work. For
              javazoom.jlgui.player.amp.util.Config          15           example, we consider that it can also be used for checking for
               javazoom.jlgui.player.amp.PlayerUI            10           omissions of correction in maintenance work. As a method
             javazoom.jlgui.player.amp.tag.TagInfo           10
       javazoom.jlgui.player.amp.util.ui.PreferenceItem       9
                                                                          of assisting such checking work, you may use the commit
        javazoom.jlgui.player.amp.tag.ui.TagInfoDialog        7           history or similar code fragments, and we expect the same
           javazoom.jlgui.player.amp.playlist.Playlist        6           effect as those. Moreover, we think that it is effective to
         javazoom.jlgui.player.amp.playlist.PlaylistItem      6           suggest components with similar use-relationships, as refer-
               javazoom.jlgui.player.amp.skin.Skin            6
                                                                          ence examples, when adding new components to implement
                                                                          additional features in maintenance. Finally, we consider that
                       V. C ONSIDERATION                                  our method supplies a good viewpoint to express the software’s
                                                                          evolutionary process. By comparing the dendrograms of each
A. Evaluation of Experiments                                              version in the middle of development, you can grasp which
   We classified jlGui components based on the similarity of              part of the software is growing and can grasp the direction of
the components used by each component and confirmed that                  evolution and take countermeasures for it.
many of the components in the obtained clusters showed strong
relationships with each other. It is effective to understand the          D. Comparison with Other Existing Methods
role of the components in the cluster based on the role of
                                                                             Currently, a comparison is not done as an experiment, and
commonly used components and this approach is effective as
                                                                          comparison with existing classification method is a problem
a systematic method to efficiently understand the inside of the
                                                                          to be tackled in future. In the feeling of calculating the
software. As in the situation that components using Absolute-
                                                                          recalls of Experiment 1, our method seems to perform better
Constraints distributed in several clusters, components that
                                                                          classification within the scope of analysis between classes,
use the same components may spread across multiple clusters.
                                                                          compared to the result of classification based on package
In that case, we think that the use-relations may be different
                                                                          hierarchy and based on similar code fragments. In the case
depending on the role difference, so these components are
                                                                          of package-based classification, classification is done with a
classified into different clusters.
                                                                          larger granularity, so a certain package contains several types
   However, there are cases where the roles of commonly
                                                                          of components. Our method can classify them components
used components do not directly indicate the role of the
                                                                          based on their purposes. In the case of code clone-based
components, and the roles of commonly used components are
                                                                          analysis, as the modification after copying increases, a similar
ambiguous. There seem to be some useless cases in this way,
                                                                          code fragment becomes a gaped clone and it becomes difficult
but we consider that our method is effective as a whole.
                                                                          to grasp. Descriptions using other classes remain at the root,
B. Refining Approaches for the Proposed Method                            so the method is often highly resistant to modification.
   In the case of jlGui, there were about 20 components which
did not use any classes in jlGui, and these were not classified.          E. Threats to Validity
In the maximum case, about 30% to 50% of the components                      In this study, we applied our method to only one software
may not be classified. Some components were realized by not               system, so it is necessary to discuss generalization by applying
using components in jlGui but using only library components.              to various software systems. The main concern is whether the
So the number of classified components would be increased                 same tendencies can be confirmed in other domain software
if we also consider use relations to the external components.             and whether similar tendencies can be confirmed in different



     Copyright © 2018 for this paper by its authors.                 65
                   6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018)

software systems of different sizes. The purposes of compo-             the role of components in the cluster. We confirmed that our
nents in one software system are considered consistent, so it           method is effective for classifying components, and is useful
may have derived good results. In order to work well, a certain         for understanding them. For future works, we would like to
degree of commonality is required for target components, so it          compare properties with other classification methods including
might not work well if the target of the analysis was multiple          stochastic block model, confirm its generality, and improve
software systems. In this regard, the classifications may be            accuracy. Moreover, we would like to realize supporting tools
effective if components generated by a certain product line             for software understanding and for maintenance activities.
were analyzed. There is a possibility that the approach of the
                                                                                                 ACKNOWLEDGMENT
class definition in Java and our classification method have
successfully engaged. Whether our method works well for                    This research is supported by Nanzan University Pache
software written in other programming languages is unknown.             Research Subsidy I-A-2 for the 2018 academic year. We would
                                                                        like to thank H. Kagami, N. Mase, and M. Sawai. This research
F. Related Works                                                        is based on their bachelor thesis [15]. We were supervisors of
   From the viewpoint of use-relation analysis, architecture            it and added experiments and evaluations.
recovery is an active research field. Zhang represented the                                           R EFERENCES
object-oriented system using a class graph and proposed a
                                                                         [1] B. Dagenais, H. Ossher, R. K. E. Bellamy, M. P. Robillard, and J. P.
clustering algorithm to restore the high-level software archi-               de Vries, “Moving into a new software project landscape,” in Pro-
tecture [9]. Constantinou represented the hierarchical relation-             ceedings of the 32Nd ACM/IEEE International Conference on Software
ships between components as a D-layer and examined the                       Engineering (ICSE2010), 2010, pp. 275–284.
                                                                         [2] K. Inoue, R. Yokomori, T. Yamamoto, M. Matsushita, and S. Kusumoto,
relationship between the architecture layer and the design                   “Ranking significance of software components based on use relations,”
metrics [10]. Kula et al. proposed the Software Universe Graph               IEEE Transactions on Software Engineering, vol. 31, no. 3, pp. 213–225,
(SUG) Model as a structured abstraction of the evolution of                  2005.
                                                                         [3] T. Kamiya, S. Kusumoto, and K. Inoue, “Ccfinder: A multi-linguistic
software systems and their library dependencies over time                    token-based code clone detection system for large scale source code,”
and demonstrated its usefulness by showing several views of                  IEEE Transactions. Software Engineering, vol. 28, no. 7, pp. 654–670,
visualizations [11].                                                         2002.
                                                                         [4] H. Zhong, T. Xie, L. Zhang, J. Pei, and H. Mei, “Mapo: Mining and
   There are many studies to extract patterns from the call                  recommending api usage patterns,” in proceedings of the 23rd European
history of API on the source code. Zhong at el. proposed a                   Conference on Object-Oriented Programming (ECOOP 2009), 2009, pp.
system to extract call sequences of API from open source                     318–343.
                                                                         [5] Z. Li and Y. Zhou, “Pr-miner: automatically extracting implicit pro-
software’s repositories [4], and showed that the system is                   gramming rules and detecting violations in large software code,” in
useful for learning how to use the library’s API. Li at el.                  proceedings of the 10th European software engineering conference,
proposed a system to extract function call sequences from                    2005, pp. 306–315.
                                                                         [6] Classycle, http://classycle.sourceforge.net/.
C codes as programming rules, and the system provided a                  [7] C. Krueger, “Software reuse,” ACM Computing Surveys, vol. 24, no. 2,
mechanism for detecting a portion that violates the rules [5].               pp. 131–183, 1992.
Our method differs from these methods in that the granularity            [8] K. Kobori, T. Yamamoto, M. Matsushita, and K. Inouee, “Classification
                                                                             of java programs in spars-j,” in proceedings of the International Work-
of our analysis is between components. We believe that it                    shop on Community-Driven Evolution of Knowledge Artifacts, 2003.
can extract informative information even in component-based              [9] Q. Zhangs, D. Qiu, Q. Tian, and L. Sun, “Object-oriented software
approach.                                                                    architecture recovery using a new hybrid clustering algorithm,” in
                                                                             Proceedings of the Seventh International Conference on Fuzzy Systems
   As a method to extract similar code fragments, developers                 and Knowledge Discovery, 2010, pp. 2546–2550.
can use code clone analysis. Mondal analyzed the stability of           [10] E. Constantinou, G. Kakarontzas, and I. Stamelos, “Towards open
several kinds of cloned codes, and they reported that Type3                  source software system architecture recovery using design metrics,” in
                                                                             Proceedings of the 15th Panhellenic Conference on Informatics, 2011,
clones, known as gapped clones, have higher stability than                   pp. 166–170.
other clones [12]. Antoniol analyzed the evolution of code              [11] R. G. Kula, C. D. Roover, D. M. German, T. Ishio, and K. Inoue, “A gen-
duplications in 19 versions of the Linux kernel [13]. Yoshida                eralized model for visualizing library popularity, adoption, and diffusion
                                                                             within a software ecosystem,” in 25th IEEE International Conference on
et al. proposed an approach to support clone refactoring based               Software Analysis, Evolution and Reengineering (SANER2018), 2018,
on code dependency among code clones [14].                                   pp. 288–299.
                                                                        [12] M. Mondal, C. Roy, M. Rahman, R. Saha, J. Krinke, and K. Schneider,
                      VI. C ONCLUSION                                        “Comparative stability of cloned and non-cloned code: An empirical
                                                                             study,” in Proceedings of the 27th ACM Symposium on Applied Com-
   In this paper, we proposed a classification method for                    puting, 2012, pp. 1227–1234.
software components based on similarity of use relation.                [13] G. Antoniol, U. Villano, E. Merio, and M. D. Penta, “Analyzing cloning
                                                                             evolution in the linux kernel,” Information and Software Technology,
For each pair of components, the distance was calculated                     vol. 44, no. 13, pp. 755–765, 2002.
from the coincidence of their using components, and the                 [14] N. Yoshida, Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue, “On refac-
hierarchical clustering is performed based on the distances.                 toring support based on code clone dependency relation,” in Proceedings
                                                                             of the 11th IEEE International Software Metrics Symposium, 2005, pp.
From the result of experiments, we confirmed that many                       16:1–16:10.
components of the cluster were strongly similar with each               [15] H. Kagami, N. Mase, and M. Sawai, “Evaluation of software parts classi
other from the viewpoint of their functions, and the purpose                 cation method based on commonality of parts to be used and use source,”
                                                                             in Bachelor Thesis of Nanzan University, 2018, (In Japanese).
of commonly used components greatly helps to understand



     Copyright © 2018 for this paper by its authors.               66