6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) Use-Relationship Based Classification for Software Components Reishi Yokomori Norihiro Yoshida Masami Noro Katsuro Inoue Nanzan University Nagoya University Nanzan University Osaka University Nagoya, Japan Nagoya, Japan Nagoya, Japan Osaka, Japan yokomori@se.nanzan-u.ac.jp yoshida@ertl.jp yoshie@nanzan-u.ac.jp inoue@ist.osaka-u.ac.jp Abstract—In recent years, the maintenance period of the remember its functions and role. This type of rework needs software system is increasing. The size of the software system time, however, it makes their understanding deeper. However, has grown, and the number of classes and the relationship as the number of files and classes increases through a long between classes are also increasingly complicated. If we can categorize software components based on information such as period of maintenance, the frequency of the rework in under- functions and roles, we believe that these classified components standing increases, and the content to be understood becomes can be understood together, and are useful for understanding also complicated. The effort required for understanding the the system. In this paper, we proposed a classification method internal structure by developers participating later will be for software components based on similarity of use relation. For much greater than the effort expected by the original developer. each component, a set of components used by the component was analyzed. And then, for each pair of components, the distance was By analyzing the source code by the statement unit, a lot calculated from the coincidence of the two sets. A distance matrix of methods for extracting code clone [3], extracting coding- was created and components were classified by hierarchical pattern [4], and extracting coding-rule [5] have been proposed. cluster analysis. We applied this method to jlGui consisting of As the scale of the software system increases, the relationship 70 components. 8 clusters of 36 components were extracted from between statements becomes complicated, and the analysis the 70 components. Characteristics of the extracted clusters were evaluated, and the content of each cluster was introduced as a cost gradually increases, so we would like to propose a method case study. In 7 clusters out of the 8 clusters, components of the with a coarser granularity, based on the anlysis by the class cluster were strongly similar with each other from the viewpoint unit. of their functions. Through these experiments, we confirmed that In the evaluation experiment for the component retrieval our method is effective for classifying components of the target system SPARS-J [2], it was effective to guide the component software, and is useful for understanding them. Index Terms—component, use-relations, hierarchical cluster to be browsed next based on its use relationships, and it was analysis popular among developers. If we can categorize software com- ponents based on their functions and roles by obtaining infor- I. I NTRODUCTION mation from analysis of use-relationships, we can understand these classified components together and this classification In recent years, the maintenance period of the software would be useful for understanding the target software. For system has increased, by also experiencing the entry of new example, when realizing a specific feature, it is often necessary developers. In order to participate in the maintenance of such to use certain components together. We compare use-relations a software system, it is the first important task to become ac- of two components (-A, -B), that means components that customed to the environment surrounding the software system component-A uses and components that component-B uses are [1], and it is necessary to understand the configuration of the compared. In the case where there are many common compo- target software. In the process of understanding, the developer nents in the two component set, we consider that component-A first understands the diagram and documents representing the and -B are supposed to implement the same specific function composition of the software and then reads the source code of or very similar function. We think that understanding of these each component of the software in order to grasp the details components can be performed efficiently by also considering of each subsystem and the overall image. components that are commonly used by these components. By In reading the source code, the developer browses the source being able to understand efficiently, we believe that developers code of another component one by one on an editor or a devote much effort to other work and can carry out with high source code browser like SPARS-J [2], by referring to the quality. As a result, we think it can improve productivity and package hierarchy and the name of another class appearing in quality of maintenance. the source code, and so on. In this way, the source codes In this paper, we proposed a classification method for of a lot of components are read strategically in order to software components based on similarity of use relation. In understand efficiently. In fact, when the developer encounters a this method, for each component, a set of components used component similar to the components the developer confirmed by the component was analyzed. And then, for each pair of once, it often happens that reworking occurs in order to components, the distance between the two components was Copyright © 2018 for this paper by its authors. 59 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) calculated from the coincidence of the two sets. In this way, Graph [2]. In the component graph (V, E) , a node v ∈ V a distance matrix of the target software components was cre- represents a software component, and a directed edge exy ∈ E ated and software components were classified by hierarchical from node x to y represents a use relation meaning that cluster analysis. component x uses component y. This use relation indicates We implemented a prototype for the proposed method, and that a component realizes its own function by utilizing features its analysis target is a software system described in Java. The of other components. If you need to understand how features prototype system treats the Java source files as a component in a particular component are specifically implemented, you and excludes components defined outside the software, such can use use-relations to understand which parts of the software as components in libraries. Our system generates a distance you need to reference together. matrix from the analysis result of Classycle’s analyzer [6], and hierarchical cluster analysis is executed by R. We conducted B. Classification methods using metrics that represent the two evaluation experiments. In the first experiment, we applied characteristics of software components it to the actual software system of open source project and For a lot of components extracted from multiple software evaluated the similarity of the components in the obtained systems, Kobori suggested a method for extracting the same cluster from several viewpoints. As a result, many of the or similar components as the target components [8]. In this components in the cluster showed strong similarity in terms of method, metrics representing the characteristics of software their functionality, and more than half of similar components components are obtained from the source code of each compo- were joined as clusters. In the second experiment, we intro- nent. These metrics are the number of methods and the number duced how the components in the clusters were combined, as of tokens that represent the scale of the component, and a case study. Some clusters were composed of a collection the cyclomatic complexity that represents the complexity of of components that have a simple structure and a simple use control structure of the component, and so on. In the method, relationship, and some clusters were composed of a collection components that have the same or similar values in these of components that realize multiple functions by using several metrics are extracted. Through experiments, they showed that other components. In the case study, knowing the components extracted components are mainly copied components or copied commonly used by the components in the cluster was effective and modified components from the original ones. This method in understanding the contents of the components in the cluster. is a method assumed for using in the software component Through these experiments, we introduce that the obtained retrieval system. In the software component retrieval system, clusters have sufficient accuracy to be used for software it collects a lot of software system. The method could help to understanding, and our method is effective to classify and extract the almost identical (copied or copied and modified) understand the components of the target software. components across a lot of software systems. Our method is to The main contributions of our research are as follows: classify components in one software system, based on the fact • We have hypothesized that “When two components use that components are using a lot of common components. The many common components, these two components are purpose of our method is to obtain a set of components using implemented for the same or similar purpose”, and con- similar functions and to obtain a set of components whose role firmed it by experiment. and features in the software are similar. • We showed that the proposed method can combine most III. P ROPOSED M ETHOD of the functionally related components as clusters. • We showed that the obtained clusters have sufficient When implementing a feature in a new component, we also accuracy for software understanding, and our method is use the features realized by other existing components as nec- effective to classify and understand the components of essary. From this, it can be considered that which components the target software. are used by a certain component can also be information for • We showed that the components in the cluster could be estimating the features realized in the component. In some used as examples when adding new components that use cases, when using a certain feature, it is necessary to use the same components. some other components together. In such a case, there is a commonality in the set of using components among the Section II introduces the component graph as a background. components that use the feature. Based on the fact that the Section III introduces our method and implementation. Section using components are similar, we thought that these compo- IV conducts two experiments. Finally, Section V discusses the nents are similar in function and role. Therefore, we make a results and improved ways and introduces related works. hypothesis that “When two components use many common II. BACKGROUND components, these two components are implemented for the same or similar purpose”, and confirm it by experiments. We A. Component Graph and Use Relation between Components propose a method for classifying components based on the In general, a component is a modular part of a system that commonality of components that are used by the components. encapsulates its content and whose manifestation is replace- We thought that this method can extract several components able within its environment [7]. We model software systems groups that are recognizable as collections of components that by using a weighted directed graph, called a Component are similar in function or role. Through evaluation experiments Copyright © 2018 for this paper by its authors. 60 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) to an open source project, we introduce characteristics of the This value has a range of 0 to 1, and the higher this value, obtained component groups and components in the obtained the more likely that LA and LB are similar. And then, we component groups have a strong relationship. define the distance of components A and B dist(LA , LB ) as following; (For representation, the distance was 10 timed, and A. Implementation of the analysis tool it does not have other special meaning.) We implemented a classification tool for Java source files of a target software system. The system constructs a component dist(LA , LB ) = (1 − sim(LA , LB )) ∗ 10 graph for Java source files of the target software system and produces a distance matrix used by the hierarchical cluster C. About Linkage Criterion for Hierarchical Clustering analysis on R. For extracting use relations, we used Classycle’s In the hierarchical clustering, there are candidates to com- analyzer [6], that is an analyzing tool for java class and pack- pute the distance for a cluster consists of multiple objects, so age dependencies. In the analysis of the Classycle’s analyzer, the user needs to decide on the linkage criterion to use. For we get the following as use relations: inheritance of class, our study, we decided to use the linkage criterions provided by declaration of variables, a creation of instances, method calls, R, since it was intended to create a prototype of an analytical and reference of fields, and inner classes and use relations environment. The linkage criterions are roughly divided into about inner classes are merged into the major class in the two methods: methods to recalculate a distance directly from Java source files. Figure 1 is an overview of the system, and distances of multiple objects, such as a single linkage, a the following is the analysis procedure: complete linkage, and UPGMA, and methods to recalculate a distance from the center or the center of gravity of the džƚƌĂĐƚĞĚ cluster, such as Ward, and centroid method. From them, we dĂƌŐĞƚ ^ƚĞƉϭ hƐĞ ^ƚĞƉϲ 'ƌŽƵƉƐŽĨ ƐLJƐƚĞŵ͛Ɛ ZĞůĂƚŝŽŶƐ ^ŝŵŝůĂƌ decided to use the former methods. In general clustering, it is ĞŶĚƌŽŐƌĂŵ ͘ũĂƌ&ŝůĞ ůĂƐƐLJĐůĞ ;͘džŵůͿ ŽŵƉŽŶĞŶƚƐ considered that the latter methods can be utilized because a set ^ƚĞƉϮ Z of vectors is used as input and hierarchical clustering is done ^ƚĞƉϱ ^ĞƚƐŽĨ by calculating the distance between vectors. However, in our ŽŵƉŽŶĞŶƚƐ ŽŵƉŽŶĞŶƚ ƚŚĂƚĂƌĞƵƐĞĚ ŝƐƚĂŶĐĞ analysis, only the distances between components are given as KƵƌƚŽŽů ŵĂƚƌŝdž ϬϬ͘ϱ͙͘Ϭ͘ϯ 'ƌĂƉŚ ďLJĞĂĐŚ ;͘ĐƐǀͿ Ϭ͘ϱϬ͙͘Ϭ͘Ϯ input, so it is meaningless to consider the center of the cluster. ^ƚĞƉϯ ĐŽŵƉŽŶĞŶƚ ^ƚĞƉϰ ͙͙͙͘͘͘͘͘͘͘ KƚŚĞƌƚŽŽůƐ Ϭ͘ϯϬ͘Ϯ͙͘Ϭ We introduce three types of results, based on the single link- ŽƌďLJŚĂŶĚ age method, the complete linkage method, and the UPGMA, as a preliminary experiment, and examine which linkage criterion should be used. The target system is GoGui that is a graphical Fig. 1. Overview of the Proposed Method interface program for playing Go and is composed of 181 Java source files. Figure 2 is a dendrogram of the hierarchical 1. Classycle’s analyzer [6] provides the use relations of clustering using the single linkage method. In the case of the the target system. single linkage method, if a component has a similarity to any 2. Build a component graph based on the use-relations. component in the cluster, the component is also merged into 3. For each component, a set of components used by the cluster. So a very large cluster is easily formed, however, the component is obtained from the graph. similarities between components in a cluster would not be 4. For each pair of components, a distance between strong. Figure 3 is a dendrogram of the hierarchical clustering the two components is calculated from a degree of using the complete linkage method. In the case of the complete coincidence of the two sets. A distance matrix for linkage method, a component is merged into the cluster only the target software components is produced. if the component has similarities with all components in the 5. By using the distance matrix, hierarchical cluster cluster. A lot of small clusters are formed, however, similarities analysis is performed on R. between components in each cluster would be strong. 6. The dendrogram is obtained by R. We decide a Figure 4 is a dendrogram of the hierarchical clustering using threshold value and clusters under the threshold value the UPGMA. We can confirm a lot of small clusters as in are recognized as groups of similar components. the case of Figure 3, and we can also confirm that these clusters are merged into a large cluster as in the case of Figure B. Distance between two Components 2. In this way, the dendrogram obtained by UPGMA is a Consider two components A and B. LA represents a set of balanced result. By adjusting a threshold height for getting components that are used by A, and LB represents a set of clusters, we can obtain very similar clusters obtained by the components that are used by B, respectively. other two approaches. In this study, we use the UPGMA as a linkage criterion in the hierarchical clustering. The problem of At first, we define the similarity of components A and B deciding the threshold height for getting cluster is considered sim(LA , LB ) by using the Jaccard coefficient. to be a problem to be tackled in the future, and we will show |LA ∩ LB | how the results change by the upper or lower thresholds in sim(LA , LB ) = the experiments. As another linkage criterion, we can also |LA ∪ LB | Copyright © 2018 for this paper by its authors. 61 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) Fig. 4. A Dendrogram of GoGui Components (UPGMA) Fig. 2. A Dendrogram of GoGui Components (SLM) component, as a case study. We will show the results classified by our method were realistic ones. A. About Criterion for similarity At first, we prepared several situations that we can imagine when the roles and objectives of the components are similar. 1) Many of the components that have common process- ing to achieve similar functionality are derived from the same class or implement the same interface. The processing performed by these components and the APIs provided by them is often quite similar. These components share similar code fragments, so they would Fig. 3. A Dendrogram of GoGui Components (CLM) be comprehensible at once. 2) Components that realize different objectives for the same target should be considered as a group associated with consider a method to recalculate a group of using components the target. Common processing would be performed for each cluster and recalculate the distance between clusters. as pre- or post-processing. So these components share This also would be a problem to be tackled in the future, similar code fragments and would be comprehensible at however, we consider that it contributes to the improvement once to understand how to use the target. of accuracy and the overall trend does not change significantly. 3) Components that are considered to realize a large fea- ture are also considered as a functional group when IV. E XPERIMENTS observing from a vantage point. The targets of these For the experiments, we applied our method to jlGui Ver.3.0. components are often slightly different, but it is possible It is a graphical music player which supports Java Sound to verify the relationship between them. These compo- capabilities and consists of 70 Java source files. We obtained nents will help you to get a rough understanding of what a dendrogram of hierarchical clustering using UPGMA. By features are provided for a particular element. showing the results of the following two experiments, it is 4) If the same design pattern is deployed in multiple shown that the obtained clusters were composed of compo- locations in a software system, you can see that there nents with strong similarity. are multiple components that have a specific role on the 1 Can we say that the components in the resulting design pattern in the software. We believe that using the cluster are strongly related components? same design pattern has a common intention, so these Initially, we prepared several situations that we can components are considered to be comprehensible to imagine when the roles and objectives of the com- understand a part of software design. In addition, these ponents are similar. Considering some conditions components sometimes share similar code fragments to that similar components satisfy, we treated them as perform the common processing. criteria to show similarity. For each criterion, we In such cases, we thought that the components used by examined how many components in the cluster meets these components had a commonality. Besides, the similarity the criteria. It corresponded to the accuracy. Then, of components may be confirmed by the similarity of package we examined how many similar components exist classification and their names. Based on those, seven criteria outside the cluster. It corresponded to the recall. were set to confirm that the two components are similar, and 2 What kind of clusters have been obtained? they were used for calculating the precision and the recall. For some clusters, we introduced the collection of C1 The two components were derived from the same components in the clusters and the role of each class or implemented the same interface. Copyright © 2018 for this paper by its authors. 62 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) C2 The two components belonged to the same package. number of components in the cluster. In cluster #1-#7, the C3 The two components used the same word as a part high percentage was shown on most criteria, and you can of the name. confirm strong relationships among components in the clusters. C4 The two components shared similar code fragments. For these components, the structure, location, and origin of C5 When comparing the components used by the two the name were very similar and provides similar methods, components, the proportion of matching components so these components can be recognized as functional groups. was over 50%. Some of these components shared similar code fragments, C6 The two components were thought to constitute a others do not. In experiment 2, we detailed the contents of part of a specific functional group. these clusters. On the other hand, we could not confirm any C7 There were several methods with the same signature strong relationship between the components in cluster #8. in the two components. The cluster #8 consisted of two components, one was the delegation component for the music playlist and the other was B. About precision and recall the component to search music files. Since the two components In order to calculate the precision, we randomly extracted a used the Playlist and Playlistitem in common, there was component from each cluster and treated it as component A. a common point that the two components manipulate the Next, at the given criterion, components related to component playlist. However, we could not confirm any relevance other A were searched from the components in the cluster. We than that. calculated the number of related components in the cluster divided by the number of components in the cluster and we TABLE II regarded it as the precision for the criteria. If there is no related R ECALL FOR JL G UI C LUSTERS component without component A, it is represented by ×. NC C1 C2 C3 C4 C5 C6 C7 The recall was calculated only for items where related #1 5 4/4 4/6 4/5 × 5/5 5/5 4/5 components existed. At the given criteria, components related #2 4 4/5 4/7 4/7 3/8 4/8 4/7 4/7 #3 9 4/8 7/21 7/10 × 7/12 6/10 8/10 to component A were searched from all components in the #4 4 4/8 4/10 4/10 4/4 4/10 4/9 4/13 software. We calculated the number of related components #5 5 4/8 5/10 5/10 3/3 5/10 5/9 5/13 in the cluster divided by the number of related components #6 3 3/8 × 3/5 3/7 3/6 3/5 3/3 in the software and we regarded it as the recall for the #7 4 × 2/4 2/6 × 4/17 3/7 × #8 2 × × × × × × × criteria. If there were no related components in the cluster without component A, we express it by ×. For both cases, the component A was also counted for the number of components. Next, the Table I is a summary of the recall for each criterion for each cluster. For each cluster, the table shows the number C. Experiment 1 : Can we say that the components in the of components in the cluster (NC) and the recall for each resulting cluster are strongly related components? criterion, represented by the number of similar components 1) Overview: For 70 jlGui components, we performed in the cluster and the number of similar components in the the hierarchical clustering based on UPGMA. Figure 5 is a software. From the result, we confirmed that most of the dendrogram of the analysis, and we cut the dendrogram by a related components were collected into clusters #1. Clusters red line on Figure 5 and got the eight sets of components #4 and #5 seemed to be integrated because the correct set of joining under the red line as clusters. These clusters were components were almost identical for them. Approximately numbered as shown in Figure 5, and we investigated the half of the related components were collected into clusters #2, precision and recall for each criterion for each cluster. #3 and #6. There was no big difference between the criteria, and the overall trend was high. Overall, we found that many TABLE I of the components considered to be similar were included in T HE P RECISION FOR JL G UI C LUSTERS the clusters. However, as in the case of #7 and #8, there were NC C1 C2 C3 C4 C5 C6 C7 cases where no strong association was found. #1 5 4/5 4/5 4/5 × 5/5 5/5 4/5 #2 4 4/4 4/4 4/4 3/4 4/4 4/4 4/4 D. Experiment 2 : What kind of clusters have been obtained? #3 9 4/9 7/9 7/9 × 7/9 6/9 8/9 As a case study, we will explain components in the cluster #4 4 4/4 4/4 4/4 4/4 4/4 4/4 4/4 #5 5 4/5 5/5 5/5 3/5 5/5 5/5 5/5 #1-#7 of the Figure 5. First, we introduced the components #6 3 3/3 × 3/3 3/3 3/3 3/3 3/3 obtained by the red lines in the Figure 5, and explained about #7 4 × 2/4 2/4 × 4/4 3/4 × components that were commonly used. We also introduced #8 2 × × × × × × × which components were newly added when moving the line upward. Through these introductions, we will show that the The Table I is a summary of the precision. For each cluster, results classified by this method are realistic. the table shows the number of components in the cluster 1) Cluster #1: 5 components: Four of the five components (NC) and the precision for each criterion, represented by were components such as OggVorbisInfo, which handled the number of similar components in the cluster and the information for several compression types, such as Ogg Voris, Copyright © 2018 for this paper by its authors. 63 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) ٬ ٬ ٬ ٬ ٬ ٬ ٬‫ڭ‬ ٬ Fig. 5. A Dendrogram for jlGui (UPGMA) Mpeg, Flac, APE etc. and these were derived from TagInfo. components were not used in common. If you move the red The fifth component was EmptyDialog. This also used Tag- line upward, cluster #4 and #5 were integrated. Components in Info, and was a part of the dialog related components. These Cluster #4 only used PreferenceItem, not other components. five components were clustered because they used TagInfo 5) Cluster #5: 5 components: As in the case of cluster #4, in common. And any other components were not used in five components were components like Preferences, which common. If you move the red line upward, Playlistitem handled the preferences. These components were also derived representing each playlist item will be joined to the cluster. from PreferenceItem, and these components were clustered 2) Cluster #2: 4 components: Four components were com- because these components used PreferenceItem. And any ponents such as OggVorbisDialog, which handled dialogs for other components were not used in common. If you move several compression types, such as Ogg Voris, Mpeg, Flac, the red line upward, cluster #4 and #5 were integrated. APE etc. and these were derived from TagInfoDialog, and Components in Cluster #5 used PreferenceItem and several used a corresponding component in Cluster #1. These four other components and this was a difference. components were clustered because they used TagInfoDialog 6) Cluster #6: 3 components: Three components were in common. And any other components were not used in components that realized user interfaces, such as PlayerUI, common. If you move the red line upward, TagInfoFactory, PlaylistUI and EqualizerUI. These components were clus- that was a factory class for TagInfo and TagInfoDialog joined tered because these components used 8-10 GUI components in to the cluster #2, and finally, cluster #1 and #2 were integrated. common, including Loader, Skin and components in Cluster 3) Cluster #3: 9 components: Nine components were sev- #3. If you move the red line upward, Skin and Standalone- eral types of GUI components, such as icons, bars, sliders, Player joined to the cluster #6, because these components popups, buttons, and so on. These components were derived managed a core feature using several GUI components. from the swing components in the standard library for GUI. 7) Cluster #7: 4 components: First, BasePlaylist and These components were clustered because they used Absolute- PlaylistFactory joined the cluster #7. These two components Constraints in common. And any other components were not used PlayList and Config, and realized playlist-related fea- used in common. AbsoluteConstraints was a component for tures. After that, SkinLoader and FileUtil joined the cluster managing information about size and position on the screen. #7. The latter two components didn’t use PlayList, however, There were 15 components using AbsoluteConstraints, and they used Config and other classes in common. If you move they were scattered across Cluster #3, #6 and #8. If you the red line upward, components using Config joined to the move the red line upward, ActiveJLabel joined the cluster cluster. There were 15 components using Config, which were #3 because it also used AbsoluteConstraints. scattered across Cluster #5, #6 and #7. 4) Cluster #4: 4 components: Four components were com- In this way, most of the components in the cluster were ponents like VisualizationPreference, which handled the pref- functionally related components, and these components were erences of some items. These components were derived from divided into two large categories. The components of the first PreferenceItem, and these components were clustered be- category had a simple structure and a simple use-relation and cause these components used PreferenceItem. And any other realized a single function by derivation. The purpose of these Copyright © 2018 for this paper by its authors. 64 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) components was consistent with the purpose of commonly Classification would be affected by how use-relations to the used components. The components of the second category external components are treated in the calculation of similarity. used several components and realized multiple functions. The So we would like to confirm whether consideration of external purposes of the commonly used components were integrated components is effective or not, and propose a refined method. and it becomes the purposes of the components. When we have Moreover, for Cluster #1 and #2, and for #4 and #5, an to realize another screen for the software, these components integrated cluster may be appropriate categorization when we might be useful as examples. In this way, understanding the consider functional groups in a broad sense. In some cases, commonly-using components was essential to understand the the result would be improved by moving the threshold height roles of the components in the cluster. up and increasing the size of the cluster. On the other hand, Table III lists the jlGui components in order of the number in other cases, the result would be improved by moving the of components that use the component. Most of the compo- threshold height down and selecting components carefully. We nents in the list were mentioned in our case study, so we would like to confirm the appropriate threshold trend. thought the classification result was appropriate as a whole. C. Usages of the Proposed Method TABLE III T HE N UMBER OF C OMPONENTS U SING E ACH JL G UI C OMPONENT First of all, as a method of using the proposed method, we would like to use it as a method for the efficient grasp of Components Used by components and inside of the software. Next, we also believe javazoom.jlgui.player.amp.skin.AbsoluteConstraints 15 that some contributions can be made to maintenance work. For javazoom.jlgui.player.amp.util.Config 15 example, we consider that it can also be used for checking for javazoom.jlgui.player.amp.PlayerUI 10 omissions of correction in maintenance work. As a method javazoom.jlgui.player.amp.tag.TagInfo 10 javazoom.jlgui.player.amp.util.ui.PreferenceItem 9 of assisting such checking work, you may use the commit javazoom.jlgui.player.amp.tag.ui.TagInfoDialog 7 history or similar code fragments, and we expect the same javazoom.jlgui.player.amp.playlist.Playlist 6 effect as those. Moreover, we think that it is effective to javazoom.jlgui.player.amp.playlist.PlaylistItem 6 suggest components with similar use-relationships, as refer- javazoom.jlgui.player.amp.skin.Skin 6 ence examples, when adding new components to implement additional features in maintenance. Finally, we consider that V. C ONSIDERATION our method supplies a good viewpoint to express the software’s evolutionary process. By comparing the dendrograms of each A. Evaluation of Experiments version in the middle of development, you can grasp which We classified jlGui components based on the similarity of part of the software is growing and can grasp the direction of the components used by each component and confirmed that evolution and take countermeasures for it. many of the components in the obtained clusters showed strong relationships with each other. It is effective to understand the D. Comparison with Other Existing Methods role of the components in the cluster based on the role of Currently, a comparison is not done as an experiment, and commonly used components and this approach is effective as comparison with existing classification method is a problem a systematic method to efficiently understand the inside of the to be tackled in future. In the feeling of calculating the software. As in the situation that components using Absolute- recalls of Experiment 1, our method seems to perform better Constraints distributed in several clusters, components that classification within the scope of analysis between classes, use the same components may spread across multiple clusters. compared to the result of classification based on package In that case, we think that the use-relations may be different hierarchy and based on similar code fragments. In the case depending on the role difference, so these components are of package-based classification, classification is done with a classified into different clusters. larger granularity, so a certain package contains several types However, there are cases where the roles of commonly of components. Our method can classify them components used components do not directly indicate the role of the based on their purposes. In the case of code clone-based components, and the roles of commonly used components are analysis, as the modification after copying increases, a similar ambiguous. There seem to be some useless cases in this way, code fragment becomes a gaped clone and it becomes difficult but we consider that our method is effective as a whole. to grasp. Descriptions using other classes remain at the root, B. Refining Approaches for the Proposed Method so the method is often highly resistant to modification. In the case of jlGui, there were about 20 components which did not use any classes in jlGui, and these were not classified. E. Threats to Validity In the maximum case, about 30% to 50% of the components In this study, we applied our method to only one software may not be classified. Some components were realized by not system, so it is necessary to discuss generalization by applying using components in jlGui but using only library components. to various software systems. The main concern is whether the So the number of classified components would be increased same tendencies can be confirmed in other domain software if we also consider use relations to the external components. and whether similar tendencies can be confirmed in different Copyright © 2018 for this paper by its authors. 65 6th International Workshop on Quantitative Approaches to Software Quality (QuASoQ 2018) software systems of different sizes. The purposes of compo- the role of components in the cluster. We confirmed that our nents in one software system are considered consistent, so it method is effective for classifying components, and is useful may have derived good results. In order to work well, a certain for understanding them. For future works, we would like to degree of commonality is required for target components, so it compare properties with other classification methods including might not work well if the target of the analysis was multiple stochastic block model, confirm its generality, and improve software systems. In this regard, the classifications may be accuracy. Moreover, we would like to realize supporting tools effective if components generated by a certain product line for software understanding and for maintenance activities. were analyzed. There is a possibility that the approach of the ACKNOWLEDGMENT class definition in Java and our classification method have successfully engaged. Whether our method works well for This research is supported by Nanzan University Pache software written in other programming languages is unknown. Research Subsidy I-A-2 for the 2018 academic year. We would like to thank H. Kagami, N. Mase, and M. Sawai. This research F. Related Works is based on their bachelor thesis [15]. We were supervisors of From the viewpoint of use-relation analysis, architecture it and added experiments and evaluations. recovery is an active research field. Zhang represented the R EFERENCES object-oriented system using a class graph and proposed a [1] B. Dagenais, H. Ossher, R. K. E. Bellamy, M. P. Robillard, and J. P. clustering algorithm to restore the high-level software archi- de Vries, “Moving into a new software project landscape,” in Pro- tecture [9]. Constantinou represented the hierarchical relation- ceedings of the 32Nd ACM/IEEE International Conference on Software ships between components as a D-layer and examined the Engineering (ICSE2010), 2010, pp. 275–284. [2] K. Inoue, R. Yokomori, T. Yamamoto, M. Matsushita, and S. Kusumoto, relationship between the architecture layer and the design “Ranking significance of software components based on use relations,” metrics [10]. Kula et al. proposed the Software Universe Graph IEEE Transactions on Software Engineering, vol. 31, no. 3, pp. 213–225, (SUG) Model as a structured abstraction of the evolution of 2005. [3] T. Kamiya, S. Kusumoto, and K. Inoue, “Ccfinder: A multi-linguistic software systems and their library dependencies over time token-based code clone detection system for large scale source code,” and demonstrated its usefulness by showing several views of IEEE Transactions. Software Engineering, vol. 28, no. 7, pp. 654–670, visualizations [11]. 2002. [4] H. Zhong, T. Xie, L. Zhang, J. Pei, and H. Mei, “Mapo: Mining and There are many studies to extract patterns from the call recommending api usage patterns,” in proceedings of the 23rd European history of API on the source code. Zhong at el. proposed a Conference on Object-Oriented Programming (ECOOP 2009), 2009, pp. system to extract call sequences of API from open source 318–343. [5] Z. Li and Y. Zhou, “Pr-miner: automatically extracting implicit pro- software’s repositories [4], and showed that the system is gramming rules and detecting violations in large software code,” in useful for learning how to use the library’s API. Li at el. proceedings of the 10th European software engineering conference, proposed a system to extract function call sequences from 2005, pp. 306–315. [6] Classycle, http://classycle.sourceforge.net/. C codes as programming rules, and the system provided a [7] C. Krueger, “Software reuse,” ACM Computing Surveys, vol. 24, no. 2, mechanism for detecting a portion that violates the rules [5]. pp. 131–183, 1992. Our method differs from these methods in that the granularity [8] K. Kobori, T. Yamamoto, M. Matsushita, and K. Inouee, “Classification of java programs in spars-j,” in proceedings of the International Work- of our analysis is between components. We believe that it shop on Community-Driven Evolution of Knowledge Artifacts, 2003. can extract informative information even in component-based [9] Q. Zhangs, D. Qiu, Q. Tian, and L. Sun, “Object-oriented software approach. architecture recovery using a new hybrid clustering algorithm,” in Proceedings of the Seventh International Conference on Fuzzy Systems As a method to extract similar code fragments, developers and Knowledge Discovery, 2010, pp. 2546–2550. can use code clone analysis. Mondal analyzed the stability of [10] E. Constantinou, G. Kakarontzas, and I. Stamelos, “Towards open several kinds of cloned codes, and they reported that Type3 source software system architecture recovery using design metrics,” in Proceedings of the 15th Panhellenic Conference on Informatics, 2011, clones, known as gapped clones, have higher stability than pp. 166–170. other clones [12]. Antoniol analyzed the evolution of code [11] R. G. Kula, C. D. Roover, D. M. German, T. Ishio, and K. Inoue, “A gen- duplications in 19 versions of the Linux kernel [13]. Yoshida eralized model for visualizing library popularity, adoption, and diffusion within a software ecosystem,” in 25th IEEE International Conference on et al. proposed an approach to support clone refactoring based Software Analysis, Evolution and Reengineering (SANER2018), 2018, on code dependency among code clones [14]. pp. 288–299. [12] M. Mondal, C. Roy, M. Rahman, R. Saha, J. Krinke, and K. Schneider, VI. C ONCLUSION “Comparative stability of cloned and non-cloned code: An empirical study,” in Proceedings of the 27th ACM Symposium on Applied Com- In this paper, we proposed a classification method for puting, 2012, pp. 1227–1234. software components based on similarity of use relation. [13] G. Antoniol, U. Villano, E. Merio, and M. D. Penta, “Analyzing cloning evolution in the linux kernel,” Information and Software Technology, For each pair of components, the distance was calculated vol. 44, no. 13, pp. 755–765, 2002. from the coincidence of their using components, and the [14] N. Yoshida, Y. Higo, T. Kamiya, S. Kusumoto, and K. Inoue, “On refac- hierarchical clustering is performed based on the distances. toring support based on code clone dependency relation,” in Proceedings of the 11th IEEE International Software Metrics Symposium, 2005, pp. From the result of experiments, we confirmed that many 16:1–16:10. components of the cluster were strongly similar with each [15] H. Kagami, N. Mase, and M. Sawai, “Evaluation of software parts classi other from the viewpoint of their functions, and the purpose cation method based on commonality of parts to be used and use source,” in Bachelor Thesis of Nanzan University, 2018, (In Japanese). of commonly used components greatly helps to understand Copyright © 2018 for this paper by its authors. 66