What is in a Name? An Analysis of Associations Among Java Packaging and Artifact Names

Farshad Ghassemi Toosi, Anila Mjeda
Computer Science Department, Munster Technological University, Cork Campus (both authors)

Abstract

Modern programming languages (in particular, object-oriented languages) are equipped with sophisticated mechanisms to assist developers in organizing source code. For instance, Java and Python use package names to resolve symbols. In Java, a package is a namespace declared at the top of each class or interface. There are several reasons for using packages in source code: 1) packages prevent naming conflicts (e.g., two classes with the same name can coexist in two different packages); 2) packages group related and/or similar classes or interfaces into conceptual and logical containers, which eases maintenance and helps developers understand the design of the software's architecture; 3) structured packaging is one of the core components of a clean architecture design. Developers may apply different strategies to structure packages, and these differences have repercussions on the quality and maintainability of the software architecture. In this work, we run a set of experiments on a number of open-source Java projects and analyse their packaging structures from the perspective of source-code structure and artifact (class, method, variable) names. These experiments aim to investigate 1) whether there is any association between the packaging structure and the textual factors (artifact names) of the classes inside each package; and 2) which textual factors (artifact names) tend to be more strongly associated with the package structure. The results indicate that, on average, class names and inheritance (super class names) tend to be considered as a packaging strategy.
The focus on identifying 'naturally' occurring similarities in the packaging of software in the 'wild' is underpinned by the long-term objective of building developer-friendly architecture conformance protocols that help prevent architectural erosion.

ECSA2021 Companion Volume
farshad.toosi@mtu.ie (F. G. Toosi); anila.mjeda@mtu.ie (A. Mjeda)
https://www.linkedin.com/in/farshad-ghassemi-toosi-428a5852/ (F. G. Toosi); https://www.linkedin.com/in/anila-mjeda-32a5064/ (A. Mjeda)
ORCID: 0000-0002-1105-4819 (F. G. Toosi); 0000-0003-1311-6320 (A. Mjeda)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

Object-oriented programming is underpinned by the idea of creating classes and using objects of those classes for higher reusability and better maintenance. The paradigm brings related fields and functions/methods together around a particular concept, called a class. Different objects can then be instantiated from a class with different data, but they all share the same original type, i.e., the class. For example, a class may represent a car and its objects can be a hatchback or a sport utility vehicle. In object-oriented programming, the methods and fields of a class are expected to be logically grouped in that one container, the class.

Some modern object-oriented languages, including Java and Python, provide another mechanism, called packaging, that lets developers group related classes at a higher level, in a container or module called a package. Usually, the visual representation of a software's architecture is a graph-like design where the software components are program packages that, in turn, may contain other packages (hierarchical packages) [1, 2]. In most software architecture design practices, modules or components are seen as a package or a set of packages [3, 4, 5]. Hence, the intuition is that package structure can have a direct impact on the quality of the software architecture. Indeed, this intuition has attracted the interest of other researchers in the field, such as Ebad et al. [6].

One of the fundamental aspects of an architectural design is to consider the functionalities of, and interactions between, components at different granularities [7], with a view to facilitating work among the components in a package. Researchers [8, 9, 10, 7] show that a clean software architecture is directly related to structured packaging; furthermore, they show how implicit packaging can cause architectural mismatches. They use the term unstructured packaging to mean a lack of packaging strategy: for instance, all classes are located in one package, or there are arbitrary packages and classes are assigned to them according to no particular strategy. As a result of such unstructured packaging, a package will contain several unrelated classes with no naming or textual relevance to each other [8]. Naming relevance, in particular, is important since artifacts (classes, methods, variables) are meant to be named by developers according to their responsibilities and functionalities.

Java is one of the object-oriented languages that offers the packaging mechanism. Every Java class is inside a package (if no package is declared, the class belongs to the default package). In this work, we use Java as the language of our case study to answer the following research questions:

1. Are there any associations between the package structure and the textual factors within a package? The textual factors in question include artifact names, e.g., class, method and variable names.
2. What type of names, and at what granularity, tend to have more weight in influencing the packaging structure?

By answering these two research questions, we try to discover the level of textual cohesion among the components of each package, to understand whether there is any textual packaging structure in the project and, if so, which type of artifact name plays a heavier role.

2. Background

In large programs, it is difficult to have an architecture for a software system that conforms to the system's packaging structure. Object-oriented software has an inherent affinity for structures such as packages, and this is one of its appealing promises. However, that affinity does not automatically translate into a structure that is relevant to the architecture of the system. This issue has seeded research into improving the packaging structure of software systems. Shaw [10] proposes that the potential problem with reusing components of a software system is not necessarily due to bad architectural design but also to the packaging strategy. In a different work, Shaw and Garlan [7] emphasise the importance of a packaging strategy that places compatible components in the same package.

The quality of a software architecture depends on several factors, one of which is the applied packaging strategy [8, 9, 10, 7]. The packaging strategy refers to the criteria used to combine components into packages. One of the first empirical studies to investigate the structure of written code [11] relied on static and dynamic analysis of FORTRAN code and looked at it at the statement level. Existing research tends to look at improving existing package design, for example through package structure analysis [12], using package cohesion to assess the organization and reusability of code [13, 14], or using artificial intelligence algorithms or multi-objective approaches based on remodularization objectives [15].

Additionally, there is considerable research on automatically optimising inter-package dependencies [16]. A review of object-oriented code issues in this space, viewed as refactoring opportunities, can be found in [17]. Of particular interest to our research, Baxter et al. [18] investigated some of the reasons behind the structures and structural relationships in Java code, while Abdeen et al. [16] proposed a set of metrics to assess modularity principles for packages in large legacy systems (namely the information hiding, changeability and reusability principles) [19]. Nearly twenty years ago, Hautus [12] proposed a tool that runs a package structure analysis over Java code and highlights potential weak areas to a human, with the aim of refactoring the source code.

Yet, there is still no standard, unique definition of relevant and/or similar classes, and developers might consider different criteria when placing two or more classes in a package. This becomes problematically evident when analysing code in the wild. Furthermore, packages typically appear in software architecture documentation as indivisible components of package diagrams, so drilling down within packages and investigating the validity of their internal relevance has added value. It is exactly this gap that is the focus of this research. Indeed, the research reported in this paper represents the initial steps towards identifying relevance (through similarity) factors within packages (or architecture components), with the long-term view of building developer-friendly architecture conformance protocols that prevent architectural erosion.

3. Experiment Design

In this work, six open-source Java projects are studied; their details are presented in Table 1.

Table 1
Six open-source Java projects.
Name               #classes   #packages   Details
JHotDraw           730        64          Visualization tool, MIT license.
Galaxy             39         17          Galaxy Artifacts is an open-source and freeware 4X game, written in Java.
JavaFX             38         9           JavaFX is a cross-platform GUI toolkit, MIT license.
JavaParserCore     516        29          Java parser tool, LGPL license.
JavaParserSymbol   167        21          Symbol solver tool, Apache License.
Jung               227        14          Visualization tool, open source, Jung license.

The experiment investigates whether there are factors that define the relation among the members of each package. It is worth noting that the factors are mostly textual (e.g., artifact names) rather than functional (e.g., the functionality of the artifacts), unless the functionality of an artifact is reflected in its name. The details of these factors are discussed in Section 3.2.

The logic of the experiment is as follows:

1. All classes are put in a pool without considering the package structure (the bottom-left rectangle in Figure 1).
2. The pairwise similarity between every pair of classes is calculated based on a similarity factor (see Section 3.2).
3. A clustering technique is applied to the members of the class pool and P clusters are generated (P is the number of packages in the project).
4. The clustering result (the bottom-right rectangle in Figure 1) is compared to the package structure of the project (the top rectangle in Figure 1), where each package can be seen as an existing cluster.

Figure 1: The high-level picture of the proposed model for comparison.

Figure 1 shows the general flowchart of the experiment. All the original packages in the system are also seen as clusters of classes, and the objective is to compare the existing packaging to the one arrived at by the clustering algorithm.

3.1. Comparison Analysis

All the experiments in this work are at the source-code level and focus on three different types of artifact names: 1) class names, 2) method names and 3) variable/field names. The comparisons are based on textual/term comparison. Therefore, a simple pre-processing step is required prior to the actual comparison of each name (i.e., class name, method name and variable/field name). Pre-processing converts each name into a set of simple-names through the following actions:

• Camel case splitting. E.g., StudentGrade → Student Grade (StudentGrade as a name is converted to two simple-names: Student and Grade).
• Snake case splitting. E.g., Employee_tax → Employee tax.
• Digit removal. E.g., distance100km → distance km, salary100k → salary (note: words with one character are ignored).
• Lower-casing. E.g., PensionCalculator → pension calculator.
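The four pre-processing actions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper name simple_terms, the regular expression, and the treatment of consecutive capitals (acronyms such as HTML) are our own choices.

```python
import re

# Hypothetical helper (not from the paper). The regex matches, in order of
# preference: an upper-case run directly followed by a capitalised word
# (an acronym such as "HTML" in "HTMLParser"), a capitalised or lower-case
# word, or a trailing upper-case run.
CAMEL = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+")

def simple_terms(name: str) -> set[str]:
    """Split one artifact name into a set of lower-case simple-terms."""
    terms = set()
    for part in name.split("_"):                # snake case splitting
        part = re.sub(r"\d+", " ", part)        # digit removal: "distance100km" -> "distance km"
        for chunk in part.split():
            for word in CAMEL.findall(chunk):   # camel case splitting
                word = word.lower()             # lower-casing
                if len(word) > 1:               # one-character words are ignored
                    terms.add(word)
    return terms

print(sorted(simple_terms("StudentGrade")))  # ['grade', 'student']
print(sorted(simple_terms("salary100k")))    # ['salary']
```

Returning a set rather than a list matches the paper's notion of a "set of simple-names": repeated words in one name count once.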
3.2. Comparison Factors

As mentioned earlier, the pool of classes is grouped via a clustering algorithm. Clustering algorithms work on a similarity or dissimilarity matrix in which the similarity/dissimilarity between every pair of entities (classes, in this case) is known. Therefore, a similarity needs to be defined between every two classes. Each Java class has several different features and characteristics, such as the class name, the names of the methods within the class, field names and many more. In this work, we use nine different features of each class as similarity factors for the clustering algorithm. The nine factors examined are as follows:

• Class Names. Two classes are compared according to their names (CN).
• Outgoing Methods. Two classes are compared according to the names of the methods they call (OM).
• Incoming Methods. Two classes are compared according to the names of the methods that call their methods (IM).
• Field Declaration. Two classes are compared according to the names of their declared fields (FD).
• Variable Accessed. Two classes are compared according to the names of the variables they access (AV).
• Outgoing Class. Two classes are compared according to the names of the classes instantiated in them (OC).
• Incoming Class. Two classes are compared according to the names of the classes in which they are instantiated (IC).
• Class Methods Names. Two classes are compared according to their own method names (CM).
• Super Class Names. Two classes are compared according to their super class names (SC).

Each Java program is analysed and the nine types of information listed above are extracted. To extract these details from the Java projects, a Java parser is employed. Among the different choices of parsers, JavaParser [20] was selected due to its simplicity of use and high reputation.

3.2.1. Class Names

Class Names (CN) is the first factor used for comparison. Two classes are said to be similar if their names are similar or, in other words, if they share some simple-terms. Figure 2 has two packages and each package has two classes. A set of simple-terms is generated for each class in the project:

Figure 2: Two packages with their classes.

1. Circle_area class: {circle, area}.
2. DrawCircle class: {draw, circle}.
3. Colors class: {colors}.
4. SurfaceCircle class: {surface, circle}.

The first class has a degree of similarity with the second and the fourth classes, as they share circle. Likewise, the second and fourth classes have a degree of similarity, while the third class is not similar to any other class.
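The degree of similarity used throughout Section 3.2 is the number of simple-terms two classes share. A minimal sketch, assuming the term sets have already been extracted (here they are copied from the Figure 2 example above; the function names shared_terms and pairwise_similarity are ours):

```python
from itertools import combinations

# Simple-term sets of the four Figure 2 classes under the Class Names (CN) factor.
cn_terms = {
    "Circle_area": {"circle", "area"},
    "DrawCircle": {"draw", "circle"},
    "Colors": {"colors"},
    "SurfaceCircle": {"surface", "circle"},
}

def shared_terms(a: set[str], b: set[str]) -> int:
    """Degree of similarity: the number of simple-terms common to both sets."""
    return len(a & b)

def pairwise_similarity(terms: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """Similarity of every unordered pair of classes."""
    return {(x, y): shared_terms(terms[x], terms[y]) for x, y in combinations(terms, 2)}

for pair, score in pairwise_similarity(cn_terms).items():
    print(pair, score)
```

The same function applies unchanged to the other eight factors; only the way each class's term set is extracted differs.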
3.2.2. Outgoing Methods

Outgoing Methods (OM) is the second factor considered to measure the similarity between two classes. For class A, all the methods called from class A in the project are collected and their names are pre-processed, so that a set of simple-names is generated for class A. The same process is repeated for class B. Figure 3 shows two classes with their methods and the callee (outgoing) methods inside them. The set of simple-names that can be extracted for class A based on its callee methods is {green, circle, area}, and the set for class B is {green, surface, black, white}. There is one simple-term common to these two sets; therefore, a degree of similarity exists between class A and class B.

Figure 3: Two classes with their callee methods.

3.2.3. Incoming Methods

Incoming Methods (IM) is another factor selected to measure the similarity between two classes. Like the previous factor, it is based on method calls. Two classes are said to be similar if their methods are called by methods with similar names (common simple-terms).

3.2.4. Field Declarations

Field Declaration (FD) is another selected factor; it measures the similarity between classes based on the fields declared within each class. Two classes with similarly named declared fields are therefore considered similar. Figure 4 shows two classes with their declared fields. Class A has the following set of simple-terms extracted from its declared fields: {circle, color, full, area}, and class B has: {surface, shape, color}. Therefore, classes A and B are similar because color appears in both sets of simple-terms.

Figure 4: Two classes with their declared fields.

3.2.5. Accessed Variables

Variable Accessed (AV) is another factor we use to measure class similarity. Two classes are considered similar if they access variables/fields with similar names.

3.2.6. Outgoing Classes

The next factor used to measure class similarity is Outgoing Class names (OC). Being an outgoing class is a role relative to another class: given two classes, class A and class B, class B is said to be an outgoing class for class A if class B is instantiated in class A. Figure 5 shows two classes; each class has two methods and each method instantiates another class. The names of the instantiated classes for each class are extracted, pre-processed and compared. Class A yields the following set of simple-terms extracted from its instantiated classes: {large, circle, oval}, and the set associated with class B is {large, circle, oval, green, color}. As shown in Figure 5, three simple-terms are common to these two sets: {large, circle, oval}. Therefore, classes A and B are similar to some degree. How the degree of similarity is taken into account in the comparison is discussed in later sections.

Figure 5: Two classes with their details.

3.2.7. Incoming Classes

Incoming Class (IC) is another notion we use in this experiment as a similarity factor. Class A is considered an incoming class for class B if class B is instantiated in class A. In Figure 6, DrawCircle is the incoming class for the PaintSurface class. Two classes are said to be similar if they are instantiated in the same classes or in classes with similar names.

Figure 6: Two classes, one instantiates the other one.

3.2.8. Class Methods

Another factor employed in this work is the method name (CM). Two classes are considered similar if they have methods with similar names. Figure 7 shows two classes with their contained methods. Class A has the following set of simple-names extracted from its method names: {paint, surface, get, color}, and class B has: {color, circle, oval, green, large}. Since there is one term common to both sets, classes A and B are similar to some degree.

Figure 7: Two classes with their contained classes and methods.

3.2.9. Super Classes

Classes are also compared by their super classes. For each class, all the super class names are collected and pre-processed, and a set of simple-terms is generated. As with the other similarity factors, the number of common simple-terms for each pair of classes indicates their degree of similarity. In Figure 8, there are two classes, each with some super classes. The set of simple-terms for A is {shape, geometry} and for B is {square}. Since there is no common simple-term in these two sets, there is no similarity between these two classes based on this factor.

Figure 8: Two classes with their super classes.
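Because the outgoing and incoming roles of Sections 3.2.6 and 3.2.7 are easy to confuse, the following sketch derives both term sets from a list of (instantiating class, instantiated class) pairs. Only the DrawCircle → PaintSurface pair comes from Figure 6; the other pairs are hypothetical, and the simplified splitter terms_of (ours) ignores digits, underscores and acronyms.

```python
import re
from collections import defaultdict

def terms_of(name: str) -> set[str]:
    # Simplified camel-case splitter (assumption; see Section 3.1 for the full rules).
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+", name) if len(w) > 1}

def instantiation_terms(edges):
    """edges: (creator, created) pairs, meaning `created` is instantiated inside `creator`."""
    outgoing = defaultdict(set)  # OC: terms of the classes a class instantiates
    incoming = defaultdict(set)  # IC: terms of the classes that instantiate a class
    for creator, created in edges:
        outgoing[creator] |= terms_of(created)
        incoming[created] |= terms_of(creator)
    return outgoing, incoming

edges = [
    ("DrawCircle", "PaintSurface"),  # Figure 6
    ("DrawCircle", "LargeOval"),     # hypothetical
    ("SketchCircle", "LargeOval"),   # hypothetical
]
oc, ic = instantiation_terms(edges)
print(len(oc["DrawCircle"] & oc["SketchCircle"]))  # OC similarity: 2 (large, oval)
print(len(ic["PaintSurface"] & ic["LargeOval"]))   # IC similarity: 2 (draw, circle)
```

The same edge list feeds both factors: OC groups classes by what they create, IC by who creates them.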
4. Clustering

Clustering is the task of splitting individuals into a number of groups or clusters, where the members of a cluster are more similar to the other members of the same cluster than to the members of other clusters. In this experiment, classes are considered as individuals; therefore, classes with more similarity are clustered in one group. As mentioned in the previous section, nine different similarity factors are considered in this work; therefore, for each Java project, clustering runs nine times, each time with a different factor. The goal is to measure how much a given factor, as a similarity criterion, conforms to the existing packaging in the system.

Clustering is an unsupervised learning technique that has applications in many different fields and domains. K-Means [21], Affinity Propagation [22], DBSCAN [23] and Spectral Clustering [24] are some of the available clustering algorithms, and the choice of algorithm depends on the nature of the data.

4.1. Spectral Clustering

In this experiment, we employ Spectral Clustering [24] to cluster the pool of classes (see Figure 1). The Spectral Clustering algorithm is based on eigendecomposition and is well suited to a set of individuals between which connectivity relations (e.g., the similarity between two individuals) can be defined. Unlike some other clustering algorithms (e.g., K-Means), Spectral Clustering requires the relations/similarities between individuals to be computed in advance as a matrix, on which the eigendecomposition is then applied. Therefore, Spectral Clustering was found to be a good fit for this work. As seen in the previous section, each class gets a set of simple-terms (based on the applied similarity factor); the number of common simple-terms between the sets of two classes is taken as the measure of the relation/similarity between those two classes. Therefore, an N × N similarity matrix (N is the number of classes) is created for Spectral Clustering.

Spectral Clustering, like most other clustering algorithms, needs to know the number of clusters/groups in advance. As shown in Figure 1, the number of clusters for the applied clustering algorithm is P, where P is the number of existing packages in the project. Once Spectral Clustering returns P clusters of similar classes (based on a given similarity factor), one can compare those P clusters against the existing P packages in the system.
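A minimal sketch of building the N × N similarity matrix from per-class term sets, assuming the common-term count of Section 3.2 is used directly as the affinity. The paper does not say how the diagonal is filled; here each class is simply compared with itself. A matrix like this could then be passed to an off-the-shelf implementation such as scikit-learn's SpectralClustering(n_clusters=P, affinity="precomputed") and its fit_predict method, but the paper does not state which implementation was used.

```python
def similarity_matrix(term_sets: list[set[str]]) -> list[list[int]]:
    """Entry (i, j) is the number of simple-terms shared by class i and class j."""
    n = len(term_sets)
    return [[len(term_sets[i] & term_sets[j]) for j in range(n)] for i in range(n)]

# Class-name term sets of the four Figure 2 classes, in the order
# Circle_area, DrawCircle, Colors, SurfaceCircle.
classes = [{"circle", "area"}, {"draw", "circle"}, {"colors"}, {"surface", "circle"}]
for row in similarity_matrix(classes):
    print(row)
# [2, 1, 0, 1]
# [1, 2, 0, 1]
# [0, 0, 1, 0]
# [1, 1, 0, 2]
```

The matrix is symmetric by construction. Colors shares no term with any other class, so it becomes an isolated node in the affinity graph.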
4.2. Clustering vs Packaging

The objective is to analyse the individual similarity factors and see how well each of them conforms to the packaging structure. To do this, the clustering resulting from each factor is compared against the packaging structure. Since the clustering produces P clusters (P is the number of packages in the project), there are two sets of groups, each containing P groups of classes. To measure the similarity between these two sets of groups, we use the Normalized Mutual Information [25] implementation from scikit-learn (SKlearn) in Python. Normalized Mutual Information measures the similarity between two clusterings [26] and returns a value between 0 and 1. Given two clusterings produced by two different techniques, Normalized Mutual Information quantifies how much the two clusterings agree. Figure 9 shows two clusterings, each with three clusters and their members. As shown, there are some differences between the results of these two techniques. For instance, the first cluster of PS contains a1, a3 and a3, and the first cluster of CR contains a1, a3 and z1. The Normalized Mutual Information between the results of these two techniques is 0.2804.

Figure 9: Two different clusterings.
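To make the comparison measure concrete, here is a dependency-free sketch of Normalized Mutual Information. It normalizes by the arithmetic mean of the two entropies, which, to our knowledge, is the default of scikit-learn's normalized_mutual_info_score (average_method="arithmetic"); it should agree with that function up to floating-point error, but it is our sketch, not the library code. Labels can be package names on one side and cluster ids on the other.

```python
from collections import Counter
from math import log

def entropy(labels) -> float:
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(a, b) -> float:
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum((nab / n) * log(n * nab / (count_a[x] * count_b[y]))
               for (x, y), nab in joint.items())

def nmi(a, b) -> float:
    """NMI normalized by the arithmetic mean of both entropies (0 = unrelated, 1 = identical)."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 and hb == 0:  # both clusterings put every class in a single group
        return 1.0
    return mutual_information(a, b) / ((ha + hb) / 2)

packages = ["shapes", "shapes", "colors", "colors"]  # existing packaging
clusters = [1, 1, 0, 0]                              # clustering result
print(round(nmi(packages, clusters), 6))  # 1.0: same grouping, different labels
```

NMI is invariant to how clusters are labelled, which is why package names can be compared directly with arbitrary cluster ids.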
5. Evaluation

In this experiment, six different open-source Java projects are analysed (see Table 1). For each project, nine different similarity factors are employed separately to drive a clustering technique, and each clustering is compared against the packaging structure of the system. The nine factors are fully described in Section 3.2.

5.1. Results and Discussion

In total, 54 + 6 experiments are performed. The first 54 experiments cover 6 projects, with 9 individual similarity factors tested for each project. We run an extra experiment for each project in which the similarity factor is the accumulation of all 9 individual factors.

Figures 10 to 15 show the percentage similarity between the applied clustering technique (Spectral Clustering) and the packaging structure.

Figures 10–15: The percentage of each similarity between the applied clustering technique and the packaging structure using 9 similarity factors.

The very first observation from all the results is the association between class names and packaging. Except for JavaParserCore (where super class names score 0.635 against 0.6159) and JavaFX (where outgoing method names score 0.453 against 0.449), class names have the highest association with the packaging; in both exceptions, class names still have the second-highest score. Another observation that can be drawn from all the diagrams is the association between super class names and packaging. Except for the JavaFX and Galaxy projects, super class names are among the top 'winners': second in JHotDraw and Jung, first in JavaParserCore and third in JavaParserSymbol. On the other hand, class instantiation (incoming and outgoing classes) on average has a smaller association with the packaging.

As mentioned earlier, six extra experiments are performed to see the impact of the similarity factors when they are all accumulated together. Table 2 depicts the results for each individual project. The Galaxy project has the strongest naming association with the packaging, followed by JavaParserCore and JavaParserSymbol.

Table 2
Accumulated factors.
Name               Similarity
JHotDraw           0.364
Galaxy             0.818
JavaFX             0.4562
JavaParserCore     0.622
JavaParserSymbol   0.58140
Jung               0.387

Table 3
Details of all experiments for 6 subject systems. Green indicates the applied factor that shows the highest similarity between the packaging structure and the clustering technique, orange indicates the second highest and blue indicates the third highest.
Column key: CN = Class Name, CM = Method Name, FD = Field Name, AV = Variable Accessed, SC = Super Class Name, IC = In Class Name, OC = Out Class Name, IM = In Method Name, OM = Out Method Name.

Project            CN       CM        FD       AV       SC        IC       OC        IM        OM
JHotDraw           0.6264   0.4674    0.454    0.3415   0.4989    0.4636   0.3658    0.312     0.3638
JavaFX             0.449    0.392     0.4139   0.446    0.357     0.3286   0.368     0.425     0.453
JavaParserCore     0.6159   0.332     0.392    0.3937   0.635     0.455    0.336     0.393     0.3977
JavaParserSymbol   0.6783   0.552     0.423    0.4392   0.534     0.3219   0.361     0.452     0.37
Jung               0.458    0.313     0.243    0.254    0.331     0.24     0.162     0.226     0.189
Galaxy             0.803    0.755     0.748    0.7518   0.734     0.713    0.6913    0.689     0.714
Average            0.6051   0.468567  0.44565  0.4377   0.514983  0.42035  0.380683  0.416167  0.414583
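The Average row of Table 3 and the factor rankings discussed in Section 5.1 can be re-derived from the per-project scores. A short sketch (the factor abbreviations follow Section 3.2; the script itself is ours, not part of the study):

```python
# Per-project NMI scores from Table 3, in the column order of FACTORS.
FACTORS = ["CN", "CM", "FD", "AV", "SC", "IC", "OC", "IM", "OM"]
SCORES = {
    "JHotDraw":         [0.6264, 0.4674, 0.454, 0.3415, 0.4989, 0.4636, 0.3658, 0.312, 0.3638],
    "JavaFX":           [0.449, 0.392, 0.4139, 0.446, 0.357, 0.3286, 0.368, 0.425, 0.453],
    "JavaParserCore":   [0.6159, 0.332, 0.392, 0.3937, 0.635, 0.455, 0.336, 0.393, 0.3977],
    "JavaParserSymbol": [0.6783, 0.552, 0.423, 0.4392, 0.534, 0.3219, 0.361, 0.452, 0.37],
    "Jung":             [0.458, 0.313, 0.243, 0.254, 0.331, 0.24, 0.162, 0.226, 0.189],
    "Galaxy":           [0.803, 0.755, 0.748, 0.7518, 0.734, 0.713, 0.6913, 0.689, 0.714],
}

# Mean score of each factor over the six projects (the Average row).
averages = {f: sum(row[i] for row in SCORES.values()) / len(SCORES)
            for i, f in enumerate(FACTORS)}
ranking = sorted(averages, key=averages.get, reverse=True)
print(ranking[:3])  # ['CN', 'SC', 'CM']

# Highest-scoring factor within each project.
best = {p: FACTORS[max(range(len(FACTORS)), key=row.__getitem__)]
        for p, row in SCORES.items()}
print(best)
```

On these values, class names (CN) lead in four of the six projects, while JavaFX is led by outgoing method names (OM) and JavaParserCore by super class names (SC).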
6. Conclusion

In this work, we presented a comparative analysis of six different Java projects to discover the applied packaging strategy from a textual and naming point of view. Our findings (see Table 3) illustrate that there is, to some extent, textual similarity among the components of each package (the first research question). On average, the textual similarity is strongest when class names are chosen as the similarity factor (the second research question). The second factor, after class names, that shows strong similarity among packages' components is, on average, the super class name. This also indicates that most inheritance relations lie within packages, which is potentially an indication of high cohesion within packages and low coupling between them. Method names, as the third strongest factor, on average also show relatively high similarity among packages' components.

Although we can confirm that there are a couple of patterns common to all projects (e.g., the similarity of class names), almost every project still behaves differently. This is further confirmed by the results in Table 2, where each project shows a different aggregated degree of packaging similarity, ranging from 0.364 to 0.818.

Looking from another angle, since class names score high as a similarity factor among the contents of a package, they can potentially be used to validate the relevance within a package or other architectural construct. This claim, however, requires more experimentation on a larger number of subject systems.

This research is based only on artifact (class, method and variable) names; therefore, the developers' naming style plays an important role in the results.

In future work, we plan to include other similarity factors, such as factors that define the functionality of the artifacts. This is with the long-term objective of using these 'naturally' occurring similarities in the packaging of software in the 'wild' to build developer-friendly architecture conformance protocols that help prevent architectural erosion.

References

[1] M.-A. Storey, C. Best, J. Michaud, SHriMP views: An interactive environment for exploring Java programs, in: Proceedings 9th International Workshop on Program Comprehension (IWPC 2001), IEEE, 2001, pp. 111–112.
[2] M. Shaw, R. DeLine, D. V. Klein, T. L. Ross, D. M. Young, G. Zelesnik, Abstractions for software architecture and tools to support them, IEEE Transactions on Software Engineering 21 (1995) 314–335.
[3] J. Veit, Modules, Components, and Elements – Software Architecture Terms explained (2021). URL: https://dev.to/jessica_veit/modules-components-and-elements-software-architecture-terms-explained-g59.
[4] Tutisani, Modular Software Architecture - Tutisani Consulting, 2021. URL: https://www.tutisani.com/software-architecture/modular-software-architecture.html.
[5] J. T. Taylor, W. T. Taylor, Software architecture, in: Patterns in the Machine, Springer, 2021, pp. 63–82.
[6] S. A. Ebad, M. Ahmed, Investigating the effect of software packaging on modular structure stability, Computer Systems Science and Engineering 34 (2019) 283–296.
[7] M. Shaw, D. Garlan, Formulations and formalisms in software architecture, in: Computer Science Today, Springer, 1995, pp. 307–323.
[8] Vasiliy, 5 Most Popular Package Structures for Software Projects, 2020. URL: https://www.techyourchance.com/popular-package-structures/.
[9] R. C. Martin, J. Grenning, S. Brown, Clean architecture: a craftsman's guide to software structure and design, Prentice Hall, 2018.
[10] M. Shaw, Architectural issues in software reuse: It's not just the functionality, it's the packaging, in: Proceedings of the 1995 Symposium on Software Reusability, 1995, pp. 3–6.
[11] D. E. Knuth, An empirical study of FORTRAN programs, Software: Practice and Experience 1 (1971) 105–133.
[12] E. Hautus, Improving Java software through package structure analysis, in: IASTED International Conference Software Engineering and Applications, 2002, pp. 1–5.
[13] V. Gupta, J. K. Chhabra, Package coupling measurement in object-oriented software, Journal of Computer Science and Technology 24 (2009) 273–283.
[14] P. J. Kaur, S. Kaushal, A. K. Sangaiah, F. Piccialli, A framework for assessing reusability using package cohesion measure in aspect oriented systems, International Journal of Parallel Programming 46 (2018) 543–564.
[15] A. Prajapati, J. K. Chhabra, MaDHS: Many-objective discrete harmony search to improve existing package design, Computational Intelligence 35 (2019) 98–123.
[16] H. Abdeen, S. Ducasse, H. Sahraoui, I. Alloui, Automatic package coupling and cycle minimization, in: 2009 16th Working Conference on Reverse Engineering, IEEE, 2009, pp. 103–112.
[17] J. Al Dallal, Identifying refactoring opportunities in object-oriented code: A systematic literature review, Information and Software Technology 58 (2015) 231–249.
[18] G. Baxter, M. Frean, J. Noble, M. Rickerby, H. Smith, M. Visser, H. Melton, E. Tempero, Understanding the shape of Java software, in: Proceedings of the 21st Annual ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, 2006, pp. 397–412.
[19] H. Abdeen, S. Ducasse, H. Sahraoui, Modularization metrics: Assessing package organization in legacy large object-oriented software, in: 2011 18th Working Conference on Reverse Engineering, IEEE, 2011, pp. 394–398.
[20] JavaParser.org, JavaParser - Home, 2021. URL: https://javaparser.org.
[21] J. A. Hartigan, M. A. Wong, A k-means clustering algorithm, Journal of the Royal Statistical Society: Series C (Applied Statistics) 28 (1979) 100–108.
[22] K. Wang, J. Zhang, D. Li, X. Zhang, T. Guo, Adaptive affinity propagation clustering, arXiv preprint arXiv:0805.1096 (2008).
[23] K. Khan, S. U. Rehman, K. Aziz, S. Fong, S. Sarasvady, DBSCAN: Past, present and future, in: The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), IEEE, 2014, pp. 232–238.
[24] J. Liu, J. Han, Spectral clustering, in: Data Clustering, Chapman and Hall/CRC, 2018, pp. 177–200.
[25] R. Koopman, S. Wang, Mutual information based labelling and comparing clusters, Scientometrics 111 (2017) 1157–1167.
[26] A. F. McDaid, D. Greene, N. Hurley, Normalized mutual information to evaluate overlapping community finding algorithms, arXiv preprint arXiv:1110.2515 (2011).