
Preliminary Study on the Use of Keywords for Source Code to Architecture Mappings

Tobias Olsson (tobias.olsson@lnu.se), Morgan Ericsson (morgan.ericsson@lnu.se), Anna Wingkvist (anna.wingkvist@lnu.se)

Department of Computer Science and Media Technology, Linnaeus University, Kalmar/Växjö, Sweden

CEUR Workshop Proceedings (CEUR-WS.org)

Keywords: Orphan Adoption, Software Architecture, Source Code Clustering, Naive Bayes

We implement an automatic mapper that can find the corresponding architectural module for a source code file. The mapper is based on multinomial naive Bayes, and it is trained using custom keywords for each architectural module. For prediction, the mapper uses the path and file name of source code elements. We find that the needed keywords often match the module names, but also that ambiguities and discrepancies exist. We evaluate the mapper using nine open-source systems and find that the mapper can successfully create a mapping with perfect precision, but in most cases, it cannot cover all source code elements. Other techniques can, however, use the mapping as a foothold and create further mappings.

1. Introduction

The modular software architecture captures major design decisions regarding reuse, maintainability, changeability, and portability [1]. During system evolution, the source code must conform to the architecture, or the system risks accumulating technical debt and eventually losing the desired qualities.

Static Architecture Conformance Checking (SACC) methods, such as Reflexion modeling [2], statically analyze source code to ensure that it does not introduce architectural violations [3, 4]. These methods require an architecture model, with modules and dependencies, and a source code model, with entities (e.g., source code files) and concrete dependencies (e.g., due to inheritance or method invocations). They also require a mapping from the source code entities to the architectural modules.

SACC has not reached widespread use in the software industry [1, 3, 5, 6]. The necessary tools and methods for using SACC exist. However, practitioners perceive the mapping from source code to architectural modules as a significant hindrance; it is often outdated or nonexistent. Many tools address this by combining manual mapping and regular expressions to filter file, module, and package names. Still, such approaches are considered both time-consuming and error-prone [3, 5, 6, 7].

Automatic mapping techniques aim to reduce the manual effort needed to create a mapping by using information available in the source code and the intended modular architecture. For example, dependencies between source code entities can be used to create a mapping. A problem with current automatic techniques is that they require an initial set of mapped entities from which the technique infers the automatic mappings. Depending on the technique and the system to be mapped, an initial set needs to consist of approximately 15-20% of the entities before reaching acceptable performance. In our experience, the physical structure of files on disk is often partly or wholly reflected in the intended modular architecture. Effective use of this information can present an attractive option to create an initial set. However, structure and naming are not always mapped one to one to a module, and there are discrepancies, ambiguities, or simply missing terms.

We investigate how well a multinomial naive Bayes classifier trained using simple keywords for each module can create an initial set. We pose the following questions:

1. Can the mapper construct an initial set based on a simple set of keywords for each module?
2. How well does this initial set perform if used in combination with mapping based on dependencies?
3. How well does the above combination perform compared to the NBAttract (with a random initial set) and InMap approaches?

We evaluate the mapper using nine open-source systems with known mappings to a specified modular architecture and find that the keywords are often the same as the module names, but more and different keywords are needed in some cases. After the initial set is created, we run another automatic mapper that can map any remaining entities. We compare the results with a traditional automatic mapping technique [8] and an interactive mapping technique [7]. We find that the keywords-based approach can, in some cases, provide a complete mapping and that the keywords-based approach plus the automatic mapping approach performs very well.

2. Background and Related Work

Tzerpos and Holt describe the general problem of mapping (or remapping) a source code entity to an architectural module [9]. They collectively call both the mapping and remapping of an entity the orphan adoption problem. They find four major criteria for solving the problem: naming, structure, style, and semantics, and devise an algorithm that they evaluate in three case studies [9]. Tzerpos and Holt regard the naming criterion as the first option to use in an orphan adoption scenario and suggest using per-system regular expressions to determine a mapping. However, they also mention that naming criteria are not enough, as names may be lacking or naming standards are not always adhered to by developers.

Garcia et al. discuss the use of package and naming information in software architecture recovery [10]. In general, they found that their ground truth components often spanned or shared several packages. They could not find a correlation between components and single package or directory names. One of their four cases presented a reasonably good correlation, and in one system, they could find a repeating pattern of directories. The ground truth architectures recovered in their study are possibly at a lower level than the modular architectures we study. Still, there is likely variation in what dimension or view of an architecture is expressed in the package structure. This variation is further supported by Buckley et al., where one out of five studied systems did not have any clear correlation between packages and modules. This presented difficulties and required significant effort when performing manual mapping [11].

Anquetil and Lethbridge, on the other hand, propose a method for architecture recovery of legacy systems using filenames [12]. Their approach focuses on the assumptions that files have short names with many abbreviations and are placed in a single directory. This is due to their focus on recovering legacy systems. Nevertheless, they present some interesting findings. First, they identify several forces that shape a filename, i.e., what influences it. There seem to be several examples of such forces also in more modern implementations, e.g., from the subject system Ant, we find the feature implemented (ant.taskdefs.SendEmail), the algorithms or steps of algorithms (ant.types.resources.Sort), or data processed (ant.taskdefs.email.Header), as suggested in [12]. Much of the approach revolves around the problematic abbreviations found in the relatively short filenames. While this is not a technical problem in modern development, the use of abbreviations is still common practice. For example, one of the subject systems, ArgoUML, defines a module reverseEngineering, and the corresponding directory mapping is the abbreviation reveng. Finally, Anquetil and Lethbridge successfully use filenames to create a clustering that corresponds well to an expert's view of a system.

2.1. Semi-Automatic Mapping

Christl et al. introduced the Human Guided clustering Method (HuGMe), an approach to semi-automatic mapping of source code entities to modules of the intended architecture [9]. It is an iterative approach that, at its core, uses an attraction function to compute the attraction between a source code entity and a module. If the attraction is considered valid, an automatic mapping is made; if not, the attractions can be used as a suggestion for a human user. Two attraction functions based on dependencies are presented, CountAttract and MQAttract [13, 6].

Bittencourt et al. present two new attraction functions based on information retrieval techniques [5]. They use semantic information in the source code, including module names and filenames. The attractions are calculated based on cosine similarity (IRAttract) and latent semantic indexing (LSIAttract). They make a quantitative comparison between the performance of their attraction functions and that of CountAttract and MQAttract in an evolutionary setting (where a few new files are to be assigned a mapping). They find that combining attraction functions (e.g., if CountAttract fails, try IRAttract) performs best. They find that CountAttract usually misplaces entities on module borders. MQAttract performs better when mapping entities with dependencies to many different modules. IRAttract and LSIAttract perform better when mapping entities in libraries or entities on module borders, but worse if there are modules that share vocabulary but are not related [5].
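To make the idea of a similarity-based attraction concrete, the following minimal Python sketch computes the cosine similarity between bag-of-words vectors and picks the module with the highest similarity. It is an illustration only: it uses plain term counts, and the function names (bag_of_words, cosine_similarity, most_similar_module) and the toy module vocabularies are our assumptions, not the term weighting or implementation of IRAttract.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count lowercase alphabetic terms (plain counts; an assumption)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_module(entity_text, module_texts):
    """Return the module whose text is most similar to the entity text."""
    entity = bag_of_words(entity_text)
    return max(
        module_texts,
        key=lambda m: cosine_similarity(entity, bag_of_words(module_texts[m])),
    )


# Hypothetical module vocabularies, for illustration only.
modules = {
    "gui": "gui dialog window button",
    "logic": "logic parser import export",
}
print(most_similar_module("ImportParser parser import", modules))
```

The sketch also hints at the weakness noted above: two unrelated modules that share vocabulary would receive similar scores for the same entity.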

We have created an attraction function that uses machine learning techniques and introduced the Concrete Dependency Abstraction (CDA) method [8]. In short, CDA produces textual representations of dependencies at the level of architectural modules and lets a machine learning technique learn the patterns of dependencies from the actual source code and combine these with information retrieval techniques. We implement this approach using naive Bayes as an attraction function for the HuGMe method, NBAttract. We have compared the automatic mapping performance of CountAttract, IRAttract, LSIAttract and NBAttract over several systems using s4rdm3x, our open-source tool suite for automatic mapping experiments [8, 14].

The main limitations for the techniques that build on HuGMe are the need for an initial set and, in some cases, low-quality mappings. The initial set needs to be manually created and be of good quality for the attraction functions to perform well. We estimate that a randomly composed initial set needs to include approximately 15-20% of the source code entities. Based on this, we conclude that creating the initial set is likely a significant effort. Automated techniques will probably not result in a perfect mapping except when they use a large initial set and only map a few entities. In the best of cases, the automated technique leaves hard-to-map instances to the user (creating more manual work), but misclassifications are problematic. There has not been much research on the manual mapping steps of HuGMe except for the original studies [13, 6]. Handling of misclassification and manual support in these methods are still open issues.

2.2. Interactive Mapping

Sinkala and Herold present InMap, which is not an automated approach to mapping per se, but instead suggests mappings to the end-user, who can then choose to accept the suggested mapping (or not) [7]. It is an iterative approach that presents a suggested mapping for a fixed number of entities in each iteration. The end-user chooses to accept or reject the suggestions. InMap uses the accepted mappings to improve the suggested mappings further in the next iteration. It also uses the negative evidence of a rejected mapping and does not suggest this mapping again. InMap produces the suggested mappings similar to Bittencourt et al., with the addition of a descriptive text for each architectural module. InMap also includes the path and filename used in the Java class and package names. It treats the source code entities as a database of documents and uses Lucene to search this database using module information as a query. Sinkala and Herold evaluate InMap using six open source systems. For the best combination (in terms of highest F1 score) of information, InMap can suggest mappings for most of a system's entities with a mean recall of 0.95, a mean precision of 0.84, and a mean F1 score of 0.89.

The main limitations of InMap are its highly interactive nature and that architectural documentation needs to exist for every module. The documentation provided needs to be of good quality, i.e., as short as possible but containing good keywords. Noisy documentation will likely not help in producing high-precision suggestions. The interactiveness of InMap is in some way double-edged; the technique often seems to require more interaction (accepting or rejecting a suggested mapping) than there are entities in the source code. On the other hand, if minor mapping errors cannot be tolerated, a mapping validation is needed anyway.

3. Keywords and File-Based Mapping

File naming and structure seem to reflect the intended modular architectures we have studied quite well. For example, module names tend to map to the directory structure of the source code. However, the naming is often not perfect. In some cases, module names are not used, or shorter or slightly different terms are used. In other cases, several module names exist in the structure or naming of a file. A simplistic approach is thus not appropriate. Instead, the file naming patterns need to be fully defined, e.g., using regular expressions or a heuristic.

For regular expressions to work, there is often a need to maintain several expressions that can be conflicting and overlapping. A more attractive option would be to use machine learning and train a classifier using a good set of keywords. The classifier's task is to produce a good enough initial set. An automatic mapping technique can then use this initial set for further mappings.

In this work, we implement a proof of concept mapper using a multinomial naive Bayes classifier. It is a simple, probabilistic approach that uses word frequencies to compute the probability of each class. While it is conceptually simple, naive Bayes often produces good results, especially if the training data is small. As the goal is to create a good enough mapping using a small set of predefined keywords, naive Bayes is thus a good candidate for a proof of concept study.

We base our implementation on the Weka library [15] and train the classifier using the custom keywords for each module. Note that the same keyword can be specified multiple times, increasing the importance of that particular keyword.

We derive the prediction data from the path of each source code entity, including the filename. The filename is split into words based on common camel-, kebab-, and snake-case rules. In addition, we value later parts of the path more and add these words multiple times. Intuitively, this allows a deeper nested folder mapping to "override" a higher-level mapping. For example, the file:

net/sf/jabref/logic/util/io/FileHistory.java

will produce the following words:

net sf jabref logic util io filehistory file history sf jabref logic util io jabref logic util io logic util io util io io

Note the six occurrences of io reflecting the nesting depth of the word in the path.
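The word generation and the keyword-trained classifier of this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Weka-based implementation in s4rdm3x: the function names (split_name, path_words, train, predict) are hypothetical, directory names are not split further, and the classifier assumes uniform module priors, Laplace smoothing, and that words outside the keyword vocabulary are ignored. The factor argument corresponds to the acceptance factor of 1.99 reported in this section.

```python
import math
import re
from collections import Counter


def split_name(name):
    """Split a name on camel-, kebab-, and snake-case boundaries."""
    parts = []
    for chunk in re.split(r"[-_]+", name):
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", chunk))
    return [p.lower() for p in parts]


def path_words(path):
    """Generate prediction words; deeper directories are repeated more often."""
    *dirs, filename = path.split("/")
    dirs = [d.lower() for d in dirs]
    stem = filename.rsplit(".", 1)[0]
    words = list(dirs)
    words.append(stem.lower())
    words.extend(split_name(stem))
    # Append every proper suffix of the directory path (assumed weighting rule).
    for start in range(1, len(dirs)):
        words.extend(dirs[start:])
    return words


def train(keywords):
    """Per module, Laplace-smoothed log-probabilities of each keyword."""
    vocab = sorted({w for ws in keywords.values() for w in ws})
    model = {}
    for module, ws in keywords.items():
        counts = Counter(ws)  # repeating a keyword increases its weight
        total = len(ws) + len(vocab)
        model[module] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return model


def predict(model, words, factor=1.99):
    """Return the best module, or None if it does not beat the runner-up
    by the given factor. Requires at least two modules."""
    scores = {m: sum(lp[w] for w in words if w in lp) for m, lp in model.items()}
    top = max(scores.values())
    probs = {m: math.exp(s - top) for m, s in scores.items()}
    norm = sum(probs.values())
    ranked = sorted(((p / norm, m) for m, p in probs.items()), reverse=True)
    (best_p, best_m), (second_p, _) = ranked[0], ranked[1]
    return best_m if best_p >= factor * second_p else None


# Hypothetical keyword sets, for illustration only.
model = train({"logic": ["logic"], "gui": ["gui"], "model": ["model"]})
words = path_words("net/sf/jabref/logic/util/io/FileHistory.java")
print(" ".join(words))
print(predict(model, words))
```

With these toy keywords, a file whose path contains none of the keywords yields equal probabilities for all modules and is therefore left unmapped, which is how the rule trades recall for precision.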

To generate a useful initial set, it is more important that the mappings are precise rather than complete. There needs to be a large difference between the best mapping probability and the second best. By trial and error, we found a factor of 1.99 to work well, i.e., the highest probability needs to be 1.99 times higher than the second-highest probability for a mapping to occur.

We have implemented the mapper described above in our open-source tool suite s4rdm3x [16].

4. Method

We use nine open-source systems where the ground truth mappings are known. We create a keyword set for each module based on the ground truth mappings. We make sure that these keywords will successfully map at least some entities to each module.

After we have determined the keywords, we run our keywords-based mapper and create an initial set. This initial set is then used as the input to another mapper, NBAttract, which also uses multinomial naive Bayes but instead forms training and prediction words using dependency information in the form of concrete dependency abstractions (CDA) [8]. We compare the performance to NBAttract with a random initial set. In this configuration, we use file information (not including the module keywords) and CDA. In addition, we compare to the interactive approach InMap [7].

We collect precision, recall, and combined F1 scores for each approach. When a random initial set is used, several sets of different sizes and compositions are needed to cover a large range of combinations. We will present the performance metrics numerically and visually as the effect of the initial set size is essential.

We use nine open-source systems implemented in Java. Ant (https://ant.apache.org) is an API and command-line tool for process automation. ArgoUML (http://argouml.tigris.org) is a desktop application for UML modeling. JabRef (https://jabref.org) is a desktop application for managing bibliographical references. K9 (https://k9mail.app/) is an open-source email client for Android. Lucene (https://lucene.apache.org) is an indexing and search library. ProM (http://www.promtools.org) is an extensible framework that supports a variety of process mining techniques. Note that we use the ProM framework and not the full ProM system. Sweet Home 3D (http://www.sweethome3d.com) is an interior design application. TeamMates (https://teammatesv4.appspot.com) is a web application for handling student peer reviews and feedback.

A documented software architecture and a mapping from the implementation to this architecture exist for each system. JabRef, TeamMates, and ProM have been the study subjects at the Software Architecture Erosion and Architectural Consistency Workshop (SAEroCon) 2016, 2017, and 2019, respectively. A system expert has provided both the architecture and the mapping for these systems. The architecture documentation and mappings are available in the SAEroCon repository (https://github.com/sebastianherold/SAEroConRepo). ArgoUML, Ant, and Lucene have been previously studied [17, 18], and the architectures and mappings were extracted from the replication package of Brunet et al. [17]. K9 has been preliminarily mapped by ourselves based on architecture documentation provided in [19] (http://oss.models-db.com/Downloads/EASE2019_ReplicationPackage/). We have not validated this mapping with system experts but include it since it is an interesting case with a more complex file structure.

5. Results and Analysis

We use the existing ground truth mappings to construct a set of keywords for each system. Table 1 shows the manually extracted keywords. Note that a single keyword is sufficient in many cases, and many keywords are the same as or some variation of the module name. K9 presents an interesting exception where several keywords are needed. We relied on a high-level architectural description when creating the mapping for K9, where allowed dependencies were the most clearly defined. The keywords used reflect the sub-modules of the high-level modules. Note that our mapping has not been validated by system experts.

[Table 1: Manually extracted keywords per system and module. The table body was not recoverable; surviving module names: globals, preferences, model, logic, gui, cli, queryparser, search, index, store, analysis, util, document, business, presentation.]

Using the generated initial sets, we ran the NBAttract mapper with CDA information only. We ran 1530 experiments with random initial sets for the NBAttract mapper where the mapper used filename and CDA information (no module keywords). Finally, we use the best-reported performance metrics from [7]. Table 2 shows the comparison of the four approaches. Using the keywords-based mapping, we can create an initial set with perfect precision and recall in Commons Imaging, ProM, and Sweet Home 3D. The keywords for these systems are straightforward and are often directly reflected in the module name. For the other systems, keywords can generate an initial set with perfect precision. However, recall suffers.

[Table 2: Precision (P), recall (R), and F1 score per system for the four approaches, including Random + NBAttract. The table body was not recoverable.]

Using the keywords-based initial sets and NBAttract using CDA performs very well, with precision scores over 0.95 in all cases and almost perfect scores for recall (cf. Table 2).

Figures 1, 2, and 3 show the running median F1 score, precision, and recall for each system. The figures focus on showing the running median for random initial sets and NBAttract. This configuration seems to lack precision in Commons Imaging and Sweet Home 3D, and the recall suffers in Ant. The naming and dependency information are possibly conflicting in these systems. Table 2 shows mean values; they can vary quite a bit in the actual cases depending on the size and composition of the initial set.

Finally, InMap lacks in precision but performs well regarding the recall. Note that InMap is a highly interactive approach to mapping. The aim is not to automate the mapping but rather to give good advice to a human user who interactively maps the source code in iterations. If there is a need to check an automatic mapping thoroughly, an interactive approach is attractive regardless of precision.

6. Discussion and Validity

We are limited to systems in Java, where the file structure often reflects the modular design of our subject systems well. While we could handle discrepancies and ambiguities well enough to create an initial set, this may not be the case in a system where the file structure is entirely different. However, we also show that these cases can use the file information. Current mapping methods, e.g., NBAttract and InMap, should likely give file information more attention.

Keywords can be effectively used and provide an excellent initial set, even a perfect mapping in some cases. It is an attractive approach compared to manually mapping an initial set. Hypothetically, it should be easier to extract the keywords and specify the corresponding module and weight of the keyword than mapping several tens or hundreds of files manually. The main challenge in this area is, of course, to find a high-precision and minimal set of keywords. We used the already established ground truth mappings to do this in this preliminary evaluation, but this approach is not feasible in a real case. However, analyzing the directory structure and looking for words in the module names could provide a starting point in many cases. Possibly using a deeper level in the directory hierarchy or looking for repeating patterns could be fruitful. Semantic analysis using, e.g., WordNet could be an approach to find related words in the directory structure. In addition, information from, e.g., method names and identifiers could be used.

It would arguably be easier to create and maintain a small set of keywords compared to, e.g., regular expressions, even if done entirely manually.

Using a large random initial set seems to give a very high performance of NBAttract in some cases, e.g., ArgoUML, Commons Imaging, Lucene, ProM, and Sweet Home 3D (cf. Figure 1). This indicates that when the mapping is established, NBAttract often performs well when only a few new source code entities are introduced (e.g., during software evolution). However, in some cases, the F1 score declines as the initial set becomes larger, e.g., JabRef, K9, and TeamMates (cf. Figure 1). A preliminary analysis seems to point towards overfitting, i.e., the model becomes too specific, and as a result, the recall drops (cf. Figure 3). It can also be an effect of randomness; the 1530 data points per system are few considering the combinatorial complexity of random initial set sizes and compositions. However, they are sufficient to indicate the overall performance in a preliminary study such as this. The very high recall in ProM (cf. Figure 3) can be explained by the fact that the ProM framework has a very straightforward mapping, and as before, the number of data points may be too small.

7. Conclusions and Future Work

We found that we could construct relatively simple keywords for a majority of the 96 modules in all nine systems. Ten modules (9.6%) required weights for keywords, and 15 (15.6%) required two or more different keywords. Our mapper could successfully create an initial set using the keywords, and in some cases, this resulted in a perfect mapping.

Combining the keywords-based mapping and NBAttract using CDA provided outstanding performance with
a mean precision, recall, and F1 score of 0.98, 1.0, and 0.99, respectively. The performance was higher than using random initial sets and NBAttract using CDA and file information, and the interactive technique InMap (see Table 2).

If a mapping is already established, NBAttract with CDA and file information provides good performance in many cases; however, in some systems, the model could suffer from overfitting issues (cf. Figure 3).

Using keywords is an attractive approach that can significantly reduce the mapping effort. However, a central question that remains is how to extract good candidate keywords and let a human user assign weights.

In addition, a keywords-based mapping approach is likely not applicable for some systems. We plan on performing comparative studies using the mappings from [10], where the authors claim architectural modules are not bound to the file structure of the source code.

[Figure 1: F1 score of each approach plotted against initial set size, one panel per system (Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates). Legend: Random+NBAttract, Keywords, Keywords+NBAttract, InMap. Plot data and caption not recoverable.]

[Figure 2: Precision of each approach plotted against initial set size, same panels and legend as Figure 1. Plot data and caption not recoverable.]

Figure 3: The recall of each approach plotted against initial set size. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the recall starts at 0.7. [Plot data not recoverable.]

Acknowledgments

The research was supported by the Centre for Data Intensive Sciences and Applications at Linnaeus University.

References

[1] De Silva, Balasubramaniam, Controlling software architecture erosion: A survey, Journal of Systems and Software 85 (2012) 132-151.

[2] G. C. Murphy, Notkin, Sullivan, Software reflexion models: Bridging the gap between source and high-level models, ACM SIGSOFT Software Engineering Notes 20 (1995) 18-28.

[3] Ali, Baker, R. O'Crowley, Herold, J. Buckley, Architecture consistency: State of the practice, challenges and requirements, Empirical Software Engineering 23 (2017) 1-35.

[4] Knodel, Popescu, A comparison of static architecture compliance checking approaches, in: The IEEE/IFIP Working Conference on Software Architecture, 2007, pp. 12-21.

[5] R. A. Bittencourt, Jansen de Souza Santos, D. D. S. Guerrero, G. C. Murphy, Improving automated mapping in reflexion models using information retrieval techniques, in: Working Conference on Reverse Engineering, IEEE, 2010, pp. 163-172.

[6] Christl, Koschke, M. A. Storey, Automated clustering to support the reflexion method, Information and Software Technology 49 (2007) 255-274.

[7] Z. T. Sinkala, Herold, Inmap: Automated interactive code-to-architecture mapping recommendations, in: IEEE 18th International Conference on Software Architecture (ICSA), 2021, pp. 173-183.

[8] T. Olsson, M. Ericsson, A. Wingkvist, Semi-automatic mapping of source code using naive bayes, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, 2019, pp. 209-216.

[9] Tzerpos, R. C. Holt, The orphan adoption problem in architecture maintenance, in: Working Conference on Reverse Engineering, IEEE, 1997, pp. 76-82.

[10] Garcia, I. Krka, Mattmann, Medvidovic, Obtaining ground-truth software architectures, in: 35th International Conference on Software Engineering (ICSE), 2013, pp. 901-910.

[11] Buckley, Ali, English, Rosik, S. Herold, Real-time reflexion modelling in architecture reconciliation: A multi case study, Information and Software Technology 61 (2015) 107-123.

[12] Anquetil, T. C. Lethbridge, Recovering software architecture from the names of source files, Journal of Software Maintenance: Research and Practice 11 (1999) 201-221.

[13] Christl, Koschke, M. A. Storey, Equipping the reflexion method with automated clustering, in: Working Conference on Reverse Engineering, IEEE, 2005, pp. 98-108.

[14] T. Olsson, M. Ericsson, A. Wingkvist, An exploration and experiment tool suite for code to architecture mapping techniques, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19, 2019, pp. 26-29.

[15] Witten, E. Frank, Hall, Pal, Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2016.

[16] T. Olsson, M. Ericsson, A. Wingkvist, s4rdm3x: A tool suite to explore code to architecture mapping techniques, Journal of Open Source Software 6 (2021) 2791. doi:10.21105/joss.02791.

[17] Brunet, R. A. Bittencourt, Serey, J. Figueiredo, On the evolutionary nature of architectural violations, in: Working Conference on Reverse Engineering, IEEE, 2012, pp. 257-266.

[18] Lenhard, Blom, Herold, Exploring the suitability of source code metrics for indicating architectural inconsistencies, Software Quality Journal (2018).

[19] Nurwidyantoro, Ho-Quang, M. R. V. Chaudron, Automated classification of class role-stereotypes via machine learning, in: Proceedings of the Evaluation and Assessment on Software Engineering, 2019, pp. 79-88.