A Preliminary Study on the Use of Keywords for Source Code to Architecture Mappings

Tobias Olsson, Morgan Ericsson and Anna Wingkvist
Department of Computer Science and Media Technology, Linnaeus University, Kalmar/Växjö, Sweden

Abstract
We implement an automatic mapper that can find the corresponding architectural module for a source code file. The mapper is based on multinomial naive Bayes, and it is trained using custom keywords for each architectural module. For prediction, the mapper uses the path and file name of source code elements. We find that the needed keywords often match the module names, but also that ambiguities and discrepancies exist. We evaluate the mapper using nine open-source systems and find that the mapper can successfully create a mapping with perfect precision, but in most cases, it cannot cover all source code elements. Other techniques can, however, use the mapping as a foothold and create further mappings.

Keywords
Orphan Adoption, Software Architecture, Source Code Clustering, Naive Bayes

1. Introduction

The modular software architecture captures major design decisions regarding reuse, maintainability, changeability, and portability [1]. During system evolution, the source code must conform to the architecture, or the system risks accumulating technical debt and finally losing the desired qualities.

Static Architecture Conformance Checking (SACC) methods, such as Reflexion modeling [2], statically analyze source code to ensure that it does not introduce architectural violations [3, 4]. These methods require an architecture model, with modules and dependencies, and a source code model, with entities (e.g., source code files) and concrete dependencies (e.g., due to inheritance or method invocations). They also require a mapping from the source code model to the architecture model to detect convergent, absent, or divergent dependencies in the implementation.

Despite the importance of architecture conformance, SACC has not reached widespread use in the software industry [1, 3, 5, 6]. The necessary tools and methods for using SACC exist. However, practitioners perceive the mapping from source code to architectural modules as a significant hindrance; it is often outdated or nonexistent. Many tools address this by combining manual mapping and regular expressions to filter file, module, and package names. Still, such approaches are considered to be both time-consuming and error-prone [3, 5, 6, 7].

Automatic mapping techniques aim to minimize the manual effort needed to create a mapping by using information available in the source code and intended modular architecture. For example, dependencies between source code entities can be used to create a mapping. A problem with current automatic techniques is that they require an initial set of mapped entities that the technique infers the automatic mappings from. Depending on the technique and system to be mapped, an initial set needs to consist of approximately 15-20% of the entities before reaching acceptable performance.

In our experience, the physical structure of files on disk is often in part or wholly reflected in the intended modular architecture. Effective use of this information can present an attractive option to create an initial set. However, structure and naming are not always mapped one to one to a module, and there are discrepancies, ambiguities, or simply missing terms in the naming.

We investigate how well a multinomial naive Bayes classifier trained using simple keywords derived from ground truth mappings can be used to automatically create an initial set. We pose the following questions:

1. Can the mapper construct an initial set based on a simple set of keywords for each module?
2. How well does this initial set perform if used in combination with mapping based on dependencies?
3. How well does the above combination perform compared to the NBAttract (with a random initial set) and InMap approaches?
We evaluate the mapper using nine open-source systems with known mappings to a specified modular architecture and find that the keywords are often the same as the module names, but more and different keywords are needed in some cases. After the initial set is created, we run another automatic mapper that can map any remaining entities. We compare the results with a traditional automatic mapping technique [8] and an interactive mapping technique [7]. We find that the keywords-based approach can, in some cases, provide a complete mapping and that the keywords-based approach plus the automatic mapping approach performs very well.

ECSA2021 Companion Volume
Email: tobias.olsson@lnu.se (T. Olsson); morgan.ericsson@lnu.se (M. Ericsson); anna.wingkvist@lnu.se (A. Wingkvist)
ORCID: 0000-0003-1154-5308 (T. Olsson); 0000-0003-1173-5187 (M. Ericsson); 0000-0002-0835-823X (A. Wingkvist)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Tobias Olsson et al. CEUR Workshop Proceedings 1–10

2. Background and Related Work

Tzerpos and Holt describe the general problem of mapping (or remapping) a source code entity to an architectural module [9]. They collectively call both the mapping and remapping of an entity the orphan adoption problem. They find four major criteria for solving the problem: naming, structure, style, and semantics, and devise an algorithm that they evaluate in three case studies [9]. Tzerpos and Holt regard the naming criterion as the first option to use in an orphan adoption scenario and suggest using per-system regular expressions to determine a mapping. However, they also mention that the naming criterion alone is not enough, as names may be lacking or developers may not always adhere to the naming standard.

Garcia et al. discuss the use of package and naming information in software architecture recovery [10]. In general, they found that their ground truth components often spanned or shared several packages. They could not find a correlation between components and single package or directory names. One of their four cases presented a reasonably good correlation, and in one system, they could find a repeating pattern of directories. The ground truth architectures recovered in their study are possibly at a lower level than the modular architectures we study. Still, there is likely variation in what dimension or view of an architecture is expressed in the package structure. This variation is further supported by Buckley et al., where one out of five studied systems did not have any clear correlation between packages and modules. This presented difficulties and significant effort when performing manual mapping [11].

Anquetil and Lethbridge, on the other hand, propose a method for architecture recovery of legacy systems using filenames [12]. Their approach focuses on the assumptions that files have short names with many abbreviations and are placed in a single directory. This is due to their focus on recovering legacy systems. Nevertheless, they present some interesting findings. First, they identify several forces that shape a filename, i.e., what influences it. There seem to be several examples of such forces also in more modern implementations, e.g., from the subject system Ant, we find the feature implemented (ant.taskdefs.SendEmail), the algorithms or steps of algorithms (ant.types.resources.Sort), or data processed (ant.taskdefs.email.Header), as suggested in [12]. Much of the approach revolves around the problematic abbreviations found in the relatively short filenames. While this is not a technical problem in modern development, the use of abbreviations is still common practice. For example, one of the subject systems, ArgoUML, defines a module reverseEngineering, and the corresponding directory mapping is the abbreviation reveng. Finally, Anquetil and Lethbridge successfully use filenames to create a clustering that corresponds well to an expert's view of a system.

2.1. Semi-Automatic Mapping

Christl et al. introduced the Human Guided clustering Method (HuGMe), an approach to semi-automatic mapping of source code entities to modules of the intended architecture [9]. It is an iterative approach that, at its core, uses an attraction function to compute the attraction between a source code entity and a module. If the attraction is considered valid, an automatic mapping is made; if not, the attractions can be used as a suggestion for a human user. Two attraction functions based on dependencies are presented, CountAttract and MQAttract [13, 6].

Bittencourt et al. present two new attraction functions based on information retrieval techniques [5]. They use semantic information in the source code, including module names and filenames. The attractions are calculated based on cosine similarity (IRAttract) and latent semantic indexing (LSIAttract). They make a quantitative comparison between the performance of their attraction functions and that of CountAttract and MQAttract in an evolutionary setting (where a few new files are to be assigned a mapping). They find that combining attraction functions (e.g., if CountAttract fails, try IRAttract) performs best. They find that CountAttract usually misplaces entities on module borders. MQAttract performs better when mapping entities with dependencies to many different modules. IRAttract and LSIAttract perform better when mapping entities in libraries or entities on module borders, but worse if there are modules that share vocabulary but are not related [5].

We have created an attraction function that uses machine learning techniques and introduced the Concrete Dependency Abstraction (CDA) method [8]. In short, CDA produces textual representations of dependencies at the level of architectural modules and lets a machine learning technique learn the patterns of dependencies from the actual source code and combine these with information retrieval techniques. We implement this approach using naive Bayes as an attraction function for the HuGMe method, NBAttract. We have compared the automatic mapping performance of CountAttract, IRAttract, LSIAttract and NBAttract over several systems using s4rdm3x, our open-source tool suite for automatic mapping experiments [8, 14].

The main limitations of the techniques that build on HuGMe are the need for an initial set and, in some cases, low-quality mappings. The initial set needs to be manually created and be of good quality for the attraction functions to perform well. We estimate that a randomly composed initial set needs to include approximately 15-20% of the source code entities. Based on this, we conclude that creating the initial set is likely a significant effort. Automated techniques will probably not result in a perfect mapping except when they use a large initial set and only map a few entities. In the best of cases, the automated technique leaves hard-to-map instances to the user (creating more manual work), but misclassifications are problematic. There has not been much research on the manual mapping steps of HuGMe except for the original studies [13, 6]. Handling of misclassification and manual support in these methods are still open issues.

2.2. Interactive Mapping

Sinkala and Herold present InMap, which is not an automated approach to mapping per se, but instead suggests mappings to the end-user, who can then choose to accept the suggested mapping (or not) [7]. It is an iterative approach that presents a suggested mapping for a fixed number of entities in each iteration. The end-user chooses to accept or reject the suggestions. InMap uses the accepted mappings to improve the suggested mappings further in the next iteration. It also uses the negative evidence of a rejected mapping and does not suggest this mapping again. InMap produces the suggested mappings similarly to Bittencourt et al., with the addition of a descriptive text for each architectural module. InMap also includes the path and filename used in the Java class and package names. It treats the source code entities as a database of documents and uses Lucene to search this database using module information as a query. Sinkala and Herold evaluate InMap using six open source systems. For the best combination (in terms of highest F1 score) of information, InMap can suggest mappings for most of a system's entities with a mean recall of 0.95, a mean precision of 0.84, and a mean F1 score of 0.89.

The main limitations of InMap are its highly interactive nature and that architectural documentation needs to exist for every module. The documentation provided needs to be of good quality, i.e., as short as possible but containing good keywords. Noisy documentation will likely not help in producing high-precision suggestions. The interactiveness of InMap is in some way double-edged; the technique often seems to require more interaction (accepting or rejecting a suggested mapping) than there are entities in the source code. On the other hand, if not even minor mapping errors can be tolerated, a mapping validation is needed anyway.

3. Keywords and File-Based Mapping

File naming and structure seem to reflect the intended modular architectures we have studied quite well. For example, module names tend to map to the directory structure of the source code. However, the naming is often not perfect. In some cases, module names are not used, or shorter or slightly different terms are used. In other cases, several module names exist in the structure or naming of a file. A simplistic approach is thus not appropriate. Instead, the file naming patterns need to be fully defined, e.g., using regular expressions or a heuristic. For regular expressions to work, there is often a need to maintain several expressions that can be conflicting and overlapping. A more attractive option would be to use machine learning and train a classifier using a good set of keywords. The classifier's task is to produce a good enough initial set. An automatic mapping technique can then use this initial set for further mappings.

In this work, we implement a proof of concept mapper using a multinomial naive Bayes classifier. It is a simple, probabilistic approach that uses word frequencies to compute the probability of each class. While it is conceptually simple, naive Bayes often produces good results, especially if the training data is small. As the goal is to create a good enough mapping using a small set of predefined keywords, naive Bayes is thus a good candidate for a proof of concept study.

We base our implementation on the Weka library [15] and train the classifier using the custom keywords for each module. Note that the same keyword can be specified multiple times, increasing the importance of that particular keyword.

We derive the prediction data from the path of each source code entity, including the filename. The filename is split into words based on common camel-, kebab-, and snake-case rules. In addition, we value later parts of the path more and add these words multiple times. Intuitively, this allows a more deeply nested folder mapping to "override" a higher-level mapping. For example, the file:

net/sf/jabref/logic/util/io/FileHistory.java

will produce the following words:

net sf jabref logic util io filehistory file history sf jabref logic util io jabref logic util io logic util io util io io

Note the six occurrences of io reflecting the nesting depth of the word in the path.

To generate a useful initial set, it is more important that the mappings are precise rather than complete. There needs to be a large difference between the best mapping probability and the second best. By trial and error, we found a factor of 1.99 to work well, i.e., the highest probability needs to be 1.99 times higher than the second-highest probability for mapping to occur.

We have implemented the mapper described above in our open-source tool suite s4rdm3x [16].

4. Method

We use nine open-source systems where the ground truth mappings are known. We create a keyword set for each module based on the ground truth mappings. We make sure that these keywords will successfully map at least some entities to each module.

After we have determined the keywords, we run our keywords-based mapper and create an initial set. This initial set is then used as the input to another mapper, NBAttract, which also uses multinomial naive Bayes but instead forms training and prediction words using dependency information in the form of concrete dependency abstractions (CDA) [8]. We compare the performance to NBAttract with a random initial set. In this configuration, we use file information (not including the module keywords) and CDA. In addition, we compare to the interactive approach InMap [7].

We collect precision, recall, and combined F1 scores for each approach. When a random initial set is used, several sets of different sizes and compositions are needed to cover a large range of combinations. We will present the performance metrics numerically and visually as the effect of the initial set size is essential.

We use nine open-source systems implemented in Java. Ant¹ is an API and command-line tool for process automation. ArgoUML² is a desktop application for UML modeling. JabRef³ is a desktop application for managing bibliographical references. K9⁴ is an open-source email client for Android. Lucene⁵ is an indexing and search library. ProM⁶ is an extensible framework that supports a variety of process mining techniques. Note that we use the ProM framework and not the full ProM system. Sweet Home 3D⁷ is an interior design application. TeamMates⁸ is a web application for handling student peer reviews and feedback.

A documented software architecture and a mapping from the implementation to this architecture exist for each system. JabRef, TeamMates, and ProM have been the study subjects at the Software Architecture Erosion and Architectural Consistency Workshop (SAEroCon) 2016, 2017, and 2019 respectively. A system expert has provided both the architecture and the mapping for these systems. The architecture documentation and mappings are available in the SAEroCon repository⁹. ArgoUML, Ant, and Lucene have been previously studied [17, 18], and the architectures and mappings were extracted from the replication package of Brunet et al. [17]. K9 has been preliminarily mapped by ourselves based on architecture documentation provided in [19]¹⁰. We have not validated this mapping with system experts but include it since it is an interesting case with a more complex file structure.

¹ https://ant.apache.org
² http://argouml.tigris.org
³ https://jabref.org
⁴ https://k9mail.app/
⁵ https://lucene.apache.org
⁶ http://www.promtools.org
⁷ http://www.sweethome3d.com
⁸ https://teammatesv4.appspot.com
⁹ https://github.com/sebastianherold/SAEroConRepo
¹⁰ http://oss.models-db.com/Downloads/EASE2019_ReplicationPackage/

5. Results and Analysis

We use the existing ground truth mappings to construct a set of keywords for each system. Table 1 shows the manually extracted keywords. Note that a single keyword is sufficient in many cases, and many keywords are the same as or some variation of the module name. K9 presents an interesting exception where several keywords are needed. We relied on a high-level architectural description when creating the mapping for K9, where allowed dependencies were the most clearly defined. The keywords used reflect the sub-modules of the high-level modules. Note that our mapping has not been validated by system experts.

Table 1
Keywords for each system and module.

System  Module                Keywords
Ant     compilers             2 * compiler, 2 * compilers
        condition             condition
        rmic                  rmic
        cvslib                cvslib
        email                 email
        taskdefs              taskdefs
        listener              listener
        types                 types
        ant                   ant
        util                  util
        zip                   zip
        tar                   tar
        mail                  mail
        bzip2                 bzip2
AUML    application           2 * application
        diagrams              2 * diagram
        notation              notation
        explorer              explorer
        codeGeneration        3 * language, code, generation
        javaCodeGeneration    language, code, generation, 2 * java
        reverseEngineering    3 * reveng
        persistence           persistence
        moduleLoader          moduleloader, 2 * api, module, modules
        gui                   ui
        model                 model
        internationalization  i18n
        swingExtensions       swingext
        ocl                   ocl
        critics               2 * cognitive
C Img   base                  imaging
        color                 color
        common                common
        bmp                   bmp
        dcx                   dcx
        gif                   gif
        icns                  icns
        ico                   ico
        jpeg                  jpeg
        pcx                   pcx
        png                   png
        pnm                   pnm
        psd                   psd
        rgbe                  rgbe
        tiff                  tiff
        wbmp                  wbmp
        xbm                   xbm
        xpm                   xpm
        icc                   icc
        internal              internal
        palette               palette
JabRef  globals               globals
        preferences           preferences, prefs
        model                 model, shared, dbms
        logic                 logic, shared
        gui                   gui
        cli                   cli
Lucene  queryparser           queryparser
        search                search
        index                 index
        store                 store
        analysis              analysis
        util                  util
        document              document
K9      business              controller, service
        mail                  k9, power
        search                migrations
        presentation          activity, ui, notification, fragment, view, list, widget, helper, crypto
        service               provider, action, extra
        dataaccess            mailstore, util
        crosscutting          crypto, autocrypt, cache, helper
ProM    framework             framework
        contexts              contexts
        models                models
        plugins               plugins
SH3D    sH3DModel             model
        sH3DTools             tools
        sH3DPlugin            plugin
        sH3DViewController    viewcontroller
        sH3DSwing             swing
        sH3DJava3D            j3d
        sH3DIO                io
        sH3DApplet            applet
        sH3DApplication       sweethome3d
TMates  common.util           util
        common.exception      exception
        common.dataTransfer   datatransfer
        ui.automated          automated
        ui.controller         controller
        ui.view               ui, page
        logic.core            core
        logic.api             logic, api
        logic.backdoor        backdoor
        storage.entity        entity
        storage.api           storage, api
        storage.search        search
        testDriver            2 * test
        client.remoteAPI      remoteapi
        client.scripts        2 * scripts

Using the generated initial sets, we ran the NBAttract mapper with CDA information only. We ran 1530 experiments with random initial sets for the NBAttract mapper where the mapper used filename and CDA information (no module keywords). Finally, we use the best-reported performance metrics from [7]. Table 2 shows the comparison of the four approaches. Using the keywords-based mapping, we can create an initial set with perfect precision and recall in Commons Imaging, ProM, and Sweet Home 3D. The keywords for these systems are straightforward and are often directly reflected in the module name. For the other systems, keywords can generate an initial set with perfect precision. However, recall suffers.

Table 2
Precision, Recall and F1 score for each mapping technique. For Random + NBAttract, the median metrics are shown.

         Keywords            Keywords + NBAttract  Random + NBAttract  InMap
System   P     R     F1      P     R     F1        P     R     F1      P      R      F1
Ant      1.00  0.97  0.99    0.99  1.00  0.99      0.94  0.91  0.94    0.73   1.00   0.84
AUML     1.00  0.67  0.80    0.97  1.00  0.98      0.95  1.00  0.97    0.78   0.98   0.87
C Img    1.00  1.00  1.00    -     -     -         0.84  0.99  0.90    -      -      -
JabRef   1.00  0.95  0.98    0.98  1.00  0.99      0.91  0.98  0.94    0.96   1.00   0.98
K9       1.00  0.81  0.90    0.96  1.00  0.98      0.92  1.00  0.96    -      -      -
Lucene   1.00  0.99  1.00    1.00  0.99  1.00      0.97  1.00  0.98    -      -      -
ProM     1.00  1.00  1.00    -     -     -         0.99  1.00  1.00    0.81   0.87   0.84
SH3D     1.00  1.00  1.00    -     -     -         0.83  1.00  0.91    -      -      -
TMates   1.00  0.60  0.75    0.97  1.00  0.99      0.97  1.00  0.98    0.95   0.97   0.96
Mean     1.00  0.89  0.93    0.98  1.00  0.99      0.92  0.99  0.95    0.846  0.964  0.90

Combining the keywords-based initial sets with NBAttract using CDA performs very well, with precision scores over 0.95 in all cases and almost perfect scores for recall (cf. Table 2).

Figures 1, 2, and 3 show the running median F1 score, precision, and recall for each system. The figures focus on showing the running median for random initial sets and NBAttract. This configuration seems to lack precision in Commons Imaging and Sweet Home 3D, and the recall suffers in Ant. The naming and dependency information are possibly conflicting in these systems. Table 2 shows mean values; they can vary quite a bit in the actual cases depending on the size and composition of the initial set.

Finally, InMap lacks in precision but performs well regarding the recall. Note that InMap is a highly interactive approach to mapping. The aim is not to automate the mapping but rather to give good advice to a human user who maps the source code interactively and iteratively. If there is a need to check an automatic mapping thoroughly, an interactive approach is attractive regardless of precision.

6. Discussion and Validity

Keywords can be effectively used and provide an excellent initial set, even a perfect mapping in some cases. It is an attractive approach compared to manually mapping an initial set. Hypothetically, it should be easier to extract the keywords and specify the corresponding module and weight of the keyword than to map several tens or hundreds of files manually. The main challenge in this area is, of course, to find a high-precision and minimal set of keywords. We used the already established ground truth mappings to do this in this preliminary evaluation, but this approach is not feasible in a real case. However, analyzing the directory structure and looking for words in the module names could provide a starting point in many cases. Possibly using a deeper level in the directory hierarchy or looking for repeating patterns could be fruitful. Semantic analysis using, e.g., WordNet could be an approach to find related words in the directory structure. In addition, information from, e.g., method names and identifiers could be used.

It would arguably be easier to create and maintain a small set of keywords compared to, e.g., regular expressions, even if done entirely manually.

Using a large random initial set seems to give a very high performance of NBAttract in some cases, e.g., ArgoUML, Commons Imaging, Lucene, ProM, Sweet Home 3D (cf. Figure 1). This indicates that when the mapping is established, NBAttract often performs well when only a few new source code entities are introduced (e.g., during software evolution). However, in some cases, the F1 score declines as the initial set becomes larger, e.g., JabRef, K9, and TeamMates (cf. Figure 1). A preliminary analysis seems to point towards overfitting, i.e., the model becomes too specific, and as a result, the recall drops (cf. Figure 3). It can also be an effect of randomness; the 1530 data points per system are pretty low considering the combinatorial complexity of random initial set sizes and compositions. However, it is sufficient to indicate the overall performance in a preliminary study such as this. The very high recall in ProM (cf. Figure 3) can be explained by the fact that the ProM framework has a very straightforward mapping, and as before, the number of data points may be too small.

We are limited to systems in Java, where the file structure often reflects the modular design of our subject systems well. While we could handle discrepancies and ambiguities well enough to create an initial set, this may not be the case in a system where the file structure is entirely different. However, we also show that these cases can use the file information. Current mapping methods, e.g., NBAttract and InMap, should likely give file information more attention.

7. Conclusions and Future Work

We found that we could construct relatively simple keywords for a majority of the 96 modules in all nine systems. Ten modules (9.6%) required weights for keywords, and 15 (15.6%) required two or more different keywords. Our mapper could successfully create an initial set using the keywords, and in some cases, this resulted in a perfect mapping.

Combining the keywords-based mapping and NBAttract using CDA provided outstanding performance with a mean precision, recall, and F1 score of 0.98, 1.0, and 0.99, respectively. The performance was higher than using random initial sets and NBAttract using CDA and file information, and the interactive technique InMap (see Table 2).

If a mapping is already established, NBAttract with CDA and file information provides good performance in many cases; however, in some systems, the model could suffer from overfitting issues (cf. Figure 3).

Using keywords is an attractive approach that can significantly reduce the mapping effort. However, a central question that remains is how to extract good candidate keywords and let a human user assign weights.

In addition, a keywords-based mapping approach is likely not applicable for some systems. We plan on performing comparative studies using the mappings from [10], where the authors claim architectural modules are not bound to the file structure of the source code.

Acknowledgments

The research was supported by the Centre for Data Intensive Sciences and Applications at Linnaeus University.

References

[1] L. De Silva, D. Balasubramaniam, Controlling software architecture erosion: A survey, Journal of Systems and Software 85 (2012) 132–151.
[2] G. C. Murphy, D. Notkin, K. Sullivan, Software reflexion models: Bridging the gap between source and high-level models, ACM SIGSOFT Software Engineering Notes 20 (1995) 18–28.
[3] N. Ali, S. Baker, R. O'Crowley, S. Herold, J. Buckley, Architecture consistency: State of the practice, challenges and requirements, Empirical Software Engineering 23 (2017) 1–35.
[4] J. Knodel, D. Popescu, A comparison of static architecture compliance checking approaches, in: The IEEE/IFIP Working Conference on Software Architecture, 2007, pp. 12–21.
[5] R. A. Bittencourt, G. Jansen de Souza Santos, D. D. S. Guerrero, G. C. Murphy, Improving automated mapping in reflexion models using information retrieval techniques, in: Working Conference on Reverse Engineering, IEEE, 2010, pp. 163–172.
[6] A. Christl, R. Koschke, M. A. Storey, Automated clustering to support the reflexion method, Information and Software Technology 49 (2007) 255–274.
[7] Z. T. Sinkala, S. Herold, Inmap: Automated interactive code-to-architecture mapping recommendations, in: IEEE 18th International Conference on Software Architecture (ICSA), 2021, pp. 173–183.
[8] T. Olsson, M. Ericsson, A. Wingkvist, Semi-automatic mapping of source code using naive bayes, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, 2019, p. 209–216.
[9] V. Tzerpos, R. C. Holt, The orphan adoption problem in architecture maintenance, in: Working Conference on Reverse Engineering, IEEE, 1997, pp. 76–82.
[10] J. Garcia, I. Krka, C. Mattmann, N. Medvidovic, Obtaining ground-truth software architectures, in: 35th International Conference on Software Engineering (ICSE), 2013, pp. 901–910.
[11] J. Buckley, N. Ali, M. English, J. Rosik, S. Herold, Real-time reflexion modelling in architecture reconciliation: A multi case study, Information and Software Technology 61 (2015) 107–123.
[12] N. Anquetil, T. C. Lethbridge, Recovering software architecture from the names of source files, Journal of Software Maintenance: Research and Practice 11 (1999) 201–221.
[13] A. Christl, R. Koschke, M. A. Storey, Equipping the reflexion method with automated clustering, in: Working Conference on Reverse Engineering, IEEE, 2005, pp. 98–108.
[14] T. Olsson, M. Ericsson, A. Wingkvist, An exploration and experiment tool suite for code to architecture mapping techniques, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19, 2019, p. 26–29.
[15] I. Witten, E. Frank, M. Hall, C. Pal, Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2016.
[16] T. Olsson, M. Ericsson, A. Wingkvist, s4rdm3x: A tool suite to explore code to architecture mapping techniques, Journal of Open Source Software 6 (2021) 2791. doi:10.21105/joss.02791.
[17] J. Brunet, R. A. Bittencourt, D. Serey, J. Figueiredo, On the evolutionary nature of architectural violations, in: Working Conference on Reverse Engineering, IEEE, 2012, pp. 257–266.
[18] J. Lenhard, M. Blom, S. Herold, Exploring the suitability of source code metrics for indicating architectural inconsistencies, Software Quality Journal (2018).
[19] A. Nurwidyantoro, T. Ho-Quang, M. R. V. Chaudron, Automated classification of class role-stereotypes via machine learning, in: Proceedings of the Evaluation and Assessment on Software Engineering, 2019, p. 79–88.
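As an illustration of the keyword mapper described in Section 3, the following Python sketch reproduces the path-to-words example and applies the 1.99 ratio rule. It is a minimal sketch and not the s4rdm3x or Weka implementation: the function names (split_identifier, path_words, train, classify), the Laplace smoothing, the uniform class prior, and the handling of single-word file names are all assumptions made for illustration. The keyword lists are the JabRef entries from Table 1.

```python
import math
import re
from collections import Counter

RATIO = 1.99  # factor reported in Section 3


def split_identifier(name):
    """Split a name on camel-, kebab-, and snake-case boundaries (lower-cased)."""
    words = []
    for part in re.split(r"[-_]+", name):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def path_words(path):
    """Turn a source file path into prediction words, repeating deeper folders."""
    *dirs, filename = path.split("/")
    dirs = [d.lower() for d in dirs]
    name_words = split_identifier(filename.rsplit(".", 1)[0])
    words = list(dirs)
    if len(name_words) > 1:
        # Assumption: the joined form is only added for multi-word file names.
        words.append("".join(name_words))
    words += name_words
    # Each pass drops the outermost folder, so deeper folders are counted more often.
    for start in range(1, len(dirs)):
        words += dirs[start:]
    return words


def train(keywords):
    """Build per-module log-probabilities from {module: [keyword, ...]}.

    Repeating a keyword in a list raises its weight. Laplace smoothing over the
    keyword vocabulary is an assumption of this sketch.
    """
    vocab = {w for ws in keywords.values() for w in ws}
    model = {}
    for module, ws in keywords.items():
        counts = Counter(ws)
        total = sum(counts.values()) + len(vocab)
        model[module] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return model


def classify(model, words, ratio=RATIO):
    """Return the best module, or None if it does not beat the runner-up by `ratio`.

    Needs at least two modules. Words outside the keyword vocabulary are ignored
    and a uniform class prior is assumed.
    """
    scores = {
        module: sum(logp[w] for w in words if w in logp)
        for module, logp in model.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_lp), (_, second_lp) = ranked[0], ranked[1]
    # A probability ratio of `ratio` is a log-probability gap of log(ratio).
    return best if best_lp - second_lp >= math.log(ratio) else None


if __name__ == "__main__":
    jabref = {
        "globals": ["globals"],
        "preferences": ["preferences", "prefs"],
        "model": ["model", "shared", "dbms"],
        "logic": ["logic", "shared"],
        "gui": ["gui"],
        "cli": ["cli"],
    }
    model = train(jabref)
    path = "net/sf/jabref/logic/util/io/FileHistory.java"
    print(" ".join(path_words(path)))
    print(classify(model, path_words(path)))  # logic
```

In this sketch, a file directly under a top-level logic folder (for example logic/Foo.java) stays unmapped, because a single occurrence of the keyword does not reach the 1.99 ratio. This matches the stated preference for precision over completeness, although the real tool may behave differently.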
[Figure 1 plot omitted. Panels: Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates. x-axis: Initial Set Size (0.0 to 1.0). y-axis: F1 Score (0.7 to 1.0). Legend: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]
Figure 1: The F1 score of each approach. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the F1 score starts at 0.7.

[Figure 2 plot omitted. Panels: Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates. x-axis: Initial Set Size (0.0 to 1.0). y-axis: Precision (0.7 to 1.0). Legend: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]
Figure 2: The precision of each approach. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the precision starts at 0.7.

[Figure 3 plot omitted. Panels: Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates. x-axis: Initial Set Size (0.0 to 1.0). y-axis: Recall (0.7 to 1.0). Legend: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]
Figure 3: The recall of each approach. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the recall starts at 0.7.
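The paper does not spell out how the combined F1 score is computed. Assuming the usual harmonic mean of precision and recall, the short Python sketch below recomputes F1 for the keywords-only columns of Table 2. The function name f1_score and the tolerance are choices made here; the reported values are only expected to match within rounding, since the table lists precision and recall with two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Keywords-only columns of Table 2: system -> (precision, recall, reported F1)
KEYWORDS_ROWS = {
    "Ant": (1.00, 0.97, 0.99),
    "AUML": (1.00, 0.67, 0.80),
    "JabRef": (1.00, 0.95, 0.98),
    "K9": (1.00, 0.81, 0.90),
    "Lucene": (1.00, 0.99, 1.00),
    "TMates": (1.00, 0.60, 0.75),
}

for system, (p, r, reported) in KEYWORDS_ROWS.items():
    computed = f1_score(p, r)
    # The inputs are already rounded to two decimals, so allow a small tolerance.
    assert abs(computed - reported) < 0.01, system
    print(f"{system}: computed {computed:.3f}, reported {reported:.2f}")
```

With perfect precision, F1 is driven entirely by recall. This is why the keywords-only mapper scores lowest for ArgoUML and TeamMates, where a large share of the entities remains unmapped.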