Studying Expert Initial Set and Hard to Map Cases in Automated Code-to-Architecture Mappings

Tobias Olsson, Morgan Ericsson and Anna Wingkvist
Department of Computer Science and Media Technology, Linnaeus University, Kalmar/Växjö, Sweden

Abstract
We study the mapping of software source code to architectural modules. Background: To evaluate techniques for automatic code-to-architecture mapping, a ground truth mapping, often provided by an expert, is needed. From this ground truth, techniques use an initial set of mapped source code as a starting point. The size and composition of this set affect the techniques' performance, and to make comparisons, random sizes and compositions are used. However, while randomness gives a baseline for comparison, a human expert is unlikely to compose an initial set at random. We are interested in letting an expert create an initial set based on their experience with the system and studying how this affects the technique's performance. Previous research has also shown that when an automatic mapping is compared with the ground truth mapping, human experts often accept the automated mappings and, if not, point to the need for refactoring the source code. We want to study this phenomenon further. Audience: Researchers and developers of tools in the area of architecture conformance. The system expert can gain valuable insights into where the source code needs to be refactored. Aim: We hypothesize that an initial set assigned by an expert performs better than a random initial set of similar size, and that an expert will agree with the automatic mapping, or find opportunities for refactoring, in a majority of the cases where the automatic mapping and the expert mapping disagree. Method: The initial set will be extracted from an interview with the expert. Then the performance (precision, recall, and F1 score) will be compared to mappings starting from random initial sets and using an automatic technique.
We will also use our tool to find the cases where the automatic and human mappings disagree and then let the expert review these cases. Results: We expect to find a difference when performance is compared. We expect the expert review to reveal source code that should be remapped, source code that needs refactoring (e.g., possible architectural violations), and points where the automatic technique needs to be improved. Limitations: The study will focus on only a single system, which limits the external validity significantly. The protocol for the interaction with the human expert can also introduce validity problems; for example, a mapping presented by an algorithm could be perceived as more objective and thus more acceptable to a software engineer. Conclusions: We seek to improve our understanding of how a human creates an initial set for automatic mapping and its effect on how well an automated mapping technique performs. By improving the ground truth mappings, we can improve our techniques, tools, and methods for architecture conformance checking.

Keywords
Orphan Adoption, Software Architecture, Source Code Clustering, Naive Bayes

ECSA2021 Companion Volume
email: tobias.olsson@lnu.se (T. Olsson); morgan.ericsson@lnu.se (M. Ericsson); anna.wingkvist@lnu.se (A. Wingkvist)
orcid: 0000-0003-1154-5308 (T. Olsson); 0000-0003-1173-5187 (M. Ericsson); 0000-0002-0835-823X (A. Wingkvist)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

Creating a mapping from the source code to an architectural model is perceived as labor-intensive. It hinders widespread use of Static Architecture Conformance Checking (SACC) practices such as Reflexion modeling in industry [1, 2]. A mapping is an assignment of a source code entity, e.g., a source code file or class, to an architectural module, e.g., a layer or a sub-system. The architectural modules and the dependencies between them form an intended architecture, see Figure 1. The mappings are used to determine if the dependencies in the source code conform to or violate the intended dependencies as described in the architecture, i.e., conformance checking.

Figure 1: The intended architecture of JabRef version 3.7. The boxes represent modules, and the arrows represent allowed directed dependencies between these modules. The module Global has been omitted for clarity.

2. Background

Current semi-automatic techniques build on the Human Guided clustering Method (HuGMe) and introduce different attraction functions that guide the automatic mapping [3, 4, 5, 6, 7]. HuGMe consists of a few essential steps as described below:

1. An initial set is created manually.
2. The entities to be mapped are determined.
3. The attraction function calculates an attraction for each entity and module.
4. If the attraction of a single module is deemed valid, the entity is mapped to this module.
5. If no valid attraction is found, the decision is left to a human user.
6. If new mappings are made, and there are entities remaining, continue at Step 2.
7. If entities are remaining, let the human user decide a mapping.

Figure 2: An example of how entities are mapped to modules. In this case, two classes are mapped to GUI, and three classes are mapped to Logic. These form the initial set. A sixth class, StringChange, is about to be mapped. The Attraction Function uses the information in the initial set to calculate an attraction to each module.

There are some things to note. First, the method is iterative, as the set of mapped entities can grow, and more mappings can be done.
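As an illustration, the iterative loop in Steps 1–7 can be sketched in code. This is a minimal sketch, not our tool: the attraction function here simply counts dependencies to already-mapped entities, the function and variable names are our own illustrative choices, and only the class and module names are taken from Figure 2.

```python
def hugme(initial_set, entities, deps, modules, threshold=1):
    """Sketch of the HuGMe loop with a simple dependency-counting attraction.

    initial_set: dict entity -> module (Step 1, created manually).
    entities: all entities that should end up mapped.
    deps: dict entity -> set of entities it depends on.
    Returns (mapping, undecided); undecided entities are left to the human
    user (Steps 5 and 7).
    """
    mapping = dict(initial_set)
    remaining = set(entities) - set(mapping)          # Step 2
    changed = True
    while changed and remaining:                      # Step 6: iterate while progress is made
        changed = False
        for entity in sorted(remaining):
            # Step 3: attraction of a module = number of dependencies between
            # the entity and entities already mapped to that module.
            attraction = {m: 0 for m in modules}
            for other in deps.get(entity, set()):
                if other in mapping:
                    attraction[mapping[other]] += 1
            best = max(attraction, key=attraction.get)
            others = [v for m, v in attraction.items() if m != best]
            # Step 4: map only if a single module has a valid, strictly
            # highest attraction; otherwise leave the decision open (Step 5).
            if attraction[best] >= threshold and all(v < attraction[best] for v in others):
                mapping[entity] = best
                remaining.discard(entity)
                changed = True
    return mapping, remaining
```

With the initial set from Figure 2, mapping StringChange (which depends on DataBank and XMLUtil, both mapped to Logic) attracts it to Logic.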
An initial set is needed to start the method, i.e., the attraction functions need some initial mappings to work with, see Figure 2. The human user is involved in several steps of the method; thus it is semi-automatic and human-guided. However, the focus of most studies has been on the development and comparison of attraction functions, i.e., the automatic step of the method.

2.1. Initial Set

The attraction functions used in HuGMe all need an initial set of pre-mapped entities to work. In general, previous studies have focused on comparing the automatic performance, e.g., precision and recall, of different functions. In these studies, the initial set has been treated as a random variable considering size and composition [3, 4, 5, 6]. This assumption is fair in a general performance comparison. However, it does not necessarily reflect a realistic scenario. A system expert would not select entities to map at random. Realistically, a system expert could make a careful and well-thought-out initial set, select representative parts of the source code to map, or map everything easy to map and leave complex cases to the machine.

We have previously studied the effect of the initial set. We found large variations in attraction function performance depending on both size and composition of the initial set: a representative initial set can give good results even if the set is small, and vice versa [8]. Our goal was to help guide a system expert to produce a minimal but high-performing initial set with the help of standard source code metrics, and we found limited success in using inheritance-based metrics. However, the results did not generalize well across subject systems.

2.2. Ground Truth Mapping

Related to the performance of a technique is also the quest for a perfect mapping compared to the ground truth mapping. Previous research indicates that differences between mappings often reveal points where the source code needs refactoring, or where developers made a mistake in the mapping.

Tzerpos and Holt [9] found that their technique suggested 46 entities to change mapping compared to the developers' assignment in one of their case studies. The developers agreed on this new suggested mapping in 33 cases, and the remaining 13 original mappings were considered valid but not optimal. In these 13 remaining cases, the developers expressed that restructuring the entities was needed to motivate their inclusion in the original modules.

We have previously constructed a heuristic for automatic mapping of source code to Model-View-Controller-based architectures [10]. We evaluated the approach on four products in a product line of games. We compared the automatic mapping to the manual mapping of 653 entities and found a difference in mappings in 96 cases. Architectural problems caused 76 of these. Source code refactorings were suggested and implemented for two of the projects, covering 23 architectural problems. An interesting finding is that the most common refactoring was Move Type (12 instances), i.e., move the type to the correct module. This indicates that a perfect automatic mapping is an elusive or even undesirable target, especially if a system has evolved for some time and has accumulated some erosion or drift.

2.3. Attraction Function and Tool

We have previously evaluated and implemented the attraction functions found in research by Bittencourt et al. [3] and Christl et al. [4], as well as implemented our own attraction function, NBAttract, based on machine learning [5]. To aid evaluations of attraction functions, we have set up an open-source tool aimed at allowing experimentation with different parameters and settings [11]. NBAttract has shown the most promise in our previous evaluation [5] and will be the main function used in this study. Our tool allows us to vary the information the function uses, and we plan on evaluating different combinations of the following:

• File names and paths.
• Architectural module names.
• Source code dependencies.
• Names of identifiers in the source code, e.g., method names, variable names, etc.

3. Audience

The study will be valuable to researchers and developers of tools in the area of architecture conformance. The system expert can gain valuable insights into where the source code needs to be refactored.

4. Aim

We want to know more about how a system expert would create an initial set of entities for semi-automatic mapping and the rationale for the specific mappings. This would give insight into the composition and distribution of an initial set created by an expert. We also want to know more about discrepancies in the automatic mappings compared to system expert mappings. To enable this, we need to build a broad set of data over multiple systems and experts. This study would act as a first step towards this goal.

We hypothesize that an initial set assigned by an expert performs better than a random initial set of similar size when used by our current best automatic mapping attraction function, NBAttract [5].

We hypothesize that an expert will agree with the automatic mapping, or find opportunities for refactoring, in a majority of the cases where the automatic mapping using NBAttract [5] and the expert mapping disagree.

5. Method

The study will involve a human that is a system expert for a subject system. We assume a subject system implemented in Java with a documented architecture, with defined architectural modules and allowed dependencies between these, and a mapping from the source code to the architectural modules. If these do not exist, they need to be prepared before we can initiate the study. Ideally, we will study these artifacts beforehand.

We perform the study in three phases. 1. We interview the expert to create an initial set and capture the rationale for the mapping. 2. We conduct experiments to find the performance of the initial set created by the expert compared to a random initial set of similar size. During this phase, we will generate a list of mapping discrepancies for the automatic mappings. 3. We interview the expert once more. This interview aims at investigating the mapping discrepancies found in Phase 2.

5.1. Phase 1: Initial Set Creation

To create an initial set, we will interview the human expert in a semi-formal way. The interview will be held online and recorded. There will be an agreement on the use and handling of the recording. This session will likely take less than two hours. The interview protocol will follow this design:

1. Introduce yourself and explain that the interview is recorded and that the expert agrees to this.
2. Explain the purpose of the study and the use of the data.
3. Ask the expert about their involvement in the subject system's development, what role they have, and the general experience and time frame of the involvement.
4. Ask the expert about the subject system, its basic purpose, the end-users, and the architecture: what is the purpose of the architecture, the defined modules and dependencies, and did the expert create the architecture and mapping? How were the architecture and mapping created?
5. Explain the mapping scenario and give a rough outline of how an automatic mapping would work.
6. Ask the system expert where they would start to map, any parts that jump to their mind, or any easy-to-map parts of the system, e.g., whole directories/packages that can be mapped.
7. For each module in the architecture, ask the system expert to provide the most typical and important source code files. Ask why each file is deemed typical or important. At least 10% of the files should be provided.
8. Ask the expert if there is anything they would like to add.
9. Thank the expert and explain the remainder of the study. Book a new interview for Phase 3 to take place a few days later.

5.2. Phase 2: Experiments

We perform experiments where the expert's initial set is used with different combinations of information for the NBAttract attraction function. If we get different initial sets (e.g., typical mappings vs. easy mappings) based on the first interview, we can compare these initial sets to each other. For further comparison, we will use random initial sets. Note that for random initial sets, several hundred experiments are needed, depending on the number of architectural modules, the number of source code entities, and the size of the expert's initial set. This phase will likely take three days to complete.

The experiments will generate the mapping data needed to calculate the precision and recall of the mappings. The data will also include a record of failed mappings, with the name of the source code entity and the failure frequency. Depending on the number of failures, a limit may be needed to not create an overwhelming burden for Phase 3. A suggestion is that a failure rate of more than 50% indicates a mapping discrepancy.

We will consider the expert's initial set better than a random initial set if its F1 score is better than the median F1 score of the random initial sets.

5.3. Phase 3: Validation of Mapping Discrepancies

We will investigate the generated mapping discrepancies by interviewing the human expert in a semi-formal way. The interview will be held online and recorded. There will be an agreement on the use and handling of the recording. The interview will likely involve looking at source code, so both the expert and the researcher should prepare a development environment. If the interview session extends over two hours or if the expert expresses fatigue, it should be split into several sessions. The interview protocol will be conducted as follows:

1. Explain that the interview is recorded and that the expert agrees to this.
2. Explain the purpose of the study and the use of the data.
3. Roughly explain the results from the experiment.
4. For each mapping discrepancy, let the expert inspect the corresponding source code.
a) Ask if the expert thinks the entity would need refactoring or that it contains serious problems.
b) Remind the expert of the original mapping and ask if the expert still agrees with this mapping. If not, ask the expert what mapping would be more appropriate and why.
c) Show the automatic mapping results (several modules may be suggested). Ask if the expert would consider any of these mappings valid and why/why not.
5. Ask the expert if there is anything they would like to add.
6. Thank the expert.

6. Expected Results

We expect to find a difference when we compare performance. We expect that the human expert's initial set performs better than a random initial set of similar size. We expect the expert review to reveal source code that should be remapped, source code that needs refactoring (e.g., possible architectural violations), and points where the automatic technique needs to be improved.

7. Limitations

The study will focus on only a single system, which limits the external validity. However, the long-term goal is to find more systems and experts, perform similar studies, and build and refine the dataset over time. The protocol for the interaction with the human expert can also introduce validity problems; for example, a mapping presented by an algorithm could be perceived as more objective and thus more acceptable than the expert's informal knowledge. The expert may also be biased regarding certain parts of the source code that they have been more or less involved in. This will need to be noted in the interview protocol. The expert may likewise be biased depending on whether or not they created the original mapping; e.g., it is probably easier to accept a mapping presented by an algorithm if someone else did the original mapping. This, too, has to be noted in the interview protocol.

8. Conclusions

We seek to improve our understanding of how a human creates an initial set for automatic mapping and its effect on how well an automated mapping technique performs. Based on what we learn, we can start to explore methods that actively suggest initial set candidates. By improving the ground truth mappings, we can improve our techniques, tools, and methods for architecture conformance checking.

Acknowledgments

The research was supported by the Centre for Data Intensive Sciences and Applications at Linnaeus University.

References

[1] G. C. Murphy, D. Notkin, K. Sullivan, Software reflexion models: Bridging the gap between source and high-level models, ACM SIGSOFT Software Engineering Notes 20 (1995) 18–28.
[2] N. Ali, S. Baker, R. O'Crowley, S. Herold, J. Buckley, Architecture consistency: State of the practice, challenges and requirements, Empirical Software Engineering 23 (2017) 1–35.
[3] R. A. Bittencourt, G. Jansen de Souza Santos, D. D. S. Guerrero, G. C. Murphy, Improving automated mapping in reflexion models using information retrieval techniques, in: IEEE Working Conference on Reverse Engineering, 2010, pp. 163–172.
[4] A. Christl, R. Koschke, M.-A. Storey, Automated clustering to support the reflexion method, Information and Software Technology 49 (2007) 255–274.
[5] T. Olsson, M. Ericsson, A. Wingkvist, Semi-automatic mapping of source code using naive bayes, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19, 2019, pp. 209–216.
[6] A. Christl, R. Koschke, M.-A. Storey, Equipping the reflexion method with automated clustering, in: IEEE Working Conference on Reverse Engineering, 2005, pp. 98–108.
[7] F. Chen, L. Zhang, X. Lian, An improved mapping method for automated consistency check between software architecture and source code, in: IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 2020, pp. 60–71.
[8] T. Olsson, M. Ericsson, A. Wingkvist, Towards improved initial mapping in semi automatic clustering, in: Proceedings of the 12th European Conference on Software Architecture: Companion Proceedings, ECSA '18, ACM, 2018, pp. 51:1–51:7.
[9] V. Tzerpos, R. C. Holt, The orphan adoption problem in architecture maintenance, in: Proceedings of the Fourth Working Conference on Reverse Engineering, IEEE, 1997, pp. 76–82.
[10] T. Olsson, D. Toll, A. Wingkvist, M. Ericsson, Evaluation of a static architectural conformance checking method in a line of computer games, in: Proceedings of the 10th International ACM SIGSOFT Conference on Quality of Software Architectures, ACM, 2014, pp. 113–118.
[11] T. Olsson, M. Ericsson, A. Wingkvist, An exploration and experiment tool suite for code to architecture mapping techniques, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19, 2019, pp. 26–29.
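As a closing illustration, the acceptance criterion from Phase 2 (Section 5.2) can be sketched as follows. This is a minimal sketch with illustrative function names, not code from our tool; it assumes both the automatic mapping and the ground truth are simple entity-to-module dictionaries, and treats entities the technique left unmapped as recall losses.

```python
from statistics import median

def precision_recall_f1(auto_mapping, ground_truth):
    """Precision, recall, and F1 of an automatic mapping vs. the ground truth.

    auto_mapping, ground_truth: dict entity -> module. Entities present in
    the ground truth but left unmapped by the technique reduce recall.
    """
    correct = sum(1 for e, m in auto_mapping.items() if ground_truth.get(e) == m)
    precision = correct / len(auto_mapping) if auto_mapping else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def expert_set_is_better(expert_f1, random_f1_scores):
    """Criterion from Section 5.2: the expert's initial set is considered
    better if its F1 score exceeds the median F1 over the random initial sets."""
    return expert_f1 > median(random_f1_scores)
```

In practice, each random initial set contributes one F1 score to the distribution, so the comparison is against the median of several hundred runs.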