Studying Expert Initial Set and Hard to Map Cases in Automated Code-to-Architecture Mappings

Tobias Olsson, Morgan Ericsson and Anna Wingkvist
Department of Computer Science and Media Technology, Linnaeus University, Kalmar/Växjö, Sweden

Abstract
We study the mapping of software source code to architectural modules. Background: To evaluate techniques for automatic code-to-architecture mapping, a ground truth mapping, often provided by an expert, is needed. From this ground truth, techniques use an initial set of mapped source code as a starting point. The size and composition of this set affect the techniques' performance, and to make comparisons, random sizes and compositions are used. However, while randomness gives a baseline for comparison, a human expert is unlikely to compose an initial set at random. We are interested in letting an expert create an initial set based on their experience with the system and studying how this affects the technique's performance. Previous research has also shown that when an automatic mapping is compared with the ground truth mapping, human experts often accept the automated mappings and, if not, point to the need for refactoring the source code. We want to study this phenomenon further. Audience: Researchers and developers of tools in the area of architecture conformance. The system expert can gain valuable insights into where the source code needs to be refactored. Aim: We hypothesize that an initial set assigned by an expert performs better than a random initial set of similar size, and that an expert will agree with the automatic mapping, or find opportunities for refactoring, in a majority of the cases where the automatic mapping and the expert mapping disagree. Method: The initial set will be extracted from an interview with the expert. Then the performance (precision, recall, and F1 score) will be compared to mappings starting from random initial sets and using an automatic technique.
We will also use our tool to find the cases where the automatic and human mappings disagree and then let the expert review these cases. Results: We expect to find a difference when performance is compared. We expect the expert review to reveal source code that should be remapped, source code that needs refactoring (e.g., possible architectural violations), and points where the automatic technique needs to be improved. Limitations: The study will focus on only a single system, which limits the external validity significantly. The protocol for the interaction with the human expert can also introduce validity problems; for example, a mapping presented by an algorithm could be perceived as more objective and thus more acceptable to a software engineer. Conclusions: We seek to improve our understanding of how a human creates an initial set for automatic mapping and its effect on how well an automated mapping technique performs. By improving the ground truth mappings, we can improve our techniques, tools, and methods for architecture conformance checking.

Keywords
Orphan Adoption, Software Architecture, Source Code Clustering, Naive Bayes

ECSA2021 Companion Volume
email: tobias.olsson@lnu.se (T. Olsson); morgan.ericsson@lnu.se (M. Ericsson); anna.wingkvist@lnu.se (A. Wingkvist)
orcid: 0000-0003-1154-5308 (T. Olsson); 0000-0003-1173-5187 (M. Ericsson); 0000-0002-0835-823X (A. Wingkvist)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073

1. Introduction

Creating a mapping from the source code to an architectural model is perceived as labor-intensive. It hinders widespread use of Static Architecture Conformance Checking (SACC) practices such as Reflexion modeling in industry [1, 2]. A mapping is an assignment of a source code entity, e.g., a source code file or class, to an architectural module, e.g., a layer or a sub-system. The architectural modules and the dependencies between them form an intended architecture, see Figure 1. The mappings are used to determine if the dependencies in the source code conform to or violate the intended dependencies as described in the architecture, i.e., conformance checking.

Figure 1: The intended architecture of JabRef version 3.7. The boxes represent modules, and the arrows represent allowed directed dependencies between these modules. The module Global has been omitted for clarity.

2. Background

Current semi-automatic techniques build on the Human Guided clustering Method (HuGMe) and introduce different attraction functions that guide the automatic mapping [3, 4, 5, 6, 7]. HuGMe consists of a few essential steps as described below:

1. An initial set is created manually.
2. The entities to be mapped are determined.
3. The attraction function calculates an attraction for each entity and module.
4. If the attraction of a single module is deemed valid, the entity is mapped to this module.
5. If no valid attraction is found, the decision is left to a human user.
6. If new mappings are made, and there are entities remaining, continue at Step 2.
7. If entities are remaining, let the human user decide a mapping.

Figure 2: An example of how entities are mapped to modules. In this case, two classes are mapped to GUI, and three classes are mapped to Logic. These form the initial set. A sixth class, StringChange, is about to be mapped. The Attraction Function uses the information in the initial set to calculate an attraction to each module.

There are some things to note. First, the method is iterative, as the set of mapped entities can grow, and more mappings can be done.
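As an illustration, the iterative loop in Steps 1–7 can be sketched in code. This is a minimal sketch, not our tool: the attraction function here simply counts dependencies to already-mapped entities, the function and variable names are our own illustrative choices, and only the class and module names are taken from Figure 2.

```python
def hugme(initial_set, entities, deps, modules, threshold=1):
    """Sketch of the HuGMe loop with a simple dependency-counting attraction.

    initial_set: dict entity -> module (Step 1, created manually).
    entities: all entities that should end up mapped.
    deps: dict entity -> set of entities it depends on.
    Returns (mapping, undecided); undecided entities are left to the human
    user (Steps 5 and 7).
    """
    mapping = dict(initial_set)
    remaining = set(entities) - set(mapping)          # Step 2
    changed = True
    while changed and remaining:                      # Step 6: iterate while progress is made
        changed = False
        for entity in sorted(remaining):
            # Step 3: attraction of a module = number of dependencies between
            # the entity and entities already mapped to that module.
            attraction = {m: 0 for m in modules}
            for other in deps.get(entity, set()):
                if other in mapping:
                    attraction[mapping[other]] += 1
            best = max(attraction, key=attraction.get)
            others = [v for m, v in attraction.items() if m != best]
            # Step 4: map only if a single module has a valid, strictly
            # highest attraction; otherwise leave the decision open (Step 5).
            if attraction[best] >= threshold and all(v < attraction[best] for v in others):
                mapping[entity] = best
                remaining.discard(entity)
                changed = True
    return mapping, remaining
```

With the initial set from Figure 2, mapping StringChange (which depends on DataBank and XMLUtil, both mapped to Logic) attracts it to Logic.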
An initial set is needed to start the method, i.e., the attraction functions need some initial mappings to work with, see Figure 2. The human user is involved in several steps of the method; thus it is semi-automatic and human-guided. However, the focus of most studies has been on the development and comparison of attraction functions, i.e., the automatic step of the method.

2.1. Initial Set

The attraction functions used in HuGMe all need an initial set of pre-mapped entities to work. In general, previous studies have focused on comparing the automatic performance, e.g., precision and recall, of different functions. In these studies, the initial set has been treated as a random variable considering size and composition [3, 4, 5, 6]. This assumption is fair in a general performance comparison. However, it does not necessarily reflect a realistic scenario. A system expert would not select entities to map at random. Realistically, a system expert could make a careful and well-thought-out initial set, select representative parts of the source code to map, or map everything easy to map and leave complex cases to the machine.

We have previously studied the effect of the initial set. We found large variations in attraction function performance depending on both size and composition of the initial set: a representative initial set can give good results even if the set is small, and vice versa [8]. Our goal was to help guide a system expert to produce a minimal but high-performing initial set with the help of standard source code metrics, and we found limited success in using inheritance-based metrics. However, the results did not generalize well across subject systems.

2.2. Ground Truth Mapping

Related to the performance of a technique is also the quest for a perfect mapping compared to the ground truth mapping. Previous research indicates that differences between mappings often reveal points where the source code needs refactoring, or where developers made a mistake in the mapping.

Tzerpos and Holt [9] found that their technique suggested 46 entities to change mapping compared to the developers' assignment in one of their case studies. The developers agreed on this new suggested mapping in 33 cases, and the remaining 13 original mappings were considered valid but not optimal. In these 13 remaining cases, the developers expressed that restructuring the entities was needed to motivate their inclusion in the original modules.

We have previously constructed a heuristic for automatic mapping of source code to Model-View-Controller-based architectures [10]. We evaluated the approach on four products in a product line of games. We compared the automatic mapping to the manual mapping of 653 entities and found a difference in mappings in 96 cases. Architectural problems caused 76 of these. Source code refactorings were suggested and implemented for two of the projects, covering 23 architectural problems. An interesting finding is that the most common refactoring was Move Type (12 instances), i.e., move the type to the correct module. This indicates that a perfect automatic mapping is an elusive or even undesirable target, especially if a system has evolved for some time and has accumulated some erosion or drift.

2.3. Attraction Function and Tool

We have previously evaluated and implemented the attraction functions found in research by Bittencourt et al. [3] and Christl et al. [4], as well as implemented our own attraction function, NBAttract, based on machine learning [5]. To aid evaluations of attraction functions, we have set up an open-source tool aimed at allowing experimentation with different parameters and settings [11]. NBAttract has shown the most promise in our previous evaluation [5] and will be the main function used in this study. Our tool allows us to vary the information the function uses, and we plan on evaluating different combinations of the following:

• File names and paths.
• Architectural module names.
• Source code dependencies.
• Names of identifiers in the source code, e.g., method names, variable names, etc.

3. Audience

The study will be valuable to researchers and developers of tools in the area of architecture conformance. The system expert can gain valuable insights into where the source code needs to be refactored.

4. Aim

We want to know more about how a system expert would create an initial set of entities for semi-automatic mapping and the rationale for the specific mappings. This would give insight into the composition and distribution of an initial set created by an expert. We also want to know more about discrepancies in the automatic mappings compared to system expert mappings. To enable this, we need to build a broad set of data over multiple systems and experts. This study would act as a first step towards this goal.

We hypothesize that an initial set assigned by an expert performs better than a random initial set of similar size when used by our current best automatic mapping attraction function, NBAttract [5].

We hypothesize that an expert will agree with the automatic mapping, or find opportunities for refactoring, in a majority of the cases where the automatic mapping using NBAttract [5] and the expert mapping disagree.

5. Method

The study will involve a human that is a system expert for a subject system. We assume a subject system implemented in Java with a documented architecture, with defined architectural modules and allowed dependencies between these, and a mapping from the source code to the architectural modules. If these do not exist, they need to be prepared before we can initiate the study. Ideally, we will study these artifacts beforehand.

We perform the study in three phases. 1. We interview the expert to create an initial set and capture the rationale for the mapping. 2. We conduct experiments to find the performance of the initial set created by the expert compared to a random initial set of similar size. During this phase, we will generate a list of mapping discrepancies for the automatic mappings. 3. We interview the expert once more. This interview aims at investigating the mapping discrepancies found in Phase 2.

5.1. Phase 1: Initial Set Creation

To create an initial set, we will interview the human expert in a semi-formal way. The interview will be held online and recorded. There will be an agreement on the use and handling of the recording. This session will likely take less than two hours. The interview protocol will follow this design:

1. Introduce yourself and explain that the interview is recorded and that the expert agrees to this.
2. Explain the purpose of the study and the use of the data.
3. Ask the expert about their involvement in the subject system's development, what role they have, and the general experience and time frame of the involvement.
4. Ask the expert about the subject system, its basic purpose, the end-users, and the architecture: what is the purpose of the architecture, the defined modules and dependencies, and did the expert create the architecture and mapping? How were the architecture and mapping created?
5. Explain the mapping scenario and give a rough outline of how an automatic mapping would work.
6. Ask the system expert where they would start to map, any parts that jump to their mind, or any easy-to-map parts of the system, e.g., whole directories/packages that can be mapped.
7. For each module in the architecture, ask the system expert to provide the most typical and important source code files. Ask why each file is deemed typical or important. At least 10% of the files should be provided.
8. Ask the expert if there is anything they would like to add.
9. Thank the expert and explain the remainder of the study. Book a new interview for Phase 3 to take place a few days later.

5.2. Phase 2: Experiments

We perform experiments where the expert's initial set is used with different combinations of information for the NBAttract attraction function. If we get different initial sets (e.g., typical mappings vs. easy mappings) based on the first interview, we can compare these initial sets to each other. For further comparison, we will use random initial sets. Note that for random initial sets, several hundred experiments are needed, depending on the number of architectural modules, the number of source code entities, and the size of the expert's initial set. This phase will likely take three days to complete.

The experiments will generate the mapping data needed to calculate the precision and recall of the mappings. The data will also include a record of failed mappings, with the name of the source code entity and the failure frequency. Depending on the number of failures, a limit may be needed to not create an overwhelming burden for Phase 3. A suggestion is that a failure rate of more than 50% indicates a mapping discrepancy.

We will consider the expert's initial set better than a random initial set if its F1 score is better than the median F1 score of the random initial sets.

5.3. Phase 3: Validation of Mapping Discrepancies

We will investigate the generated mapping discrepancies by interviewing the human expert in a semi-formal way. The interview will be held online and recorded. There will be an agreement on the use and handling of the recording. The interview will likely involve looking at source code, so both the expert and the researcher should prepare a development environment. If the interview session extends over two hours or if the expert expresses fatigue, it should be split into several sessions. The interview protocol will be conducted as follows:

1. Explain that the interview is recorded and that the expert agrees to this.
2. Explain the purpose of the study and the use of the data.
3. Roughly explain the results from the experiment.
4. For each mapping discrepancy, let the expert inspect the corresponding source code.
a) Ask if the expert thinks the entity would need refactoring or that it contains serious problems.
b) Remind the expert of the original mapping and ask if the expert still agrees with this mapping. If not, ask the expert what mapping would be more appropriate and why.
c) Show the automatic mapping results (several modules may be suggested). Ask if the expert would consider any of these mappings valid and why/why not.
5. Ask the expert if there is anything they would like to add.
6. Thank the expert.

6. Expected Results

We expect to find a difference when we compare performance. We expect that the human expert's initial set performs better than a random initial set of similar size. We expect the expert review to reveal source code that should be remapped, source code that needs refactoring (e.g., possible architectural violations), and points where the automatic technique needs to be improved.

7. Limitations

The study will focus on only a single system, which limits the external validity. However, the long-term goal is to find more systems and experts, perform similar studies, and build and refine the dataset over time. The protocol for the interaction with the human expert can also introduce validity problems; for example, a mapping presented by an algorithm could be perceived as more objective and thus more acceptable than the expert's informal knowledge. The expert may also be biased regarding certain parts of the source code that they have been more or less involved in. This will need to be noted in the interview protocol. The expert may likewise be biased depending on whether or not they created the original mapping; e.g., it is probably easier to accept a mapping presented by an algorithm if someone else did the original mapping. This, too, has to be noted in the interview protocol.

8. Conclusions

We seek to improve our understanding of how a human creates an initial set for automatic mapping and its effect on how well an automated mapping technique performs. Based on what we learn, we can start to explore methods that actively suggest initial set candidates. By improving the ground truth mappings, we can improve our techniques, tools, and methods for architecture conformance checking.

Acknowledgments

The research was supported by the Centre for Data Intensive Sciences and Applications at Linnaeus University.

References

[1] G. C. Murphy, D. Notkin, K. Sullivan, Software reflexion models: Bridging the gap between source and high-level models, ACM SIGSOFT Software Engineering Notes 20 (1995) 18–28.
[2] N. Ali, S. Baker, R. O'Crowley, S. Herold, J. Buckley, Architecture consistency: State of the practice, challenges and requirements, Empirical Software Engineering 23 (2017) 1–35.
[3] R. A. Bittencourt, G. Jansen de Souza Santos, D. D. S. Guerrero, G. C. Murphy, Improving automated mapping in reflexion models using information retrieval techniques, in: IEEE Working Conference on Reverse Engineering, 2010, pp. 163–172.
[4] A. Christl, R. Koschke, M.-A. Storey, Automated clustering to support the reflexion method, Information and Software Technology 49 (2007) 255–274.
[5] T. Olsson, M. Ericsson, A. Wingkvist, Semi-automatic mapping of source code using naive bayes, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19, 2019, pp. 209–216.
[6] A. Christl, R. Koschke, M.-A. Storey, Equipping the reflexion method with automated clustering, in: IEEE Working Conference on Reverse Engineering, 2005, pp. 98–108.
[7] F. Chen, L. Zhang, X. Lian, An improved mapping method for automated consistency check between software architecture and source code, in: IEEE 20th International Conference on Software Quality, Reliability and Security (QRS), 2020, pp. 60–71.
[8] T. Olsson, M. Ericsson, A. Wingkvist, Towards improved initial mapping in semi automatic clustering, in: Proceedings of the 12th European Conference on Software Architecture: Companion Proceedings, ECSA '18, ACM, 2018, pp. 51:1–51:7.
[9] V. Tzerpos, R. C. Holt, The orphan adoption problem in architecture maintenance, in: Proceedings of the Fourth Working Conference on Reverse Engineering, IEEE, 1997, pp. 76–82.
[10] T. Olsson, D. Toll, A. Wingkvist, M. Ericsson, Evaluation of a static architectural conformance checking method in a line of computer games, in: Proceedings of the 10th International ACM SIGSOFT Conference on Quality of Software Architectures, ACM, 2014, pp. 113–118.
[11] T. Olsson, M. Ericsson, A. Wingkvist, An exploration and experiment tool suite for code to architecture mapping techniques, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19, 2019, pp. 26–29.
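As a closing illustration, the acceptance criterion from Phase 2 (Section 5.2) can be sketched as follows. This is a minimal sketch with illustrative function names, not code from our tool; it assumes both the automatic mapping and the ground truth are simple entity-to-module dictionaries, and treats entities the technique left unmapped as recall losses.

```python
from statistics import median

def precision_recall_f1(auto_mapping, ground_truth):
    """Precision, recall, and F1 of an automatic mapping vs. the ground truth.

    auto_mapping, ground_truth: dict entity -> module. Entities present in
    the ground truth but left unmapped by the technique reduce recall.
    """
    correct = sum(1 for e, m in auto_mapping.items() if ground_truth.get(e) == m)
    precision = correct / len(auto_mapping) if auto_mapping else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def expert_set_is_better(expert_f1, random_f1_scores):
    """Criterion from Section 5.2: the expert's initial set is considered
    better if its F1 score exceeds the median F1 over the random initial sets."""
    return expert_f1 > median(random_f1_scores)
```

In practice, each random initial set contributes one F1 score to the distribution, so the comparison is against the median of several hundred runs.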