Towards Hierarchical Code-to-Architecture Mapping Using
Information Retrieval
Zipani Tom Sinkala 1, Sebastian Herold 1
1
    Department of Mathematics and Computer Science, Karlstad University, Karlstad, Sweden


                 Abstract
                 Automating the mapping of a system’s code to its architecture helps improve the adoption of
                 successful Software Architecture Consistency Checking (SACC) methods like Reflexion
                 Modelling. InMap is an interactive code-to-architecture mapping recommendation technique
                 that has been shown to do this task with good recall and precision using natural language
                 software architecture descriptions of the architectural modules. However, InMap, like most
                 other automated recommendation techniques, maps low-level source code units like source
                 code files or classes to architectural modules. For large complex systems this can still be a
                 barrier to adoption due to the effort required by a software architect when accepting or rejecting
                 the recommendations. In this study we propose an extension to InMap that provides
                 recommendations for higher-level source code units, that is, packages. It utilizes InMap’s
                 information retrieval capabilities, using minimal architecture documentation, applied to a
                 software’s codebase, to recommend mappings between the software’s high-level source code
                 entities and its architectural modules. We show that using our proposed hierarchical mapping
                 technique we are able to reduce the effort required by the architect, as high as 6-fold in some
                 cases, and still achieve good precision and fairly good recall.

                 Keywords
                 Automated Mapping, Software Architecture Consistency Checking, Information Retrieval.


ECSA2021 Companion Volume
EMAIL: tom.sinkala@kau.se (Z.T. Sinkala); sebastian.herold@kau.se (S. Herold)
Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

Mapping code to architecture is a common task in Software Architecture Consistency Checking (SACC) [1, 11, 14, 16]. Popular SACC methods like Reflexion Modelling [9, 12] require a mapping step in order to identify conformance or divergence of a system’s code to its intended software architectural modules [8, 9, 12, 13]. The mapping step is for the most part a manual and labour-intensive task that becomes a barrier to industry adoption of effective SACC techniques like Reflexion Modelling, especially for large complex software systems [1, 7].

A number of techniques have been created that attempt to decrease the burden of mapping on software architects by automating the mapping step [4–6, 11, 15, 16]. Most of these, however, are class- or file-based [11, 15, 16]. This implies that, for systems developed in an object-oriented programming language, where classes are considered the underlying unit of source code, they automate mapping at the class level, attempting to predict which architectural module a class (or class file) maps to. This has been done quite well with techniques like InMap [15, 16] and NBC [11]. In our paper “InMap: Automated Interactive Code-to-Architecture Mapping Recommendations” we show that InMap achieved a recall of 0.87-1.00 and a precision of 0.70-0.96 for the systems tested.

However, in a large system of, say, 1000+ classes, even with recall and precision of 1 it is still burdensome for an architect to inspect over a thousand recommendations before accepting them as correct. In an attempt to reduce the effort needed, we investigate making mapping recommendations for higher-level source code units. That is, we make mapping recommendations for larger units of code at a time (packages rather than classes), thereby reducing the amount of work required by an architect. In this paper, we present an automated hierarchical package mapping technique. It builds on the successful information retrieval-based InMap approach [15, 16], which computes the similarity of an unmapped class to an architectural module. We exploit the class-to-module similarity scores produced by InMap to generate package-to-module similarity scores. These are filtered using a defined set of heuristics, from which recommendations that are determined by a system’s package hierarchy are made. We show that using our proposed hierarchical mapping technique we are able to reduce the effort required by the architect, as high as 6-fold in some cases, and still achieve good precision.

Section 2 briefly discusses automated mapping techniques along with their hierarchical mapping capabilities. In Section 3, we detail the approach, describing how package scores are computed and how package-to-module mapping recommendations are constructed. Section 4 describes the experiment setup to evaluate the technique and presents the results obtained. In Section 5, we interpret and discuss the results, and in Section 6 we draw our conclusions and present opportunities for further research.

2. Related Work

Christl et al. conceived HuGME, a dependency analysis (DA) based automated mapping recommendation technique. It clusters a software system’s source code using an architect’s knowledge about its intended architecture [4, 5]. HuGME applies an attraction function, which minimizes coupling and maximizes cohesion, to produce a matrix of attraction scores for unmapped entities to modules [17]. The calculation of the score uses the dependency values between unmapped entities and mapped entities. The higher the score, the higher the likelihood that an unmapped entity belongs to a given module. All unmapped entities that result in only one candidate having a similarity score higher than the arithmetic mean of all scores produce a single recommendation. All unmapped entities for which two or more candidates exist are presented to the user in ranked order, from highest to lowest, as recommendations. HuGME presents recommendations to the user to allow cluster decisions to be made exclusively by the architect. This process is incremental, in that HuGME does not attempt to map all source code entities in one complete step; rather it maps a subset at a time until no more mapping is possible. The approach is non-hierarchical as it views the mapping task from a clustering perspective in which source code entities that are mapped to the same hypothesized entity form a cluster [4].

In their study, the results for HuGME had on average about 90% recall and 80-90% accuracy [5]. To get these results the technique needed about 20% of the system’s source entities to be pre-mapped before running the algorithm. Because this mapping technique is dependency-based, the 20% pre-mapped source entities need to be spread across various modules for it to give meaningful results. In addition, they must have dependencies to unmapped entities. This presents a problem: to benefit from this technique one must not only dedicate some time to pre-mapping but also ensure that the mapping is evenly spread across the modules, and that the selected pre-mapped source code entities have dependencies to the unmapped entities, otherwise entity relationship discovery is poor. This all becomes a highly labour-intensive exercise. Furthermore, because it uses clustering algorithms based on high cohesion and low coupling, if developers do not follow this principle in the software’s implementation then the mapping of the algorithm will be affected [2].

Bittencourt et al. propose an information retrieval (IR) based technique that uses the same automated mapping recommendation approach as HuGME except that it replaces dependency-based attraction functions with IR-based similarity functions [3]. It calculates the similarity of an unmapped source entity to a module by searching for specific terms (a module’s name and mapped classes, methods and fields) within the source code of the unmapped class. Similar to HuGME, Bittencourt et al.’s technique needs some manual pre-mapping before it can automate mapping.

Olsson et al. combine IR and DA methods in their automated mapping technique called Naive Bayes Classification (NBC) [11]. NBC uses Bayes’ theorem to build a probabilistic model of classifications using words taken from the source code entities. The model gives the probability of words belonging to a source file entity. This is augmented with syntactical information of the dependencies, a method called Concrete Dependency Abstraction [11]. Just like HuGME, Olsson et al.’s proposed technique requires a pre-mapped set in order to perform well. Both Bittencourt et al.’s and Olsson et al.’s results showed that the f1-score of their techniques decreased as the pre-mapped set became smaller [3, 11]. Additionally, neither addresses package-level mapping.

Naim et al. present a technique called Coordinated Clustering of Heterogeneous Datasets (CCHD) that combines both DA and IR methods to compute a similarity score for source code files [10]. CCHD uses an architect’s feedback on the recovered architecture to iteratively adjust the results until there are no suggestions for change. These adjusted results train a classifier that automatically places new code added to a codebase in the “right” architectural module. However, the technique is not necessarily meant for automated mapping in SACC but rather for software architecture recovery tasks. Moreover, it too does not directly address package-level mapping.

Common among industry tools is the use of naming patterns (or regular expressions). For example, the expressions **/gui/** or *.gui.* or net.java.gui.* can be used to map source code units (whether classes or packages) to an architecture module named GUI. This is the technique used by both Sonargraph Architect and Structure101 Studio in addition to their drag & drop capabilities. However, the drawback of naming patterns and drag & drop functionality is that both are manual tasks, which makes mapping a tedious exercise, especially for large software systems that have complex mapping configurations.

In summary, despite the advances made, available techniques that are designed to automate mapping have shortcomings. Some require an initial set of the source code to be pre-mapped manually [3–5, 11], while the industry tools that do not require pre-mapping offer manual methods. Additionally, the automated mapping techniques that require pre-mapping in order to “jump-start” mapping, as it were, require about 15-20% of the source code to be pre-mapped in order to give worthwhile results [4, 5, 6, 15].

InMap [15, 16] addresses the limitations of these techniques in that it is able to automate mapping without requiring pre-mapping. Using simple and concise natural language descriptions of the architecture modules, it is able to automate mapping of a completely unmapped system with rather good results. Its limitation, though, is that the mapping recommendations provided are for low-level source code units, namely classes. This results in considerable work for an architect in the case of large software systems. We therefore explore the following research question:

    How can we exploit InMap’s good class-to-module mappings to produce package-to-module mappings, thereby reducing the effort needed by an architect in accepting and/or rejecting mapping recommendations produced by InMap?

In the following section, we describe our approach to answering this question.

3. Approach

We begin by describing the InMap technique briefly. We then describe a technique for hierarchical package-to-module mapping that builds on top of InMap.

3.1. InMap

InMap is an interactive code-to-architecture automated mapping technique for SACC methods that uses information retrieval concepts to produce class-to-module mapping recommendations. It does not require manual pre-mapping in order to produce recommendations; rather, it uses natural language descriptions of the architectural modules as input to predict mappings. It presents its best mapping recommendations a page/set at a time (the optimal size being 30 per page), which the architect can accept or reject. As recommendations from each page/set are accepted or rejected, InMap learns from this and adapts its next page/set of recommendations to the obtained knowledge. This method works quite well, giving an average recall of 97% and a precision of 82% for the systems evaluated [16].

3.1.1. Class-to-Module Similarity

InMap’s algorithm is made up of seven steps [16]. However, for our hierarchical package-to-module mapping technique the following steps in InMap are used to generate what are called class-to-module similarity scores.

Firstly, the source code files are filtered to exclude any external or third-party package libraries or classes of the system that the architect does not want to include in the mapping exercise.
Secondly, the filtered source files are stripped of any special characters and programming language keywords. Third, the pre-processed source code files are indexed as an inverted index. In the fourth and fifth steps, InMap formulates a query and uses it to search the indexed source code files for similarity to each module. The query draws on up to four items. In the first iteration, InMap builds the query only from (1) the names of the modules and (2) the modules’ architectural descriptions (stripped of any special characters and stop words). Once the first set of classes is mapped, InMap adds to the query (3) the names of classes mapped to a module and (4) the names of methods contained within classes mapped to a module. This ‘enriches’ the query used to search for the similarity of an unmapped class to a module. Therefore, after each set of newly mapped classes the query for the next set of recommendations looks different. The search returns a set of scores for every class-module pair based on the information retrieval similarity function tf-idf. The tf-idf scores are called class-to-module similarity scores ($SS_{cm}$), where c and m are a class-module pair in the system. Specifics of how tf-idf is calculated can be found in [16].

3.1.2. Class-to-Module Mapping Recommendations

In the sixth and seventh steps, InMap gives as a class-to-module mapping recommendation the highest scoring class-to-module pair. The architect can either accept or reject it. InMap does not present every such pair, however: it presents either only those above the arithmetic mean of all highest scoring class-module pairs, or the best 30 recommendations (if more than 30 lie above the mean). After the architect gives feedback, it returns to step 4 and repeats steps 4 to 7 until no more recommendations can be given.

Our proposed hierarchical package mapping technique picks up right after the fifth step, that is, once InMap produces the matrix of class-to-module similarity scores ($SS_{cm}$).

3.2. Hierarchical Package Mapping

Although InMap is able to achieve good results with the approach described in Section 3.1, because it is based on class-to-module mappings the effort required by architects could still be significant for large and complex systems. However, if we could map entire packages then we could reduce the effort needed. For example, a package that has 50 classes that all map to the same module could be (or should be) given as a single package-to-module mapping recommendation. Additionally, because packages are hierarchical in nature, they present even more opportunity to reduce the number of “necessary” mapping recommendations to present to an architect. For example, say we have two packages A and B that are both sub-packages of C. If A and B have 50 classes each and all the classes in A and B map to the same module, then mapping C to the module would suffice and saves the architect from reviewing 99 other mapping recommendations. Figure 1 illustrates a package hierarchy that our technique (and certainly others) can benefit from to reduce the number of recommendations needed.

3.2.1. Package-to-Module Similarity

Our package-to-module mapping technique picks up from step 5 of the InMap algorithm after it produces similarity scores for all class-to-module pairs. We group the class-to-module similarity scores $SS_{cm}$ according to the packages they belong to. This means that for each package we have a set of classes with scores to each identified module. From this set of class-to-module similarity scores that have a given package as their parent we then calculate the interquartile mean ($IQM_{pm}$), where p and m are a package-module pair in the system. That is, the values between the first quartile and third quartile (the interquartile range, IQR) are used to calculate the arithmetic mean. The lowest 25% and the highest 25% of the scores are ignored. Module IQRs for a package taken from Jittac are shown in Figure 2. Important to note is that the IQR, and hence the IQM, of a non-terminal package is calculated not only from the classes that belong to the package but also from the classes of its child packages. For example, se.kau.cs.jittac.eclipse.builders.jdt, shown in the package tree in Figure 1, has its IQR calculated using the 8 classes that belong to it but also the 3 classes in se.kau.cs.jittac.eclipse.builders.jdt.commands and the single class in se.kau.cs.jittac.eclipse.builders.jdt.util. Formally, we define $IQM_{pm}$ as,

\[ IQM_{pm} = \frac{2}{n} \sum_{i=\frac{n}{4}+1}^{\frac{3}{4}n} SS_{cm_i} \tag{1} \]

where p and m are a package-module pair in the system, c has p as its parent package, n is the number of classes that make up the package p, and i is the position of $SS_{cm}$ in the ordered set of class-to-module similarity scores for the package p.

Using the scores within the IQR as opposed to the full set of scores makes a package-to-module similarity more resilient to the presence of outlier classes in the class-to-module similarity scores that it is derived from. Figure 2 shows outlier classes, which we define as classes with $SS_{cm}$ scores that are higher than the box plot max, classes with $SS_{cm}$ scores that are lower than the box plot min, but also classes with $SS_{cm}$ scores that are within the box plot min-max but outside the IQR. The result of this step is a matrix of IQMs for each package-module combination.

Figure 1: Package hierarchy for Jittac - one of the systems we evaluate our technique on.

Figure 2: Box plots for a package taken from Jittac showing the IQRs for Jittac’s modules as well as the class distribution inside and outside the IQRs. The x-axis shows the class $SS_{cm}$ scores and the y-axis shows the architectural modules of the system. The number in brackets beside a module indicates the total number of classes for the given package that have an $SS_{cm}$ score to the module.

We then apply feature scaling to normalize the IQM module scores for each package. We use standardization (also known as z-score normalization), which gives the scores of each package a zero mean across the modules. In our hierarchical package mapping technique we call the resulting z-scores package-to-module similarity scores ($SS_{pm}$). Formally we define $SS_{pm}$ as follows,

\[ SS_{pm} = \frac{IQM_{pm} - average(IQM_{pm})}{\sigma} \tag{2} \]

where p and m are a package-module pair in the system, $IQM_{pm}$ is the original package-to-module similarity score, $average(IQM_{pm})$ is the mean of the $IQM_{pm}$ scores for a specific package to the range of given modules, and $\sigma$ is the standard deviation of $IQM_{pm}$. Using this method on all package-module pairs we obtain a matrix of package-to-module similarity scores for the entire system. Table 1 shows an extract of these scores.

Table 1
Extract of $SS_{pm}$ scores taken from Jittac. A value >= 0.6 (highlighted blue) implies it is a good package-to-module similarity score; a score >= 1.5 (highlighted red) implies it is an outstanding package-to-module similarity score.

                                              Modules
Packages                            architecture-model   eclipse-ui   impl-model
se.kau.cs.jittac.model                      2.3             -0.6          1.0
se.kau.cs.jittac.model.am                   2.6             -0.4          0.4
se.kau.cs.jittac.model.am.events            2.4             -0.4          0.3
se.kau.cs.jittac.model.am.io                2.3             -0.5          0.6
se.kau.cs.jittac.model.im                   1.0             -0.5          2.3
se.kau.cs.jittac.model.im.events            0.9             -0.6          1.6
se.kau.cs.jittac.model.im.io                0.5               -           1.6

3.2.2. Package Mapping Filtering

Using the matrix of package-to-module similarity scores we then traverse the package tree bottom-up, starting with the terminal packages and working our way up to the root package. At each tree-depth level we sort the package-to-module similarity scores into two sets for each package, namely a set of outstanding package-to-module similarity scores and a set of good package-to-module similarity scores. Outstanding mappings are those in which a package has a score at or above the outstanding threshold and its child packages have a score at or above the good threshold. Good mappings are those in which a package and its children have a score at or above the good threshold. We formally define this notion with the following two rules,

    Given:
        Package p
        Module m
        Package-to-module score $SS_{pm}$
        Good score threshold GSt
        Outstanding score threshold OSt

    Rule 1: A mapping (p, m) is called good iff

        $SS_{pm}$ >= GSt

    and for all sub-packages p_i of p, (p_i, m) is a good mapping.

    Rule 2: A mapping (p, m) is called outstanding iff

        $SS_{pm}$ >= OSt

    and for all sub-packages p_i of p, (p_i, m) is a good mapping.

Figure 3 illustrates the rules with an example using the $SS_{pm}$ scores shown in Table 1. Notice that although the package se.kau.cs.jittac.model has good and outstanding scores for the modules impl-model and architecture-model respectively in Table 1, Figure 3 indicates that the package has no good or outstanding mappings. This is because it fails to satisfy the second part of the rules, that is, that all its sub-packages must have good mappings to the same module. For each of the two modules, one of se.kau.cs.jittac.model’s sub-packages has a good mapping to that module but the other does not, hence there are no good or outstanding mappings for the se.kau.cs.jittac.model package.

These rules are applied from the bottom of the package tree, starting with the deepest terminal packages, then their parent packages, then their grandparent packages, and so on until we reach the root package at the top of the tree.
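
The scoring and filtering steps of Sections 3.2.1 and 3.2.2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`iqm`, `standardize`, `sub_packages`, `filter_mappings`) are hypothetical, `iqm` assumes the number of scores is a multiple of four so that Equation (1) applies directly, and the use of the population standard deviation in Equation (2) is an assumption. The thresholds 0.6 and 1.5 and the score matrix are the ones quoted in Table 1.

```python
from statistics import mean, pstdev

GST, OST = 0.6, 1.5  # good / outstanding thresholds quoted in the Table 1 caption


def iqm(scores):
    """Equation (1): mean of the middle half; assumes len(scores) is a multiple of 4."""
    ordered = sorted(scores)
    n = len(ordered)
    return 2 / n * sum(ordered[n // 4 : 3 * n // 4])


def standardize(iqm_by_module):
    """Equation (2): z-scores of one package's IQMs across its modules.

    The population standard deviation is an assumption of this sketch.
    """
    values = list(iqm_by_module.values())
    avg, sigma = mean(values), pstdev(values)
    return {m: (v - avg) / sigma for m, v in iqm_by_module.items()}


def sub_packages(pkg, packages):
    """Direct children of pkg, derived from the dotted package names."""
    return [q for q in packages if q.rpartition(".")[0] == pkg]


def filter_mappings(scores):
    """Rules 1 and 2 applied bottom-up; returns sets of (package, module) pairs."""
    good, outstanding = set(), set()
    # Deepest packages first, so every child is decided before its parent.
    for pkg in sorted(scores, key=lambda p: p.count("."), reverse=True):
        kids = sub_packages(pkg, scores)
        for module, score in scores[pkg].items():
            if score is None:  # no score reported for this pair
                continue
            kids_good = all((kid, module) in good for kid in kids)
            if score >= GST and kids_good:
                good.add((pkg, module))
            if score >= OST and kids_good:
                outstanding.add((pkg, module))
    return good, outstanding


MODULES = ("architecture-model", "eclipse-ui", "impl-model")
TABLE_1 = {  # package -> scores in MODULES order, copied from Table 1
    "se.kau.cs.jittac.model": (2.3, -0.6, 1.0),
    "se.kau.cs.jittac.model.am": (2.6, -0.4, 0.4),
    "se.kau.cs.jittac.model.am.events": (2.4, -0.4, 0.3),
    "se.kau.cs.jittac.model.am.io": (2.3, -0.5, 0.6),
    "se.kau.cs.jittac.model.im": (1.0, -0.5, 2.3),
    "se.kau.cs.jittac.model.im.events": (0.9, -0.6, 1.6),
    "se.kau.cs.jittac.model.im.io": (0.5, None, 1.6),
}
SCORES = {pkg: dict(zip(MODULES, row)) for pkg, row in TABLE_1.items()}
good, outstanding = filter_mappings(SCORES)
for pkg, module in sorted(outstanding):
    print(pkg, "->", module)
```

On the Table 1 extract, se.kau.cs.jittac.model ends up with neither a good nor an outstanding mapping, because for each module one of its two sub-packages fails Rule 1.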
Figure 3: Package tree traversal in order to produce package-to-module mapping recommendations.
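
The top-down traversal that Figure 3 depicts, and that Section 3.2.3 describes, can be sketched as a short recursive function. This is an illustrative sketch only: `recommend`, `CHILDREN` and `OUTSTANDING` are hypothetical names, the outstanding mappings are hard-coded from the Table 1 extract as they would come out of the filtering step, and only outstanding mappings are considered.

```python
def recommend(pkg, outstanding, children):
    """Top-down selection: map pkg to its best outstanding module, else descend."""
    candidates = outstanding.get(pkg, {})
    if candidates:
        # Recommend the highest-scoring module and skip all sub-packages.
        return {pkg: max(candidates, key=candidates.get)}
    picks = {}
    for child in children.get(pkg, []):  # empty set: go one level lower
        picks.update(recommend(child, outstanding, children))
    return picks


ROOT = "se.kau.cs.jittac.model"
CHILDREN = {  # package tree for the Table 1 extract
    ROOT: [ROOT + ".am", ROOT + ".im"],
    ROOT + ".am": [ROOT + ".am.events", ROOT + ".am.io"],
    ROOT + ".im": [ROOT + ".im.events", ROOT + ".im.io"],
}
# Outstanding mappings that satisfy Rule 2 for the Table 1 extract (module -> score).
OUTSTANDING = {
    ROOT + ".am": {"architecture-model": 2.6},
    ROOT + ".am.events": {"architecture-model": 2.4},
    ROOT + ".am.io": {"architecture-model": 2.3},
    ROOT + ".im": {"impl-model": 2.3},
    ROOT + ".im.events": {"impl-model": 1.6},
    ROOT + ".im.io": {"impl-model": 1.6},
}
recommendations = recommend(ROOT, OUTSTANDING, CHILDREN)
print(recommendations)
```

Stopping at the first package with an outstanding mapping is what keeps the number of recommendations small: the four terminal packages in this extract are never presented to the architect.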

This is necessary as packages higher up in the package tree depend on the results of packages lower in the package tree.

3.2.3. Package-to-Module Mapping Recommendation Selection

    Once both sets of good and outstanding mappings for each package are obtained, we then traverse the package tree top-down. At each tree level we check if a package has outstanding mappings, pick the highest that fulfils the above-defined criteria for outstanding and recommend it as the most likely mapping. If a package is recommended, we stop following that tree path downwards and do not recommend any of its sub-packages; we instead proceed to check its siblings. If a package returns an empty set, then we go one step lower in the package tree. Figure 3 illustrates this; it shows two package-to-module mapping recommendations (in bold). Observe that architecture-model is recommended as the module to which se.kau.cs.jittac.model.am should map and impl-model as the module to which se.kau.cs.jittac.model.im should map. Their sub-packages are skipped since they are already considered as a result of Rule 2, and se.kau.cs.jittac.model has no mapping recommendation since it retained no mappings after the package mapping score filtering step.

4. Evaluation

   Test Cases: We evaluated our hierarchical package mapping approach on six Java-based systems that were used in the evaluation of InMap's class-to-module mapping technique. These are Ant, a command line and API-based tool for process automation; ArgoUML, a desktop-based application for UML modelling; JabRef, a desktop-based bibliographic reference manager; Jittac, an Eclipse plugin for reflexion modelling tasks; ProM, a desktop-based process mining tool; and TeamMates, a web-based application for handling peer reviews and feedback. Table 2 shows the attributes of these systems. The natural language architectural module descriptions used as input to InMap to generate class-module similarity scores were obtained from the previous study of InMap. The prior study of InMap obtained the oracle mappings, that is, the correct list of code-to-module mappings, from experts involved in developing each respective open-source project. The oracle package-to-module mappings used in this study were extracted from these. We retained in the oracle only packages that had direct 1-1 mappings with a module, and excluded packages that had child entities that map to more than one module.

   From the oracle mappings we only extracted package-to-module mappings, leaving out the class-to-module mappings to allow us to evaluate the performance of the proposed technique strictly at a package level. Table 2 also shows the number of packages in the oracle mapping of a system. This is the number of actual packages our proposed technique should predict mappings for, in other words, the packages that are of concern. For example, if se.kau.cs.jittac.eclipse is part of the oracle mapping and our technique puts up se.kau.cs.jittac.eclipse.builders as a possible mapping, we count this as a false positive even though the latter is a child package of the former. The reason is that the technique must reduce the effort needed by an architect and therefore must be penalized for recommending child packages of a package that is already mapped (or should be).

Table 2
System Case Studies

Attribute                                          Ant       ArgoUML   JabRef   Jittac    ProM   TeamMates
Version #                                          r584500   r13713    3.7      0.1 (…)   6.9    5.11
# of source files                                  778       1,429     843      124       700    467
# of source files after filtering (# of classes)   724       763       840      110       699    293
# of packages                                      64        60        118      27        162    18
# of packages in oracle mapping                    14        21        11       9         30     11
# of source files in oracle package mapping        558       692       812      98        675    293
# of modules                                       15        17        6        9         11     11

   Experimentation & Data Collection: To experiment on the test cases with various good and outstanding threshold combinations we extended the evaluator tool we developed in our previous studies of InMap to accommodate the evaluation of package-based mappings. Using the oracle architecture package-to-module mappings of each system, the tool automatically simulates a “human architect” accepting and rejecting the recommendations produced.

   For all possible single-decimal combinations within the range -5.0 to 5.0 for the good and outstanding thresholds we collected the recall of the package mappings as well as the technique’s precision. The min-max of the test range was based on the highest and lowest $SS_{pm}$ scores obtained by all 6 systems. We also collected the number of recommendations it took to achieve the given recall and precision. Finally, we also collected the class coverage (or code reach), that is, the number of classes that were mapped as a result of their parent packages being mapped by our hierarchical mapping technique.

   Results: Table 3 shows the results obtained for the optimal thresholds for each system, i.e. the thresholds that gave the best results for the range of values tested. Three systems, Ant, JabRef and TeamMates, achieved perfect precision, with TeamMates also achieving perfect recall and class coverage. We found 6 out of JabRef’s 11 package-to-module mappings (as package recall) and 9 of Ant’s 14, which resulted in class coverage of 98% and 50% respectively. For Jittac, 90% of its classes were mapped by finding 6 of its 9 package-to-module mappings with a precision of 0.86. ArgoUML had fairly good precision but low recall, resulting in low class coverage as well. ProM appeared to be an outlier, obtaining poor precision and the lowest recall of the six systems tested. All results presented are for a single iteration (or pass) of the technique.

Table 3
Results showing the optimal thresholds for each system tested.

Test System   Good Thresh.   Outstand. Thresh.   # of Recomm.   Package Recall   Package Precision   Class Coverage
Ant           1.9            2.4                 9              0.64             1.00                276/558 (50%)
ArgoUML       0.1-1.6        3.2-3.3             13             0.43             0.69                93/692 (13%)
JabRef        1.2-1.4        1.6-2.0             6              0.55             1.00                794/812 (98%)
Jittac        1.0-1.4        1.7                 7              0.67             0.86                88/98 (90%)
ProM          0.4-0.6        1.6                 37             0.40             0.32                58/675 (9%)
TeamMates     1.3-1.4        0.1-1.2             10             1.00             1.00                293/293 (100%)

Table 4
Effort comparison of class vs package mappings for systems with class coverage >= 50%.

              Class Mapping                           Package Mapping
Test System   Class Coverage after …   # of Recomm.   Class Coverage after 1 pass   # of Recomm.   Effort saved (Effort reduced)
Ant           13 passes, 50%           390            50%                           9              381 (-97.7%)
JabRef        32 passes, 98%           853            98%                           6              847 (-99.3%)
Jittac        7 passes, 90%            123            90%                           7              116 (-94.3%)
TeamMates     14 passes, 97%           275            100%                          10             265 (-96.4%)

   In Table 4 we compare the effort required of an architect by our hierarchical mapping technique vs InMap in its original form. We do this by looking at the class coverage of each technique and the number of recommendations an architect has to sift through to achieve the given class coverage. Table 4 shows this for the systems that achieved at least 50% class coverage after a single iteration. In simple terms we define the effort saved ($ES$) and the effort reduced ($ER$) as follows,

$$ES = |R_p - R_c| \qquad (3)$$

$$ER = -100 \times \frac{ES}{R_c} \qquad (4)$$

where $R_c$ is the number of class-to-module recommendations needed by the InMap class-based technique and $R_p$ is the number of package-to-module recommendations needed by our hierarchical package mapping technique. As an example, Table 4 shows that in the case of Ant it
would take 390 recommendations to map 50% of Ant’s classes using the InMap class-to-module technique, whereas it would take 9 recommendations to map 50% of Ant’s classes using our hierarchical package-to-module mapping technique. Note also that the effort saved is more than 800 recommendations for JabRef and the effort reduced is more than 90% for all 4 systems.

5. Discussion

    Table 3 shows that the technique has almost perfect precision, averaging 0.91 when ProM is excluded. This is likely because our hierarchical package mapping technique is an extension of InMap’s class-to-module similarity function. Using simple natural language descriptions of architecture modules, the InMap algorithm, which has the class-to-module similarity score function $SS_{cm}$ at its core, was shown to obtain rather good precision. Our hierarchical package mapping technique borrows from InMap’s success by using the information-retrieval-based $SS_{cm}$ to generate its own package-to-module similarity score $SS_{pm}$.

    The package recall of our technique is fairly good considering that these results are obtained after only 1 iteration (or pass). As outlined in Section 3.1, InMap is an interactive-iterative technique that presents a set of recommendations at a time and progresses by learning from the feedback of the architect to formulate the next set of recommendations. However, the number of iterations (or passes) tends to grow with the size of the system under review. Comparing Tables 2, 3 and 4, observe that systems with a high number of source files require a high number of passes (or iterations) compared to the “smaller” systems. Table 3 shows that with our hierarchical mapping technique we are able to obtain a package recall of more than 50% in the first pass for 4 out of the 6 systems. Of these 4, from the first iteration we get 50% class coverage for Ant, with the other 3 getting 90% or more class coverage. The remaining two systems get low package recall and class coverage. We do not see this as a problem because it can be addressed by running more package-mapping recommendation iterations, which would still be far fewer than those needed by class-based mapping recommendation algorithms.

    Table 3 shows the threshold values that give the optimal results for each system. However, we observed some similarities across the systems in our threshold value experiments. The optimal outstanding score threshold is very close to or the same as the arithmetic mean of the max package similarity scores for each module of the system, and the optimal good score threshold was usually 0.5 less than the optimal outstanding score threshold. This establishes a basis for developing an automated approach for deriving threshold values that will give good results across different systems.

   Threats to Validity: Since our package-based technique is derived from InMap, the external validity of its results is affected by similar factors, such as the number of modules and classes, code commenting style and quality, and architecture description quality. Therefore, more case studies with varying attributes would add to the validity of the results. However, the results of the six test systems used, with the varying attributes shown in Table 2, provide a compelling case for an automated hierarchical package mapping technique.

   With regard to construct validity, the effort required by an architect using our technique needs to be evaluated against other package-based mapping methods provided by industry tools, like drag & drop, naming patterns or regular expressions. For example, how does our hierarchical package-based technique compare with manually mapping packages? Evaluations such as these would require enhanced user studies with software architects in appropriately planned and controlled experiments.

6. Conclusion & Future Work

    We have presented a proposed solution to hierarchical package-based mapping. It builds on InMap, an information retrieval class-based mapping technique that uses concise natural language architectural descriptions of modules. Our hierarchical package-based mapping technique provides almost perfect precision, fairly good recall and great code coverage. Most importantly, our technique helps reduce the effort or workload required by an architect in accepting and rejecting mapping recommendations in interactive techniques like InMap. The technique is an improvement over the manual package mapping methods used in today’s state-of-the-art reflexion modelling tools.

    Despite reducing the effort required, the drawback of using a purely package-based approach is that, due to their 1-1 package-to-module mapping style, these methods do not work well for systems that
have more complex mapping configurations. It is not always the case that packages and their members directly map to modules in a 1-1 manner. It is more likely that a software system’s code-to-architecture mapping has a combination of both package and class mappings. Cases where package members are spread across multiple modules require a class-based technique. Therefore, we plan as future work to derive an approach to combine InMap’s good class-based approach with the good package hierarchy-based approach presented in this paper. The aim is to combine class and package mapping recommendations in a way that benefits from the advantages, and negates the disadvantages, of both mapping styles. Nevertheless, the hierarchical package-based mapping technique presented in this paper remains useful in cases where it is appropriate to map entire packages.

7. References

[1]    Ali, N. et al. 2018. Architecture Consistency: State of the Practice, Challenges and Requirements. Empirical Software Engineering. 23, 1 (2018), 224–258. DOI:https://doi.org/10.1007/s10664-017-9515-3.
[2]    Bauer, M. and Trifu, M. 2004. Architecture-aware adaptive clustering of OO systems. Eighth European Conference on Software Maintenance and Reengineering, 2004. CSMR 2004. Proceedings. (2004), 3–14.
[3]    Bittencourt, R.A. et al. 2010. Improving automated mapping in reflexion models using information retrieval techniques. Proceedings - Working Conference on Reverse Engineering, WCRE. (2010), 163–172. DOI:https://doi.org/10.1109/WCRE.2010.26.
[4]    Christl, A. et al. 2007. Automated Clustering to Support the Reflexion Method. Information and Software Technology. 49, 3 (2007), 255–274. DOI:https://doi.org/10.1016/j.infsof.2006.10.015.
[5]    Christl, A. et al. 2005. Equipping the reflexion method with automated clustering. 12th Working Conference on Reverse Engineering (WCRE’05) (2005), 10 pp. – 98.
[6]    Fontana, F.A. et al. 2016. Tool Support for Evaluating Architectural Debt of an Existing System: An Experience Report. Proceedings of the 31st Annual ACM Symposium on Applied Computing (New York, NY, USA, 2016), 1347–1349.
[7]    Knodel, J. 2010. Sustainable Structures in Software Implementations by Live Compliance Checking.
[8]    Knodel, J. and Popescu, D. 2007. A Comparison of Static Architecture Compliance Checking Approaches. Proceedings of the Sixth Working IEEE/IFIP Conference on Software Architecture (USA, 2007), 12.
[9]    Murphy, G.C. et al. 2001. Software Reflexion Models: Bridging the Gap between Source and High-Level Models. IEEE Transactions on Software Engineering. 27, 4 (Apr. 2001), 364–380. DOI:https://doi.org/10.1109/32.917525.
[10]   Naim, S.M. et al. 2017. Reconstructing and Evolving Software Architectures Using a Coordinated Clustering Framework. Automated Software Engineering. 24, 3 (2017), 543–572. DOI:https://doi.org/10.1007/s10515-017-0211-8.
[11]   Olsson, T. et al. 2019. Semi-Automatic Mapping of Source Code using Naive Bayes. ACM International Conference Proceeding Series. 2, (2019), 209–216. DOI:https://doi.org/10.1145/3344948.3344984.
[12]   Passos, L. et al. 2010. Static Architecture-Conformance Checking: An Illustrative Overview. IEEE Software. 27, 5 (2010), 82–89. DOI:https://doi.org/10.1109/MS.2009.117.
[13]   Rosik, J. et al. 2011. Assessing Architectural Drift in Commercial Software Development: A Case Study. Softw., Pract. Exper. 41, (2011), 63–86.
[14]   de Silva, L. and Balasubramaniam, D. 2012. Controlling software architecture erosion: A survey. Journal of Systems and Software. 85, 1 (2012), 132–151. DOI:https://doi.org/10.1016/j.jss.2011.07.036.
[15]   Sinkala, Z.T. and Herold, S. 2021. InMap: Automated interactive code-to-architecture mapping. Proceedings of the ACM Symposium on Applied Computing (Mar. 2021), 1439–1442.
[16]   Sinkala, Z.T. and Herold, S. 2021. InMap: Automated Interactive Code-to-Architecture Mapping Recommendations. Proceedings - IEEE 18th International Conference on Software Architecture, ICSA 2021 (Mar. 2021), 173–183.
[17]   Wiggerts, T.A. 1997. Using Clustering Algorithms in Legacy Systems Remodularization. Proceedings of the Fourth Working Conference on Reverse Engineering (1997), 33–43.
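
The top-down recommendation selection step described in Section 3.2.3 can be sketched as follows. This is a minimal illustration and not InMap's actual implementation: the Package class, the select_recommendations function, the similarity scores and the sub-package name are hypothetical, and each package's set of outstanding mappings is assumed to have been computed and filtered beforehand.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Package:
    """A node in the package tree (hypothetical structure for this sketch)."""
    name: str
    # Module name -> score, assumed to be already filtered down to the
    # mappings that meet the "outstanding" criteria.
    outstanding: Dict[str, float] = field(default_factory=dict)
    children: List["Package"] = field(default_factory=list)


def select_recommendations(root: Package) -> Dict[str, str]:
    """Walk the tree top-down and return {package name: recommended module}."""
    recommendations: Dict[str, str] = {}
    stack = [root]
    while stack:
        package = stack.pop()
        if package.outstanding:
            # Recommend the highest-scoring outstanding mapping and stop
            # descending: sub-packages of a recommended package are skipped.
            best = max(package.outstanding, key=package.outstanding.get)
            recommendations[package.name] = best
        else:
            # Empty set: go one step lower in the package tree.
            stack.extend(package.children)
    return recommendations


# Example mirroring Figure 3; scores and the sub-package name are made up.
am = Package(
    "se.kau.cs.jittac.model.am",
    {"architecture-model": 2.1},
    [Package("se.kau.cs.jittac.model.am.sub", {"architecture-model": 1.9})],
)
im = Package(
    "se.kau.cs.jittac.model.im",
    {"impl-model": 1.8, "architecture-model": 1.7},
)
model = Package("se.kau.cs.jittac.model", {}, [am, im])

recommendations = select_recommendations(model)
print(recommendations)
```

In this sketch se.kau.cs.jittac.model yields no recommendation because its outstanding set is empty, so the traversal descends to its two children, recommends one module for each, and never visits the sub-package below se.kau.cs.jittac.model.am.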