A Preliminary Study on the Use of Keywords for Source
Code to Architecture Mappings
Tobias Olsson, Morgan Ericsson and Anna Wingkvist
Department of Computer Science and Media Technology, Linnaeus University, Kalmar/Växjö, Sweden


Abstract
We implement an automatic mapper that can find the corresponding architectural module for a source code file. The mapper
is based on multinomial naive Bayes, and it is trained using custom keywords for each architectural module. For prediction,
the mapper uses the path and file name of source code elements. We find that the needed keywords often match the module
names, but also that ambiguities and discrepancies exist. We evaluate the mapper using nine open-source systems and find
that the mapper can successfully create a mapping with perfect precision, but in most cases, it cannot cover all source code
elements. Other techniques can, however, use the mapping as a foothold and create further mappings.

Keywords
Orphan Adoption, Software Architecture, Source Code Clustering, Naive Bayes


1. Introduction                                                                                                        manual effort needed to create a mapping by using infor-
                                                                                                                       mation available in the source code and intended modular
The modular software architecture captures major design                                                                architecture. For example, dependencies between source
decisions regarding reuse, maintainability, changeability,                                                             code entities can be used to create a mapping. A problem
and portability [1]. During system evolution, the source                                                               with current automatic techniques is that they require an
code must conform to the architecture, or the system                                                                   initial set of mapped entities that the technique infers the
risks accumulating technical debt and finally losing the                                                               automatic mappings from. Depending on the technique
desired qualities.                                                                                                     and system to be mapped, an initial set needs to consist
   Static Architecture Conformance Checking (SACC) meth-                                                               of approximately 15-20% of the entities before reaching
ods, such as Reflexion modeling [2], statically analyze                                                                acceptable performance. In our experience, the physi-
source code to ensure that it does not introduce archi-                                                                cal structure of files on disk is often in part or wholly
tectural violations [3, 4]. These methods require an ar-                                                               reflected in the intended modular architecture. Effective
chitecture model, with modules and dependencies, and a                                                                 use of this information can present an attractive option
source code model, with entities (e.g., source code files)                                                             to create an initial set. However, structure and naming
and concrete dependencies (e.g., due to inheritance or                                                                 are not always mapped one to one to a module, and there
method invocations). They also require a mapping from                                                                  are discrepancies, ambiguities, or simply missing terms
the source code model to the architecture model to de-                                                                 in the naming.
tect convergent, absent, or divergent dependencies in the                                                                 We investigate how well a multinomial naive Bayes
implementation.                                                                                                        classifier trained using simple keywords derived from
   Despite the importance of architecture conformance,                                                                 ground truth mappings can be used to automatically cre-
SACC has not reached widespread use in the software                                                                    ate an initial set. We pose the following questions:
industry [1, 3, 5, 6]. The necessary tools and methods
for using SACC exist. However, practitioners perceive                                                                      1. Can the mapper construct an initial set based on
the mapping from source code to architectural modules                                                                         a simple set of keywords for each module?
as a significant hindrance; it is often outdated or nonex-                                                                 2. How well does this initial set perform if used in
istent. Many tools address this by combining manual                                                                           combination with mapping based on dependen-
mapping and regular expressions to filter file, module,                                                                       cies?
and package names. Still, such approaches are considered both                                                              3. How well does the above combination perform
time-consuming and error-prone [3, 5, 6, 7].                                                                                  compared to the NBAttract (with a random initial
   Automatic mapping techniques aim to minimize the                                                                           set) and InMap approaches?

ECSA2021 Companion Volume
Email: tobias.olsson@lnu.se (T. Olsson); morgan.ericsson@lnu.se (M. Ericsson); anna.wingkvist@lnu.se (A. Wingkvist)
ORCID: 0000-0003-1154-5308 (T. Olsson); 0000-0003-1173-5187 (M. Ericsson); 0000-0002-0835-823X (A. Wingkvist)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

                                                                                                                          We evaluate the mapper using nine open-source systems
                                                                                                                       with known mappings to a specified modular architecture
                                                                                                                       and find that the keywords are often the same
                                                                                                                       as the module names, but more and different keywords
                                                                                                                       are needed in some cases. After the initial set is
                                                                                                                       created, we run another automatic mapper that can map


Tobias Olsson et al. CEUR Workshop Proceedings                                                                               1–10


any remaining entities. We compare the results with a              tions found in the relatively short filenames. While this is
traditional automatic mapping technique [8] and an inter-          not a technical problem in modern development, the use
active mapping technique [7]. We find that the keywords-           of abbreviations is still common practice. For example,
based approach can, in some cases, provide a complete              one of the subject systems, ArgoUml, defines a module
mapping and that the keywords-based approach plus the              reverseEngineering, and the corresponding directory map-
automatic mapping approach performs very well.                     ping is the abbreviation reveng. Finally, Anquetil and
                                                                   Lethbridge successfully use filenames to create a cluster-
                                                                   ing that corresponds well to an expert’s view of a system.
2. Background and Related Work
Tzerpos and Holt describe the general problem of map-              2.1. Semi-Automatic Mapping
ping (or remapping) a source code entity to an architec-           Christl et al. introduced the Human Guided clustering
tural module [9]. They collectively call both the mapping          Method (HuGMe), an approach to semi-automatic map-
and remapping of an entity the orphan adoption prob-               ping of source code entities to modules of the intended
lem. They find four major criteria for solving the problem:        architecture [9]. It is an iterative approach that, at its core,
naming, structure, style, and semantics, and devise an             uses an attraction function to compute the attraction
algorithm that they evaluate in three case studies [9]. Tzerpos    between a source code entity and a module. If the attraction
and Holt regard the naming criterion as the first option           is considered valid, an automatic mapping is made; if not,
to use in an orphan adoption scenario and suggest using            the attractions can be used as a suggestion for a human
per-system regular expressions to determine a mapping.             user. Two attraction functions based on dependencies
However, they also mention that naming alone is not                are presented, CountAttract and MQAttract [13, 6].
enough, as names may be lacking or naming standards                   Bittencourt et al. present two new attraction
are not always adhered to by developers.                           functions based on information retrieval techniques [5]. They
   Garcia et al. discuss the use of package and naming             use semantic information in the source code, including
information in software architecture recovery [10]. In             module- and filenames. The attractions are calculated
general, they found that their ground truth components             based on cosine similarity (IRAttract) and latent semantic
often spanned or shared several packages. They could               indexing (LSIAttract). They make a quantitative compari-
not find a correlation between components and single               son between the performance of their attraction functions
package or directory names. One of their four cases pre-           with CountAttract and MQAttract in an evolutionary set-
sented a reasonably good correlation, and in one system,           ting (where a few new files are to be assigned a mapping).
they could find a repeating pattern of directories. The            They find that combining attraction functions (e.g., if
ground truth architectures recovered in their study are            CountAttract fails, try IRAttract) performs best. They
possibly at a lower level than the modular architectures           find that CountAttract usually misplaces entities on mod-
we study. Still, there is likely variation in what dimension       ule borders. MQAttract performs better when mapping
or view of an architecture is expressed in the package             entities with dependencies to many different modules.
structure. This variation is further supported by Buckley          IRAttract and LSIAttract perform better when mapping
et al., where one out of five studied systems did not have         entities in libraries or entities on module borders, but
any clear correlation between packages and modules.                worse if there are modules that share vocabulary but are
This presented difficulties and significant effort when            not related [5].
performing manual mapping [11].                                       We have created an attraction function that uses ma-
   Anquetil and Lethbridge, on the other hand, propose             chine learning techniques and introduced the Concrete
a method for architecture recovery of legacy systems               Dependency Abstraction (CDA) method [8]. In short,
using filenames [12]. Their approach focuses on the                CDA produces textual representations of dependencies
assumptions that files have short names with many ab-              at the level of architectural modules and lets a machine
breviations and are placed in a single directory. This is          learning technique learn the patterns of dependencies
due to their focus on recovering legacy systems. Nev-              from the actual source code and combine these with in-
ertheless, they present some interesting findings. First,          formation retrieval techniques. We implement this ap-
they identify several forces that shape a filename, i.e.,          proach using naive Bayes as an attraction function for
what influences it. There seem to be several examples of           the HuGMe method, NBAttract. We have compared the
such forces also in more modern implementations, e.g.,             automatic mapping performance of CountAttract, IRAt-
from the subject system Ant, we find the feature imple-            tract, LSIAttract and NBAttract over several systems us-
mented (ant.taskdefs.SendEmail), the algorithms or steps           ing s4rdm3x, our open-source tool suite for automatic
of algorithms (ant.types.resources.Sort), or data processed        mapping experiments [8, 14].
(ant.taskdefs.email.Header), as suggested in [12]. Much of            The main limitations for the techniques that build on
the approach revolves around the problematic abbrevia-             HuGMe are the need for an initial set and, in some cases,




low-quality mappings. The initial set needs to be                 3. Keywords and File-Based Mapping
manually created and be of good quality for the attraction
functions to perform well. We estimate that a randomly
composed initial set needs to include approximately 15-           File naming and structure seem to reflect the intended
20% of the source code entities. Based on this, we con-           modular architectures we have studied quite well. For
clude that creating the initial set is likely a significant       example, module names tend to map to the directory
effort. Automated techniques will probably not result in          structure of the source code. However, the naming is
a perfect mapping except when they use a large initial            often not perfect. In some cases, module names are not
set and only map a few entities. In the best of cases, the        used, or shorter or slightly different terms are used. In
automated technique leaves hard-to-map instances to the           other cases, several module names exist in the structure
user (creating more manual work), but misclassifications          or naming of a file. A simplistic approach is thus not
are problematic. There has not been much research on the          appropriate. Instead, the file naming patterns need to be
manual mapping steps of HuGMe except for the original             fully defined, e.g., using regular expressions or a heuristic.
studies [13, 6]. Handling of misclassification and manual         For regular expressions to work, there is often a need to
support in these methods are still open issues.                   maintain several expressions that can be conflicting and
                                                                  overlapping. A more attractive option would be to use
2.2. Interactive Mapping                                          machine learning and train a classifier using a good set
                                                                  of keywords. The classifier’s task is to produce a good
Sinkala and Herold present InMap, which is not an                 enough initial set. An automatic mapping technique can
automated approach to mapping per se, but instead suggests        then use this initial set for further mappings.
mappings to the end-user, who can then choose to                     In this work, we implement a proof of concept
accept the suggested mapping (or not) [7]. It is an iterative     mapper using a multinomial naive Bayes classifier. It is a
approach that, in each iteration, presents a suggested mapping    simple, probabilistic approach that uses word
for a fixed number of entities. The end-user chooses to           frequencies to compute the probability of each class. While it
accept or reject the suggestions. InMap uses the accepted         is conceptually simple, naive Bayes often produces good
mappings to improve the suggested mappings further in             results, especially if the training data is small. As the
the next iteration. It also uses the negative evidence of         goal is to create a good enough mapping using a small
a rejected mapping and does not suggest this mapping              set of predefined keywords, naive Bayes is thus a good
again. InMap produces the suggested mappings similar              candidate for a proof of concept study.
to Bittencourt et al., with the addition of a descriptive            We base our implementation on the Weka library [15]
text for each architectural module. InMap also includes           and train the classifier using the custom keywords for
the path and filename used in the Java class and package          each module. Note that the same keyword can be spec-
names. It treats the source code entities as a database of        ified multiple times, increasing the importance of that
documents and uses Lucene to search this database using           particular keyword.
module information as a query. Sinkala and Herold eval-              We derive the prediction data from the path of each
uate InMap using six open source systems. For the best            source code entity, including the filename. The filename
combination (in terms of highest F1 score) of                     is split into words based on common camel-, kebab-, and
information, InMap can suggest mappings for most of a system’s    snake-case rules. In addition, we value later parts of the
entities with a mean recall of 0.95, a mean precision of          path more and add these words multiple times. Intuitively,
0.84, and a mean F1 score of 0.89.                                this allows a more deeply nested folder mapping to “override”
   The main limitations of InMap are its highly interactive       a higher-level mapping. For example, the file:
nature and that architectural documentation needs to
exist for every module. The documentation provided needs          net/sf/jabref/logic/util/io/FileHistory.java
to be of good quality, i.e., as short as possible but
containing good keywords. Noisy documentation will likely         will produce the following words:
not help in producing high-precision suggestions. The
interactiveness of InMap is in some way double-edged;             net sf jabref logic util io filehistory file history sf jabref
the technique often seems to require more interactions            logic util io jabref logic util io logic util io util io io
(accepting or rejecting a suggested mapping) than there
are entities in the source code. On the other hand, if            Note the six occurrences of io, reflecting the nesting
even minor mapping errors cannot be tolerated, a mapping          depth of the word in the path.
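The word list in the FileHistory.java example can be reproduced with a short sketch. This is an illustration in Python, not the s4rdm3x implementation; the function names `split_identifier` and `path_to_words` are hypothetical, and skipping the split for single-word file names is an assumption, since the text does not say how that case is handled.

```python
import re


def split_identifier(name: str) -> list[str]:
    """Split a name into lowercase words on kebab-, snake-, and camel-case boundaries."""
    words = []
    for part in re.split(r"[-_]+", name):
        # Acronym runs, capitalised or lowercase words, remaining capitals, digits.
        words += re.findall(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+", part)
    return [w.lower() for w in words]


def path_to_words(path: str) -> list[str]:
    """Turn a source file path into words, repeating deeper directories more often."""
    directories, _, filename = path.rpartition("/")
    dirs = [d.lower() for d in directories.split("/") if d]
    stem = filename.rsplit(".", 1)[0]

    words = list(dirs)            # every directory once
    words.append(stem.lower())    # the whole file name as one word
    parts = split_identifier(stem)
    if len(parts) > 1:            # assumption: do not repeat a single-word name
        words += parts
    # Each shorter suffix of the directory list is added again, so the
    # deepest directory ends up with the highest count.
    for start in range(1, len(dirs)):
        words += dirs[start:]
    return words


words = path_to_words("net/sf/jabref/logic/util/io/FileHistory.java")
print(" ".join(words))
print(words.count("io"))  # 6
```

With this weighting, a directory at depth d (counting from 1) occurs d times, which is what lets a keyword match deep in the path outweigh a match near the root.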
validation is needed anyway.                                  To generate a useful initial set, it is more important that
                                                            the mappings are precise rather than complete. There
                                                            needs to be a high difference between the best mapping
                                                            probability and the second best. By trial and error, we




found a factor of 1.99 to work well, i.e., the highest             2016, 2017, and 2019, respectively. A system expert has
probability needs to be at least 1.99 times the second-highest     provided both the architecture and the mapping for these
probability for a mapping to occur.                                systems. The architecture documentation and mappings
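The acceptance rule can be stated compactly in code. The following Python sketch is an illustration only; the function name `accept_mapping` is hypothetical, and treating a ratio of exactly 1.99 as sufficient is an assumption.

```python
from typing import Optional


def accept_mapping(probabilities: dict[str, float], factor: float = 1.99) -> Optional[str]:
    """Return the most probable module if it beats the runner-up by `factor`, else None."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) == 1:
        return ranked[0][0]  # only one candidate module, nothing to compare against
    (best, p_best), (_, p_second) = ranked[0], ranked[1]
    # Entities whose two best modules are too close stay unmapped.
    return best if p_best >= factor * p_second else None


print(accept_mapping({"logic": 0.7, "gui": 0.2, "model": 0.1}))  # logic
print(accept_mapping({"logic": 0.5, "model": 0.4, "gui": 0.1}))  # None
```

Leaving ambiguous entities unmapped trades recall for precision, which matches the stated goal of a precise rather than complete initial set.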
  We have implemented the mapper described above in                are available in the SAEroCon repository⁹. ArgoUML,
our open-source tool suite s4rdm3x [16].                           Ant, and Lucene have been previously studied [17, 18],
                                                                   and the architectures and mappings were extracted from
                                                                   the replication package of Brunet et al. [17]. K9 has been
4. Method                                                          preliminarily mapped by us based on architecture
                                                                   documentation provided in [19]¹⁰. We have not validated
We use nine open-source systems where the ground truth
                                                                   this mapping with system experts but include it since it
mappings are known. We create a keyword set for each
                                                                   is an interesting case with a more complex file structure.
module based on the ground truth mappings. We make
sure that these keywords will successfully map at least
some entities to each module.                                      5. Results and Analysis
   After we have determined the keywords, we run our
keywords-based mapper and create an initial set. This              We use the existing ground truth mappings to construct
initial set is then used as the input to another mapper,           a set of keywords for each system. Table 1 shows the
NBAttract, which also uses multinomial naive Bayes but             manually extracted keywords. Note that a single key-
instead forms training- and prediction words using de-             word is sufficient in many cases, and many keywords
pendency information in the form of concrete depen-                are the same as or some variation of the module name.
dency abstractions (CDA) [8]. We compare the perfor-               K9 presents an interesting exception where several key-
mance to NBAttract with a random initial set. In this              words are needed. We relied on a high-level architectural
configuration, we use file information (not including the          description when creating the mapping for K9, where
module keywords) and CDA. In addition, we compare to               allowed dependencies were the most clearly defined. The
the interactive approach InMap [7].                                keywords used reflect the sub-modules of the high-level
   We collect precision, recall, and combined F1 scores            modules. Note that our mapping has not been validated
for each approach. When a random initial set is used,              by system experts.
several sets of different sizes and compositions are needed           Using the generated initial sets, we ran the NBAttract
to cover a large range of combinations. We will present            mapper with CDA information only. We ran 1530 experi-
the performance metrics numerically and visually as the            ments with random initial sets for the NBAttract mapper
effect of the initial set size is essential.                       where the mapper used filename and CDA information
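The text does not spell out how the three scores are computed for a partial mapping. A minimal Python sketch, assuming that precision is measured over the entities a mapper chose to map and recall over all entities in the ground truth, could look as follows; the function name `mapping_scores` and the entity names are made up for illustration.

```python
def mapping_scores(predicted: dict[str, str], truth: dict[str, str]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for a possibly partial entity-to-module mapping.

    `predicted` holds only the entities the mapper decided to map,
    `truth` holds the ground truth module of every entity.
    """
    correct = sum(1 for entity, module in predicted.items() if truth.get(entity) == module)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


truth = {"A.java": "gui", "B.java": "logic", "C.java": "model"}
predicted = {"A.java": "gui", "B.java": "logic"}  # C.java is left unmapped
precision, recall, f1 = mapping_scores(predicted, truth)
```

Under these assumptions, a mapper that maps two of three entities, both correctly, has a precision of 1, a recall of 2/3, and an F1 score of 0.8.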
   We use nine open-source systems implemented in Java.            (no module keywords). Finally, we use the best-reported
Ant¹ is an API and command-line tool for process                   performance metrics from [7]. Table 2 shows the
automation. ArgoUML² is a desktop application for UML              comparison of the four approaches. Using the keywords-based
modeling. JabRef³ is a desktop application for managing            mapping, we can create an initial set with perfect
bibliographical references. K9⁴ is an open-source email            precision and recall in Commons Imaging, ProM, and Sweet
client for Android. Lucene⁵ is an indexing and search              Home 3D. The keywords for these systems are
library. ProM⁶ is an extensible framework that supports            straightforward and are often directly reflected in the module
a variety of process mining techniques. Note that we               name. For the other systems, keywords can generate
use the ProM framework and not the full ProM system.               an initial set with perfect precision. However, recall
Sweet Home 3D⁷ is an interior design application.                  suffers.
TeamMates⁸ is a web application for handling student peer             The keywords-based initial sets combined with NBAttract
reviews and feedback.                                              using CDA perform very well, with precision scores
   A documented software architecture and a mapping                over 0.95 in all cases and almost perfect scores for recall
from the implementation to this architecture exist for             (cf. Table 2).
each system. JabRef, TeamMates, and ProM have been                    Figures 1, 2, and 3 show the running median F1 score,
the study subjects at the Software Architecture Erosion            precision, and recall for each system. The figures focus
and Architectural Consistency Workshop (SAEroCon)                  on showing the running median for random initial sets
                                                                   and NBAttract. This configuration seems to lack precision
                                                                   in Commons Imaging and Sweet Home 3D, and the
                                                                   recall is suffering in Ant. The naming and dependency
                                                                   information are possibly conflicting in these systems. Ta-

   1 https://ant.apache.org
   2 http://argouml.tigris.org
   3 https://jabref.org
   4 https://k9mail.app/
   5 https://lucene.apache.org
   6 http://www.promtools.org
   7 http://www.sweethome3d.com
   8 https://teammatesv4.appspot.com
   9 https://github.com/sebastianherold/SAEroConRepo
   10 http://oss.models-db.com/Downloads/EASE2019_ReplicationPackage/




Table 1
Keywords for each system and module.

  System     Module                 Keywords               System   Module                Keywords
  Ant        compilers              2 * compiler           JabRef   globals               globals
                                    2 * compilers                   preferences           preferences prefs
             condition              condition                       model                 model shared dbms
             rmic                   rmic                            logic                 logic shared
             cvslib                 cvslib                          gui                   gui
             email                  email                           cli                   cli
             taskdefs               taskdefs               Lucene   queryparser           queryparser
             listener               listener                        search                search
             types                  types                           index                 index
             ant                    ant                             store                 store
             util                   util                            analysis              analysis
             zip                    zip                             util                  util
             tar                    tar                             document              document
             mail                   mail                   K9       business              controller service
             bzip2                  bzip2                                                 mail k9 power
  AUML       application            2 * application                                       search migrations
             diagrams               2 * diagram                     presentation          activity ui notification
             notation               notation                                              fragment view list
             explorer               explorer                                              widget helper crypto
             codeGeneration         3 * language code               service               provider action extra
                                    generation                      dataaccess            mailstore util
             javaCodeGeneration     language code                   crosscutting          crypto autocrypt
                                    generation 2 * java                                   cache helper
             reverseEngineering     3 * reveng             ProM     framework             framework
             persistence            persistence                     contexts              contexts
             moduleLoader           moduleloader 2 * api            models                models
                                    module modules                  plugins               plugins
             gui                    ui                     SH3D     sH3DModel             model
             model                  model                           sH3DTools             tools
             internationalization   i18n                            sH3DPlugin            plugin
             swingExtensions        swingext                        sH3DViewController    viewcontroller
             ocl                    ocl                             sH3DSwing             swing
             critics                2 * cognitive                   sH3DJava3D            j3d
  C Img      base                   imaging                         sH3DIO                io
             color                  color                           sH3DApplet            applet
             common                 common                          sH3DApplication       sweethome3d
             bmp                    bmp                    TMates   common.util           util
             dcx                    dcx                             common.exception      exception
             gif                    gif                             common.dataTransfer   datatransfer
             icns                   icns                            ui.automated          automated
             ico                    ico                             ui.controller         controller
             jpeg                   jpeg                            ui.view               ui page
             pcx                    pcx                             logic.core            core
             png                    png                             logic.api             logic api
             pnm                    pnm                             logic.backdoor        backdoor
             psd                    psd                             storage.entity        entity
             rgbe                   rgbe                            storage.api           storage api
             tiff                   tiff                            storage.search        search
             wbmp                   wbmp                            testDriver            2 * test
             xbm                    xbm                             client.remoteAPI      remoteapi
             xpm                    xpm                             client.scripts        2 * scripts
             icc                    icc
             internal               internal
             palette                palette
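
A keyword table like the one above could feed a multinomial naive Bayes mapper roughly as follows. This is a minimal sketch, not the authors' implementation. It assumes that a weight such as `2 * cognitive` means the keyword is counted twice in the training data for its module. It also assumes that prediction tokenizes the file path, and that a file whose path matches no keyword stays unmapped. The function names, the Laplace smoothing constant, the uniform prior, and the example paths are all hypothetical.

```python
import math
import re
from collections import Counter

# Excerpt of module -> {keyword: weight}, taken from the rows above.
# Assumption: a weight is a repetition count in the module's training data.
KEYWORDS = {
    "gui": {"ui": 1},
    "model": {"model": 1},
    "internationalization": {"i18n": 1},
    "critics": {"cognitive": 2},
}

def tokenize(path):
    """Split a path into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", path.lower()) if t]

def train(keywords, alpha=1.0):
    """Estimate log P(word | module) with Laplace smoothing (alpha is assumed)."""
    vocab = {w for kws in keywords.values() for w in kws}
    model = {}
    for module, kws in keywords.items():
        counts = Counter(kws)
        total = sum(counts.values()) + alpha * len(vocab)
        model[module] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
    return model, vocab

def predict(model, vocab, path):
    """Return the most probable module, or None if no keyword occurs in the path."""
    tokens = [t for t in tokenize(path) if t in vocab]
    if not tokens:
        return None  # assumption: files without keyword evidence stay unmapped
    prior = -math.log(len(model))  # uniform prior over modules (assumed)
    scores = {m: prior + sum(lp[t] for t in tokens) for m, lp in model.items()}
    return max(scores, key=scores.get)  # ties go to the first module listed

model, vocab = train(KEYWORDS)
for path in ["org/argouml/ui/ProjectBrowser.java",   # hypothetical paths
             "org/argouml/cognitive/Critic.java",
             "org/argouml/util/Foo.java"]:
    print(path, "->", predict(model, vocab, path))
```

Leaving files without keyword evidence unmapped is one way to trade recall for precision, in line with the incomplete coverage reported for the keywords-only mapper.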




Table 2
Precision, Recall and F1 score for each mapping technique. For Random + NBAttract, the median metrics are shown.

                        Keywords        Keywords + NBAttract           Random + NBAttract                 InMap
      System      P        R     F1      P    R       F1                P    R      F1              P        R     F1
      Ant        1.00     0.97   0.99   0.99     1.00   0.99           0.94    0.91     0.94      0.73     1.00    0.84
      AUML       1.00     0.67   0.80   0.97     1.00   0.98           0.95    1.00     0.97      0.78     0.98    0.87
      C Img      1.00     1.00   1.00                                  0.84    0.99     0.90
      JabRef     1.00     0.95   0.98   0.98     1.00   0.99           0.91    0.98     0.94      0.96     1.00    0.98
      K9         1.00     0.81   0.90   0.96     1.00   0.98           0.92    1.00     0.96
      Lucene     1.00     0.99   1.00   1.00     0.99   1.00           0.97    1.00     0.98
      ProM       1.00     1.00   1.00                                  0.99    1.00     1.00      0.81     0.87    0.84
      SH3D       1.00     1.00   1.00                                  0.83    1.00     0.91
      TMates     1.00     0.60   0.75   0.97     1.00   0.99           0.97    1.00     0.98      0.95     0.97    0.96
      Mean       1.00     0.89   0.93   0.98     1.00   0.99           0.92    0.99     0.95      0.85     0.96    0.90
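
The F1 values in Table 2 are consistent with the standard definition of F1 as the harmonic mean of precision and recall. The short check below recomputes F1 for three Keywords rows from their rounded precision and recall. The helper name `f1_score` is ours, and because the inputs are already rounded, a recomputed value can differ from the table in the last digit for other rows.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall) from the Keywords column of Table 2
keywords_rows = {
    "AUML": (1.00, 0.67),
    "K9": (1.00, 0.81),
    "TMates": (1.00, 0.60),
}

for system, (p, r) in keywords_rows.items():
    print(f"{system}: F1 = {f1_score(p, r):.2f}")
```

With perfect precision, F1 is driven entirely by recall, which is why the keywords-only mapper scores lowest on the systems where it leaves the most files unmapped.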


ble 2 shows mean values; they can vary quite a bit in the actual cases depending on the size and composition of the initial set.
   Finally, InMap lacks in precision but performs well regarding the recall. Note that InMap is a highly interactive approach to mapping. The aim is not to automate the mapping but rather give good advice to a human user that interactively maps the source code iteratively. If there is a need to check an automatic mapping thoroughly, an interactive approach is attractive regardless of precision.

6. Discussion and Validity

Keywords can be effectively used and provide an excellent initial set, even a perfect mapping in some cases. It is an attractive approach compared to manually mapping an initial set. Hypothetically, it should be easier to extract the keywords and specify the corresponding module and weight of the keyword than mapping several tens or hundreds of files manually. The main challenge in this area is, of course, to find a high precision and minimal set of keywords. We used the already established ground truth mappings to do this in this preliminary evaluation, but this approach is not feasible in a real case. However, analyzing the directory structure and looking for words in the module names could provide a starting point in many cases. Possibly using a deeper level in the directory hierarchy or looking for repeating patterns could be fruitful. Semantic analysis using, e.g., WordNet could be an approach to find related words in the directory structure. In addition, information from, e.g., method names and identifiers could be used.
   It would arguably be easier to create and maintain a small set of keywords compared to, e.g., regular expressions, even if done entirely manually.
   Using a large random initial set seems to give a very high performance of NBAttract in some cases, e.g., ArgoUML, Commons Imaging, Lucene, ProM, Sweet Home 3D (cf. Figure 1). This indicates that when the mapping is established, NBAttract often performs well when only a few new source code entities are introduced (e.g., during software evolution). However, in some cases, the F1 score is declining as the initial set becomes larger, e.g., JabRef, K9, and TeamMates (cf. Figure 1). A preliminary analysis seems to point towards overfitting, i.e., the model becomes too specific, and as a result, the recall drops (cf. Figure 3). It can also be an effect of randomness; the 1530 data points per system are pretty low considering the combinatorial complexity of random initial set sizes and compositions. However, it is sufficient to indicate the overall performance in a preliminary study such as this. The very high recall in ProM (cf. Figure 3) can be explained by the fact that the ProM framework has a very straightforward mapping, and as before, the number of data points may be too small.
   We are limited to systems in Java, where the file structure often reflects the modular design of our subject systems well. While we could handle discrepancies and ambiguities well enough to create an initial set, this may not be the case in a system where the file structure is entirely different. However, we also show that these cases can use the file information. Current mapping methods, e.g., NBAttract and InMap, should likely give file information more attention.

7. Conclusions and Future Work

We found that we could construct relatively simple keywords for a majority of the 96 modules in all nine systems. Ten modules (9.6%) required weights for keywords, and 15 (15.6%) required two or more different keywords. Our mapper could successfully create an initial set using the keywords, and in some cases, this resulted in a perfect mapping.
   Combining the keywords-based mapping and NBAttract using CDA provided outstanding performance with
a mean precision, recall, and F1 score of 0.98, 1.0, and 0.99, respectively. The performance was higher than using random initial sets and NBAttract using CDA and file information, and the interactive technique InMap (see Table 2).
   If a mapping is already established, NBAttract with CDA and file information provides good performance in many cases; however, in some systems, the model could suffer from overfitting issues (cf. Figure 3).
   Using keywords is an attractive approach that can significantly reduce the mapping effort. However, a central question that remains is how to extract good candidate keywords and let a human user assign weights.
   In addition, a keywords-based mapping approach is likely not applicable for some systems. We plan on performing comparative studies using the mappings from [10], where the authors claim architectural modules are not bound to the file structure of the source code.

Acknowledgments

The research was supported by the Centre for Data Intensive Sciences and Applications at Linnaeus University.

References

  [1] L. De Silva, D. Balasubramaniam, Controlling software architecture erosion: A survey, Journal of Systems and Software 85 (2012) 132–151.
  [2] G. C. Murphy, D. Notkin, K. Sullivan, Software reflexion models: Bridging the gap between source and high-level models, ACM SIGSOFT Software Engineering Notes 20 (1995) 18–28.
  [3] N. Ali, S. Baker, R. O’Crowley, S. Herold, J. Buckley, Architecture consistency: State of the practice, challenges and requirements, Empirical Software Engineering 23 (2017) 1–35.
  [4] J. Knodel, D. Popescu, A comparison of static architecture compliance checking approaches, in: The IEEE/IFIP Working Conference on Software Architecture, 2007, pp. 12–21.
  [5] R. A. Bittencourt, G. Jansen de Souza Santos, D. D. S. Guerrero, G. C. Murphy, Improving automated mapping in reflexion models using information retrieval techniques, in: Working Conference on Reverse Engineering, IEEE, 2010, pp. 163–172.
  [6] A. Christl, R. Koschke, M. A. Storey, Automated clustering to support the reflexion method, Information and Software Technology 49 (2007) 255–274.
  [7] Z. T. Sinkala, S. Herold, Inmap: Automated interactive code-to-architecture mapping recommendations, in: IEEE 18th International Conference on Software Architecture (ICSA), 2021, pp. 173–183.
  [8] T. Olsson, M. Ericsson, A. Wingkvist, Semi-automatic mapping of source code using naive bayes, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, 2019, pp. 209–216.
  [9] V. Tzerpos, R. C. Holt, The orphan adoption problem in architecture maintenance, in: Working Conference on Reverse Engineering, IEEE, 1997, pp. 76–82.
 [10] J. Garcia, I. Krka, C. Mattmann, N. Medvidovic, Obtaining ground-truth software architectures, in: 35th International Conference on Software Engineering (ICSE), 2013, pp. 901–910.
 [11] J. Buckley, N. Ali, M. English, J. Rosik, S. Herold, Real-time reflexion modelling in architecture reconciliation: A multi case study, Information and Software Technology 61 (2015) 107–123.
 [12] N. Anquetil, T. C. Lethbridge, Recovering software architecture from the names of source files, Journal of Software Maintenance: Research and Practice 11 (1999) 201–221.
 [13] A. Christl, R. Koschke, M. A. Storey, Equipping the reflexion method with automated clustering, in: Working Conference on Reverse Engineering, IEEE, 2005, pp. 98–108.
 [14] T. Olsson, M. Ericsson, A. Wingkvist, An exploration and experiment tool suite for code to architecture mapping techniques, in: Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA ’19, 2019, pp. 26–29.
 [15] I. Witten, E. Frank, M. Hall, C. Pal, Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques, 4th ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2016.
 [16] T. Olsson, M. Ericsson, A. Wingkvist, s4rdm3x: A tool suite to explore code to architecture mapping techniques, Journal of Open Source Software 6 (2021) 2791. doi:10.21105/joss.02791.
 [17] J. Brunet, R. A. Bittencourt, D. Serey, J. Figueiredo, On the evolutionary nature of architectural violations, in: Working Conference on Reverse Engineering, IEEE, 2012, pp. 257–266.
 [18] J. Lenhard, M. Blom, S. Herold, Exploring the suitability of source code metrics for indicating architectural inconsistencies, Software Quality Journal (2018).
 [19] A. Nurwidyantoro, T. Ho-Quang, M. R. V. Chaudron, Automated classification of class role-stereotypes via machine learning, in: Proceedings of the Evaluation and Assessment on Software Engineering, 2019, pp. 79–88.




[Figure 1 plot: one panel per system (Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates). x-axis: Initial Set Size (0.0 to 1.0); y-axis: F1 Score (0.7 to 1.0). Series: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]
Figure 1: The F1 score of each approach. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the F1 score axis starts at 0.7.
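
A running median with a 25th to 75th percentile band, as described in the Figure 1 caption, could be computed along the following lines. This is only a sketch: the source does not say how the running statistics were obtained, so the fixed-size sliding window over points sorted by initial set size, the window width, and the function name are all assumptions.

```python
import statistics

def running_quartiles(points, window=5):
    """Slide a fixed-size window over (initial_set_size, score) points.

    Returns one (mean_size, q1, median, q3) tuple per window position.
    The window width is an assumed parameter, not taken from the paper.
    """
    pts = sorted(points)
    rows = []
    for i in range(len(pts) - window + 1):
        chunk = pts[i:i + window]
        sizes = [x for x, _ in chunk]
        scores = [y for _, y in chunk]
        # quantiles(n=4) returns the three quartile cut points
        q1, med, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        rows.append((statistics.mean(sizes), q1, med, q3))
    return rows

# Made-up data points for illustration only
example = [(0.1, 0.80), (0.2, 0.90), (0.3, 0.70), (0.4, 1.00),
           (0.5, 0.85), (0.6, 0.95)]
for size, q1, med, q3 in running_quartiles(example, window=5):
    print(f"size={size:.2f}  q1={q1:.2f}  median={med:.2f}  q3={q3:.2f}")
```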




[Figure 2 plot: one panel per system (Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates). x-axis: Initial Set Size (0.0 to 1.0); y-axis: Precision (0.7 to 1.0). Series: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]
Figure 2: The precision of each approach. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the precision axis starts at 0.7.




[Figure 3 plot: one panel per system (Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates). x-axis: Initial Set Size (0.0 to 1.0); y-axis: Recall (0.7 to 1.0). Series: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]
Figure 3: The recall of each approach. Random+NBAttract is shown with a running median and the running 25th to 75th percentiles. Note that the recall axis starts at 0.7.

