<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>M. Ericsson);</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Preliminary Study on the Use of Keywords for Source Code to Architecture Mappings</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tobias Olsson</string-name>
          <email>tobias.olsson@lnu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Morgan Ericsson</string-name>
          <email>morgan.ericsson@lnu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Wingkvist</string-name>
          <email>anna.wingkvist@lnu.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Media Technology, Linnaeus University</institution>
          ,
          <addr-line>Kalmar/Växjö</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Orphan Adoption</institution>
          ,
          <addr-line>Software Architecture, Source Code Clustering, Naive Bayes</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Workshop Proceedings</institution>
        </aff>
      </contrib-group>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>We implement an automatic mapper that can find the corresponding architectural module for a source code file. The mapper is based on multinomial naive Bayes, and it is trained using custom keywords for each architectural module. For prediction, the mapper uses the path and file name of source code elements. We find that the needed keywords often match the module names, but also that ambiguities and discrepancies exist. We evaluate the mapper using nine open-source systems and find that the mapper can successfully create a mapping with perfect precision, but in most cases, it cannot cover all source code elements. Other techniques can, however, use the mapping as a foothold and create further mappings.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The modular software architecture captures major design
decisions regarding reuse, maintainability, changeability,
and portability [
        <xref ref-type="bibr" rid="ref5">1</xref>
        ]. During system evolution, the source
code must conform to the architecture, or the system
risks accumulating technical debt and finally losing the
desired qualities.
      </p>
      <p>
        Static Architecture Conformance Checking (SACC)
methods, such as Reflexion modeling [
        <xref ref-type="bibr" rid="ref6">2</xref>
        ], statically analyze
source code to ensure that it does not introduce
architectural violations [
        <xref ref-type="bibr" rid="ref7 ref8">3, 4</xref>
        ]. These methods require an
architecture model, with modules and dependencies, and a
source code model, with entities (e.g., source code files)
and concrete dependencies (e.g., due to inheritance or
method invocations). They also require a mapping from
the source code entities to the architectural modules.
SACC has not reached widespread use in the software
industry [
        <xref ref-type="bibr" rid="ref10 ref5 ref7 ref9">1, 3, 5, 6</xref>
        ]. The necessary tools and methods
for using SACC exist. However, practitioners perceive
the mapping from source code to architectural modules
as a significant hindrance; it is often outdated or
nonexistent. Many tools address this by combining manual
mapping and regular expressions to filter file, module,
and package names. Still, such approaches are considered to be both
time-consuming and error-prone [
        <xref ref-type="bibr" rid="ref10 ref11 ref7 ref9">3, 5, 6, 7</xref>
        ].
      </p>
      <p>CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.
© 2021 Copyright for this paper by its authors. Use permitted under Creative</p>
      <p>Automatic mapping techniques aim to minimize the
manual effort needed to create a mapping by using
information available in the source code and intended modular
architecture. For example, dependencies between source
code entities can be used to create a mapping. A problem
with current automatic techniques is that they require an
initial set of mapped entities that the technique infers the
automatic mappings from. Depending on the technique
and system to be mapped, an initial set needs to consist
of approximately 15-20% of the entities before reaching
acceptable performance. In our experience, the
physical structure of files on disk is often in part or wholly
reflected in the intended modular architecture. Effective
use of this information can present an attractive option
to create an initial set. However, structure and naming
are not always mapped one to one to a module, and there
are discrepancies, ambiguities, or simply missing terms.</p>
      <p>We investigate how well a multinomial naive Bayes
classifier trained using simple keywords can
create an initial set. We pose the following questions:
1. Can the mapper construct an initial set based on
a simple set of keywords for each module?
2. How well does this initial set perform if used in
combination with mapping based on
dependencies?
3. How well does the above combination perform
compared to the NBAttract (with a random initial
set) and InMap approaches?</p>
      <p>
        We evaluate the mapper using nine open-source
systems with known mappings to a specified modular
architecture and find that the keywords are often the same
as the module names, but more and different keywords
are needed in some cases. After the initial set is
created, we run another automatic mapper that can map
any remaining entities. We compare the results with a
traditional automatic mapping technique [8] and an
interactive mapping technique [
        <xref ref-type="bibr" rid="ref11">7</xref>
        ]. We find that the keywords-based
approach can, in some cases, provide a complete
mapping and that the keywords-based approach plus the
automatic mapping approach performs very well.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>Tzerpos and Holt describe the general problem of
mapping (or remapping) a source code entity to an
architectural module [9]. They collectively call both the mapping
and remapping of an entity the orphan adoption
problem. They find four major criteria for solving the problem:
naming, structure, style, and semantics, and devise an
algorithm that they evaluate in three case studies [9]. Tzerpos
and Holt regard the naming criteria as the first option
to use in an orphan adoption scenario and suggest using
per-system regular expressions to determine a mapping.
However, they also mention that naming criteria are not
enough, as they may be lacking or the naming standard
may not always be adhered to by developers.</p>
      <p>Garcia et al. discuss the use of package and naming
information in software architecture recovery [10]. In
general, they found that their ground truth components
often spanned or shared several packages. They could
not find a correlation between components and single
package or directory names. One of their four cases
presented a reasonably good correlation, and in one system,
they could find a repeating pattern of directories. The
ground truth architectures recovered in their study are
possibly at a lower level than the modular architectures
we study. Still, there is likely variation in what dimension
or view of an architecture is expressed in the package
structure. This variation is further supported by Buckley
et al., where one out of five studied systems did not have
any clear correlation between packages and modules.
This presented difficulties and significant effort when
performing manual mapping [11].</p>
      <p>
        Anquetil and Lethbridge, on the other hand, propose
a method for architecture recovery of legacy systems
using filenames [12]. Their approach focuses on the
assumptions that files have short names with many
abbreviations and are placed in a single directory. This is
due to their focus on recovering legacy systems.
Nevertheless, they present some interesting findings. First,
they identify several forces that shape a filename, i.e.,
what influences it. There seem to be several examples of
such forces also in more modern implementations, e.g.,
from the subject system Ant, we find the feature
implemented (ant.taskdefs.SendEmail), the algorithms or steps
of algorithms (ant.types.resources.Sort), or data processed
(ant.taskdefs.email.Header), as suggested in [12]. Much of
the approach revolves around the problematic abbreviations
found in the relatively short filenames. While this is
not a technical problem in modern development, the use
of abbreviations is still common practice. For example,
one of the subject systems, ArgoUML, defines a module
reverseEngineering, and the corresponding directory
mapping is the abbreviation reveng. Finally, Anquetil and
Lethbridge successfully use filenames to create a
clustering that corresponds well to an expert’s view of a system.
      </p>
      <p>2.1. Semi-Automatic Mapping</p>
      <p>
        Christl et al. introduced the Human Guided clustering
Method (HuGMe), an approach to semi-automatic
mapping of source code entities to modules of the intended
architecture [9]. It is an iterative approach that, at its core,
uses an attraction function to compute the attraction
between a source code entity and a module. If the attraction
is considered valid, an automatic mapping is made; if not,
the attractions can be used as a suggestion for a human
user. Two attraction functions based on dependencies
are presented, CountAttract and MQAttract [
        <xref ref-type="bibr" rid="ref10">13, 6</xref>
        ].
      </p>
      <p>
        Bittencourt et al. present two new attraction
functions based on information retrieval techniques [
        <xref ref-type="bibr" rid="ref9">5</xref>
        ]. They
use semantic information in the source code, including
module- and filenames. The attractions are calculated
based on cosine similarity (IRAttract) and latent semantic
indexing (LSIAttract). They make a quantitative
comparison between the performance of their attraction functions
with CountAttract and MQAttract in an evolutionary
setting (where a few new files are to be assigned a mapping).
They find that combining attraction functions (e.g., if
CountAttract fails, try IRAttract) performs best. They
find that CountAttract usually misplaces entities on
module borders. MQAttract performs better when mapping
entities with dependencies to many different modules.
IRAttract and LSIAttract perform better when mapping
entities in libraries or entities on module borders, but
worse if there are modules that share vocabulary but are
not related [
        <xref ref-type="bibr" rid="ref9">5</xref>
        ].
      </p>
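      <p>To make the idea of a vocabulary-based attraction concrete, the following is a minimal sketch of a cosine-similarity attraction between one entity and a set of modules. It is not the implementation of Bittencourt et al.: it uses raw term frequencies without any weighting, and the function names, module names, and terms are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector attracts nothing
    return dot / (norm_a * norm_b)


def attractions(entity_terms, module_terms):
    """Attraction of one entity to every module, from shared vocabulary only."""
    entity_vec = Counter(entity_terms)
    return {
        module: cosine_similarity(entity_vec, Counter(terms))
        for module, terms in module_terms.items()
    }


# Hypothetical terms taken from identifiers and file names.
scores = attractions(
    ["file", "history", "io"],
    {"logic": ["io", "util", "file"], "gui": ["dialog", "window"]},
)
best = max(scores, key=scores.get)  # module with the highest attraction
```
]]></preformat>
      <p>Because the score depends only on shared terms, two unrelated modules with similar vocabulary receive similar attractions, which matches the weakness reported for IRAttract and LSIAttract.</p>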
      <p>We have created an attraction function that uses
machine learning techniques and introduced the Concrete
Dependency Abstraction (CDA) method [8]. In short,
CDA produces textual representations of dependencies
at the level of architectural modules and lets a machine
learning technique learn the patterns of dependencies
from the actual source code and combine these with
information retrieval techniques. We implement this
approach using naive Bayes as an attraction function for
the HuGMe method, NBAttract. We have compared the
automatic mapping performance of CountAttract,
IRAttract, LSIAttract and NBAttract over several systems
using s4rdm3x, our open-source tool suite for automatic
mapping experiments [8, 14].</p>
      <p>
        The main limitations for the techniques that build on
HuGMe are the need for an initial set and, in some cases,
low-quality mappings. The initial set needs to be
manually created and be of good quality for the attraction
functions to perform well. We estimate that a randomly
composed initial set needs to include approximately
15-20% of the source code entities. Based on this, we
conclude that creating the initial set likely requires significant
effort. Automated techniques will probably not result in
a perfect mapping except when they use a large initial
set and only map a few entities. In the best of cases, the
automated technique leaves hard-to-map instances to the
user (creating more manual work), but misclassifications
are problematic. There has not been much research on the
manual mapping steps of HuGMe except for the original
studies [
        <xref ref-type="bibr" rid="ref10">13, 6</xref>
        ]. Handling of misclassification and manual
support in these methods are still open issues.
      </p>
      <p>2.2. Interactive Mapping</p>
      <p>
        Sinkala and Herold present InMap, which is not an
automated approach to mapping per se, but instead suggests
mappings to the end-user, who can then choose to
accept the suggested mapping (or not) [
        <xref ref-type="bibr" rid="ref11">7</xref>
        ]. It is an iterative
approach that presents a suggested mapping
for a fixed number of entities. The end-user chooses to
accept or reject the suggestions. InMap uses the accepted
mappings to improve the suggested mappings further in
the next iteration. It also uses the negative evidence of
a rejected mapping and does not suggest this mapping
again. InMap produces the suggested mappings similarly
to Bittencourt et al., with the addition of a descriptive
text for each architectural module. InMap also includes
the path and filename used in the Java class and package
names. It treats the source code entities as a database of
documents and uses Lucene to search this database using
module information as a query. Sinkala and Herold
evaluate InMap using six open-source systems. For the best
combination (in terms of highest F1 score) of information,
InMap can suggest mappings for most of a system’s
entities with a mean recall of 0.95, a mean precision of
0.84, and a mean F1 score of 0.89.
      </p>
      <p>The main limitations of InMap are its highly interactive
nature and that architectural documentation needs to
exist for every module. The documentation provided needs
to be of good quality, i.e., as short as possible but
containing good keywords. Noisy documentation will likely
not help in producing high-precision suggestions. The
interactiveness of InMap is in some way double-edged;
the technique often seems to require more interaction
(accepting or rejecting a suggested mapping) than there
are entities in the source code. On the other hand, if
minor mapping errors cannot be tolerated, a mapping
validation is needed anyway.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Keywords and File-Based Mapping</title>
      <p>File naming and structure seem to reflect the intended
modular architectures we have studied quite well. For
example, module names tend to map to the directory
structure of the source code. However, the naming is
often not perfect. In some cases, module names are not
used, or shorter or slightly different terms are used. In
other cases, several module names exist in the structure
or naming of a file. A simplistic approach is thus not
appropriate. Instead, the file naming patterns need to be
fully defined, e.g., using regular expressions or a heuristic.</p>
      <p>For regular expressions to work, there is often a need to
maintain several expressions that can be conflicting and
overlapping. A more attractive option would be to use
machine learning and train a classifier using a good set
of keywords. The classifier’s task is to produce a good
enough initial set. An automatic mapping technique can
then use this initial set for further mappings.</p>
      <p>In this work, we implement a proof-of-concept
mapper using a multinomial naive Bayes classifier. It is a
simple, probabilistic approach that uses word
frequencies to compute the probability of each class. While it
is conceptually simple, naive Bayes often produces good
results, especially if the training data is small. As the
goal is to create a good enough mapping using a small
set of predefined keywords, naive Bayes is thus a good
candidate for a proof-of-concept study.</p>
      <p>We base our implementation on the Weka library [15]
and train the classifier using the custom keywords for
each module. Note that the same keyword can be
specified multiple times, increasing the importance of that
particular keyword.</p>
      <p>We derive the prediction data from the path of each
source code entity, including the filename. The filename
is split into words based on common camel-, kebab-, and
snake-case rules. In addition, we value later parts of the
path more and add these words multiple times. Intuitively,
this allows a more deeply nested folder mapping to “override”
a higher-level mapping. For example, the file:</p>
      <p>net/sf/jabref/logic/util/io/FileHistory.java</p>
      <p>will produce the following words:</p>
      <p>net sf jabref logic util io filehistory file history sf jabref
logic util io jabref logic util io logic util io util io io</p>
      <p>Note the six occurrences of io, reflecting the nesting
depth of the word in the path.</p>
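      <p>The word generation and keyword-based classification described above can be sketched as follows. This is a plain-Python illustration under stated assumptions, not the Weka-based implementation in s4rdm3x: the function names, the add-one smoothing, the uniform class prior, the handling of single-word file names, and the one-keyword-per-module training set are all hypothetical choices made for the example.</p>
      <preformat><![CDATA[
```python
import math
import re
from collections import Counter


def split_identifier(name):
    """Split a camel-, kebab-, or snake-case name into lowercase words."""
    words = []
    for part in re.split(r"[-_]+", name):
        words += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [w.lower() for w in words]


def path_words(path):
    """Prediction words for one file; deeper directories are repeated more often."""
    *dirs, filename = path.split("/")
    dirs = [d.lower() for d in dirs]
    stem = filename.rsplit(".", 1)[0]
    parts = split_identifier(stem)
    # Assumption: a single-word file name is not added a second time.
    words = dirs + [stem.lower()] + (parts if len(parts) > 1 else [])
    for depth in range(1, len(dirs)):
        words += dirs[depth:]  # drop the outermost directories step by step
    return words


def train(keywords_by_module):
    """Multinomial naive Bayes with add-one smoothing over the keyword vocabulary."""
    vocab = {w for kws in keywords_by_module.values() for w in kws}
    model = {}
    for module, kws in keywords_by_module.items():
        counts = Counter(kws)  # a keyword listed twice counts twice
        total = sum(counts.values()) + len(vocab)
        model[module] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return model


def posteriors(model, words):
    """Class probabilities under a uniform prior; unknown words are ignored."""
    logs = {m: sum(lp[w] for w in words if w in lp) for m, lp in model.items()}
    top = max(logs.values())
    unnorm = {m: math.exp(v - top) for m, v in logs.items()}
    norm = sum(unnorm.values())
    return {m: v / norm for m, v in unnorm.items()}


words = path_words("net/sf/jabref/logic/util/io/FileHistory.java")
model = train({"logic": ["logic"], "model": ["model"], "gui": ["gui"]})
probs = posteriors(model, words)
```
]]></preformat>
      <p>For the example path, path_words reproduces the word list shown above, including the six occurrences of io. Since the word logic occurs four times in that list, the hypothetical keyword set assigns the file to the logic module with the highest probability.</p>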
      <p>To generate a useful initial set, it is more important that
the mappings are precise rather than complete. There
needs to be a large difference between the best mapping
probability and the second best. By trial and error, we
found a factor of 1.99 to work well, i.e., the highest
probability needs to be 1.99 times higher than the
second-highest probability for a mapping to occur.</p>
      <p>We have implemented the mapper described above in
our open-source tool suite s4rdm3x [16].</p>
      <sec id="sec-3-1">
        <title>4. Method</title>
        <p>We use nine open-source systems where the ground truth
mappings are known. We create a keyword set for each
module based on the ground truth mappings. We make
sure that these keywords will successfully map at least
some entities to each module.</p>
        <p>
          After we have determined the keywords, we run our
keywords-based mapper and create an initial set. This
initial set is then used as the input to another mapper,
NBAttract, which also uses multinomial naive Bayes but
instead forms training and prediction words using
dependency information in the form of concrete
dependency abstractions (CDA) [8]. We compare the
performance to NBAttract with a random initial set. In this
configuration, we use file information (not including the
module keywords) and CDA. In addition, we compare to
the interactive approach InMap [
          <xref ref-type="bibr" rid="ref11">7</xref>
          ].
        </p>
        <p>We collect precision, recall, and combined F1 scores
for each approach. When a random initial set is used,
several sets of different sizes and compositions are needed
to cover a large range of combinations. We present
the performance metrics numerically and visually as the
effect of the initial set size is essential.</p>
        <p>We use nine open-source systems implemented in Java.
Ant<sup>1</sup> is an API and command-line tool for process
automation. ArgoUML<sup>2</sup> is a desktop application for UML
modeling. JabRef<sup>3</sup> is a desktop application for managing
bibliographical references. K9<sup>4</sup> is an open-source email
client for Android. Lucene<sup>5</sup> is an indexing and search
library. ProM<sup>6</sup> is an extensible framework that supports
a variety of process mining techniques. Note that we
use the ProM framework and not the full ProM system.
Sweet Home 3D<sup>7</sup> is an interior design application.
TeamMates<sup>8</sup> is a web application for handling student peer
reviews and feedback.</p>
        <p>A documented software architecture and a mapping
from the implementation to this architecture exist for
each system. JabRef, TeamMates, and ProM have been
the study subjects at the Software Architecture Erosion
and Architectural Consistency Workshop (SAEroCon)
2016, 2017, and 2019, respectively. A system expert has
provided both the architecture and the mapping for these
systems. The architecture documentation and mappings
are available in the SAEroCon repository<sup>9</sup>. ArgoUML,
Ant, and Lucene have been previously studied [17, 18],
and the architectures and mappings were extracted from
the replication package of Brunet et al. [17]. K9 has been
preliminarily mapped by ourselves based on architecture
documentation provided in [19]<sup>10</sup>. We have not validated
this mapping with system experts but include it since it
is an interesting case with a more complex file structure.</p>
        <p><sup>1</sup> https://ant.apache.org
<sup>2</sup> http://argouml.tigris.org
<sup>3</sup> https://jabref.org
<sup>4</sup> https://k9mail.app/
<sup>5</sup> https://lucene.apache.org
<sup>6</sup> http://www.promtools.org
<sup>7</sup> http://www.sweethome3d.com
<sup>8</sup> https://teammatesv4.appspot.com
<sup>9</sup> https://github.com/sebastianherold/SAEroConRepo
<sup>10</sup> http://oss.models-db.com/Downloads/EASE2019_ReplicationPackage/</p>
      </sec>
      <sec id="sec-3-2">
        <title>5. Results and Analysis</title>
        <p>We use the existing ground truth mappings to construct
a set of keywords for each system. Table 1 shows the
manually extracted keywords. Note that a single
keyword is sufficient in many cases, and many keywords
are the same as or some variation of the module name.
K9 presents an interesting exception where several
keywords are needed. We relied on a high-level architectural
description when creating the mapping for K9, where
allowed dependencies were the most clearly defined. The
keywords used reflect the sub-modules of the high-level
modules. Note that our mapping has not been validated
by system experts.</p>
        <p>[Table 1: Manually extracted keywords per system and module. Only part of the Module column survives: globals, preferences, model, logic, gui, cli, queryparser, search, index, store, analysis, util, document, business, presentation.]</p>
        <p>
          Using the generated initial sets, we ran the NBAttract
mapper with CDA information only. We ran 1530
experiments with random initial sets for the NBAttract mapper
where the mapper used filename and CDA information
(no module keywords). Finally, we use the best-reported
performance metrics from [
          <xref ref-type="bibr" rid="ref11">7</xref>
          ]. Table 2 shows the
comparison of the four approaches. Using the keywords-based
mapping, we can create an initial set with perfect
precision and recall in Commons Imaging, ProM, and Sweet
Home 3D. The keywords for these systems are
straightforward and are often directly reflected in the module
name. For the other systems, keywords can generate
an initial set with perfect precision. However, recall
suffers.
        </p>
        <p>[Table 2: Precision (P), recall (R), and F1 score of the four approaches, including Random + NBAttract; the values did not survive extraction.]</p>
        <p>Using the keywords-based initial sets and NBAttract
using CDA performs very well, with precision scores
over 0.95 in all cases and almost perfect scores for recall
(cf. Table 2).</p>
        <p>Figures 1, 2, and 3 show the running median F1 score,
precision, and recall for each system. The figures focus
on showing the running median for random initial sets
and NBAttract. This configuration seems to lack
precision in Commons Imaging and Sweet Home 3D, and the
recall suffers in Ant. The naming and dependency
information are possibly conflicting in these systems.
Table 2 shows mean values; they can vary quite a bit in the
actual cases depending on the size and composition of
the initial set.</p>
        <p>Finally, InMap lacks in precision but performs well
regarding recall. Note that InMap is a highly interactive
approach to mapping. The aim is not to automate the
mapping but rather to give good advice to a human user who
interactively maps the source code. If there
is a need to check an automatic mapping thoroughly, an
interactive approach is attractive regardless of precision.</p>
      </sec>
      <sec id="sec-3-3">
        <title>6. Discussion and Validity</title>
        <p>We are limited to systems in Java, where the file
structure often reflects the modular design of our subject
systems well. While we could handle discrepancies and
ambiguities well enough to create an initial set, this may not
be the case in a system where the file structure is entirely
different. However, we also show that these cases can
use the file information. Current mapping methods, e.g.,
NBAttract and InMap, should likely give file information
more attention.</p>
        <p>Keywords can be effectively used and provide an
excellent initial set, even a perfect mapping in some cases. It
is an attractive approach compared to manually mapping
an initial set. Hypothetically, it should be easier to
extract the keywords and specify the corresponding module
and weight of the keyword than mapping several tens or
hundreds of files manually. The main challenge in this
area is, of course, to find a high-precision and minimal
set of keywords. We used the already established ground
truth mappings to do this in this preliminary evaluation,
but this approach is not feasible in a real case. However,
analyzing the directory structure and looking for words
in the module names could provide a starting point in
many cases. Possibly using a deeper level in the directory
hierarchy or looking for repeating patterns could be
fruitful. Semantic analysis using, e.g., WordNet could be an
approach to find related words in the directory structure.
In addition, information from, e.g., method names and
identifiers could be used.</p>
        <p>It would arguably be easier to create and maintain a
small set of keywords compared to, e.g., regular
expressions, even if done entirely manually.</p>
        <p>Using a large random initial set seems to give a very
high performance of NBAttract in some cases, e.g.,
ArgoUML, Commons Imaging, Lucene, ProM, Sweet Home
3D (cf. Figure 1). This indicates that when the mapping is
established, NBAttract often performs well when only a
few new source code entities are introduced (e.g., during
software evolution). However, in some cases, the F1 score
is declining as the initial set becomes larger, e.g., JabRef,
K9, and TeamMates (cf. Figure 1). A preliminary
analysis seems to point towards overfitting, i.e., the model
becomes too specific, and as a result, the recall drops
(cf. Figure 3). It can also be an effect of randomness; the
1530 data points per system are pretty low considering
the combinatorial complexity of random initial set sizes
and compositions. However, it is sufficient to indicate
the overall performance in a preliminary study such as
this. The very high recall in ProM (cf. Figure 3) can be
explained by the fact that the ProM framework has a very
straightforward mapping, and as before, the number of
data points may be too small.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>7. Conclusions and Future Work</title>
      <p>We found that we could construct relatively simple
keywords for a majority of the 96 modules in all nine systems.
Ten modules (9.6%) required weights for keywords, and
15 (15.6%) required two or more different keywords. Our
mapper could successfully create an initial set using the
keywords, and in some cases, this resulted in a perfect
mapping.</p>
      <p>Combining the keywords-based mapping and
NBAttract using CDA provided outstanding performance with
a mean precision, recall, and F1 score of 0.98, 1.0, and 0.99,
respectively.</p>
      <p>[Figure 1: The F1 score of each approach per system. Panels: Ant, ArgoUML, Commons Imaging, JabRef, K9, Lucene, ProM, SweetHome3D, TeamMates. x-axis: Initial Set Size; y-axis: F1 Score. Legend: Random+NBAttract, Keywords, Keywords+NBAttract, InMap.]</p>
      <p>[Figure 2: The precision of each approach per system, with the same panels, x-axis, and legend.]</p>
      <p>Figure 3: The recall of each approach per system (same panels, x-axis, and legend). Random+NBAttract is shown with a running median and the running 25th to 75th
quartiles. Note that the recall starts at 0.7.</p>
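      <p>The running median and quartile band mentioned in the caption can be approximated as in the sketch below. The sliding window over experiments sorted by initial set size, the window width, and the function name are assumptions made for illustration; the smoothing actually used for the figures is not specified here.</p>
      <preformat><![CDATA[
```python
from statistics import median, quantiles


def running_quartiles(points, window=5):
    """points: (initial_set_size, score) pairs.

    Returns (size, q1, median, q3) per point, computed over a centered
    window of neighbouring experiments ordered by initial set size.
    """
    ordered = sorted(points)
    half = window // 2
    band = []
    for i, (size, _) in enumerate(ordered):
        lo = max(0, i - half)
        hi = min(len(ordered), i + half + 1)
        scores = [s for _, s in ordered[lo:hi]]
        if len(scores) >= 2:
            q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        else:
            q1 = q3 = scores[0]  # a single point has no spread
        band.append((size, q1, median(scores), q3))
    return band


# Hypothetical recall values for five random initial sets.
band = running_quartiles(
    [(0.1, 0.8), (0.2, 0.9), (0.3, 0.7), (0.4, 1.0), (0.5, 0.6)], window=3
)
```
]]></preformat>
      <p>A wider window gives a smoother curve but hides local effects, such as the drop in recall for large initial sets discussed in the text.</p>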
      <p>a mean precision, recall, and F1 score of 0.98, 1.0, and 0.99, respectively. The performance was higher than using random initial sets and NBAttract using CDA and file information, and the interactive technique InMap (see Table 2). If a mapping is already established, NBAttract with CDA and file information provides good performance in many cases; however, in some systems, the model could suffer from overfitting issues (cf. Figure 3). Using keywords is an attractive approach that can significantly reduce the mapping effort. However, a central question that remains is how to extract good candidate keywords and let a human user assign weights. In addition, a keywords-based mapping approach is likely not applicable for some systems. We plan on performing comparative studies using the mappings from [10], where the authors claim architectural modules are not bound to the file structure of the source code.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>The research was supported by the Centre for Data Intensive Sciences and Applications at Linnaeus University.</p>
    </ack>
    <ref-list>
      <title>References</title>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>De Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Balasubramaniam</surname>
          </string-name>
          ,
          <article-title>Controlling software architecture erosion: A survey</article-title>
          ,
          <source>Journal of Systems and Software</source>
          <volume>85</volume>
          (
          <year>2012</year>
          )
          <fpage>132</fpage>
          -
          <lpage>151</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Notkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          ,
          <article-title>Software reflexion models: Bridging the gap between source and high-level models</article-title>
          ,
          <source>ACM SIGSOFT Software Engineering Notes</source>
          <volume>20</volume>
          (
          <year>1995</year>
          )
          <fpage>18</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Baker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>O'Crowley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Herold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <article-title>Architecture consistency: State of the practice, challenges and requirements</article-title>
          ,
          <source>Empirical Software Engineering</source>
          <volume>23</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>35</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Knodel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <article-title>A comparison of static architecture compliance checking approaches</article-title>
          , in:
          <source>The IEEE/IFIP Working Conference on Software Architecture</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>12</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bittencourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Jansen de Souza Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D. S.</given-names>
            <surname>Guerrero</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Murphy</surname>
          </string-name>
          ,
          <article-title>Improving automated mapping in reflexion models using information retrieval techniques</article-title>
          , in:
          <source>Working Conference on Reverse Engineering</source>
          , IEEE,
          <year>2010</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Christl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Koschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <article-title>Automated clustering to support the reflexion method</article-title>
          ,
          <source>Information and Software Technology</source>
          <volume>49</volume>
          (
          <year>2007</year>
          )
          <fpage>255</fpage>
          -
          <lpage>274</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z. T.</given-names>
            <surname>Sinkala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Herold</surname>
          </string-name>
          ,
          <article-title>Inmap: Automated interactive code-to-architecture mapping recommendations</article-title>
          , in:
          <source>IEEE 18th International Conference on Software Architecture (ICSA)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>173</fpage>
          -
          <lpage>183</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T.</given-names>
            <surname>Olsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ericsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wingkvist</surname>
          </string-name>
          ,
          <article-title>Semi-automatic mapping of source code using naive bayes</article-title>
          , in:
          <source>Proceedings of the 13th European Conference on Software Architecture - Volume 2</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>209</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>V.</given-names>
            <surname>Tzerpos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. C.</given-names>
            <surname>Holt</surname>
          </string-name>
          ,
          <article-title>The orphan adoption problem in architecture maintenance</article-title>
          , in:
          <source>Working Conference on Reverse Engineering</source>
          , IEEE,
          <year>1997</year>
          , pp.
          <fpage>76</fpage>
          -
          <lpage>82</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Krka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mattmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Medvidovic</surname>
          </string-name>
          ,
          <article-title>Obtaining ground-truth software architectures</article-title>
          , in:
          <source>35th International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>901</fpage>
          -
          <lpage>910</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Buckley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>English</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rosik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Herold</surname>
          </string-name>
          ,
          <article-title>Real-time reflexion modelling in architecture reconciliation: A multi case study</article-title>
          ,
          <source>Information and Software Technology</source>
          <volume>61</volume>
          (
          <year>2015</year>
          )
          <fpage>107</fpage>
          -
          <lpage>123</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>N.</given-names>
            <surname>Anquetil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. C.</given-names>
            <surname>Lethbridge</surname>
          </string-name>
          ,
          <article-title>Recovering software architecture from the names of source files</article-title>
          ,
          <source>Journal of Software Maintenance: Research and Practice</source>
          <volume>11</volume>
          (
          <year>1999</year>
          )
          <fpage>201</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Christl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Koschke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Storey</surname>
          </string-name>
          ,
          <article-title>Equipping the reflexion method with automated clustering</article-title>
          , in:
          <source>Working Conference on Reverse Engineering</source>
          , IEEE,
          <year>2005</year>
          , pp.
          <fpage>98</fpage>
          -
          <lpage>108</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Olsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ericsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wingkvist</surname>
          </string-name>
          ,
          <article-title>An exploration and experiment tool suite for code to architecture mapping techniques</article-title>
          , in:
          <source>Proceedings of the 13th European Conference on Software Architecture - Volume 2, ECSA '19</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>26</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I.</given-names>
            <surname>Witten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <source>Data Mining, Fourth Edition: Practical Machine Learning Tools and Techniques</source>
          , 4th ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T.</given-names>
            <surname>Olsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ericsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wingkvist</surname>
          </string-name>
          ,
          <article-title>s4rdm3x: A tool suite to explore code to architecture mapping techniques</article-title>
          ,
          <source>Journal of Open Source Software</source>
          <volume>6</volume>
          (
          <year>2021</year>
          )
          <elocation-id>2791</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.21105/joss.02791</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J.</given-names>
            <surname>Brunet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. A.</given-names>
            <surname>Bittencourt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Serey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Figueiredo</surname>
          </string-name>
          ,
          <article-title>On the evolutionary nature of architectural violations</article-title>
          , in:
          <source>Working Conference on Reverse Engineering</source>
          , IEEE,
          <year>2012</year>
          , pp.
          <fpage>257</fpage>
          -
          <lpage>266</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lenhard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blom</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Herold</surname>
          </string-name>
          ,
          <article-title>Exploring the suitability of source code metrics for indicating architectural inconsistencies</article-title>
          ,
          <source>Software Quality Journal</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nurwidyantoro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ho-Quang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R. V.</given-names>
            <surname>Chaudron</surname>
          </string-name>
          ,
          <article-title>Automated classification of class role-stereotypes via machine learning</article-title>
          , in:
          <source>Proceedings of the Evaluation and Assessment on Software Engineering</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>88</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>