<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Workshop on Software Quality Analysis, Monitoring, Improvement, and Applications, September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Matrix Based Approach to Structural and Semantic Analysis Supporting Software Product Line Evolution</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jakub Perdek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valentino Vranić</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Informatics, Information Systems and Software Engineering, Faculty of Informatics and Information Technologies, Slovak University of Technology in Bratislava</institution>
          ,
          <addr-line>Ilkovičova 2, 842 16 Bratislava 4</addr-line>
          ,
          <country country="SK">Slovakia</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>13</lpage>
      <abstract>
        <p>The evolution of product families is complex due to the exponential growth of new products with a modified codebase. This growth is caused by the introduction of new important or restrictive features that allow an existing solution to be transformed into a new product serving as the basis for another one. Specifically, both the structure of the resulting components and semantic information need to be taken into account to provide in-depth supporting materials by capturing feature interactions and their capabilities. Additionally, data from heterogeneous applications usually differ, and their handling needs to account for different scoring. We tackled these problems by creating various supporting views based on structural and semantic information. Related applications are modeled as graphs, instances of their components are optionally included, and various matrix based algorithms based on similarity metrics are integrated. The final integrated and automated approach is efficient because it runs in polynomial time, is extensively focused on dependencies/connections between nodes, can be adapted to big data, and is extendable and can be enhanced to support different metrics. Its capability to organize analyzed parts into hierarchies by applying hierarchical clustering helps to specify the context to support comprehension or to find a related position of features based on their interaction, especially their coupling. With regard to possible design issues, each updated product should be checked by an expert, or more autonomously against performed predictions, especially its complexity and coupling between components. Additionally, extendability can be measured from the chosen sequence of such updates. An approach including an automated matrix based feature recognition process is presented in the analysis of modular Angular applications. Our future work will be more focused on semantic enhancements and their applications to big data by generating and analyzing various types of fractals, including the knowledge extracted from them.</p>
      </abstract>
      <kwd-group>
        <kwd>feature modeling</kwd>
        <kwd>feature trees</kwd>
        <kwd>matrix clustering</kwd>
        <kwd>hierarchical clustering</kwd>
        <kwd>graph comparison</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Software evolution, especially the evolution of software product lines, requires supporting
decision-making with domain information about features, components, and related software artifacts.
When the number of resulting products rises exponentially, the process becomes even more complicated.
We tackle both problems by proposing a method capable of generating various supporting views
according to chosen metrics based on structural and semantic information. For this purpose,
we integrated various matrix based algorithms which are based on similarity measures.</p>
      <p>We addressed scalability issues by selecting fast algorithms which are capable of processing
graphs given by their adjacency matrices. In addition, issues related to restricted matrix size are
handled by selecting only a subset of all candidates. Available approaches are mostly focused
on semantics but do not directly handle structural information given by
dependencies/connections between nodes. In our integrated approach, this structural information can be balanced with the semantic
one to focus on different software aspects that can support decision-making. Consequently, the approach is
driven by calculated similarity metrics whose selection depends on the chosen configuration.</p>
      <p>The paper is focused on presenting the important aspects and applications of the used
algorithms, followed by their integration. Additionally, we comment on some of their parts
and on decisions made during their integration. Finally, we present our method applied to analyze
two Angular applications, namely Puzzle To Play and Design 3D. Graphs were created from both of them.
The extracted modular software fragments are mostly services or components. Connections
are created based on observed coupling given by imports and occurrences in HTML templates.
We made the proposed solution publicly available on GitHub (https://github.com/jperdek/matrixBasedMethodsForGraphs). Most of the functionality is
verified with unit tests, and the results agree with those from the original papers.</p>
      <p>The paper is organized as follows: Section 2 provides a detailed analysis of semantic methods
for feature extraction and hierarchical mapping with a focus on graph merging and clustering
driven by semantics. The way various matrix based methods are integrated to support
decision-making about software product line evolution is explained in Section 3. The solution
is discussed in Section 4. A comparison with other available semantic-based methods is given in
Section 5. Finally, conclusions and future work are presented in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Semantic Matrix Based Analysis of Features</title>
      <p>
        Similarity can be measured by many metrics. The main text-based approaches are shingles [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
and the cosine index. Categorical data are measured by the Jaccard index, which compares two distinct
sets with evenly distributed data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. A few other metrics used to measure local structural
information are the Hub Promoted index, Hub Depressed index, and Leicht-Holme-Newman
index [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For problems that do not depend on, and are thus unbiased by, the number of overlapping items,
but only on how many items of the smaller set are covered by the bigger one, it is beneficial to
use the Inclusion index [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. This index should be preferred over the cosine or the Jaccard index [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Another semantic approach focused on document categorization is Latent Semantic Indexing
(LSI), where the similarity between two documents is measured using cosine similarity or
the inner product on the vector representation of documents [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
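      <p>As a minimal illustration (not taken from the paper), the set-based indices surveyed above can be sketched in Python; the feature sets are hypothetical examples:</p>
      <preformat>
```python
# Illustrative sketch of three of the surveyed set-based similarity indices.
import math

def jaccard(a, b):
    # |A intersect B| / |A union B|
    return len(a.intersection(b)) / len(a.union(b))

def cosine_index(a, b):
    # |A intersect B| / sqrt(|A| * |B|) for sets (binary vectors)
    return len(a.intersection(b)) / math.sqrt(len(a) * len(b))

def inclusion(a, b):
    # |A intersect B| / min(|A|, |B|): how much of the smaller set is covered
    return len(a.intersection(b)) / min(len(a), len(b))

features_a = {"render", "rotate", "export"}
features_b = {"render", "rotate", "export", "import", "share"}
print(jaccard(features_a, features_b))    # 3/5 = 0.6
print(inclusion(features_a, features_b))  # 3/3 = 1.0
```
      </preformat>
      <p>Note how the Inclusion index ignores the items of the larger set that are not covered, which is exactly the property recommended above.</p>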
      <p>
        The farthest point algorithm (the complete linkage agglomerative clustering scheme [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]) was used as the base
metric to evaluate the distance between clusters and, finally, during the creation of the dendrogram
in hierarchical clustering to obtain communities of manufacturing processes [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
d(C_i, C_j) = max { d(x_i, x_j) : x_i in C_i, x_j in C_j }
      </p>
      <p>Also, the authors combined similarity calculations by summing the weighted squared feature similarities and taking the square root, where each feature similarity s_f is squared and
multiplied by the given feature weight w_f:
S(a, b) = sqrt( sum_f w_f * (s_f)^2 )</p>
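      <p>A minimal sketch of these two formulas (names such as point_dist are hypothetical): complete-linkage cluster distance and the square-root combination of weighted feature similarities:</p>
      <preformat>
```python
# Sketch of complete-linkage distance and the weighted combined similarity.
import math

def complete_linkage(cluster_i, cluster_j, dist):
    # d(Ci, Cj) = max of pairwise distances between members of the two clusters
    return max(dist(x, y) for x in cluster_i for y in cluster_j)

def combined_similarity(weights, sims):
    # S = sqrt( sum_f w_f * (s_f)^2 )
    return math.sqrt(sum(w * s * s for w, s in zip(weights, sims)))

point_dist = lambda x, y: abs(x - y)
print(complete_linkage([1.0, 2.0], [5.0, 7.0], point_dist))  # 6.0
print(combined_similarity([0.5, 0.5], [0.8, 0.6]))
```
      </preformat>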
      <p>
        Metrics measure similarity over different representations, such as sets or vectors, and thus it is
necessary to also choose the semantic attributes which should be processed. In the semantic-based
extraction approach [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the authors extract/tag information from source code using encoding rules
suitable for mapping into UML form. The return type, signatures, parameters, and lines of code
are some of the semantic relations which can be measured and transformed into similarity
scores. To extract appropriate information, metrics such as the Jaccard coefficient and
cosine similarity form the similarity model in the methods analyzed next.
      </p>
      <sec id="sec-2-1">
        <title>2.1. Matrix Based Hierarchical Graph Clustering</title>
        <p>
          Connections in the graph carry semantic information under the assumption that each reference
from one source file to another creates a relation between these documents [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Document
weight can be evaluated accordingly by using non-negative input and output weights [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. The
maximal value of these two weights, appropriately normalized, is chosen to determine whether the
document is a better hub or authority [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Adding one to this value ensures that the
overall weight does not decrease during multiplication. The next step consists of finding the correlation
between pages by multiplying each correlation weight on the path with a chosen correlation factor F
(0 &lt; F &lt; 1, usually 0.5) raised to the length of the path, to penalize longer paths:
corr(i, j) = cw(i, k_1) * cw(k_1, k_2) * ... * cw(k_m, j) * F^dist(i, j)
where each correlation weight cw is the maximum of the weights of the two documents, or 1 if the documents
match [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This part can be implemented using the Floyd-Warshall algorithm [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] applied to the
adjacency matrix, where the maximum edge weight in this future shortest path matrix is
assigned where there is no connection (zeros) and self connections do not exist. The correlation weight
matrix is assigned according to the evaluated input and output weights and is updated together with
the distance matrix in each update of the shortest path matrix. If the nodes are
the same, the resulting correlation is one [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
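        <p>A hedged sketch of the path-correlation idea (the correlation weight matrix cw is assumed to be already computed from document in/out weights): each direct correlation carries one factor of F, so multiplying correlations along a path automatically penalizes it by F raised to its length, and a Floyd-Warshall-style update keeps the best correlation per pair:</p>
        <preformat>
```python
# Floyd-Warshall-style propagation of path correlations with penalty factor F.

def path_correlations(cw, F=0.5):
    n = len(cw)
    # direct links: cw[i][j] * F for edges, 1.0 on the diagonal
    corr = [[cw[i][j] * F if cw[i][j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]
    for i in range(n):
        corr[i][i] = 1.0
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = corr[i][k] * corr[k][j]
                # keep the strongest correlation found so far
                if via_k > corr[i][j] and i != j:
                    corr[i][j] = via_k
    return corr

cw = [[0, 2.0, 0], [0, 0, 2.0], [0, 0, 0]]
corr = path_correlations(cw)
print(corr[0][2])  # via node 1: (2.0*0.5) * (2.0*0.5) = 1.0
```
        </preformat>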
        <p>
          The next important part is the creation of a similarity metric that takes into account the input
and output contributions of links [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The in-link and out-link contributions of a document pair are weighted by the relative
numbers of in-links and out-links:
w_in(i, j) = (|IN_i| + |IN_j|) / (|IN_i| + |IN_j| + |OUT_i| + |OUT_j|)
w_out(i, j) = (|OUT_i| + |OUT_j|) / (|IN_i| + |IN_j| + |OUT_i| + |OUT_j|)
Cosine similarity is used here to measure the similarity between each type of links [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]:
sim_in(i, j) = (IN_i . IN_j) / (|IN_i| * |IN_j|)
sim_out(i, j) = (OUT_i . OUT_j) / (|OUT_i| * |OUT_j|)
        </p>
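        <p>A minimal sketch of the cosine part: in-link vectors are columns and out-link vectors are rows of a small example adjacency matrix:</p>
        <preformat>
```python
# Cosine similarity of in-link and out-link vectors from an adjacency matrix.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

adj = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 0],
]
cols = list(zip(*adj))  # column i = in-links of document i
print(cosine(adj[0], adj[1]))    # out-link similarity of docs 0 and 1
print(cosine(cols[1], cols[2]))  # in-link similarity of docs 1 and 2
```
        </preformat>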
        <p>Similarity for the same nodes in the resulting similarity matrix is 1; otherwise it is the value of
sim(i, j).</p>
        <p>
          Finally, similar documents should be put together and clustered by an algorithm
based on matrix partitioning. How closely related attributes are (their togetherness) is measured by
an affinity metric [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. In the Bond Energy Algorithm (BEA), columns of the similarity matrix are
permuted to maximize the global affinity measure defined as [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]:
AM = sum_i sum_j a(i, j) * (a(i, j-1) + a(i, j+1))
        </p>
        <p>
          By this operation, elements with greater values are put closer to each other [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Rows are then
permuted accordingly. First, the algorithm initializes the new matrix with the
first two columns of the similarity matrix. Then, in each iteration, a new column is inserted into the
new matrix at the place where the cont value is maximized. This value is evaluated for each
adjacent pair of columns by temporarily placing the new column before, between, or behind them,
and is given as [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]:
cont(A_i, A_k, A_j) = 2 * bond(A_i, A_k) + 2 * bond(A_k, A_j) - 2 * bond(A_i, A_j)
where the bond (affinity) between two columns is given as [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]:
bond(A_x, A_y) = sum_z a(z, x) * a(z, y)
        </p>
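        <p>The bond and cont measures above can be sketched directly from these formulas (column indices are 0-based in this hypothetical example):</p>
        <preformat>
```python
# Sketch of BEA's bond and cont measures over a similarity (affinity) matrix.

def bond(aff, x, y):
    # bond(Ax, Ay) = sum_z aff[z][x] * aff[z][y]
    return sum(row[x] * row[y] for row in aff)

def cont(aff, i, k, j):
    # contribution of placing column k between columns i and j
    return 2 * bond(aff, i, k) + 2 * bond(aff, k, j) - 2 * bond(aff, i, j)

aff = [
    [1.0, 0.2, 0.8],
    [0.2, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
print(bond(aff, 0, 2))     # 1.0*0.8 + 0.2*0.1 + 0.8*1.0 = 1.62
print(cont(aff, 0, 2, 1))  # gain of placing column 2 between columns 0 and 1
```
        </preformat>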
        <p>
          The global affinity matrix is iteratively created by inserting all columns into a new global
affinity matrix, where the bond (affinity) between a column placed before the first column, or
after the last column, is 0 [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. The new column is inserted at the position with the maximum value of cont.
The rows are then permuted in the same order. The similarity
matrix can then be recreated by only swapping values according to the permutations that
were required to form the global affinity matrix. Swapping is applied first to columns and then
to rows accordingly. The permuted similarity matrix should then be partitioned into submatrices
by choosing a dividing point X positioned on the diagonal. The four resulting submatrices can be evaluated
using the following metric, the sum of the affinity values in each block [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]:
S_{p,q} = sum over block (p, q) of a(i, j)
and for each dividing point, the following function is maximized [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]:
z = S_{1,1} * S_{2,2} - S_{1,2} * S_{2,1}
        </p>
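        <p>A sketch of this partitioning step, assuming an already permuted similarity matrix with two visible groups; function and variable names are hypothetical:</p>
        <preformat>
```python
# A dividing point x on the diagonal splits the permuted similarity matrix into
# four blocks; z = S11*S22 - S12*S21 is maximized over all dividing points.

def block_sum(m, r0, r1, c0, c1):
    # sum of m[r][c] for r0 up to r1 (exclusive), c0 up to c1 (exclusive)
    return sum(m[r][c] for r in range(r0, r1) for c in range(c0, c1))

def best_dividing_point(m):
    n = len(m)
    best_x, best_z = 1, None
    for x in range(1, n):
        s11 = block_sum(m, 0, x, 0, x)
        s22 = block_sum(m, x, n, x, n)
        s12 = block_sum(m, 0, x, x, n)
        s21 = block_sum(m, x, n, 0, x)
        z = s11 * s22 - s12 * s21
        if best_z is None or z > best_z:
            best_x, best_z = x, z
    return best_x, best_z

m = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
print(best_dividing_point(m))  # splits between the two obvious groups: x = 2
```
        </preformat>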
        <p>
          The dividing point X divides the matrix into four submatrices, and the algorithm recursively
clusters the upper left matrix (S_{1,1}) and then the bottom right one (S_{2,2}) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The approach
has also been extended to cluster the same document into multiple clusters in the work of Hou
et al. [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. The whole clustering process is depicted in Figure 1.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Matrix Based Comparison of Graphs</title>
        <p>
          Similarity can also be measured between graphs and thus be applied in many fields. Feature
extraction, graph isomorphism, and iterative methods are the main categories of graph
similarity measurement [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. In the feature analysis of many applications, the main task is to find node
correspondence. Information about connections can be used to characterize the most important
nodes by determining whether such a node is a good authority or hub. HITS [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is a well-known iterative
algorithm used to evaluate the importance of web pages. Authority and hub scores thus
can be used as similarity scores of graph vertices (for a given vertex j and vertex authority/hub)
and further generalized to different graphs [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. This can be used to match given graphs in an
iterative scheme S = B * S * A^T + B^T * S * A (normalized in each step),
where A and B are the adjacency matrices. The similarity matrix obtained from an even iteration [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] can
be further used to find a maximum weight matching between the vertices of the two graphs in graph
matching approaches. The process of matching the vertices based on the adjacency matrix is
depicted in Figure 2. Many authors [
          <xref ref-type="bibr" rid="ref16 ref18 ref19">18, 19, 16</xref>
          ] used the Hungarian method [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] to solve similar problems, mainly for its
asymptotic polynomial upper bound O(n^3) [
          <xref ref-type="bibr" rid="ref21 ref22 ref23">21, 22, 23</xref>
          ].
        </p>
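        <p>A hedged sketch of this node-similarity iteration (A^T denotes the transpose of A; the helper names are hypothetical), using two small identical path graphs where the diagonal of the resulting matrix should dominate:</p>
        <preformat>
```python
# Iterative node-similarity scheme: S = B*S*A^T + B^T*S*A, Frobenius-normalized.
import math

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def blondel_similarity(A, B, iterations=10):
    nA, nB = len(A), len(B)
    S = [[1.0] * nA for _ in range(nB)]  # S is nB x nA, initialized to ones
    At, Bt = transpose(A), transpose(B)
    for _ in range(iterations):
        S = madd(matmul(matmul(B, S), At), matmul(matmul(Bt, S), A))
        norm = math.sqrt(sum(v * v for row in S for v in row))
        S = [[v / norm for v in row] for row in S]
    return S

A = [[0, 1], [0, 0]]  # two-node path graph
B = [[0, 1], [0, 0]]
S = blondel_similarity(A, B)
print(S)  # diagonal entries dominate: node 0 matches node 0, node 1 node 1
```
        </preformat>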
        <p>
          According to Zager and Verghese [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], this iterative process can be extended to edges as
well. First, each adjacency matrix needs to be transformed into a source-edge matrix and a
terminus-edge matrix by marking the source node of each edge in the first one (A_S and B_S)
and the terminus node of each edge in the second one (A_T and B_T) by setting ones in the given matrices [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
In comparison with the first method, oriented only to nodes, the updates of the edge similarity
matrix Y and the node similarity matrix X are intertwined:
Y_k = B_S^T * X_{k-1} * A_S + B_T^T * X_{k-1} * A_T
X_k = B_S * Y_{k-1} * A_S^T + B_T * Y_{k-1} * A_T^T
        </p>
        <p>After each phase, the matrices are normalized by the Frobenius norm. The steps of the whole
process applied to graph matching are depicted in Figure 3.</p>
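        <p>A hedged sketch of this coupled node/edge update on two identical three-node path graphs (helper names are hypothetical; edges are given as (source, terminus) pairs):</p>
        <preformat>
```python
# Coupled node/edge similarity scores via source-edge and terminus-edge matrices.
import math

def transpose(m):
    return [list(r) for r in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def fnorm(m):
    s = math.sqrt(sum(v * v for row in m for v in row))
    return [[v / s for v in row] for row in m]

def edge_matrices(edges, n):
    # a 1 marks the source (src) or terminus (ter) node of each edge
    src = [[0] * len(edges) for _ in range(n)]
    ter = [[0] * len(edges) for _ in range(n)]
    for e, (u, v) in enumerate(edges):
        src[u][e] = 1
        ter[v][e] = 1
    return src, ter

def node_edge_scores(edges_a, na, edges_b, nb, iterations=10):
    AS, AT = edge_matrices(edges_a, na)
    BS, BT = edge_matrices(edges_b, nb)
    X = [[1.0] * na for _ in range(nb)]                      # node scores
    Y = [[1.0] * len(edges_a) for _ in range(len(edges_b))]  # edge scores
    for _ in range(iterations):
        Yn = fnorm(madd(matmul(matmul(transpose(BS), X), AS),
                        matmul(matmul(transpose(BT), X), AT)))
        Xn = fnorm(madd(matmul(matmul(BS, Y), transpose(AS)),
                        matmul(matmul(BT, Y), transpose(AT))))
        X, Y = Xn, Yn
    return X, Y

X, Y = node_edge_scores([(0, 1), (1, 2)], 3, [(0, 1), (1, 2)], 3)
print(X[1][1] > X[0][1])  # the middle nodes of the two paths match best
```
        </preformat>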
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Integrating Matrix Based Methods for Structural and Semantic Analysis</title>
      <p>Product variants from the same product family share similar code fragments. Their aggregation
according to their interconnections and semantic content can potentially express domain
knowledge which can be organized into variability or other models. The scores used for connections
between nodes indicate the relation of a given software part to another, but mainly its coupling
and importance. Consequently, it is important to take the context, which is given by semantics, into
account. Semantic similarity scores must ensure a balance of semantically related software
fragments in each group.</p>
      <p>The identification of features from the given source code should help analyze and compose
different applications, and even their parts, on different hierarchy levels. Appropriate metrics
for different types of documents need to be selected to capture their semantic information and
relatedness. The most used metrics are various similarity scores, which are used in iterative
algorithms. Web components have not only methods and classes but also template code. Modules
in some environments, such as Angular, require different strategies to extract information and
represent their structure as a graph. All dependencies are represented as connections between
nodes and can be weighted. The remaining part is focused on the integration of matrix based
algorithms and the decisions applied during this process.</p>
      <p>First, the application code is parsed to extract and collect imports, links, and, in
the case of web component templates, also elements that reference other ones. For future
analysis, relevant semantic information is collected as well, such as method and class names,
variables, parameters, comments, and the attributes and classes of markup language elements. Second,
the graph is constructed according to the potential references between components and optionally
filled with semantic attributes. In a graph database, the given elements can be analyzed directly, but
mainly it allows transformation from the selected part of the graph to the adjacency matrix.</p>
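      <p>These two steps can be sketched as follows; the file names, import statements, and the regular expression are illustrative assumptions, not the paper's actual parser:</p>
      <preformat>
```python
# Sketch: collect TypeScript-style import targets and build a labeled
# adjacency matrix from them.
import re

def extract_imports(source):
    # matches imports of the form: import { X } from 'path'
    return re.findall(r"from\s+['\"]([^'\"]+)['\"]", source)

files = {
    "app.component": "import { PuzzleService } from 'puzzle.service';",
    "puzzle.service": "import { HttpClient } from 'http';",
    "http": "",
}
labels = sorted(files)
index = {name: i for i, name in enumerate(labels)}
adj = [[0] * len(labels) for _ in labels]
for name, source in files.items():
    for target in extract_imports(source):
        if target in index:
            adj[index[name]][index[target]] = 1
print(labels)
print(adj)
```
      </preformat>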
      <p>
        Many related applications from the same domain are required for a deeper analysis of features
and the further creation of a variability model such as a feature tree. Adjacency matrices from the graph
representation allow the grouping of similar nodes according to the relations between
software parts in each similar application modeled as a graph. In marginal cases, this grouping
can be strengthened by merging nodes into a new one. This provides a way to analyze,
process, and cluster unconnected graphs together. The methods for graph matching from
Section 2.2 are adapted here. The simplest way is to use the algorithm by Blondel et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], which
evaluates similarity among nodes/documents only. From it, the application matrix
with similarities between each pair of nodes/documents is produced. The higher the value, the
higher the similarity of the nodes. These values can be used further to decide whether nodes should be
only grouped under an introduced new node or merged into one. For analysis purposes,
only the intersection of nodes, represented as the mapping between them, can be used to
reinforce the existing application structure by grouping more related information. The second,
node-edge graph matching algorithm [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which performs the update of both the node and
edge similarity matrices in each iteration, is more flexible due to the additional edge matrix.
The similarity between each pair of edges can be directly used in the clustering algorithm at the
time when the document weight is evaluated according to input and then output connections. Also,
in the process of grouping or merging nodes, this information is useful to not lose the edge weight
determining the similarity of the connection between nodes (if this value is low, the probability
that the nodes are semantically related in the given context is low).
      </p>
      <p>
        The next important step is to extract the given features from the prepared sources. Connections
among files are represented as an adjacency matrix with associated labels. To semantically
enhance clustering, additional information about the indexed nodes is collected. From this matrix,
the page weight and correlation matrix are created according to Section 2.1, which is based on Hou
et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. If a similarity matrix of edges is available, then these values are useful (depending on the
configuration) while evaluating the document/node weight. In such cases, mainly those which
analyze the graph intersection, the weights are updated according to the following additional
change. When counting the number of edges, the adjacency matrix value referring to the
given connection of nodes is multiplied by the appropriate edge similarity value increased by
one (to make the weight at least one). Then it is possible to propagate the node similarity
information from graph matching to the clustering process and thus make connections
that appear in many applications stronger (with a higher edge weight value) and match among
each other as much as possible, unlike those connected in only one or a few applications or those
partially matched according to their lower edge similarity score. The results of clustering, during
which the additive value (for each node similarity score) of connections is enhanced by this
scoring, are evaluated more precisely to support the compression of use cases and final
feature detection. This is important to partially capture domain knowledge and what the system
is.
      </p>
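      <p>The edge-weight enhancement described above can be sketched in a few lines (a minimal illustration, not the paper's implementation): each adjacency entry is multiplied by one plus the edge similarity, so connections matched across many applications become stronger while no weight drops under its base value:</p>
      <preformat>
```python
# Scale adjacency weights by (1 + edge similarity) from graph matching.

def enhance_edges(adj, edge_sim):
    n = len(adj)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            out[i][j] = adj[i][j] * (1.0 + edge_sim[i][j])
    return out

adj = [[0, 1], [1, 0]]
edge_sim = [[0.0, 0.9], [0.2, 0.0]]
print(enhance_edges(adj, edge_sim))  # existing edges scaled by 1 + similarity
```
      </preformat>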
      <p>
        In the following step, the similarity matrix used for clustering purposes is created. The
connections of documents are evaluated using the cosine similarity metric according to Hou et al.
[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and multiplied by a special coefficient given for each node pair as a ratio, where the value of
the rows (columns) is divided by the whole value. If we created additional matrices for each similarity
metric and clustered them in the case of equal values in this base similarity matrix, then our model
would not be easily scalable. To prevent this issue, we extended the previous similarity equation
by adding other semantic metrics multiplied by given coefficients whose values
should be found during model tuning. The added semantic metrics are
evaluated similarly. The final graph structure and the results from clustering support further
decision-making, which is based on the structural and semantic similarity of the analyzed graphs. The process
flow of our integrated approach is displayed in Figure 4.
      </p>
    </sec>
    <sec id="sec-5">
      <title>4. Discussion</title>
      <p>We applied the integrated functionality to two Angular applications that significantly differ from
each other but are tied together by the same general domain. Thus, they are characterized by
an overlap of the same components, operations, and problems. Our approach enables merging
these two applications and then clustering their source files. It allows us to analyze visualized results for
likely feature names or similarly coupled nodes. We decided to use the Neo4j graph database
due to the possibility of performing fast evaluation and visualization of the aggregated results. It
allows us to carry out further analysis on the inserted data. The same process has been applied to
the Design 3D application. Finally, a hierarchy was created from the given clusters.</p>
      <p>The results show that semantic metrics are necessary; otherwise, similarly coupled nodes will be
clustered together in the clustering phase. Two tested applications are not enough to test the full
strength of the approach and tune the algorithm parameters. For this purpose, we propose another, fractal-based
software product line capable of producing an extensive number of fractal products. Additionally,
the evolution process will be automated and driven by the knowledge handled by this integrated
method. Information and component instances from each application will be merged, clustered,
and then used as input for classification purposes, especially for graph neural network (GNN) or
naive Bayes (NB) models.</p>
    </sec>
    <sec id="sec-6">
      <title>5. Related Work</title>
      <p>
        The approach most similar to ours is proposed in the AMPLE project [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Features and their
dependencies are mined from documentation by using latent semantic analysis, and feature
models are also created by hierarchical clustering [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]. The significant difference from our
approach is its restriction to the given semantics related to requirements. Our approach evaluates
connections and the node importance given by them and is additionally enhanced by semantic
information from different metrics. Also, merging graphs and adding or removing nodes is done
according to the resulting scores, which extends the analytic possibilities that can even be automated.
Our integrated approach can process big data and be reused for general problems.
      </p>
    </sec>
    <sec id="sec-7">
      <title>6. Conclusions and Future Work</title>
      <p>The exponential growth of information from related and mostly heterogeneous products makes
decision-making about software product line evolution more complex. Specifically, both the
structure of the resulting components and semantic information need to be taken into account to
provide in-depth supporting materials by capturing feature interactions and their capabilities.
Consequently, this process needs to handle big data and incorporate fast techniques into the
resulting solution. We address these problems by creating various semantic and structural views.
These views consist of related knowledge that is organized into hierarchies or groups. The
process is based on various matrix based algorithms with hierarchical clustering that all process
graph data represented as adjacency matrices. Thanks to the used algorithms and the absence of an introduced
bottleneck, it runs in polynomial time, which makes it suitable for processing big data,
but due to restrictions on matrix size (very large matrices cannot be processed at once), further
adaptations are required. Apart from the integrated algorithms designed for graph matching, other
ones, especially algorithms used to solve the graph isomorphism problem, are NP-hard or are not
designed for massive amounts of data because of the lack of directed selections from a large number
of nodes. The integrated algorithms extensively use similarity metrics that are based on input
and output connections. These can be easily replaced or adapted to other metrics by slightly
modifying the algorithms or just calculating single similarity values which will be used across all of
them. This makes the integrated solution extendable. We enhanced it by calculating the Jaccard
index from class-based categorical semantic variables such as class names or attributes from
HTML templates. Still, for such a supporting similarity-based tool, verification is required.</p>
      <p>The resulting information can be merged or grouped on different abstraction levels, such as
merging whole graphs by merging or grouping the most similar nodes and optionally connecting
(or disconnecting) the remaining unmatched nodes. Nodes are also grouped in hierarchical structures,
and new, more significant connections can be introduced, even to optionally created
representative nodes for each group, called medoids. Visualized results (mostly in a graph
database) help to discover the related context, which is necessary for making predictions and
recognizing features. The overall process is based on similarity measures, and thus the results can
differ for various settings and calculated metrics. This opens space for further analysis
and possibilities for directing software product line evolution in an automated way. The created
variability models and associated knowledge support decision-making about software product
line evolution. The approach is presented in the analysis of modular Angular applications.</p>
      <p>In future work, we will focus more on semantic enhancements and their applications to
big data by generating and analyzing various types of fractals, including the knowledge extracted
from them. A possible future improvement is to calculate different metrics from software systems,
such as complexity scores, or to use various models to calculate semantic similarity. Integration
into automated software product line evolution based on fractals will provide more information
on which similarity measures should be used and in which way they are the most beneficial to
implement. This can be fulfilled by checking products against the best ones or using thresholds
for measured metrics, and then omitting poor ones. The large space of products generated according
to the structural/semantic configuration, and the possibilities to insert given features, needs to be handled.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The work reported here received funding from the Operational Programme Integrated
Infrastructure for the project: Support of Research Activities of Excellence Laboratories STU in
Bratislava (ITMS code: 313021BXZ1), co-funded by the European Regional Development Fund
(ERDF).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Biegel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q. D.</given-names>
            <surname>Soetens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hornig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Diehl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Demeyer</surname>
          </string-name>
          ,
          <article-title>Comparison of similarity metrics for refactoring detection</article-title>
          ,
          <source>in: Proceedings of the 8th Working Conference on Mining Software Repositories (MSR '11)</source>
          , ACM Press, Waikiki, Honolulu, HI, USA,
          <year>2011</year>
          , p.
          <fpage>53</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Z.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <article-title>Developing a capability-based similarity metric for manufacturing processes</article-title>
          ,
          <source>in: Volume 3: Manufacturing Equipment and Systems</source>
          , American Society of Mechanical Engineers, Los Angeles, California, USA,
          <year>2017</year>
          , p.
          <fpage>V003T04A015</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Predicting missing links via local information</article-title>
          ,
          <source>The European Physical Journal B</source>
          <volume>71</volume>
          (
          <year>2009</year>
          )
          <fpage>623</fpage>
          -
          <lpage>630</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sternitzke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Bergmann</surname>
          </string-name>
          ,
          <article-title>Similarity measures for document mapping: A comparative study on the level of an individual scientist</article-title>
          ,
          <source>Scientometrics</source>
          <volume>78</volume>
          (
          <year>2009</year>
          )
          <fpage>113</fpage>
          -
          <lpage>130</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ducasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Gîrba</surname>
          </string-name>
          ,
          <article-title>Semantic clustering: Identifying topics in source code</article-title>
          ,
          <source>Information and Software Technology</source>
          <volume>49</volume>
          (
          <year>2007</year>
          )
          <fpage>230</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Müllner</surname>
          </string-name>
          ,
          <article-title>Modern hierarchical, agglomerative clustering algorithms</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kadar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Syed-Mohamad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. Abdul</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <article-title>Semantic-based extraction approach for generating source code summary towards program comprehension</article-title>
          ,
          <source>in: 2015 9th Malaysian Software Engineering Conference (MySEC)</source>
          , IEEE, Kuala Lumpur, Malaysia,
          <year>2015</year>
          , pp.
          <fpage>129</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bharat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. R.</given-names>
            <surname>Henzinger</surname>
          </string-name>
          ,
          <article-title>Improved algorithms for topic distillation in a hyperlinked environment</article-title>
          ,
          <source>in: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval</source>
          , SIGIR '98, Association for Computing Machinery, New York, NY, USA,
          <year>1998</year>
          , pp.
          <fpage>104</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          ,
          <article-title>Authoritative sources in a hyperlinked environment</article-title>
          ,
          <source>J. ACM</source>
          <volume>46</volume>
          (
          <year>1999</year>
          )
          <fpage>604</fpage>
          -
          <lpage>632</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , J. Cao,
          <article-title>Web page clustering: A hyperlink-based similarity and matrix-based hierarchical algorithms</article-title>
          ,
          <source>in: Web Technologies and Applications</source>
          , volume
          <volume>2642</volume>
          , Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2003</year>
          , pp.
          <fpage>201</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Floyd</surname>
          </string-name>
          ,
          <article-title>Algorithm 97: Shortest path</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>5</volume>
          (
          <year>1962</year>
          )
          <fpage>345</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Özsu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Valduriez</surname>
          </string-name>
          ,
          <source>Principles of Distributed Database Systems</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Crouch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Crouch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Andreas</surname>
          </string-name>
          ,
          <article-title>The use of cluster hierarchies in hypertext information retrieval</article-title>
          ,
          <source>in: Proceedings of the second annual ACM conference on Hypertext - HYPERTEXT '89</source>
          , ACM Press, Pittsburgh, Pennsylvania, United States,
          <year>1989</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Jitian</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yanchun</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Xiaohua</given-names>
            <surname>Jia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tianzhu</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>Measuring similarity of interests for clustering web-users</article-title>
          ,
          <source>in: Proceedings 12th Australasian Database Conference (ADC 2001)</source>
          , IEEE Comput. Soc, Gold Coast, Qld., Australia,
          <year>2001</year>
          , pp.
          <fpage>107</fpage>
          -
          <lpage>114</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gajardo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Heymans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Senellart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Van Dooren</surname>
          </string-name>
          ,
          <article-title>A measure of similarity between graph vertices: Applications to synonym extraction and web searching</article-title>
          ,
          <source>SIAM Review</source>
          <volume>46</volume>
          (
          <year>2004</year>
          )
          <fpage>647</fpage>
          -
          <lpage>666</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>L. A.</given-names>
            <surname>Zager</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. C.</given-names>
            <surname>Verghese</surname>
          </string-name>
          ,
          <article-title>Graph similarity scoring and matching</article-title>
          ,
          <source>Applied Mathematics Letters</source>
          <volume>21</volume>
          (
          <year>2008</year>
          )
          <fpage>86</fpage>
          -
          <lpage>94</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Koutra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parikh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ramdas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xiang</surname>
          </string-name>
          ,
          <article-title>Algorithms for graph similarity and subgraph matching</article-title>
          (
          <year>2011</year>
          )
          <fpage>50</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>S.</given-names>
            <surname>Umeyama</surname>
          </string-name>
          ,
          <article-title>An eigendecomposition approach to weighted graph matching problems</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>10</volume>
          (
          <year>1988</year>
          )
          <fpage>695</fpage>
          -
          <lpage>703</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>H.</given-names>
            <surname>Almohamad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Duffuaa</surname>
          </string-name>
          ,
          <article-title>A linear programming approach for the weighted graph matching problem</article-title>
          ,
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>15</volume>
          (
          <year>1993</year>
          )
          <fpage>522</fpage>
          -
          <lpage>525</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>H. W.</given-names>
            <surname>Kuhn</surname>
          </string-name>
          ,
          <article-title>The Hungarian method for the assignment problem</article-title>
          ,
          <source>Naval Research Logistics Quarterly</source>
          <volume>2</volume>
          (
          <year>1955</year>
          )
          <fpage>83</fpage>
          -
          <lpage>97</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>N.</given-names>
            <surname>Tomizawa</surname>
          </string-name>
          ,
          <article-title>On some techniques useful for solution of transportation network problems</article-title>
          ,
          <source>Networks</source>
          <volume>1</volume>
          (
          <year>1971</year>
          )
          <fpage>173</fpage>
          -
          <lpage>194</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Edmonds</surname>
          </string-name>
          ,
          <article-title>Theoretical improvements in algorithmic efficiency for network flow problems</article-title>
          (
          <year>1972</year>
          )
          <fpage>17</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>E.</given-names>
            <surname>Munapo</surname>
          </string-name>
          ,
          <article-title>Development of an accelerating hungarian method for assignment problems</article-title>
          ,
          <source>Eastern-European Journal of Enterprise Technologies</source>
          <volume>4</volume>
          (
          <year>2020</year>
          )
          <fpage>6</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>A.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Royer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rummler</surname>
          </string-name>
          ,
          <source>Aspect-Oriented, Model-Driven Software Product Lines: The AMPLE Way</source>
          , first edition, Cambridge University Press, New York,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>K.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Mei</surname>
          </string-name>
          ,
          <article-title>An approach to constructing feature models based on requirements clustering</article-title>
          ,
          <source>in: 13th IEEE International Conference on Requirements Engineering (RE'05)</source>
          ,
          <year>2005</year>
          , pp.
          <fpage>31</fpage>
          -
          <lpage>40</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>