<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Preliminary Study of Higher Dimensional Software Structures</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emili Puh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tihana Galinac Grbac</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Neven Grbac</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Juraj Dobrila University of Pula</institution>
          ,
          <addr-line>Zagrebačka 30, Pula, HR-52100</addr-line>
          ,
          <country country="HR">Croatia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SQAMIA 2023: Workshop on Software Quality Analysis</institution>
          ,
          <addr-line>Monitoring, Improvement, and Applications</addr-line>
        </aff>
      </contrib-group>
      <fpage>13</fpage>
      <lpage>25</lpage>
      <abstract>
        <p>The quality of large-scale mission-critical software systems depends on the software system architecture. Although we design and create software system architecture, we are still unable to evaluate how architectural decisions influence software behavior. This problem is becoming even more important in the context of future networks, which assume autonomic software creation and interconnection guided by the special needs and purposes of its creation, and in which software architectures are created autonomously. One approach to this problem is the development of software structure metrics that can be used as characterization metrics for comparing architectures and evaluating their properties. Our previous work proposed the use of motifs for such purposes. However, its application is limited by computational efficiency, and we were able to investigate only 3-node motifs. In this paper, we provide a preliminary study investigating how higher dimensional substructures within software architecture may be used as software metrics and whether these higher dimensional structures benefit software characterization. We search for higher dimensional substructures and investigate their growth and change trends in the context of evolving software systems. We find that structure behavior can only be understood through a multidimensional view of the structure. Furthermore, we observed that the initial project size may correlate with structure stabilization across the product evolution. More precisely, projects with larger initial sizes may be slower in stabilizing the structure of internal dependencies, while projects with smaller initial sizes may stabilize the structure of internal dependencies in just a few project releases.</p>
      </abstract>
      <kwd-group>
        <kwd>Software structure</kwd>
        <kwd>software call graphs</kwd>
        <kwd>higher dimensional subgraphs</kwd>
        <kwd>software topology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        One of the key research goals within the software engineering community was to understand
the mechanisms of software dynamics and its relations to the software static attributes which
we can measure during the software creation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Although the software engineering discipline
has matured over the years, we still lack instruments to model software behavior
or, more precisely, to model software operational characteristics at design time. In
most cases, software modeling approaches focus on implementing the logic that
delivers functional software characteristics, not on modeling operational behavior. One active
direction of software engineering research aims to find appropriate instruments that would
allow modeling software behavior already at design time, and thus to design more precisely
software products that better fit their purpose and operational goals.
      </p>
      <p>
        The software architecture is one of the main artifacts we design and create during the software
design phase. Usually, the modeling of software structure and architecture is based on static
software attributes or based on human subjective decisions [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. However, its impact on software
dynamics has been weakly explored.
      </p>
      <p>
        One large group of research is devoted to developing various models for Software Defect
Prediction (SDP), experimenting with various statistical approaches, machine learning, and
artificial intelligence to build models based on the aforementioned static software metrics
measured on software components, such as software size, development effort, detected during
the design phase, and various software complexity metrics. However, these approaches have
almost neglected the existence of the software communication structure and focus solely on
static software metrics of files, modules, and classes (we will call them modules from here on), and
correlate these static metrics with the number of failures or faults identified during software
operation. These approaches rely on the module clustering principle, based solely on data
similarity measured on those modules. However, many researchers have recognized the influence
of communication patterns within the software system on fault and failure behavior and
operational software characteristics [
        <xref ref-type="bibr" rid="ref22 ref23 ref24 ref25">22, 23, 24, 25</xref>
        ]. Furthermore, communication patterns
persist during the system evolution and may uncover fault and failure prediction abilities across
system versions. However, a recent literature review of cross-project predictions showed that
papers are mainly based on these static metrics aiming to improve data preprocessing and
feature selection methods [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and the system architecture view is weakly represented
within these metrics.
      </p>
      <p>
        On the other hand, many approaches have been investigated for software structure analysis.
Previous research efforts have mainly focused on applying complex network science to
the software dependency or call graph [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4 ref5">1, 2, 3, 4, 5</xref>
        ]. Investigating the dependency network
metrics [
        <xref ref-type="bibr" rid="ref14 ref20 ref21">20, 21, 14</xref>
        ] and network science approaches to software defect prediction is still an
active area of research. Recent studies have identified that their direct application across all
software files may not be effective. However, substantial improvements may be achieved if
these models are used to target particular modules [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. More precisely, these models may
be beneficial for module clustering within the software structure.
      </p>
      <p>
        Our previous study was motivated by the aim of finding appropriate software structure
characterization metrics that would enable us to further improve software defect prediction
models. We found that information obtained from the software structure, expressed as the frequency
of three-node subgraphs, may be correlated with software defects [
        <xref ref-type="bibr" rid="ref15 ref17 ref5">5, 17, 15</xref>
        ]. This finding
is aligned with numerous previous observations that the most faulty modules are exactly the ones
that implement the most communication links with other modules. Moreover, we identified that
the same 3-node subgraphs are present across all system releases but their frequency changes
as the system evolves and we statistically proved that such measured structure evolution is
continuous and there are no signs of structure stabilization as the system matures [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In
this paper, we further explore higher dimensional subgraphs and their representation
within the software structure. We aim to identify which subgraph dimensions are represented
within the software structures and what the highest subgraph dimension present within the
software structures is. Furthermore, we want to explore how subgraphs of the various
structural dimensions evolve and change during the system evolution.
      </p>
      <p>The paper is structured as follows.</p>
      <p>In Sect. 2 we provide an overview of software structure analysis on evolving software
systems by introducing the concept of software structure, providing details of the Eclipse dataset,
which is frequently used for SDP research, and giving an overview of related work on software
structures. In Sect. 3 we explain how we extracted the higher dimensions and present the
results we obtained. In Sect. 4 we discuss the issues that may represent threats to the results
and conclusions we derive from the selected case study. Finally, in Sect. 5 we conclude
the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Software structure evolution analysis</title>
      <sec id="sec-2-1">
        <title>2.1. Research background</title>
        <p>
          In this section, we provide an overview of the research on finding appropriate models
for delivering best-effort software quality by taking appropriate actions during the design of
the software. One research direction is to predict and detect as many defects as possible
early, already at design time, and to invest additionally in quality assurance
activities on the parts that prediction indicates as faulty. To this line of research belong numerous
software defect prediction studies that aim to predict faulty software parts needing additional
attention, with the main benefit of savings from focusing quality activities on
high-risk defective modules. The majority of such studies employ machine learning and
artificial intelligence tools to find high-risk modules based on historical data, by finding relations
between the incidence of faulty modules and their static attributes. Numerous metrics have been
investigated to measure static code attributes, and therefore many studies have reported challenges
with dimensionality reduction and feature selection procedures [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ]. Despite the numerous static
metrics, it has been identified that SDP models could be improved with additional characterization
of the problem they study, and that software structure is not well represented among existing
static code attributes. The system structure is an important design-time artifact; however, we
still do not have adequate models to assess its suitability for the purpose for which it is developed. Our
previous studies have already demonstrated that some representations of the software structure
may contain information useful for modeling software defect prediction [
          <xref ref-type="bibr" rid="ref15 ref5">15, 5</xref>
          ]. Therefore,
in this paper, we focus our efforts on further explaining how some hidden structural
characterization obtained from topological information and higher dimensional structure can
support SDP.
        </p>
        <p>
          Software architecture is an abstract term that we usually use in the software design phase.
It is mostly used to represent a software structure as a set of software components, i.e., modules,
files, or classes, and their interactions [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Software structure may be represented by a call
graph as indicated in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. We reused the concept of presenting software structure as a call
graph from [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and developed the rFind tool for automatic call graph extraction from Java source
code, which we presented in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Call graphs are useful instruments for identifying the
dependencies that exist between parts of the software system. The working of the tool may
be explained with the help of Fig. 1, where the left-hand side shows
software code and the right-hand side shows the corresponding call graph. The example software
consists of three classes A, B, and C. Each class is represented by a node of the call graph on
the right side. Furthermore, the method calls among the classes, methodB() and methodC(), are
represented by edges of the software structure call graph presented on the right side
of Fig. 1.
        </p>
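        <p>The class-level call graph described above can be illustrated with a minimal sketch. It is not the rFind implementation: the input records, the function name build_call_graph, and the assumption that class A calls methodB() of class B while class B calls methodC() of class C are hypothetical and serve only to show how classes become nodes and method calls become edges.</p>
        <preformat preformat-type="code">
```python
from collections import defaultdict

# Hypothetical (caller class, callee class, called method) records.
# Which class calls which method is an assumption made for illustration only.
calls = [
    ("A", "B", "methodB"),
    ("B", "C", "methodC"),
]


def build_call_graph(call_records):
    """Return (nodes, edges): classes are nodes, caller/callee pairs are directed edges."""
    nodes = set()
    edges = defaultdict(set)
    for caller, callee, method in call_records:
        nodes.update((caller, callee))
        # Several method calls between the same two classes collapse into one edge.
        edges[(caller, callee)].add(method)
    return nodes, dict(edges)


nodes, edges = build_call_graph(calls)
print(sorted(nodes))
print(edges)
```
        </preformat>
        <p>In this sketch, multiple calls between the same pair of classes are merged into a single edge that remembers the called methods; whether the original tool keeps such multiplicities is not stated in the text.</p>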
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Datasets in the study</title>
        <p>In our previous works, we analyzed the software structure of open-source software
repositories from the Eclipse community. Eclipse is an open-source community that stores its software
versions in an open Git repository (https://git.eclipse.org/c/). For the purposes of our study,
we selected Java Development Tools (JDT), Plug-in Development Environment (PDE), and
Business Intelligence and Reporting Tools (BIRT), which is an Eclipse Platform-based reporting
system used to create data visualizations and reports that can be embedded into rich client
and web applications. These projects were selected because they are often analyzed in
research papers, so that we can interpret the obtained results
and their meaning when integrated into the existing knowledge base. Our study involves 14
JDT, 13 PDE, and 9 BIRT sequential releases. Most other studies analyze only the first
3 releases of each of these projects. Here we wanted to analyze changes over the software
evolution, and therefore we collected data for all available releases of the respective
projects. The projects were developed over a long period of time: BIRT over 7 years,
JDT over 12 years, and PDE over 11 years.</p>
        <p>
          The data collection process consists of the following steps. Firstly we downloaded from the
GIT repository all class files related to each project release. Then we extracted call graphs with
rFind tool that we developed within our research group and used in our previous studies for
software structure analytics [
          <xref ref-type="bibr" rid="ref15 ref17 ref5">5, 17, 15</xref>
          ]. Furthermore, we collected defect data per
release from the Bugzilla bug report repository of the open-source project. For precise mapping of classes
to defects, we developed the BuCo tool [
          <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
          ]. Furthermore, we calculated static metrics
for each project release. The data set contains information for all classes involved within each
analyzed software release (36 software releases in total). For each software release, we collected
more than 50 static metrics per class and per release.
        </p>
        <p>In this paper, we base our analysis solely on the call graphs collected with the rFind
tool. In future work, we plan to extend the analysis to the context of SDP.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Review on Eclipse studies</title>
        <p>
          Firstly, in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] we analyzed the characteristics of software structure when it is represented as
a graph through the lens of subgraph occurrences and motifs. A motif is a graph property:
a subgraph whose frequency of occurrence is statistically significantly higher
in the observed graph than in a large number of random graphs. As has
already been shown in many scientific fields, such as medicine, sociology, and electrical engineering,
motifs are a good characterization of graph sources and may be used to differentiate graphs
originating from different sources, e.g., motifs found in Escherichia coli, the World Wide Web, and
feed-forward networks differ [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. However, previous studies have focused their analysis only
on three-node subgraph motifs, which are presented in Fig. 2, since motif identification requires
significant computational resources, and finding higher-order subgraph structures and their
motifs is algorithmically complex and costly in terms of computational resources and time. In
our previous study, where we analyzed motifs within Eclipse software structures, we confirmed
the finding from [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] that object-oriented software has a unique characterization in terms
of motifs 5 and 6 (see Fig. 2). Furthermore, we proved that as software grows, the significance of
the specific motifs grows in the analyzed context of Eclipse software.
        </p>
        <p>
          In our previous study [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] performed on Eclipse software, we also showed that the same subgraph
types are present in all versions of the system releases throughout the system evolution, but their
frequencies change as the system evolves. Also, not all subgraph types are present in all system
versions. With the help of the subgraph occurrence instrument for representing the system
structure, we proved that the system structure changes significantly during the system
evolution and that the system structure is continuously evolving. Moreover, we could not
confirm that the software structure tends to stabilize as the system matures. On the other
hand, we found that the frequency of particular subgraph occurrences is correlated with system
defects and stabilizes as the system evolves. This finding opened new research challenges
that we investigated further; the results are presented in [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. In that further investigation,
we analyzed the impact of each three-node subgraph on the defectiveness of the code
represented by this subgraph. We statistically proved, on the observed Eclipse software programs,
that different subgraph types behave differently in terms of the defectiveness of the code represented
by the subgraph. This finding leads us to the conclusion that the communication interactions
represented by different subgraph types behave differently and that the communication
interactions among the classes of code influence defects. It is worth noting that this
conclusion is aligned with some previous research and empirical studies. However, this is the
first study that has succeeded in measuring and statistically proving the effect of communication
patterns on software defectiveness. Furthermore, in the same work we found that the code
defectiveness of different subgraphs does not stabilize as the system matures during the system
evolution. Note that this finding is opposite to the code defectiveness of the system, which tends
to stabilize during the system evolution. Our findings are limited to three-node subgraphs.
Although we were able to generalize this finding to all three-node subgraphs represented within
the analyzed system, we were not able to generalize it to higher-order subgraphs
that may also be present within the system graph structure. It would be interesting to analyze further
at which higher-order subgraph this code defectiveness stabilization occurs.
        </p>
        <p>Based on this analysis, we may conclude that subgraph defectiveness
may capture a code liveness property that other static code metrics may not be able to offer.
Furthermore, we found that subgraph defectiveness is correlated with system release defectiveness,
and this finding opens new opportunities for further investigation within the Software Defect
Prediction community.</p>
        <p>
          Furthermore, we were analyzing the software structure with the help of network science
models [
          <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
          ]. Previous studies that apply graph theory to the mathematical modeling of software
architecture have analyzed the cyclomatic complexity of various structures, resilience to failures,
and failure propagation through the architecture [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>In this paper, we aim to further investigate higher-order structures in graphs that may
bring positive results for developing new models for smart software architecting. We
believe that particular software graph structures may adversely influence software behavior
and should be avoided through proper software design decisions. Some earlier works have analyzed
software modularity, but we are not aware of studies analyzing its impact directly in relation
to code defectiveness. Also, previous studies analyzed the level of modularity between
software classes as the number of communicating links, whereas here we aim to investigate the behavior
of higher-order software structures within the software evolution.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Results</title>
      <p>The main goal of the study was to understand how software structures evolve with respect to
the higher dimensional subgraph structures that are represented within the software structure.
In our study, as explained in the previous sections, we observe software structure from the
code dependency viewpoint and the main source of our analysis is the call graphs. As we have
already explained, the call graphs are software structures formed from the nodes that represent
Java classes from the Eclipse software and edges between these nodes that are represented by
method calls between these classes.</p>
      <p>
        Within the call graphs, the goal is to find how higher dimensional substructures are
represented as follows:
• Zero-dimensional subgraphs (which we refer to as 0D) are the substructures present
within the call graphs represented by nodes, i.e., these are just the classes present in the
software.
• One-dimensional subgraphs (which we refer to as 1D) are the substructures present
within the call graphs represented by two nodes and an edge connecting these two nodes,
i.e., these are just the method calls present in the software.
• Two-dimensional subgraphs (which we refer to as 2D) are the substructures present
within the call graphs represented by three nodes fully connected by edges among them,
i.e., these are the complete subgraphs with three nodes. These were studied as part of our
research on motifs in the software [
        <xref ref-type="bibr" rid="ref15 ref17">17, 15</xref>
        ].
• Three-dimensional subgraphs (which we refer to as 3D) are the substructures present
within the call graphs represented by four nodes fully connected by edges among them,
i.e., these are the complete subgraphs with four nodes.
• And so on, <italic>n</italic>-dimensional subgraphs (which we refer to as <italic>n</italic>D) are the substructures
present within the call graphs represented by <italic>n</italic> + 1 nodes fully connected by edges among
them, i.e., these are the complete subgraphs with <italic>n</italic> + 1 nodes.
      </p>
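      <p>The definitions above can be illustrated with a minimal brute-force sketch that counts the complete subgraphs of each dimension. It is not the extraction procedure used in the study. It assumes that edge direction is ignored when testing whether nodes are fully connected, which the text does not state explicitly, and the function name count_complete_subgraphs and the toy graph are hypothetical. Exhaustive enumeration grows exponentially with graph size, so this form is suitable only for small illustrative graphs.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def count_complete_subgraphs(nodes, edges, max_dim):
    """Count complete subgraphs with d + 1 nodes for every dimension d = 0..max_dim.

    Assumption: edge direction is ignored, and self-calls are skipped.
    """
    adjacent = {node: set() for node in nodes}
    for u, v in edges:
        if u != v:
            adjacent[u].add(v)
            adjacent[v].add(u)

    counts = {}
    for d in range(max_dim + 1):
        # A group of d + 1 nodes counts if every pair inside it is connected.
        counts[d] = sum(
            1
            for group in combinations(sorted(adjacent), d + 1)
            if all(b in adjacent[a] for a, b in combinations(group, 2))
        )
    return counts


# Hypothetical toy call graph: classes A..D call each other, D also calls E.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "C"), ("C", "A"),
             ("A", "D"), ("B", "D"), ("C", "D"), ("D", "E")]

counts = count_complete_subgraphs(toy_nodes, toy_edges, max_dim=3)
print(counts)
```
      </preformat>
      <p>For the toy graph, the sketch reports five nodes (0D), seven edges (1D), four triangles (2D), and one tetrahedron (3D), since the classes A, B, C, and D are pairwise connected.</p>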
      <p>The dimension of these substructures refers to their geometric realization. The 0D substructures
are nodes, represented by points; the 1D substructures are edges, represented by segments; the
2D substructures can be viewed geometrically as triangles; the 3D substructures as tetrahedra;
and so on. The higher dimensional substructure of dimension <italic>n</italic> can be viewed geometrically as
the simplex with <italic>n</italic> + 1 vertices.</p>
      <p>Figure 3 presents the trend of software growth over the project releases a) in the zero
dimension 0D, represented by the number of nodes, and b) in the first dimension 1D, represented by
the number of edges. The figure also provides the results of the linear regression analysis performed
on the three Eclipse projects. The obtained regression parameters (a, b) are provided in Table 1
for each analyzed dimension and Eclipse product. From the figure, we can observe that
growth in 0D and 1D (the number of classes and method calls) is larger at the beginning of the
evolution and then, in the middle of the evolution, gradually decreases; if
we imagine an infinite evolution, we can say that the size measured in 0D and 1D
approaches an asymptote at some finite software size. This is also evident from Table 1, where we can
observe a reduction of the slope parameter (a) in higher dimensions.</p>
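      <p>The regression step can be sketched as an ordinary least-squares fit of the substructure count against the release index. This is a minimal sketch under stated assumptions: we take (a) to be the slope and (b) the intercept of the line y = a x + b, the releases are indexed 1, 2, 3 and so on, and the counts below are hypothetical rather than data from the study.</p>
      <preformat preformat-type="code">
```python
def fit_line(counts):
    """Ordinary least-squares fit y = a * x + b, where x = 1..n is the release index."""
    n = len(counts)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, counts))
    a = sxy / sxx            # slope: average growth per release
    b = mean_y - a * mean_x  # intercept
    return a, b


# Hypothetical 0D counts (number of classes) for five consecutive releases.
counts_0d = [100, 150, 190, 220, 240]

a, b = fit_line(counts_0d)
print(a, b)
```
      </preformat>
      <p>A larger slope indicates faster average growth per release in the given dimension; the fit says nothing by itself about how well a straight line describes the measurements.</p>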
      <p>From the figures and tables provided for 0D and 1D, we may observe that the size of the slope
parameter may be correlated with the initial software size, i.e., the software size at the first release.
The size of the BIRT project at its initial and all subsequent releases is more than double that of
the JDT and PDE projects, and the same may be observed for the slope parameter of
the respective projects.</p>
      <p>Moreover, variations in growth size between releases are unstable in the first several releases,
and then after some point, these variations stabilize as software evolves for JDT and PDE
projects and tend to converge to some asymptote. Projects with smaller initial sizes have smaller
growth variations compared to projects with larger initial sizes.</p>
      <p>In Figure 4 we present the trend of software growth with respect to the 2D subgraph
structures, along with the regression lines for each analyzed project. From the figure we may
observe that, for the smaller Eclipse projects (JDT and PDE), the software evolution behaves
similarly to the lower dimensional substructures. However, the behavior of the largest
project, BIRT, is somewhat different. In 2D we may observe that bigger projects have a weaker
linear regression fit and a larger deviation of the measurements from the linear
regression line. Software growth, expressed as the number of triangles in 2D, is much higher at
the beginning of the evolution than in the second part of the analyzed evolution
cycle. As in 0D and 1D, we can observe that the slope parameter of the linear
regression line may be correlated with the software size in the initial project release.</p>
      <p>Variations in software change between the project releases, as a measure of structural change
during the evolution, are more evident in the first half of the observed evolution cycle, while in
the second part of the evolution cycle these structural changes tend to stabilize. It is interesting
to observe the difference in software structure change trends for the three observed projects
during the evolution. We can observe that the slope parameter of the linear regression line of
structure change is very low, resulting in an almost horizontal regression line, while the slope
parameter of the 2D structure change between releases is very high for BIRT in comparison with
the other two projects, JDT and PDE. For the BIRT project, we may also observe a larger deviation
of the measured curve from the linear regression line than for the other two projects. It seems
that larger projects have larger variations in structure change between the project releases than
smaller projects.</p>
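      <p>The structural change between releases discussed above can be sketched as the difference between consecutive counts. As an assumption made only for illustration, the sketch summarizes the variation in each half of the evolution cycle by the population standard deviation of these differences; the text does not state which measure of variation was used, and the triangle counts below are hypothetical.</p>
      <preformat preformat-type="code">
```python
from statistics import pstdev


def release_changes(counts):
    """Change between consecutive releases: counts[i + 1] - counts[i]."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def split_variability(changes):
    """Population standard deviation of the changes in the first and second half."""
    mid = len(changes) // 2
    return pstdev(changes[:mid]), pstdev(changes[mid:])


# Hypothetical 2D counts (number of triangles) for eight consecutive releases.
triangles = [100, 180, 210, 290, 300, 312, 321, 333]

changes = release_changes(triangles)
early, late = split_variability(changes)
print(changes)
print(early, late)
```
      </preformat>
      <p>In this hypothetical series the changes fluctuate strongly in the first releases and settle afterwards, which is the kind of pattern the text describes as stabilization of structural change.</p>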
      <p>Figure 5 presents the growth in the number of tetrahedra found in the software structure and
the change in the number of tetrahedra over the software evolution. We can observe from the figure
that the trends in the number of tetrahedra over the software evolution differ significantly. The
smallest project in 0D, PDE, has a constant and very low number of tetrahedra,
and no growth during the evolution can be observed. A project with a somewhat larger initial
size in 0D, JDT, shows almost linear growth in 3D. In contrast,
the project with the largest initial size in 0D, BIRT, grows rapidly in 3D in
the first three evolution cycles, then stabilizes to an almost constant trend in
the middle four versions of the evolution, and finally drops almost to
the initial size during the last three releases of the evolution. It is interesting to observe that the smaller
projects, JDT and PDE, had relatively small or no variations in growth change during the
evolution, while in the largest project we may observe significant variations over the evolution.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Threats to validity</title>
      <p>
        Current open-source development practice is still not mature enough to develop and apply
clear measurement standards. In software code analytics, there have been numerous attempts to
systematically collect source code metrics; however, the datasets are often criticized for
lacking a systematic procedure. Despite the numerous criticisms, the SDP community has succeeded
in developing research results that have not only improved SDP research but have also found
applications in other areas of research and practice. However, recent studies have pointed to
the low number of reports from industrial case studies [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>There are numerous issues one should address: removing test source files, identifying the relation
between a bug report and the bug source (it is usually hard to match the exact location, i.e., the code file, where the bug is
corrected, and there exist numerous approaches that try to establish a unique standard
for addressing that problem), and removing duplicates (which may be hard).</p>
      <p>
        Furthermore, in our study, we used call graphs, which measure dependencies among
software parts, as the source for grounding our conclusions. Call graphs have been widely
explored within the software engineering community with the aim of developing better software design
and analysis tools that support impact prediction, vulnerability detection and
analysis, risk assessment, etc. Using call graphs as a source for drawing structural and quality
conclusions carries several threats. First of all, numerous tools have been developed for
call graph extraction from source code. A comparative study of the capabilities and
effectiveness of various call graph extraction tools has been performed in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The results
indicate that each tool has its strengths and weaknesses. Furthermore, there are different views
on software architecture. It may be seen from the runtime view, i.e., as dependencies
obtained by running the code and capturing its processing flows, or it may be seen as
dependencies obtained from static code analysis by extracting all the possible calls
among the system source files.
      </p>
      <p>Despite numerous criticisms, some datasets have become standard for various machine learning,
deep learning, and artificial intelligence approaches to software defect prediction. It is
beneficial to have at least some dataset, although not fully correct, on which to compare various algorithms
on a common base. The prerequisite for this is to have open datasets so that the community
can easily access them and experiment. The conclusion is that it is more important to have a
standard metric or dataset from which to draw valuable conclusions than to have a very precise and complex
procedure. In that sense, we have developed tools for data collection and have systematically
collected the data for all the datasets in the case study. Since we collected the data on the same
Eclipse projects as the open datasets (PROMISE), we are able to compare our results
and analyze the differences. Moreover, our tools go beyond the PROMISE dataset because they
connect the two communities of software structure analysis and software defect prediction.</p>
      <p>Here, we do not claim to have a completely ideal dataset. However, we did our best to collect
thorough data and to report on all possible misunderstandings. We did not use the usual datasets
(PROMISE) because we would not have full control over the data under analysis. Nevertheless, we
have performed systematic comparisons with these datasets and reported the differences we identified.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>From our analysis, it is evident that different trends in structure change appear when
we observe the same software structure in different dimensions. Moreover, we may observe
that this trend appears to be related to the size of the project at its initial release.</p>
      <p>Projects that are initially smaller tend to stabilize already in lower-order dimensions and
in fewer evolution steps. In contrast, for projects with larger initial sizes in dimensions zero
and one, we may capture stabilization of size growth at higher-order dimensions. Therefore,
from the analyzed case we observed that initial project size may be related to the trends observed
at various structural dimensions.</p>
      <p>All of this implies that the evolution of structure growth has to be analyzed across various
dimensions, and that a complete overview of structure growth behavior can only be obtained
through such a multidimensional approach. One dimension of structural analysis may not be
enough to understand structure evolution.</p>
      <p>Furthermore, we concluded that the project’s initial size is very important for its further
evolution. Projects with initially larger sizes may have greater opportunities for future growth.
Viewed in the context of call graphs, which represent code dependencies, this suggests that
projects with higher initial code complexity may go through more evolution cycles
before they stabilize. It is also interesting to observe the meaning of stabilization in
higher-order dimensions. Projects with larger initial sizes show more fluctuations in higher
dimensions and stabilize more slowly. We also observe stronger rising trends in higher dimensions.
It is interesting to observe a significant downtrend in the third dimension of the largest project
in our analysis. Probably, due to unpredictable complexity, there was some redesign of the software
structure that might explain such a downtrend.</p>
      <p>In analysing Eclipse projects we were able to find software structures of up to three
dimensions. This implies that software structures are rarely represented in higher
dimensional structures and thus finding higher dimensional structures may not be a
computationally demanding task. However, we analysed only Eclipse projects, and our further
explorative analysis will involve a much wider case study.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Acknowledgments</title>
      <p>This paper acknowledges the support of the Erasmus+ Key Action 2 (Strategic partnership
for higher education) project No. 2020–1–PT01–KA203–078646: “SusTrainable - Promoting
Sustainability as a Fundamental Driver in Software Development Training and Education” and
the support of the Croatian Science Foundation under the project HRZZ-IP-2019-04-4216. The
information and views set out in this paper are those of the author(s) and do not necessarily
reflect the official opinion of the European Union. Neither the European Union institutions and
bodies nor any person acting on their behalf may be held responsible for the use which may be
made of the information contained therein.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Callahan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carle</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hall</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kennedy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Constructing the procedure call multigraph</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>483</fpage>
          -
          <lpage>487</lpage>
          (
          <year>1990</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ryder</surname>
            ,
            <given-names>B.G.</given-names>
          </string-name>
          (May
          <year>1979</year>
          ).
          <article-title>"Constructing the Call Graph of a Program"</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          .
          <volume>SE-5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>216</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Grove</surname>
          </string-name>
          , David; DeFouw, Greg; Dean, Jeffrey; Chambers, Craig (9 October
          <year>1997</year>
          ).
          <article-title>"Call graph construction in object-oriented languages"</article-title>
          .
          <source>ACM SIGPLAN Notices. ACM</source>
          .
          <volume>32</volume>
          (
          <issue>10</issue>
          ):
          <fpage>108</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Vitalis</given-names>
            <surname>Salis</surname>
          </string-name>
          , Thodoris Sotiropoulos, Panos Louridas, Diomidis Spinellis, Dimitris Mitropoulos,
          <article-title>"PyCG: Practical Call Graph Generation in Python"</article-title>
          ,
          <source>2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)</source>
          , pp.
          <fpage>1646</fpage>
          -
          <lpage>1657</lpage>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Jean</given-names>
            <surname>Petric</surname>
          </string-name>
          , Tihana Galinac Grbac, Mario Dubravac,
          <article-title>"Processing and Data Collection of Program Structures in Open Source Repositories"</article-title>
          .
          <source>Proceedings of the 3rd Workshop on Software Quality Analysis, Monitoring, Improvement and Applications (SQAMIA)</source>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Antal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hegedus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tóth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ferenc</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Gyimóthy</surname>
          </string-name>
          ,
          <article-title>"[Research Paper] Static JavaScript Call Graphs: A Comparative Study,"</article-title>
          <source>2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM)</source>
          , Madrid, Spain,
          <year>2018</year>
          , pp.
          <fpage>177</fpage>
          -
          <lpage>186</lpage>
          , doi: 10.1109/SCAM.2018.00028.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Daniel S.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Brauner R. N.</given-names>
            <surname>Oliveira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Rick</given-names>
            <surname>Kazman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Elisa Y.</given-names>
            <surname>Nakagawa</surname>
          </string-name>
          .
          <year>2022</year>
          .
          <article-title>Evaluation of Systems-of-Systems Software Architectures: State of the Art and Future Perspectives</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>55</volume>
          ,
          <issue>4</issue>
          , Article 67 (May 2023), 35 pages.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Eisenbarth</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Koschke</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ;
          <string-name>
            <surname>Simon</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>"Aiding program comprehension by static and dynamic feature analysis"</article-title>
          .
          <source>Proceedings IEEE International Conference on Software Maintenance. ICSM</source>
          <year>2001</year>
          :
          <fpage>602</fpage>
          -
          <lpage>611</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Giray</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennin</surname>
            ,
            <given-names>K.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Köksal</surname>
            ,
            <given-names>Ö.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Babur</surname>
            ,
            <given-names>Ö.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tekinerdogan</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>On the use of deep learning in software defect prediction</article-title>
          .
          <source>Journal of Systems and Software 195</source>
          , Article no.
          <volume>111537</volume>
          (
          <year>2023</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Pandey</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tripathi</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          :
          <article-title>Machine learning based methods for software fault prediction: A survey</article-title>
          .
          <source>Expert Systems with Applications 172</source>
          , Article no.
          <volume>114595</volume>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Szymon</given-names>
            <surname>Stradowski</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lech</given-names>
            <surname>Madeyski</surname>
          </string-name>
          .
          <year>2023</year>
          .
          <article-title>Industrial applications of software defect prediction using machine learning: A business-driven systematic literature review</article-title>
          .
          <source>Inf. Softw. Technol</source>
          .
          <volume>159</volume>
          , C (Jul
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Pal</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sillitti</surname>
          </string-name>
          ,
          <article-title>"Cross-Project Defect Prediction: A Literature Review,"</article-title>
          <source>in IEEE Access</source>
          , vol.
          <volume>10</volume>
          , pp.
          <fpage>118697</fpage>
          -
          <lpage>118717</lpage>
          ,
          <year>2022</year>
          , doi: 10.1109/ACCESS.2022.3221184.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Rajbahadur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Hassan</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <article-title>"Revisiting the Impact of Dependency Network Metrics on Software Defect Prediction,"</article-title>
          <source>in IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>48</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>5030</fpage>
          -
          <lpage>5049</lpage>
          , 1 Dec.
          <year>2022</year>
          , doi: 10.1109/TSE.2021.3131950.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Zimmermann</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Nagappan</surname>
          </string-name>
          , “
          <article-title>Predicting defects using network analysis on dependency graphs,”</article-title>
          <source>in Proc. Int. Conf. Softw. Eng.</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>531</fpage>
          -
          <lpage>540</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Ana</given-names>
            <surname>Vranković</surname>
          </string-name>
          , Tihana Galinac Grbac, Željka Car,
          <article-title>Software structure evolution and relation to subgraph defectiveness</article-title>
          .
          <source>IET Softw</source>
          .
          <volume>13</volume>
          (
          <issue>5</issue>
          ):
          <fpage>355</fpage>
          -
          <lpage>367</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Milo</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen-Orr</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Itzkovitz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>'Network motifs: simple building blocks of complex networks'</article-title>
          ,
          <source>Science</source>
          ,
          <year>2002</year>
          ,
          <volume>298</volume>
          , (
          <issue>5594</issue>
          ), pp.
          <fpage>824</fpage>
          -
          <lpage>827</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Petrić</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galinac Grbac</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Software structure evolution and relation to system defectiveness</article-title>
          .
          <source>In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, EASE '14</source>
          . pp.
          <fpage>34:1</fpage>
          -
          <lpage>34:10</lpage>
          . ACM (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Pita Costa</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galinac Grbac</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>The topological data analysis of time series failure data in software evolution</article-title>
          .
          <source>In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion</source>
          . pp.
          <fpage>25</fpage>
          -
          <lpage>30</lpage>
          . ICPE '17 Companion, ACM, New York, NY, USA (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Valverde</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solé</surname>
            ,
            <given-names>R.V.</given-names>
          </string-name>
          :
          <article-title>Network motifs in computational graphs: a case study in software architecture</article-title>
          .
          <source>Physical Review E, Statistical, Nonlinear, and Soft Matter Physics</source>
          <volume>72</volume>
          (2 Pt 2),
          <fpage>026107-1</fpage>
          -
          <lpage>026107-8</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Sanja</given-names>
            <surname>Grbac Babic</surname>
          </string-name>
          , Tihana Galinac Grbac:
          <article-title>Network analysis of evolving software-systems</article-title>
          .
          <source>SoftCOM</source>
          <year>2017</year>
          :
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Sanja</given-names>
            <surname>Grbac Babic</surname>
          </string-name>
          , Tihana Galinac Grbac, Jonatan Lerga:
          <article-title>Community structure of a complex software-system in evolution</article-title>
          .
          <source>MIPRO</source>
          <year>2018</year>
          :
          <fpage>1467</fpage>
          -
          <lpage>1471</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>N.E.</given-names>
            <surname>Fenton</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          .
          <year>2000</year>
          .
          <article-title>Quantitative Analysis of Faults and Failures in a Complex Software System</article-title>
          .
          <source>IEEE Trans. Software Eng.</source>
          <issue>8</issue>
          (Aug.
          <year>2000</year>
          ),
          <fpage>797</fpage>
          -
          <lpage>814</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>C.</given-names>
            <surname>Andersson</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          <volume>33</volume>
          ,
          <issue>5</issue>
          (
          <year>2007</year>
          ),
          <fpage>273</fpage>
          -
          <lpage>286</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>T.</given-names>
            <surname>Galinac Grbac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Runeson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Huljenić</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>A Second Replicated Quantitative Analysis of Fault Distributions in Complex Software Systems</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          <volume>39</volume>
          ,
          <issue>4</issue>
          (
          <year>2013</year>
          ),
          <fpage>462</fpage>
          -
          <lpage>476</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Vrankovic</surname>
            ,
            <given-names>Ana</given-names>
          </string-name>
          , and Tihana Galinac Grbac.
          <article-title>"Replication of Quantitative Analysis of Fault Distributions on Open Source Complex Software Systems."</article-title>
          <source>SQAMIA</source>
          .
          <year>2018</year>
          . CEUR Workshop proceedings, Vol.
          <volume>2217</volume>
          , paper 22.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Goran</given-names>
            <surname>Mauša</surname>
          </string-name>
          , Paolo Perković, Tihana Galinac Grbac, Ivan Stajduhar:
          <article-title>Techniques for Bug-Code Linking</article-title>
          .
          <source>SQAMIA</source>
          <year>2014</year>
          ,
          <fpage>47</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>Mauša</surname>
          </string-name>
          , Goran et al.
          <article-title>“Software defect prediction with Bug-Code analyzer - A data collection tool demo.”</article-title>
          <source>2014 22nd International Conference on Software, Telecommunications and Computer Networks (SoftCOM)</source>
          (
          <year>2014</year>
          ):
          <fpage>425</fpage>
          -
          <lpage>426</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>