<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>RMSD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Mathematical Model for Detecting Outliers in the Two-Dimensional Data of Software Metrics RFC and CBO from Applications in Java</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Prykhodko</string-name>
          <email>sergiy.prykhodko@nuos.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lidiia Makarova</string-name>
          <email>lidiia.makarova@nuos.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Liudmyla Latanska</string-name>
          <email>liudmyla.latanska@nuos.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Bryzghalov</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Admiral Makarov National University of Shipbuilding, Heroes of Ukraine Ave.</institution>
          ,
          <addr-line>9, Mykolaiv, 54007</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Odesa Polytechnic National University</institution>
          ,
          <addr-line>Shevchenko Ave., 1, Odesa, 65044</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>1020</year>
      </pub-date>
      <volume>819</volume>
      <fpage>24</fpage>
      <lpage>26</lpage>
      <abstract>
        <p>This paper presents a mathematical model in the form of a transformed prediction ellipse for detecting outliers in two-dimensional data of the software metrics RFC (response for a class) and CBO (coupling between object classes) from applications in Java. When the data follow a normal distribution, a prediction ellipse can be applied to identify outliers. However, for data whose distribution deviates significantly from normality, the use of such a prediction ellipse is statistically invalid. In such cases, it is necessary to first normalize the data, construct a prediction ellipse for the normalized metrics, and then apply the inverse transformation to obtain a transformed prediction ellipse for the initial data. The dataset used in this study consists of RFC and CBO metrics collected from various open-source Java projects covering different functional areas, architectural styles, and development practices. This diversity supports the applicability of the model to a wide range of real-world Java projects. The Mardia normality test shows that the distribution of these metrics deviates from multivariate normality. Consequently, the normalization procedure described above must be applied. For this purpose, we employ a bivariate Box-Cox transformation, which enables scale correction and distributional alignment and thereby facilitates the construction of a statistically valid transformed prediction ellipse. The study aims to analyze Java applications by building a mathematical model capable of identifying outliers in the metric space and characterizing the typical range of variation in the examined applications. The ellipse is defined by a statistical boundary based on the F-distribution, providing a formal confidence region for typical metrics. Outliers are determined by calculating the Mahalanobis distance from the ellipse center and comparing it to a threshold value.
The resulting model allows for formal outlier detection and supports visual analysis of typical and atypical metric behavior, aiding the interpretation of structural anomalies. The practical use of the constructed model was verified on projects that were not involved in its development. Overall, this approach combines normalization, outlier detection, and the construction of prediction boundaries to distinguish between normal and anomalous behavior of the RFC and CBO metrics.</p>
      </abstract>
      <kwd-group>
        <kwd>Mathematical model</kwd>
        <kwd>outlier detection</kwd>
        <kwd>software metric</kwd>
        <kwd>Java application</kwd>
        <kwd>prediction ellipse</kwd>
        <kwd>Box-Cox transformation</kwd>
        <kwd>Mahalanobis distance</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Statistical analysis of multivariate data plays an important role in many areas, including empirical
software engineering [1, 2]. One of the most important tasks of statistical analysis is to detect outliers
in data [3, 4]. There are several models for detecting and removing outliers. One well-known model
used in statistical analysis is the prediction ellipsoid, which in the particular case of two
variables becomes the prediction ellipse. However, this model is valid only when the data are
normally distributed.</p>
      <p>The well-known software metrics RFC and CBO, proposed by Chidamber and Kemerer [5],
have been extensively validated as correlating with fault-proneness and change-proneness across
numerous software systems [6, 7]. Introduced for object-oriented design (OOD), they are
used today to solve different problems, including software quality assessment [8-10]. Despite their utility,
their distributions in real-world Java systems often deviate substantially from multivariate normality.</p>
      <p>Empirical software engineering studies apply various methods of multivariate statistical analysis
and mathematical modelling. Assuring the validity of such methods and the corresponding results is
challenging and critical [11]. As is known [12], many methods of multivariate statistical analysis
are based on the assumption that the data are normally distributed. It is also known [13] that if the data
are not normally distributed, it is misleading to draw conclusions based on the normal distribution.</p>
      <p>A known approach in multivariate statistics is based on multivariate normalizing
transformations; for instance, the Mahalanobis distance for normalized data is used to detect
multivariate outliers in high-dimensional non-Gaussian data [14, 15].</p>
      <p>Importantly, the Mahalanobis distance allows modeling of ellipsoidal confidence regions, called
prediction ellipsoids (or prediction ellipses in the bivariate case), which capture acceptable ranges of
software metrics and detect outliers. This integrated statistical approach not only supports anomaly
detection but also contributes to a better understanding of software structure in the early
development stages of Java applications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Review of the literature</title>
      <p>The use of the object-oriented software metrics CBO and RFC has been well established in software
engineering. These metrics are widely recognized for their ability to capture the complexity and
interdependence of object-oriented designs, which directly influence maintainability, fault
proneness, and project effort estimation [10, 16, 17]. All this also applies to Java applications.</p>
      <p>Early studies predominantly used models in which the acceptable ranges of CBO and RFC metric
values were defined separately for each metric, without taking into account their mutual influence
[18, 19]. In addition, the two-dimensional data of these metrics are not normally distributed. Such
models limit the detection of anomalous values of software metrics. To overcome these limits, further
research has begun applying multivariate statistical techniques, including normalizing
transformations and prediction ellipses, to model the joint behavior of CBO and RFC metrics. The
prediction ellipse constructed from a given covariance matrix provides geometric boundaries that
enable outlier detection and removal.</p>
      <p>The prediction ellipsoid is used in various statistical methods for multivariate data analysis, such
as multivariate outlier detection [20] and solving optimization problems [11, 21]. However, its
application is limited by the assumption that the data follow a multivariate normal distribution,
which is rarely the case. As a result, transformed prediction ellipsoids are used for multidimensional
non-Gaussian data.</p>
      <p>In [22], the use of the squared Mahalanobis distance for outlier detection in multivariate
non-Gaussian data is discussed, based on applying univariate and multivariate normalizing
transformations.</p>
      <p>Building on that technique, a technique for constructing transformed prediction ellipsoids
based on normalizing transformations for multivariate non-Gaussian data is proposed in [23]. The
transformed prediction ellipsoid gives the same results as the squared Mahalanobis distance, but
the transformed prediction ellipsoid is easier to interpret visually.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Formulation of the problem</title>
      <p>Traditional approaches to outlier detection in software metrics often rely on the assumption of
multivariate normality, which allows the use of statistical methods such as the construction of the
prediction ellipse. However, in practice, software metrics data collected from real-world Java
applications, such as RFC and CBO, frequently deviate from this assumption. As shown by Mardia's
multivariate normality test, the joint distribution of the RFC and CBO metrics, when normalized relative
to the number of classes (NCL), does not conform to the normal law. This deviation invalidates the
direct application of classical prediction ellipses to the initial data.</p>
      <p>To address this, we pose the problem of developing a mathematically grounded technique for
detecting outliers in heterogeneous, non-normally distributed initial data. The proposed solution
involves normalizing the metrics through a bivariate Box-Cox transformation, which aligns the data
distribution with the assumptions required for the construction of a prediction ellipse. Based on the
normalized metrics, a prediction ellipse is constructed using statistical boundaries derived from the
F-distribution and evaluated using the Mahalanobis distance.</p>
      <p>The inverse transformation is then applied to the ellipse, resulting in the construction of a
transformed prediction ellipse for the initial data. This transformed prediction ellipse defines a
region, enabling the identification of outliers in Java applications.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Objectives of the study</title>
      <p>The study aims to construct a mathematical model in the form of a transformed prediction ellipse
based on the object-oriented software metrics RFC and CBO of Java applications. The model applies
a bivariate Box-Cox transformation to normalize software metrics, enabling statistically valid outlier
detection through F-distribution-based thresholds and Mahalanobis distance. The study further aims
to assess the model's effectiveness in identifying outliers in Java applications.</p>
      <p>The object of study is the process of building a mathematical model in the form of a transformed
prediction ellipse for detecting outliers in two-dimensional data of the software metrics RFC and
CBO from applications in Java.</p>
      <p>The subject of study is a mathematical model in the form of a transformed prediction ellipse for
detecting outliers in two-dimensional data of the software metrics RFC and CBO from applications
in Java.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Materials and research methods</title>
      <p>To achieve the aim of this study, it is necessary to analyze the existing mathematical models for
finding outliers in Java applications. The focus is placed on quantitative values of RFC and CBO
metrics collected from a diverse sample of open-source Java applications representing a variety of
software types and architectural styles.</p>
      <p>The study justifies the need for constructing a transformed prediction model, given that the
distribution of these object-oriented metrics deviates from multivariate normality, as confirmed by
the Mardia test, which makes conventional prediction ellipses statistically
invalid for robust anomaly detection.</p>
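      <p>The bivariate Box-Cox transformation was chosen to normalize the multivariate data:</p>
      <disp-formula id="eq1">
        <label>(1)</label>
        <tex-math><![CDATA[Z_j = \begin{cases} \dfrac{X_j^{\lambda_j} - 1}{\lambda_j}, & \lambda_j \neq 0; \\ \log(X_j), & \lambda_j = 0, \end{cases}]]></tex-math>
      </disp-formula>
      <p>where X<sub>j</sub> takes the values of the corresponding columns of the data, RFC/NCL and CBO/NCL, respectively. The
optimal vector of parameters λ = [λ<sub>1</sub>, λ<sub>2</sub>] is estimated using maximum likelihood estimation (MLE),
where the log-likelihood incorporates the log-determinant of the covariance matrix of the normalized
data and the Jacobian term of the transformation:</p>
      <disp-formula id="eq2">
        <label>(2)</label>
        <tex-math><![CDATA[l(\lambda) = -\frac{N}{2} \log\left(\det S(Z)\right) + \sum_{j=1}^{k} (\lambda_j - 1) \sum_{i=1}^{N} \log(X_{ij}),]]></tex-math>
      </disp-formula>
      <p>where Z is the matrix of normalized values, S(Z) is the covariance matrix of Z, k is the number of
variables (here k = 2), N is the number of observations, and the second
term is the Jacobian adjustment that accounts for the transformation of variables.</p>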
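      <p>The Box-Cox transformation and the log-likelihood described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the covariance matrix is assumed to be the biased (divide-by-N) estimate, the log-Jacobian term enters with a positive sign as in the standard Box-Cox likelihood, and a coarse grid search stands in for a proper numerical optimizer.</p>
      <preformat><![CDATA[
```python
import math


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:          # the lam = 0 branch of the transformation
        return math.log(x)
    return (x ** lam - 1.0) / lam


def covariance_2x2(z1, z2):
    """Biased (divide-by-N) covariance entries s11, s22, s12 of two columns."""
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    s11 = sum((a - m1) ** 2 for a in z1) / n
    s22 = sum((b - m2) ** 2 for b in z2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / n
    return s11, s22, s12


def log_likelihood(x1, x2, lam1, lam2):
    """Log-likelihood l(lambda) of the bivariate Box-Cox transformation."""
    z1 = [box_cox(v, lam1) for v in x1]
    z2 = [box_cox(v, lam2) for v in x2]
    s11, s22, s12 = covariance_2x2(z1, z2)
    det = s11 * s22 - s12 ** 2
    n = len(x1)
    # Jacobian adjustment: sum over variables of (lam_j - 1) * sum_i log(x_ij)
    jacobian = ((lam1 - 1.0) * sum(math.log(v) for v in x1)
                + (lam2 - 1.0) * sum(math.log(v) for v in x2))
    return -0.5 * n * math.log(det) + jacobian


def grid_search(x1, x2, lo=-1.0, hi=1.0, steps=41):
    """Coarse grid search for (lam1, lam2); assumed stand-in for an optimizer."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(((a, b) for a in grid for b in grid),
               key=lambda p: log_likelihood(x1, x2, p[0], p[1]))
```
]]></preformat>
      <p>With λ = [1, 1] the transformation only shifts the data, so the Jacobian term vanishes and the log-likelihood reduces to the log-determinant term of the raw data, which gives a quick sanity check for an implementation.</p>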
      <p>The normality of the bivariate data is checked with the Mardia test [24]. The test is based on
the measurement of the bivariate skewness β<sub>1,k</sub> and kurtosis β<sub>2,k</sub> of the sample:</p>
      <disp-formula id="eq3">
        <label>(3)</label>
        <tex-math><![CDATA[\beta_{1,k} = \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} \left[ (X_i - \bar{X})^T S_N^{-1} (X_j - \bar{X}) \right]^3,]]></tex-math>
      </disp-formula>
      <disp-formula id="eq4">
        <label>(4)</label>
        <tex-math><![CDATA[\beta_{2,k} = \frac{1}{N} \sum_{i=1}^{N} \left[ (X_i - \bar{X})^T S_N^{-1} (X_i - \bar{X}) \right]^2,]]></tex-math>
      </disp-formula>
      <p>where X is a k-dimensional vector of variables, X = (X<sub>1</sub>, X<sub>2</sub>, …, X<sub>k</sub>)<sup>T</sup>, and S<sub>N</sub> is the biased sample
covariance matrix of the multivariate variable X. It is calculated by the formula</p>
      <disp-formula id="eq5">
        <label>(5)</label>
        <tex-math><![CDATA[S_N = \frac{1}{N} \sum_{i=1}^{N} (X_i - \bar{X})(X_i - \bar{X})^T,]]></tex-math>
      </disp-formula>
      <p>where X̄ is the vector of sample means of the variables, X̄ = (X̄<sub>1</sub>, X̄<sub>2</sub>, …, X̄<sub>k</sub>)<sup>T</sup>.</p>
      <p>To assess the multivariate normality of a dataset, the Mardia test employs two separate criteria
based on skewness and kurtosis. Both conditions must be satisfied for the assumption of
multivariate normality to hold.</p>
      <sec id="sec-5-1">
        <p>The test statistic for β<sub>1,k</sub> is given by</p>
        <disp-formula id="eq6">
          <label>(6)</label>
          <tex-math><![CDATA[\frac{N \beta_{1,k}}{6} \leq \chi^2,]]></tex-math>
        </disp-formula>
        <p>where χ<sup>2</sup> denotes the upper quantile of the chi-squared distribution with
k(k + 1)(k + 2)/6 degrees of freedom and α = 0.005 is the chosen significance level.</p>
        <p>For the kurtosis part, the test statistic β<sub>2,k</sub> is compared against the 1 − α quantile of the normal
distribution N with mathematical expectation μ = k(k + 2) and variance σ<sup>2</sup> = 8k(k + 2)/N:</p>
        <disp-formula id="eq7">
          <label>(7)</label>
          <tex-math><![CDATA[\beta_{2,k} \leq N_{1-\alpha}(\mu, \sigma^2).]]></tex-math>
        </disp-formula>
        <p>In addition to distributional assumptions, sample quality is improved through the removal of
anomalous data points (outliers). The identification of such points is performed using the squared
Mahalanobis distance</p>
        <disp-formula id="eq8">
          <label>(8)</label>
          <tex-math><![CDATA[d_i^2 = (Z_i - \bar{Z})^T S_N^{-1} (Z_i - \bar{Z}),]]></tex-math>
        </disp-formula>
        <p>where Z̄ is the vector of sample means of the normalized variables, Z̄ =
(Z̄<sub>1</sub>, Z̄<sub>2</sub>, …, Z̄<sub>k</sub>)<sup>T</sup>, and S<sub>N</sub> is the biased sample covariance matrix of the normalized data.</p>
        <p>The statistical threshold is based on the F-distribution: for a confidence level 1 − α, the cutoff value T is
calculated by</p>
        <disp-formula id="eq9">
          <label>(9)</label>
          <tex-math><![CDATA[T = \frac{k (N^2 - 1)}{N (N - k)} F_{1-\alpha}(k, N - k),]]></tex-math>
        </disp-formula>
        <p>where k is the number of dimensions, N is the number of samples, and F<sub>1−α</sub> is the quantile
function of the F-distribution. If the value of d<sub>i</sub><sup>2</sup>, i = 1, 2, …, N, exceeds this threshold, the corresponding
data point is considered an outlier.</p>
        <p>After removing all outliers, the ellipse model can be obtained by calculating the mean and
covariance matrix from the cleaned data. Based on it, a normalized prediction ellipse can be built,
which finds outliers by using the same approach as described above for formulas (8) and (9).
        </p>
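        <p>The outlier rule described above (squared Mahalanobis distance compared with an F-based cutoff) can be sketched for the bivariate case as follows. The function names are hypothetical and the code is only an illustration: it assumes two variables, the biased covariance matrix, and uses the closed-form quantile of the F-distribution that exists only for two numerator degrees of freedom, so that no external library is needed.</p>
        <preformat><![CDATA[
```python
import math


def f_quantile_df1_2(p, df2):
    """p-quantile of F(2, df2); closed form valid only for 2 numerator d.f."""
    return 0.5 * df2 * ((1.0 - p) ** (-2.0 / df2) - 1.0)


def threshold(n, alpha=0.005):
    """Cutoff T = k(N^2 - 1) / (N(N - k)) * F quantile, for k = 2 variables."""
    k = 2
    scale = k * (n ** 2 - 1) / (n * (n - k))
    return scale * f_quantile_df1_2(1.0 - alpha, n - k)


def squared_mahalanobis(z1, z2):
    """Squared Mahalanobis distance of every point, biased covariance matrix."""
    n = len(z1)
    m1, m2 = sum(z1) / n, sum(z2) / n
    s11 = sum((a - m1) ** 2 for a in z1) / n
    s22 = sum((b - m2) ** 2 for b in z2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(z1, z2)) / n
    det = s11 * s22 - s12 ** 2
    # Quadratic form with the explicit inverse of the 2x2 covariance matrix
    return [((a - m1) ** 2 * s22
             - 2.0 * s12 * (a - m1) * (b - m2)
             + (b - m2) ** 2 * s11) / det
            for a, b in zip(z1, z2)]


def find_outliers(z1, z2, alpha=0.005):
    """Indices of points whose squared distance exceeds the cutoff T."""
    t = threshold(len(z1), alpha)
    return [i for i, d2 in enumerate(squared_mahalanobis(z1, z2)) if d2 > t]
```
]]></preformat>
        <p>For example, threshold(302) with α = 0.005 evaluates to approximately 10.86. Because removing a point changes the mean and covariance matrix, the check is normally repeated on the reduced sample until no further outliers are found.</p>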
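        <p>According to [23], the transformed prediction ellipsoid can be obtained from the prediction ellipsoid
constructed for the normalized data by using the inverse transformation. In the particular case of
two variables, the transformed prediction ellipse is given by the formula</p>
        <disp-formula id="eq10">
          <label>(10)</label>
          <tex-math><![CDATA[\frac{[\psi_1(X_1) - \bar{Z}_1]^2}{S_{Z_1}^2} + \frac{[\psi_2(X_2) - \bar{Z}_2]^2}{S_{Z_2}^2} - \frac{2 S_{Z_1 Z_2} [\psi_1(X_1) - \bar{Z}_1][\psi_2(X_2) - \bar{Z}_2]}{S_{Z_1}^2 S_{Z_2}^2} = \frac{2 (N^2 - 1)(S_{Z_1}^2 S_{Z_2}^2 - S_{Z_1 Z_2}^2)}{N (N - 2) S_{Z_1}^2 S_{Z_2}^2} F_{2, N-2, \alpha},]]></tex-math>
        </disp-formula>
        <p>where ψ denotes the normalizing transformation by the bivariate Box-Cox; Z̄ is the mean vector, Z̄ =
(Z̄<sub>1</sub>, Z̄<sub>2</sub>)<sup>T</sup>; and S<sub>Z</sub> is the covariance matrix of the normalized data.
        </p>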
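        <p>A point can be checked against the transformed prediction ellipse by normalizing its coordinates and comparing the two sides of the formula above. The sketch below is illustrative only: the function names and argument layout are hypothetical, and the caller is assumed to supply the Box-Cox parameters, mean vector, covariance entries and sample size of the fitted model, with the two metrics in the same order as used for fitting.</p>
        <preformat><![CDATA[
```python
import math


def box_cox(x, lam):
    """Box-Cox transform of one strictly positive value."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    return math.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam


def inside_transformed_ellipse(x1, x2, lam, mean, cov, n, alpha=0.005):
    """True if (x1, x2) lies inside or on the transformed prediction ellipse.

    lam  = (lam1, lam2)      Box-Cox parameters
    mean = (m1, m2)          mean vector of the normalized data
    cov  = (s11, s22, s12)   variances and covariance of the normalized data
    n                        number of observations used to fit the model
    """
    d1 = box_cox(x1, lam[0]) - mean[0]
    d2 = box_cox(x2, lam[1]) - mean[1]
    s11, s22, s12 = cov
    lhs = d1 ** 2 / s11 + d2 ** 2 / s22 - 2.0 * s12 * d1 * d2 / (s11 * s22)
    # Upper-alpha quantile of F(2, n - 2); closed form for 2 numerator d.f.
    f_q = 0.5 * (n - 2) * (alpha ** (-2.0 / (n - 2)) - 1.0)
    rhs = (2.0 * (n ** 2 - 1) * (s11 * s22 - s12 ** 2)
           / (n * (n - 2) * s11 * s22)) * f_q
    return lhs <= rhs
```
]]></preformat>
        <p>Dividing both sides by (s11·s22 − s12²)/(s11·s22) shows that this test is the same as comparing the squared Mahalanobis distance of the normalized point with the cutoff T, which is why the two approaches flag the same points.</p>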
        <p>The authors have collected a sample dataset of code metrics from 140 Game Engine Java
applications hosted on the GitHub platform [25]; the metrics were obtained using static source code
analysis [26]. A further 88 metrics were taken from article [27], and another 74 metrics were retrieved from
[28].</p>
        <p>Overall, the dataset consisted of 302 code metrics (140 + 88 + 74). The dataset includes CBO/NCL and RFC/NCL.
These metrics can be obtained at an early stage of project planning from the conceptual model of the
application.</p>
        <p>Descriptive statistics of the initial data set are presented in Table 1.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Experiment</title>
      <sec id="sec-6-1">
        <title>6.1. Normalization, outlier removal, and model building</title>
        <p>In this paper, we have extended the results to a larger set of RFC and CBO metric data. As
in [23], to detect outliers, we apply the technique based on the squared Mahalanobis distance for the
two-dimensional normalized data. To build the transformed prediction ellipse for detecting outliers,
we use the bivariate normalizing Box-Cox transformation.</p>
        <p>The bivariate dataset consisting of software design metrics CBO/NCL and RFC/NCL was initially
evaluated for multivariate normality using the Mardia test, which assesses both skewness and
kurtosis. The results indicated that the dataset was not normally distributed, which violates a core
assumption for statistical modeling and Mahalanobis distance-based prediction ellipse construction.</p>
        <p>The Mardia skewness and kurtosis values, calculated using (3) and (4), are equal to
3.58 and 13.86, respectively.</p>
        <sec id="sec-6-1-1">
          <p>The test statistics for β<sub>1,k</sub> and β<sub>2,k</sub> exceeded the critical thresholds, confirming a significant deviation
from multivariate normality based on the comparisons (6) and (7): 180.61 &gt; 14.86 and 12.74 &gt;
2.57.
          </p>
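          <p>The two Mardia criteria used here can be sketched for bivariate data in plain Python. This is an illustrative implementation with hypothetical function names, not the authors' code: it assumes k = 2, the biased covariance matrix, a chi-squared quantile with 4 degrees of freedom obtained by bisection on its closed-form distribution function, and the kurtosis criterion applied in standardized form, (β<sub>2,k</sub> − μ)/σ compared with the standard normal quantile.</p>
          <preformat><![CDATA[
```python
import math
from statistics import NormalDist


def mardia_bivariate(x1, x2):
    """Mardia skewness b1 and kurtosis b2 for two columns of data."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    d1 = [a - m1 for a in x1]
    d2 = [b - m2 for b in x2]
    s11 = sum(a * a for a in d1) / n
    s22 = sum(b * b for b in d2) / n
    s12 = sum(a * b for a, b in zip(d1, d2)) / n
    det = s11 * s22 - s12 ** 2

    def g(i, j):
        # (X_i - mean)^T S^-1 (X_j - mean) with the explicit 2x2 inverse
        return (d1[i] * d1[j] * s22
                - s12 * (d1[i] * d2[j] + d2[i] * d1[j])
                + d2[i] * d2[j] * s11) / det

    b1 = sum(g(i, j) ** 3 for i in range(n) for j in range(n)) / n ** 2
    b2 = sum(g(i, i) ** 2 for i in range(n)) / n
    return b1, b2


def chi2_quantile_df4(p):
    """p-quantile of chi-squared with 4 d.f., by bisection on its CDF."""
    def cdf(x):
        return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)
    lo, hi = 0.0, 200.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mardia_test(x1, x2, alpha=0.005):
    """Return (is_normal, skewness statistic, standardized kurtosis)."""
    n, k = len(x1), 2
    b1, b2 = mardia_bivariate(x1, x2)
    skew_stat = n * b1 / 6.0
    kurt_stat = (b2 - k * (k + 2)) / math.sqrt(8.0 * k * (k + 2) / n)
    ok = (skew_stat <= chi2_quantile_df4(1.0 - alpha)
          and kurt_stat <= NormalDist().inv_cdf(1.0 - alpha))
    return ok, skew_stat, kurt_stat
```
]]></preformat>
          <p>For α = 0.005 the two critical values computed this way are about 14.86 and 2.58, in line with the thresholds quoted above. The double sum in the skewness makes the cost quadratic in the sample size, which is acceptable for a few hundred records.</p>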
          <p>To address the non-normality, the dataset was subjected to a multivariate Box-Cox
transformation, which requires all values to be strictly positive. The optimal λ parameters for each
variable were estimated via maximum likelihood: λ = [0.276, 0.227].</p>
          <p>During the transformation process, iterative outlier detection and removal were employed using
the Mahalanobis distance and a statistical threshold based on the F-distribution, following the
logic described in (8) and (9):</p>
          <disp-formula>
            <tex-math><![CDATA[d_i^2 \leq \frac{k (N^2 - 1)}{N (N - k)} F_{k, N-k, \alpha} \approx 10.86.]]></tex-math>
          </disp-formula>
          <p>Four outliers were removed iteratively, each time re-estimating the Box-Cox parameters and
recalculating the Mahalanobis distances. Descriptive statistics of the cleaned data set are presented
in Table 2.</p>
          <p>Outliers were also detected under the assumption of a normal distribution. For this, we
used the technique based on the squared Mahalanobis distance without data normalization. During
this process, 25 outliers were detected iteratively. This number is more than six times greater than
the number found with data normalization. At the same time, only two data points were identified as
outliers by both methods.</p>
          <p>
            After clearing all outliers, the optimal λ values were found to be λ = [0.235, 0.109]. The Mardia skewness
and kurtosis values, calculated on the cleaned data using (3) and (4), are equal to 0.135 and 8.162, respectively.
          </p>
        </sec>
        <sec id="sec-6-1-2">
          <p>
            The test statistics for β<sub>1,k</sub> and β<sub>2,k</sub> are within the critical thresholds, indicating no significant deviation
from multivariate normality based on the comparisons (6) and (7): 6.75 ≤ 14.86 and
0.351 ≤ 2.57.
          </p>
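          <p>The prediction ellipse for the normalized data is defined by the following mean vector and covariance
matrix:</p>
          <disp-formula>
            <tex-math><![CDATA[\bar{Z} = \begin{bmatrix} 1.98 \\ 2.82 \end{bmatrix}, \quad S_Z = \begin{bmatrix} 0.82 & 0.32 \\ 0.32 & 0.31 \end{bmatrix},]]></tex-math>
          </disp-formula>
          <p>and has the following form:</p>
          <disp-formula>
            <tex-math><![CDATA[\frac{(Z_1 - 1.98)^2}{0.82} + \frac{(Z_2 - 2.82)^2}{0.31} - \frac{0.64 (Z_1 - 1.98)(Z_2 - 2.82)}{0.32} = 6.54.]]></tex-math>
          </disp-formula>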
          <p>These values were used to build a prediction ellipse equation for normalized data from RFC/NCL
and CBO/NCL metrics. The constructed prediction ellipse is presented in Figure 1.</p>
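          <p>
            Based on the approach described by formula (10), it is possible to build the transformed prediction
ellipse for the initial data:
          </p>
          <disp-formula>
            <tex-math><![CDATA[\frac{(\psi_1(X_1) - 1.98)^2}{0.82} + \frac{(\psi_2(X_2) - 2.82)^2}{0.31} - \frac{0.64 (\psi_1(X_1) - 1.98)(\psi_2(X_2) - 2.82)}{0.32} = 6.54.]]></tex-math>
          </disp-formula>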
        </sec>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Test points</title>
        <p>A sample dataset of code metrics was collected from 26 different Java applications that are
hosted on the GitHub platform [25] and did not participate in the construction of this model.
The goal is to determine whether each project falls within the expected boundaries of design
complexity defined by the previously built prediction ellipse or should be considered an outlier.</p>
        <p>These applications vary in size and complexity, ranging from lightweight utility tools and
educational examples to full-featured frameworks and production-grade systems. Projects were
chosen to reflect a diversity of architectural styles, developer practices, and application scopes within
the Java ecosystem.</p>
        <p>To test the model, it is necessary to use data within the same range as the data used to build the
model. In the original dataset, there was a project with a CBO/NCL value of 0.2, but it was removed
as an outlier. In the cleaned data and constructed model, the minimum CBO/NCL value is 0.696, so
the test data with CBO/NCL values of 0.25 and 0.5 cannot be estimated using the constructed model.
To account for such low CBO/NCL values, a separate model needs to be built. The test data are shown
in Table 3 and Figures 3 and 4.</p>
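        <p>The applicability check described above (a project can only be evaluated if its metrics lie within the range of the data used to build the model) can be expressed as a simple guard. The sketch below is illustrative: the function and constant names are hypothetical, the bounds are the ranges reported in the Conclusion, and it is assumed that the RFC and CBO bounds refer to the per-class values RFC/NCL and CBO/NCL, since the lower CBO bound 0.696 coincides with the minimum CBO/NCL value quoted above.</p>
        <preformat><![CDATA[
```python
# Ranges of the cleaned data used to build the model (values from the paper;
# interpreting RFC and CBO as per-class values is an assumption).
MODEL_RANGES = {
    "NCL": (4, 7874),
    "RFC/NCL": (3, 37.278),
    "CBO/NCL": (0.696, 22.417),
}


def in_model_range(ncl, rfc_per_class, cbo_per_class, ranges=MODEL_RANGES):
    """True if all three values lie inside the ranges covered by the model."""
    values = {"NCL": ncl, "RFC/NCL": rfc_per_class, "CBO/NCL": cbo_per_class}
    return all(ranges[name][0] <= value <= ranges[name][1]
               for name, value in values.items())
```
]]></preformat>
        <p>With these bounds, test projects with CBO/NCL values of 0.25 or 0.5 are rejected before the ellipse is evaluated, which mirrors the restriction stated in the text.</p>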
        <p>Project GitHub URL: Anshu231/bookstore.git; emrekcse/calculator-with-android-studioJava.git;
google/cel-java.git; maringallien/Chat-App.git; mc4/chess-ai.git; wasabeef/glide-transformations;
glygener/glygen.cfde.generator.git. NCL: 26.</p>
        <p>All of the analyzed projects that fall within the boundaries of the transformed prediction ellipse lie
towards the left side of the ellipse, with no entries on the right side. However, the test data used
have ranges of NCL [4, 7874], RFC [3.642, 17.255], and CBO [1, 8.912] that correlate with
its whole range.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>To detect outliers in the two-dimensional data of the software metrics RFC and CBO, we have proposed applying
a mathematical model in the form of a transformed prediction ellipse based on their normalization
using the bivariate Box-Cox transformation. The number of classes is not used directly, but indirectly
through RFC divided by NCL and CBO divided by NCL. The practical application of this model
is detecting outliers in the two-dimensional data of the software metrics RFC and CBO from applications
in Java.</p>
      <p>An advantage of this model is that it can be used for object-oriented metrics, such as RFC and
CBO, that do not follow a normal distribution.</p>
      <p>A disadvantage of this model is that it can be applied only to Java applications.</p>
      <p>The limitations of this model are that it applies only to the
object-oriented metrics RFC and CBO and can be used for the following ranges of metric values: NCL [4, 7874],
RFC [3, 37.278], CBO [0.696, 22.417]. These limitations are due to the range of metric values that
were used to build the model.</p>
      <p>Moving forward, we plan to develop a mathematical model in the form of a transformed
prediction ellipse for detecting outliers in the two-dimensional data of the software metrics RFC and
CBO that is not limited by the programming language or the sample size.</p>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used OpenAI ChatGPT-4 in order to: Text
translation; Grammar and spelling check. After using this tool/service, the authors reviewed and
edited the content as needed and take full responsibility for the publication's content.</p>
      <p>[18] T. Filó, M. Bigonha, K. Ferreira, A Catalog of Thresholds for Object-Oriented Software Metrics,
in: Proceedings of the First International Conference on Advances and Trends in Software
Engineering, 2015, pp. 48–55.</p>
      <p>[19] I. Turnu, G. Concas, M. Marchesi, R. Tonelli, Entropy of some CK metrics to assess
object-oriented software quality, International Journal of Software Engineering and Knowledge
Engineering 23(2) (2013) 173–188. doi: 10.1142/S0218194013500034.</p>
      <p>[20] M. Friendly, G. Monette, J. Fox, Elliptical Insights: Understanding Statistical Methods Through
Elliptical Geometry, Statistical Science 28 (2013) 1–39. doi: 10.1214/12-STS402.</p>
      <p>[21] F. Golestaneh, P. Pinson, R. Azizipanah-Abarghooee, H.B. Gooi, Ellipsoidal Prediction Regions
for Multivariate Uncertainty Characterization, IEEE Transactions on Power Systems 33(4), 2018,
pp. 4519–4530. doi: 10.1109/TPWRS.2018.2791975.</p>
      <p>[22] S. Prykhodko, N. Prykhodko, L. Makarova, A. Pukhalevych, Outlier Detection in Non-Linear
Regression Analysis Based on the Normalizing Transformations, in: IEEE 15th International
Conference on Advanced Trends in Radioelectronics, Telecommunications and Computer
Engineering (TCSET), 2020, pp. 407–410. doi: 10.1109/TCSET49122.2020.235464.</p>
      <p>[23] S. Prykhodko, L. Makarova, K. Prykhodko, A. Pukhalevych, Application of Transformed
Prediction Ellipsoids for Outlier Detection in Multivariate Non-Gaussian Data, in: Proceedings
of the IEEE 15th International Conference on Advanced Trends in Radioelectronics,
Telecommunications and Computer Engineering (TCSET), 2020, pp. 359–362. doi:
10.1109/TCSET49122.2020.235454.</p>
      <p>[24] K.V. Mardia, Measures of multivariate skewness and kurtosis with applications, Biometrika,
volume 57, 1970, pp. 519–530. doi: 10.1093/biomet/57.3.519.</p>
      <p>[25] Build and ship software on a single, collaborative platform, 2025. URL: https://github.com.</p>
      <p>[26] SourceMeter for Java, 2025. URL: https://sourcemeter.com.</p>
      <p>[27] S. Prykhodko, L. Makarova, A. Pukhalevych, Statistical Analysis of the Three-Dimensional Data
of Software Metrics RFC, CBO, and WMC that are not Normally Distributed, in: Proceedings of
International Conference on Applied Innovation in IT, vol. 13, issue 1, 2025, pp. 127–132. doi:
10.25673/119254.</p>
      <p>[28] O. Oriekhov, T. Farionova, L.S. Chernova, L. Chernova, M. Vorona, Nonlinear regression models
for software size estimation of Data Science and Machine Learning Java-applications, in:
Proceedings of the 5th International Workshop IT Project Management (ITPM), 2024; CEUR
Workshop Proceedings, Vol. 3709, 2024, pp. 54–66. doi: 10.23939/IW_itpm2024.054.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mendez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Avgeriou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kalinowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>bin Ali</surname>
          </string-name>
          (Eds.),
          <source>Handbook on teaching empirical software engineering</source>
          . Cham: Springer,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Anil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Manjari</surname>
          </string-name>
          ,
          <article-title>Software Metrics Selection for Fault Prediction: A Review</article-title>
          ,
          <source>International Journal of Management, Technology and Engineering</source>
          <volume>8</volume>
          (
          <year>2018</year>
          )
          <fpage>1267</fpage>–<lpage>1283</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.M.</given-names>
            <surname>NezhadShokouhi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.A.</given-names>
            <surname>Majidi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rasoolzadegan</surname>
          </string-name>
          ,
          <article-title>Software defect prediction using oversampling and feature extraction based on Mahalanobis distance</article-title>
          ,
          <source>The Journal of Supercomputing</source>
          <volume>76</volume>
          (
          <year>2020</year>
          )
          <fpage>602</fpage>–<lpage>635</lpage>. doi:
          <pub-id pub-id-type="doi">10.1007/s11227-019-03051-w</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Y.S.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.H.</given-names>
            <surname>Bae</surname>
          </string-name>
          ,
          <article-title>On the value of outlier elimination on software effort estimation research</article-title>
          ,
          <source>Empirical Software Engineering</source>
          <volume>18</volume>
          (
          <year>2013</year>
          )
          <fpage>659</fpage>–<lpage>698</lpage>. doi:
          <pub-id pub-id-type="doi">10.1007/s10664-012-9207-y</pub-id>.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.R.</given-names>
            <surname>Chidamber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.F.</given-names>
            <surname>Kemerer</surname>
          </string-name>
          ,
          <article-title>Towards a metrics suite for object oriented design</article-title>
          ,
          <source>ACM SIGPLAN Notices</source>
          <volume>26</volume>(<issue>11</issue>)
          (
          <year>1991</year>
          )
          <fpage>197</fpage>–<lpage>211</lpage>. doi: 10.1145/118014.117970.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suresh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. Ku</given-names>
            <surname>Rath</surname>
          </string-name>
          ,
          <article-title>Effectiveness of software metrics for object-oriented system</article-title>
          ,
          <source>Procedia Technology</source>
          <volume>6</volume>
          (
          <year>2012</year>
          )
          <fpage>420</fpage>–<lpage>427</lpage>. doi: 10.1016/j.protcy.2012.10.050.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Shatnawi</surname>
          </string-name>
          ,
          <article-title>Empirical study of fault prediction for open-source systems using the Chidamber and Kemerer metrics</article-title>
          ,
          <source>IET Software</source>
          <volume>8</volume>(<issue>3</issue>)
          (
          <year>2014</year>
          )
          <fpage>113</fpage>–<lpage>119</lpage>. doi: 10.1049/iet-sen.2013.0008.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>A</surname>
          </string-name>
          .-
          <volume>1172</volume>
          (
          <year>2020</year>
          )
          <fpage>163</fpage>–<lpage>187</lpage>. doi: 10.1007/978-3-030-40223-5_8.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>I.M.A.</given-names>
            <surname>Wikantyasa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.P.</given-names>
            <surname>Kurniawan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rochimah</surname>
          </string-name>
          ,
          <article-title>CK Metric and Architecture Smells Relations: Towards Software Quality Assurance</article-title>
          ,
          <source>in: Proceedings of the 14th International Conference on Information and Communication Technology and System (ICTS)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>13</fpage>–<lpage>17</lpage>. doi: 10.1109/ICTS58770.2023.10330874.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <article-title>Software Code Quality Measurement: Implications from Metric Distributions</article-title>
          ,
          <source>in: Proceedings of the IEEE International Conference on Software Quality, Reliability and Security, QRS</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>488</fpage>–<lpage>496</lpage>. doi: 10.1109/QRS60937.2023.00054.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Härtel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lämmel</surname>
          </string-name>
          ,
          <article-title>Operationalizing validity of empirical software engineering studies</article-title>
          ,
          <source>Empirical Software Engineering</source>
          <volume>28</volume>(<issue>6</issue>)
          (
          <year>2023</year>
          )
          153. doi: 10.1007/s10664-023-10370-3.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.W.</given-names>
            <surname>Wichern</surname>
          </string-name>
          ,
          <source>Applied Multivariate Statistical Analysis</source>
          , New Jersey: Pearson Prentice Hall,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>J.W.</given-names>
            <surname>Osborne</surname>
          </string-name>
          ,
          <source>Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data</source>
          , SAGE Publications, Inc.,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>T.</given-names>
            <surname>Etherington</surname>
          </string-name>
          ,
          <article-title>Mahalanobis distances for ecological niche modelling and outlier detection: implications of sample size, error, and bias for selecting and parameterising a multivariate location and scatter method</article-title>
          ,
          <source>PeerJ</source>
          <volume>9</volume>
          (
          <year>2021</year>
          )
          e11436. doi: 10.7717/peerj.11436.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>S.</given-names>
            <surname>Prykhodko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Prykhodko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Makarova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Pugachenko</surname>
          </string-name>
          ,
          <article-title>Detecting outliers in multivariate non-Gaussian data on the basis of normalizing transformations</article-title>
          ,
          <source>in: Proceedings of the IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>846</fpage>–<lpage>849</lpage>. doi: 10.1109/UKRCON.2017.8100366.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.G.</given-names>
            <surname>Al-Obeidallah</surname>
          </string-name>
          ,
          <article-title>The Impact of Design Patterns on Software Maintainability and Understandability: A Metrics-based Approach</article-title>
          ,
          <source>ICIC Express Letters, Part B: Applications</source>
          <volume>12</volume>(<issue>12</issue>)
          (
          <year>2021</year>
          )
          <fpage>1111</fpage>–<lpage>1119</lpage>. doi: 10.24507/icicelb.12.12.1111.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Monika</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Prediction of Fault-Proneness using CK Metrics</article-title>
          ,
          <source>International Journal of Computer Science and Information Technology Research</source>
          <volume>4</volume>(<issue>3</issue>)
          (
          <year>2016</year>
          )
          <fpage>114</fpage>–<lpage>118</lpage>.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>