Study of the Profitability of the Enterprise Based on Unsupervised Machine Learning

Mariia Nazarkevych1,2, Hanna Nazarkevych1, Roman Moravskyi1, Maryna Kostiak1, and Oleksii Shevchuk3

1 Lviv Polytechnic National University, 12 Stepan Bandera str., Lviv, 79013, Ukraine
2 Lviv Ivan Franko National University, 1 Universytetska str., Lviv, 79000, Ukraine
3 Ukrainian Academy of Printing, 19 Pid Goloskom str., Lviv, 79020, Ukraine

Abstract
Machine learning methods used for predicting and adaptively managing the sales of an enterprise's goods are analyzed. Input data obtained during the operation of one enterprise are examined. Using these data, machine learning models are trained and simulated with the K-Nearest Neighbors and Support Vector Machine classifiers. The most significant factors influencing the client's decision to purchase goods are identified. The study proposes a business-process scenario for increasing the company's profit based on machine learning technology. The performance of the proposed methods is verified on a test sample of data.

Keywords
Unsupervised learning methods, adaptive management, enterprise.

CPITS-2022: Cybersecurity Providing in Information and Telecommunication Systems, October 13, 2022, Kyiv, Ukraine
EMAIL: mariia.a.nazarkevych@lpnu.ua (M. Nazarkevych); hanna.ya.nazarkevych@lpnu.ua (H. Nazarkevych); roman.moravskyi.mnpzm2021@lpnu.ua (R. Moravskyi); kostiak.maryna@lpnu.ua (M. Kostiak); uad.sow@gmail.com (O. Shevchuk)
ORCID: 0000-0002-6528-9867 (M. Nazarkevych); 0000-0002-1413-630X (H. Nazarkevych); 0000-0002-6695-1920 (R. Moravskyi); 0000-0002-6667-7693 (M. Kostiak); 0000-0002-3200-7317 (O. Shevchuk)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

In an era where organizations increasingly rely on data to run their business, the ability of companies to manage and analyze data has a significant impact on their performance. In this article, we develop a model to measure the current level of maturity of an enterprise's data and analytics systems, so that they can be optimized to maximize the potential of the enterprise's data assets. Based on the three critical components of people, technology, and process, we take the requirements of the enterprise as a starting point. After a comprehensive review of the current situation in the industry and of the needs of the enterprise, a methodology for determining profitability was built [1].

Profitability [2] is the ratio of profit to costs, expressed as a percentage. Profitability is a relative indicator, and it is necessary for the analysis of the economic activity of any enterprise [3–5].

Let us consider the main indicators of the company's profitability:
1. Profitability of invested funds [6], which reflects the general level of profitability of the enterprise and equals the ratio of the gross profit of the enterprise to the total production cost.
2. Profitability of production assets [7], which equals the ratio of the gross profit of the enterprise to the average value of its production assets.
3. Return on total assets [8], the ratio of the company's gross profit to the average amount of assets on the company's balance sheet.
4. Return on equity [9], the ratio of the company's net profit to the amount of equity.
5. Profitability of production [10], which equals the ratio of the total cost of goods sold to the volume of sales.

One of the fundamental premises in the field of artificial intelligence is the assumption that it is possible to create machines capable of performing tasks that usually require human intelligence [11].
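The profitability indicators listed above all reduce to simple ratios. As a quick illustration, a minimal sketch with purely hypothetical figures (none of the names or numbers below come from the paper's dataset) might look like this:

```python
# Hypothetical financial figures (not from the paper's dataset).
gross_profit = 250_000.0
net_profit = 180_000.0
total_production_cost = 1_000_000.0
average_total_assets = 2_000_000.0
equity = 900_000.0
cost_of_goods_sold = 750_000.0
sales_volume = 1_250_000.0

# Profitability of invested funds: gross profit / total production cost.
roi_invested = gross_profit / total_production_cost * 100
# Return on total assets: gross profit / average assets on the balance sheet.
roa = gross_profit / average_total_assets * 100
# Return on equity: net profit / equity.
roe = net_profit / equity * 100
# Profitability of production: total cost of goods sold / sales volume.
prod_ratio = cost_of_goods_sold / sales_volume * 100

print(f"ROI: {roi_invested:.1f}%  ROA: {roa:.1f}%  "
      f"ROE: {roe:.1f}%  Production: {prod_ratio:.1f}%")
```

Each indicator is expressed as a percentage, matching the definition of profitability as a relative indicator given above.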
The artificial neural network is designed to simulate learning processes in the human brain. Artificial neural networks are designed in such a way that they can recognize basic patterns (regularities and stable relationships hidden in data) and learn from them [12]. They can be used to solve classification, regression, and data segmentation problems [13]. Before data are fed to a neural network, they must be converted into numerical form; this applies to data of various natures, including visual and textual data, time series, etc. Decisions have to be made about how tasks should be presented so that they are understandable to neural networks.

The practical implementation of material objects involves solving the corresponding synthesis problems. At the same time, the mathematical model of the object should not only adequately describe the physical processes that ensure the necessary initial characteristics of this object but also allow the optimization process itself to be implemented. The software currently used to solve such problems is characterized either by narrow specialization or by a universal calculation engine that requires an unacceptably long calculation time for complex objects. One option for implementing fast and flexible methods of solving synthesis problems is to use the approximation properties of artificial neural networks. Most research related to artificial neural networks focuses on combinatorial optimization and forecasting problems; therefore, research on the application of neural networks is relevant.

2. Stages of Developing a Machine Learning Model

The process of developing a machine learning model [14] consists of the following stages:
• Data preparation and presentation.
• Algorithm design.
• Training the algorithm on the available data.
• Validating the algorithm on test data.

A training sample [15] is a set of examples that we show the system so that, based on them, it uncovers a hidden regularity that is responsible for the distribution of the data in the training sample. Thanks to the discovery of such a regularity, the system is able to predict answers effectively on the test sample.

A neural network [16] is a complex function with a huge number of parameters. Each of these parameters (called weights) is adapted so that the function approximates the distribution of the data in the sample. Very often, the number of these parameters is much larger than the size of the data description.

Recognition and processing of text, human language, music, images, 3D objects, and tabular data; object detection in photos and videos; even the creation of texts, images, and videos: all this and much more belongs to the practical applications of neural networks.

3. Construction of Neural Networks

3.1. Mathematical Model

The perceptron is a mathematical model proposed by F. Rosenblatt, described by the transformation R^m → R^n given by the formula

    v_j = Σ_{i=1..m} ω_ij x_i,  j = 1, …, n,

where ω_ij are the perceptron weights and x_i are the values of the input signals. After the value v is obtained, the activation function f is applied to it; the resulting value is compared with the trigger threshold θ, and a decision is made.

Perceptron learning consists of finding the weight coefficients. Let there be a set of pairs of vectors (x^α, y^α), α = 1, …, p, which is called a training sample. We will call a neural network trained on a given training sample if, when each vector x^α is applied to the inputs of the network, we obtain the corresponding vector y^α at the outputs.

The learning method proposed by Rosenblatt [17] consists of iterative adjustment of the weight matrix that successively reduces the error in the output vectors. The algorithm has several steps:

Step 1. The initial values of all neuron weights W(t = 0) are set randomly.
Step 2. An input image x^α is presented, and the network produces an output image ỹ^α ≠ y^α.
Step 3. The error vector δ^α = (y^α − ỹ^α) at the network output is calculated. The change in the vector of weighting coefficients in the region of small errors must be proportional to the output error and equal to zero if the error is zero.
Step 4. The vector of weights is modified by the formula

    W(t + Δt) = W(t) + η x^α (δ^α)^T,

where 0 < η < 1 is the learning rate.
Step 5. Steps 2–4 are repeated for all training vectors. One cycle of sequential presentation of the entire sample is called an epoch. Training ends after several epochs: (a) when the iterations converge, that is, the weight vector stops changing, or (b) when the total absolute error over all vectors becomes smaller than some small value.

3.2. Optimization of Neural Network Parameters

One of the simplest gradient descent optimization algorithms looks like this:

    w_{p+1} = w_p − η ∇Cost(w_p).

We also argue that the Nesterov algorithm is the optimal algorithm for finding the minimum [18]:

    Δ_{p+1} = γ Δ_p + η ∇Cost(w_p − γ Δ_p),
    w_{p+1} = w_p − Δ_{p+1}.

A kind of look-ahead operation is used here to find the gradient; therefore, the method is sometimes called the accelerated Nesterov gradient. However, although the proven upper estimate of the convergence time for this algorithm is minimal, it has not found wide practical application. The point is that the upper estimate does not always coincide with the average one, and for this algorithm the average convergence-time estimate is close to the upper estimate; therefore, the algorithm often works slowly.

3.3. Creation of a Neural Network

Human learning proceeds hierarchically. In the neural network of our brain, this process is carried out in several stages, each characterized by its own degree of learning: at some stages simple things are learned, at others more complex ones.

To simulate the human learning process, layers of neurons are used in the construction of artificial neural networks. The idea of these neurons is suggested by biological processes. Each layer of an artificial neural network is a set of independent neurons, and each neuron of a given layer is connected to the neurons of an adjacent layer.

Deep learning is based on complex systems consisting of a huge number of neurons. One neural network can have billions of such structural units as neurons or perceptrons. Accordingly, there are many ways to structure them, and depending on how they are connected, different neural network architectures can be distinguished.

3.4. Neural Network Training

If we are dealing with n-dimensional input data, the input layer will consist of n neurons. If m different classes are distinguished among our training data, the output layer will consist of m neurons. The layers nested between the input and output layers are called hidden. A simple neural network consists of a pair of layers, while a deep neural network consists of many layers.

Consider the case when we want to use a neural network for data classification. The first step is to collect relevant training data and label it. Each neuron acts as a simple function, and the neural network is trained until the error is less than a certain set value. The difference between the predicted and actual outputs is mostly used as the error. Based on how big the error is, the neural network corrects itself and retrains until it gets close to the solution.

3.5. Creating a Classifier Based on a Perceptron

A perceptron is a single neuron that receives input data, performs calculations on it, and outputs a signal. The perceptron uses a simple function to make decisions. Suppose we are dealing with an N-dimensional input data point. The perceptron calculates a weighted sum of the N numbers and then adds a constant to obtain the output result. This constant is called the bias of the neuron. It is interesting to note that such simple perceptrons are used to build very complex deep neural networks.

3.6. Construction of a Single-Layer Neural Network

The capabilities of a single perceptron are limited. To achieve a goal, a set of neurons must be made to act as one. Let us create a single-layer neural network consisting of independent neurons that act on the input data to obtain the output result.

3.7. Building a Multilayer Neural Network

To obtain higher accuracy, we must give the neural network more freedom. This means that the neural network must have more than one layer in order to capture the underlying patterns that exist in the data. Let us create a multilayer neural network that will ensure this.

4. Deployment of Machine Learning

We will use the created model for a forecasting task: predicting the probability that the enterprise will be profitable and will have profits in the future.

The dataset used in this study is taken from the file ML_Manufacture_Prifit_Companies.csv. It contains several independent predictors and one target: whether the enterprise is profitable. Its features are as follows: rating, change in rating, income, management assets, market value, percentage change in income, percentage change in profit, and number of employees.
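Since the CSV file itself is not distributed with the paper, the loading and inspection step can be illustrated with a synthetic stand-in built from the feature names listed above (all column names and values below are invented for the sketch):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100  # small stand-in; the paper's file holds on the order of a thousand records

# Synthetic frame using the feature names the paper lists (values are invented).
df = pd.DataFrame({
    "rating": rng.integers(1, 1000, n),
    "rating_change": rng.integers(-50, 50, n),
    "income": rng.uniform(1e6, 1e9, n),
    "management_assets": rng.uniform(1e6, 1e9, n),
    "market_value": rng.uniform(1e6, 1e10, n),
    "income_change_pct": rng.normal(5, 20, n),
    "profit_change_pct": rng.normal(3, 25, n),
    "employees": rng.integers(50, 100_000, n),
})
df.loc[3, "income"] = np.nan  # simulate one missing value

clean = df.dropna()  # drop rows with missing values before modeling
print(df.shape, "->", clean.shape)
```

With the real file, the construction of `df` would be replaced by `pd.read_csv("ML_Manufacture_Prifit_Companies.csv")`; the inspection and cleaning calls stay the same.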
The dataset contains 1,001 records, and all the businesses in it are successful and profitable.

A neural network can be used as a classifier or as a regressor [19, 20]. For this example, we define a multilayer neural network with two hidden layers; the network could be designed in any other way. We will have 10 neurons in the first hidden layer and six neurons in the second. Our task is to predict a single value, so the output layer contains only one neuron.

5. Downloading Data

For this example, the dataset was downloaded locally and named ML_Manufacture_Prifit_Companies.csv (Fig. 1). We observe the columns with the following data (Fig. 2). The first task for estimating a company's profit is to clean the data so that there are no missing or erroneous values (Figs. 3 and 4) [21–23].

Figure 1: Download dataset
Figure 2: Data cleaning
Figure 3: Cleaned and transformed data
Figure 4: Visualization of the top 10 companies by revenue

6. Study of the Relationship Between Features

The next step is to study how the different independent characteristics affect the result, that is, which characteristics affect the profitability of the enterprise. The corr() function calculates the pairwise correlation of the columns.

7. Construction of a Graph of the Relationship between Features

The following code snippet uses the matshow() function to plot the results returned by the corr() function as a matrix; the individual correlation coefficients are also displayed in the matrix. Fig. 5 shows the matrix.
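The snippet itself appears in the paper only as a figure, so the following is a sketch of what the corr() and matshow() steps might look like; the DataFrame here is a synthetic stand-in with illustrative column names, not the paper's data:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Illustrative numeric frame; in the paper this is the cleaned company data.
df = pd.DataFrame(rng.normal(size=(200, 4)),
                  columns=["income", "market_value", "management_assets", "employees"])
df["profit"] = 0.7 * df["income"] + 0.3 * rng.normal(size=200)

corr = df.corr()  # pairwise correlation of the columns

fig, ax = plt.subplots()
cax = ax.matshow(corr.values, cmap="coolwarm")  # matrix view of the coefficients
fig.colorbar(cax)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
for (i, j), v in np.ndenumerate(corr.values):
    ax.text(j, i, f"{v:.2f}", ha="center", va="center")  # show each coefficient
fig.savefig("corr_matrix.png")
```

The Seaborn alternative mentioned below amounts to a single call, `seaborn.heatmap(corr, annot=True)`, which draws the same matrix with the coefficients annotated.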
Cubes with colors closest to black represent the highest correlation coefficients, and those closest to blue represent the lowest. For example, the matrix shows that the level of management assets has little relationship with the market value of the enterprise (0.127) but a significant relationship with the profitability of the firm (0.49). In this way, we find out which features significantly affect the result.

Another way to construct a correlation matrix is to use Seaborn's heatmap() function, which produces a heat map (Fig. 6). Let us single out the four features of the enterprises that have the highest correlation with the result (Figs. 7–9).

Figure 5: Matrix showing various correlation factors
Figure 6: Visualization of the top 10 companies with the largest profit in relation to the number of employees
Figure 7: Visualization of the top 10 companies with the largest income per number of employees
Figure 8: Graph of dependence of the Inertia metric on the number of clusters
Figure 9: Found clusters of companies

8. Evaluation of Algorithms

We will evaluate several algorithms to find the one that provides the best performance. We will use the following methods:
• Logistic regression.
• K-Nearest Neighbors (KNN).
• Support Vector Machines (SVM) with linear and RBF kernels.

8.1. Logistic Regression

For the first method, we use logistic regression. Instead of splitting the dataset into training and testing sets, we use 10-fold cross-validation to obtain the average score of the algorithm.

8.2. K-Nearest Neighbours

The next method is K-Nearest Neighbors (KNN). In addition to using 10-fold cross-validation to average the algorithm's score, different values of k should be tried to find the optimal one so that the best accuracy can be obtained.
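This evaluation protocol can be sketched with scikit-learn's cross_val_score, assuming a scikit-learn workflow (which the paper's corr()/matshow() usage suggests). The data here are a synthetic stand-in generated with make_classification, and cv=10 matches the 10-fold protocol described above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the company dataset (8 features, binary target).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-rbf": SVC(kernel="rbf"),
}
# Try several values of k for KNN, as suggested above.
for k in (1, 3, 5, 7):
    models[f"KNN-{k}"] = KNeighborsClassifier(n_neighbors=k)

# 10-fold cross-validation; each score is the mean accuracy over the folds.
scores = {name: cross_val_score(m, X, y, cv=10).mean() for name, m in models.items()}
best = max(scores, key=scores.get)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {s:.3f}")
```

On the paper's data, this loop is what would justify the choice of KNN with k = 3 reported in the next section; on the synthetic data above, the ranking will of course differ.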
8.3. Support Vector Machines

Another method we use is the Support Vector Machine (SVM). We use two types of kernels for the SVM: linear and RBF.

9. Learning and Saving the Model

Since the most efficient algorithm for our dataset is KNN with k = 3, we can now train the model on the entire dataset. After training, we save the model to a file so that it can be loaded for prediction later.

After loading the model, let us make some predictions. The output prints "Profitable Enterprise" if the return value of the prediction is 1; otherwise, "Unprofitable Enterprise" is printed. It is also necessary to obtain the probability of the forecast in order to express it as a percentage (Figs. 10–12). The printed probabilities show the probability of outcome 0 and the probability of outcome 1. The prediction is based on the class with the highest probability, and we take that probability and convert it to a confidence percentage.

Figure 10: Example of data from companies from cluster 0
Figure 11: Example of data from companies from cluster 1
Figure 12: Example of data from companies from cluster 2

10. Conclusions

In this study, a dataset of the top 1,000 profitable companies was selected, and unsupervised machine learning was applied. The companies were divided into three clusters based on revenue, earnings, current assets, market value, and number of employees. Clustering was implemented with the KMeans algorithm, and the Inertia metric was used to determine the optimal number of clusters. Inertia is the sum of the distances of the points to their cluster centers: if the total distance is large, the points are far apart and may be less similar to each other. In that case, one can continue to evaluate larger values of K to see whether the overall distance can be reduced. However, it is not always wise simply to minimize this distance. Using the elbow (bend) method, we can choose 2 or 3 as the number of clusters (the first or second obvious bend).

11. References

[1] T. Zhang, et al., Enterprise Digital Transformation and Production Efficiency: Mechanism Analysis and Empirical Research, Economic Research-Ekonomska Istraživanja, vol. 35, no. 1, 2022, pp. 2781–2792.
[2] S. Jahan, et al., Modeling Profitability-Influencing Risk Factors for Construction Projects: A System Dynamics Approach, Buildings, vol. 12, no. 6, 2022, pp. 701.
[3] H. Hulak, et al., Formation of Requirements for the Electronic Record-Book in Guaranteed Information Systems of Distance Learning, in Workshop on Cybersecurity Providing in Information and Telecommunication Systems, CPITS 2021, vol. 2923, 2021, pp. 137–142.
[4] V. Buriachok, V. Sokolov, Implementation of Active Learning in the Master's Program on Cybersecurity, in Advances in Computer Science for Engineering and Education II, vol. 938, 2020, pp. 610–624. doi: 10.1007/978-3-030-16621-2_57.
[5] Z. B. Hu, et al., Authentication System by Human Brainwaves Using Machine Learning and Artificial Intelligence, in Advances in Computer Science for Engineering and Education IV, 2021, pp. 374–388. doi: 10.1007/978-3-030-80472-5_31.
[6] A. I. Kato, C. P. E. Germinah, Empirical Examination of Relationship between Venture Capital Financing and Profitability of Portfolio Companies in Uganda, Journal of Innovation and Entrepreneurship, vol. 11, no. 1, 2022, pp. 1–18.
[7] E. Filatov, Analysis of Profitability of Production of Enterprises in the Field of Transportation and Storage of the Irkutsk Region, Transportation Research Procedia, vol. 63, 2022, pp. 518–524.
[8] H. Habibniya, et al., Impact of Capital Structure on Profitability: Panel Data Evidence of the Telecom Industry in the United States, Risks, vol. 10, no. 8, 2022, pp. 157.
[9] V. V. Stipic, V. Ruzic, Panel VAR Analysis of the Interdependence of Capital Structure and Profitability, Economic and Social Development: Book of Proceedings, 2022, pp. 177–187.
[10] E. A. Virtanen, et al., Balancing Profitability of Energy Production, Societal Impacts and Biodiversity in Offshore Wind Farm Design, Renewable and Sustainable Energy Reviews, vol. 158, 2022, pp. 112087.
[11] X. Chen, P. Geyer, Machine Assistance in Energy-Efficient Building Design: A Predictive Framework Toward Dynamic Interaction with Human Decision-Making under Uncertainty, Applied Energy, vol. 307, 2022, pp. 118240.
[12] G. V. S. Bhagya Raj, K. K. Dash, Comprehensive Study on Applications of Artificial Neural Network in Food Process Modeling, Critical Reviews in Food Science and Nutrition, vol. 62, no. 10, 2022, pp. 2756–2783.
[13] F. Moosavi, et al., Application of Machine Learning Tools for Long-Term Diagnostic Feature Data Segmentation, Applied Sciences, vol. 12, no. 13, 2022, pp. 6766.
[14] J. Siebert, et al., Construction of a Quality Model for Machine Learning Systems, Software Quality Journal, vol. 30, no. 2, 2022, pp. 307–335.
[15] C. Zhang, et al., Mapping Irrigated Croplands in China using a Synergetic Training Sample Generating Method, Machine Learning Classifier, and Google Earth Engine, International Journal of Applied Earth Observation and Geoinformation, vol. 112, 2022, pp. 102888.
[16] O. I. Abiodun, et al., State-of-the-Art in Artificial Neural Network Applications: A Survey, Heliyon, vol. 4, no. 11, 2018, pp. e00938.
[17] E. Kussul, et al., Rosenblatt Perceptrons for Handwritten Digit Recognition, in International Joint Conference on Neural Networks, vol. 2, 2001, pp. 1516–1520.
[18] Q. Lü, et al., A Nesterov-Like Gradient Tracking Algorithm for Distributed Optimization over Directed Networks, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 10, 2020, pp. 6258–6270.
[19] M. Medykovskyy, et al., Methods of Protection Document Formed from Latent Element Located by Fractals, in X International Scientific and Technical Conference "Computer Sciences and Information Technologies," 2015, pp. 70–72.
[20] M. Nazarkevych, et al., Application Perfected Wave Tracing Algorithm, in IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), 2017, pp. 1011–1014.
[21] M. Nazarkevych, The Ateb-Gabor Filter for Fingerprinting, in International Conference on Computer Science and Information Technology, 2019, pp. 247–255. doi: 10.1007/978-3-030-33695-0_18.
[22] M. Logoyda, et al., Identification of Biometric Images using Latent Elements, CEUR Workshop Proceedings, 2019.
[23] M. Nazarkevych, et al., The Ateb-Gabor Filter for Fingerprinting, in Conference on Computer Science and Information Technologies, 2019, pp. 247–255.