Study of the Profitability of the Enterprise Based on Unsupervised Machine Learning

Mariia Nazarkevych1,2, Hanna Nazarkevych1, Roman Moravskyi1, Maryna Kostiak1, and Oleksii Shevchuk3

1 Lviv Polytechnic National University, 12 Stepan Bandera str., Lviv, 79013, Ukraine
2 Lviv Ivan Franko National University, 1 Universytetska str., Lviv, 79000, Ukraine
3 Ukrainian Academy of Printing, 19 Pid Goloskom str., Lviv, 79020, Ukraine

Abstract
Machine learning methods used for predicting and adaptively managing the sales of an enterprise's goods are analyzed. Input data obtained during the operation of one enterprise are examined. Using these data, machine learning models are trained and simulated with the K-Nearest Neighbors and Support Vector Machine classifiers. The most significant factors influencing the client's decision to purchase goods are identified. The study proposes a business-process scenario for increasing the company's profit based on machine learning technology. The performance of the proposed methods is verified on a test sample of data.

Keywords
Unsupervised learning methods, adaptive management, enterprise.

CPITS-2022: Cybersecurity Providing in Information and Telecommunication Systems, October 13, 2022, Kyiv, Ukraine
EMAIL: mariia.a.nazarkevych@lpnu.ua (M. Nazarkevych); hanna.ya.nazarkevych@lpnu.ua (H. Nazarkevych); roman.moravskyi.mnpzm2021@lpnu.ua (R. Moravskyi); kostiak.maryna@lpnu.ua (M. Kostiak); uad.sow@gmail.com (O. Shevchuk)
ORCID: 0000-0002-6528-9867 (M. Nazarkevych); 0000-0002-1413-630X (H. Nazarkevych); 0000-0002-6695-1920 (R. Moravskyi); 0000-0002-6667-7693 (M. Kostiak); 0000-0002-3200-7317 (O. Shevchuk)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

1. Introduction

In an era where organizations increasingly rely on data to run their business, the ability of companies to manage and analyze data has a significant impact on their performance. In this article, we develop a model to measure the current level of maturity of an enterprise's data and analytics systems, so that they can be optimized to maximize the potential of the enterprise's data assets. Based on the three critical components of people, technology, and process, we take the requirements of the enterprise as a starting point. After a comprehensive review of the current situation in the industry and of the needs of the enterprise, a methodology for determining profitability was built [1].

Profitability [2] is the ratio of profit to costs, expressed as a percentage. Profitability is a relative indicator, and it is necessary for the analysis of the economic activity of any enterprise [3–5].

Let us consider the main indicators of the company's profitability:
1. Profitability of invested funds [6], which reflects the general level of profitability of the enterprise and equals the ratio of the gross profit of the enterprise to the total production cost.
2. Profitability of production assets [7], which equals the ratio of the gross profit of the enterprise to the average value of its production assets.
3. Return on total assets [8], the ratio of the company's gross profit to the average amount of assets on the company's balance sheet.
4. Return on equity [9], the ratio of the company's net profit to the amount of equity.
5. Profitability of production [10], which equals the ratio of the total cost of goods sold to the volume of sales.

One of the fundamental premises in the field of artificial intelligence is the assumption that it is possible to create machines capable of performing tasks that usually require human intelligence [11].
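The profitability indicators listed above all reduce to simple ratios. As a quick illustration, a minimal sketch with purely hypothetical figures (none of the names or numbers below come from the paper's dataset) might look like this:

```python
# Hypothetical financial figures (not from the paper's dataset).
gross_profit = 250_000.0
net_profit = 180_000.0
total_production_cost = 1_000_000.0
average_total_assets = 2_000_000.0
equity = 900_000.0
cost_of_goods_sold = 750_000.0
sales_volume = 1_250_000.0

# Profitability of invested funds: gross profit / total production cost.
roi_invested = gross_profit / total_production_cost * 100
# Return on total assets: gross profit / average assets on the balance sheet.
roa = gross_profit / average_total_assets * 100
# Return on equity: net profit / equity.
roe = net_profit / equity * 100
# Profitability of production: total cost of goods sold / sales volume.
prod_ratio = cost_of_goods_sold / sales_volume * 100

print(f"ROI: {roi_invested:.1f}%  ROA: {roa:.1f}%  "
      f"ROE: {roe:.1f}%  Production: {prod_ratio:.1f}%")
```

Each indicator is expressed as a percentage, matching the definition of profitability as a relative indicator given above.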
The artificial neural network is designed to simulate learning processes in the human brain. Artificial neural networks are designed in such a way that they can recognize basic patterns (regularities and stable relationships hidden in data) and learn from them [12]. They can be used to solve classification, regression, and data segmentation problems [13]. Before data are fed to a neural network, they must be converted into numerical form; this applies to data of various natures, including visual and textual data, time series, etc. Decisions have to be made about how tasks should be presented so that they are understandable to neural networks.

The practical implementation of material objects involves solving the corresponding synthesis problems. At the same time, the mathematical model of the object should not only adequately describe the physical processes that ensure the necessary initial characteristics of this object but also allow the optimization process itself to be implemented. The software currently used to solve such problems is characterized either by narrow specialization or by a universal calculation engine that requires an unacceptably long calculation time for complex objects. One option for implementing fast and flexible methods of solving synthesis problems is to use the approximation properties of artificial neural networks. Most research related to artificial neural networks focuses on combinatorial optimization and forecasting problems; therefore, research on the application of neural networks is relevant.

2. Stages of Developing a Machine Learning Model

The process of developing a machine learning model [14] consists of the following stages:
• Data preparation and presentation.
• Algorithm design.
• Training the algorithm on the available data.
• Validating the algorithm on test data.

A training sample [15] is a set of examples that we show the system so that, based on them, it uncovers a hidden regularity that is responsible for the distribution of the data in the training sample. Thanks to the discovery of such a regularity, the system is able to predict answers effectively on the test sample.

A neural network [16] is a complex function with a huge number of parameters. Each of these parameters (called weights) is adapted so that the function approximates the distribution of the data in the sample. Very often, the number of these parameters is much larger than the size of the data description.

Recognition and processing of text, human language, music, images, 3D objects, and tabular data; object detection in photos and videos; even the creation of texts, images, and videos: all this and much more belongs to the practical applications of neural networks.

3. Construction of Neural Networks

3.1. Mathematical Model

The perceptron is a mathematical model proposed by F. Rosenblatt, described by the transformation R^m → R^n given by the formula

    v_j = Σ_{i=1..m} ω_ij x_i,  j = 1, …, n,

where ω_ij are the perceptron weights and x_i are the values of the input signals. After the value v is obtained, the activation function f is applied to it; the resulting value is compared with the trigger threshold θ, and a decision is made.

Perceptron learning consists of finding the weight coefficients. Let there be a set of pairs of vectors (x^α, y^α), α = 1, …, p, which is called a training sample. We will call a neural network trained on a given training sample if, when each vector x^α is applied to the inputs of the network, we obtain the corresponding vector y^α at the outputs.

The learning method proposed by Rosenblatt [17] consists of iterative adjustment of the weight matrix that successively reduces the error in the output vectors. The algorithm has several steps:

Step 1. The initial values of all neuron weights W(t = 0) are set randomly.
Step 2. An input image x^α is presented, and the network produces an output image ỹ^α ≠ y^α.
Step 3. The error vector δ^α = (y^α − ỹ^α) at the network output is calculated. The change in the vector of weighting coefficients in the region of small errors must be proportional to the output error and equal to zero if the error is zero.
Step 4. The vector of weights is modified by the formula

    W(t + Δt) = W(t) + η x^α (δ^α)^T,

where 0 < η < 1 is the learning rate.
Step 5. Steps 2–4 are repeated for all training vectors. One cycle of sequential presentation of the entire sample is called an epoch. Training ends after several epochs: (a) when the iterations converge, that is, the weight vector stops changing, or (b) when the total absolute error over all vectors becomes smaller than some small value.

3.2. Optimization of Neural Network Parameters

One of the simplest gradient descent optimization algorithms looks like this:

    w_{p+1} = w_p − η ∇Cost(w_p).

We also argue that the Nesterov algorithm is the optimal algorithm for finding the minimum [18]:

    Δ_{p+1} = γ Δ_p + η ∇Cost(w_p − γ Δ_p),
    w_{p+1} = w_p − Δ_{p+1}.

A kind of look-ahead operation is used here to find the gradient; therefore, the method is sometimes called the accelerated Nesterov gradient. However, although the proven upper estimate of the convergence time for this algorithm is minimal, it has not found wide practical application. The point is that the upper estimate does not always coincide with the average one, and for this algorithm the average convergence-time estimate is close to the upper estimate; therefore, the algorithm often works slowly.

3.3. Creation of a Neural Network

Human learning proceeds hierarchically. In the neural network of our brain, this process is carried out in several stages, each characterized by its own degree of learning: at some stages simple things are learned, at others more complex ones.

To simulate the human learning process, layers of neurons are used in the construction of artificial neural networks. The idea of these neurons is suggested by biological processes. Each layer of an artificial neural network is a set of independent neurons, and each neuron of a given layer is connected to the neurons of an adjacent layer.

Deep learning is based on complex systems consisting of a huge number of neurons. One neural network can have billions of such structural units as neurons or perceptrons. Accordingly, there are many ways to structure them, and depending on how they are connected, different neural network architectures can be distinguished.

3.4. Neural Network Training

If we are dealing with n-dimensional input data, the input layer will consist of n neurons. If m different classes are distinguished among our training data, the output layer will consist of m neurons. The layers nested between the input and output layers are called hidden. A simple neural network consists of a pair of layers, while a deep neural network consists of many layers.

Consider the case when we want to use a neural network for data classification. The first step is to collect relevant training data and label it. Each neuron acts as a simple function, and the neural network is trained until the error is less than a certain set value. The difference between the predicted and actual outputs is mostly used as the error. Based on how big the error is, the neural network corrects itself and retrains until it gets close to the solution.

3.5. Creating a Classifier Based on a Perceptron

A perceptron is a single neuron that receives input data, performs calculations on it, and outputs a signal. The perceptron uses a simple function to make decisions. Suppose we are dealing with an N-dimensional input data point. The perceptron calculates a weighted sum of the N numbers and then adds a constant to obtain the output result. This constant is called the bias of the neuron. It is interesting to note that such simple perceptrons are used to build very complex deep neural networks.

3.6. Construction of a Single-Layer Neural Network

The capabilities of a single perceptron are limited. To achieve a goal, a set of neurons must be made to act as one. Let us create a single-layer neural network consisting of independent neurons that act on the input data to obtain the output result.

3.7. Building a Multilayer Neural Network

To obtain higher accuracy, we must give the neural network more freedom. This means that the neural network must have more than one layer in order to capture the underlying patterns that exist in the data. Let us create a multilayer neural network that will ensure this.

4. Deployment of Machine Learning

We will use the created model for a forecasting task: predicting the probability that the enterprise will be profitable and will have profits in the future.

The dataset used in this study is taken from the file ML_Manufacture_Prifit_Companies.csv. It contains several independent predictors and one target: whether the enterprise is profitable. Its features are as follows: rating, change in rating, income, management assets, market value, percentage change in income, percentage change in profit, and number of employees.
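Since the CSV file itself is not distributed with the paper, the loading and inspection step can be illustrated with a synthetic stand-in built from the feature names listed above (all column names and values below are invented for the sketch):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100  # small stand-in; the paper's file holds on the order of a thousand records

# Synthetic frame using the feature names the paper lists (values are invented).
df = pd.DataFrame({
    "rating": rng.integers(1, 1000, n),
    "rating_change": rng.integers(-50, 50, n),
    "income": rng.uniform(1e6, 1e9, n),
    "management_assets": rng.uniform(1e6, 1e9, n),
    "market_value": rng.uniform(1e6, 1e10, n),
    "income_change_pct": rng.normal(5, 20, n),
    "profit_change_pct": rng.normal(3, 25, n),
    "employees": rng.integers(50, 100_000, n),
})
df.loc[3, "income"] = np.nan  # simulate one missing value

clean = df.dropna()  # drop rows with missing values before modeling
print(df.shape, "->", clean.shape)
```

With the real file, the construction of `df` would be replaced by `pd.read_csv("ML_Manufacture_Prifit_Companies.csv")`; the inspection and cleaning calls stay the same.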
The dataset contains 1,001 records, and all the businesses in it are successful and profitable.

A neural network can be used as a classifier or as a regressor [19, 20]. For this example, we define a multilayer neural network with two hidden layers; the network could be designed in any other way. We will have 10 neurons in the first hidden layer and six neurons in the second. Our task is to predict a single value, so the output layer contains only one neuron.

5. Downloading Data

For this example, the dataset was downloaded locally and named ML_Manufacture_Prifit_Companies.csv (Fig. 1). We observe the columns with the following data (Fig. 2). The first task for estimating a company's profit is to clean the data so that there are no missing or erroneous values (Figs. 3 and 4) [21–23].

Figure 1: Download dataset
Figure 2: Data cleaning
Figure 3: Cleaned and transformed data
Figure 4: Visualization of the top 10 companies by revenue

6. Study of the Relationship Between Features

The next step is to study how the different independent characteristics affect the result, that is, which characteristics affect the profitability of the enterprise. The corr() function calculates the pairwise correlation of the columns.

7. Construction of a Graph of the Relationship between Features

The following code snippet uses the matshow() function to plot the results returned by the corr() function as a matrix; the individual correlation coefficients are also displayed in the matrix. Fig. 5 shows the matrix.
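The snippet itself appears in the paper only as a figure, so the following is a sketch of what the corr() and matshow() steps might look like; the DataFrame here is a synthetic stand-in with illustrative column names, not the paper's data:

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Illustrative numeric frame; in the paper this is the cleaned company data.
df = pd.DataFrame(rng.normal(size=(200, 4)),
                  columns=["income", "market_value", "management_assets", "employees"])
df["profit"] = 0.7 * df["income"] + 0.3 * rng.normal(size=200)

corr = df.corr()  # pairwise correlation of the columns

fig, ax = plt.subplots()
cax = ax.matshow(corr.values, cmap="coolwarm")  # matrix view of the coefficients
fig.colorbar(cax)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
for (i, j), v in np.ndenumerate(corr.values):
    ax.text(j, i, f"{v:.2f}", ha="center", va="center")  # show each coefficient
fig.savefig("corr_matrix.png")
```

The Seaborn alternative mentioned below amounts to a single call, `seaborn.heatmap(corr, annot=True)`, which draws the same matrix with the coefficients annotated.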
Cubes with colors closest to black represent the highest correlation coefficients, and those closest to blue represent the lowest. For example, the matrix shows that the level of management assets has little relationship with the market value of the enterprise (0.127) but a significant relationship with the profitability of the firm (0.49). In this way, we find out which features significantly affect the result.

Another way to construct a correlation matrix is to use Seaborn's heatmap() function, which produces a heat map (Fig. 6). Let us single out the four features of the enterprises that have the highest correlation with the result (Figs. 7–9).

Figure 5: Matrix showing various correlation factors
Figure 6: Visualization of the top 10 companies with the largest profit in relation to the number of employees
Figure 7: Visualization of the top 10 companies with the largest income per number of employees
Figure 8: Graph of dependence of the Inertia metric on the number of clusters
Figure 9: Found clusters of companies

8. Evaluation of Algorithms

We will evaluate several algorithms to find the one that provides the best performance. We will use the following methods:
• Logistic regression.
• K-Nearest Neighbors (KNN).
• Support Vector Machines (SVM) with linear and RBF kernels.

8.1. Logistic Regression

For the first method, we use logistic regression. Instead of splitting the dataset into training and testing sets, we use 10-fold cross-validation to obtain the average score of the algorithm.

8.2. K-Nearest Neighbours

The next method is K-Nearest Neighbors (KNN). In addition to using 10-fold cross-validation to average the algorithm's score, different values of k should be tried to find the optimal one so that the best accuracy can be obtained.
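This evaluation protocol can be sketched with scikit-learn's cross_val_score, assuming a scikit-learn workflow (which the paper's corr()/matshow() usage suggests). The data here are a synthetic stand-in generated with make_classification, and cv=10 matches the 10-fold protocol described above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the company dataset (8 features, binary target).
X, y = make_classification(n_samples=300, n_features=8, random_state=1)

models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM-linear": SVC(kernel="linear"),
    "SVM-rbf": SVC(kernel="rbf"),
}
# Try several values of k for KNN, as suggested above.
for k in (1, 3, 5, 7):
    models[f"KNN-{k}"] = KNeighborsClassifier(n_neighbors=k)

# 10-fold cross-validation; each score is the mean accuracy over the folds.
scores = {name: cross_val_score(m, X, y, cv=10).mean() for name, m in models.items()}
best = max(scores, key=scores.get)
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {s:.3f}")
```

On the paper's data, this loop is what would justify the choice of KNN with k = 3 reported in the next section; on the synthetic data above, the ranking will of course differ.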
8.3. Support Vector Machines

Another method we use is the Support Vector Machine (SVM). We use two types of kernels for the SVM: linear and RBF.

9. Learning and Saving the Model

Since the most efficient algorithm for our dataset is KNN with k = 3, we can now train the model on the entire dataset. After training, we save the model to a file so that it can be loaded for prediction later.

After loading the model, let us make some predictions. The output prints "Profitable Enterprise" if the return value of the prediction is 1; otherwise, "Unprofitable Enterprise" is printed. It is also necessary to obtain the probability of the forecast in order to express it as a percentage (Figs. 10–12). The printed probabilities show the probability of outcome 0 and the probability of outcome 1. The prediction is based on the class with the highest probability, and we take that probability and convert it to a confidence percentage.

Figure 10: Example of data from companies from cluster 0
Figure 11: Example of data from companies from cluster 1
Figure 12: Example of data from companies from cluster 2

10. Conclusions

In this study, a dataset of the top 1,000 profitable companies was selected, and unsupervised machine learning was applied. The companies were divided into three clusters based on revenue, earnings, current assets, market value, and number of employees. Clustering was implemented with the KMeans algorithm, and the Inertia metric was used to determine the optimal number of clusters. Inertia is the sum of the distances of the points to their cluster centers: if the total distance is large, the points are far apart and may be less similar to each other. In that case, one can continue to evaluate larger values of K to see whether the overall distance can be reduced. However, it is not always wise simply to minimize this distance. Using the elbow (bend) method, we can choose 2 or 3 as the number of clusters (the first or second obvious bend).

11. References

[1] T. Zhang, et al., Enterprise Digital Transformation and Production Efficiency: Mechanism Analysis and Empirical Research, Economic Research-Ekonomska Istraživanja, vol. 35, no. 1, 2022, pp. 2781–2792.
[2] S. Jahan, et al., Modeling Profitability-Influencing Risk Factors for Construction Projects: A System Dynamics Approach, Buildings, vol. 12, no. 6, 2022, pp. 701.
[3] H. Hulak, et al., Formation of Requirements for the Electronic Record-Book in Guaranteed Information Systems of Distance Learning, in Workshop on Cybersecurity Providing in Information and Telecommunication Systems, CPITS 2021, vol. 2923, 2021, pp. 137–142.
[4] V. Buriachok, V. Sokolov, Implementation of Active Learning in the Master's Program on Cybersecurity, in Advances in Computer Science for Engineering and Education II, vol. 938, 2020, pp. 610–624. doi: 10.1007/978-3-030-16621-2_57.
[5] Z. B. Hu, et al., Authentication System by Human Brainwaves Using Machine Learning and Artificial Intelligence, in Advances in Computer Science for Engineering and Education IV, 2021, pp. 374–388. doi: 10.1007/978-3-030-80472-5_31.
[6] A. I. Kato, C. P. E. Germinah, Empirical Examination of Relationship between Venture Capital Financing and Profitability of Portfolio Companies in Uganda, Journal of Innovation and Entrepreneurship, vol. 11, no. 1, 2022, pp. 1–18.
[7] E. Filatov, Analysis of Profitability of Production of Enterprises in the Field of Transportation and Storage of the Irkutsk Region, Transportation Research Procedia, vol. 63, 2022, pp. 518–524.
[8] H. Habibniya, et al., Impact of Capital Structure on Profitability: Panel Data Evidence of the Telecom Industry in the United States, Risks, vol. 10, no. 8, 2022, pp. 157.
[9] V. V. Stipic, V. Ruzic, Panel VAR Analysis of the Interdependence of Capital Structure and Profitability, Economic and Social Development: Book of Proceedings, 2022, pp. 177–187.
[10] E. A. Virtanen, et al., Balancing Profitability of Energy Production, Societal Impacts and Biodiversity in Offshore Wind Farm Design, Renewable and Sustainable Energy Reviews, vol. 158, 2022, pp. 112087.
[11] X. Chen, P. Geyer, Machine Assistance in Energy-Efficient Building Design: A Predictive Framework Toward Dynamic Interaction with Human Decision-Making under Uncertainty, Applied Energy, vol. 307, 2022, pp. 118240.
[12] G. V. S. Bhagya Raj, K. K. Dash, Comprehensive Study on Applications of Artificial Neural Network in Food Process Modeling, Critical Reviews in Food Science and Nutrition, vol. 62, no. 10, 2022, pp. 2756–2783.
[13] F. Moosavi, et al., Application of Machine Learning Tools for Long-Term Diagnostic Feature Data Segmentation, Applied Sciences, vol. 12, no. 13, 2022, pp. 6766.
[14] J. Siebert, et al., Construction of a Quality Model for Machine Learning Systems, Software Quality Journal, vol. 30, no. 2, 2022, pp. 307–335.
[15] C. Zhang, et al., Mapping Irrigated Croplands in China using a Synergetic Training Sample Generating Method, Machine Learning Classifier, and Google Earth Engine, International Journal of Applied Earth Observation and Geoinformation, vol. 112, 2022, pp. 102888.
[16] O. I. Abiodun, et al., State-of-the-Art in Artificial Neural Network Applications: A Survey, Heliyon, vol. 4, no. 11, 2018, pp. e00938.
[17] E. Kussul, et al., Rosenblatt Perceptrons for Handwritten Digit Recognition, in International Joint Conference on Neural Networks, vol. 2, 2001, pp. 1516–1520.
[18] Q. Lü, et al., A Nesterov-Like Gradient Tracking Algorithm for Distributed Optimization over Directed Networks, IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 51, no. 10, 2020, pp. 6258–6270.
[19] M. Medykovskyy, et al., Methods of Protection Document Formed from Latent Element Located by Fractals, in X International Scientific and Technical Conference "Computer Sciences and Information Technologies," 2015, pp. 70–72.
[20] M. Nazarkevych, et al., Application Perfected Wave Tracing Algorithm, in IEEE First Ukraine Conference on Electrical and Computer Engineering (UKRCON), 2017, pp. 1011–1014.
[21] M. Nazarkevych, The Ateb-Gabor Filter for Fingerprinting, in International Conference on Computer Science and Information Technology, 2019, pp. 247–255. doi: 10.1007/978-3-030-33695-0_18.
[22] M. Logoyda, et al., Identification of Biometric Images using Latent Elements, CEUR Workshop Proceedings, 2019.
[23] M. Nazarkevych, et al., The Ateb-Gabor Filter for Fingerprinting, in Conference on Computer Science and Information Technologies, 2019, pp. 247–255.