Machine Learning Systems for Analyzing and Predicting User Behavior⋆

Yuri Kravchenko, Olga Leshchenko, Andriy Dudnik, Nataliia Dakhno, and Hennadii Dakhno

1 Taras Shevchenko National University of Kyiv, 60 Volodymyrska Str., Kyiv, 01033, Ukraine
2 Interregional Academy of Personnel Management, 2 Frometivska Str., Kyiv, 03039, Ukraine

Abstract
This article considers machine learning systems designed for analyzing and forecasting user behavior, with an emphasis on methods that effectively model complex patterns of user interaction. Using algorithms such as k-nearest neighbors, support vector machines, logistic regression, decision trees, and random forests, the study evaluates the accuracy, precision, recall, and F-measure of different approaches to forecasting user behavior. The performance of these methods is compared, highlighting the potential of each to provide accurate and reliable predictions in different contexts. The article also considers challenges related to the selection and tuning of machine learning models, including data preprocessing, feature selection, and hyperparameter optimization. The results of the study demonstrate the importance of selecting the right models for specific behavioral analysis tasks, providing valuable recommendations on optimal approaches to predicting individual behavior. This allows for improved user engagement strategies, increased participation, and enhanced decision-making support across various industries.

Keywords
machine learning, behavior prediction, SVM, KNN, decision tree, random forest, logistic regression, deep neural networks, information system



                                1. Introduction
                                Modern information systems are becoming increasingly complex and integrated into the lives of
                                users, which creates new opportunities for analyzing their behavior. Machine learning systems open
                                up powerful tools for developers to analyze large volumes of data and predict user behavior in
                                various areas. Thanks to such technologies, it is possible to detect hidden patterns, predict user needs
                                and make informed decisions about improving interaction with them. Machine learning makes it
                                possible to create adaptive systems capable of predicting user actions based on their previous
                                activity, thereby increasing the efficiency of interaction and user experience.
                                   However, the development of such an information system is associated with many challenges. In
                                particular, it is important to choose the right machine learning methods that are best suited for
                                analyzing customer behavior. Such methods include logistic regression, decision trees, support
                                vector machines, neural networks, and others. Each of these methods has its advantages and
                                disadvantages, which must be taken into account when developing the system. In addition, effective
                                use of these methods requires careful data preprocessing, feature selection, and hyperparameter
                                tuning.



Information Technology and Implementation (IT&I-2024), November 20-21, 2024, Kyiv, Ukraine
⋆ Corresponding author.
These authors contributed equally.
kr34@ukr.net (Y. Kravchenko); olga.leshchenko@knu.ua (O. Leshchenko); andrii.dudnik@knu.ua (A. Dudnik); gennadiy.dakhno@gmail.com (H. Dakhno); nataly.dakhno@ukr.net (N. Dakhno)
ORCID: 0000-0002-0281-4396 (Y. Kravchenko); 0000-0002-3997-2785 (O. Leshchenko); 0000-0003-1339-7820 (A. Dudnik); 0000-0003-3892-4543 (N. Dakhno)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).



    The purpose of this work is to develop an information system for predicting the behavior of bank customers using machine learning methods. To achieve this goal, it is necessary to review and compare different machine learning methods, determine their effectiveness in the context of predicting customer behavior, and develop a user interface for convenient interaction with the system. Such a system is designed to help organizations better understand the behavioral patterns of their users, allowing them to make more accurate and effective decisions in the field of customer interaction.
    To achieve this goal, the following tasks were performed:

   •   Analyze machine learning models and evaluate model performance: compare the results of
       different models and select the best ones using appropriate metrics.
   •   Research and collect data on user behavior, including cleaning, normalization and selection
       of relevant features that will have the greatest impact on analysis results.
   •   Develop a model to predict future user behavior and evaluate its accuracy using relevant
       performance metrics.

2. Analysis of machine learning models of user behavior
In modern research, considerable attention is paid to machine learning methods and their application
in various fields, in particular, in the management of unmanned aircraft systems and process
optimization. For example, works [1, 2] consider modified gradient methods for controlling
unmanned aerial vehicles, using integro-differential models. These approaches can be adapted for
the tasks of predicting user behavior, where high accuracy and efficiency of algorithms are required.
In addition, the works [3, 4] investigate optimization and management models in conditions of
limited resources, which has practical value for the development of information systems focused on
the analysis of user behavior. In the context of modeling complex systems, as shown in [5], the use
of genetic algorithms to create artificial ecosystems demonstrates the possibilities of optimizing
complex processes, which can also be useful in predicting behavioral models. Thanks to the different
approaches to text classification described in [6], it is possible to improve the methods of analyzing
text data for studying user behavior.
    Research [7] uses web usage mining to evaluate the quality of websites, improving the user
experience. In [8], an algorithm based on machine learning is proposed for predicting behavior in
"smart" homes, which adapts systems to user actions. Research [9] demonstrates how behavioral
analysis can reduce risks in border regions, and work [10] describes a model for e-commerce that
takes into account global trends, improving the compliance of platforms with digital requirements.
These works confirm the role of machine learning in modeling behavior for different contexts.
    The choice of machine learning models for predicting the behavior of bank customers is an important stage in the development of an information system. The following models were chosen in this work: support vector machine (SVM), k-nearest neighbors (KNN), decision tree, random forest, logistic regression, and deep neural networks. This section is devoted to the analysis of the reasons for choosing these methods, their advantages and disadvantages, and the evaluation of their effectiveness in the context of this project.

2.1. Support vector machine (SVM)
The support vector machine (SVM) is a powerful machine learning tool widely used for classification and regression. This subsection considers the theoretical basis of SVM, its model parameters and their meaning, and examples of their use.
    SVM is based on finding the optimal hyperplane that separates two classes in the multidimensional feature space with the maximum margin. The hyperplane is chosen so that the distance between it and the nearest points of both classes (the support vectors) is maximal.


    In the case when the data are not linearly separable, SVM uses kernel functions to transform the feature space into a higher-dimensional space where the data can become linearly separable.
    When applying the support vector method, the settings of the various parameters that affect the performance and accuracy of the model are important. The main SVM parameters include: C (regularization parameter), kernel (kernel function), gamma (kernel function parameter), degree (the degree of the polynomial kernel), and coef0 (kernel constant).
    The SVM kernel is a symmetric, positive semi-definite matrix $K$ consisting of the scalar products of pairs $x_i$, $x_j$: $K(x_i, x_j) = \langle f(x_i), f(x_j) \rangle$,
    where $f$ is an arbitrary transformation function used to form the kernel.
    For example:

   1.   linear kernel: $K(x_i, x_j) = x_i^T x_j$;
   2.   sigmoid kernel: $K(x_i, x_j) = \tanh(\gamma x_i^T x_j + \beta_0)$;
   3.   Gaussian (RBF) kernel: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$;
   4.   polynomial kernel of degree $p$: $K(x_i, x_j) = (1 + x_i^T x_j)^p$.

   The support vector machine is thus a powerful tool for solving classification and regression problems. Using different kernel functions and adjusting the model parameters allows SVM to be adapted to different types of data and tasks, ensuring high accuracy and reliability of predictions.
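
   To make the role of these parameters concrete, the following is a minimal sketch of kernel and parameter selection with scikit-learn's SVC; the synthetic dataset and the values of C, gamma, degree, and coef0 are illustrative assumptions, not the configuration used in this study.

```python
# Illustrative sketch: fitting SVC with the kernels listed above (not the
# paper's exact setup).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVM is distance-based, so features are scaled first.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Compare the kernels; C, gamma, degree, and coef0 are illustrative values.
for kernel in ("linear", "rbf", "poly", "sigmoid"):
    clf = SVC(kernel=kernel, C=1.0, gamma="scale", degree=3, coef0=0.0)
    clf.fit(X_train, y_train)
    print(kernel, round(clf.score(X_test, y_test), 3))
```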

2.2. The method of k-nearest neighbors (KNN)
The k-nearest neighbor method is a simple and popular machine learning algorithm used for
classification and regression. It is based on the assumption that similar data have similar labels or
values [11] (Figure 1).




Figure 1: Operation diagram of the k-nearest neighbors algorithm

    The basic principle of KNN is to determine a label or value for a new sample by comparing it with
its nearest neighbors from the training data set. The number of nearest neighbors (k) is user-defined
and affects the classification or regression process.
    The KNN algorithm has several key features. Choosing the number of neighbors (k) is an important step that depends on the specific data and problem. A distance metric, such as the Euclidean distance, is used to measure proximity between samples, and this choice is also context-dependent. The label of a new sample is determined by majority voting among the nearest neighbors. Since KNN is sensitive to feature scaling and class imbalance, data preprocessing such as normalization or class balancing is important. In addition, as the dimensionality of the data increases, the computational complexity of the algorithm grows, which can become a problem for large data sets. Choosing the right hyperparameters, such as the number of neighbors and the distance metric, is critical to achieving the best results.
    There are many metrics for calculating the distance between objects [12], among which the most
popular are:
   •   Euclidean distance is the simplest and most commonly used metric, defined as the length of the segment between two objects $a$ and $b$ in a space with $n$ features and calculated by the formula:
                                      $d(a, b) = \sqrt{\sum_{i=1}^{n} (a_i - b_i)^2}$                                      (1)
   •   Manhattan distance is a metric defined as the sum of the absolute differences of the coordinates of two objects $a$ and $b$ in a space with $n$ features and calculated by the formula:

                                              $d(a, b) = \sum_{i=1}^{n} |a_i - b_i|$                                       (2)
   •   Cosine distance is a metric based on the angle between two vectors $a$ and $b$ in the feature space and calculated by the formula:
                          $d(a, b) = 1 - \frac{\sum_{i=1}^{n} a_i b_i}{\sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}}$                          (3)
   Considering all these aspects, the k-nearest neighbors method can be a useful and efficient
algorithm, especially for simple classification and regression problems. However, it has its
limitations and requires appropriate data processing and hyperparameter tuning to achieve the best
results.
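
   As an illustration of how the choice of metric enters the algorithm, the following is a minimal sketch comparing KNN classifiers built on the metrics (1)-(3); the synthetic data and k = 5 are illustrative assumptions.

```python
# Illustrative sketch: KNN with the distance metrics defined in (1)-(3).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# KNN is sensitive to feature scale, so normalize to [0, 1] first.
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# "euclidean" = formula (1), "manhattan" = formula (2), "cosine" = formula (3).
for metric in ("euclidean", "manhattan", "cosine"):
    knn = KNeighborsClassifier(n_neighbors=5, metric=metric)
    knn.fit(X_train, y_train)
    print(metric, round(knn.score(X_test, y_test), 3))
```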

2.3. The decision tree method
Decision trees are known in the machine learning world for a particular distinguishing characteristic:
their visualization is easier to understand compared to other machine learning models. Figure 2
shows a graphical representation of a typical decision tree for classification. This is for the
hypothetical situation where a person wants to see an R-rated movie in a theater.




Figure 2: Example of a decision tree
   The target function, which determines the informativeness of features (the information gain), is defined as follows [13]:
   $IG(D_p, f) = I(D_p) - \sum_{j=1}^{n} \frac{N_j}{N_p} I(D_j)$, where
   $f$ is the feature on which the split is performed;
   $D_p$, $D_j$ are the data sets of the parent and the $j$-th child node;
   $N_p$, $N_j$ are the total numbers of samples in the parent and the $j$-th child node;
   $I$ is the impurity measure.

    The method is used both for classification problems and for regression problems. A decision tree
is a hierarchical structure in which each node is responsible for checking a certain condition, and
each branch represents the result of this check. The leaf nodes of the tree contain the final solutions
or predicted values.
    A key feature of decision tree construction algorithms is the method of selecting the next
attribute. There are the following algorithms [14]:
   •   the ID3 algorithm, where attribute selection is based on the information gain or on the Gini index;
   •   the C4.5 algorithm (an improved version of ID3), where attribute selection is based on the normalized information gain;
   •   the CART algorithm and its modifications IndCART and DB-CART [15].

   The decision tree method uses three impurity measures (splitting criteria):

   •   Entropy: $I_H(t) = -\sum_{i=1}^{c} p(i|t)\log_2 p(i|t)$ over all non-empty classes, where $p(i|t)$ is the proportion of elements of the $i$-th class at node $t$. The maximum is reached with a uniform distribution of classes.
   •   Gini impurity: $I_G(t) = 1 - \sum_{i=1}^{c} p(i|t)^2$. This value shows how often a randomly chosen element of the training sample would be classified incorrectly if it were labeled randomly according to the class distribution at the node. As with entropy, the maximum is reached with a uniform distribution of classes.
   •   Classification error: $I_E(t) = 1 - \max_i p(i|t)$.

    Advantages of the decision tree method: simplicity and intuitive comprehensibility; the decision
tree is easy to visualize and interpret; can process both numerical and categorical data; does not
require data scaling. Disadvantages of the method: tendency to overfitting, especially on small data
sets; can be unstable because small changes in the data can lead to large changes in the tree structure.
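
    A minimal sketch of how the splitting criteria above are selected in practice with scikit-learn's DecisionTreeClassifier; the dataset and the depth limit are illustrative assumptions.

```python
# Illustrative sketch: decision trees with the impurity measures above
# (gini vs. entropy); max_depth limits growth, the usual remedy for overfitting.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=500, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=1)
    tree.fit(X_train, y_train)
    print(criterion, round(tree.score(X_test, y_test), 3))

# The tree structure itself is easy to inspect, which is the method's main appeal.
print(export_text(tree, max_depth=2))
```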

2.4. Random forest method
A random forest is an ensemble machine learning method that consists of a large number of decision
trees. The main idea of the method is to combine the results of several decision trees to obtain a more
accurate and stable forecast. A random forest uses the bagging method to build a set of decision trees,
each of which is trained on a random subset of data [16-17].
   To build an ensemble of algorithms based on bagging, bootstrap samples are generated, and a classifier $a_i(x)$ is trained on each of them. The resulting classifier averages the responses of all the algorithms (in the case of classification, this corresponds to voting):
                                             $a(x) = \frac{1}{M}\sum_{i=1}^{M} a_i(x)$.                                  (4)
   Bagging reduces the variance of the classifier and prevents overfitting. The effectiveness of bagging is achieved because the base algorithms trained on different subsamples are quite different, and their errors are mutually compensated during voting. In addition, outliers may not be included in some training subsamples.
   All trees of the ensemble are built independently of each other according to the following
procedure:

   •   generate a random subsample of size n from the training sample;
   •   build a decision tree; when creating each node of the tree, not all features are considered, but only m randomly selected features, on the basis of which the split is performed;
   •   the tree is built until the objects of the subsample are completely exhausted, and no pruning is applied (a minimal sketch of this procedure follows below).
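
   The following is a minimal hand-rolled sketch of this procedure and of the voting rule in formula (4); the ensemble size M and the dataset are illustrative assumptions (in practice, scikit-learn's RandomForestClassifier performs these steps internally).

```python
# Illustrative sketch of bagging per formula (4): bootstrap samples, one tree
# each, and majority voting over the ensemble (a hand-rolled mini random forest).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=2)
rng = np.random.default_rng(2)

M = 25  # number of base classifiers a_i(x)
trees = []
for _ in range(M):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap subsample
    tree = DecisionTreeClassifier(max_features="sqrt")  # m random features per split
    trees.append(tree.fit(X[idx], y[idx]))

# Voting: average the individual 0/1 predictions and round, per formula (4).
votes = np.mean([t.predict(X) for t in trees], axis=0)
ensemble_pred = (votes >= 0.5).astype(int)
print("training accuracy:", (ensemble_pred == y).mean())
```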

   Advantages of the random forest method: high accuracy and stability; a random forest is less prone to overfitting than individual decision trees; it can efficiently process large data sets and a large number of attributes; it is relatively insensitive to missing data and can handle data sets with missing values. Disadvantages of the random forest method: high computational complexity and memory consumption due to the large number of trees; the result is difficult to interpret because it is an average over many trees.

2.5. Logistic regression method
Logistic regression is a statistical method in machine learning that is used to predict the probability of belonging to one of two or more categories or classes. It is one of the most common algorithms for binary classification problems [18-19].
    The main idea of the method is to model the logarithm of the odds (the log-likelihood ratio) as a linear combination of the input attributes. A logistic function (also known as the sigmoid) is used to convert this linear combination into a probability of belonging to a particular class.
    The main steps of the logistic regression method:

   1.   Data preparation: Loading and preprocessing of data, including scaling, normalization or
        removal of missing values.
   2.   Model definition: establishing a logistic regression model that includes input attributes and
        model parameters.
   3.   Parameter estimation: using the maximum likelihood method or other optimization methods,
        model parameters that best fit the training data are estimated.
   4.   Classification: with the help of a trained model, the probability of belonging to a certain class
        for new data samples is predicted. A classification decision is made based on the probability
        threshold.
   5.   Evaluation of the results: evaluate the accuracy and efficiency of the model using metrics
        such as accuracy, sensitivity, specificity or ROC curve.

    Logistic regression has several advantages, such as ease of implementation, interpretability of
results, ability to work with different types of attributes (categorical and numerical) and copes well
with large amounts of data. However, it may also have limitations, such as linearity assumptions and
vulnerability to outliers or unbalanced data. Regularization techniques, selection of optimal
attributes or use of ensemble methods can be used to improve the results.
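
    The steps above can be traced in a minimal scikit-learn sketch; the synthetic data, the regularization strength C, and the 0.5 decision threshold are illustrative assumptions.

```python
# Illustrative sketch of the five steps listed above with LogisticRegression.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=3)

scaler = StandardScaler().fit(X_train)                  # step 1: preprocessing
model = LogisticRegression(C=1.0, max_iter=1000)        # step 2: model definition
model.fit(scaler.transform(X_train), y_train)           # step 3: maximum-likelihood fit

proba = model.predict_proba(scaler.transform(X_test))[:, 1]  # step 4: class probabilities
pred = (proba >= 0.5).astype(int)                            # threshold at 0.5

print("accuracy:", (pred == y_test).mean())                  # step 5: evaluation
print("ROC AUC :", roc_auc_score(y_test, proba))
```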

2.6. Deep neural networks
Deep neural networks (Deep Neural Networks, DNN) are powerful machine learning models that
consist of many layers of neurons. They are capable of automatically learning complex nonlinear
relationships in large volumes of data and are widely used for a variety of classification problems. In
this subsection, we will consider the theoretical foundations of deep neural networks, their
architecture, parameters and examples of use [20-21].
   A DNN is a multilayer neural network that consists of an input layer, one or more hidden layers,
and an output layer. Each layer consists of neurons that are interconnected by weighted connections.
Neurons calculate the weighted sum of their input signals, apply an activation function to this sum,
and pass the result to the next layer.
   Main components of DNN:

   1.   Neurons (Nodes): basic computing elements that receive input signals, calculate a weighted
        sum and apply an activation function.
   2.   Layers: collections of neurons, where each layer processes the output signals of the previous
        layer.

   •    the input layer receives initial data;
   •    hidden layers process data, revealing complex patterns and dependencies;
   •    the output layer generates the final predictions or classifications.
   3. Activation functions: non-linear functions applied to the weighted sum of a neuron's inputs. Popular functions include ReLU, Sigmoid, Tanh, and Softmax.
   The architecture of deep neural networks defines the number of layers, the number of neurons in
each layer, and the types of activation functions. Some popular architectures include:

   1.   Feedforward neural networks (FNN) [22]: the simplest form of DNN, where signals pass from the input layer to the output layer through one or more hidden layers, without feedback connections.
   2.   Convolutional neural networks (CNN) [23].
   3.   Recurrent neural networks (RNN) [24].

   The main parameters that are adjusted when building and training deep neural networks include:

   •    Number of layers: a larger number of layers allows the model to detect more complex patterns, but also increases computational cost and the risk of overfitting.
   •    Number of neurons per layer: defines the number of computing units in each layer. Increasing the number of neurons can improve the learning capacity of the model, but also increases the risk of overfitting.
   •    Activation functions: ReLU (Rectified Linear Unit), Sigmoid, Softmax.
   •    Learning rate: the step size used for weight updates during training.
   •    Batch size: the number of training samples processed before the weights are updated.
   •    Number of epochs: the number of complete passes over the training data set.

   Deep neural networks (DNNs) are a powerful tool for solving classification problems due to their
ability to learn complex nonlinear relationships in large volumes of data. The choice of network
architecture and parameter settings are critical to achieving high model accuracy and performance.
In the context of predicting customer behavior, DNNs can provide high classification accuracy,
especially when dealing with large and complex datasets.

3. Collection and preparation of data on user behavior
The subscription_prediction.csv file was used as a Dataset to study the behavior of the bank's clients.
It contains data on clients of the banking institution and consists of 21 columns: age, job, marital
status, education, default, housing, loan (availability of a personal loan), contact (type of
communication), month (month of last contact), day_of_week (day of last contact), duration
(duration of last contact in seconds), campaign (number of contacts made during this campaign and
for of this client), pdays (the number of days that have passed since the last contact with the client
from the previous campaign), previous (the number of contacts made before this campaign and for
this client), poutcome (the result of the previous marketing campaign), emp.var. rate (rate of
employment change - quarterly indicator), cons.price.idx (consumer price index - monthly indicator),
cons.conf.idx (consumer sentiment index - monthly indicator), euribor3m (3-month euribor rate -
daily indicator), nr.employed (the number of employees is a quarterly indicator), y (has the client
signed a term deposit).
    To read data from a CSV file and create a DataFrame, the pandas.read_csv() function from the pandas library is used (Figure 3):



Figure 3: Reading data and creating a DataFrame
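
A minimal sketch of the step shown in Figure 3 (the file name follows the text; the exact code of the figure is not reproduced here):

```python
# Load the dataset described above into a pandas DataFrame.
import pandas as pd

df = pd.read_csv("subscription_prediction.csv")
print(df.shape)    # expected: 21 columns
print(df.head())   # first rows, as inspected with .head() below
```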



   Data preparation plays an important role in the implementation of the algorithm, so it is necessary to examine the data set carefully. The .head() method is used to display the first few rows of the DataFrame.
   After examining the output, it was found that there are no missing values in the data set. The target column 'y' takes the value "yes" or "no". For convenience, these values are encoded as the numbers 0 for "no" and 1 for "yes".
   For error-free operation of the fit() method, which will be used later, it is necessary to convert all categorical columns into binary variables (Figure 4):


Figure 4: Conversion of columns to binary variables
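
A minimal sketch of the encoding steps described above; the exact calls in Figure 4 are not reproduced, so pandas' map() and get_dummies() are used here as the standard way to perform them:

```python
import pandas as pd

df = pd.read_csv("subscription_prediction.csv")  # as in Figure 3

# Encode the target: "no" -> 0, "yes" -> 1.
df["y"] = df["y"].map({"no": 0, "yes": 1})

# One-hot encode all remaining categorical (object-typed) columns so fit() works.
df = pd.get_dummies(df, columns=df.select_dtypes(include="object").columns.tolist())
```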

   The variable "top_5_features" contains the indices of the five variables that have the highest
correlation with the target variable. These variables are the most informative for predicting the target
variable and can be used for further analysis or model building (Figure 5).




   Figure 5: Selection of the five features most correlated with the target variable
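
A plausible reconstruction of how top_5_features could be computed (the original code is in Figure 5; ranking by absolute correlation with the target is an assumption):

```python
# Rank features by absolute correlation with the target column "y" and keep
# the five strongest; df is the one-hot-encoded DataFrame from above.
correlations = df.corr()["y"].drop("y").abs()
top_5_features = correlations.sort_values(ascending=False).head(5).index.tolist()
print(top_5_features)
```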

4. Development of a model for predicting user behavior
The next step is to create and train the classifier using the scikit-learn library. The goal when training a model is to evaluate its performance on a test data set or on new data. To ensure an unbiased evaluation, it is necessary to create an intermediate validation set between the training and test data.
    Thus, the initial data set is divided into three parts: a training set (60%), a validation set (20%), and a test set (20%). The train_test_split() method is used for this task (Figure 6):




Figure 6: Splitting the data into three parts
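
A minimal sketch of the 60/20/20 split described above, applying train_test_split() twice to the preprocessed DataFrame df from the earlier sketches:

```python
# First hold out 40% of the data, then halve it into validation and test sets.
from sklearn.model_selection import train_test_split

X, y = df.drop(columns=["y"]), df["y"]

X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20%
```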

   In the next step, the column values are normalized by scaling them to the range [0, 1].
   KNeighborsClassifier is used to create a classifier based on the k-nearest neighbors algorithm. In this case, the knn model is used as the base classifier, and grid_params is passed as the param_grid parameter to the GridSearchCV constructor. The scoring='accuracy' parameter indicates that the accuracy metric is used to evaluate the model.
   After initializing the knn_grid object of the GridSearchCV class, the fit() method is called, which starts the search for the best hyperparameters on the training data set that has been pre-scaled as X_train_scaled, with the corresponding target values y_train. The algorithm goes through all possible combinations of hyperparameters, evaluates the results according to the chosen metric (in this case, accuracy), and stores the best combination of hyperparameters in the knn_grid object.

    After calling fit(), the knn_grid model contains the best hyperparameter values, which can be used for further prediction and evaluation.
    The last stage of the model implementation is the output of the model's accuracy (Figure 7):




Figure 7: Output of the accuracy of the model

    In this code, the accuracy of the model is evaluated on the test data set after finding the best hyperparameters with the grid search. X_test_scaled is the test data set X_test scaled using scaler.transform(); this ensures the same scale between the training and test data sets. The accuracy variable stores the accuracy of the model on the test data set. The score() method is called on the best estimator (best_estimator_) of the knn_grid object, that is, the KNeighborsClassifier model with the best hyperparameters, and the accuracy of the model is calculated on the test data [25].
    The main difference of the logistic regression implementation from the k-nearest neighbors implementation is the use of LogisticRegression() instead of KNeighborsClassifier(). LogisticRegression() is a class in the scikit-learn library that is used to build a logistic regression model.
    The implementation of the decision tree method differs only slightly from, for example, the KNN method. The main differences are as follows:

   •    for KNN, KNeighborsClassifier from the sklearn.neighbors module is used, while DecisionTreeClassifier from the sklearn.tree module is used for the decision tree;
   •    in the case of KNN, the tuned hyperparameters are n_neighbors (the number of neighbors) and metric (Euclidean, Manhattan, etc.); in the case of a decision tree, hyperparameters such as the splitting criterion (gini or entropy) and the maximum depth of the tree are set.

    In general, both approaches have a similar structural skeleton for loading data, training a model, and evaluating its accuracy, but they use different classification algorithms.
    RandomForestClassifier from the sklearn.ensemble module is used for the random forest. A random forest combines decision trees to reduce overfitting and improve accuracy. SelectKBest with f_classif is also used to select the top 5 features using analysis of variance [26].
    Feature selection can improve a model's speed and efficiency, but it can affect its workflow and decisions. To reduce code execution time, RandomizedSearchCV was used in this case; it explores a random subset of the hyperparameter space [27-29].
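
    A minimal sketch of the random forest pipeline described above; the parameter ranges and n_iter are illustrative assumptions:

```python
# Top-5 features by ANOVA F-test, then a randomized hyperparameter search.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import RandomizedSearchCV

selector = SelectKBest(f_classif, k=5).fit(X_train_scaled, y_train)
X_train_top5 = selector.transform(X_train_scaled)

param_distributions = {
    "n_estimators": [100, 200, 300],
    "max_depth": [None, 5, 10, 20],
    "criterion": ["gini", "entropy"],
}
# Only n_iter random combinations are evaluated, which shortens execution time.
search = RandomizedSearchCV(RandomForestClassifier(random_state=42),
                            param_distributions, n_iter=10,
                            scoring="accuracy", cv=5, random_state=42)
search.fit(X_train_top5, y_train)
print(search.best_params_)
```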
    The implementation of the neural network model differs by importing additional components for working with data and for building and training the network: Sequential, Dense, Dropout, Adam, and to_categorical from tensorflow.keras. The network consists of three Dense (fully connected) layers, each with a different number of neurons and the ReLU activation function, except for the output layer, which uses the softmax activation function for the classification task. Two Dropout layers, placed after each hidden layer, help prevent overfitting by turning off random neurons during training.
    This architecture provides flexibility in training complex models capable of recognizing patterns in data, while Dropout helps avoid overfitting problems.
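
    A minimal sketch of the described architecture; the layer sizes, dropout rate, learning rate, batch size, and epoch count are illustrative assumptions:

```python
# Three Dense layers (ReLU hidden, softmax output) with Dropout after each
# hidden layer, as described in the text.
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical

model = Sequential([
    Input(shape=(X_train_scaled.shape[1],)),
    Dense(64, activation="relu"),
    Dropout(0.3),                      # randomly disables neurons while training
    Dense(32, activation="relu"),
    Dropout(0.3),
    Dense(2, activation="softmax"),    # two classes: subscribed / not subscribed
])
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train_scaled, to_categorical(y_train),
          epochs=20, batch_size=32, validation_split=0.2, verbose=0)
```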

5. Evaluation of the quality of models
The accuracy of the k-nearest neighbors model was 86.62%. With the help of the classification_report() function, a detailed report on the classification quality indicators of the machine learning model was obtained (Figure 8).




   Figure 8: Display of model accuracy

   The classification report provides detailed information about the classification results:

   •   precision: measures how many positive predictions were correct. For "no" it is 0.91, for "yes" it is 0.82. This means that the model was accurate in predicting the "no" class 91% of the time and the "yes" class 82% of the time;
   •   recall (sensitivity): measures what fraction of true positive cases were found by the model. For "no" it is 0.83, for "yes" it is 0.90. This means that the model recovers 83% of the positive cases for "no" and 90% of the positive cases for "yes";
   •   f1-score: indicates the balance between precision and recall. For "no" it is 0.87, for "yes" it is 0.86;
   •   support: the number of instances of each class in the test data set.

   The macro avg and weighted avg rows represent the average value of each indicator across all classes: macro avg calculates the average regardless of class size, while weighted avg takes class size into account.
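
   A minimal sketch of producing the report shown in Figure 8, assuming the knn_grid and scaler objects from the earlier sketches:

```python
# Predict on the scaled test set with the best KNN estimator and print the report.
from sklearn.metrics import classification_report

y_pred = knn_grid.best_estimator_.predict(scaler.transform(X_test))
print(classification_report(y_test, y_pred, target_names=["no", "yes"]))
```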
   The accuracy of the support vector machine model is 86.62% (Figure 9). To assess the quality of the model, a validation curve is also used (Figure 10):




 Figure 9: Display of the accuracy of the model obtained by the support vector method
 Figure 10: Graph of the validation curve for the parameter C

   The training score (red line) and cross-validation score (green line) show the estimate of the model's accuracy at different values of the parameter C. Both curves have a similar shape and reach a plateau around C = 1. This indicates that the model performs well already at low values of C, and increasing this parameter does not bring significant performance improvements. The accuracy of the model stabilizes at about 0.855. This indicates good overall model quality: the cross-validation accuracy is very close to the training accuracy, which indicates the absence of strong overfitting or underfitting.
   The accuracy of the logistic regression model was 84.20%, which is quite a good result (Figure
11).
   The quality of the model is also illustrated by the ROC curve (Receiver Operating Characteristic curve) (Figure 12). The ROC curve displays the relationship between the true positive rate (TPR) and the false positive rate (FPR) of the classifier at different threshold values. The true positive rate defines how often the classifier correctly identifies positive examples among all true positive examples. The false positive rate defines how often the classifier incorrectly identifies negative examples among all true negative examples. An ROC curve is a graph where the X-axis shows FPR and the Y-axis shows TPR. Each point on this graph corresponds to a different threshold value at which the classifier separates positive and negative examples.
   The diagonal of the ROC plot passes through the points (0,0) and (1,1) and corresponds to a classifier that assigns classes randomly. The optimal classifier will lie above this diagonal.
   The area under the ROC curve (AUC-ROC) determines the overall quality of the classifier. The
larger the AUC-ROC, the better the classifier. Typically, AUC-ROC ranges from 0 to 1. A classifier
with an AUC-ROC of 0.5 corresponds to random class selection, while a classifier with an AUC-ROC
close to 1 is considered very effective.
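
   A minimal sketch of plotting such an ROC curve and its AUC for any fitted scikit-learn classifier clf that exposes predict_proba() (the name clf and the scaled test set are assumptions):

```python
# Plot TPR against FPR over all thresholds, with the random-guess diagonal.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score, roc_curve

proba = clf.predict_proba(X_test_scaled)[:, 1]   # positive-class scores
fpr, tpr, _ = roc_curve(y_test, proba)           # one (FPR, TPR) point per threshold

plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_test, proba):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="random classifier")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```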




 Figure 11: Representation of the accuracy of the logistic regression model
 Figure 12: ROC curve graph

    In this plot the ROC curve approaches the upper left corner and the AUC is 0.92, which corresponds to a high TPR and a low FPR at any threshold value.
    The accuracy of the decision tree model was 86.81%, which is shown in Figure 13:




 Figure 13: Accuracy of the decision tree model
 Figure 14: Graph of the ROC curve of the decision tree model

   The model has high accuracy and is able to effectively distinguish between positive and negative
classes, which is confirmed by a high AUC value (0.94). The ROC curve indicates a good balance
between sensitivity and specificity, which makes the model reliable for classification tasks.
    The model built on the basis of the random forest method has an accuracy of 88.05%, which is shown in Figure 15. This model is also able to effectively distinguish between positive and negative classes, which is confirmed by a high AUC value (0.94); the ROC curve indicates a good balance between sensitivity and specificity, which makes the model reliable for classification tasks (Figure 16):




 Figure 15: Accuracy of the random forest model
 Figure 16: Graph of the ROC curve

   An AUC value of 0.94 is a very high indicator of model performance. The curve shows a high true positive rate even at a low false positive rate, which means that the model detects positive cases well with a minimal number of false positive activations. The proximity of the curve to the top left corner of the graph indicates high sensitivity and specificity. The model deviates significantly from the random-guessing line, confirming its high precision.
   The evaluation was carried out using the following metrics and is presented in Table 1:

   •   Accuracy;
   •   Precision;
   •   Recall;
   •   F-measure.

   The table compares the different machine learning algorithms applied to predict user behavior in terms of accuracy, precision, recall, and F-measure. High accuracy is observed for the KNN, SVM, decision tree, and random forest methods, with values ranging from 0.87 to 0.88, indicating their ability to make correct predictions. The random forest method performed best with an accuracy of 0.88, indicating its potential to predict general user behavior patterns, while logistic regression performed slightly lower (0.84).

Table 1
Evaluation of the methods

  Metric          KNN       SVM       Logistic regression    Decision tree    Random forest
  Accuracy        0.87      0.87      0.84                   0.87             0.88
  Precision       0.82      0.82      0.83                   0.84             0.84
  Recall          0.90      0.91      0.82                   0.89             0.91
  F-measure       0.86      0.86      0.82                   0.86             0.87



    In the context of user behavior analysis, precision and recall help assess a model's ability to correctly identify positive behavioral patterns. All algorithms have similar precision values (about 0.82-0.84), which indicates their effectiveness in correctly predicting certain user behavior. However, the SVM and random forest methods showed the highest recall values (0.91), indicating their ability to detect all possible behavior scenarios without missing important cases. The F-measure, as the harmonic mean of precision and recall, confirms the balance and reliability of the models, particularly for the decision tree and SVM (0.86) and random forest (0.87) methods, making these approaches the most effective for predicting complex user behavior.

6. Conclusions
This paper analyzed the k-nearest neighbors (KNN), support vector machine (SVM), logistic regression, decision tree, random forest, and deep neural network methods for predicting user behavior. The study found that all six methods are effective and capable of providing accurate predictions of customer behavior.
   The behavior of the bank's customers was studied based on such parameters as demographic data,
the history of interaction with the bank, the availability of loans, the method of communication, as
well as macroeconomic indicators. The analysis showed that certain factors, such as age, type of
work, availability of a home loan and previous campaigns of the bank, have a significant impact on
the probability of a client signing up for a term deposit.
   According to the obtained results, the accuracy of the k-nearest neighbors method was 86.62%,
which indicates its good prognostic ability. The support vector method (SVM) also showed an
accuracy of 86.62%. Logistic regression showed an accuracy of 84.20%, which is also a satisfactory
result. The decision tree model had an accuracy of 86.81%, while the accuracy of the random forest
model was 88.20%, which is the highest among all methods studied. The deep neural network (DNN)
showed an accuracy of 86.00%.
   Based on these results, it can be concluded that the random forest is the most effective among the
studied methods for predicting the behavior of bank customers. However, the choice of method
should depend on the specifics of the data and the specifics of the task. In some cases, a combination
of methods, such as using k-nearest neighbors and logistic regression, can lead to even better results
and improve the quality of the prediction.

Declaration on Generative AI
The authors have not employed any Generative AI tools.

References
[1] N. Dakhno, O. Barabash, H. Shevchenko, O. Leshchenko and A. Musienko, "Modified Gradient Method for K-positive Operator Models for Unmanned Aerial Vehicle Control," 2020 IEEE 6th International Conference on Methods and Systems of Navigation and Motion Control (MSNMC), Kyiv, Ukraine, 2020, pp. 81-84. doi: 10.1109/MSNMC50359.2020.9255516.
[2] N. Dakhno, O. Barabash, H. Shevchenko, O. Leshchenko and A. Dudnik, "Integro-differential Models with a K-symmetric Operator for Controlling Unmanned Aerial Vehicles Using an Improved Gradient Method," 2021 IEEE 6th International Conference on Actual Problems of Unmanned Aerial Vehicles Development (APUAVD), Kyiv, Ukraine, October 19-21, 2021, pp. 61-65. doi: 10.1109/APUAVD53804.2021.9615431.
[3] N. Dakhno, O. Leshchenko, Y. Kravchenko, A. Dudnik, O. Trush and V. Khankishiev, "Dynamic Model of the Spread of Viruses in a Computer Network Using Differential Equations," 2021 IEEE 3rd International Conference on Advanced Trends in Information Theory (ATIT), 2021, pp. 111-115.
[4] H. Shevchenko, N. Dakhno, O. Leshchenko, O. Barabash, Y. Kravchenko and A. Dudnik, "Using Mathematical Optimization Methods that Maximize Audience Reach with Budget Constraints," 2022 IEEE 4th International Conference on Advanced Trends in Information Theory (ATIT), 2022, pp. 249-254.
[5] Y. Kravchenko, O. Leshchenko, N. Dakhno, O. Pliushch, O. Trush and Y. Yermakov, "Development of a Model of an Artificial Ecosystem on the Basis of a Genetic Algorithm," 2022 IEEE 4th International Conference on Advanced Trends in Information Theory (ATIT), 2022, pp. 199-203.
[6] T. Kovaluk, K. Dukhnovska, O. Kovtun, A. Nikolaienko and I. Yurchuk, "Text Classification Using a Term Co-occurrence Matrix," XX International Scientific Conference "Dynamic System Modeling and Stability Investigation" (DSMSI-2023), December 19-21, 2023.
[7] Sawant, P., & Kulkarni, R. (2013). A Knowledge Based Methodology to Understand the User Browsing Behavior for Quality Measurement of the Websites Using Web Usage Mining. International Journal of Engineering and Computer Science, 2, 1522-1538.
[8] T. Liang, B. Zeng, J. Liu, L. Ye and C. Zou, "An Unsupervised User Behavior Prediction Algorithm Based on Machine Learning and Neural Network for Smart Home," IEEE Access, vol. 6, pp. 49237-49247, 2018. doi: 10.1109/ACCESS.2018.2868984.
[9] Hubanova, T., Shchokin, R., Hubanov, O., Antonov, V., Slobodianiuk, P., & Podolyaka, S. (2021). Information technologies in improving crime prevention mechanisms in the border regions of southern Ukraine. Journal of Information Technology Management, 13, 75-90. doi: 10.22059/JITM.2021.80738.
[10] Alazzam, F.A.F., Shakhatreh, H.J.M., Gharaibeh, Z.I.Y., Didiuk, I., & Sylkin, O. (2023). Developing an Information Model for E-Commerce Platforms: A Study on Modern Socio-Economic Systems in the Context of Global Digitalization and Legal Compliance. Ingenierie des Systemes d'Information, 28(4), 969-974. doi: 10.18280/isi.280417.
[11] Aggarwal, C.C., & Zhai, C. A survey of text classification algorithms. In: Aggarwal, C.C., Zhai, C. (eds.) Mining Text Data. Berlin: Springer, 2012.
[12] Machine Learning A-Z: Download Practice Datasets. URL: https://www.superdatascience.com/machine-learning/; Russell, S.J., & Norvig, P. Artificial Intelligence: A Modern Approach. 3rd ed. Upper Saddle River, NJ: Prentice Hall.
[13] Li, S., & Amenta, N. (2015). Brute-force k-nearest neighbors search on the GPU. In Similarity Search and Applications: 8th International Conference, SISAP 2015, Glasgow, UK, October 12-14, 2015, Proceedings 8 (pp. 259-270). Springer International Publishing.
[14] Hopcroft, J.E., Ullman, J.D., & Aho, A.V. (1983). Data Structures and Algorithms (Vol. 175). Boston, MA, USA: Addison-Wesley.
[15] Avinash Navlani, "Support Vector Machines with Scikit-learn", 2019. URL: https://www.datacamp.com/community/tutorials/svm-classification-scikit-learn-python.
[16] Patel, F. Decision Tree: the CART Algorithm. URL: https://medium.com/analytics-vidhya/decision-tree-the-cart-algorithm-28c481d28813.
[17] Breiman, L., Friedman, J.H., Olshen, R.A., & Stone, C.J. Classification and Regression Trees. Chapman and Hall/CRC, 1984. 368 p.
[18] Velu, A. (2021). Application of logistic regression models in risk management. International Journal of Innovations in Engineering Research and Technology, 8(04), 251-260. Retrieved from https://repo.ijiert.org/index.php/ijiert/article/view/2594.
[19] Hosmer, D.W., & Lemeshow, S. (2000). Applied Logistic Regression. John Wiley & Sons, Inc. doi: 10.1002/0471722146.
[20] Akanbi, L.A., Oyedele, A.O., Oyedele, L.O., & Salami, R.O. (2020). Deep learning model for Demolition Waste Prediction in a circular economy. Journal of Cleaner Production, 274, 122843.
[21] Schulz, Hannes; Behnke, Sven (2012). "Deep Learning". KI - Künstliche Intelligenz, 26(4), 357-363. doi: 10.1007/s13218-012-0198-z.
[22] LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey (2015). "Deep Learning". Nature, 521(7553), 436-444. doi: 10.1038/nature14539.
[23] Le, Q.V. (2013, May). Building high-level features using large scale unsupervised learning. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8595-8598). IEEE.
[24] LeCun, Y., et al. (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE, 86(11), 2278-2324. doi: 10.1109/5.726791.
[25] Abdel-Hamid, O., et al. (2014). "Convolutional Neural Networks for Speech Recognition". IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1533-1545. doi: 10.1109/taslp.2014.2339736.
[26] Metrics and scoring: quantifying the quality of predictions. URL: https://scikit-learn.org/stable/modules/model_evaluation.html (accessed 04/03/2024).
[27] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., & Duchesnay, É. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
[28] Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(2).
[29] Brownlee, J. (2019). A gentle introduction to the rectified linear unit (ReLU). Machine Learning Mastery, 6.



