Intelligent System for Processing and Forecasting Financial Assets and Risks

Nickolay Rudnichenko1, Vladimir Vychuzhanin1, Tetiana Otradskya1 and Denys Shvedov1

1 Odessa Polytechnic National University, Shevchenko Avenue 1, Odessa, 65001, Ukraine

____________________________________
CMIS-2024: Seventh International Workshop on Computer Modeling and Intelligent Systems, May 3, 2024, Zaporizhzhia, Ukraine
nickolay.rud@gmail.com (N. Rudnichenko); icst_nuop@ukr.net (V. Vychuzhanin); tv_61@ukr.net (T. Otradskya); studylearnerstudy@gmail.com (D. Shvedov)
0000-0002-7343-8076 (N. Rudnichenko); 0000-0002-6302-1832 (V. Vychuzhanin); 0000-0002-5808-5647 (T. Otradskya); 0009-0002-4823-8782 (D. Shvedov)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Abstract
The article describes the results of the development and research of an intelligent system for processing, accounting, and forecasting financial assets and risks. An analysis is carried out of current problems and technical aspects related to the assessment of financial flows and the risks of asset loss in the context of modern market development in the field of accounting and forecasting of financial time series. A justification is given for the feasibility and effectiveness of using machine learning methods to automate the solution of operational and analytical problems in the field of accounting for financial assets. The proposed concept of using machine learning within the developed application software is described, and the key aspects of its design and of the implementation of its software and modular structure are considered. The results of a comparative analysis of the models created on the basis of ACF and PACF analysis and an assessment of the quality metrics of the developed models under different conditions for assessing financial flows and risks are presented; the obtained results are analyzed, and further ways of developing the approach proposed in the article to improve its efficiency are considered.

Keywords
Data analysis, finance risks, financial time series, machine learning, software development, autoregressive integrated moving average (ARIMA)

1. Introduction

1.1. Research relevance

In the modern world, where data volumes are constantly growing, accurate accounting and forecasting of financial indicators become particularly significant. The relevance of this field lies primarily in the increasing need for precise and efficient tools capable of automating operational and analytical actions on data from financial flows, taking into account the risks of incorrect investments [1]. This not only allows investors' budgets to be optimized but also provides client companies with means for more effective management of material resources in terms of planning potential revenues and various expenses, ensuring the principles of overall financial stability.

In the context of formalizing the processing of data from financial flows or indicators, it should be noted that their nature can vary. Specifically, data may from the outset be well structured, semi-structured, or unstructured [2]. Structured financial data have a clearly defined format, often adhering to a relational database model, and are characterized by a standard hierarchical order. They are highly organized, easily accessible, and directly usable for analysis. Structured data are predominantly stored in tabular formats within well-defined schemas and collections. For instance, names, dates, addresses, credit card numbers, stock information, and geolocation serve as examples of structured financial data [3]. Semi-structured financial data are often not stored in relational databases like structured data but possess certain organizational properties that facilitate their analysis.
Examples of such data include risk profiles, investment plans, and various financial accounting data. HTML, XML, and JSON documents, NoSQL databases, etc. are examples of semi-structured data types used in financial analysis. Unstructured financial data have no predefined format or organization, which significantly complicates their collection, processing, and analysis, as they mostly consist of textual and multimedia materials. Examples of unstructured financial data include dividend or investment reports, emails, blog posts, wikis, text documents, PDF files, images, presentations, web pages, and many other types of business documents containing valuable financial information [2, 4].

Within the framework of managing various types of data on the financial assets of users or companies, a promising approach is the application of modern machine learning (ML) methods. This direction opens up a range of possibilities for automating the processing and analysis of large volumes of diverse financial data (executed transactions for the payment of goods, receipt of services, acquisition of securities, crypto assets, or other investment contributions). It makes it possible to uncover hidden patterns and dependencies that cannot easily be identified by humans working manually or through the use of specialized deterministic calculation programs [5].

The advantages of modern ML algorithms include their ability to adapt automatically to dynamic changes in the financial profiles and investment risks of users, allowing an increase in the accuracy of data forecasting. ML algorithms can identify patterns and relationships in data that may be non-obvious to humans. This enables precise financial forecasts and informed investment decisions based on the analysis of diverse data. Consequently, the application of ML in the field of financial risk assessment holds significant practical interest. Specifically, accounting and forecasting of financial indicators can be utilized in various sectors such as banking, financial consulting, asset management, and personal finance. The accuracy of such forecasts can greatly impact decision-making in these areas, particularly in risk management, budget planning, and investment planning [6].

Among the features of applying supervised ML algorithms, it is important to note that models built on these algorithms form the basis for solving tasks by relying mostly on information obtained from the input dataset. Another important characteristic of ML models is their ability to self-improve, meaning the formalization and utilization of acquired experience based on a data-driven approach. In fact, the more diverse and balanced the data a model analyzes, the more accurate its predictions become. This is particularly relevant in sectors where high accuracy is required, such as finance, medicine, or security systems [7]. The primary scientific challenge in this context is instilling in the model a high level of generalization ability, so that it can effectively solve tasks on new, previously unseen data not used during training, with a minimal level of errors.
This is why, when using ML in the field of financial analysis, it is necessary to integrate approaches from other applied and theoretical scientific directions, including mathematical statistics, optimization methods, and traditional mathematical domains. It is important to take into account the characteristics of model-building algorithms related to both non-functional aspects (computational efficiency and speed) and functional aspects (overfitting issues, sample balance, hyperparameter tuning) of ML utilization [8].

Various methodologies are used for forecasting financial data and creating client profiles, typically based on the analysis of historical data and influencing factors. This involves establishing statistical relationships between different characteristics and developing predictive models. Previously, single-factor forecasting methods dominated, relying on time series and regression analyses. However, not all of these approaches are sufficiently effective for assessing financial indicators and investment potential [8]. All of this makes the research topic relevant within the scope of this work.

1.2. Aspects of financial data processing issues

It is necessary to note that the specificity of processing and accounting financial data during their preliminary preparation for forecasting tasks involves several substantive factors. Specifically, forecasting in the financial domain is a non-trivial task, as it entails analyzing and predicting changes in various financial flows, such as income, expenses, savings, and investments. In the risk analysis process in particular, it is essential to consider the fact that personal finances have distinctive characteristics that vary in each specific case for different users, because they often depend on individual behavioral factors, which can be less predictable than market indicators. This process is crucial for individual financial planning, as it helps individuals determine how best to allocate their resources to achieve financial goals [5]. In general, the process of accounting and forecasting finances is accompanied by a series of challenges, among which the following can be highlighted [9]:

• High unpredictability of the external environment. Chosen strategies of investment behavior and risk assessment can be extremely unpredictable due to various unforeseen events, such as sudden fluctuations in exchange rates, seasonal declines or surges in demand and supply, accidents and technological disasters, changes in population employment, political decisions and sanctions, or other types of crises. These events can significantly impact an individual's risk profile and complicate the creation of accurate long-term financial forecasts.
• Income variability. For many investors, income is not constant and may vary from month to month, especially for professionals or experts whose activities are largely commission-based. This variability complicates forecasting and requires a more flexible approach to budgeting and financial planning.
• Behavioral factors. Financial decisions often depend on behavioral factors such as personal beliefs, habits, emotions, and cognitive biases. For example, psychological barriers may hinder effective saving or investing, while impulsive purchases may lead to unplanned expenses.
• Economic and market conditions. External economic factors, such as inflation, interest rates, market fluctuations, and economic crises, can have a significant impact on personal finances.
Predicting these conditions is challenging, but they must be taken into account when planning finances [10].

Thus, the specificity of forming and managing data structures for financial flows should be considered from the perspective of systematic intelligent analysis. Inaccuracies, data errors, outliers, and data incompleteness can significantly impact the quality of forecasts. Financial time series contain a considerable amount of "noise", that is, random fluctuations with no obvious explanation. Distinguishing the "signal" from the "noise" is one of the main challenges in financial modeling and forecasting, leading to various types of risks that are not always easily predictable and quantifiable with a high level of certainty. All of this underscores the relevance of this research and of the stated objective, which involves developing software for the accounting and forecasting of financial assets and risks based on the application of modern ML algorithms to automate the data analysis process.

1.3. Discussion of methods

Currently, there are various approaches to the analysis of heterogeneous financial time series data of different orders and states (both stationary and non-stationary) in the context of integration with ML models. With respect to the problem being solved, the following should be characterized: vector autoregression (VAR), generalized autoregressive conditional heteroskedasticity (GARCH), and the autoregressive integrated moving average (ARIMA) [8, 10].

VAR is a statistical method that allows relationships between several time series to be modeled simultaneously. A feature of the method is that the input variables are expressed as linear combinations of the past values they took in the system. The advantages of VAR are:

• it takes into account the relationships between several time series, which can be used when analyzing heterogeneous financial data, for example, when the prices of various assets or market indicators depend on and influence each other stochastically;
• it allows the modeling of various emerging effects and delays in relationships in the data to be formalized, which helps to identify key criteria of the financial series over time;
• it can potentially be applied to assess the impact of various non-obvious factors on financial risks.

Disadvantages:

• choosing the optimal number of lags and parameters is a labor-intensive and non-trivial process, requiring a large number of experiments on heterogeneous data;
• the results are difficult to interpret in the case of non-stationary processes;
• software implementation is difficult for risk assessment in the context of ranking risks into groups and formalizing the probabilistic factor.

GARCH is used to model and forecast the volatility of financial time series. GARCH takes into account how variability arises in the data, which helps to predict future levels of flow volatility and is used in securities valuation [9]. The volatility of a time series can change over time and depend on past values of the series; modeling is implemented through conditional heteroscedasticity based on previously obtained values (the conditional variance depends on past squared errors). Advantages:

• it takes variability in the data into account when formalizing the data structure, which can increase the accuracy of the forecast;
• it is adaptable to stationary time series by supporting the modeling of dynamic volatility effects, including by clustering data according to internal financial market conditions.

Weaknesses:

• the method is limited in the context of big data due to the need to conduct a large number of heterogeneous experiments to obtain reliable results;
• it has a high sensitivity to initial conditions, which can lead to unstable results if the parameters or model specification are incorrectly selected;
• it is not effective when the nature of the dependencies of the financial time series is initially uncertain (probabilistic).

A minimal volatility-modeling sketch is given below to make the method concrete.
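The following sketch fits a GARCH(1,1) model to a synthetic series of returns using the third-party arch package; it is not part of the system described in this paper, and the data are simulated purely for illustration.

# Illustrative GARCH(1,1) fit on synthetic returns (not part of the
# described system). Requires the third-party "arch" package.
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(42)
returns = pd.Series(rng.normal(0.0, 1.0, 1000))  # stand-in for real returns

# Conditional variance depends on past squared errors and past variances.
model = arch_model(returns, mean="Constant", vol="GARCH", p=1, q=1)
result = model.fit(disp="off")
print(result.summary())

# Forecast the conditional variance five steps ahead.
forecast = result.forecast(horizon=5)
print(forecast.variance.iloc[-1])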
ARIMA is a statistical method for analyzing and forecasting time series that models a financial time series as a combination of autoregressive, integrated, and moving average components. The process of building an ARIMA model includes the following steps: identifying the necessary parameters based on the analysis of the autocorrelations and partial autocorrelations of the time series, estimating the model parameters using the least squares or maximum likelihood method, checking the adequacy of the model, and analyzing the residuals [11]. ARIMA advantages:

• adaptability, through support for the automatic correction of trend and seasonality, as well as the search for anomalies and patterns in the data;
• flexibility in accounting for integrated and moving averages through the use of complex components;
• support for managing the forecasting horizon, including short-term and long-term periods, which allows models to be extended in composite forecasting scenarios;
• automatic identification of model parameters based on the data;
• extensibility of the model through the possibility of aggregated accounting of external factors, which is useful for increasing forecast accuracy in risk assessment.

Disadvantages:

• difficulty of application to short and noisy financial time series, with a decrease in forecasting accuracy;
• reduced efficiency when used on high-frequency data (intra-day or fed online in multi-stream collection systems) due to the need for a large number of observations and a high degree of autocorrelation.

As a result of the comparative analysis of the considered methods within the framework of our task, the most appropriate choice is ARIMA, due to its greater adaptability and the possibility of implementing it in software as a separate module.

2. Project concept proposal

Within this research, in addition to developing the software logic of a web application for financial data accounting, hybrid polynomial regression ML models for data forecasting will be created, particularly based on ARIMA. ARIMA represents a combination of autoregressive models and moving average models. The advantage of this approach is the ability to use such models for automating the forecasting of time series values (in our case, the values of financial assets) based on their past values, taking into account the error estimates obtained from previous forecasts, which provides the models with a higher degree of adaptability [11]. During the data analysis process, it is proposed to perform several preliminary stages: data import and preliminary processing, structuring, determination of statistical indicators, and assessment of data normality. A minimal sketch of these preliminary stages is given below.
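The sketch below illustrates these stages with pandas; the file name and column names are hypothetical, since the paper does not specify the exact CSV schema.

# A sketch of the preliminary stages: import, structuring, and basic
# statistics. "statement.csv", "date" and "amount" are hypothetical names.
import pandas as pd

df = pd.read_csv("statement.csv", parse_dates=["date"])
df = df.sort_values("date").set_index("date")

# Structuring: aggregate transaction amounts into a monthly series.
monthly = df["amount"].resample("MS").sum()

# Statistical indicators used for the normality assessment.
print(monthly.describe())
print("skewness:", monthly.skew())
# Note: pandas reports excess kurtosis, i.e. kurtosis minus 3.
print("excess kurtosis:", monthly.kurt())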
Thus, it is possible to better understand the nature of the considered data and identify the characteristics of financial risks [12]. In this context, an important factor is the identification of a characteristic seasonality trend in transactions and investments, because it can negatively impact the accuracy of forecasting within a defined horizon; therefore, it is necessary to exclude this aspect before the forecasting process begins. It is also essential to analyze the variance to determine whether the data change over time or the variance is constant, which serves as a criterion for the balance of the data sample.

As it is impossible to check the normality of a large volume of data empirically by visual inspection, it is advisable to implement in the created web application a check of data normality using the Jarque-Bera test, given by the expression

JB = n \left( \frac{s^2}{6} + \frac{(k - 3)^2}{24} \right),   (1)

where n is the sample size, s is the skewness coefficient, and k is the kurtosis coefficient. For normally distributed data, s = 0 and k = 3, in which case the value of JB is equal to 0. Under the null hypothesis that the data are normally distributed, the JB statistic asymptotically follows the Chi-square distribution [13].

The definition of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) is also an important concept for specifying the models we create. In time series analysis, the ACF shows the degree of linear statistical relationship between the values of a time series. Numerically, the ACF is a sequence of correlation coefficients between the original series and its copy shifted by a given number of series intervals (this number is called the lag of the ACF). In general form, the ACF is defined as

f(L) = \sum_{L=0}^{m} r_{t,t-L},   (2)

where m is the number of time series members and r is the correlation coefficient. The ACF measures the linear predictability of the time series at time t, as it uses only lagged values. The PACF captures the partial correlation of a stationary financial time series with its own values, which is convenient in our case because it supports regressing data values at short lags. Since most of our financial data is not initially stationary, applying ARIMA models requires transforming the variables. Therefore, the ACF and PACF must be analyzed in combination to determine the order of the model, specifically to minimize forecast errors in model creation.

The complete definition of an ARIMA model involves choosing values for the parameters p, d, and q, which is a complex task. Parameter p represents the order of the autoregressive part, d is the degree of differencing of the series, and q is the order of the moving average part. It is worth noting that within the proposed concept of financial data forecasting, a necessary condition is p ≥ 1 [14].

The key idea within this concept, reflecting its scientific novelty, is the development and application of the designated ARIMA approach using ML within the framework of the analysis of stationary data on financial transactions based on a probabilistic approach (the probability is ranked according to the Harrington desirability function). To reduce the degree of uncertainty when analyzing financial risks, a fuzzy logic model based on the Sugeno algorithm is used. All this forms a hybrid model with significant generalizing capabilities for predicting financial risks over various planning horizons, reducing (formalizing) the overall level of uncertainty in the impact of external factors, which is implemented in the software system. A minimal sketch of the normality check (1) and the ACF/PACF analysis (2) is given below.
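The following sketch, on synthetic data, shows how the check in (1) and the analysis in (2) can be carried out with standard libraries; in the system itself the input is the user's aggregated transaction series.

# Sketch of the Jarque-Bera check (1) and ACF/PACF analysis (2) used to
# choose the ARIMA order (p, d, q). The series here is synthetic.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import stats
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

rng = np.random.default_rng(0)
series = pd.Series(rng.normal(size=500)).cumsum()  # non-stationary example

jb_stat, p_value = stats.jarque_bera(series)
print(f"JB = {jb_stat:.2f}, p-value = {p_value:.4f}")

# Difference once (d = 1) to obtain a stationary series, then inspect the
# ACF/PACF plots to select p and q.
diff = series.diff().dropna()
fig, axes = plt.subplots(2, 1)
plot_acf(diff, lags=20, ax=axes[0])
plot_pacf(diff, lags=20, ax=axes[1])
plt.show()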
3. Intelligent system implementation

Let us describe the process of implementing the software project based on diagrams that formalize its logic and structure. When working with the web application, the user can choose from several functions, some of which are related and depend on each other:

• User transaction import. This functionality allows the user to import bank statements (currently supporting statements from monobank in CSV format). The system transforms these statements into internal entities called Transactions. These data are directly used for forecasting the user's income and expenses.
• Verification of imported transactions in a list, with the ability to filter and sort.
• Forecasting. The system provides access to the forecasting module, where certain parameters can be configured, such as the type of forecasting (expenses or profits), the forecasting period (the forecasting horizon), and the start date of forecasting, i.e., the date of the last transaction on which the model will be trained and from which the forecast will be generated for the specified number of months.
• Visualization. The system provides the ability to view forecasting results for each type of forecast. The display is presented in the form of cumulative forecast sums and transactions for the forecast period. This allows trends in data changes and the correlation between the forecast and actual data to be observed.
• Financial risk assessment. Implements functionality for obtaining optimistic and pessimistic risk values from increasing expense levels and decreasing profit levels within the forecasting horizon, based on the scenario method.

The developed scheme detailing the stages of operation of the web application for accounting, processing, and forecasting user financial data is shown in Fig. 1.

Figure 1: The detailed scheme of the web application's operational stages

An important detail of the architecture of the created web application is the support for re-creating the model on different data fragments, its subsequent training, and its use for generating user-requested forecasts. The models used in the web application optionally support dynamic learning, which is necessitated by the need to use new portions of financial data for more detailed analysis and the formation of a more generalized forecast each time a model is generated. To address this, the "Forecast" entity has been created, which contains the forecasting results and the parameters used in forecasting. From the perspective of object-oriented design efficiency, this allows computational resource costs to be reduced: if the dataset and configuration remain unchanged, the forecast result remains the same, so the module retrieves results from the database without performing additional operations (a sketch of this lookup is given after Fig. 2). To describe the usage process of all modules in the web application during its design, we use UML (Unified Modeling Language). The developed general sequence diagram of the application's operation is presented in Fig. 2.

Figure 2: The sequence diagram of user actions in the software
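A sketch of the Forecast-entity lookup described above follows. It is written in Python for consistency with the other sketches, although the actual backend module is implemented in JavaScript; the names ForecastStore and make_key are hypothetical.

# Sketch of the Forecast caching idea: identical data and configuration
# yield the identical stored forecast, so recomputation is skipped.
# ForecastStore and make_key are hypothetical illustration names.
import hashlib
import json

class ForecastStore:
    """In-memory stand-in for the Forecast collection in the database."""

    def __init__(self):
        self._db = {}

    @staticmethod
    def make_key(options: dict, n_transactions: int) -> str:
        payload = json.dumps({"opts": options, "n": n_transactions},
                             sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def get_or_compute(self, options, n_transactions, compute):
        key = self.make_key(options, n_transactions)
        if key not in self._db:           # cache miss: run the model
            self._db[key] = compute(options)
        return self._db[key]              # cache hit: no extra work

store = ForecastStore()
result = store.get_or_compute(
    {"period": "1M", "forecastType": "expenses"}, 50000,
    compute=lambda opts: {"amount": 0.0})  # placeholder model call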
After logging into the web application with their account, the user has the option to import transactions into the system through the transaction import form. The user can add one or more files in CSV format, generated by a bank or another financial institution with which they have a relationship. After migrating transactions into the system, the user can load a forecasting model to assess potential profits, expenses, and aggregated financial risks using the forecasting functionality. This module is programmatically linked to a form where the user can configure parameters for forecasting financial risks. After confirming the parameters, they are passed to the ML control block, which deserializes the models, uses them, compares the parameters with the generated forecasts, and, if the data are correctly specified, forms a set of forecast values for each model. The result is then sent to the user interface on the web page in the form of a report and graphical dependencies. If the forecast is not reliable (the error exceeds the set value) or the data are invalid, an informational message is displayed to the user as a pop-up window.

From a software perspective, the application is a distributed microservices SPA based on the JavaScript programming language, the Node.js platform, and the React frontend framework, providing interactive user interaction capabilities. Due to the client-server architecture, the server-side backend and the DB are launched and deployed separately. The backend performs all functional tasks related to data accounting and processing and interacts with the DB for searching, filtering, creating, editing, reading, and deleting records. Additionally, the Python programming language is employed, together with a set of functional libraries for creating data-manipulation structures, including pandas and numpy. Interaction between the backend and frontend parts of the web application is achieved through a REST API. The backend interacts with MongoDB using mongoose; the library provides built-in schema validation to automate checking that data comply with specified rules and formats before they are saved or updated. The overall architecture of the software is shown in Fig. 3.

Figure 3: Architectural diagram of the components of the system in a general view

To perform a forecast with the selected model, data are required. In our case, this is a test dataset of transactions over 3 years for a monobank customer. Each transaction is described by the Transaction schema, and all transactions are stored in the system for further use in other modules. The Transaction entity retains information about each transaction in a structured format. The main modules that utilize the Transaction entity are the statistical processing and forecasting modules. Overall, the forecasting module performs only a few roles:

• formation of forecasted values for profits, expenses, and financial risks;
• a set of tools for interaction between the web application and a Python script, which is used for the deserialization, description, training, and utilization of ML models;
• connection to the statistical module for aggregating the transactions used to build graphical visualizations of the reporting results.

With these data, the user can correlate real data with forecasted data, evaluate results with specified parameters, conduct additional computational experiments, and perform a consolidated analysis of financial risks, with further use of the data for developing their investment strategy. The forecasting module uses the Forecast schema to record data about generated forecasts and the parameters used in the process. ForecastResult is a distinct software type, essentially a collection of data that describes a set of financial time series resulting from the model's forecast; it has the following structure (mirrored by the sketch below):

• dateTime [string], the date of one forecasted value of a financial asset, described as an ISO string;
• amount [number], the amount of one forecasted value of a financial asset;
• ForecastOptions, a specialized object data type that describes the parameters used for generating a forecast or calculated for assessing the relevance of the forecast;
• startTime [string], the date from which forecast creation commences. It is also used for aggregating transactions; the model should not consider data beyond this date, as otherwise the results would be inaccurate;
• period [1M, 2M, 3M], the period or forecast horizon. Currently, the software supports forecasting values for 1, 2, or 3 months, due to increasing errors and decreasing forecast reliability over longer periods;
• nTransactions [number], the number of transactions in the system, a parameter used to assess the relevance of the forecast;
• modelVersion [string], the version of the model, an internal parameter that essentially serves as metadata;
• forecastType [income, expenses], the type of financial asset to be forecast.
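For illustration, the structure above can be mirrored as Python dataclasses; in the system itself these are JavaScript schema types, and the grouping of the dateTime/amount pair into a ForecastPoint type is an illustrative reading.

# Illustrative Python mirror of the ForecastResult structure; the real
# types are JavaScript schemas. ForecastPoint is a hypothetical grouping.
from dataclasses import dataclass
from typing import List, Literal

@dataclass
class ForecastOptions:
    startTime: str                               # ISO date the forecast starts from
    period: Literal["1M", "2M", "3M"]            # forecast horizon
    nTransactions: int                           # used to assess forecast relevance
    modelVersion: str                            # internal metadata
    forecastType: Literal["income", "expenses"]  # asset type to forecast

@dataclass
class ForecastPoint:
    dateTime: str  # ISO date of one forecasted value
    amount: float  # forecasted value of the financial asset

@dataclass
class ForecastResult:
    options: ForecastOptions
    points: List[ForecastPoint]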
The user is provided with an authentication and registration system in the web application, so each user has their own personal record; the User schema is used for this purpose. The module responsible for importing transactions from the bank statement is the Mono module, which depends on the Transaction schema. Its main functionality involves parsing the bank statement structure in CSV format and migrating the data into the system. The MonoController, the element responsible for managing the endpoints of this module in the web application architecture, has only one API endpoint, /api/v1/mono/import. It is designed to receive the input configuration set for importing all transactions, including the accountID, the date (import start date), and the CSV file itself. The imported transactions are utilized in the Analytics module, which is responsible for executing forecasts and aggregating transactions for correlation with the forecast. The module operates with the Forecast schema, used to store the forecast results in the DB. The module consists of the AnalyticsController and the AnalyticsService; the AnalyticsController is responsible for the API and for interaction with the main functionality in the service. Below is a description of the code execution sequence for creating the ML forecasting model (a condensed sketch follows the list):

1. Reading parameters from sys.argv, which are passed as arguments when launching the Python process, namely period (forecast period) and forecast_type (forecast type). These two parameters are necessary for using factors in constructing the dataframe and for determining the model parameters.
2. Parsing transaction data from data.json using pandas.
3. Sorting and preparing the data.
4. Determining the model parameters based on the type of forecasting.
5. Normalizing the data and creating additional factors.
6. Removing outliers for expense forecasting, which would otherwise introduce errors by inflating model indicators. This is part of the normalization process, in which records in the dataframe with abnormally high or low values are removed.
7. Splitting the data into test and training sets.
8. Creating the ML model.
9. Generating predictions for the test data and determining the model metrics.
10. Creating a forecast based on the period parameter.
11. Preparing the output data and saving them to results.json.
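A condensed sketch of this flow is given below. The sys.argv parameters, data.json, and results.json come from the description above; the column names, the quantile-based outlier rule, and the ARIMA(1, 1, 0) order (taken from Section 4) are assumptions made for illustration.

# Condensed sketch of the forecasting script. Column names ("date",
# "type", "amount") and the outlier rule are illustrative assumptions.
import sys
import json
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

period, forecast_type = sys.argv[1], sys.argv[2]   # e.g. "1M", "expenses"
steps = {"1M": 1, "2M": 2, "3M": 3}[period]

df = pd.read_json("data.json")                     # parsed transactions
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values("date")
df = df[df["type"] == forecast_type]

# Outlier removal: drop abnormally high or low records.
lo, hi = df["amount"].quantile([0.01, 0.99])
df = df[df["amount"].between(lo, hi)]
series = df.set_index("date")["amount"].resample("MS").sum()

train, test = series[:-steps], series[-steps:]     # train/test split
model = ARIMA(train, order=(1, 1, 0)).fit()        # order from Section 4
pred = model.forecast(steps=steps)
mae = float(np.abs(pred.values - test.values).mean())

out = [{"dateTime": d.isoformat(), "amount": float(a)} for d, a in pred.items()]
with open("results.json", "w") as f:
    json.dump({"forecast": out, "mae": mae}, f)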
Fig. 4 illustrates the pages of the software user interface.

Figure 4: The pages of the system's interface

For interaction with the transaction import and forecasting modules, a custom interface has been developed. The web application interface consists of several pages:

• Authorization, a page that greets the user and has a form for creating an account or logging in with pre-existing data, namely email and password.
• Transactions, a page consisting of a table that displays transactions created by the user or imported through the import module for accounting. The table provides the ability to sort and filter data by certain available parameters. The user menu includes a button that opens a form for creating a transaction. The Import button, located in the upper right corner of the page, opens a form where the user can select specific parameters for the import and a CSV file with a bank statement.
• Analytics, a page containing a button in the upper right corner, Forecast, which opens a form with settings for generating a forecast. After the forecast is generated, graphs appear on the page displaying the forecasted results as a line chart, as well as the transactions for this period, for more detailed analysis and correlation between real and forecasted data.

After considering the main aspects of the system implementation, it is necessary to explore the possibilities of its application to financial data forecasting tasks.

4. ML models implementation and research

To conduct research using the created ML models, it is necessary to form training and testing datasets, which are formed from a sample of 50,000 client transactions of monobank over a 3-year period. The scikit-learn library's train_test_split method was used to create the datasets, allowing the dataset to be split using the test_size parameter. The training dataset is provided to the hybrid model during training together with the corresponding target values, enabling the model to establish a connection between the input characteristics and the correct responses. The testing dataset is used to assess the model's effectiveness: in this case, the model is not given the target characteristic and must instead predict it based on the other characteristics. The predicted values are then compared with the actual results. The ACF and PACF plots based on (1) and (2), shown in Fig. 5, determine the three parameters of the created hybrid ARIMA model: p = 1, d = 1, q = 0.

Figure 5: The ACF and PACF plots

Before starting the model training, it is important to establish criteria for evaluating its effectiveness to ensure the feasibility of the chosen algorithm. One way to evaluate it is to compare the model's performance with results obtained through random generation of the target characteristic, i.e., with a baseline model. If the chosen algorithm shows worse results than simple random guessing, this may signal the need to reconsider the approach, possibly using alternative methods. In our case, for solving the financial risk regression task, it is appropriate to compare against the mean value of the training dataset; a sketch of this comparison is given below.
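The sketch below performs such a comparison on synthetic data standing in for the transaction sample: an ARIMA(1, 1, 0) model against the naive mean-of-train baseline, both scored with MAE.

# Sketch of the baseline comparison: ARIMA(1, 1, 0) vs. predicting the
# training mean, scored with MAE. Synthetic data stands in for the
# 50,000-transaction monobank sample.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(1)
series = pd.Series(rng.normal(1000, 200, 36),
                   index=pd.date_range("2021-01-01", periods=36, freq="MS"))

# shuffle=False preserves chronological order, as time series require.
train, test = train_test_split(series, test_size=0.2, shuffle=False)

model = ARIMA(train, order=(1, 1, 0)).fit()
pred = model.forecast(steps=len(test))
baseline = np.full(len(test), train.mean())    # mean-of-train baseline

print("ARIMA MAE:   ", mean_absolute_error(test, pred))
print("Baseline MAE:", mean_absolute_error(test, baseline))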
To compare the performance of the created ARIMA-based hybrid model, the Facebook Prophet and XGBoost models were also chosen. The results of measuring the MAE for the constructed models are presented in Fig. 6.

Figure 6: The results of measuring MAE for the constructed models

It should be noted that the overall distribution of the values obtained in the ACF and PACF plots is quite balanced, with a spread not exceeding 0.25 on the positive side and 0.075 on the negative side. The data sample is uniform, resulting in financial risk values ranging from 0.05 to 0.2. This indicates fairly high accuracy in forecasting financial asset values and a moderate growth in calculation error. As can be seen, the lowest MAE value, around 445, is achieved by XGBoost; however, the computational time of this model is nearly three times greater than that of the hybrid ARIMA-based model, which has an MAE of over 610. The Prophet model is the least accurate and the fastest.

5. Conclusions

As a result of this study, software capable of automated processing, tracking, and forecasting of financial assets and risks was developed, utilizing hybrid ML models with ARIMA. The research involved exploring models for time series analysis, specifically the proposed hybrid ARIMA, Prophet, and XGBoost regression models, in the context of predicting expenses and profits based on a prepared dataset of user transactions over a 3-year period. MAE metrics for each model are presented to aid in selecting the best-performing model for further performance enhancement. Considering the chosen algorithm, it is advisable to further calculate the weight values of the factors and dynamically adjust the hyperparameters of the created hybrid ARIMA model to increase its productivity and accuracy. An optimized model can be used for more precise forecasting of user spending and profit risks. It is worth noting that obtaining predictive values of financial risks, taking into account user behavioral factors within their investment strategy, enables a clearer and more predictable assessment of profits; predicting expenses in the case of stable profits is less relevant. A subsequent step in the research of the algorithm involves testing the model on different datasets, such as data from different users. Adding new features to the dataset, such as holidays or weather data, may improve the model's effectiveness and prediction results, enhancing its generalization capability.

References

[1] C. Peng, Digital Inclusive Finance Data Mining and Model-Driven Analysis of the Impact of Urban-Rural Income Gap, Wireless Communications and Mobile Computing 7 (2022) 1-8. DOI: 10.1155/2022/5820145.
[2] N. Rudnichenko, V. Vychuzhanin, I. Petrov, D. Shibaev, Decision Support System for the Machine Learning Methods Selection in Big Data Mining, in: Proceedings of the Third International Workshop on Computer Modeling and Intelligent Systems (CMIS-2020), CEUR-WS, vol. 2608, 2020, pp. 872-885.
[3] V. Vychuzhanin, N. Rudnichenko, Z. Sagova, M. Smieszek, V. Cherniavskyi, A. Golovan, Analysis and structuring diagnostic large volume data of technical condition of complex equipment in transport, in: 24th Slovak-Polish International Scientific Conference on Machine Modelling and Simulations (MMS 2019), Liptovský Ján, Slovakia, 2019. DOI: 10.1088/1757-899X/776/1/012049.
[4] J. Bertomeu, Machine learning improves accounting: discussion, implementation and research opportunities, Review of Accounting Studies 25(3) (2020) 1135-1155. DOI: 10.1007/s11142-020-09554-9.
[5] Y. Baştanlar, M. Ozuysal, Introduction to machine learning, Methods in Molecular Biology (Clifton, N.J.) 1107 (2014) 105-128. DOI: 10.1007/978-1-62703-748-8_7.
[6] J. Bertomeu, E. Cheynel, E. Floyd, W. Pan, Using machine learning to detect misstatements, Review of Accounting Studies 26(2) (2019) 468-519. DOI: 10.1007/s11142-020-09563-8.
[7] M. Alloghani, D. Al-Jumeily, J. Mustafina, A. Hussain, A. J. Aljaaf, A systematic review on supervised and unsupervised machine learning algorithms for data science, in: Supervised and Unsupervised Learning for Data Science, Unsupervised and Semi-supervised Learning, Springer International Publishing, Cham (2014) 3-21. DOI: 10.1007/978-3-030-22475-2_1.
[8] A. D. Papalexopoulos, T. C. Hesterberg, A regression-based approach to short term load forecasting, IEEE Transactions on Power Systems 5(4) (1990) 1535-1550. DOI: 10.1109/59.99410.
[9] I. Moghram, S. Rahman, Analysis and evaluation of five short term load forecasting techniques, IEEE Transactions on Power Systems 4(4) (1989) 1484-1491. DOI: 10.1109/59.41700.
[10] X. L. Dong, T. Rekatsinas, Data integration and machine learning: a natural synergy, in: ACM SIGMOD International Conference on Management of Data, Houston, 2018, pp. 1645-1650. DOI: 10.1145/3183713.3197387.
[11] I. Ali, N. Bilal, F. Marwa, Forecasting Stock Prices with an Integrated Approach Combining ARIMA and Machine Learning Techniques (ARIMAML), Journal of Computer and Communications 11 (2023) 58-70. DOI: 10.4236/jcc.2023.118005.
[12] M. Alhomsi, H. Ahmed, Forecasting of Exchange Rate: Autoregressive Models vs. XGBoost, thesis, 2020.
[13] V. Arumugam, V. Natarajan, Time Series Modeling and Forecasting Using Autoregressive Integrated Moving Average and Seasonal Autoregressive Integrated Moving Average Models, Instrumentation Mesure Métrologie 4 (2023) 161-168.
[14] C. W. Chase Jr., ARIMA Models, Demand-Driven Forecasting 13 (2013) 203-237.