=Paper=
{{Paper
|id=Vol-2713/paper47
|storemode=property
|title=Machine learning approaches for financial time series forecasting
|pdfUrl=https://ceur-ws.org/Vol-2713/paper47.pdf
|volume=Vol-2713
|authors=Vasily Derbentsev,Andriy Matviychuk,Nataliia Datsenko,Vitalii Bezkorovainyi,Albert Azaryan
|dblpUrl=https://dblp.org/rec/conf/m3e2/DerbentsevMDBA20
}}
==Machine learning approaches for financial time series forecasting==
Vasily Derbentsev1[0000-0002-8988-2526], Andriy Matviychuk1[0000-0002-8911-5677],
Nataliia Datsenko1[0000-0002-8239-5303], Vitalii Bezkorovainyi1[0000-0002-4998-8385] and
Albert Azaryan2[0000-0003-0892-8332]
1 Kyiv National Economic University named after Vadym Hetman,
54/1 Peremohy Ave., Kyiv, 03057, Ukraine
derbv@kneu.edu.ua, editor@nfmte.com, d_tashakneu@ukr.net,
retal.vs@gmail.com
2 Kryvyi Rih National University, 11 Vitalii Matusevych Str., Kryvyi Rih, 50027, Ukraine
azaryan325@gmail.com
Abstract. This paper discusses the problem of short-term forecasting of financial time series using a supervised machine learning (ML) approach. For this purpose we applied several of the most powerful methods, including the Support Vector Machine (SVM), Multilayer Perceptron (MLP), Random Forest (RF) and Stochastic Gradient Boosting Machine (SGBM). As the dataset we selected the daily close prices of two stock indices (S&P 500 and NASDAQ), the two most capitalized cryptocurrencies (Bitcoin (BTC) and Ethereum (ETH)), and the EUR-USD exchange rate. As features we used only past price information. To check the efficiency of these models we made out-of-sample forecasts for the selected time series using the one-step-ahead technique. The accuracy rates of the prices forecasted by the ML models were calculated. The results verify the applicability of the ML approach to the forecasting of financial time series. The best out-of-sample accuracy of short-term prediction of daily close prices for the selected time series, obtained by SGBM and MLP in terms of the Mean Absolute Percentage Error (MAPE), was within 0.46-3.71%. Our results are comparable with the accuracy obtained by deep learning approaches.
Keywords: financial time series, short-term forecasting, machine learning, support vector machine, random forest, gradient boosting, multilayer perceptron.
1 Introduction
Forecasting financial time series has been a focus of researchers for a long time. This topic continues to be relevant from both theoretical and applied points of view. Brokers, financial analysts and traders make daily decisions about buying and selling various financial assets, including currencies, stocks, bonds and others. To reduce the risk of such transactions and to obtain the expected return on their investments, each of them must
analyze a number of factors that affect market conditions and generate upward or
downward trends.
In this regard, the problem of developing adequate forecasting approaches is relevant
to the scientific community as well as to financial analysts, investors and traders.
There are two main approaches to solving the problem of forecasting financial assets. The first one is to construct a causal model that describes the relationship between the asset's value and other macroeconomic factors. This approach is implemented within the framework of fundamental analysis and is based on different mathematical tools, such as econometric modeling and systems of differential equations [13; 17; 28].
Another approach is based on the analysis of past observations of the selected asset and uses a variety of technical indicators and oscillators that help predict market trends. This approach is realized in technical analysis, which is now actively used in addition to time series analysis [7; 13; 28]. Within the time series framework, a manifold class of linear and nonlinear approaches has been developed, such as ARIMA-GARCH models [6; 27].
Recently, the methods and algorithms of Machine Learning (ML), developed within the Data Science paradigm [14; 36], have also been applied to forecasting financial and economic time series [2; 12], and various automated trading systems (bots) built on these algorithms have begun to be used for trading. The results of numerous empirical studies have shown that ML approaches outperform time series models in forecasting different financial assets [10; 18; 22; 26; 31; 38].
The main advantage of ML is that the algorithms themselves interpret the data, so we do not need to perform an initial decomposition. Depending on the purpose of the analysis, the algorithms build the logic of the modeling themselves on the basis of the available data.
This avoids the complex and lengthy pre-model stage of statistical testing of various hypotheses about the studied process. The main hypothesis, in terms of the purpose of our study, is only the thesis that ML methods are able to effectively analyze financial time series, identifying the hidden patterns and time correlations that are the basis for making qualitative short-term forecasts.
The main goal of our paper is to compare the predictive properties of the most efficient ML algorithms: Artificial Neural Networks (ANN), Support Vector Machines (SVM), Random Forest (RF) and the Gradient Boosting Machine (GBM), for short-term forecasting of financial time series (stock indices, currencies and cryptocurrencies). As predictors (features) we used only the past values of the studied time series. Our main assumption is that ML methods are able to extract latent patterns from the data, which allows us to make more efficient predictions.
This paper is organized as follows: Section 2 presents a brief literature review devoted to using ML approaches in the field of financial time series forecasting. Section 3 describes the main concepts of the applied methods. The data description and empirical results are given in Section 4. Concluding remarks and future perspectives are given in Section 5.
2 Brief review of recent studies
It should be noted that financial time series forecasting has been studied for a long time. Since ML approaches have proven their efficiency in many areas and become popular, they have been widely used for researching financial time series. Numerous articles in scientific journals, reviews, conferences and internet resources are devoted to this topic.
Over the last five years researchers have mostly focused on the novel network ML approaches of Deep Learning (DL), which include a set of powerful methods such as Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN) and so on [22; 24; 32; 34; 38]. A detailed overview devoted to the use of DL approaches in the field of financial forecasting was published recently [24]. The main finding of this survey is that the DL framework generally outperforms time series models and often shows higher accuracy than traditional ML algorithms.
The key advantage of DL models is that they are very powerful in feature learning and selection from the input data using a general-purpose learning procedure. However, DL models have the disadvantage that they take much more time to train; besides, tuning their hyperparameters is a nontrivial problem. At the same time, traditional ML models often show comparable accuracy in time series forecasting.
As for the use of ML algorithms in financial forecasting, the most common are Artificial Neural Networks (ANNs) of various architectures [1; 8; 10; 20; 21; 33; 38], Support Vector Machines (SVM) [23; 25; 29; 30; 35], and Fuzzy Logic (FL) [25; 39].
The application of these approaches to forecasting tasks has shown their efficiency for both traditional financial assets [1; 8; 18; 20; 21; 23; 24; 29; 30; 35] and cryptocurrencies [8; 26; 31; 38].
Several studies [1; 8; 20; 21] presented results showing that ANNs have better predictive properties than other ML approaches for forecasting financial time series. At the same time, there are a number of research papers (see, for example, Okasha [29]; Sapankevych and Sankar [30]; Hitam and Ismail [19]) which presented results showing that SVMs outperform other non-linear techniques, including neural-network based non-linear prediction techniques such as the multi-layer perceptron (MLP).
It should be noted that much less attention has been paid to another powerful class of ML approaches based on designing ensembles of Classification and Regression Trees (C&RT): Random Forest (RF) [4; 5] and the Gradient Boosting Machine (GBM) [15; 16], which use the bagging (RF) and boosting (GBM) techniques. Both RF and GBM are powerful methods that can efficiently capture complex nonlinear patterns in data.
Thus, Varghade and Patel [35] tested RF and SVM for forecasting the S&P CNX NIFTY stock market index. They noted that the Decision Trees model outperforms the SVR, although RF is at times found to overfit the data.
Kumar and Thenmozhi [23] explored a set of classification models for predicting the direction of the S&P CNX NIFTY index. Their empirical results suggest that both SVM and RF outperform the other classification methods (NN, Linear Discriminant Analysis, Logit) in terms of predicting the direction of the stock market movement, but at the same time SVM turned out to be more accurate.
Recently, several papers have appeared devoted to applying ensemble approaches to forecasting cryptocurrency prices [3; 9; 11]. Borges and Neves [3] tested four ML algorithms for price trend prediction: LR, RF, SVM and GBM. All learning algorithms outperformed the Buy and Hold investment strategy in the crypto market. The best result was obtained by ensemble voting (accuracy 59.3%).
Chen et al. [9] applied a set of learning models including RF, XGBoost, Quadratic Discriminant Analysis, SVM and LSTM to Bitcoin 5-minute interval and daily prices. The authors used a wide dataset including technological, market and trading, social-media and fundamental factors as features. Somewhat unexpectedly, for daily prices better results were obtained by statistical methods (average accuracy 65%) than by ML methods (average accuracy 55.3%). Among the ML methods the SVM was the best, with an accuracy of 65.3%.
3 Methodology
In this paper we apply a supervised ML technique to forecasting financial time series. Consider a sample of feature vectors $x_i = (x_{i1}, x_{i2}, \ldots, x_{ik})$ and labels $y$, i.e. pairs $(x_i, y_i)$, $i = 1, 2, \ldots, n$, of length $n$. In our case the labels (or targets) are the values of the selected financial assets, and the features are only the lagged daily values of these assets $y_{t-1}, y_{t-2}, \ldots, y_{t-p}$, $p > 0$.
Our main goal is to predict the future value of the target variable for the next time period (the next day, since we use daily quotes) using several ML approaches (SVM, ANN, RF and GBM) and to compare their forecasting performance.
Thus, our task is to construct some functional (regression) or rule-based (decision-tree based) dependence of the form

$y = f(x, w)$,   (1)

where $x_i = (x_{ij})$, $j = 1, 2, \ldots, k$, are vectors of features; $w$ are the weights of the features; $n$ is the total number of samples in the dataset; $k$ is the number of features.
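To make this setup concrete, the sketch below builds such a lagged-feature dataset from a univariate series of daily close prices. This is a minimal illustration; the helper name, the pandas interface and the default lag depth are our assumptions, not part of the paper:

```python
import numpy as np
import pandas as pd

def make_lagged_dataset(prices: pd.Series, p: int = 5):
    """Build (X, y): each row of X holds the p previous prices and
    y is the price to be predicted one step ahead."""
    df = pd.DataFrame({"y": prices})
    for lag in range(1, p + 1):
        df[f"lag_{lag}"] = prices.shift(lag)
    df = df.dropna()  # the first p rows have an incomplete history
    X = df[[f"lag_{lag}" for lag in range(1, p + 1)]].to_numpy()
    y = df["y"].to_numpy()
    return X, y

# e.g. X, y = make_lagged_dataset(np.log(close_prices), p=5)
```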
3.1 Support Vector Machine (SVM)
The support vector machine (SVM) is an extension of the support vector classifier that results from enlarging the feature space in a specific way by using kernel functions. The main idea of the SVM method is to map the original vectors into a space of higher dimension and to search for a separating hyperplane with a maximum margin in this space. Two parallel hyperplanes are constructed on both sides of the hyperplane separating the classes. The separating hyperplane is the one that maximizes the distance to the two parallel hyperplanes. The algorithm works under the assumption that the larger the difference or distance between these parallel hyperplanes (the margin), the smaller the average error of the classifier.
Support Vector Regression (SVR) is the regression process performed by SVM, which tries to identify the hyperplane that maximizes the margin between two classes and minimizes the total error. In order for an efficient SVM to be constructed, a complexity penalty is also introduced, balancing forecasting accuracy and computational performance.
Unlike the classic regression problem, SVR seeks coefficients that minimize a different type of loss, where only residuals larger in absolute value than some positive constant contribute to the loss function. This is an extension of the margin used in support vector classifiers to the regression setting.
The mathematical formalization of SVR is reduced to the following. Let the regression equation be written in the form

$f(x) = \langle w, x \rangle - b$,   (2)

where $\langle \cdot, \cdot \rangle$ is the inner product operator and $b$ is a constant.
Then the problem is reduced to minimizing the functional

$\frac{1}{2}\langle w, w \rangle + C \sum_{i=1}^{l} \big(|\langle w, x_i \rangle - b - y_i| - \varepsilon\big)_{+} \to \min_{w, b}, \quad i = 1, 2, \ldots, l,$   (3)

where $C$ is the regularization parameter, or penalty coefficient for incorrectly estimating the output associated with the input vectors, which also controls the trade-off with a smooth boundary; $l$ is the number of samples in the training set ($l < n$; as a rule $l \approx (0.7 \div 0.8)\,n$); $\varepsilon$ is the margin value.
After a change of variables and some algebraic transformations, the loss function for SVR can be presented in the form

$\frac{1}{2}\langle w, w \rangle + C \sum_{i=1}^{l} (\xi_i + \xi_i^{*}) \to \min_{w, b, \xi, \xi^{*}}, \quad i = 1, 2, \ldots, l,$   (4)

where $\xi_i = \big(-f(x_i) + y_i - \varepsilon\big)_{+}$ and $\xi_i^{*} = \big(f(x_i) - y_i - \varepsilon\big)_{+}$ are slack variables that allow individual observations to be on the wrong side of the margin or of the hyperplane, and $K(x_i, x)$ below denotes the kernel function. The most commonly used kernel functions are the Linear, Polynomial, Gaussian and Radial Basis Function (RBF) kernels.
The loss function (4) is minimized under the conditions

$y_i - \langle w, x_i \rangle + b \le \varepsilon + \xi_i, \quad \langle w, x_i \rangle - b - y_i \le \varepsilon + \xi_i^{*}, \quad \xi_i, \xi_i^{*} \ge 0, \quad i = 1, 2, \ldots, l.$   (5)

The Lagrangian of this problem can be expressed in terms of the dual variables $\alpha_i, \alpha_i^{*}$; thus the regression equation on the support vectors can be written in the form

$f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^{*}) K(x_i, x) - b$.   (6)
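For illustration, an SVR of this form can be assembled in a few lines with scikit-learn. This is a hedged sketch rather than the paper's actual implementation: the RBF kernel and C = 10 follow Section 4.2, while the epsilon value and the scaling step are our assumptions:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# C is the regularization parameter from (3)-(4); epsilon is the margin
# from (5). Standardizing features is common practice for SVR.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
# svr.fit(X_train, y_train)
# y_pred = svr.predict(X_test)
```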
3.2 Artificial Neural Network (ANN)
ANNs are among the most popular ML methods. Numerous empirical studies show the efficiency of ANNs in different fields for both classification and regression problems: pattern recognition, image and voice analysis, machine translation and so on. Over the last several decades they have been widely used for the analysis and forecasting of financial time series. In [12; 22; 31] it was shown that ANNs have better predictive properties than time series models and other ML algorithms for the financial time series forecasting problem.
In this paper we used a network model of the most common architecture: a Multi-Layer Perceptron (MLP) with three layers: an input layer, one hidden layer, and an output layer with one neuron that represents the target variable (the predicted value). It should be noted that despite its simple structure, the MLP is able to capture complex patterns in data due to the use of different nonlinear activation functions.
The network output depends on its configuration, weights, and the activation functions of the neurons in the hidden and output layers:

$y = \varphi\Big(\sum_{i} v_i\, \psi\Big(\sum_{j} w_{ij} x_j + b_i\Big) + b_0\Big)$,   (7)

where $\varphi(\cdot)$, $\psi(\cdot)$ are the activation functions of the neurons of the output and hidden layers, respectively; $v_i$ is the weight of the connection between the $i$-th neuron of the hidden layer and the output of the network; $w_{ij}$ is the weight of the connection between the $j$-th neuron of the input layer and the $i$-th neuron of the hidden layer; $b_0$, $b_i$ are the bias neurons of the output and hidden layers.
Network learning consists in finding and setting the neuron weights (synaptic weights) which minimize the difference between the target variable and the network output. The search for the minimum of the loss function was performed by the gradient descent method, embodied in the back-propagation algorithm.
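A minimal sketch of such a three-layer MLP regressor follows. The single hidden layer, the tanh activation and the 100 training epochs mirror Sections 3.2 and 4.2; the solver settings, the hidden-layer size and the library itself are our illustrative assumptions:

```python
from sklearn.neural_network import MLPRegressor

mlp = MLPRegressor(
    hidden_layer_sizes=(10,),  # one hidden layer with 10 neurons
    activation="tanh",         # hyperbolic tangent, one of the tested functions
    solver="sgd",              # gradient descent via back-propagation
    max_iter=100,              # the paper trained for 100 epochs
    random_state=0,
)
# mlp.fit(X_train, y_train)
```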
3.3 Gradient Boosting Machine (GBM)
Boosting is a procedure for sequentially building a composition of machine learning algorithms, in which each algorithm seeks to compensate for the shortcomings of the composition of all previous ones. In contrast to bagging, boosting uses weighted rather than simple voting. A major attraction of boosting is that it is easy to design computationally efficient weak classifiers (as a rule, shallow decision trees are used). Boosting over decision trees is considered one of the most efficient methods in terms of classification quality.
The Gradient Boosting Machine (GBM) method was proposed by Friedman [15; 16]. The basic steps of GBM are commonly the following.
The final classifier $F_N(x)$ is constructed as a weighted sum of $N$ basic algorithms $h_i(x, a_i)$ (Decision Trees):

$F_N(x) = \sum_{i=1}^{N} \gamma_i h_i(x, a_i)$,   (8)

where $a_i$ is the vector of adjusted parameters and $\gamma_i$ is the weight coefficient.
Let us choose an initial classifier $h_0(x, a_0)$; for example, it may be the median or the mean of the time series (target variable).
If at step $N-1$ we have already built the classifier $F_{N-1}(x, a)$, then we select the next basic algorithm $h_N(x)$ so that it reduces the error of the previous classifier as much as possible:

$\sum_{i=1}^{l} L\big(y_i, F_{N-1}(x_i, a) + \gamma_N h_N(x_i, a_N)\big) \to \min_{\gamma_N, a_N}$,   (9)

where $L(\cdot)$ is the loss function.
We can select the $h_N(x, a)$ that minimizes the sum of squared deviations over all samples in the training set:

$h_N(x, a_N) = \arg\min_{h(x, a)} \sum_{i=1}^{l} \big(h(x_i, a) - r_i\big)^2$,   (10)

where the $r_i$ are deviations (pseudo-residuals) equal to the anti-gradient of the loss function $L(\cdot)$.
In this way we perform predictions for the samples in the training set by using gradient descent in the $l$-dimensional space.
Once a new basic algorithm has been found, its coefficient can be selected by analogy with gradient descent:

$\gamma_N = \arg\min_{\gamma} \sum_{i=1}^{l} L\big(y_i, F_{N-1}(x_i, a) + \gamma h_N(x_i, a_N)\big)$.   (11)

It should be noted that boosting usually does not suffer from overfitting, because shallow decision trees are used: such trees have a large bias but are not inclined to overfit.
An effective additional safeguard is to reduce the step: instead of moving the full distance in the direction of the anti-gradient, a shortened step can be taken:

$F_N(x, a) = F_{N-1}(x, a) + \nu \gamma_N h_N(x, a_N)$,   (12)

where $\nu \in [0, 1]$ is the learning rate.
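As an illustration, a stochastic GBM (SGBM) of this kind can be sketched with scikit-learn as follows; the parameter values mirror the final settings later reported in Table 1, while the library choice itself is our assumption:

```python
from sklearn.ensemble import GradientBoostingRegressor

# subsample < 1.0 gives Friedman's stochastic modification: each tree is
# fitted on a random fraction of the training set.
sgbm = GradientBoostingRegressor(
    loss="squared_error",  # quadratic loss
    n_estimators=400,      # maximum number of trees in the ensemble
    max_depth=5,           # maximum number of levels in a tree
    learning_rate=0.1,     # shrinkage parameter nu from (12)
    subsample=0.7,         # random subsample rate
    random_state=0,
)
# sgbm.fit(X_train, y_train)
```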
3.4 Random Forest (RF)
The main concept of RF is that a composition of weak classifiers can give good results for both classification and regression problems. Proposed by Breiman [4; 5], RF is based on the bagging technique (bootstrap aggregation) over decision trees. Bagging reduces the variance of the base algorithms if they are weakly correlated. In RF the correlation between trees is reduced by randomization in two directions.
Firstly, each tree is trained on a bootstrapped subset. Secondly, the feature on which splitting is performed in each node is selected not from all possible features, but only from a random subset of size m. The main distinction between bagging and RF is the choice of this feature subset. RF works well when all of the features are at least marginally relevant, since the number of features selected for any given tree is small. Using a small value of m will typically be helpful when we have a large number of correlated predictors.
The RF algorithm generates each of the N trees independently, which makes it very easy to parallelize. For each tree, it constructs a full binary tree of maximum depth. The main concept is that the classifiers (trees) do not correct each other's mistakes but compensate for them when voting. Basic classifiers should be independent; they can be based on different groups of methods or trained on independent datasets. Bagging allows us to reduce the prediction error when the variance of the error of the base method is high.
Thereby, RF remains efficient even though some trees will query useless features and make random predictions. Some of the trees will happen to query good features and will make good predictions (because the leaves are estimated based on the training data).
If there are enough trees, the random ones will wash out as noise, and only the "good" trees will affect the final result (classification or prediction).
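A minimal RF regressor along these lines might look as follows; the number of trees, tree depth and feature-subset size follow the settings discussed in Section 4.2, and the rest is our illustrative choice:

```python
from sklearn.ensemble import RandomForestRegressor

# Each tree is grown independently on a bootstrap sample, so training
# parallelizes trivially (n_jobs=-1).
rf = RandomForestRegressor(
    n_estimators=500,  # RF is robust to a large number of trees
    max_depth=15,      # complex trees to capture nonlinear patterns
    max_features=12,   # random feature subset of size m at each split
    n_jobs=-1,
    random_state=0,
)
# rf.fit(X_train, y_train)
```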
4 Empirical results
4.1 Dataset
To confine our analysis to the most popular financial assets, we used the daily close prices of two stock indices (NASDAQ and S&P 500), the two most capitalized cryptocurrencies (Bitcoin (BTC) and Ethereum (ETH)), and the EUR/USD exchange rate. Our initial dataset covers the period from 01/01/2015 to 30/06/2020 for all series (for ETH from 06/08/2015), according to Yahoo Finance [37].
So our dataset includes 1384 observations for NASDAQ, 1383 for S&P 500, 1434 for the EUR/USD exchange rate, 2008 for BTC and 1278 for ETH.
It should be noted that the selected time series had different types of dynamics during this period, which allows us to better estimate the forecasting performance of the ML approaches (see fig. 1).
Fig. 1. Dynamics of traditional assets (a) and cryptocurrencies (b) from 30/06/2018 to
30/06/2020.
For the purpose of training the models and fitting and tuning their parameters, the dataset was divided into training and test subsets in the ratio of 80% to 20%. Moreover, the last 100 observations (from 22/03/2020 to 30/06/2020) were reserved for validation, which was performed by an out-of-sample one-step-ahead forecast.
Since we focus on the ML approach to forecasting financial time series, the main purpose of our paper is to get the most accurate one-step-ahead forecast of daily prices based only on their past values.
According to some empirical studies devoted to forecasting financial time series, there is a seasonal lag which is a multiple of 5 when daily observations are used, and a multiple of 7 for cryptocurrencies, due to the fact that cryptocurrencies are traded 24/7.
To stabilize the variance, all features were transformed with the natural logarithm, which is a special case of the Box-Cox transform.
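Putting this together, the validation procedure can be sketched as a walk-forward loop over the log-transformed series. Refitting the model at every step is one possible reading of the one-step-ahead scheme described above; the function below is our illustrative sketch, not the paper's implementation:

```python
import numpy as np

def one_step_ahead(model, X, y, n_holdout=100):
    """Walk-forward validation: refit on all data up to day t and
    predict day t+1; y is assumed to be log-transformed."""
    preds = []
    for t in range(len(y) - n_holdout, len(y)):
        model.fit(X[:t], y[:t])                # train on the past only
        preds.append(model.predict(X[t:t + 1])[0])
    return np.exp(np.array(preds))             # back to the price scale
```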
4.2 Hyper-parameter tuning
It should be noted that hyper-parameter tuning is an important and sophisticated step of model design. First of all, it is necessary to choose the functional form of the loss function. For the main purposes of our study, the quadratic loss, which is generally used for solving regression problems, was selected.
According to our hypothesis regarding the lag length, for the MLP models we tested the following architectures:
─ 7 inputs and from 5 to 14 hidden layer neurons for cryptocurrencies;
─ 5 inputs and from 5 to 10 hidden layer neurons for the other assets.
The most common functions, such as the logistic, hyperbolic tangent, exponential and ReLU, were tested as activation functions. Training the MLP for each time series and different lag values (numbers of input neurons) was conducted over 100 epochs, of which the best 5 architectures were selected for each case (in terms of the minimum prediction error on the test sample and the conformance of the model residuals to a normal distribution).
The final prediction for each asset was obtained as the prediction of an ensemble of networks, that is, the average of the 5 best corresponding MLP models.
For the SVM models we chose the RBF kernel, which is best suited for regression problems. The regularization parameter was estimated by grid search in the range from 1 to 15, and C = 10 was selected.
Both tree-based methods (RF and GBM) are based on partitioning the data into training and testing sets by randomly selecting cases. In this study we applied the stochastic modification of GBM (SGBM), which is based on such a partition. The training sample is used to fit the models by adding simple trees to the ensembles; the testing set is used to validate their performance. For regression tasks validation is usually measured as the average error. We selected 30% of the dataset as test cases for both approaches.
Since RF is not inclined to overfit, one can choose a large number of trees for the ensemble; we designed the RF model with 500 trees. At the same time, in order for the model to be able to describe complex nonlinear patterns in the data, it is necessary to use complex trees, so we chose 15 as the maximum number of levels.
Another important parameter for RF is the number of features to consider at each split. As noted in Section 3.4, for regression tasks it is recommended to choose this value as a fraction of the total number of features M. We tested different RF models with values of m from 8 to 12.
As the stopping condition for the number of trees in SGBM (boosting steps) we took the number of trees at which the error on the test set stops decreasing. This is necessary in order to avoid overfitting. For boosting, unlike RF, simple trees are usually used. That is why we fitted the maximum number of levels in the trees and the number of terminal nodes by the criterion of the lowest average squared error on both the training and test samples.
For GBM an important parameter is the learning rate (shrinkage). Regularization by shrinkage consists in modifying the update rule (12) by tuning $\nu$. We selected this value by grid search according to the minimum prediction error on the test set. The final hyper-parameter settings are reported in Table 1.
Table 1. Final hyper-parameter settings for RF and SGBM.

| Parameter                                            | RF        | SGBM      |
|------------------------------------------------------|-----------|-----------|
| Loss function                                        | quadratic | quadratic |
| Training / test subsample proportion, %              | 70/30     | 70/30     |
| Random subsample rate                                | 0.7       | 0.7       |
| Maximum number of trees in ensemble                  | 500       | 400       |
| Maximum number of levels in trees                    | 10        | 5         |
| Maximum number of features to consider at each split | 12        | -         |
| Maximum number of terminal nodes in trees            | 150       | 15        |
| Minimum samples in child nodes                       | 5         | -         |
| Learning rate (shrinkage)                            | -         | 0.1       |
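For illustration, the grid search over the SVM regularization parameter described above might be written as follows; the time-series-aware cross-validation scheme and the scoring choice are our assumptions, not stated in the paper:

```python
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR

# Search C over the range 1..15; TimeSeriesSplit preserves temporal order.
grid = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": list(range(1, 16))},
    cv=TimeSeriesSplit(n_splits=5),
    scoring="neg_mean_absolute_percentage_error",
)
# grid.fit(X_train, y_train)
# print(grid.best_params_)  # the paper settled on C = 10
```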
4.3 Forecasting performance
The short-term forecasts for the selected time series were made for the absolute values of prices (log prices). The target variable is the value of the close price of each series in the next time period (day), since we used daily observations. All models were trained with the same set of features.
Figures 2-3 show the quality of the model fitting for BTC and NASDAQ obtained by MLP and SVM; figures 4-5 present the results for RF and SGBM, respectively.
Fig. 2. Fitting accuracy on the training and test subsets for NASDAQ: (a) MLP, (b) SVM.
Fig. 3. Fitting accuracy on the training and test subsets for BTC: (a) MLP, (b) SVM.
[Figure 4: panels (a, c) plot the average squared error against the number of trees (up to 200 trees); panels (b, d) plot predicted vs. observed values.]
Fig. 4. Fitting accuracy on the training and test subsets for RF: (a, b) NASDAQ, (c, d) BTC.
These graphs characterize the dependence of the predicted values (vertical axis) on the
actual data (horizontal axis) on the test set and allow us to visually determine the quality
of the fitting.
[Figure 5: panels (a, c) plot the average squared error against the number of boosting trees (optimal number of trees: 395 for NASDAQ, 385 for BTC; maximum tree size 5); panels (b, d) plot predicted vs. observed values.]
Fig. 5. Fitting accuracy on the training and test subsets for SGBM: (a, b) NASDAQ, (c, d)
BTC.
Figures 4-5 show both the dependence of the predicted values (vertical axis) on the actual data (horizontal axis) (graphs (b, d)) and the dependence of the model fitting quality on the number of trees in the ensemble for RF (fig. 4) and SGBM (fig. 5) on the training and test subsets (graphs (a, c)) for the selected assets (NASDAQ, BTC).
Forecasting results for the last 100 observations (hold-out dataset) for all models and selected time series are shown in figs. 6-10: panels (a) present the results obtained by SVM and MLP, panels (b) the results for RF and SGBM.
Analysis of the graphs allows us to conclude that SGBM and MLP approximate the time series dynamics well, although one can see a certain delay in the model graphs in comparison to the real data. RF and SVM showed good approximation not for all time series.
Summary accuracy results in terms of the MAPE and RMSE metrics are shown in Table 2.
Thus, we can conclude that the MLP, SVM and SGBM methods have the same order of accuracy for out-of-sample prediction, although boosting was somewhat more accurate. The best prediction performance was produced by SGBM for EUR-USD (0.46% MAPE), and the best result for NASDAQ was also provided by SGBM (2.38%). For BTC better performance was shown by SVM, while MLP outperformed the other models for S&P 500.
Fig. 6. Out of sample prediction EUR/USD.
Fig. 7. Out of sample prediction BTC /USD.
Fig. 8. Out of sample prediction ETH/USD.
Fig. 9. Out of sample prediction S&P 500.
Fig. 10. Out of sample prediction NASDAQ.
Table 2. Out-of-sample forecasting accuracy results.

| Series  | SGBM MAPE, % | SGBM RMSE | RF MAPE, % | RF RMSE | SVM MAPE, % | SVM RMSE | MLP MAPE, % | MLP RMSE |
|---------|--------------|-----------|------------|---------|-------------|----------|-------------|----------|
| EUR/USD | 0.46         | 0.0656    | 0.47       | 0.0067  | 0.40        | 0.0065   | 0.45        | 0.0067   |
| BTC/USD | 2.44         | 283.6     | 2.65       | 321.8   | 1.03        | 106.5    | 2.25        | 274.5    |
| ETH/USD | 5.09         | 15.6      | 5.17       | 15.89   | 8.36        | 260.4    | 5.25        | 14.83    |
| S&P 500 | 2.54         | 97.99     | 2.85       | 108.9   | 2.91        | 106.5    | 2.35        | 91.2     |
| NASDAQ  | 2.38         | 289.3     | 2.51       | 289.1   | 2.77        | 340.3    | 2.23        | 257.6    |
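For reference, the two reported metrics follow their standard definitions; a minimal implementation (ours, not the paper's) is:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmse(y_true, y_pred):
    """Root Mean Squared Error, in the units of the series."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```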
It should be noted that our results are comparable with the accuracy obtained by deep learning approaches [24]. Therefore, tree-based ensembles, ANNs and SVMs are all sufficiently powerful forecasting tools for financial time series.
5 Conclusion and discussion
Our research has shown the efficiency of using ML approaches for predicting financial time series. According to our results, the out-of-sample accuracy of short-term forecasting of daily quotes obtained by SGBM is of the same order as that of MLP and SVM. In terms of MAPE for the selected time series it was within 0.46-5.9%. Moreover, for NASDAQ and ETH, SGBM outperformed the other approaches. The worst results were obtained by RF, which was used as a baseline.
At the same time, all models showed their worst results for ETH, with the accuracy rate (MAPE) ranging from 5.9% (SGBM) to 8.38% (RF).
While designing the models, we explored different sets of features: from 5 to 15 lags of the target variable (from 7 to 14 for cryptocoins). Our final dataset contained only past values of the target variable with lag depths of 14 and 15. In this case the larger dataset provided better training for all models and gave more efficient results.
It should be noted that we used a minimal dataset: only lagged values of the studied series (closing prices). In our opinion, forecasting accuracy can be improved by including additional features, for example open, maximum, minimum and average prices, fundamental variables, and various indicators and oscillators such as the price rate-of-change, the relative strength index, and so on.
Future research should be extended by investigating the predictive power of the described ML approaches with additional features. In conclusion, we note that developing combined ensembles of C&RT with other powerful ML models, such as NNs and SVMs, is a promising approach to forecasting financial time series. Moreover, using DL approaches for feature selection and prediction also seems promising to us.
References
1. Bahrammirzaee, A.: A comparative survey of artificial intelligence applications in finance:
artificial neural networks, expert system and hybrid intelligent systems. Neural Computing
and Applications 19(8), 1165–1195 (2010). doi:10.1007/s00521-010-0362-z
2. Bontempi, G., Ben Taieb, S., Le Borgne, Y.: Machine Learning Strategies for Time Series
Forecasting. Business Intelligence. Lecture Notes in Business Information Processing 138,
62–77 (2013). doi:10.1007/978-3-642-36318-4_3
3. Borges, T.A., Neves, R.N.: Ensemble of Machine Learning Algorithms for Cryptocurrency
Investment with Different Data Resampling Methods. Applied Soft Computing 90, 106187
(2020). doi:10.1016/j.asoc.2020.106187
4. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and Regression Trees.
Chapman & Hall/CRC, Boca Raton (1984)
5. Breiman, L.: Random Forests. Machine Learning 45, 5–32 (2001).
doi:10.1023/A:1010933404324
6. Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting, 3rd edn. Springer
International Publishing, New York (2016). doi:10.1007/978-3-319-29854-2
7. Brooks, A.: Trading Price Action Trends: Technical Analysis of Price Charts Bar by Bar
for the Serious Trader. John Wiley & Sons, New Jersey (2012)
8. Caporale, G.M., Plastun, A., Oliinyk, V.: Bitcoin fluctuations and the frequency of price
overreactions. Financial Markets and Portfolio Management 33, 109–131 (2019).
doi:10.1007/s11408-019-00332-5
9. Chen, Z., Li, C., Sun, W.: Bitcoin price prediction using machine learning: An approach to
sample dimension engineering. Journal of Computational and Applied Mathematics 365,
112395 (2020). doi:10.1016/j.cam.2019.112395
10. Derbentsev, V., Datsenko, N., Stepanenko, O., Bezkorovainyi, V.: Forecasting
Cryptocurrency Prices Time Series Using Machine Learning. CEUR Workshop
Proceedings 2422, 320–334 (2019)
11. Derbentsev, V., Matviychuk, A., Soloviev, V.N.: Forecasting of Cryptocurrency Prices
Using Machine Learning. In: Pichl, L., Eom, C., Scalas, E., Kaizoji, T. (eds.) Advanced
Studies of Financial Technologies and Cryptocurrency Markets, pp. 211–231. Springer,
Singapore (2020). doi:10.1007/978-981-15-4498-9_12
12. Di Persio, L., Honchar, O.: Multitask Machine Learning for Financial Forecasting.
International Journal of Circuits, Systems and Signal Processing 12, 444–451 (2018)
13. Eiamkanitchat, N., Moontuy, T., Ramingwong, S.: Fundamental analysis and technical
analysis integrated system for stock filtration. Cluster Computing 20, 883–894 (2017).
doi:10.1007/s10586-016-0694-2
14. Flach, P.: Machine Learning: The Art and Science of Algorithms that Make Sense of Data.
Cambridge University Press, Cambridge (2012)
15. Friedman, J.H.: Greedy Function Approximation: A Gradient Boosting Machine. The
Annals of Statistics 29(5), 1189–1232 (2001)
16. Friedman, J.H.: Stochastic Gradient Boosting. Computational Statistics & Data Analysis
38(4), 367–378 (2002). doi:10.1016/S0167-9473(01)00065-2
17. Glabadanidis, P.: Market Timing and Moving Averages: An Empirical Analysis of
Performance in Asset Allocation. Palgrave Macmillan, New York (2015).
doi:10.1057/9781137359834
18. Hamid, S.A., Habib, A.: Financial Forecasting with Neural Networks. Academy of
Accounting and Financial Studies Journal 18(4), 37–55 (2014)
19. Hitam, N.A., Ismail, A.R.: Comparative Performance of Machine Learning Algorithms for
Cryptocurrency Forecasting. Indonesian journal of electrical engineering and computer
science 11(3), 1121–1128 (2018). doi:10.11591/ijeecs.v11.i3.pp1121-1128
20. Kara, Y., Boyacioglu, M., Baykan, Ö.K.: Predicting direction of stock price index
movement using artificial neural networks and support vector machines: The sample of the
Istanbul Stock Exchange. Expert Systems with Applications 38(5), 5311–5319 (2011).
doi:10.1016/j.eswa.2010.10.027
21. Kourentzes, N., Barrow, D.K., Crone, S.F.: Neural network ensemble operators for time
series forecasting. Expert Systems with Applications 41(9), 4235–4244 (2014).
doi:10.1016/j.eswa.2013.12.011
22. Kumar, D., Rath, S.K.: Predicting the Trends of Price for Ethereum Using Deep Learning
Technique. In: Dash, S., Lakshmi, C., Das, S., Panigrahi, B. (eds.) Artificial Intelligence
and Evolutionary Computations in Engineering Systems. Advances in Intelligent Systems
and Computing, vol. 1056, pp. 103–114. Springer, Singapore (2020). doi:10.1007/978-981-
15-0199-9_9
23. Kumar, M., Thenmozhi, M.: Forecasting Stock Index Movement: A Comparison of Support
Vector Machines and Random Forest. Paper presented at Indian Institute of Capital Markets
9th Capital Markets Conference (2006). doi:10.2139/ssrn.876544
24. Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep neural
network architectures and their applications. Neurocomputing 234, 11–26 (2017).
doi:10.1016/j.neucom.2016.12.038
25. Matviychuk, A.: Fuzzy logic approach to identification and forecasting of financial time
series using Elliott wave theory. Fuzzy Economic Review XI(2), 51–68 (2006).
doi:10.25102/fer.2006.02.04
26. McNally, S., Roche, J., Caton, S.: Predicting the price of Bitcoin using Machine Learning.
In: 2018 26th Euromicro International Conference on Parallel, Distributed and Network-
based Processing (PDP), 21–23 March 2018, Cambridge, UK (2018).
doi:10.1109/PDP2018.2018.00060
27. Mills, T.C.: Forecasting Financial Time Series. In: Clements, M.P., Hendry, D.F. (eds.) The
Oxford Handbook of Economic Forecasting. Oxford University Press, Oxford (2011).
doi:10.1093/oxfordhb/9780195398649.013.0019
28. Nti, I.K., Adekoya, A.F., Weyori, B.A.: A systematic review of fundamental and technical
analysis of stock market predictions. Artificial Intelligence Review 53, 3007–3057 (2020).
doi:10.1007/s10462-019-09754-z
29. Okasha, M.K.: Using Support Vector Machines in Financial Time Series Forecasting.
International Journal of Statistics and Applications 4(1), 28–39 (2014).
doi:10.5923/j.statistics.20140401.03
30. Sapankevych, N.I., Sankar, R.: Time Series Prediction Using Support Vector Machines: A
Survey. IEEE Computational Intelligence Magazine 4(2), 24–38 (2009).
doi:10.1109/MCI.2009.932254
31. Saxena, A., Sukumar, T.R.: Predicting Bitcoin price using LSTM and comparing its
predictability with the ARIMA model. International Journal of Pure and Applied Mathematics
119(17), 2591–2600 (2018)
32. Semerikov, S.O., Teplytskyi, I.O., Yechkalo, Yu.V., Kiv, A.E.: Computer Simulation of
Neural Networks Using Spreadsheets: The Dawn of the Age of Camelot. CEUR Workshop
Proceedings 2257, 122–147 (2018)
33. Sezer, O.B., Gudelek, M.U., Ozbayoglu, A.M.: Financial time series forecasting with deep
learning: A systematic literature review: 2005–2019. Applied Soft Computing 90, 106181
(2020). doi:10.1016/j.asoc.2020.106181
34. Tarasenko, A.O., Yakimov, Y.V., Soloviev, V.N.: Convolutional neural networks for image
classification. CEUR Workshop Proceedings 2546, 101–114 (2019)
35. Varghade, P., Patel, R.: Comparison of SVR and Decision Trees for Financial Series
Prediction. IJACTE 1(1), 101–105 (2012)
36. Volkova, N.P., Rizun, N.O., Nehrey, M.V.: Data science: opportunities to transform
education. CEUR Workshop Proceedings 2433, 48–73 (2019)
37. Yahoo Finance: Stock Market Live, Quotes, Business & Finance News.
https://finance.yahoo.com (2020). Accessed 30 Jun 2020
38. Yao, Y., Yi, J., Zhai, S., Lin, Y., Kim, T., Zhang, G., Lee, L.Y.: Predictive Analysis of
Cryptocurrency Price Using Deep Learning. International Journal of Engineering &
Technology 7(3), 258–264 (2018). doi:10.14419/ijet.v7i3.27.17889
39. Zarandi, M.H.F., Rezaee, B., Turksen, I.B., Neshat, E.: A type-2 fuzzy rule-based experts
system model for stock price analysis. Expert Systems with Applications 36(1), 139–154
(2009). doi:10.1016/j.eswa.2007.09.034