<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Machine Learning for Multi-step Ahead Forecasting of Volatility Proxies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacopo De Stefani</string-name>
          <email>jacopo.de.stefani@ulb.ac.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olivier Caelen</string-name>
          <email>olivier.caelen@worldline.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dalila Hattab</string-name>
          <email>dalila.hattab@equensworldline.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianluca Bontempi</string-name>
          <email>gianluca.bontempi@ulb.ac.be</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Equens Worldline R&amp;D, Lille (Seclin)</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Machine Learning Group, Département d'Informatique, Université Libre de Bruxelles</institution>
          ,
          <addr-line>Boulevard du Triomphe CP212, 1050 Brussels</addr-line>
          ,
<country country="BE">Belgium</country>
          <institution>Worldline SA/NV R&amp;D</institution>
          ,
          <addr-line>Bruxelles</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In finance, volatility is defined as a measure of the variation of a trading price series over time. As volatility is a latent variable, several measures, named proxies, have been proposed in the literature to represent this quantity. The purpose of our work is twofold. On the one hand, we aim to perform a statistical assessment of the relationships among the most used proxies in the volatility literature. On the other hand, while the majority of the reviewed studies in the literature focus on a univariate time series model (NAR) using a single proxy, we propose here a NARX model, combining two proxies to predict one of them, showing that it is possible to improve the prediction of the future value of some proxies by using the information provided by the others. Our results, employing artificial neural networks (ANN), k-Nearest Neighbours (kNN) and support vector regression (SVR), show that the supplementary information carried by the additional proxy can be used to reduce the forecasting error of the aforementioned methods. We conclude by explaining how we wish to further investigate this relationship.</p>
      </abstract>
      <kwd-group>
<kwd>financial time series</kwd>
        <kwd>volatility forecasting</kwd>
        <kwd>multi-step ahead forecast</kwd>
        <kwd>machine learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction and problem statement</title>
      <p>
In time series forecasting, the largest body of research focuses on the prediction
of the future values of a time series, with either a single- or a multiple-step-ahead
forecasting horizon, given historical knowledge about the series itself. In
statistical terms, this problem is equivalent to forecasting the expected value of the
time series in the future, conditioned on the past available information. In the
context of the stock market, the solution to this problem could allow one
to determine the future valuation of a company, thus informing
traders about how to act upon such a change in the valuation. However, from
the traders' standpoint, price is not the only variable of interest. Knowledge
of the intensity of the fluctuations affecting this price (i.e. the stock volatility)
allows them to assess the risk associated with their investment. Since volatility is not
directly observable from the time series, one can compute, according to the granularity and the
type of the available data, different measures, named
volatility proxies [21]. Although volatility proxies based on intraday trading data exist
[17], due to the restrictions on access to such fine-grained data, the rest of our
analysis focuses on proxies for daily data. A standard approach to
volatility forecasting, once a given proxy has been selected, is to apply either a
statistical Generalized AutoRegressive Conditional Heteroskedasticity (GARCH)-like
model [2] or a machine learning model. In addition, several hybrid
approaches are emerging [16, 10, 19], including a non-linear computational
component in the standard GARCH equations. In all the aforementioned cases,
we deal with a univariate problem, where a single time series is used to predict
the future values of the series itself. An exception is represented by the work of
[30], where a volatility proxy is combined with external information (namely the
volume of the queries to a web search engine for a given keyword). This paper
proposes a method for multiple-step-ahead forecasting of a volatility proxy that
incorporates the information from a second proxy in order to improve the prediction
quality. The purpose of our work is twofold. First, we aim to perform a statistical
assessment of the relationships among the most used proxies in the volatility
literature. Second, we explore a NARX (Nonlinear AutoRegressive with eXogenous
input) approach to estimate multiple steps of the output, where the output and
the input are two different proxies. In particular, our preliminary results show
that the statistical dependencies between proxies can be used to improve the
forecasting accuracy. The rest of the paper is structured as follows: Section
2 introduces the notation and provides a unified view of the different
volatility proxies. Section 3 formulates the volatility forecasting
problem as a machine learning task and describes the different tested
models. Section 4 concludes the paper with a discussion of the results and future
research directions.</p>
<p>2 Volatility proxies: definition and notation
In this paper we consider univariate time series whose value at time $t$ is denoted
by the scalar value $y_t$. Let us consider the following quantities of interest, each of
them on a daily time scale: $P_t^{(o)}, P_t^{(c)}, P_t^{(h)}, P_t^{(l)}$, respectively the stock prices at
the opening and closing of the trading day and the maximum and minimum value
for each trading day; and $v_t$, the volume (the number of traded stocks in a given
day). We will assume the availability of a training set of $T$ past observations of
each univariate series.</p>
<p>In the absence of detailed information concerning the price movements within
a given trading day, stock volatility is not directly observable [27]. To cope
with this problem, several different measures (also called proxies) have been
proposed in the econometrics literature [21, 12, 20, 13] to capture this information.
However, there is no consensus in the scientific literature on which volatility
proxy should be employed for a given purpose. We proceed by reviewing the
different types of proxies available in the literature: $\sigma^{SD,n}$, $\sigma^{i}$ and $\sigma^{G}$.</p>
      <p>Volatility as variance. The first proxy corresponds to the natural definition of
volatility [21], that is, a rolling standard deviation of a given stock's continuously
compounded returns over a past time window of size $n$:</p>
<p>$$\sigma_t^{SD,n} = \sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}\left(r_{t-i} - \bar{r}_n\right)^2} \tag{1}$$
$$r_t = \ln\left(\frac{P_t^{(c)}}{P_{t-1}^{(c)}}\right) \tag{2}$$</p>
      <p>where $r_t$ represents the daily continuously compounded return for day $t$, computed from
the closing prices $P_t^{(c)}$, and $\bar{r}_n$ represents the returns' average over the period
$\{t, \ldots, t-n\}$. In this formulation, $n$ represents the degree of smoothing that is
applied to the original time series.</p>
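<p>As a sketch (not the authors' code; the function names, price values and window size below are illustrative), the rolling standard-deviation proxy of Equations (1) and (2) can be computed from closing prices as follows:</p>

```python
import math

def log_returns(close):
    """Daily continuously compounded returns r_t = ln(P_t / P_{t-1})."""
    return [math.log(close[t] / close[t - 1]) for t in range(1, len(close))]

def sigma_sd(returns, n):
    """Rolling standard deviation of the last n returns (the sigma^{SD,n} proxy)."""
    out = []
    for t in range(n - 1, len(returns)):
        window = returns[t - n + 1 : t + 1]
        mean = sum(window) / n
        var = sum((r - mean) ** 2 for r in window) / (n - 1)
        out.append(math.sqrt(var))
    return out

close = [100.0, 101.5, 99.8, 102.3, 101.0, 103.2, 104.1]
proxy = sigma_sd(log_returns(close), n=5)  # one value per fully covered window
```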
<p>Volatility as a proxy of the coarse-grained intraday information. The $\sigma_t^{i}$
family of proxies is analytically derived in [12] by incorporating supplementary
information (i.e. the opening, maximum and minimum prices for a given trading day)
and trying to optimize the quality of the estimation.</p>
<p>The first estimator $\sigma_t^{0}$, which the authors propose as a benchmark value, simply
consists of the squared value of the returns (i.e. the squared logarithm of the ratio of
consecutive closing prices):</p>
      <p>$$\sigma_t^{0} = \left[\ln\left(\frac{P_{t+1}^{(c)}}{P_t^{(c)}}\right)\right]^2 \tag{3}$$</p>
<p>The second proposition $\sigma_t^{1}$ is able to reduce the variance of the estimator,
by including the opening price and computing a weighted average of two
components, representing respectively the nightly and the daily volatility:</p>
      <p>$$\sigma_t^{1} = \sqrt{\frac{1}{2f}\left[\ln\left(\frac{P_t^{(o)}}{P_{t-1}^{(c)}}\right)\right]^2 + \frac{1}{2(1-f)}\left[\ln\left(\frac{P_t^{(c)}}{P_t^{(o)}}\right)\right]^2} \tag{4}$$</p>
<p>The value of $f \in [0, 1]$ represents the fraction of the trading day during which the
market is closed. In the case of the CAC40, we have that $f &gt; 1 - f$, since trading is
only performed during roughly one third of the day. In this case, the weighting
scheme proposed in (4) gives a higher weight to the intraday volatility with
respect to the nightly one.
The third estimator, derived in [20] through the modeling of the price
evolution as a stochastic diffusion process with unknown variance, is a function of
the variation range (i.e. the difference between the maximum and minimum value
for the current trading day):</p>
      <p>$$\sigma_t^{2} = \sqrt{\frac{a}{f}\left[\ln\left(\frac{P_t^{(o)}}{P_{t-1}^{(c)}}\right)\right]^2 + \frac{1-a}{1-f}\cdot\frac{\left[\ln\left(P_t^{(h)}/P_t^{(l)}\right)\right]^2}{4\ln 2}}$$</p>
<p>Here, $a$ is a weighting parameter whose optimal value, according to the
authors, is shown to be $0.17$, regardless of the value of $f$.</p>
      <p>Furthermore, the same study introduces a family of estimators based on the
normalization of the maximum, minimum and closing values by the opening
price of the considered day. We can then define:</p>
<p>$$u = \ln\left(\frac{P_t^{(h)}}{P_t^{(o)}}\right), \qquad d = \ln\left(\frac{P_t^{(l)}}{P_t^{(o)}}\right), \qquad c = \ln\left(\frac{P_t^{(c)}}{P_t^{(o)}}\right) \tag{5, 6, 7}$$</p>
      <p>where $u$ is the normalized high price, $d$ is the normalized low price and $c$ is
the normalized closing price.</p>
<p>We can derive Equation (8) by starting from a general analytic form for the
estimator and then deriving the optimal values of the coefficients by minimizing
the estimation variance:</p>
      <p>$$\sigma_t^{4} = \sqrt{0.511(u-d)^2 - 0.019\left[c(u+d) - 2ud\right] - 0.383c^2} \tag{8}$$</p>
<p>The values of the coefficients are set assuming that the price dynamics
follows a Brownian motion and enforcing scale-invariance properties and price- and
time-symmetry conditions. For all the details concerning the proof, we refer the
interested reader to [12].</p>
<p>Equation (9) is derived from Equation (8) by eliminating the cross-product
terms:</p>
      <p>$$\sigma_t^{5} = \sqrt{0.5(u-d)^2 - (2\ln 2 - 1)c^2} \tag{9}$$</p>
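<p>A minimal sketch of the two range-based estimators above (our function names and price values are illustrative; the coefficients are those of Equations (8) and (9)):</p>

```python
import math

def normalized_prices(o, h, l, c):
    """u, d, c: high, low and close normalized by the opening price (in logs)."""
    return math.log(h / o), math.log(l / o), math.log(c / o)

def sigma4(o, h, l, c):
    """Optimal analytic estimator of Equation (8)."""
    u, d, cc = normalized_prices(o, h, l, c)
    var = 0.511 * (u - d) ** 2 - 0.019 * (cc * (u + d) - 2 * u * d) - 0.383 * cc ** 2
    return math.sqrt(max(var, 0.0))

def sigma5(o, h, l, c):
    """Equation (9): Equation (8) without the cross-product terms."""
    u, d, cc = normalized_prices(o, h, l, c)
    var = 0.5 * (u - d) ** 2 - (2 * math.log(2) - 1) * cc ** 2
    return math.sqrt(max(var, 0.0))

s4 = sigma4(100.0, 103.0, 98.5, 101.2)
s5 = sigma5(100.0, 103.0, 98.5, 101.2)
```

<p>On typical OHLC data the two estimators give very close values, since the cross-product terms contribute little.</p>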
<p>Last but not least, the best estimator in terms of estimation-variance
efficiency is obtained by combining the overnight volatility measure with the optimal
estimator described in Equation (8):</p>
      <p>$$\sigma_t^{6} = \sqrt{\frac{a}{f}\left[\ln\left(\frac{P_{t+1}^{(o)}}{P_t^{(c)}}\right)\right]^2 + \frac{1-a}{1-f}\left(\sigma_t^{4}\right)^2}$$</p>
      <p>GARCH-based volatility. Even though the GARCH($p$,$q$) [14] (Generalized
AutoRegressive Conditional Heteroskedasticity) family of models is generally
employed for volatility forecasting, we decided to consider it here as a filter
that, given the original time series, returns its estimation of the series' volatility.
All GARCH models assume that the return time series can be expressed as
the sum of two components: a deterministic trend $\mu_t$ and a stochastic
time-varying component $\varepsilon_t$. The stochastic component can be further decomposed
and expressed as the product of a sequence of independent and identically
distributed random variables $Z_t$, with null mean and unit variance, and a
time-varying scaling factor $\sigma_t^{G}$:</p>
      <p>$$r_t = \mu_t + \varepsilon_t \tag{11}$$
$$\varepsilon_t = Z_t\,\sigma_t^{G} \tag{12}$$</p>
      <p>The core of the model is the variance equation, describing how the residuals $\varepsilon_t$
and the past values of $\sigma^{G}$ affect the future volatility:</p>
      <p>$$\sigma_t^{G} = \sqrt{\omega + \sum_{j=1}^{p} \beta_j \left(\sigma_{t-j}^{G}\right)^2 + \sum_{i=1}^{q} \alpha_i\, \varepsilon_{t-i}^2} \tag{13}$$</p>
<p>The coefficients $\omega, \alpha_i, \beta_j$ are fitted according to the maximum-likelihood
estimation procedure proposed in [4]. In the case of our proxies, we consider the
estimation of the volatility made by a GARCH($p = 1$, $q = 1$) model, as suggested
in [13].</p>
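<p>Viewed as a filter with already-fitted coefficients, the GARCH(1,1) recursion of Equation (13) maps a return series to a volatility series. The sketch below uses illustrative coefficient values (not fitted by maximum likelihood) and seeds the recursion with the unconditional variance, a common convention:</p>

```python
import math

def garch_volatility(returns, omega, alpha, beta):
    """GARCH(1,1) filter: var_t = omega + alpha * eps_{t-1}^2 + beta * var_{t-1}.
    Residuals eps_t are taken as demeaned returns; the recursion starts from the
    unconditional variance omega / (1 - alpha - beta)."""
    mu = sum(returns) / len(returns)
    eps = [r - mu for r in returns]
    var = omega / (1.0 - alpha - beta)  # unconditional variance as starting value
    sigmas = []
    for e in eps:
        sigmas.append(math.sqrt(var))
        var = omega + alpha * e ** 2 + beta * var
    return sigmas

rets = [0.01, -0.02, 0.015, -0.005, 0.03, -0.025, 0.002]
vols = garch_volatility(rets, omega=1e-5, alpha=0.1, beta=0.85)
```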
<p>3 Multiple-step-ahead volatility forecasting
The Nonlinear AutoRegressive (NAR) formulation of a univariate time series
as an input-output mapping allows the use of supervised machine learning
techniques for one-step-ahead time series forecasting [6]:</p>
<p>$$y = f(x) + w \tag{14}$$
$$y = [y_{t+1}] \tag{15}$$
$$x = [y_{t-d}, \ldots, y_{t-d-m+1}] \tag{16}$$</p>
<p>More precisely, this model assumes an autoregressive dependence of the
future value of the time series on the past $m$ values ($m$ is the lag or embedding
order), with a given delay $d$ (in the following we assume $d = 0$ for the sake of
simplicity) and an additional null-mean noise term $w$.
With this structure, the forecasting task can be reduced to a two-step process:
first, the mapping $f$ between the inputs $x$ and the outputs $y$ is learned through
a supervised learning task; then, such mapping is used to produce the
one-step-ahead forecast of the future values.</p>
<p>Extensions of this technique allow multiple-step-ahead forecasts
(i.e. $y = [y_{t+H}, \ldots, y_{t+1}]$). Such extensions can be summarised into two main
classes: single-output (Direct and Recursive strategies) and multiple-output
(MIMO) strategies. The former learn a multi-input single-output dependency,
while the latter learn a multi-input multiple-output dependency. We invite the
interested reader to see [24], [6], [25] for more details.</p>
<p>In what follows, we focus on two multi-step-ahead single-output learning
tasks employing the Direct strategy [5], [23], [8]. In the first one (NAR), we
focus on the multiple-step-ahead forecast of a primary volatility proxy $\sigma_t^{P}$ using
only its past values as input information, while in the second one (NARX), the
past values of an additional volatility proxy $\sigma_t^{X}$ are also incorporated in the
model described in (14):</p>
      <p>$$y_{NAR} = y_{NARX} = [\sigma_{t+H}^{P}, \ldots, \sigma_{t+1}^{P}] \tag{18}$$
$$x_{NAR} = [\sigma_{t-d}^{P}, \ldots, \sigma_{t-d-m+1}^{P}] \tag{19}$$
$$x_{NARX} = [\sigma_{t-d}^{P}, \ldots, \sigma_{t-d-m+1}^{P}, \sigma_{t-d}^{X}, \ldots, \sigma_{t-d-m+1}^{X}] \tag{20}$$</p>
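<p>The two input configurations can be sketched as a dataset-building step (a toy illustration with $d = 0$ and invented proxy values, not the authors' implementation):</p>

```python
def make_dataset(primary, m, h, secondary=None):
    """Build (x, y) pairs for the Direct strategy at horizon h.
    x holds the last m values of the primary proxy (NAR) and, if a secondary
    proxy is given, its last m values as well (NARX); y is the value h steps ahead."""
    X, Y = [], []
    for t in range(m - 1, len(primary) - h):
        x = primary[t - m + 1 : t + 1][::-1]          # sigma^P_t, ..., sigma^P_{t-m+1}
        if secondary is not None:
            x += secondary[t - m + 1 : t + 1][::-1]   # sigma^X_t, ..., sigma^X_{t-m+1}
        X.append(x)
        Y.append(primary[t + h])
    return X, Y

p = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
s = [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7]
x_nar, y_nar = make_dataset(p, m=3, h=3)
x_narx, _ = make_dataset(p, m=3, h=3, secondary=s)
```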
<p>We compare the two approaches for embedding orders $m \in \{2, 5\}$, several
forecasting horizons $h \in \{2, 5, 8, 10, 12\}$ and different estimators of the
dependency $f$. More precisely, as estimators of the dependency, we employ a naive
model, a GARCH(1,1) and three machine learning approaches: a feedforward
Artificial Neural Network, a k-Nearest Neighbours approach and Support Vector
Machine-based regression.</p>
    </sec>
    <sec id="sec-2">
      <title>Naive</title>
<p>The Naive method is employed mainly as a benchmark for comparison with the
other models, and simply consists in taking the last available historical value:</p>
      <p>$$\hat{\sigma}_{t+h}^{P} = \sigma_{t-1}^{P} \tag{17}$$</p>
<p>The GARCH model corresponds to the one described in Section 2, in
Equations (11) and (13), with $p = 1$ and $q = 1$.</p>
    </sec>
    <sec id="sec-3">
<title>Artificial Neural Networks</title>
<p>In machine learning and cognitive science, an artificial neural network (ANN) is a
network of interconnected processing elements, called neurons, which are used to
estimate or approximate functions that can depend on a large number of inputs
and that are generally unknown. For our task, we focus on a specific family
of artificial neural networks, the multi-layer perceptron (MLP), with a single
hidden layer. Equations (21) and (22) describe the structure of the model for a
single forecasting horizon $t + h$ in the context of the Direct strategy, respectively
for a NAR and a NARX model:</p>
      <p>$$\hat{\sigma}_{t+h}^{P} = f_{oh}\Bigg(b_o + \underbrace{\sum_{i=1}^{m} w_{io}\,\sigma_{t-i}^{P}}_{\text{Linear AR}(m)} + \underbrace{\sum_{j=1}^{H} w_{jo}\, f_h\left(\sum_{i=1}^{m} w_{ij}\,\sigma_{t-i}^{P} + b_j\right)}_{\text{Non-linear component}}\Bigg) \tag{21}$$</p>
      <p>$$\hat{\sigma}_{t+h}^{P} = f_{oh}\Bigg(b_o + \underbrace{\sum_{i=1}^{m}\left(w_{io}\,\sigma_{t-i}^{P} + w_{(i+m)o}\,\sigma_{t-i}^{X}\right)}_{\text{Linear ARX}(m)} + \underbrace{\sum_{j=1}^{H} w_{jo}\, f_h\left(\sum_{i=1}^{m}\left(w_{ij}\,\sigma_{t-i}^{P} + w_{(i+m)j}\,\sigma_{t-i}^{X}\right) + b_j\right)}_{\text{Non-linear component}}\Bigg) \tag{22}$$</p>
      <p>It should be noted that, as shown in Equation (21), the model can be
decomposed into a linear autoregressive component of order $m$ and a non-linear
component whose structure depends on the number of hidden nodes $H$ (selected
through k-fold cross-validation). When an external regressor is added, its
influence affects both the linear and the non-linear component, as shown in (22).
In both cases, the activation functions $f_{oh}(\cdot)$ and $f_h(\cdot)$ are logistic functions.
Finally, our implementation of the MLP models is based on the nnet package
for R [28].</p>
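<p>A numeric sketch of the single-hidden-layer forward pass of Equation (21), with random untrained weights rather than nnet's fitted ones (all names and values below are illustrative):</p>

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forecast(x, W_hidden, b_hidden, w_linear, w_out, b_out):
    """NAR-MLP of Equation (21): linear AR part plus H logistic hidden units,
    passed through a logistic output activation."""
    linear = sum(w * xi for w, xi in zip(w_linear, x))
    hidden = [logistic(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(W_hidden, b_hidden)]
    nonlin = sum(w * hj for w, hj in zip(w_out, hidden))
    return logistic(b_out + linear + nonlin)

random.seed(0)
m, H = 2, 3                 # embedding order and number of hidden nodes
x = [0.12, 0.08]            # last m values of the primary proxy
W_hidden = [[random.uniform(-1, 1) for _ in range(m)] for _ in range(H)]
b_hidden = [random.uniform(-1, 1) for _ in range(H)]
w_linear = [random.uniform(-1, 1) for _ in range(m)]
w_out = [random.uniform(-1, 1) for _ in range(H)]
pred = mlp_forecast(x, W_hidden, b_hidden, w_linear, w_out, b_out=0.0)
```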
    </sec>
    <sec id="sec-4">
<title>k-Nearest Neighbours</title>
<p>The k-Nearest Neighbours (kNN) model is a local non-linear model used for
classification and regression. In the case of regression, the prediction for a given
input vector $x$ is obtained through local learning [3], a method that produces
predictions by fitting a simple local model in the neighbourhood of the point to
be predicted. The neighbourhood of a point is defined by taking the $k$ points
having the minimal values for a chosen distance metric defined on the space of
the input vectors [1].</p>
<p>In this case, every data point is represented in the form $(x, y)$, where $x$
represents the vector of input values and $y$ the corresponding output vector, as
described in Figure 1. Then the prediction for an unknown input vector $x$ is
computed as follows:</p>
      <p>$$\hat{y}(x) = \frac{1}{k} \sum_{i \in kNN(x)} y(x_i) \tag{23}$$</p>
      <p>where $y(x_i)$ is the output vector of the $i$th nearest neighbour of the input
vector $x$ in the dataset. The choice of the optimal number of neighbours $k$ is
performed through automatic leave-one-out selection, as described in [5]. Our
implementation of the kNN models is based on the R package gbcode [7].</p>
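<p>The prediction rule of Equation (23) in a few lines (Euclidean distance and uniform averaging; a sketch with made-up data, not the gbcode implementation):</p>

```python
import math

def knn_predict(X, Y, x_query, k):
    """Average the outputs of the k training points closest to x_query."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    ranked = sorted(range(len(X)), key=lambda i: dist(X[i], x_query))
    return sum(Y[i] for i in ranked[:k]) / k

X = [[0.0], [1.0], [2.0], [10.0]]
Y = [0.0, 1.0, 2.0, 10.0]
pred = knn_predict(X, Y, [1.2], k=3)  # averages the three nearest outputs
```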
    </sec>
    <sec id="sec-5">
      <title>Support Vector Regression</title>
<p>Support Vector Regression (SVR) is a regression methodology based on the Support
Vector Machine theoretical framework [9]. The key idea behind SVR is that the
regression model can be expressed using a subset of the input training examples,
called the support vectors. In more formal terms, the model (Equation (24)) is
a linear combination, over all the $n$ support vectors, of a bivariate kernel function
$k(\cdot,\cdot)$ taking as inputs the data point $x$ whose forecast is required and the $i$th
support vector $x_i$. The coefficients $\alpha_i, \alpha_i^*$ are determined through the
minimization of an empirical risk function (cf. [22]), solved as a continuous optimization
problem.</p>
<p>$$y = \sum_{i=1}^{n} \left(\alpha_i - \alpha_i^*\right) k(x, x_i) \tag{24}$$
$$k(x, x_i) = e^{-\frac{\|x - x_i\|^2}{2\sigma^2}} \tag{25}$$</p>
<p>Among the different available kernel functions, we employ the radial basis one
(Equation (25)), for which the optimal value of the kernel width parameter is determined
through grid search. Here, the SVM implementation of the R package e1071
[18] is used for the experiments. As for ANN and kNN, we test two
different dataset structures (cf. Figure 1), representing respectively the exclusion
(on the left) and the inclusion (on the right) of an external regressor.</p>
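<p>The prediction of Equations (24) and (25) reduces to a kernel-weighted sum over the support vectors. A sketch with made-up support vectors and coefficients (in practice e1071 fits the coefficients and grid search picks the kernel width):</p>

```python
import math

def rbf_kernel(x, xi, width):
    """Radial basis kernel of Equation (25)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq / (2.0 * width ** 2))

def svr_predict(x, support_vectors, coefs, width):
    """Equation (24): linear combination of kernel evaluations."""
    return sum(c * rbf_kernel(x, sv, width) for sv, c in zip(support_vectors, coefs))

svs = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]
coefs = [0.5, -0.2, 0.7]  # alpha_i - alpha_i^*, illustrative values
y_hat = svr_predict([0.15, 0.25], svs, coefs, width=0.5)
```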
<p>Direct NAR: a single model $f_h$ for each horizon $h$; the forecast at step $h$ is
made using the $h$th model. Direct NARX: a single model $f_h$ for each horizon $h$;
the forecast at step $h$ is made using the $h$th model.</p>
      <p>Fig. 1: Comparison of the dataset structure and model identification procedure
for NAR and NARX forecasting strategies. The primary proxy is denoted by
$\sigma_t^{P}$, while the secondary one is $\sigma_t^{X}$. The example datasets are shown for a
model order $m = 3$ and a forecasting horizon $h = 3$.</p>
<p>4 Experimental Results</p>
    </sec>
    <sec id="sec-6">
      <title>Dataset description</title>
<p>The proxies have been computed on the 40 time series of the French stock market
index CAC40, from 05-01-2009 to 22-10-2014 (approximately 6 years), for a total of
1489 OHLC (Opening, High, Low, Close) samples per time series. In
addition to the proxies, we also include the continuously compounded return
and the volume variable (representing the number of traded stocks in a given
trading day).</p>
    </sec>
    <sec id="sec-7">
      <title>Statistical analysis</title>
<p>(Table: pairwise statistical relationships among the returns $r_t$, the volume
and the volatility proxies $\sigma^{0}, \ldots, \sigma^{6}$, $\sigma^{SD,5}$, $\sigma^{SD,15}$,
$\sigma^{SD,21}$ and $\sigma^{G}$.)</p>
<sec id="sec-7-1">
        <title>Forecasting results</title>
        <p>(Table: normalized MASE of the Average, GARCH(1,1), ANN-Dir, kNN-Dir and
SVR-Dir models for each regressor combination, where the additional regressor
(Volume, $\sigma^{SD,5}$, $\sigma^{SD,15}$ or $\sigma^{SD,21}$) either belongs (=) or does
not belong (≠) to the same proxy family as the target.)</p>
      </sec>
      <sec id="sec-7-2">
        <title>Horizon - H</title>
<p>For each forecasting horizon and model order, we performed a number of training and test
tasks by following a rolling-origin strategy [26]. The size of the training set is
$\frac{2N}{3}$ and the procedure is repeated for 50 testing sets of length $H$. The regressor
combinations have been selected in order to test whether belonging (=) or
not (≠) to the same proxy family impacts the forecasting performance. The
employed error measure is the Mean Absolute Scaled Error [15], normalized at
each forecasting horizon by the MASE of the Naive method:</p>
<p>$$\mathrm{MASE} = \frac{\frac{1}{T}\sum_{t=1}^{T}\left|\sigma_t^{P} - \hat{\sigma}_t^{P}\right|}{\frac{1}{T-1}\sum_{t=2}^{T}\left|\sigma_t^{P} - \sigma_{t-1}^{P}\right|} \tag{26}$$</p>
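<p>Equation (26) can be computed as follows (a sketch with invented proxy values; the denominator is the mean absolute one-step difference of the naive forecast):</p>

```python
def mase(actual, predicted):
    """Mean Absolute Scaled Error: forecast MAE scaled by the MAE of the
    naive previous-value forecast computed on the same series."""
    T = len(actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / T
    naive = sum(abs(actual[t] - actual[t - 1]) for t in range(1, T)) / (T - 1)
    return mae / naive

actual = [0.10, 0.12, 0.11, 0.15, 0.14]
predicted = [0.11, 0.11, 0.12, 0.14, 0.15]
score = mase(actual, predicted)  # < 1 means better than the naive forecast
```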
<p>We include in our analysis a GARCH(1,1) method [13] as a baseline reference.
While employing an additional regressor, model orders higher than 2
have not been tested, due to the excessive computational time required by the
corresponding technique for the given task or due to numerical convergence
problems. A first observation from the table is that all the ML methods, both
in the single-input and the multiple-input configuration, are able to outperform
the reference GARCH method. Moreover, both the increase of the model
order $m$ and the introduction of an additional regressor improve the
methods' performances. However, only the addition of an external regressor, for
horizons greater than 8 steps ahead, is shown to bring a statistically significant
improvement (paired t-test, p=0.05). Even though no model appears to clearly
outperform all the others on every horizon, we can observe that the SVR model
family is generally able to produce smaller forecast errors than those based on
ANN and kNN.</p>
<p>5 Conclusion and future work
After having shown the benefits of including an additional proxy in our models,
our main aim is to investigate how the forecasting quality of volatility could be
improved, mainly by tuning three parameters in our methods: the choice of the
additional proxy, the employed machine learning technique and the size of the
training window. To further advance our research, we also plan to study
how the current approach could be generalized in order to include an arbitrary
number of volatility proxies.</p>
<p>Acknowledgments. Jacopo De Stefani acknowledges the support of the
ULB-WORLDLINE agreement. Gianluca Bontempi acknowledges the funding of the
Brufence project (Scalable machine learning for automating defense system),
supported by INNOVIRIS (Brussels Institute for the encouragement of scientific
research and innovation).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Altman</surname>
            ,
            <given-names>N.S.:</given-names>
          </string-name>
          <article-title>An introduction to kernel and nearest-neighbor nonparametric regression</article-title>
          .
          <source>The American Statistician</source>
          <volume>46</volume>
          (
          <issue>3</issue>
          ),
          <volume>175</volume>
-
          <fpage>185</fpage>
          (
          <year>1992</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Andersen</surname>
            ,
            <given-names>T.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bollerslev</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Arch and garch models</article-title>
          .
          <source>Encyclopedia of Statistical Sciences</source>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Atkeson</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>A.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schaal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Locally weighted learning for control</article-title>
          .
          <source>In: Lazy learning</source>
          , pp.
          <volume>75</volume>
-
          <fpage>113</fpage>
          . Springer (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bollerslev</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Generalized autoregressive conditional heteroskedasticity</article-title>
          .
          <source>Journal of econometrics 31(3)</source>
          ,
          <volume>307</volume>
-
          <fpage>327</fpage>
          (
          <year>1986</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bontempi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taieb</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          :
          <article-title>Conditionally dependent strategies for multiple-stepahead prediction in local learning</article-title>
          .
          <source>International journal of forecasting 27(3)</source>
          ,
          <volume>689</volume>
-
          <fpage>699</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bontempi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taieb</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le Borgne</surname>
            ,
            <given-names>Y.A.:</given-names>
          </string-name>
          <article-title>Machine learning strategies for time series forecasting</article-title>
          .
          <source>In: Business Intelligence</source>
          , pp.
          <volume>62</volume>
-
          <fpage>77</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bontempi</surname>
          </string-name>
          , Gianluca:
          <article-title>Code from the handbook "statistical foundations of machine learning"</article-title>
          , https://github.com/gbonte/gbcode
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Cheng, H.,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scripps</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Multistep-ahead time series prediction</article-title>
          .
          <source>In: Pacific-Asia Conference on Knowledge Discovery and Data Mining</source>
          . pp.
          <fpage>765</fpage>
          –
          <lpage>774</lpage>
          . Springer (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Cortes</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Support-vector networks</article-title>
          .
          <source>Machine Learning</source>
          <volume>20</volume>
          (
          <issue>3</issue>
          ),
          <fpage>273</fpage>
          –
          <lpage>297</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dash</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dash</surname>
            ,
            <given-names>P.:</given-names>
          </string-name>
          <article-title>An evolutionary hybrid fuzzy computationally efficient EGARCH model for volatility prediction</article-title>
          .
          <source>Applied Soft Computing</source>
          <volume>45</volume>
          ,
          <fpage>40</fpage>
          –
          <lpage>60</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Field</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          :
          <article-title>Meta-analysis of correlation coefficients: a Monte Carlo comparison of fixed- and random-effects methods</article-title>
          .
          <source>Psychological Methods</source>
          <volume>6</volume>
          (
          <issue>2</issue>
          ),
          <fpage>161</fpage>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Garman</surname>
            ,
            <given-names>M.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klass</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>On the estimation of security price volatilities from historical data</article-title>
          .
          <source>Journal of Business</source>
          pp.
          <fpage>67</fpage>
          –
          <lpage>78</lpage>
          (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>P.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lunde</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A forecast comparison of volatility models: does anything beat a GARCH(1,1)?</article-title>
          <source>Journal of Applied Econometrics</source>
          <volume>20</volume>
          (
          <issue>7</issue>
          ),
          <fpage>873</fpage>
          –
          <lpage>889</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hentschel</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>All in the family: nesting symmetric and asymmetric GARCH models</article-title>
          .
          <source>Journal of Financial Economics</source>
          <volume>39</volume>
          (
          <issue>1</issue>
          ),
          <fpage>71</fpage>
          –
          <lpage>104</lpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hyndman</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koehler</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          :
          <article-title>Another look at measures of forecast accuracy</article-title>
          .
          <source>International Journal of Forecasting</source>
          <volume>22</volume>
          (
          <issue>4</issue>
          ),
          <fpage>679</fpage>
          –
          <lpage>688</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kristjanpoller</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fadic</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Minutolo</surname>
            ,
            <given-names>M.C.</given-names>
          </string-name>
          :
          <article-title>Volatility forecast using hybrid neural network models</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>41</volume>
          (
          <issue>5</issue>
          ),
          <fpage>2437</fpage>
          –
          <lpage>2442</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Martens</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Measuring and forecasting S&amp;P 500 index-futures volatility using high-frequency data</article-title>
          .
          <source>Journal of Futures Markets</source>
          <volume>22</volume>
          (
          <issue>6</issue>
          ),
          <fpage>497</fpage>
          –
          <lpage>518</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Meyer</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dimitriadou</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornik</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weingessel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leisch</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          :
          <article-title>e1071: Misc functions of the Department of Statistics, Probability Theory Group (formerly: E1071), TU Wien</article-title>
          , https://cran.r-project.org/web/packages/e1071/
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Monfared</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enke</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Volatility forecasting using a hybrid GJR-GARCH neural network model</article-title>
          .
          <source>Procedia Computer Science</source>
          <volume>36</volume>
          ,
          <fpage>246</fpage>
          –
          <lpage>253</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Parkinson</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The extreme value method for estimating the variance of the rate of return</article-title>
          .
          <source>Journal of Business</source>
          pp.
          <fpage>61</fpage>
          –
          <lpage>65</lpage>
          (
          <year>1980</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Poon</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Granger</surname>
            ,
            <given-names>C.W.</given-names>
          </string-name>
          :
          <article-title>Forecasting volatility in financial markets: A review</article-title>
          .
          <source>Journal of Economic Literature</source>
          <volume>41</volume>
          (
          <issue>2</issue>
          ),
          <fpage>478</fpage>
          –
          <lpage>539</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Sapankevych</surname>
            ,
            <given-names>N.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sankar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Time series prediction using support vector machines: a survey</article-title>
          .
          <source>IEEE Computational Intelligence Magazine</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Sorjamaa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reyhani</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lendasse</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Methodology for long-term prediction of time series</article-title>
          .
          <source>Neurocomputing</source>
          <volume>70</volume>
          (
          <issue>16</issue>
          ),
          <fpage>2861</fpage>
          –
          <lpage>2869</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Taieb</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontempi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atiya</surname>
            ,
            <given-names>A.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorjamaa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A review and comparison of strategies for multi-step ahead time series forecasting based on the NN5 forecasting competition</article-title>
          .
          <source>Expert Systems with Applications</source>
          <volume>39</volume>
          (
          <issue>8</issue>
          ),
          <fpage>7067</fpage>
          –
          <lpage>7083</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Taieb</surname>
            ,
            <given-names>S.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sorjamaa</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bontempi</surname>
            ,
            <given-names>G.:</given-names>
          </string-name>
          <article-title>Multiple-output modeling for multi-step-ahead time series forecasting</article-title>
          .
          <source>Neurocomputing</source>
          <volume>73</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1950</fpage>
          –
          <lpage>1957</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Tashman</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          :
          <article-title>Out-of-sample tests of forecasting accuracy: an analysis and review</article-title>
          .
          <source>International Journal of Forecasting</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>437</fpage>
          –
          <lpage>450</lpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Tsay</surname>
            ,
            <given-names>R.S.:</given-names>
          </string-name>
          <article-title>Analysis of Financial Time Series</article-title>
          , vol.
          <volume>543</volume>
          . John Wiley &amp; Sons (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Venables</surname>
            ,
            <given-names>W.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ripley</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          : Modern Applied Statistics with S. Springer, New York, fourth edn. (
          <year>2002</year>
          ), http://www.stats.ox.ac.uk/pub/MASS4, ISBN 0-387-95457-0
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Ward Jr.</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Hierarchical grouping to optimize an objective function</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>58</volume>
          (
          <issue>301</issue>
          ),
          <fpage>236</fpage>
          –
          <lpage>244</lpage>
          (
          <year>1963</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nichols</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Deep learning stock volatility with Google domestic trends</article-title>
          .
          <source>arXiv preprint arXiv:1512.04916</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>