=Paper= {{Paper |id=Vol-2191/paper16 |storemode=property |title=Towards an Open Book Architecture for Deep Learning Networks: Data Properties and Architectures - Evidence from Time Series Analytics |pdfUrl=https://ceur-ws.org/Vol-2191/paper16.pdf |volume=Vol-2191 |authors=Vera Fleißner,Kai Heinrich,Michael Seifert |dblpUrl=https://dblp.org/rec/conf/lwa/FleissnerHS18 }} ==Towards an Open Book Architecture for Deep Learning Networks: Data Properties and Architectures - Evidence from Time Series Analytics== https://ceur-ws.org/Vol-2191/paper16.pdf
 Towards an Open Book Architecture for Deep Learning
 Networks: Data Properties and Network Architectures -
         Evidence from Time Series Analytics

                     Vera Fleißner1, Kai Heinrich1, Michael Seifert1
                                   1 TU-Dresden, Germany




       Abstract. In the field of time series prediction, persistent inaccuracies can
       potentially cause wrong economic decisions. Neural networks are seen as a
       potential option to improve predictions and are already used in some cases.
       Nevertheless, it is so far unclear for which kinds of time series data the use
       of neural networks is the right choice, and which kinds of neural networks
       are feasible for which data characteristics. The main goal of this research-
       in-progress paper is therefore to establish a mapping between different time
       series characteristics and network architectures by means of a descriptive
       analysis based on an extensive literature review. This goal is a step towards
       developing design guidelines for (deep) neural networks in the context of
       time series analysis.

       Keywords: time series, deep learning, neural network, forecasting, prediction,
       survey.


1      Introduction

The field of time series prediction through intelligent systems using Artificial Neural
Networks (ANN) is steadily growing [1]. One reason lies in the potential of complex
intelligent systems such as Long Short-Term Memory (LSTM) networks or Convolu-
tional Neural Networks (CNN) to overcome shortcomings of traditional algorithms,
such as the many statistical assumptions required to model complex non-linear time
series [2].
   Neural networks are especially popular in the field of Big Data Analytics and its
subdomains such as Natural Language Processing (NLP) and pattern recognition,
often ranking first in prediction quality on several benchmark datasets [3].
   When it comes to the task of time series prediction, neural networks have been
widely adopted [4]. However, time series can exhibit multiple characteristics, such as
non-linearity, noise, or deterministic chaos, that make them hard to predict and de-
mand models suited to capturing those characteristics. While time series prediction is
applied in many fields such as credit scoring, inflation analysis or stock market pre-
diction, it remains unclear (1) how those black-box models capture certain inherent
characteristics of time series, (2) how they react to different problem and sample
characteristics such as sample size or prediction window, and (3) how they perform
in terms of prediction quality compared to other algorithms [5].


   Our main goal within the full research project is to present guidelines on deep learn-
ing architectures based on time series data characteristics. Towards this goal we first
aim to reveal the research gap concerning the opaque behavior of deep learning algo-
rithms in the context of time series analysis. We therefore pose the following questions:

        RQ1: What time series characteristics are usually present in the data when
         using neural network architectures?
        RQ2: How do different architectures handle different characteristics?

Our paper is structured as follows: We first give some preliminaries on time series
characteristics in Section 2. After describing our research methodology in Section 3,
we present the results of our descriptive analysis in Section 4, answering research
questions RQ1 and RQ2 in Sections 4.1 and 4.2 respectively. The paper closes with a
discussion and an outlook in Section 5.


2       Time Series Characteristics

The most common characteristic of a time series is stationarity. We can regard a time
series as stationary when it has constant mean and variance, and its covariance depends
only on the lag 𝑘 between observations, regardless of the time 𝑡. When decomposing a
time series, we can extract a trend and a seasonality component, if such effects are
present. A time series that shows trend or seasonality is not stationary, since its mo-
ments are not constant throughout the observation interval. Furthermore, a time series
can show signs of non-linearity. In a non-linear time series, big shocks can have small
influences and vice versa, and the influence of a shock will also depend on its sign [6].
Most time series data will show evidence of multiple characteristics [7], and those
characteristics might determine each other. In addition, when forecasts are iterated, the
prediction error grows with every additional prediction made upon a previous value,
since those values are themselves predictions rather than real data points.
   Further characteristics include anomalies and deterministic chaos. Anomalies can
occur in several forms: an additive outlier affects one observation that lies within the
normal range but does not reflect the correct value, and might be caused by measure-
ment errors; an innovation outlier affects several values (e.g., driving noise during an
erroneous interval); a changing trend is caused by a shock event in the data; and a
permanent level shift (PLS) of the measurement system. Dismissing anomalies can
lead to significantly worse and biased predictions [6]. Deterministic chaos is an addi-
tional source of seemingly unpredictable randomness; while it limits long-term pre-
diction, it can be exploited in short-term prediction models if detected correctly [8].
A time series can also show no signs of a deterministic process at all, which is evi-
dence for the presence of a noise component. Noise, like many other characteristics,
is not an exclusive property and can be mixed with deterministic parts, so that a time
series may show signs of only partial noise [9].
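The notion of constant moments can be illustrated with a crude, self-contained check. This is only a sketch, not a formal procedure such as a unit-root test; the function name, segment count, and tolerance are our own illustrative choices: split the series into segments and compare segment means and standard deviations against the overall scale.

```python
import numpy as np

def crude_stationarity_check(x, n_segments=4, tol=0.5):
    """Heuristic: a (weakly) stationary series should have roughly constant
    mean and variance across segments of the observation interval."""
    x = np.asarray(x, dtype=float)
    segments = np.array_split(x, n_segments)
    means = np.array([s.mean() for s in segments])
    stds = np.array([s.std() for s in segments])
    # normalize the spread of the segment moments by the overall scale
    scale = x.std() + 1e-12
    mean_drift = (means.max() - means.min()) / scale
    std_drift = (stds.max() - stds.min()) / scale
    return bool(mean_drift < tol and std_drift < tol)

t = np.arange(400)
stationary = np.sin(0.7 * t)          # moments roughly constant over time
trended = np.sin(0.7 * t) + 0.05 * t  # trend: segment means grow with t
```

For the trended series the segment means grow over the observation interval, so the heuristic flags it as non-stationary, in line with the argument above that trend and seasonality violate constant moments.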


3        Research methodology

   As a basis for our descriptive analysis we employ a systematic literature review [10,
11]. We derive constraints for the review from RQ1 and RQ2. The technical con-
straints of the literature review can be summarized as follows: only peer-reviewed
journal papers were included; papers are written in English; papers are accessible; the
databases searched were IEEE, Science Direct, AIS, Springer, JSTOR, Emerald, ACM,
and Web of Science. The content constraints follow from the research questions: the
focus of the literature review was on applications of approaches to time series analysis,
so we excluded papers that were solely theoretical algorithm introductions. We com-
bined search terms from three different topic areas: algorithm, task and domain. The
task and domain are given by “prediction” and “time series” respectively. For the al-
gorithm area we compiled a list of commonly known methods applied within the field
of intelligent systems (e.g., neural networks, deep learning). The search terms that
yielded at least one result can be found in Table 1.

                           Table 1. Search terms of the literature review


      Search Term                                                           # of results
      net* AND time series AND fore*                                        113
      net* AND time series AND prediction                                   93
      perceptron AND time series AND fore*                                  5
      perceptron AND time series AND prediction                             4
      rbf AND time series AND prediction                                    5
      Sum of papers initially found                                         226


   We initially found 226 papers. After applying the technical constraints and remov-
ing duplicates, 192 papers remained. We then filtered out papers with unsuitable non-
numeric time series data (170 papers left) and removed papers that were not topically
relevant to our cause (e.g., no focus on intelligent systems), leaving 135 papers. In the
next section we conduct a descriptive analysis, first answering RQ1 in Section 4.1 and
subsequently RQ2 in Section 4.2.
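The three-area construction of search strings described above can be sketched as a simple cross product. The term lists mirror Table 1; this is an illustration of the query-building scheme, not the authors' actual tooling:

```python
from itertools import product

# topic areas: algorithm terms crossed with task terms,
# with the fixed domain "time series" (see Table 1)
algorithm_terms = ["net*", "perceptron", "rbf"]
task_terms = ["fore*", "prediction"]
domain = "time series"

queries = [f"{a} AND {domain} AND {t}"
           for a, t in product(algorithm_terms, task_terms)]
```

This yields six candidate query strings; Table 1 lists only the five that returned at least one result.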


4        Descriptive analysis and results

4.1      Time Series Characteristics

The majority of the papers found were published in the field of computer science. The
research goals of the papers can be divided into model comparison, prediction (new
problem), prediction improvement (existing problem) and model improvement. The
prediction-improvement task makes up the majority of the papers (51%), followed by
the model-improvement task (22%). The distribution of time series characteristics
within the literature review database is depicted in Fig. 1. We can see that non-linear
time series are the subject of a majority of papers, along with trend and seasonal com-
ponents.




                Fig. 1. Time Series Characteristics from Literature Review




   The dataset sizes and their frequency of occurrence peak at around 500 data points;
the majority of the studies do not extend into big data territory in terms of volume.
Only a small fraction of the studies use datasets with 10,000 observations or more.
This is particularly interesting when applying deep learning algorithms, which often
call for rather large training datasets. To group the algorithms used in the observed
studies, we applied the taxonomy approach of [12]. First, we classified them by basic
network architecture, dividing them into classic ANNs with standard activation func-
tions (denoted “ANN”), networks with different functions and different architectures
(denoted “no ANN”), e.g., Ridge Polynomial or Abductive Networks, and hybrid
methods consisting of at least a two-stage approach that combines a classic ANN with
other methods, e.g., fuzzy functions (denoted “hybrid method”). Second, we divided
them by their properties, distinguishing between modular networks (“modular”) that
can easily be expanded with additional layers, feedforward and feedbackward archi-
tectures, interlayer networks that allow connections within each layer, convolutional
networks, and network architectures that allow recurrence using time features. Since
hybrid methods can relate both to architecture and to features (e.g., hybrid activation
functions), we added hybrid to the property distinction as well. Fig. 2 shows the cor-
relation between the two classification schemes (basic network architecture as blue,
red and green bars vs. properties on the x-axis). It can be derived from Fig. 2 that
some neural network properties occur more often than others. While modular net-
works are uniformly distributed, networks that do not have a classical ANN as base
network (no ANN) are more frequently used to model time dependencies directly.
The same applies to feedbackward architectures, since they require special base net-
works to work properly. We rarely see convolutional or interlayer architectures in
time series analysis, although recent publications suggest that they are now used more
frequently due to their ability to reduce dimensionality. Hybrid methods are used
more often with classic ANN-based systems than with other base networks, since the
added method usually replaces the complex and different no-ANN network.


4.2    Dependencies of neural networks and time series characteristics
Since in most cases in our literature review hybrid methods or special networks (no
ANN) are used to model time series features, Fig. 3 depicts the correlation between
hybrid modeling and no ANN on the one hand and time series characteristics on the
other. We clearly see that when most of the features, like chaos or non-stationarity,
are present, hybrid models are used far more often, since standard ANNs are in some
cases not able to achieve satisfactory results by themselves but need an additional
method that either precedes or succeeds the application of the neural network. Special
attention should also be given to the time-related architectural features of ANNs (e.g.,
LSTM, RNN) and their disproportionately frequent use when it comes to non-station-
arity, chaos and especially noise. As in classical models like GARCH or ARIMA, the
noise is most likely autocorrelated and therefore has to be dealt with accordingly;
otherwise it will be included in the model and introduce a prediction bias. Consider-
ing the two advanced classes “no ANN” and “hybrid method”, it is apparent from the
descriptive analysis that time series data often needs special treatment and cannot
simply be processed by an out-of-the-box algorithm. When we look into the detailed
algorithms of either class, we find a variety of methods, like Ridge Polynomial or
Abductive Networks, as well as fuzzy logic operators used as hybrid functions with
classical ANNs, that do not correlate with time series features in this small literature
sample.
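The two-stage idea behind many hybrid methods, a preprocessing step that precedes the learner, can be sketched as follows. For brevity the "neural" stage is replaced here by a least-squares AR(1) stand-in; the function and its parameters are purely illustrative and not taken from any surveyed paper:

```python
import numpy as np

def hybrid_forecast(y, horizon):
    """Two-stage sketch: (1) fit and remove a linear trend, (2) model the
    residual with a least-squares AR(1) -- a stand-in for the neural stage --
    and (3) recombine trend and residual forecasts."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)       # stage 1: trend model
    resid = y - (slope * t + intercept)
    # stage 2: AR(1) coefficient via least squares on the residual
    phi = resid[1:] @ resid[:-1] / (resid[:-1] @ resid[:-1] + 1e-12)
    preds, last = [], resid[-1]
    for h in range(1, horizon + 1):
        last = phi * last                        # iterated one-step forecast
        preds.append(slope * (len(y) - 1 + h) + intercept + last)
    return np.array(preds)
```

Note that the residual forecast is iterated one step at a time, which is exactly the setting in which prediction error accumulates over the horizon, as discussed in Section 2.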
           Fig. 2. Types of network architectures with base network dependencies


               Fig. 3. Time Series Characteristics and ANN model




5      Summary and outlook

Since time series data plays an increasingly large role in economic applications, not
only in the finance sector but also in the context of event, process, and failure data in
Industry 4.0, the knowledge base needs guidelines on how to design networks that
handle time series with different characteristics. Towards that notion, we conducted an
extended literature review with regard to time series characteristics and neural network
architectures. We showed that several of those characteristics are of great concern in
the literature and that certain hybrid or non-base ANNs are built for special cases.
However, there is no correlation between certain types of characteristics and those
algorithms. This can be explained by the special designs of hybrid functions and net-
works, such as wavelet networks, which are specifically designed and configured to
deal with one particular problem. In general, the transparency of how different time
series characteristics can be dealt with in the context of neural networks remains lim-
ited. As this work continues, we will conduct extensive benchmarking studies based on
simulated and real-world data in order to find useful correlations between time series
characteristics and neural network architectures in the context of prediction problems.


6      References

1.  Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., Alsaadi, F.E.: A survey of deep
    neural network architectures and their applications. Neurocomputing. 234, 11–26
    (2017).
2. Tu, J.V.: Advantages and disadvantages of using artificial neural networks versus
    logistic regression for predicting medical outcomes. J. Clin. Epidemiol. 49, 1225‐
    1231 (1996).
3. Schmidhuber, J.: Deep learning in neural networks: An overview. Neural Netw.
    61, 85–117 (2015).
4.  Längkvist, M., Karlsson, L., Loutfi, A.: A review of unsupervised feature learning
    and deep learning for time-series modeling. Pattern Recognit. Lett. 42, 11–24 (2014).
5.  Tkáč, M., Verner, R.: Artificial neural networks in business: Two decades of research.
    Appl. Soft Comput. 38, 788–804 (2016).
6. Brockwell, P.J., Davis, R.A.: Introduction to Time Series and Forecasting.
    Springer International Publishing (2016).
7. Franses, P.H., Van Dijk, D.: Forecasting stock market volatility using (nonlinear)
    GARCH models. J. Forecast. 229–235 (1996).
8. Farmer, J.D., Sidorowich, J.J.: Predicting chaotic time series. Phys. Rev. Lett. 59,
    845 (1987).
9. Mann, M.E., Lees, J.M.: Robust estimation of background noise and signal detec-
    tion in climatic time series. Clim. Change. 33, 409–445 (1996).
10. Fettke, P.: State-of-the-Art des State-of-the-Art - Eine Untersuchung der For-
    schungsmethode          Review       innerhalb     der      Wirtschaftsinformatik.
    Wirtschaftsinformatik. 48, 257–266 (2006).
11. Webster, J., Watson, R.T.: Analyzing the past to prepare for the future: Writing a
    literature review. MIS Q. 26, 13–23 (2002).
12. Nickerson, R.C., Varshney, U., Muntermann, J.: A method for taxonomy de-
    velopment and its application in information systems. Eur. J. Inf. Syst. 22, 336–
    359 (2013).