=Paper=
{{Paper
|id=Vol-1649/123
|storemode=property
|title=Early Failure Detection for Predictive Maintenance of Sensor Parts
|pdfUrl=https://ceur-ws.org/Vol-1649/123.pdf
|volume=Vol-1649
|authors=Tomáš Kuzin, Tomáš Borovička
|dblpUrl=https://dblp.org/rec/conf/itat/KuzinB16
}}
==Early Failure Detection for Predictive Maintenance of Sensor Parts==
ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 123–130, http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 T. Kuzin, T. Borovička

Tomáš Kuzin, Tomáš Borovička

Faculty of Information Technology, Czech Technical University in Prague, Prague, The Czech Republic

kuzintom@fit.cvut.cz, tomas.borovicka@fit.cvut.cz

Abstract: Maintenance of a sensor part typically means renewing the sensor at regular intervals or replacing it once it malfunctions. However, optimal timing of the replacement can reduce maintenance costs. The aim of this article is to suggest a predictive maintenance strategy for sensors using condition monitoring and early failure detection based on their own collected measurements. Three different approaches to early failure detection of sensor parts are introduced: 1) an approach based on feature extraction and status classification, 2) an approach based on time series modeling, and 3) an approach based on anomaly detection using autoencoders. All methods were illustrated on real-world data and were shown to be applicable for condition monitoring.

===1 Introduction===

In the last decade, the number of sensors used across all sectors has risen significantly, and this trend continues.

In the classical concept, predictive maintenance is applied when the maintained asset is expensive or important for key business processes. In other words, it is applied when proper utilization of the machinery has important economic or safety consequences. This is not typical of sensor parts, which are usually cheap and play a minor role. For such assets, maintenance typically means simple replacement, and a reactive maintenance strategy would be the most common choice. However, machines are becoming more and more dependent on sensor parts, which brings new challenges to their maintenance. Proper timing of replacement has a direct influence on maintenance expenses. This is especially true where other processes depend on the sensor readings and a sensor failure or malfunction may stop the operation or cause collateral losses on the machinery.

For sensors, the "classical" condition monitoring scheme, which relies on a properly chosen set of external sensors, makes no sense. On the other hand, sensors themselves provide on-line measurements during their whole operational service. These data may be exploited to estimate the current state of the measuring device. Applying a smarter maintenance strategy to sensor parts therefore makes perfect sense and may bring significant savings.

This article deals with the possibilities of smarter maintenance strategies for sensor parts. The main idea is to apply machine learning techniques to monitor the current condition of sensors, or to predict their failure, based on their own measurements. The same techniques are then used to propose an optimal time for their replacement in order to avoid failures.

===2 Related Work===

Several articles and works on "classical" predictive maintenance and condition monitoring [1, 2] have been published. A predictive maintenance strategy is usually a rule-based maintenance grounded on on-line condition monitoring, which relies on an appropriately chosen set of external sensors; the proper sensor set plays the key role [2]. Unfortunately, none of these techniques is useful when the state of the sensors themselves needs to be monitored.

Moreover, many published works base their approaches on sensor networks, where a malfunction of one sensor can be identified using the measurements of other sensors in the network. This paper, however, focuses on "standalone" sensors, where no other devices sensing the same or correlated phenomena are available. The approaches therefore use only the measurements of the sensor itself. Since few publications address this case, the rest of this review focuses on the categorization of faults and on fault detection techniques for both single sensors and sensor networks.

Sensors provide a huge amount of information about observed phenomena. However, to draw meaningful conclusions, the quality of the data has to be ensured. Sensors themselves can malfunction, which can distort the picture of the phenomena. Most methods follow a common framework: characterize the normal behavior of sensor readings, identify significant deviations, and mark them as faults.

For sensor networks, the most frequent types of faults have been described and categorized by Ni et al. [3]. They describe two distinct views of faults. The first is a data-centric view, which examines the data collected by a given sensor and describes fault models based on data features. The second, a system-centric view, examines physical malfunctions of a sensor and how those may manifest themselves in the resulting data. According to Ni et al., these two views are related, and every fault can be mapped between them. The important fault categories discussed in [3] are summarized in Table 1. This article focuses on the data-centric point of view.

Sharma et al. [4] loosely follow the work of Ni et al. and propose specific algorithms for fault detection. They focus only on a subset of the fault types examined in [3], which are summarized in Table 1.

{| class="wikitable"
|+ Table 1: Taxonomy of faults described by Ni et al. [3], data-centric point of view.
! Fault !! Definition
|-
| Outlier || Isolated data point or sensor unexpectedly distant from models.
|-
| Spike || Multiple data points with a much greater than expected rate of change.
|-
| "Stuck-at" || Sensor values experience zero variation for an unexpected length of time.
|-
| High noise or variance || Sensor values experience unexpectedly high variation or noise.
|}

Four different classes of approaches for detecting the faults listed above are discussed.

'''Rule-based methods''' use domain knowledge to develop heuristic constraints that the sensor readings must satisfy, and violations of those constraints imply faults. For the fault types above, the following simple rules are typically used [4]. The variance (or standard deviation) of the sample readings within a window of size w<sub>size</sub> is computed. If it is above a certain threshold, the samples are corrupted by the noise fault. If the variance is zero, the samples are corrupted by the constant fault. To detect short faults, the data have to be appropriately preprocessed. If the rate of change is above a threshold, it can be assumed that the data were affected by short faults. The performance of this method strongly depends on the parameter w<sub>size</sub> and the threshold. Setting these parameters is not trivial and usually requires domain knowledge of the examined problem.

'''Estimation-based methods''' can be used when a physical phenomenon is sensed concurrently by multiple sensors. In that case, the dependence between sensor measurements can be exploited to generate estimates for the individual sensor measurements. The dependence can be expressed by spatial correlation, and regardless of the cause of the correlation, it can be used to model the normal behavior. The estimation can be done, for example, by linear least-squares estimation. This method is most suitable when the phenomenon is sensed by almost identical sensors, for example multiple barometric altimeters on a single aircraft, where there is a strong presumption that the values are strongly correlated.

'''Time-series-based methods''' exploit the fact that the measurements of a sensor are not random and therefore contain some kind of regular patterns. These patterns can be described through autocorrelations in the measurements collected by a single sensor. They can then be used to create a regressive model of the sensed phenomena. A sensor measurement can be compared against its predicted value to determine whether it is faulty. The advantage of this approach is that it is more general than classification and can be used even if neither labeled data nor multiple strongly correlated sensors are available.

'''Learning-based methods''' use training data to infer a model of "normal" sensor behavior. If the "normal" sensor behavior and the effects of sensor faults are well understood, learning-based methods may be suitable to detect and classify sensor faults. In [4], the authors successfully use Hidden Markov Models to construct a model of sensor measurements. The main advantage of learning-based methods is that they can detect and classify faults simultaneously.

===3 Preliminaries===

====3.1 Classification====

In machine learning terminology, classification is considered an instance of supervised learning, i.e. a machine learning technique where a training set of correctly identified observations is available [5]. The main goal of classification is to assign a new observation X to one of a finite set of categories. This assignment uses a training data set containing instances whose category membership is known.

Every instance of the input data set is a vector <math>X = (x_1, x_2, \ldots, x_d)</math>, typically called a feature vector. Here d is the number of features, and <math>x_i</math> (<math>1 \le i \le d</math>) is the value of the i-th feature. Every instance belongs to one of the k classes <math>C = \{c_1, c_2, \ldots, c_k\}</math>.

The classification process consists of two phases. In the first phase, called the learning phase, the labeled training data set is used to build a model; that is, the knowledge from the reference data is extracted and stored in the form of a model. In the second phase, often called recall, the model is used to classify unlabeled data. An algorithm that implements classification is called a classifier.

'''Naive Bayes''' In machine learning, naive Bayes classifiers are a family of probabilistic classifiers based on Bayes' theorem. They assume that the value of a particular feature is independent of the value of any other feature, given the class variable [6]. This assumption is often violated in practice, yet the naive Bayes classifier is still a powerful classification technique.

Learning a naive Bayes model consists of calculating probabilities from the training data set. The probability to be estimated is the conditional probability <math>P(c_j \mid x_1, \ldots, x_d)</math> for each class <math>c_j</math> when the object <math>X = (x_1, x_2, \ldots, x_d)</math> is given [7]. Using Bayes' rule

:<math>P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} \qquad (1)</math>

the posterior probability can be expressed as

:<math>P(c_j \mid x_1, \ldots, x_d) = \frac{P(c_j)\,P(x_1, \ldots, x_d \mid c_j)}{P(x_1, \ldots, x_d)} \qquad (2)</math>

where
* <math>P(c_j \mid x_1, \ldots, x_d)</math> is the posterior probability of class <math>c_j</math> given the object <math>X = (x_1, \ldots, x_d)</math>,
* <math>P(c_j)</math> is the prior probability of class <math>c_j</math>,
* <math>P(x_1, \ldots, x_d \mid c_j)</math> is the conditional probability of the object X given class <math>c_j</math>, which we call the likelihood,
* <math>P(x_1, \ldots, x_d)</math> is the prior probability of the object X.

The resulting model is represented by the prior probability of each class and the likelihood for each combination of class and feature. The likelihoods are usually represented by the mean and variance of a normal distribution estimated from the training set.

Recall in the naive Bayes algorithm is done by looking up the prior and likelihood probabilities that belong to the input data and calculating the posterior probability of each class. The features are assumed to be conditionally independent given the class, so the likelihood can be calculated as

:<math>P(x_1, \ldots, x_d \mid c_j) = \prod_{i=1}^{d} P(x_i \mid c_j) \qquad (3)</math>

The resulting class is the one with the highest posterior probability.

====3.2 Time Series Modeling====

A time series is a series of observations of a process or an event at equal intervals. It is called a time series because the observations are usually taken with respect to time. This is not a necessity, however, because the observations may be taken with respect to space as well [8].

Modeling techniques try to find a model that describes the series, i.e. a model capable of generating an identical series. The model may help to better understand the underlying phenomena or serve as a forecasting tool to predict future values of the series. Stochastic models such as ARIMA assume that the time series consists of a regular pattern manifesting the underlying phenomena plus random noise.

'''The ARIMA model''' ARIMA (autoregressive integrated moving average) is a general time series model. It combines two independent models, autoregressive (AR) and moving average (MA), in a single equation (Equation 4). By convention, the AR terms are added and the MA terms are subtracted.

:<math>x_t = C + \phi_1 x_{t-1} + \cdots + \phi_p x_{t-p} - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \qquad (4)</math>

where
* <math>x_i</math> is the i-th element of the series,
* C is a constant,
* <math>\phi_1, \ldots, \phi_p</math> are the parameters of the autoregressive model,
* <math>\varepsilon_i</math> is the random error component of the i-th element of the series,
* <math>\theta_1, \ldots, \theta_q</math> are the parameters of the moving average model.

ARIMA models are extensively examined in the literature. For more information, the reader is referred to [9] or [10].

====3.3 Artificial Neural Networks====

An artificial neural network (ANN) is an information processing paradigm inspired by biological nervous systems. It is composed of a large number of highly interconnected processing units (neurons) working in unison to solve specific problems.

A neuron is a simplistic model of a biological nerve cell. Each neuron has one or more inputs and produces a single output. The inputs simulate the stimuli that the neuron receives from other neurons, while the output simulates the response signal that the neuron generates. A biological neuron fires (i.e. generates the response signal) only if the gathered stimuli exceed a certain threshold. In other words, it fires only if stimuli − threshold > 0. In the context of ANNs, the term bias b is used instead of "threshold", and by convention b = −threshold.

The artificial equivalent of the gathered stimuli is called the inner potential (ξ). It is typically defined as a weighted sum of the input signals plus the bias. Each input <math>x_j</math> is multiplied by a specific real number <math>w_j</math> called the weight, and these weights are the parameters of each neuron. The calculation of the inner potential is summarized in Equation 5.

:<math>\xi = \sum_j w_j x_j + b = W \cdot X + b \qquad (5)</math>

The actual output is obtained by applying an activation function φ(·) to the inner potential. A variety of activation functions can be used. The sigmoid function is very popular for its properties, and with it the output y of the neuron is given by Equation 6.

:<math>y = \varphi(\xi) = \frac{1}{1 + e^{-\xi}} \qquad (6)</math>

For more complex tasks such as anomaly detection, a single neuron is not powerful enough, so more complex structures are introduced. A neural network is a group of interconnected neurons, and neurons can be connected to form an ANN in various ways. Networks in which the neurons are arranged in separate layers, and the output of one layer is used as the input to the next layer, are called feed-forward networks. There are no loops in such a network, and information is always fed forward, never fed back.

ANNs, like their biological counterparts, learn by example. To train a neural network, a set of input examples with known expected responses is therefore necessary. The classical method of training ANNs is called "backpropagation", an abbreviation of "backward propagation of errors". The typical goal of training is to find weights <math>W = (w_1, \ldots, w_k)</math> and biases <math>B = (b_1, \ldots, b_l)</math> that minimize the error or cost function C(W, B) over all instances in the training set. More specific information about different ANN types can be found in the literature [11, 12].

'''Autoencoder''' An autoencoder is a specific type of feed-forward neural network with an input layer, an output layer, and one or more hidden layers. It has two main properties. First, the output layer has the same number of neurons as the input layer. Second, instead of being trained to predict some target value Y given inputs X, it is trained to reconstruct its own inputs, producing X′.

Autoencoders whose hidden layers have fewer nodes than the input/output layer are especially interesting. Such a network is forced to learn a nonlinear, reduced representation of the original data. Such autoencoders have a variety of uses: they can serve for nonlinear dimensionality reduction, data compression, or learning a generative model of the data [13].

===4 Approach===

Influenced by the related work reviewed in Section 2, three different approaches to condition monitoring of sensor parts are introduced. Each approach is based on a different principle. The first is based on feature extraction and status classification, the second on time series modeling, and the third on anomaly detection using autoencoders. The approaches are illustrated on a data set with measurements from 2000 accelerometers (hereafter referred to as sensors). For each sensor, the data set contains one time series with a minimum of 14 days of measurements before the sensor failed. The aim is to label the sensor as faulty within two days before the failure. More than two days before the failure, the sensor can be considered faultless. All three approaches are described in detail in the following subsections.

====4.1 Classification-based approach====

The first suggested approach is based on supervised learning, namely classification. Supervised learning techniques require labeled examples to learn from, so this approach requires information about failures to prepare the labels. If no information about failures is available and labels cannot be supplied, this approach cannot be applied.

The sensor readings are in the form of a time series. A sliding window of N measurements is used to calculate the feature vector for classification. The raw measurements themselves can be used directly as a feature vector. However, the dimensionality is then equal to the size of the sliding window multiplied by the number of measured phenomena. Typically, simple features (such as variance, average, median, or slope) or more complex features (e.g. Fourier or wavelet coefficients) are extracted from the sliding window [14, 15, 16]. For on-line condition monitoring, the feature vector is extracted from a window aligned with the most recent readings. The instance is then classified by a pre-trained classifier. If the instance is classified as "failed", the current condition of the sensor is evaluated as faulty.

To train a model, the labels have to be prepared. The training data set is built from historical readings and a set of times related to the failures (or, generally, to the events to be detected). To obtain a faulty instance, the sliding window is placed over the readings of a failed sensor and aligned with the time of failure. A feature vector is extracted from this window and labeled "failed" (i.e. class y = 1), so each failure yields one "failed" instance. Non-faulty instances can be extracted by sliding the window over the time series of a non-failed sensor, i.e. a sensor for which no record of failure exists. However, using every possible shift yields an unnecessarily large number of instances. Non-faulty instances are therefore extracted by placing the window randomly over the readings, and the extracted feature vectors are labeled "ok" (i.e. class y = 0). In this way, the ratio between the classes can be easily controlled. The whole process is shown in Figure 1.

Figure 1: Working scheme of creating the training set.

The number of features is reduced with an iterative forward feature selection method. Initially, a model is trained with only one feature. In each iteration, one feature is added and the model is retrained. If the new model performs significantly better than the previous one, the feature is kept in the feature vector. Otherwise, it is discarded.

The classification model is trained on the extracted feature vectors to recognize faulty and non-faulty instances. An arbitrary classifier can be used. The aim of this article is to prove the concept that classification can be used for condition monitoring and thus for a maintenance strategy for sensor parts. For simplicity and interpretability, the naive Bayes classifier is therefore applied.

In naive Bayes, an instance is typically assigned to the class with the higher posterior probability. To increase confidence in positive classifications, a minimal threshold on the posterior probability of the "failed" class can be set. This threshold controls the trade-off between the sensitivity and specificity of the naive Bayes classifier. With a higher threshold, the classifier is more certain about its positive predictions, but it may mark more failures as non-faulty, and vice versa.

====4.2 Time-series Modeling-based approach====

The second approach basically follows the method suggested in [4]. It assumes that a malfunction of a sensor is preceded by abnormal behavior. The working principle follows the common framework for anomaly detection, using time series modeling to model "normal" sensor behavior.

A regressive model is trained on the historical measurements of a specific sensor and used to generate predictions. The ARIMA model [9, 10] is a general regressive model popular in time series modeling. It is especially suitable when the time series contains significant regular patterns, which is more or less the case for sensor readings [4]. For that reason, the general ARIMA model is used to obtain the predictions. Predicted values are compared with the actual readings. If the difference is higher than a certain threshold, the measurements are marked as faulty. The working scheme is depicted in Figure 2.

Figure 2: Working scheme of the regression-model-based approach.

The ARIMA model contains random terms, and it is therefore a stochastic process. To reduce the random component and obtain the most precise predictions, the Monte Carlo principle is typically used to generate multiple predictions. The final prediction is obtained as the mean of k predicted values. Knowing how the prediction is obtained allows us to form a hypothesis about the expected value and to construct a confidence interval for the predicted value, as shown in Figure 3.

Figure 3: ARIMA model predictions with the confidence interval.

If the actual reading of a sensor lies outside the confidence interval of the corresponding predicted value, the sensor is marked as "faulty".

====4.3 Autoencoder-based approach====

The last suggested approach is, like the previous one, based on the assumption that the failure of a sensor is preceded by anomalous behavior. In this case, autoencoders are used to detect anomalies.

The inputs to the autoencoder network are the raw values from a sliding window drawn over historical measurements of the sensor. It is also possible to extract different features and use them as inputs to the autoencoder. As a result, this method requires a certain amount of historical data to train the autoencoder network. The whole working scheme is shown in Figure 4.

Figure 4: Working scheme of the anomaly-detection-based approach.

The structure of an autoencoder is defined by the following parameters: the size of the input and output layers, the number of hidden layers, and the number of nodes in the hidden layers. Since the raw sensor data are used as inputs to the autoencoder network, the number of nodes in the input and output layers is determined by the size of the sliding window. Influenced by [13], the autoencoder has three hidden layers whose sizes are related to the size of the input/output layer. If n is the number of input (and output) neurons, the hidden layers have 0.75n, 0.5n, and 0.75n neurons.

The output of the autoencoder itself is not especially interesting. Instead, the reconstruction error is calculated, defined as the mean squared error between the real measurements and the output of the autoencoder. Let <math>X = (x_1, \ldots, x_n)</math> be the input vector of the autoencoder network and <math>X' = (x_1', \ldots, x_n')</math> the corresponding output. The reconstruction error is

:<math>RE(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - x_i')^2</math>

If the reconstruction error is higher than a certain threshold τ, the current condition of the sensor is marked as "faulty". The threshold is estimated with a heuristic method. The main idea is to treat the reconstruction error as a random variable whose underlying distribution can be easily estimated. Given this distribution, τ is chosen so that

:<math>P(RE(X) < \tau) = 1 - \alpha</math>

where α is the significance level. An error that exceeds τ, and therefore falls in the upper tail of probability α, is marked "faulty". Figure 5 shows the histogram used to estimate the underlying distribution of the reconstruction error.

Figure 5: Histogram of reconstruction errors.

===5 Experimental Results===

====5.1 Classification-based Approach====

With labeled data, the performance of a classifier can be easily measured. The TPR (true positive rate) is defined as the ratio of the number of detected failures to the number of all failures in a given data set. The FPR (false positive rate) is defined as the ratio of the number of negative samples identified as positive to the number of all negative samples in the data set. The results are presented by the ROC curve in Figure 6. In a ROC curve, the TPR (sensitivity) is plotted as a function of the FPR (1 − specificity) for different settings of the model's parameters.

Figure 6: Classification-based approach, ROC curve.

====5.2 Time-series Modeling-based Approach====

To evaluate this approach on predicting failures, the method is evaluated as a binary classifier. A window of size M is placed before the time of a failure. If an anomaly falls within the window, the failure is considered detected. An anomaly detected outside this window is considered a false positive detection. As presented in Section 4.2, this method marks as anomalies all moments where the actual reading is not within the confidence interval.

The significance level α can be set explicitly and its effect examined. Figure 7 shows the detected anomalies for α = 0.0015, and the red segments mark the times of failures. The experiment is repeated multiple times for different values of α, and the results are presented by the ROC curve in Figure 8. Each point on the ROC curve represents the TPR/FPR pair corresponding to a particular value of α. The curve demonstrates how sensitivity versus specificity can be controlled by the choice of α.

Figure 7: Time-series modeling based approach, detected failures.

Figure 8: TS-modeling-detection-based approach, ROC curve.

====5.3 Autoencoder-based Approach====

To evaluate the autoencoder-based method, the same procedure as for the time-series modeling based approach is used. Figure 9 shows the resulting series of reconstruction errors. The red-marked points are the moments of failure (i.e. the events one intends to predict). The time of failure is visibly preceded by a significant rise in the reconstruction error. However, there are also other anomalous moments (peaks in the series of reconstruction errors) that are not related to an incoming failure of the sensor. With domain knowledge of the sensor operation, these can be easily explained, since they are related to the observed phenomena.

Figure 9: Anomaly-detection-based approach, detected failures.

The ROC curve in Figure 10 presents the results of this approach. The sensitivity/specificity trade-off is controlled by the significance level described in Section 4.3.

Figure 10: Anomaly-detection-based approach, ROC curve.

===6 Conclusion===

Three different approaches to condition monitoring and predictive maintenance of sensors have been described and illustrated on real-world data. All of them were chosen to be general and thus applicable to various sensor devices. The three methods exploit different principles and hence have different assumptions and requirements.

The classification-based approach uses labels if they are available. If they are not, the approach is not applicable. The other two approaches are more general, since they do not require any meta-data and work only with the sensor measurements. However, both assume that a failure is preceded by anomalous behavior. The time series modeling approach exploits the fact that sensor measurements are in the form of a time series and often contain regular patterns, which manifest themselves as autocorrelations. The measurements can therefore be described by a model. The autoencoder-based approach, in contrast to time series modeling, does not model the "normal" behavior.

All methods were able to detect failures before they occurred. They thus proved applicable for condition monitoring and for predictive maintenance of sensor parts. Furthermore, all approaches can be parametrized to find a suitable trade-off between the sensitivity and specificity of the prediction. The best results were achieved by the classification-based approach. This is to be expected, since, unlike the other two approaches, it uses additional meta-data (labels) about sensor failures.

===References===

[1] Inc., M. A. Common maintenance strategies. 2015, [Online; accessed 17-January-2016]. Available from: https://www.maintenanceassistant.com/

[2] Kennedy, S. New tools for PdM. 2006, [Online; accessed 17-February-2016]. Available from: http://www.plantservices.com/articles/2006/072/

[3] Ni, K.; Ramanathan, N.; Nabil, M.; et al. Sensor Network Data Fault Types. ACM Transactions on Sensor Networks, volume 5, no. 3, May 2009.

[4] Sharma, A. B.; Golubchik, L.; Govindan, R. Sensor Faults: Detection Methods and Prevalence in Real-World Datasets. ACM Transactions on Sensor Networks, volume 6, no. 3, June 2010.

[5] Alpaydin, E. Introduction to Machine Learning. MIT Press, second edition, 2010, ISBN 978-0-262-01243-0.

[6] Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall, second edition, 2003, ISBN 978-0137903955.

[7] StatSoft. Naive Bayes Classifier. 2016, [Online; accessed 19-February-2016]. Available from: http://www.statsoft.com/textbook/naive-bayes-classifier

[8] Vu, K. M. Optimal Discrete Control Theory: The Rational Function Structure Model.
Ottawa: AuLac Technologies, 2007, ISBN 978-0-9783996-0-3, pp. 51–99.

[9] Nau, R. Lecture notes on forecasting. 2014, [Online; accessed 19-February-2016]. Available from: http://people.duke.edu/~rnau/Slides_on_ARIMA_models--Robert_Nau.pdf

[10] Lu, Y.; Simaan, M. A. Automated Box–Jenkins forecasting modelling. Elsevier Automation in Construction 18, November 2008: pp. 547–558.

[11] Nielsen, M. Neural Networks and Deep Learning. Determination Press, 2015. Available from: http://neuralnetworksanddeeplearning.com/index.htm

[12] Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning, 2016, book in preparation for MIT Press. Available from: http://www.deeplearningbook.org

[13] Candel, A.; Lanford, J.; LeDell, E.; et al. Deep Learning with H2O. 2015, third edition. Available from: https://h2o.gitbooks.io/deep-learning/

[14] Fu, T.-c. A review on time series data mining. Engineering Applications of Artificial Intelligence, volume 24, no. 1, 2011: pp. 164–181.

[15] Chen, Y.; Nascimento, M. A.; Ooi, B. C.; et al. Spade: On shape-based pattern detection in streaming time series. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, IEEE, 2007, pp. 786–795.

[16] Xing, Z.; Pei, J.; Philip, S. Y.; et al. Extracting Interpretable Features for Early Classification on Time Series. In SDM, volume 11, SIAM, 2011, pp. 247–258.
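The rule-based checks summarized in Section 2 (after Sharma et al. [4]) can be sketched in Python as follows. This is a minimal illustration, not the implementation from [4]. The function name `rule_based_faults`, the use of non-overlapping windows, the order in which the rules are checked, and the threshold values in the example are all assumptions.

```python
from statistics import pvariance


def rule_based_faults(readings, w_size, noise_threshold, rate_threshold):
    """Label non-overlapping windows with simple variance / rate-of-change rules.

    Hypothetical sketch: window layout and rule order are assumptions.
    Returns a list of (start_index, label) pairs, with label in
    {"constant", "noise", "short", "ok"}.
    """
    if w_size < 2:
        raise ValueError("w_size must be at least 2 to compute a rate of change")
    labels = []
    for start in range(0, len(readings) - w_size + 1, w_size):
        window = readings[start:start + w_size]
        variance = pvariance(window)
        # Largest absolute change between consecutive samples in the window.
        max_rate = max(abs(b - a) for a, b in zip(window, window[1:]))
        if variance == 0:
            label = "constant"  # zero variance: constant ("stuck-at") fault
        elif variance > noise_threshold:
            label = "noise"     # variance above threshold: noise fault
        elif max_rate > rate_threshold:
            label = "short"     # rate of change above threshold: short fault
        else:
            label = "ok"
        labels.append((start, label))
    return labels


readings = [2.0, 2.0, 2.0, 2.0,     # flat
            1.0, 1.2, 0.9, 1.1,     # normal
            0.0, 3.0, -3.0, 3.0,    # noisy
            1.0, 1.0, 2.5, 1.0]     # single jump
print(rule_based_faults(readings, w_size=4, noise_threshold=2.0, rate_threshold=1.0))
# [(0, 'constant'), (4, 'ok'), (8, 'noise'), (12, 'short')]
```

As the text notes, the outcome hinges on `w_size` and the two thresholds. A spike inside a short window also inflates the variance, so the example keeps the jump small enough to stay below `noise_threshold`.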
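Equations (2) and (3), together with the posterior threshold on the "failed" class from Section 4.1, can be illustrated with a small Gaussian naive Bayes sketch. It assumes normally distributed likelihoods, as described in Section 3.1. The function names, the variance floor, and the use of log-probabilities for numerical stability are illustrative choices, not taken from the paper.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-9):
    """Learning phase: class priors P(c_j) and per-feature Gaussian likelihoods."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        d, m = len(rows[0]), len(rows)
        means = [sum(r[i] for r in rows) / m for i in range(d)]
        # The variance floor (an assumption) avoids division by zero.
        variances = [max(sum((r[i] - means[i]) ** 2 for r in rows) / m, var_floor)
                     for i in range(d)]
        model[label] = (m / len(X), means, variances)
    return model


def posteriors(model, x):
    """Recall phase: P(c_j | x_1..x_d) from Equations (2) and (3)."""
    log_joint = {}
    for label, (prior, means, variances) in model.items():
        # log P(c_j) + sum_i log P(x_i | c_j), using the independence assumption.
        lp = math.log(prior)
        for xi, mu, var in zip(x, means, variances):
            lp -= 0.5 * math.log(2 * math.pi * var) + (xi - mu) ** 2 / (2 * var)
        log_joint[label] = lp
    # Divide by P(x_1..x_d) = sum over classes, computed stably via log-sum-exp.
    top = max(log_joint.values())
    total = sum(math.exp(v - top) for v in log_joint.values())
    return {label: math.exp(v - top) / total for label, v in log_joint.items()}


def is_failed(model, x, threshold=0.5, failed_label=1):
    """Positive ("failed") only if the posterior of that class reaches the threshold."""
    return posteriors(model, x).get(failed_label, 0.0) >= threshold


X = [[0.0, 1.0], [0.2, 0.8], [0.1, 1.1], [5.0, 4.0], [5.2, 4.1], [4.9, 3.8]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(is_failed(model, [5.0, 4.0], threshold=0.9), is_failed(model, [0.1, 1.0]))
# True False
```

Raising `threshold` toward 1 makes positive predictions more certain at the cost of missing more failures, which is the sensitivity/specificity knob described in Section 4.1.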
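The forecasting form of Equation (4) translates directly into code. The sketch below (hypothetical function `arma_forecast`) covers only the AR and MA terms. In a full ARIMA model, the "integrated" part means the equation is applied to a differenced series. The parameters would come from model fitting, which is not shown, and the values in the example are made up.

```python
def arma_forecast(history, errors, c, phi, theta):
    """Value given by Equation (4):
    x_t = C + phi_1*x_{t-1} + ... + phi_p*x_{t-p} - theta_1*eps_{t-1} - ... - theta_q*eps_{t-q}

    history and errors are ordered oldest -> newest; only the last p values
    and the last q errors are used.
    """
    p, q = len(phi), len(theta)
    if len(history) < p or len(errors) < q:
        raise ValueError("not enough past values or errors")
    ar_part = sum(phi[i] * history[-1 - i] for i in range(p))    # added AR terms
    ma_part = sum(theta[j] * errors[-1 - j] for j in range(q))   # subtracted MA terms
    return c + ar_part - ma_part


# ARMA(2, 1) example with made-up parameters:
# 0.1 + 0.6*2.0 + 0.2*1.0 - 0.4*0.5 = approximately 1.3
print(arma_forecast([1.0, 2.0], [0.5], c=0.1, phi=[0.6, 0.2], theta=[0.4]))
```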
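Equations (5) and (6) describe a single neuron. The following minimal Python sketch computes the inner potential and the sigmoid output. The numerically stable evaluation of the sigmoid is an implementation detail added here, not something the paper specifies, and `neuron_output` is a hypothetical name.

```python
import math


def neuron_output(weights, inputs, bias):
    """Single neuron: inner potential (Equation 5) passed through a sigmoid (Equation 6)."""
    xi = sum(w * x for w, x in zip(weights, inputs)) + bias  # xi = W . X + b
    # Evaluate 1 / (1 + e^(-xi)) without overflowing math.exp for large |xi|.
    if xi >= 0:
        return 1.0 / (1.0 + math.exp(-xi))
    z = math.exp(xi)
    return z / (1.0 + z)


# bias = -threshold: with threshold 1.0 the output exceeds 0.5 only when
# the weighted stimuli exceed 1.0.
print(neuron_output([0.5, 0.5], [1.0, 1.0], bias=-1.0))  # xi = 0 -> 0.5
print(neuron_output([0.5, 0.5], [2.0, 2.0], bias=-1.0))  # xi = 1 -> about 0.731
```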
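The feature extraction and training-set construction of Section 4.1 (Figure 1) might look as follows. This is a sketch under several assumptions:

* the four features are the simple ones the text names (average, variance, median, slope);
* the faulty window is taken to end exactly at the failure index;
* `k_ok`, the number of random windows drawn per non-failed sensor, is a hypothetical parameter that controls the class ratio.

```python
import random
from statistics import mean, median, pvariance


def window_features(window):
    """Average, variance, median and least-squares slope of one window (len >= 2)."""
    n = len(window)
    avg = mean(window)
    t_avg = (n - 1) / 2
    slope = (sum((t - t_avg) * (v - avg) for t, v in enumerate(window))
             / sum((t - t_avg) ** 2 for t in range(n)))
    return [avg, pvariance(window), median(window), slope]


def build_training_set(failed_sensors, ok_sensors, n, k_ok, seed=0):
    """failed_sensors: list of (readings, failure_index) pairs.
    ok_sensors: list of readings from sensors with no failure record.
    n: window length N; k_ok: hypothetical number of random windows per ok sensor.
    Returns feature vectors X and labels y (1 = "failed", 0 = "ok")."""
    rng = random.Random(seed)
    X, y = [], []
    for readings, failure_index in failed_sensors:
        # One faulty instance per failure: the window ends at the failure time.
        X.append(window_features(readings[failure_index - n:failure_index]))
        y.append(1)
    for readings in ok_sensors:
        # Random placement instead of every shift keeps the class ratio controllable.
        for _ in range(k_ok):
            start = rng.randrange(len(readings) - n + 1)
            X.append(window_features(readings[start:start + n]))
            y.append(0)
    return X, y


failed = [([0.0] * 10 + [1.0, 2.0, 3.0, 4.0], 14)]
ok = [[5.0] * 20]
X, y = build_training_set(failed, ok, n=4, k_ok=3)
print(X[0], y)  # [2.5, 1.25, 2.5, 1.0] [1, 0, 0, 0]
```

For on-line monitoring, the same `window_features` would be applied to the window ending at the most recent reading before calling the trained classifier.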
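The iterative forward feature selection from Section 4.1 is a single greedy pass over the candidate features. In this sketch, two inputs are assumptions supplied by the caller. The `score` callable could be, for example, a validation metric of the retrained naive Bayes model. The `min_gain` threshold stands in for "significantly better". The function name is hypothetical.

```python
def forward_selection(candidates, score, min_gain):
    """Greedy forward feature selection (one pass over the candidates).

    score(features) -> quality of a model retrained on those features (higher is better).
    min_gain: the improvement regarded as "significantly better" (an assumption).
    """
    selected = [candidates[0]]               # start with a model on a single feature
    best = score(selected)
    for feature in candidates[1:]:
        trial = score(selected + [feature])  # retrain with one more feature
        if trial - best > min_gain:
            selected.append(feature)         # significant improvement: keep it
            best = trial
        # otherwise the feature is discarded
    return selected


# Toy additive score standing in for a real model-quality measure.
usefulness = {0: 0.5, 1: 0.3, 2: 0.001, 3: 0.2}
print(forward_selection([0, 1, 2, 3], lambda fs: sum(usefulness[f] for f in fs), 0.01))
# [0, 1, 3]
```

The result depends on the order of `candidates`, since each feature is judged only against the features kept before it.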
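Section 4.2 averages k stochastic predictions and flags a reading that falls outside the resulting confidence interval. The sketch below makes two illustrative choices, neither taken from the paper. First, a hypothetical AR(1) simulator with Gaussian noise stands in for the fitted ARIMA model. Second, the interval bounds are empirical quantiles of the k draws.

```python
import math
import random
from statistics import mean


def monte_carlo_interval(simulate_next, k, alpha, rng):
    """Mean of k stochastic predictions and an empirical (1 - alpha) interval."""
    draws = sorted(simulate_next(rng) for _ in range(k))
    lower = draws[math.floor((alpha / 2) * (k - 1))]
    upper = draws[math.ceil((1 - alpha / 2) * (k - 1))]
    return mean(draws), lower, upper


def ar1_simulator(last_value, c, phi, sigma):
    """Hypothetical stand-in for a fitted model: x_t = c + phi * x_{t-1} + noise."""
    return lambda rng: c + phi * last_value + rng.gauss(0.0, sigma)


def reading_is_faulty(actual, lower, upper):
    # A reading outside the confidence interval marks the sensor as "faulty".
    return not (lower <= actual <= upper)


rng = random.Random(42)
prediction, lower, upper = monte_carlo_interval(
    ar1_simulator(last_value=10.0, c=0.0, phi=0.9, sigma=0.5), k=2000, alpha=0.01, rng=rng)
print(round(prediction, 2), reading_is_faulty(9.2, lower, upper),
      reading_is_faulty(20.0, lower, upper))
```

A smaller `alpha` widens the interval and produces fewer anomalies. Sweeping it gives the ROC curve described in Section 5.2.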
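For Section 4.3, the hidden-layer sizes, the reconstruction error RE(X), and the threshold τ with P(RE(X) < τ) = 1 − α can be sketched without the network itself. The trained autoencoder is assumed to exist elsewhere and to supply reconstructions. Two further choices are assumptions rather than the paper's method: τ is estimated as an empirical quantile of reconstruction errors on normal data, and layer sizes are rounded to integers. All function names are hypothetical.

```python
import math


def hidden_layer_sizes(n):
    """0.75n, 0.5n, 0.75n neurons for the three hidden layers (rounded, at least 1)."""
    return [max(1, round(f * n)) for f in (0.75, 0.5, 0.75)]


def reconstruction_error(x, x_rec):
    """RE(X): mean squared error between the input window and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def estimate_threshold(errors, alpha):
    """Smallest observed error tau with at least (1 - alpha) of the errors <= tau,
    an empirical version of P(RE(X) < tau) = 1 - alpha."""
    ordered = sorted(errors)
    index = min(len(ordered) - 1, max(0, math.ceil((1 - alpha) * len(ordered)) - 1))
    return ordered[index]


def is_faulty(x, x_rec, tau):
    # Reconstruction error above tau marks the current condition as "faulty".
    return reconstruction_error(x, x_rec) > tau


print(hidden_layer_sizes(8))                           # [6, 4, 6]
print(reconstruction_error([1.0, 2.0], [1.0, 4.0]))    # 2.0
tau = estimate_threshold([0.1, 0.4, 0.2, 0.3], alpha=0.25)
print(tau, is_faulty([1.0, 2.0], [1.0, 4.0], tau))     # 0.3 True
```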
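The window-based evaluation protocol of Section 5.2 (also used in Section 5.3) can be expressed as a small function. The text does not specify how the negative samples in the FPR denominator are counted, so `n_negative` is left to the caller. The function name is hypothetical.

```python
def evaluate_detections(anomaly_times, failure_times, m, n_negative):
    """TPR and FPR under a window protocol like the one in Section 5.2.

    A failure at time f is detected if some anomaly lies in [f - m, f];
    each anomaly outside all such windows counts as a false positive.
    n_negative: FPR denominator (how negatives are counted is an assumption).
    """
    windows = [(f - m, f) for f in failure_times]
    detected = sum(1 for lo, hi in windows
                   if any(lo <= a <= hi for a in anomaly_times))
    false_positives = sum(1 for a in anomaly_times
                          if not any(lo <= a <= hi for lo, hi in windows))
    return detected / len(failure_times), false_positives / n_negative


# Anomalies at t = 5, 48, 100; failures at t = 50 and t = 200; window size M = 10.
print(evaluate_detections([5, 48, 100], [50, 200], m=10, n_negative=100))  # (0.5, 0.02)
```

Running this once per value of α, on the anomalies produced at that α, yields one TPR/FPR point of the ROC curve.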