=Paper= {{Paper |id=Vol-1649/123 |storemode=property |title=Early Failure Detection for Predictive Maintenance of Sensor Parts |pdfUrl=https://ceur-ws.org/Vol-1649/123.pdf |volume=Vol-1649 |authors=Tomáš Kuzin, Tomáš Borovička |dblpUrl=https://dblp.org/rec/conf/itat/KuzinB16 }} ==Early Failure Detection for Predictive Maintenance of Sensor Parts== https://ceur-ws.org/Vol-1649/123.pdf

ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 123–130
http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 T. Kuzin, T. Borovička

Early Failure Detection for Predictive Maintenance of Sensor Parts

Tomáš Kuzin, Tomáš Borovička

Faculty of Information Technology,
Czech Technical University in Prague,
Prague, Czech Republic
kuzintom@fit.cvut.cz,
tomas.borovicka@fit.cvut.cz

Abstract: Maintenance of a sensor part typically means renewing the sensor at regular intervals or replacing a malfunctioning sensor. However, optimal timing of the replacement can reduce maintenance costs. The aim of this article is to suggest a predictive maintenance strategy for sensors using condition monitoring and early failure detection based on their own collected measurements.

Three different approaches that deal with early failure detection of sensor parts are introduced: 1) an approach based on feature extraction and status classification, 2) an approach based on time series modeling and 3) an approach based on anomaly detection using autoencoders. All methods were illustrated on real-world data and proved to be applicable for condition monitoring.

1 Introduction

In the last decade the number of sensors used across all sectors has risen significantly. This is an important and still continuing trend.

In the classical concept, predictive maintenance takes place when the maintained asset is expensive or important for key business processes, in other words when proper utilization of the machinery has important economic or safety consequences. This is not the characteristic case of sensor parts, which are usually cheap and play a minor role. For such assets maintenance typically means simple replacement, and a reactive maintenance strategy would be the most common choice. However, machines are becoming more and more dependent on sensor parts, and that brings new challenges in their maintenance. Proper timing of replacement has a direct influence on maintenance expenses, especially in cases where other processes depend on the sensor readings and a sensor failure or malfunction may stop the operation or cause collateral losses to the machinery.

In the case of sensors, the "classical" condition monitoring scheme utilizing a properly chosen set of external sensors makes no sense. On the other hand, sensors themselves provide on-line measurements during their whole operational service. These data may be exploited to estimate the current state of the measuring device. Therefore, applying a smarter maintenance strategy to sensor parts makes perfect sense and may introduce significant savings.

This article deals with the possibilities of smarter maintenance strategies for sensor parts. The main idea is to apply machine learning techniques in order to monitor the current condition or predict failure of sensors based on their own measurements, and to propose an optimal time for their replacement in order to avoid failures.

2 Related Work

Several articles and works on "classical" predictive maintenance and condition monitoring [1, 2] have been published. A predictive maintenance strategy is usually a rule-based maintenance grounded on on-line condition monitoring, which relies on an appropriately chosen set of external sensors. The proper sensor set plays the key role [2]. Unfortunately, none of these techniques is useful when the state of the sensors themselves needs to be monitored.

Moreover, many published works base their approaches on sensor networks, where a malfunction of one sensor can be identified using measurements of other sensors in the network. This paper, however, focuses on "standalone" sensors, where no other devices sensing the same or correlated phenomena are available. The approaches presented here therefore use only measurements of the sensor itself. Since there are not many publications available for this case, the further review focuses on the categorization of faults and on fault detection techniques for both sensors and sensor networks.

Sensors provide a huge amount of information about observed phenomena. However, to draw meaningful conclusions, the quality of the data has to be ensured. Sensors themselves can malfunction, and that can distort the picture of the phenomena. Most of the methods follow a common framework: characterize the normal behavior of sensor readings, identify significant deviations and mark them as faults.

For sensor networks, the most frequent types of faults have been described and categorized by Ni et al. [3]. They describe two distinct approaches to dealing with faults. The first is a data-centric view, which examines the data collected by a given sensor and describes fault models based on data features. In contrast, the system-centric view examines physical malfunctions of a sensor and how those may manifest themselves in the resulting data. According to Ni et al., these two views are related to one another and every fault can be mapped between them. The important fault categories discussed in [3] are summarized in Table 1. In this article the focus is on the data-centric point of view.

Sharma et al. [4] loosely follow the work of Ni et al. and propose specific algorithms for fault detection.
They focus only on a subset of the fault types examined in [3] and summarized in Table 1.

Table 1: Taxonomy of faults described by Ni et al. [3] (data-centric point of view).

Fault                    Definition
Outlier                  Isolated data point or sensor unexpectedly distant from models.
Spike                    Multiple data points with a much greater than expected rate of change.
"Stuck-at"               Sensor values experience zero variation for an unexpected length of time.
High Noise or Variance   Sensor values experience unexpectedly high variation or noise.

Four different classes of approaches for detecting the above-mentioned faults are discussed.

Rule-based Methods use domain knowledge to develop heuristic constraints that the sensor readings must satisfy. Violations of those constraints imply faults. For the above-mentioned fault types the following simple rules are typically used [4]:

The variance (or the standard deviation) of the sample readings within a window of size wsize is computed. If it is above a certain threshold, the samples are corrupted by the noise fault. If the variance is zero, the samples are corrupted by the constant fault. In order to detect short noise faults, the data have to be appropriately preprocessed. If the rate of change is above a threshold, it can be assumed that the data were affected by short faults.

The performance of this method strongly depends on the parameter wsize and on the threshold. Setting these parameters is not trivial and usually requires domain knowledge of the examined problem.

Estimation-based Methods can be used when a physical phenomenon is sensed concurrently by multiple sensors and the dependence between sensor measurements can be exploited to generate estimates for the individual sensor measurements. The dependence can be expressed by spatial correlation. Regardless of the cause of the correlation, it can be used to model the normal behavior. The estimation can be done, for example, by Linear Least-Squares Estimation. This method is most suitable for cases where the phenomenon is sensed by almost identical sensors. As an example, one can imagine multiple barometric altimeters on a single aircraft. In this case there is a strong presumption that the values are strongly correlated.

Time-series-based Methods utilize the fact that measurements of a sensor are not random and therefore contain some kind of regular patterns. These patterns can be described through autocorrelations in measurements collected by a single sensor and can be used to create a regressive model of the sensed phenomenon. A sensor measurement can then be compared against its predicted value to determine whether it is faulty.

The advantage is that this approach is more general than classification and can be used even if neither labeled data nor multiple strongly correlated sensors are available.

Learning-based Methods use training data to infer a model of "normal" sensor behavior. If the "normal" sensor behavior and the effects of sensor faults are well understood, learning-based methods may be suitable to detect and classify sensor faults. In [4] the authors successfully use Hidden Markov Models to construct a model of sensor measurements. The main advantage of learning-based methods is that they can simultaneously detect and classify faults.

3 Preliminaries

3.1 Classification

In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. a machine learning technique where a training set of correctly identified observations is available [5]. The main goal of classification is to assign a new observation $X$ to one of a finite set of categories, using a training data set containing instances whose category membership is known.

Every instance of the input dataset is a vector $X = (x_1, x_2, \dots, x_d)$, typically called a feature vector, where $d$ is the number of features and $x_i$ is the value of the $i$-th feature ($0 < i \le d$). Every instance belongs to one of the $k$ classes $C = \{c_1, c_2, \dots, c_k\}$.

The classification process consists of two phases. In the first phase, called the learning phase, the training data set with labels is used to build a model; that is, knowledge is extracted from the reference data and stored in the form of a model. In the second phase, the model is used to classify unlabeled data. This phase is often called recall. An algorithm that implements classification is called a classifier.

Naive Bayes. In machine learning, naive Bayes classifiers are a family of probabilistic classifiers based on Bayes' theorem. They assume that the value of a particular feature is independent of the value of any other feature, given the class variable [6]. This assumption is often violated in practice, but the naive Bayes classifier is nevertheless a powerful classification technique.

Learning a naive Bayes model proceeds by calculating probabilities from the training data set. The probability to be estimated is the conditional probability $P(c_j \mid x_1, \dots, x_d)$
for each class $c_j$ when the object $X = (x_1, x_2, \dots, x_d)$ is given [7].

Using the Bayes rule

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} \tag{1}$$

the posterior probability can be expressed by Equation 2,

$$P(c_j \mid x_1, \dots, x_d) = \frac{P(c_j)\,P(x_1, \dots, x_d \mid c_j)}{P(x_1, \dots, x_d)}, \tag{2}$$

where

• $P(c_j \mid x_1, \dots, x_d)$ is the posterior probability of class $c_j$ when the object $X = (x_1, x_2, \dots, x_d)$ is given.

• $P(c_j)$ is the prior probability of class $c_j$.

• $P(x_1, \dots, x_d \mid c_j)$ is the conditional probability of an object $X = (x_1, \dots, x_d)$ when class $c_j$ is given. We call this probability the likelihood.

• $P(x_1, \dots, x_d)$ is the prior probability of an object $X = (x_1, x_2, \dots, x_d)$.

The resulting model is represented by the prior probability of each class and the likelihood for each combination of class and feature. The likelihoods are usually represented by the mean and variance of a normal distribution estimated from the training set.

The recall of the naive Bayes algorithm is done by looking up the prior and likelihood probabilities which belong to the input data and calculating the posterior probability of each class. Thanks to the assumption of strong conditional independence between all features given the class, the likelihood can be calculated as follows.

$$P(x_1, \dots, x_d \mid c_j) = \prod_{i=1}^{d} P(x_i \mid c_j) \tag{3}$$

The resulting class is the one with the highest posterior probability.

3.2 Time Series Modeling

A time series is a series of observations of a process or an event at equal time intervals. It is called a time series because the observations are usually taken with respect to time. This is, however, not a necessity, because the observations may be taken with respect to space as well [8].

Modeling techniques try to find a model which describes the series, i.e. a model capable of generating an identical series. The model may help to better understand the underlying phenomena or serve as a forecasting tool to predict future values of the series.

Stochastic models like ARIMA assume that the time series consists of a regular pattern manifesting the underlying phenomena and random noise.

The ARIMA Model. ARIMA (autoregressive integrated moving average) is a general time series model. It combines two independent models, autoregressive (AR) and moving-average (MA), in a single equation (Equation 4). By convention the AR terms are added and the MA terms are subtracted.

$$x_t = C + \varphi_1 x_{t-1} + \dots + \varphi_p x_{t-p} - \theta_1 \varepsilon_{t-1} - \dots - \theta_q \varepsilon_{t-q} \tag{4}$$

where

• $x_i$ is the $i$-th element of the series,

• $C$ is a constant,

• $\varphi_1, \dots, \varphi_p$ are the parameters of the autoregressive model,

• $\varepsilon_i$ is the random error component of the $i$-th member of the series,

• $\theta_1, \dots, \theta_q$ are the parameters of the moving average model.

ARIMA models are extensively examined in the literature. For more information the reader is referred to [9] or [10].

3.3 Artificial Neural Networks

An artificial neural network (ANN) is an information processing paradigm inspired by biological nervous systems. It is composed of a large number of highly interconnected processing units (neurons) working in unison to solve a specific problem.

A neuron is a simplistic model of a biological neural cell. Each neuron has one or more inputs and produces a single output. The inputs simulate the stimuli signals that the neuron gets from other neurons, while the output simulates the response signal which the neuron generates.

The biological neuron fires (i.e. generates the response signal) only if the gathered stimuli signals exceed a certain threshold. In other words, the neuron fires only if $\text{stimuli} - \text{threshold} > 0$. In the context of ANNs the term bias $b$ is used instead of "threshold"¹.

¹ By convention, bias = (−1 · threshold).

The artificial equivalent of the gathered stimuli signals is called the inner potential ($\xi$) and is typically defined as a weighted sum of the input signals plus the bias. Each input ($x_j$) is multiplied by a specific real number $w_j$ called the weight. These weights are parameters of each neuron. The calculation of the inner potential is summarized in Equation 5.

$$\xi = \sum_{j} w_j x_j + b = W \cdot X + b \tag{5}$$

The actual output is obtained by applying an activation function $\varphi(\cdot)$ to the gathered inner potential. A variety of activation functions can be used. Very popular for
its properties is the sigmoid function, where the output of the neuron $y$ is given by Equation 6.

$$y = \varphi(\xi) = \frac{1}{1 + e^{-\xi}} \tag{6}$$

For more complex tasks like anomaly detection a single neuron is not powerful enough, and therefore more complex structures are introduced. A neural network is a group of neurons connected together. Connecting neurons to form an ANN can be done in various ways.

Networks where the neurons are arranged in separate layers and the output from one layer is used as an input to the next layer are called feed-forward networks. This means there are no loops in the network and information is always fed forward, never fed back.

ANNs, like their biological counterparts, learn by example. Therefore, in order to train a neural network, a set of input examples with known expected responses is necessary. The classical method of training ANNs is called "backpropagation", which is an abbreviation for "backward propagation of errors".

The typical goal in training a neural network is to find weights $W = (w_1, \dots, w_k)$ and biases $B = (b_1, \dots, b_l)$ which minimize the error or cost function $C(W, B)$ over all instances in the training set.

More specific information about different ANN types can be found in the literature [11, 12].

Autoencoder is a specific type of feed-forward neural network, with an input layer, an output layer and one or more hidden layers. The main properties of an autoencoder are that the output layer has the same number of neurons as the input layer and that, instead of being trained to predict some target value $Y$ given inputs $X$, it is trained to produce a reconstruction $X'$ of its own inputs $X$.

Especially interesting are autoencoders where the hidden layers have fewer nodes than the input/output layer. Such a network is forced to learn a nonlinear, reduced representation of the original data.

Such an autoencoder network can have a variety of uses. It can serve for nonlinear dimensionality reduction, data compression or to learn a generative model of the data [13].

4 Approach

Influenced by the related work reviewed in Section 2, three different approaches to condition monitoring of sensor parts are introduced. Each approach is based on a different principle: the first approach is based on feature extraction and status classification, the second on time series modeling and the third on anomaly detection using autoencoders. The approaches are illustrated on a data set with measurements from 2000 accelerometers (hereafter referred to as sensors). For each sensor the data set contains one time series with a minimum of 14 days of measurements before the sensor failed. The aim is to label the sensor faulty within two days before the failure. More than two days before the failure the sensor can be considered faultless. All three approaches are described in detail in the following subsections.

4.1 Classification-based approach

The first suggested approach is based on supervised learning, namely classification. Supervised learning techniques require examples with labels to learn from. This approach therefore requires information about failures to prepare the labels. If no information about the failures is available and the labels cannot be supplied, this approach cannot be applied.

The sensor readings are in the form of a time series. A sliding window of $N$ measurements is used to calculate the feature vector for classification. The raw measurements themselves can be used directly as a feature vector; however, the dimensionality is then equal to the size of the sliding window multiplied by the number of measured phenomena. Typically, simple features (such as variance, average, median or slope) or more complex features (e.g. Fourier or wavelet coefficients) are extracted from the sliding window [14, 15, 16]. For on-line condition monitoring the feature vector is extracted from a window aligned with the most recent readings. The instance is then classified by a pre-trained classifier. If the instance is classified as "failed", the current condition of the sensor is evaluated as faulty.

In order to train a model the labels have to be prepared. To prepare the training dataset, historical readings and a set of times related to the failures (or, generally, to the events to be detected) are used. To obtain a faulty instance, the sliding window is placed over the readings of a failed sensor and aligned with the time of failure. A feature vector is extracted from such a window and marked with the label "failed" (i.e. class y=1). For each failure one instance with the label "failed" is obtained. Non-faulty instances can be extracted by sliding the window over the time series of a non-failed sensor².

² A non-failed sensor is a sensor for which no record of failure exists.

However, using every possible shift yields an unnecessarily large number of instances. Therefore, non-faulty instances are extracted by placing the window randomly over the readings. The extracted feature vectors are marked with the label "ok" (i.e. class y=0). In this case the ratio between the classes can be easily controlled. The whole process is shown in Figure 1.

The number of features is reduced with an iterative forward feature selection method. Initially a model is trained with only one feature; in each iteration one feature is added and the model is retrained. If the new model performs significantly better than the previous one, the feature is kept in the feature vector, otherwise the feature is discarded.
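
The construction of the training set described in this subsection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: each sensor is assumed to provide a single univariate series held in a list, the series of a failed sensor is assumed to end at the time of failure, and the names `window_features` and `build_training_set`, the window length `n` and the number of random "ok" windows per sensor are hypothetical choices. Only the four simple features named in the text (variance, average, median, slope) are computed.

```python
import random
import statistics


def window_features(window):
    """Simple features of one window: variance, mean, median, slope."""
    n = len(window)  # assumed to be at least 2
    mean = statistics.fmean(window)
    t_mean = (n - 1) / 2
    # Slope of the least-squares line through (0, w0), (1, w1), ...
    num = sum((t - t_mean) * (w - mean) for t, w in enumerate(window))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return [statistics.pvariance(window), mean, statistics.median(window), num / den]


def build_training_set(failed_series, ok_series, n=64, ok_per_sensor=5, seed=0):
    """Return (X, y) with one "failed" window per failure and random "ok" windows.

    failed_series: list of series, each assumed to end at the time of failure.
    ok_series: list of series from sensors with no recorded failure.
    Every series is assumed to contain at least n readings.
    """
    rng = random.Random(seed)
    X, y = [], []
    for s in failed_series:
        X.append(window_features(s[-n:]))  # window aligned with the failure
        y.append(1)                        # class y=1, "failed"
    for s in ok_series:
        for _ in range(ok_per_sensor):
            start = rng.randrange(len(s) - n + 1)  # random window position
            X.append(window_features(s[start:start + n]))
            y.append(0)                    # class y=0, "ok"
    return X, y
```

The resulting pairs `(X, y)` can be passed to any classifier; the ratio between the two classes is controlled through `ok_per_sensor`.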

Figure 1: Working scheme of creating the training set.

A classification model is trained on the extracted feature vectors to recognize faulty and non-faulty instances. An arbitrary classifier can be used. The aim of this article is to prove the concept that classification can be used for condition monitoring, and thus for a maintenance strategy for sensor parts. Therefore, for simplicity and interpretability, the naive Bayes classifier is applied.

In naive Bayes the instance is typically assigned to the class with the higher posterior probability. To increase the confidence of a positive classification, a minimal threshold on the posterior probability of the class of failed instances can be set. This threshold controls the trade-off between sensitivity and specificity of the naive Bayes classifier. With a higher threshold the classifier will be more certain about a positive prediction; however, it may mark more failures as non-faulty, and vice versa.

4.2 Time-series Modeling-based approach

The second approach basically follows the method suggested in [4]. It assumes that a malfunction of a sensor is preceded by abnormal behavior. The working principle follows the common framework for anomaly detection: time-series modeling is used to model "normal" sensor behavior.

A regressive model is trained on the historical measurements of a specific sensor and used to generate predictions. The ARIMA model [9, 10] is a general regressive model popular in time-series modeling, especially in cases where the time series contains significant regular patterns, which is more or less the case for sensor readings [4]. For that reason the general ARIMA model is used to obtain the predictions.

Predicted values are compared with the actual readings and if the difference is higher than a certain threshold, the measurements are marked as faulty. The working scheme is depicted in Figure 2.

Figure 2: Working scheme of regression-model-based approach.

The ARIMA model prescription contains random members; therefore it is a stochastic process. In order to reduce the random component and get the most precise predictions, the Monte Carlo principle is typically used to generate multiple predictions. The final prediction is obtained as the mean of $k$ predicted values.

Knowing how the prediction is obtained allows us to form a hypothesis about the expected value and construct a confidence interval for the predicted value, as shown in Figure 3.

Figure 3: ARIMA model predictions with the confidence interval.

If the actual reading of a sensor is outside the confidence interval of the corresponding predicted value, the sensor is marked as "faulty".

4.3 Autoencoder-based approach

The last suggested approach is, similarly to the previous one, based on the assumption that the failure of a sensor is preceded by anomalous behavior. In this particular case autoencoders are utilized to detect anomalies.

Inputs to the autoencoder network are the raw values from a sliding window drawn over historical measurements of the sensor. However, it is also possible to extract different features and use them as inputs of the autoencoder. This method requires a certain amount of historical data in order to train the autoencoder network. The whole working scheme is shown in Figure 4.

The structure of an autoencoder is defined by the following parameters: the size of the input and output layers, the number of hidden layers and the number of nodes in the hidden layers.
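
As an illustration of how the raw-value inputs mentioned in this subsection can be formed, the following sketch cuts a series of readings into windows. It assumes a univariate series stored in a Python list; the function name `sliding_windows`, the `step` parameter and the sample readings are hypothetical and not taken from the experiments.

```python
def sliding_windows(series, n, step=1):
    """Return all windows of n consecutive raw readings, `step` samples apart."""
    if n <= 0 or step <= 0:
        raise ValueError("n and step must be positive")
    return [series[i:i + n] for i in range(0, len(series) - n + 1, step)]


readings = [0.1, 0.3, 0.2, 0.4, 0.5, 0.3]      # made-up sample values
train_inputs = sliding_windows(readings, n=4)  # historical windows used for training
latest_input = readings[-4:]                   # window aligned with the newest reading
```

Each window holds as many values as the input (and output) layer of the network has nodes.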

Figure 4: Working scheme of anomaly-detection-based approach.

Since the raw data from the sensor are used as inputs to the autoencoder network, the number of nodes in the input and output layers is determined by the size of the sliding window. Influenced by [13], the autoencoder has three hidden layers. The number of neurons in them is related to the number of neurons in the input/output layer: if $n$ is the number of input (and output) neurons, then the hidden layers have $0.75n$, $0.5n$ and $0.75n$ neurons.

The output of an autoencoder itself is not especially interesting. Rather, a reconstruction error, defined as the mean squared error between the real measurements and the output of the autoencoder, is calculated. Let $X = (x_1, \dots, x_n)$ be the input vector of an autoencoder network and $X' = (x'_1, \dots, x'_n)$ the corresponding output; the reconstruction error is

$$RE(X) = \frac{1}{n} \sum_{i=1}^{n} (x_i - x'_i)^2$$

If the reconstruction error is higher than a certain threshold $\tau$, the current condition of the sensor is marked as "faulty".

The threshold is estimated with a heuristic method. The main idea is to consider the reconstruction error to be a random variable, whose underlying distribution can then be easily estimated. Given the distribution of the reconstruction error, a value that does not lie in the right-sided (upper) confidence interval with significance level $\alpha$ is marked "faulty", where $\tau$ satisfies

$$P(RE(X) < \tau) = 1 - \alpha$$

Figure 5 shows the histogram which is used to estimate the underlying distribution function of the reconstruction error.

Figure 5: Histogram of reconstruction errors.

5 Experimental Results

5.1 Classification-based Approach

With labeled data, the performance of a classifier can be easily measured. TPR (true positive rate) is defined as the ratio of the number of detected failures to the number of all failures in a given dataset. FPR (false positive rate) is defined as the ratio of the number of negative samples identified as positive to the number of all negative samples in a dataset. The results are presented by the ROC curve shown in Figure 6. In a ROC curve the TPR (i.e. true positive rate or sensitivity) is plotted as a function of the FPR (false positive rate or 1 − specificity) for different settings of the model's parameters.

Figure 6: Classification-based approach - ROC curve.

5.2 Time-series Modeling-based Approach

In order to evaluate this approach on predicting failures, the method is evaluated as a binary classifier.

A window of size $M$ is placed before the time of a failure, and if an anomaly is within the window, the failure is considered detected. If an anomaly is detected outside of this window, it is considered a false positive detection.

As presented in Section 4.2, this method marks as anomalies all the moments where the actual reading is not within the confidence interval.

The level of significance $\alpha$ can be set explicitly, and its effect can be examined. Figure 7 shows the detected anomalies for $\alpha = 0.0015$. The red segments mark the times of failures.

The experiment is repeated multiple times for different values of $\alpha$. The results are presented by the ROC curve shown in Figure 8. Each point on the ROC curve represents a TPR/FPR pair corresponding to a particular value of $\alpha$. It demonstrates how the trade-off between sensitivity and specificity can be controlled by the choice of $\alpha$.
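
The evaluation procedure of this subsection can be sketched as follows. It is a minimal illustration under assumptions that are not spelled out in the text: time is represented by integer sample indices, every index outside all failure windows counts as one negative sample, each anomaly index appears at most once, and the function name `evaluate_detections` and its arguments are hypothetical.

```python
def evaluate_detections(anomaly_times, failure_times, n_samples, m):
    """Return (TPR, FPR) for anomalies and failures given as sample indices."""

    def in_window(t, f):
        # The m samples immediately preceding the failure at index f.
        return f - m <= t < f

    # A failure counts as detected if at least one anomaly falls into its window.
    detected = sum(
        any(in_window(t, f) for t in anomaly_times) for f in failure_times
    )
    # An anomaly outside every failure window is a false positive.
    false_pos = sum(
        not any(in_window(t, f) for f in failure_times) for t in anomaly_times
    )
    # Assumption: indices not covered by any failure window are negative samples.
    covered = {t for f in failure_times for t in range(max(f - m, 0), f)}
    negatives = n_samples - len(covered)
    return detected / len(failure_times), false_pos / negatives


# Made-up example: four anomalies, two failures, 200 samples, window of 10.
tpr, fpr = evaluate_detections([5, 48, 49, 120], [50, 150], n_samples=200, m=10)
```

Calling the function on the anomaly sets obtained for several values of α gives one TPR/FPR pair per α, i.e. the points of a ROC curve.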

Figure 7: Time-series modeling based approach - detected failures.

Figure 8: TS-modeling-detection-based approach - ROC curve.

5.3 Autoencoder-based Approach

In order to evaluate the autoencoder-based method, the same procedure as for the time-series modeling based approach is used. Figure 9 shows the resulting series of reconstruction errors. The red-marked points are the moments of failure (i.e. the events one intends to predict). It is visible that the time of failure is preceded by a significant rise of the reconstruction error. However, there are also other anomalous moments (peaks in the series of reconstruction errors) that are not related to the incoming failure of the sensor. With domain knowledge of the sensor operation, those can be easily explained, since they are related to the observed phenomena.

Figure 9: Anomaly-detection-based approach - detected failures.

The ROC curve in Figure 10 presents the results of this approach. The trade-off between sensitivity and specificity is controlled by the level of significance described in Section 4.3.

Figure 10: Anomaly-detection-based approach - ROC curve.

6 Conclusion

Three different approaches to condition monitoring and predictive maintenance of sensors have been described and illustrated on real-world data. All of them were chosen to be general and thus applicable to various sensor devices. The three methods exploit different principles and hence have different assumptions and requirements.

The classification-based approach utilizes labels; if they are not available, this approach is not applicable. The other two approaches are more general, since they do not require any meta-data and work just with the sensor measurements. However, both assume that the failure is preceded by anomalous behavior. The time series modeling approach exploits the fact that sensor measurements are in the form of a time series and often contain regular patterns, which manifest themselves in the form of autocorrelations. Therefore they can be described by a model. The autoencoder-based approach, in contrast to time-series modeling, does not model the "normal" behavior.

All methods were able to detect failures before they occurred and thus proved to be applicable for condition monitoring and usable for predictive maintenance of sensor parts. Furthermore, all the approaches can be parametrized to find an ideal trade-off between sensitivity and specificity of the prediction. The best results were achieved by the approach based on classification. This can be expected considering the fact that, unlike the other two approaches, it uses additional meta-data (labels) about the sensor failures.

References
[1] Inc., M. A. Common maintenance strategies. 2015, [Online; accessed 17-January-2016]. Available from: https://www.maintenanceassistant.com/
[2] Kennedy, S. New tools for PdM. 2006, [Online; accessed 17-February-2016]. Available from: http://www.plantservices.com/articles/2006/072/
[3] Ni, K.; Ramanathan, N.; Nabil, M.; et al. Sensor Network Data Fault Types. ACM Transactions on Sensor Networks, volume 5, no. 3, May 2009.
[4] Sharma, A. B.; Golubchik, L.; Govindan, R. Sensor Faults: Detection Methods and Prevalence in Real-World Datasets. ACM Transactions on Sensor Networks, volume 6, no. 3, June 2010.
[5] Alpaydin, E. Introduction to machine learning. MIT Press, second edition, 2010, ISBN 978-0-262-01243-0.
[6] Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall, second edition, 2003, ISBN 978-0137903955.
[7] StatSoft. Naive Bayes Classifier. 2016, [Online; accessed 19-February-2016]. Available from: http://www.statsoft.com/textbook/naive-bayes-classifier
[8] Vu, K. M. Optimal Discrete Control Theory: The Rational Function Structure Model. Ottawa: AuLac Technologies, 2007, ISBN 978-0-9783996-0-3, 51–99 pp.
[9] Nau, R. Lecture notes on forecasting. 2014, [Online; accessed 19-February-2016]. Available from: http://people.duke.edu/~rnau/Slides_on_ARIMA_models--Robert_Nau.pdf
[10] Lu, Y.; Simaan, M. A. Automated Box–Jenkins forecasting modelling. Elsevier Automation in Construction 18, November 2008: pp. 547–558.
[11] Nielsen, M. Neural Networks and Deep Learning. Determination Press, 2015. Available from: http://neuralnetworksanddeeplearning.com/index.htm
[12] Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning, 2016, book in preparation for MIT Press. Available from: http://www.deeplearningbook.org
[13] Candel, A.; Lanford, J.; LeDell, E.; et al. Deep Learning with H2O. 2015, third edition. Available from: https://h2o.gitbooks.io/deep-learning/
[14] Fu, T.-c. A review on time series data mining. Engineering Applications of Artificial Intelligence, volume 24, no. 1, 2011: pp. 164–181.
[15] Chen, Y.; Nascimento, M. A.; Ooi, B. C.; et al. Spade: On shape-based pattern detection in streaming time series. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, IEEE, 2007, pp. 786–795.
[16] Xing, Z.; Pei, J.; Philip, S. Y.; et al. Extracting Interpretable Features for Early Classification on Time Series. In SDM, volume 11, SIAM, 2011, pp. 247–258.