<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Early Failure Detection for Predictive Maintenance of Sensor Parts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomáš Kuzin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomáš Borovička</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, Czech Technical University in Prague</institution>
          ,
          <addr-line>Prague, Czech Republic, kuzintom@fit.cvut.cz, tomas.borovicka@fit.cvut.cz</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <volume>1649</volume>
      <fpage>123</fpage>
      <lpage>130</lpage>
      <abstract>
        <p>Maintenance of a sensor part typically means renewing the sensor at regular intervals or replacing a malfunctioning sensor. However, optimal timing of the replacement can reduce maintenance costs. The aim of this article is to suggest a predictive maintenance strategy for sensors using condition monitoring and early failure detection based on their own collected measurements. Three different approaches to early failure detection of sensor parts are introduced: 1) an approach based on feature extraction and status classification, 2) an approach based on time series modeling and 3) an approach based on anomaly detection using autoencoders. All methods were illustrated on real-world data and proven to be applicable for condition monitoring.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>In the last decade the number of sensors used across all
sectors has risen significantly, and this trend is still
continuing.</p>
      <p>In the classical concept, predictive maintenance takes
place when the maintained asset is expensive or important
for key business processes, in other words, when proper
utilization of the machinery has important economic or
safety consequences. This is not the typical case of
sensor parts, which are usually cheap and play a minor
role. For such assets maintenance typically means
simple replacement, and a reactive maintenance strategy would
be the most common choice. However, machines are becoming
more and more dependent on sensor parts, and that brings
new challenges in their maintenance. Proper timing of
replacement has a direct influence on maintenance expenses,
especially in cases where other processes depend on the
sensor readings and a sensor failure or malfunction may
stop the operation or cause collateral losses of machinery.</p>
      <p>In the case of sensors, the "classical" condition monitoring
scheme utilizing a properly chosen set of external sensors
makes no sense. On the other hand, sensors themselves
provide on-line measurements during their whole
operational service. These data may be exploited to estimate the
current state of the measuring device. Therefore, applying a
smarter maintenance strategy to sensor parts makes
perfect sense and may introduce significant savings.</p>
      <p>This article deals with the possibilities of smarter
maintenance strategies for sensor parts. The main idea is to
apply machine learning techniques in order to monitor the
current condition or predict failures of sensors based on
their own measurements, and to propose an optimal time for
their replacement in order to avoid failures.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Several articles and works on "classical" predictive
maintenance and condition monitoring [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ] have been published in
the literature. A predictive maintenance strategy is usually
rule-based maintenance grounded in on-line condition
monitoring, which relies on an appropriately chosen set
of external sensors. The proper sensor set plays the key
role [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Unfortunately, none of these techniques is useful
when the state of the sensors themselves needs to be monitored.
Moreover, many published works base their approaches
on sensor networks, where the malfunction of one sensor can
be identified using measurements of other sensors in
the network. This paper, however, focuses on "standalone"
sensors, where no other devices sensing the same or
correlated phenomena are available; the approaches presented
here therefore use only measurements of the sensor itself.
Since there are not many available publications for this case,
the further review focuses on the categorization of faults and
on fault detection techniques for both sensors and sensor networks.
      </p>
      <p>Sensors provide a huge amount of information about
observed phenomena. However, to draw meaningful
conclusions, the quality of the data has to be ensured. A sensor
alone can malfunction, which can distort the image of the
phenomena. Most methods follow a common framework:
characterize the normal behavior of sensor readings,
identify significant deviations and mark them as faults.</p>
      <p>
        In the case of sensor networks, the most frequent types of
faults have been described and categorized by Ni et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. They describe two distinct approaches to dealing with
faults. The first is a data-centric view, which examines the
data collected by a given sensor and describes fault
models based on data features. In contrast, there is a
system-centric view, which examines physical malfunctions of a
sensor and how those may manifest themselves in the
resulting data. According to Ni et al., these two views are
related to one another and every fault can be mapped
between the two. The important fault categories discussed
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are summarized in Table 1. This article focuses
on the data-centric point of view.
      </p>
      <p>
        Sharma et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] loosely follow on the work of Ni
et al. and propose specific algorithms for fault detection.
They focus only on a subset of the fault types examined in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
and summarized in Table 1.
      </p>
      <p>Four different classes of approaches for detecting the
above-mentioned faults are discussed.</p>
      <p>
        Rule-based methods use domain knowledge to develop
heuristic constraints that the sensor readings must satisfy.
Violations of those constraints imply faults. For the
above-mentioned fault types, the following simple rules are
typically used [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]:
      </p>
      <p>The variance (or the standard deviation) of the sample
readings within a window of size wsize is computed. If
it is above a certain threshold, the samples are corrupted
by a noise fault; if the variance is zero, the samples are
corrupted by a constant fault. In order to detect short
noise faults, the data have to be appropriately preprocessed:
if the rate of change is above a threshold, it can be assumed
that the data were affected by short faults.</p>
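      <p>As an illustration, the rules above can be sketched as follows; the
window size and both thresholds are illustrative assumptions, not
values from the paper:</p>

```python
import numpy as np

def rule_based_faults(readings, wsize=10, noise_threshold=4.0, change_threshold=5.0):
    """Classify windows of sensor readings with simple heuristic rules.

    Returns (labels, short_faults): labels is a list of
    (window_start, label) pairs with label in {"noise", "constant", "ok"};
    short_faults lists indices where the rate of change is abnormally high.
    """
    readings = np.asarray(readings, dtype=float)
    labels = []
    for start in range(0, len(readings) - wsize + 1, wsize):
        window = readings[start:start + wsize]
        var = window.var()
        if var > noise_threshold:      # high variance: noise fault
            labels.append((start, "noise"))
        elif var == 0.0:               # zero variance: constant fault
            labels.append((start, "constant"))
        else:
            labels.append((start, "ok"))
    # short faults: abrupt jumps between consecutive samples
    jumps = np.abs(np.diff(readings))
    short_faults = np.flatnonzero(jumps > change_threshold).tolist()
    return labels, short_faults
```

      <p>The practical difficulty noted below, choosing wsize and the thresholds,
is visible directly in the signature: all three are free parameters.</p>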
      <p>The performance of this method strongly depends on
the parameter wsize and the threshold. Setting these parameters
is not trivial and usually requires domain knowledge of the
examined problem.</p>
      <p>Estimation-based methods can be used when a
physical phenomenon is sensed concurrently by multiple sensors
and the dependence between sensor measurements can be
exploited to generate estimates of the individual sensor
measurements. The dependence can be expressed by spatial
correlation and, regardless of its cause, it
can be used to model the normal behavior. The
estimation can be done, for example, by linear least-squares
estimation. This method is most suitable for cases when the
phenomenon is sensed by almost identical sensors; as an
example, one can imagine multiple barometric altimeters
on a single aircraft, where there is a strong
presumption that the values are strongly correlated.</p>
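      <p>A minimal sketch of such an estimation-based check, assuming two
strongly correlated sensors and an illustrative residual threshold:</p>

```python
import numpy as np

def lls_fault_scores(target, reference, threshold=3.0):
    """Estimate `target` readings from a correlated `reference` sensor
    with linear least squares and flag large residuals as suspect.
    Returns the indices of suspect samples."""
    reference = np.asarray(reference, dtype=float)
    target = np.asarray(target, dtype=float)
    # fit target ~ a * reference + b
    A = np.column_stack([reference, np.ones_like(reference)])
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    estimate = A @ coeffs
    residuals = np.abs(target - estimate)
    return np.flatnonzero(residuals > threshold)
```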
      <p>Time-series-based methods utilize the fact that
measurements of a sensor are not random and therefore
contain some kind of regular pattern. These patterns can be
described through autocorrelations in the measurements
collected by a single sensor and used to create a
regressive model of the sensed phenomenon. A sensor
measurement can then be compared against its predicted value
to determine whether it is faulty.</p>
      <p>The advantage is that this approach is more general than
classification and can be used even if no labeled data or
multiple strongly correlated sensors are available.</p>
      <p>
        Learning-based methods use training data to infer a
model of "normal" sensor behavior. If the "normal"
sensor behavior and the effects of sensor faults are well
understood, learning-based methods may be suitable to
detect and classify sensor faults. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] the authors successfully
use hidden Markov models to construct a model of
sensor measurements. The main advantage of learning-based
methods is that they can simultaneously detect and classify
faults.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Preliminaries</title>
      <sec id="sec-3-1">
        <title>3.1 Classification</title>
        <p>
          In the terminology of machine learning, classification is
considered an instance of supervised learning, i.e. a
machine learning technique where a training set of correctly
identified observations is available [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The main goal
of classification is to assign a new observation X to one
of a finite set of categories with the use of a training
data set containing instances whose category membership
is known.
        </p>
        <p>Every instance of the input dataset is a vector X =
(x1, x2, . . . , xd ), typically called a feature vector, where d is
the number of features and xi is the value
of the i-th feature (0 &lt; i ≤ d). Every instance belongs to one of the k
classes C = {c1, c2, . . . , ck}.</p>
        <p>The classification process consists of two phases. In the
first phase, called the learning phase, the training data set with
labels is used to build a model; the knowledge from the
reference data is extracted and stored in the form of a model.
In the second phase, the model is used
to classify unlabeled data. This phase is often called
recall. An algorithm that implements classification is called a
classifier.</p>
        <p>
          Naive Bayes In machine learning, naive Bayes classifiers
are a family of probabilistic classifiers based on Bayes'
theorem. They assume that the value of a particular feature is
independent of the value of any other feature, given the class
variable [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This assumption is often violated in
practice, but even so the naive Bayes classifier is still a powerful
classification technique.
        </p>
        <p>
          Learning a naive Bayes model proceeds by calculating
probabilities from the training data set. The probability
to be estimated is the conditional probability P(c j | x1, ..., xd )
for each class c j when an object X = (x1, x2, . . . , xd ) is given
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>Using the Bayes rule
P(A | B) = P(A) P(B | A) / P(B),     (1)
the posterior probability can be expressed as
P(c j | x1, . . . , xd ) = P(c j) P(x1, . . . , xd | c j) / P(x1, . . . , xd ),     (2)
where
• P(c j | x1, . . . , xd ) is the posterior probability of class
c j when object X = (x1, x2, . . . , xd ) is given,
• P(c j) is the prior probability of class c j,
• P(x1, . . . , xd | c j) is the conditional probability of an
object X = (x1, . . . , xd ) when class c j is given; this
probability is called the likelihood,
• P(x1, . . . , xd ) is the prior probability of an object X =
(x1, x2, . . . , xd ).</p>
        <p>The resulting model is represented by the prior probability
of each class and likelihood probabilities for each
combination of class and feature. The likelihoods are usually
represented by the mean and variance of a normal distribution
estimated from the training set.</p>
        <p>The recall of the naive Bayes algorithm is done by looking
up the prior and likelihood probabilities which belong to the
input data and calculating the posterior probability for each
class. Thanks to the assumption of strong conditional
independence between all features conditioned on the class,
the likelihood can be calculated as follows:</p>
        <p>P(x1, . . . , xd | c j) = ∏i=1…d P(xi | c j).     (3)</p>
        <p>The resulting class is determined by the highest
posterior probability.</p>
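        <p>The learning and recall phases described above can be sketched as a
minimal Gaussian naive Bayes; all variable names and the toy data are
illustrative assumptions:</p>

```python
import math

def train_gnb(X, y):
    """Learning phase: estimate per-class priors and, for each feature,
    the mean and variance of a normal distribution (the likelihoods)."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        prior = len(rows) / len(X)
        means = [sum(col) / len(col) for col in zip(*rows)]
        varis = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                 for col, m in zip(zip(*rows), means)]
        model[c] = (prior, means, varis)
    return model

def predict_gnb(model, x):
    """Recall phase: pick the class with the highest posterior,
    computed in log space for numerical stability."""
    best, best_score = None, -math.inf
    for c, (prior, means, varis) in model.items():
        score = math.log(prior)
        for xi, m, v in zip(x, means, varis):
            # log of the Gaussian density N(xi; m, v)
            score += -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
        if score > best_score:
            best, best_score = c, score
    return best
```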
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Time Series Modeling</title>
        <p>
          A time series is a series of observations of a process or an
event at equal time intervals. It is called a time series
because the observations are usually taken with respect to
time. This is, however, not a necessity, because the
observations may be taken with respect to space as well [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>Modeling techniques try to find a model which
describes the series, i.e. a model capable of generating an
identical series. The model may help to better understand the
underlying phenomena or serve as a forecasting tool to
predict future values of the series.</p>
        <p>Stochastic models like ARIMA assume that the time
series consists of a regular pattern manifesting the underlying
phenomena and of random noise.</p>
        <p>The ARIMA Model ARIMA (autoregressive integrated
moving average) is a general time series model. It
combines two independent models, autoregressive (AR)
and moving-average (MA), in a single equation; by
convention the AR terms are added and the MA terms are
subtracted (Equation 4):
xi = C + ϕ1 xi−1 + ϕ2 xi−2 + εi − θ1 εi−1 − θ2 εi−2,     (4)
where
• xi is the i-th element of the series,
• C is a constant,
• ϕ1, ϕ2 are parameters of the autoregressive model,
• εi is the random error component of the i-th member of the
series,
• θ1, θ2 are parameters of the moving average model.</p>
        <p>
          ARIMA models are extensively examined in the literature.
For more information the reader is referred to [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] or [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
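        <p>As a minimal sketch, the autoregressive part of Equation 4 can be
fitted by least squares; a full ARIMA additionally includes the MA
terms and differencing, and the data and parameters here are
illustrative assumptions:</p>

```python
import numpy as np

def fit_ar2(series):
    """Fit x_i = C + phi1*x_{i-1} + phi2*x_{i-2} by least squares
    (only the AR part of Equation 4; an illustrative sketch)."""
    x = np.asarray(series, dtype=float)
    # regressors: a constant column, lag-1 values and lag-2 values
    A = np.column_stack([np.ones(len(x) - 2), x[1:-1], x[:-2]])
    C, phi1, phi2 = np.linalg.lstsq(A, x[2:], rcond=None)[0]
    return C, phi1, phi2

def predict_next(series, C, phi1, phi2):
    """One-step-ahead prediction from the fitted AR(2) model."""
    return C + phi1 * series[-1] + phi2 * series[-2]
```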
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Artificial Neural Networks</title>
        <p>An artificial neural network (ANN) is an information processing
paradigm inspired by biological nervous systems. It is
composed of a large number of highly interconnected
processing units (neurons) working in unity to solve a specific
problem.</p>
        <p>A neuron is a simplistic model of a biological neural
cell. Each neuron has one or more inputs and produces a
single output. The inputs simulate the stimuli signals that
the neuron gets from other neurons, while the output
simulates the response signal which the neuron generates.</p>
        <p>The biological neuron fires (i.e. generates the response
signal) only if the gathered stimuli signals exceed a
certain threshold; in other words, the neuron fires only if
stimuli − threshold &gt; 0. In the context of ANNs the term
bias b is used instead of "threshold"1.</p>
        <p>The artificial equivalent of the gathered stimuli signals is
called the inner potential (ξ ) and is typically defined as a
weighted sum of the input signals plus the bias. Each input
x j is multiplied by a specific real number w j called the
weight. These weights are parameters of each neuron. The
calculation of the inner potential is summarized in Equation 5:
ξ = ∑ j w j x j + b = W · X + b.     (5)</p>
        <p>The actual output is obtained by applying an activation
function ϕ(·) to the gathered inner potential. A variety of
activation functions can be used. Very popular for its
properties is the sigmoid function, where the output of the
neuron y is given by the formula in Equation 6:
y = ϕ(ξ ) = 1 / (1 + e−ξ ).     (6)</p>
        <p>1By convention, bias = (−1 · threshold).</p>
        <p>For more complex tasks like anomaly detection a
single neuron is not powerful enough, and therefore more
complex structures are introduced. A neural network is a
group of neurons connected together. Connecting neurons
to form an ANN can be done in various ways.</p>
        <p>Networks where the neurons are arranged in separate
layers and the output from one layer is used as an input
to the next layer are called feed-forward networks. This
means there are no loops in the network and information
is always fed forward, never fed back.</p>
        <p>ANNs, like their biological counterparts, learn by example.
Therefore, in order to train a neural network, a set of
input examples with known expected responses is necessary.
The classical method of training ANNs is called
"backpropagation", which is an abbreviation of "backward
propagation of errors".</p>
        <p>The typical goal in training a neural network is to
find weights W = (w1, . . . , wk) and biases B = (b1, . . . , bl )
which minimize the error or cost function C(W, B) over all
instances in the training set.</p>
        <p>
          More specific information about different ANN types
can be found in literature [
          <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
          ].
        </p>
        <p>An autoencoder is a specific type of feed-forward neural
network, with an input layer, an output layer and one or more
hidden layers. Its main properties are
that the output layer has the same number of neurons as
the input layer and that, instead of being trained to predict some
target value Y given inputs X , autoencoders are trained to
reconstruct their own inputs X ′.</p>
        <p>Especially interesting are autoencoders whose hidden
layers have fewer nodes than the input/output layer. Such a
network is forced to learn a nonlinear, reduced
representation of the original data.</p>
        <p>
          Such an autoencoder network can have a variety of uses:
nonlinear dimensionality reduction, data compression or
learning a generative model of the data [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
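        <p>As an illustration, a minimal autoencoder with a single hidden
layer can be sketched in plain NumPy; the networks discussed later
have three hidden layers, and all sizes and learning parameters here
are illustrative assumptions:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class TinyAutoencoder:
    """Minimal one-hidden-layer autoencoder trained with plain
    gradient descent (an illustrative sketch, not the paper's network)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def forward(self, X):
        self.H = sigmoid(X @ self.W1 + self.b1)  # hidden activations
        return self.H @ self.W2 + self.b2        # linear reconstruction

    def train(self, X, lr=0.1, epochs=500):
        for _ in range(epochs):
            out = self.forward(X)
            err = out - X                        # gradient of MSE w.r.t. out
            dW2 = self.H.T @ err / len(X)
            db2 = err.mean(axis=0)
            dH = err @ self.W2.T * self.H * (1 - self.H)  # backpropagate
            dW1 = X.T @ dH / len(X)
            db1 = dH.mean(axis=0)
            self.W2 -= lr * dW2
            self.b2 -= lr * db2
            self.W1 -= lr * dW1
            self.b1 -= lr * db1

    def reconstruction_error(self, X):
        """Mean squared error per instance between input and output."""
        return ((self.forward(X) - X) ** 2).mean(axis=1)
```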
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Approach</title>
      <p>Influenced by the related work reviewed in Section 2, three
different approaches to condition monitoring of
sensor parts are introduced. Each approach is based on
a different principle: the first is based on
feature extraction and status classification, the second
on time series modeling and the third
on anomaly detection using autoencoders. The
approaches are illustrated on a data set with
measurements from 2000 accelerometers (hereafter referred to as
sensors). For each sensor the data set contains one time
series with a minimum of 14 days of measurements before the
sensor failed. The aim is to label the sensor as faulty within
two days before the failure; more than two days before
the failure the sensor can be considered faultless. All three
approaches are described in detail in the following
subsections.</p>
      <sec id="sec-4-1">
        <title>4.1 Classification-based Approach</title>
        <p>The first suggested approach is based on supervised
learning, namely classification. Supervised learning techniques
require examples with labels to learn from. This approach,
therefore, requires information about failures to prepare
the labels. If no information about the failures is available
and the labels cannot be supplied, this approach cannot be
applied.</p>
        <p>
          The sensor readings are in the form of a time series. A
sliding window of N measurements is used to calculate the
feature vector for classification. The raw measurements
themselves can be used directly as a feature vector; however, the
dimensionality is then equal to the size of the sliding
window multiplied by the number of measured phenomena.
Typically, simple features (such as variance, average,
median or slope) or more complex features (e.g. Fourier or
wavelet coefficients) are extracted from the sliding
window [
          <xref ref-type="bibr" rid="ref14 ref15 ref16">14, 15, 16</xref>
          ]. For on-line condition monitoring the
feature vector is extracted from a window aligned with
the most recent readings. The instance is then
classified by a pre-trained classifier. If the instance is classified
as "failed", the current condition of the sensor is evaluated
as faulty.
        </p>
        <p>In order to train a model, the labels have to be prepared.
To prepare the training dataset, historical readings and a
set of times related to the failures (or, generally, the events
to be detected) are used. To obtain a faulty instance, a sliding
window is placed over the readings of a failed sensor and
aligned with the time of failure. A feature vector is
extracted from such a window and marked with the label "failed"
(i.e. class y=1). For each failure one instance with the label
"failed" is obtained. Non-faulty instances can be extracted
by sliding the window over the time series of a non-failed
sensor2.</p>
        <p>However, using every possible shift produces an unnecessarily
large number of instances. Therefore, non-faulty
instances are extracted by placing the window
randomly over the readings. Extracted feature vectors are marked
with the label 'ok' (i.e. class y=0). In this case the ratio
between classes can be easily controlled. The whole process
is demonstrated in Figure 1.</p>
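        <p>The extraction of labeled instances described above can be
sketched as follows; the chosen features, the window size and the
number of 'ok' windows per series are illustrative assumptions:</p>

```python
import numpy as np

def window_features(window):
    """Simple features of the kind mentioned earlier:
    variance, mean and slope of the window."""
    idx = np.arange(len(window))
    slope = np.polyfit(idx, window, 1)[0]
    return [np.var(window), np.mean(window), slope]

def build_training_set(failed_series, ok_series, wsize=48, n_ok_per_series=5, seed=0):
    """One 'failed' instance per failure (window aligned with the
    failure time, here taken as the series end) and randomly placed
    'ok' windows over non-failed sensors."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for series in failed_series:
        X.append(window_features(np.asarray(series[-wsize:], dtype=float)))
        y.append(1)                    # class "failed"
    for series in ok_series:
        series = np.asarray(series, dtype=float)
        for _ in range(n_ok_per_series):
            start = rng.integers(0, len(series) - wsize + 1)
            X.append(window_features(series[start:start + wsize]))
            y.append(0)                # class "ok"
    return np.array(X), np.array(y)
```

        <p>Placing the 'ok' windows randomly, rather than at every shift,
keeps the class ratio directly controllable via n_ok_per_series.</p>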
        <p>The number of features is reduced with an iterative forward
feature selection method. Initially a model is trained with
only one feature; in each iteration one feature is added and the
model is retrained. If the new model performs significantly
better than the previous one, the feature is kept in the feature
vector, otherwise the feature is discarded.</p>
        <p>2A non-failed sensor is a sensor for which no record of
failure exists.</p>
        <p>A classification model is trained with the extracted feature
vectors to recognize faulty and non-faulty instances. An
arbitrary classifier can be used. The aim of this article is to
prove the concept that classification can be used for
condition monitoring and thus for a maintenance strategy for
sensor parts. Therefore, for simplicity and interpretability, the
naive Bayes classifier is applied.</p>
        <p>In naive Bayes the instance is typically classified to the
class with the higher posterior probability. To increase the
confidence of a positive classification, a minimal threshold
value of the posterior probability can be set for the class of
failed instances. With this threshold, the trade-off between
sensitivity and specificity of the naive Bayes classifier can be
controlled: with a higher threshold the classifier will be more
certain about the prediction; however, it may mark more failures
as non-faulty, and vice versa.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Time-series Modeling-based Approach</title>
        <p>
          The second approach basically follows the method
suggested in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. It assumes that a malfunction of a sensor is
preceded by abnormal behavior. The working
principle follows the common framework for anomaly
detection: time-series modeling is used to model the
"normal" sensor behavior.
        </p>
        <p>
          A regressive model is trained on the historical
measurements of a specific sensor and used to generate predictions.
The ARIMA model [
          <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
          ] is a general regressive model
popular in time-series modeling, especially in cases when
the time series contains significant regular patterns, which
is more or less the case for sensor readings [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For that
reason the general ARIMA model is used to obtain the
predictions.
        </p>
        <p>Predicted values are compared with the actual readings
and if the difference is higher than a certain threshold,
measurements are marked as faulty.</p>
        <p>The working scheme is depicted in Figure 2.</p>
        <p>The ARIMA model prescription contains random
members; therefore it is a stochastic process. In order to
reduce the random component and get the most precise
predictions, the Monte Carlo principle is typically engaged to
generate multiple predictions. The final prediction is obtained
as the mean value of k predicted values.</p>
        <p>Knowing how the prediction is obtained allows us to
form a hypothesis about the expected value and construct
a confidence interval for the predicted value, as shown in
Figure 3.</p>
        <p>If the actual reading of a sensor is outside the confidence
interval of the corresponding predicted value, the sensor is
marked as 'faulty'.</p>
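        <p>The final check can be sketched as follows; building the interval
as the mean plus or minus z standard deviations of the k Monte Carlo
predictions is an illustrative construction, and z and the example
data are assumptions:</p>

```python
import numpy as np

def is_faulty(reading, predictions, z=3.0):
    """Flag a reading whose value falls outside a confidence interval
    built from k Monte Carlo predictions of the same time step."""
    predictions = np.asarray(predictions, dtype=float)
    mean, std = predictions.mean(), predictions.std(ddof=1)
    lower, upper = mean - z * std, mean + z * std
    return bool(reading > upper or lower > reading)
```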
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Autoencoder-based Approach</title>
        <p>The last suggested approach is, similarly to the previous
one, based on the assumption that the failure of a
sensor is preceded by its anomalous behavior. In this
particular case autoencoders are utilized to detect anomalies.</p>
        <p>The inputs to the autoencoder network are the raw values
from a sliding window drawn over historical
measurements of the sensor; however, it is also possible to extract
different features and use them as inputs to the
autoencoder. As a result, this method requires a certain amount
of historical data in order to train the autoencoder network.</p>
        <p>The whole working scheme is shown in Figure 4.</p>
        <p>The structure of an autoencoder is defined by the following
parameters: the size of the input and output layers, the number of
hidden layers and the number of nodes in the hidden layers.</p>
        <p>
          Since the raw data from the sensor are used as inputs
to the autoencoder network, the number of nodes in the
input and also the output layer is determined by the size of the
sliding window. Influenced by [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], the autoencoder has
three hidden layers. The number of neurons in them is related to
the number of neurons in the input/output layer: let n be
the number of input (respectively output) neurons; then the
hidden layers have 0.75n, 0.5n and 0.75n neurons.
        </p>
        <p>The output of an autoencoder itself is not especially
interesting. Rather, a reconstruction error, defined as the mean
squared error between the real measurements and the output
of the autoencoder, is calculated. Let X = (x1, . . . , xn) be the
input vector of an autoencoder network and X ′ = (x1′, . . . , xn′)
the corresponding output; the reconstruction error is
RE(X) = (1/n) ∑i=1…n (xi − xi′)2.</p>
        <p>If the reconstruction error is higher than a certain
threshold τ the current condition of a sensor is marked as
"faulty".</p>
        <p>The threshold is estimated with a heuristic method. The
main idea is to consider the reconstruction error to be a
random variable whose underlying distribution can be easily
estimated. Given a distribution of the reconstruction error, if
the value of the error does not lie within the right-sided (upper)
confidence interval with confidence level α, the condition is
marked "faulty".</p>
        <p>P(RE(X ) &lt; τ) = 1 − α</p>
        <p>Figure 5 demonstrates the histogram which is used
to estimate the underlying distribution function of
the reconstruction error.</p>
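        <p>The threshold estimation can be sketched with the empirical
quantile of historical reconstruction errors; this is a simplifying
assumption, since the paper estimates the underlying distribution
from a histogram rather than directly from the samples:</p>

```python
import numpy as np

def estimate_threshold(reconstruction_errors, alpha=0.0015):
    """Heuristic threshold: treat the reconstruction error as a random
    variable and pick tau so that P(RE less than tau) = 1 - alpha,
    here via the empirical (1 - alpha)-quantile of historical errors."""
    return np.quantile(np.asarray(reconstruction_errors, dtype=float), 1.0 - alpha)
```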
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Experimental Results</title>
      <sec id="sec-5-1">
        <title>Classification-based Approach</title>
        <p>Having labeled data, the performance of a classifier can be
easily measured. TPR (true positive rate) is defined as the
number of detected failures to the number of all failures
in a given dataset. FPR (false positive rate) is defined as
the number of positively identified samples to the number of all
negative samples in a dataset. The acquired results are
presented by the ROC curve shown in Figure 6. In a ROC
curve the TPR (i.e. sensitivity) is
plotted as a function of the FPR (i.e. 1 − specificity)
for different settings of the model's parameters.
In order to evaluate this approach on predicting failures,
the method is evaluated as a binary classifier.</p>
        <p>A window of size M is placed before the time of a
failure, and if an anomaly falls within the window, the failure
is considered detected. If an anomaly is detected outside
of this window, it is considered a false positive detection.</p>
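        <p>This evaluation scheme can be sketched as follows; the window
size M is an illustrative assumption and times are given as sample
indices:</p>

```python
def evaluate_detections(anomaly_times, failure_times, M=48):
    """A failure counts as detected if some anomaly lies within the M
    steps preceding it; anomalies outside every such window are
    counted as false positives."""
    detected = 0
    covered = set()
    for f in failure_times:
        hits = [a for a in anomaly_times if f >= a and a >= f - M]
        if hits:
            detected += 1
            covered.update(hits)
    false_positives = [a for a in anomaly_times if a not in covered]
    return detected, false_positives
```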
        <p>As presented in Section 4.2, this method marks as
anomalies all the moments where the actual reading is not
within the confidence interval.</p>
        <p>The level of significance α can be set explicitly, and its
effect can be examined. Figure 7 shows the detected
anomalies for α = 0.0015; the red segments mark the
times of failures.</p>
        <p>The experiment is repeated multiple times for different
values of α. The results are presented by the ROC curve shown
in Figure 8. Each point on the ROC curve represents a
TPR/FPR pair corresponding to a particular value of α. It
demonstrates how the trade-off between sensitivity and
specificity can be controlled by choosing α.</p>
        <p>In order to evaluate the autoencoder-based method, the
same procedure as in the case of the time-series-modeling-based
approach is used. Figure 9 shows the resulting
series of reconstruction errors. The red-marked points
are the moments of failure (i.e. the event one intends to
predict). It is visible that the time of failure is preceded by a
significant rise of the reconstruction error. However, there
are also other anomalous moments (peaks in the series of
reconstruction errors) that are not related to the incoming
failure of the sensor. With domain knowledge of the
sensor operation, those can be easily explained, since they
are related to the observed phenomena.</p>
        <p>The ROC curve in Figure 10 presents the results of this
approach. The sensitivity and specificity trade-off is controlled
by the level of significance described in Section 4.3.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion</title>
      <p>Three different approaches to the condition monitoring
and predictive maintenance of sensors have been
described and illustrated on real-world data. All these
approaches were chosen with regard to generality and are thus
applicable to various sensor devices. The three
methods exploit different principles and hence have different
assumptions and requirements.</p>
      <p>The classification-based approach utilizes labels, if available;
if not, it is not applicable. The other two
approaches are more general, since they do not require any
meta-data and work only with the sensor measurements.
However, both assume that the failure is preceded by
anomalous behavior. The time series modeling approach
exploits the fact that sensor measurements are in the form of a
time series and often contain regular patterns, which
manifest themselves in the form of autocorrelations and can
therefore be described by a model. The autoencoder-based
approach, contrary to the time-series modeling approach, does
not model the "normal" behavior explicitly.</p>
      <p>All methods were able to detect failures before they
occurred and thus proved applicable for condition monitoring
and usable for predictive maintenance of sensor parts.
Furthermore, all the approaches can be parametrized to find
an ideal trade-off between sensitivity and specificity of the
prediction. The best results were achieved by the
classification-based approach, which is to be expected,
considering that, unlike the other two approaches, it uses
additional meta-data (labels) about the sensor failures.</p>
    </sec>
  </body>
  <back>
  </back>
</article>