ITAT 2016 Proceedings, CEUR Workshop Proceedings Vol. 1649, pp. 123–130
http://ceur-ws.org/Vol-1649, Series ISSN 1613-0073, © 2016 T. Kuzin, T. Borovička



                Early Failure Detection for Predictive Maintenance of Sensor Parts

                                                      Tomáš Kuzin, Tomáš Borovička

                                                      Faculty of Information Technology,
                                                     Czech Technical University in Prague,
                                                          Prague, Czech Republic
                                                           kuzintom@fit.cvut.cz,
                                                       tomas.borovicka@fit.cvut.cz

Abstract: Maintenance of a sensor part typically means renewal of the sensor at regular intervals or replacement of a malfunctioning sensor. However, optimal timing of the replacement can reduce maintenance costs. The aim of this article is to suggest a predictive maintenance strategy for sensors using condition monitoring and early failure detection based on their own collected measurements.

Three different approaches to early failure detection of sensor parts are introduced: 1) an approach based on feature extraction and status classification, 2) an approach based on time series modeling, and 3) an approach based on anomaly detection using autoencoders. All methods were illustrated on real-world data and proven to be applicable for condition monitoring.

1   Introduction

In the last decade the number of sensors used across all sectors has grown significantly, and this trend is still continuing.

In the classical concept, predictive maintenance takes place when the maintained asset is expensive or important for key business processes, in other words when proper utilization of the machinery has important economic or safety consequences. This is not the typical case of sensor parts, which are usually cheap and play a minor role. For such assets maintenance typically means simple replacement, and a reactive maintenance strategy would be the most common choice. However, machines become more and more dependent on sensor parts, which brings new challenges to their maintenance. Proper timing of replacement has a direct influence on maintenance expenses, especially in cases where other processes depend on the sensor readings and a sensor failure or malfunction may stop the operation or cause collateral losses of machinery.

In the case of sensors, the "classical" condition monitoring scheme utilizing a properly chosen set of external sensors makes no sense. On the other hand, sensors themselves provide on-line measurements during their whole operational service. These data may be exploited to estimate the current state of the measuring device. Therefore applying a smarter maintenance strategy for sensor parts makes perfect sense and may introduce significant savings.

This article deals with the possibilities of smarter maintenance strategies for sensor parts. The main idea is to apply machine learning techniques in order to monitor the current condition or predict failure of sensors based on their own measurements and to propose an optimal time for their replacement in order to avoid failures.

2   Related Work

Several articles and works on "classical" predictive maintenance and condition monitoring [1, 2] have been published in the literature. A predictive maintenance strategy is usually a rule-based maintenance grounded on on-line condition monitoring, which relies on an appropriately chosen set of external sensors; the proper sensor set plays the key role [2]. Unfortunately, none of these techniques is useful when the state of the sensors themselves needs to be monitored. Moreover, many published works base their approaches on sensor networks, where a malfunction of one sensor can be identified utilizing measurements of other sensors in the network. However, this paper focuses on "standalone" sensors, where no other devices sensing the same or correlated phenomena are available. Thus the presented approaches use only measurements of the sensor itself. Since there are not many available publications for this case, the further review is focused on categorization of faults and fault detection techniques for both sensors and sensor networks.

Sensors provide a huge amount of information about observed phenomena. However, to make meaningful conclusions, the quality of the data has to be ensured. Sensors alone can malfunction, and that can distort the image of the phenomena. Most of the methods follow a common framework: characterize the normal behavior of sensor readings, identify significant deviations and mark them as faults.

In the case of sensor networks the most frequent types of faults have been described and categorized by Ni et al. [3]. They describe two distinct approaches to dealing with faults. The first is a data-centric view, which examines the data collected by a given sensor and describes fault models based on data features. In contrast, there is a system-centric view, which examines physical malfunctions of a sensor and how those may manifest themselves in the resulting data. According to Ni et al. these two views are related to one another and every fault can be mapped between the two. The important fault categories discussed in [3] are summarized in Table 1. In this article the focus is on the data-centric point of view.

Sharma et al. [4] loosely follow on the work of Ni et al. and propose specific algorithms for fault detection.

They focus only on a subset of the fault types examined in [3] and summarized in Table 1.

Table 1: Taxonomy of faults described by Ni et al. [3] (data-centric point of view).

  Fault                     Definition
  Outlier                   Isolated data point or sensor unexpectedly distant from models.
  Spike                     Multiple data points with a much greater than expected rate of change.
  "Stuck-at"                Sensor values experience zero variation for an unexpected length of time.
  High Noise or Variance    Sensor values experience unexpectedly high variation or noise.

Four different classes of approaches for detecting the above mentioned faults are discussed.

Rule-based Methods use domain knowledge to develop heuristic constraints that the sensor readings must satisfy. Violations of those constraints imply faults. For the above mentioned fault types the following simple rules are typically used [4]:

The variance (or the standard deviation) of the sample readings within a window of size w_size is computed. If it is above a certain threshold, the samples are corrupted by the noise fault. If the variance is zero, the samples are corrupted by the constant ("stuck-at") fault. In order to detect short noise faults, the data have to be appropriately preprocessed. If the rate of change is above a threshold, it can be assumed that the data were affected by short faults.

The performance of this method strongly depends on the parameters w_size and the threshold. Setting the parameters is not trivial and usually requires domain knowledge of the examined problem.
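As a rough illustration of these window rules, the following sketch flags stuck-at, high-noise and spike windows in a univariate series. It is not taken from the paper; the window size, thresholds and the NumPy formulation are illustrative assumptions only.

```python
import numpy as np

def rule_based_flags(x, wsize=50, noise_thr=4.0, spike_thr=10.0):
    """Flag STUCK-AT, HIGH NOISE and SPIKE faults in a 1-D series x using
    the simple window rules described above; thresholds and window size
    are illustrative and must be tuned per sensor."""
    x = np.asarray(x, dtype=float)
    flags = []
    for start in range(0, len(x) - wsize + 1, wsize):
        w = x[start:start + wsize]
        var = w.var()
        if var == 0.0:
            flags.append((start, "STUCK-AT"))    # zero variation
        elif var > noise_thr:
            flags.append((start, "HIGH NOISE"))  # unexpectedly high variance
        # a large rate of change inside the window indicates a short (spike) fault
        if np.abs(np.diff(w)).max() > spike_thr:
            flags.append((start, "SPIKE"))
    return flags

# toy usage: a constant ("stuck-at") segment followed by a noisy one
series = np.concatenate([np.full(100, 3.0), 3.0 + 5.0 * np.random.randn(100)])
print(rule_based_flags(series))
```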
Estimation-based Methods can be used when a physical phenomenon is sensed concurrently by multiple sensors and the dependence between sensor measurements can be exploited to generate estimates for the individual sensor measurements. The dependence can be expressed by spatial correlation. Regardless of the cause of the correlation, it can be used to model the normal behavior. The estimation can be done, for example, by linear least-squares estimation. This method is most suitable for cases when the phenomenon is sensed by almost identical sensors. As an example one can imagine multiple barometric altimeters on a single aircraft; in this case there is a strong presumption that the values are strongly correlated.

Time-series-based Methods utilize the fact that measurements of a sensor are not random and therefore contain some kind of regular patterns. These patterns can be described through autocorrelations in measurements collected by a single sensor and can be used to create a regressive model of the sensed phenomena. A sensor measurement can then be compared against its predicted value to determine whether it is faulty.

An advantage is that this approach is more general than classification and can be used even if no labeled data nor multiple strongly correlated sensors are available.

Learning-based Methods use training data to infer a model of "normal" sensor behavior. If the "normal" sensor behavior and the effects of sensor faults are well understood, learning-based methods may be suitable to detect and classify sensor faults. In [4] the authors successfully use hidden Markov models to construct a model of sensor measurements. The main advantage of learning-based methods is that they can simultaneously detect and classify faults.

3   Preliminaries

3.1   Classification

In the terminology of machine learning, classification is considered an instance of supervised learning, i.e. a machine learning technique where a training set of correctly identified observations is available [5]. The main goal of classification is assigning a new observation X to one of a finite set of categories with the use of a training data set containing instances whose category membership is known.

Every instance of the input dataset is a vector X = (x_1, x_2, ..., x_d), typically called a feature vector, where d is the number of features and x_i is the value of the i-th feature (1 <= i <= d). Every instance belongs to one of the k classes C = {c_1, c_2, ..., c_k}.

The classification process consists of two phases. In the first phase, called the learning phase, the training data set with labels is used to build a model; the knowledge from the reference data is extracted and stored in the form of a model. In the second phase, the model is used to classify unlabeled data; this phase is often called recall. An algorithm that implements classification is called a classifier.

Naive Bayes   In machine learning, naive Bayes classifiers are a family of probabilistic classifiers based on Bayes' theorem. They assume that the value of a particular feature is independent of the value of any other feature, given the class variable [6]. This assumption is often violated in practice, but even so the naive Bayes classifier is still a powerful classification technique.

Learning a naive Bayes model proceeds with calculation of probabilities from the training data set.
The probability to be estimated is the conditional probability P(c_j | x_1, ..., x_d) for each class c_j when an object X = (x_1, x_2, ..., x_d) is given [7].

Using the Bayes rule

      P(A | B) = P(A) P(B | A) / P(B)                                               (1)

the posterior probability can be expressed by Equation 2:

      P(c_j | x_1, ..., x_d) = P(c_j) P(x_1, ..., x_d | c_j) / P(x_1, ..., x_d),    (2)

where

  • P(c_j | x_1, ..., x_d) is the posterior probability of class c_j when the object X = (x_1, x_2, ..., x_d) is given,
  • P(c_j) is the prior probability of class c_j,
  • P(x_1, ..., x_d | c_j) is the conditional probability of the object X = (x_1, ..., x_d) when class c_j is given; we call this probability the likelihood,
  • P(x_1, ..., x_d) is the prior probability of the object X = (x_1, x_2, ..., x_d).

The resulting model is represented by the prior probabilities of each class and the likelihood probabilities for each combination of class and feature. The likelihoods are usually represented by the mean and variance of a normal distribution estimated from the training set.

The recall of the naive Bayes algorithm is done by looking up the prior and likelihood probabilities which belong to the input data and calculating the posterior probabilities for each class. Thanks to the assumption of strong conditional independence between all features conditioned by the class, the likelihood can be calculated as follows:

      P(x_1, ..., x_d | c_j) = ∏_{i=1}^{d} P(x_i | c_j)                             (3)

The resulting class is determined by the highest posterior probability.
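To make the learning and recall phases concrete, here is a minimal Gaussian naive Bayes sketch. It is not the authors' implementation; the per-feature Gaussian likelihood and the small variance floor are assumptions. It estimates the priors and per-class means and variances, and classifies by the highest posterior, as in Equations 2 and 3.

```python
import numpy as np

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: priors P(c_j), per-class feature
    means/variances for the likelihoods, recall via Equations (2)-(3)."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_ = np.array([(y == c).mean() for c in self.classes_])
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes_])
        return self

    def posterior(self, x):
        # Gaussian likelihood of each feature, multiplied thanks to the
        # conditional-independence assumption (Equation 3)
        lik = np.exp(-(x - self.means_) ** 2 / (2 * self.vars_)) / np.sqrt(2 * np.pi * self.vars_)
        joint = self.priors_ * lik.prod(axis=1)
        return joint / joint.sum()          # normalised posterior P(c_j | x)

    def predict(self, X):
        return np.array([self.classes_[np.argmax(self.posterior(x))] for x in X])

# toy usage with two classes and two features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = GaussianNaiveBayes().fit(X, y)
print(model.posterior(np.array([2.5, 2.5])), model.predict(np.array([[2.5, 2.5]])))
```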
3.2   Time Series Modeling

A time series is a series of observations of a process or an event at equal time intervals. It is called a time series because the observations are usually taken with respect to time; this is, however, not a necessity, because the observations may be taken with respect to space as well [8].

Modeling techniques try to find a model which describes the series, i.e. a model capable of generating an identical series. The model may help to better understand the underlying phenomena or serve as a forecasting tool to predict future values of the series.

Stochastic models like ARIMA assume that the time series consists of a regular pattern manifesting the underlying phenomena and a random noise.

The ARIMA Model   ARIMA (autoregressive integrated moving average) is a general time series model. It combines two independent models, autoregressive (AR) and moving-average (MA). They are combined in a single equation (Equation 4); by convention the AR terms are added and the MA terms are subtracted:

      x_t = C + φ_1 · x_{t−1} + ... + φ_p · x_{t−p} − θ_1 · ε_{t−1} − ... − θ_q · ε_{t−q}    (4)

where

  • x_i is the i-th element of the series,
  • C is a constant,
  • φ_1, ..., φ_p are the parameters of the autoregressive model,
  • θ_1, ..., θ_q are the parameters of the moving average model,
  • ε_i is the random error component of the i-th member of the series.

ARIMA models are extensively examined in the literature. For more information the reader is referred to [9] or [10].

3.3   Artificial Neural Networks

An artificial neural network (ANN) is an information processing paradigm inspired by biological nervous systems. It is composed of a large number of highly interconnected processing units (neurons) working in unison to solve a specific problem.

A neuron is a simplistic model of a biological neural cell. Each neuron has one or more inputs and produces a single output. The inputs simulate the stimuli signals that the neuron receives from other neurons, while the output simulates the response signal which the neuron generates.

The biological neuron fires (i.e. generates the response signal) only if the gathered stimuli signals exceed a certain threshold; in other words, the neuron fires only if stimuli − threshold > 0. In the context of ANNs the term bias b is used instead of "threshold" (by convention, bias = −threshold).

The artificial equivalent of the gathered stimuli signals is called the inner potential (ξ) and is typically defined as a weighted sum of the input signals plus the bias. Each input x_j is multiplied by a specific real number w_j called the weight; these weights are the parameters of each neuron. The calculation of the inner potential is summarized in Equation 5:

      ξ = ∑_j w_j · x_j + b = W · X + b                                             (5)

The actual output is obtained by applying an activation function φ(·) to the gathered inner potential. A variety of activation functions can be used. Very popular for its properties is the sigmoid function, where the output y of the neuron is given by Equation 6:

      y = φ(ξ) = 1 / (1 + e^{−ξ})                                                   (6)

For more complex tasks like anomaly detection a single neuron is not powerful enough, and therefore more complex structures are introduced. A neural network is a group of neurons connected together; connecting neurons to form an ANN can be done in various ways.

Networks where the neurons are arranged in separate layers and the output from one layer is used as an input to the next layer are called feed-forward networks. This means there are no loops in the network and information is always fed forward, never fed back.

ANNs, like their biological counterparts, learn by example. Therefore, in order to train a neural network, a set of input examples with known expected responses is necessary. The classical method of training ANNs is called "backpropagation", which is an abbreviation for "backward propagation of errors".

The typical goal in training of neural networks is to find weights W = (w_1, ..., w_k) and biases B = (b_1, ..., b_l) which minimize the error or cost function C(W, B) over all instances in the training set.

More specific information about different ANN types can be found in the literature [11, 12].

Autoencoder   An autoencoder is a specific type of feed-forward neural network with an input layer, an output layer and one or more hidden layers. The main properties of an autoencoder are that the output layer has the same number of neurons as the input layer and that, instead of being trained to predict some target value Y given inputs X, autoencoders are trained to reconstruct their own inputs X'.

Especially interesting are autoencoders where the hidden layers have fewer nodes than the input/output layer. Such a network is forced to learn a nonlinear, reduced representation of the original data.

Such an autoencoder network can have a variety of uses. It can serve for nonlinear dimensionality reduction, data compression or to learn a generative model of the data [13].

4   Approach

Influenced by the related work reviewed in Section 2, three different approaches to deal with condition monitoring of sensor parts are introduced. Each approach is based on a different principle: the first approach is based on feature extraction and status classification, the second approach is based on time series modeling and the third approach is based on anomaly detection using autoencoders. The approaches are illustrated on a data set with measurements from 2000 accelerometers (hereafter referred to as sensors). For each sensor the data set contains one time series with a minimum of 14 days of measurements before the sensor failed. The aim is to label the sensor faulty within two days before the failure. More than two days before the failure the sensor can be considered faultless. All three approaches are described in detail in the following subsections.

4.1   Classification-based approach

The first suggested approach is based on supervised learning, namely classification. Supervised learning techniques require examples with labels to learn from. This approach, therefore, requires information about failures to prepare the labels. If no information about the failures is available and the labels cannot be supplied, this approach cannot be applied.

The sensor readings are in the form of a time series. A sliding window of N measurements is used to calculate the feature vector for classification. The raw measurements themselves can be used directly as a feature vector; however, the dimensionality is then equal to the size of the sliding window multiplied by the number of measured phenomena. Typically, simple features (such as variance, average, median or slope) or more complex features (e.g. Fourier or wavelet coefficients) are extracted from the sliding window [14, 15, 16]. For on-line condition monitoring the feature vector is extracted from a window aligned with the most current readings. The instance is then classified by a pre-trained classifier. If the instance is classified as "failed", the current condition of the sensor is evaluated as faulty.

In order to train a model, the labels have to be prepared. To prepare the training dataset, historical readings and a set of times related to the failures, or generally the events to be detected, are used. To obtain a faulty instance, a sliding window is placed over the readings of a failed sensor and aligned with the time of failure. A feature vector is extracted from such a window and marked with the label "failed" (i.e. class y = 1). For each failure one instance with the label "failed" is obtained. Non-faulty instances can be extracted by sliding the window over the time series of a non-failed sensor, i.e. a sensor for which no record of failure exists.

However, by using every possible shift an unnecessarily large number of instances is obtained. Therefore, non-faulty instances are extracted by placing the window randomly over the readings. The extracted feature vectors are marked with the label "ok" (i.e. class y = 0). In this case the ratio between the classes can be easily controlled. The whole process is demonstrated in Figure 1.

Figure 1: Working scheme of creating the training set.

The number of features is reduced with an iterative forward feature selection method. Initially a model is trained with only one feature; in each iteration one feature is added and the model is retrained. If the new model performs significantly better than the previous one, the feature is kept in the feature vector, otherwise the feature is discarded.
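The training-set construction of Figure 1 can be sketched roughly as follows. The window length, the particular features and the synthetic series are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

def window_features(w):
    """Simple features from one window: mean, variance, median and slope."""
    slope = np.polyfit(np.arange(len(w)), w, 1)[0]
    return np.array([w.mean(), w.var(), np.median(w), slope])

def build_training_set(failed, nonfailed, n=48, neg_per_series=5, seed=0):
    """failed: list of 1-D series ending at the failure time;
    nonfailed: list of 1-D series without any recorded failure.
    Returns feature matrix X and labels y (1 = 'failed', 0 = 'ok')."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for series in failed:                       # window aligned with the failure
        X.append(window_features(series[-n:])); y.append(1)
    for series in nonfailed:                    # windows placed at random
        for _ in range(neg_per_series):
            start = rng.integers(0, len(series) - n)
            X.append(window_features(series[start:start + n])); y.append(0)
    return np.vstack(X), np.array(y)

# toy usage with synthetic series
failed = [np.random.randn(500) + np.linspace(0, 3, 500)]   # drifts before the "failure"
ok = [np.random.randn(500) for _ in range(3)]
X, y = build_training_set(failed, ok)
print(X.shape, y.sum(), len(y) - y.sum())
```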




A classification model is trained with the extracted feature vectors to recognize faulty and non-faulty instances. An arbitrary classifier can be used. The aim of this article is to prove the concept that classification can be used for condition monitoring and thus as a maintenance strategy for sensor parts. Therefore, for simplicity and interpretability, the naive Bayes classifier is applied.

In naive Bayes the instance is typically classified to the class with the higher posterior probability. To increase the confidence of a positive classification, a minimal threshold value of the posterior probability can be set for the class of failed instances. With this threshold, the trade-off between sensitivity and specificity of the naive Bayes classifier can be controlled. With a higher threshold the classifier will be more certain about the prediction; however, it may mark more failures as non-faulty, and vice versa.

4.2   Time-series Modeling-based approach

The second approach basically follows the method suggested in [4]. It assumes that a malfunction of a sensor is preceded by abnormal behavior. The working principle basically follows the common framework for anomaly detection: it uses time-series modelling in order to model the "normal" sensor behavior.

A regressive model is trained on the historical measurements of a specific sensor and used to generate predictions. The ARIMA model [9, 10] is a general regressive model popular in time-series modeling, especially in cases where the time series contains significant regular patterns, which is more or less the case of sensor readings [4]. For that reason the general ARIMA model is used to obtain the predictions.

Predicted values are compared with the actual readings, and if the difference is higher than a certain threshold, the measurements are marked as faulty.

The working scheme is depicted in Figure 2.

Figure 2: Working scheme of the regression-model-based approach.

The ARIMA model formulation contains random terms, therefore it is a stochastic process. In order to reduce the random component and obtain the most precise predictions, the Monte Carlo principle is typically employed to generate multiple predictions. The final prediction is obtained as the mean of the k predicted values.

Knowing how the prediction is obtained allows us to create a hypothesis about the expected value and to construct a confidence interval for the predicted value, as shown in Figure 3.

Figure 3: ARIMA model predictions with the confidence interval.

If the actual reading of a sensor is outside the confidence interval of the corresponding predicted value, the sensor is marked as "faulty".
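The prediction-and-check step can be sketched with statsmodels as follows. The ARIMA order (2, 0, 1) is purely illustrative, and for brevity the analytic forecast interval is used instead of the Monte Carlo averaging of k simulated predictions described above.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def is_reading_faulty(history, new_value, order=(2, 0, 1), alpha=0.0015):
    """Fit an ARIMA model on past readings of one sensor, forecast the next
    value with a (1 - alpha) confidence interval and flag the actual reading
    if it falls outside; the (p, d, q) order here is only illustrative."""
    result = ARIMA(np.asarray(history, dtype=float), order=order).fit()
    forecast = result.get_forecast(steps=1)
    lower, upper = forecast.conf_int(alpha=alpha)[0]
    return not (lower <= new_value <= upper), (lower, upper)

# toy usage: a noisy sine wave followed by an out-of-range reading
t = np.arange(300)
history = np.sin(0.2 * t) + 0.1 * np.random.randn(300)
print(is_reading_faulty(history, new_value=5.0))   # expected: flagged as faulty
```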
4.3   Autoencoder-based approach

The last suggested approach is, similarly to the previous approach, based on the assumption that the failure of a sensor is preceded by its anomalous behavior. In this particular case autoencoders are utilized to detect anomalies.

Inputs to the autoencoder network are the raw values from a sliding window drawn over historical measurements of the sensor. However, it is also possible to extract different features and use them as inputs of the autoencoder. As a result, this method requires a certain amount of historical data in order to train an autoencoder network.

The whole working scheme is shown in Figure 4.

Figure 4: Working scheme of the anomaly-detection-based approach.

The structure of an autoencoder is defined by the following parameters: the size of the input and output layer, the number of hidden layers and the number of nodes in the hidden layers.




Since the raw data from the sensor are used as inputs to the autoencoder network, the number of nodes in the input and also the output layer is determined by the size of the sliding window. Influenced by [13], the autoencoder has three hidden layers. The number of neurons is related to the number of neurons in the input/output layer: let n be the number of input (and output) neurons, then the hidden layers have 0.75n, 0.5n and 0.75n neurons.

The output of the autoencoder itself is not especially interesting. Rather, a reconstruction error, defined as the mean squared error between the real measurements and the output of the autoencoder, is calculated. Let X = (x_1, ..., x_n) be the input vector of the autoencoder network and X' = (x'_1, ..., x'_n) the corresponding output; the reconstruction error is

      RE(X) = (1/n) ∑_{i=1}^{n} (x_i − x'_i)²

If the reconstruction error is higher than a certain threshold τ, the current condition of the sensor is marked as "faulty".

The threshold is estimated with a heuristic method. The main idea is to consider the reconstruction error to be a random variable. Then the underlying distribution of this random variable can be easily estimated. Having a distribution of the reconstruction error, if the value of the error falls outside the right-sided (upper) confidence interval with confidence level α, i.e. exceeds the threshold τ, the condition is marked as "faulty":

      P(RE(X) < τ) = 1 − α

Figure 5 shows the histogram which is used to estimate the underlying distribution function of the reconstruction error.

Figure 5: Histogram of reconstruction errors.
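A sketch of this scheme is given below. The paper trains its autoencoders with H2O [13]; here Keras is used instead, and the tanh activations, the Adam optimiser, the number of epochs and the random placeholder data are assumptions made only for illustration. The threshold τ is taken as the empirical (1 − α) quantile of the reconstruction errors, matching P(RE(X) < τ) = 1 − α.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_autoencoder(n):
    """Autoencoder with the 0.75n-0.5n-0.75n hidden layout described above;
    activations and optimiser are assumptions, not taken from the paper."""
    model = models.Sequential([
        layers.Input(shape=(n,)),
        layers.Dense(int(0.75 * n), activation="tanh"),
        layers.Dense(int(0.50 * n), activation="tanh"),
        layers.Dense(int(0.75 * n), activation="tanh"),
        layers.Dense(n, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def reconstruction_errors(model, X):
    X_hat = model.predict(X, verbose=0)
    return np.mean((X - X_hat) ** 2, axis=1)      # RE(X) per window

# X_train: sliding windows of "normal" readings, shape (num_windows, n)
n = 48
X_train = np.random.randn(500, n)                 # placeholder data
ae = build_autoencoder(n)
ae.fit(X_train, X_train, epochs=20, batch_size=32, verbose=0)

alpha = 0.0015
tau = np.quantile(reconstruction_errors(ae, X_train), 1 - alpha)  # P(RE < tau) = 1 - alpha
is_faulty = reconstruction_errors(ae, X_train[:10]) > tau
print(tau, is_faulty)
```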
5   Experimental Results

5.1   Classification-based Approach

Having labeled data, the performance of a classifier can be easily measured. TPR (true positive rate) is defined as the ratio of the number of detected failures to the number of all failures in a given dataset. FPR (false positive rate) is defined as the ratio of the number of negative samples incorrectly identified as positive to the number of all negative samples in the dataset. The acquired results are presented by the ROC curve shown in Figure 6. In a ROC curve the TPR (true positive rate, or sensitivity) is plotted as a function of the FPR (false positive rate, or 1 − specificity) for different settings of the model's parameters.

Figure 6: Classification-based approach - ROC curve.

5.2   Time-series Modeling-based Approach

In order to evaluate this approach on predicting failures, the method is evaluated as a binary classifier.

A window of size M is placed before the time of a failure, and if an anomaly lies within the window, the failure is considered detected. If an anomaly is detected outside of this window, it is considered a false positive detection.

As presented in Section 4.2, this method marks as anomalies all the moments where the actual reading is not within the confidence interval.

The level of significance α can be set explicitly and its effect can be examined. Figure 7 shows the detected anomalies for α = 0.0015. The red segments mark the times of failures.

Figure 7: Time-series modeling-based approach - detected failures.

The experiment is repeated multiple times for different α. The results are presented by the ROC curve shown in Figure 8. Each point on the ROC curve represents a TPR/FPR pair corresponding to a particular value of α. It demonstrates how the sensitivity versus specificity can be controlled by choosing α.

Figure 8: Time-series modeling-based approach - ROC curve.
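This evaluation can be sketched as follows. Here detect_anomalies stands for any of the detectors described above, and the toy threshold detector, the window size M and the crude negative-sample count are illustrative assumptions only.

```python
import numpy as np

def window_evaluation(anomaly_idx, failure_idx, series_length, M):
    """TPR/FPR for one sensor: an anomaly within the M samples before a
    failure counts as a detection, any other anomaly as a false positive;
    the negative count below is a crude approximation."""
    tp = sum(any(f - M <= a <= f for a in anomaly_idx) for f in failure_idx)
    fp = sum(not any(f - M <= a <= f for f in failure_idx) for a in anomaly_idx)
    negatives = max(series_length - M * len(failure_idx), 1)
    return tp / max(len(failure_idx), 1), fp / negatives

def roc_points(detect_anomalies, series, failure_idx, alphas, M):
    """One TPR/FPR pair per significance level alpha."""
    return [window_evaluation(detect_anomalies(series, a), failure_idx, len(series), M)
            for a in alphas]

# toy usage: a hypothetical threshold detector stands in for the real methods
rng = np.random.default_rng(1)
series = rng.normal(size=10_000)
series[6_990:7_000] += 6.0                                    # anomalous stretch before the "failure"
detector = lambda s, a: np.where(np.abs(s) > -np.log(a))[0]   # purely illustrative
print(roc_points(detector, series, failure_idx=[7_000], alphas=[0.05, 0.001], M=100))
```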




5.3   Autoencoder-based Approach

In order to evaluate the autoencoder-based method, the same procedure as in the case of the time-series modeling-based approach is used. Figure 9 shows the resulting series of reconstruction errors. The red-marked points are the moments of failure (i.e. the events one intends to predict). It is visible that the time of failure is preceded by a significant rise of the reconstruction error. However, there are also other anomalous moments (peaks in the series of reconstruction errors) that are not related to the incoming failure of the sensor. With domain knowledge of the sensor operation those can be easily explained, since they are related to the observed phenomena.

Figure 9: Anomaly-detection-based approach - detected failures.

The ROC curve in Figure 10 presents the results of this approach. The sensitivity and specificity trade-off is controlled by the level of significance described in Section 4.3.

Figure 10: Anomaly-detection-based approach - ROC curve.

6   Conclusion

Three different approaches to deal with condition monitoring and predictive maintenance of sensors have been described and illustrated on real-world data. All these approaches were chosen with regard to being general and thus applicable to various sensor devices. All three methods exploit different principles and hence have different assumptions and requirements.

The classification-based approach utilizes labels if available; if not, this approach is not applicable. The other two approaches are more general since they do not require any meta-data and work just with the sensor measurements. However, both assume that the failure is preceded by anomalous behavior. The time series modeling approach exploits the fact that sensor measurements are in the form of time series and often contain regular patterns, which manifest themselves in the form of autocorrelations; therefore they can be described by a model. The autoencoder-based approach, in contrast to time-series modeling, does not model the "normal" behavior.

All methods were able to detect failures before they occurred and thus proved to be applicable for condition monitoring and usable for predictive maintenance of sensor parts. Furthermore, all the approaches can be parametrized to find an ideal trade-off between sensitivity and specificity of the prediction. The best results were achieved by the classification-based approach. This can be expected considering the fact that, unlike the other two approaches, it uses additional meta-data (labels) about the sensor failures.

References

 [1] Maintenance Assistant Inc. Common maintenance strategies. 2015, [Online; accessed 17-January-2016]. Available from: https://www.maintenanceassistant.com/
 [2] Kennedy, S. New tools for PdM. 2006, [Online; accessed 17-February-2016]. Available from: http://www.plantservices.com/articles/2006/072/
 [3] Ni, K.; Ramanathan, N.; Nabil, M.; et al. Sensor Network Data Fault Types. ACM Transactions on Sensor Networks, volume 5, no. 3, May 2009.
 [4] Sharma, A. B.; Golubchik, L.; Govindan, R. Sensor Faults: Detection Methods and Prevalence in Real-World Datasets. ACM Transactions on Sensor Networks, volume 6, no. 3, June 2010.
 [5] Alpaydin, E. Introduction to Machine Learning. MIT Press, second edition, 2010, ISBN 978-0-262-01243-0.
 [6] Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach. Prentice Hall, second edition, 2003, ISBN 978-0137903955.
 [7] StatSoft. Naive Bayes Classifier. 2016, [Online; accessed 19-February-2016]. Available from: http://www.statsoft.com/textbook/naive-bayes-classifier
 [8] Vu, K. M. Optimal Discrete Control Theory: The Rational Function Structure Model. Ottawa: AuLac Technologies, 2007, ISBN 978-0-9783996-0-3, pp. 51–99.
 [9] Nau, R. Lecture notes on forecasting. 2014, [Online; accessed 19-February-2016]. Available from: http://people.duke.edu/~rnau/Slides_on_ARIMA_models--Robert_Nau.pdf
[10] Lu, Y.; Simaan, M. A. Automated Box–Jenkins forecasting modelling. Automation in Construction, volume 18, November 2008, pp. 547–558.
[11] Nielsen, M. Neural Networks and Deep Learning. Determination Press, 2015. Available from: http://neuralnetworksanddeeplearning.com/index.htm
[12] Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning. 2016, book in preparation for MIT Press. Available from: http://www.deeplearningbook.org
[13] Candel, A.; Lanford, J.; LeDell, E.; et al. Deep Learning with H2O. 2015, third edition. Available from: https://h2o.gitbooks.io/deep-learning/
[14] Fu, T.-c. A review on time series data mining. Engineering Applications of Artificial Intelligence, volume 24, no. 1, 2011, pp. 164–181.
[15] Chen, Y.; Nascimento, M. A.; Ooi, B. C.; et al. Spade: On shape-based pattern detection in streaming time series. In Data Engineering, 2007. ICDE 2007. IEEE 23rd International Conference on, IEEE, 2007, pp. 786–795.
[16] Xing, Z.; Pei, J.; Philip, S. Y.; et al. Extracting Interpretable Features for Early Classification on Time Series. In SDM, volume 11, SIAM, 2011, pp. 247–258.