<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fast Predictive Maintenance in Industrial Internet of Things (IIoT) with Deep Learning (DL): A Review</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thomas Rieger</string-name>
          <email>thomas.rieger@plymouth.ac.uk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefanie Regier</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ingo Stengel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nathan Clarke</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Karlsruhe University of Applied Sciences</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computing and Mathematics, Plymouth University</institution>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>69</fpage>
      <lpage>79</lpage>
      <abstract>
<p>Applying Deep Learning (DL) in the field of the Industrial Internet of Things (IIoT) is a very active research field. The prediction of failures of machines and equipment in industrial environments before their possible occurrence is also a very popular topic, not least because of its cost-saving potential. Predictive Maintenance (PdM) applications can benefit from DL, especially because highly complex, non-linear and unlabelled (or partially labelled) data is the normal case. Particularly for PdM applications used in connected smart factories, low-latency predictions are essential, so real-time processing becomes increasingly important. The aim of this paper is to provide a narrative review of the most current research covering trends and projects regarding the application of DL methods in IoT environments. Papers discussing predictions and real-time processing with DL models were selected in particular because of their potential use for PdM applications. The reviewed papers were selected by the authors on a qualitative rather than a quantitative basis.</p>
      </abstract>
      <kwd-group>
        <kwd>Predictive Maintenance</kwd>
        <kwd>Industrial Internet of Things</kwd>
        <kwd>IIoT</kwd>
        <kwd>Deep Learning</kwd>
        <kwd>Real-time</kwd>
        <kwd>Data Streams</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper provides an analysis of selected literature applying DL techniques and
Artificial Neural Networks (ANN) in the field of industrial IoT (IIoT) to produce fast
predictions as required, among others, in maintenance applications. PdM attempts to
predict failures before their possible occurrence to avoid unscheduled outages of
machines and plants. The aim is to avoid breakdowns by their timely prediction and
maximizing the service life at the same time. The predictions are based on data
comprising accumulated knowledge and current conditions.</p>
      <p>
        IIoT environments produce massive amounts of data. The necessity to perform data
analytics on such massive data brings the characterizing features of Big Data into
play, like the "5V's" volume, variety, velocity, variability, and veracity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The high
volume and the high complexity of data put massive demands on existing data
processing techniques. Additionally, evolving data streams and real-time requirements
intensify these demands even more [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Sensors typically generate continuous streams of
data. The term data stream refers to data that is continuously generated, typically at a high
rate [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. In fully automated industrial environments, obtaining information in
real time and reacting immediately become indispensable. In IIoT environments, Machine-to-Machine
(M2M) communication is highly significant [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Intelligent sensors and
devices do not merely send data but communicate with their environment and expect
immediate responses. In such IIoT environments, the traditional approach of taking a
snapshot of the entire data set and performing calculations with unpredictable response
times contrasts with the demand for real-time communication and the presence of
continuously flowing data streams [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To cope with such demands, self-adaptive algorithms
that continuously learn and improve their models are essential. In addition, such
algorithms should provide high performance and real-time behaviour, not only
when running on powerful cloud systems but also on fog and edge
systems or IoT devices [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>The methodological approach of this paper is a narrative review. The reviewed papers
were selected by the authors on a qualitative rather than a quantitative basis.
Papers covering the most current research on the topic of fast predictions in IIoT with
DL were given priority. There are many papers covering the topic of DL in (I)IoT. To
the best of our knowledge, however, there is no paper in the literature covering the specific topic of
PdM in connection with DL and (I)IoT.</p>
      <p>This review provides a classification of different DL approaches mentioned for use in
industry and IoT. It also covers the topics of real-time processing and data streams with
regard to the mentioned DL approaches. Techniques intended to improve the
real-time and stream processing abilities of the different approaches mentioned in the reviewed
papers are evaluated and classified. Special focus is set on the ability of the mentioned
approaches to provide predictions. The paper concludes with a summary and an outlook
on future developments.</p>
    </sec>
    <sec id="sec-2">
      <title>Deep Learning Approaches in the Industrial Internet of Things</title>
      <p>This section starts with a short introduction to DL and ANNs. A classification of
different DL methods mentioned for use in industry and IoT will then be
provided. The classification is based on the theoretical approaches, application areas, and
strengths and weaknesses with regard to the demands of PdM in IIoT environments. The
reviewed papers cover the topics of DL methods in Cyber-Physical Systems (CPS),
IoT and Industry 4.0 (I4.0), as well as the topics of real-time and data stream processing.</p>
      <p>
        DL can be defined as a subcategory of Machine Learning (ML), whereas ML is a
segment of the field of Artificial Intelligence (AI). DL itself is often defined as a class
of optimized ANNs comprising numerous layers (hidden layers). The high number of
layers and neurons allows the abstraction of more complex problems and supports
further capabilities such as unsupervised learning and automatic feature
extraction [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Examples are Deep Neural Networks (DNN), Deep Belief Networks
(DBN) or Recurrent Neural Networks (RNN).
      </p>
      <p>
        The basic idea behind an ANN is to imitate the biological neural networks in
mammalian brains. The components of an ANN are neurons (in ANNs often called nodes) and
connections between those nodes. The nodes are organized in layers producing
non-linear output data based on the input data. The connections between the nodes transfer
the output of one node to the input of another node. Weights assigned to each
connection determine the relevance of the transferred signal. As in biological neural
networks, the output signal of a neuron (node) is governed by a threshold function. To set up
an ANN, all weights have to be set to an initial value (often just simple estimates). By
training the network, those weights are adjusted in a holistic way following a defined
learning rate to achieve a valid and balanced network. This is also often referred to as
“connections developing over time with training". ANNs have been known for more than 50
years, and numerous variants have been developed since [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ].
      </p>
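      <p>The mechanics described above can be illustrated with a minimal sketch of a single artificial neuron: a weighted sum of inputs passed through a smooth threshold (sigmoid) function, with weights adjusted step by step following a learning rate. All weights, inputs and the learning rate here are hypothetical toy values, not taken from any reviewed paper.</p>

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs passed through a sigmoid, a smooth threshold function."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def train_step(inputs, weights, bias, target, lr=0.1):
    """One gradient step on a single example (squared-error loss).

    The initial weights are simple estimates; training nudges them
    toward values that produce the desired output.
    """
    out = neuron(inputs, weights, bias)
    grad = (out - target) * out * (1.0 - out)  # chain rule through the sigmoid
    new_w = [w - lr * grad * x for w, x in zip(weights, inputs)]
    new_b = bias - lr * grad
    return new_w, new_b

# Start from rough initial weights and train toward target output 1.0.
w, b = [0.5, -0.3], 0.0
for _ in range(1000):
    w, b = train_step([1.0, 2.0], w, b, target=1.0)
```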
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] the following DL models are listed for IoT applications: Auto-encoder (AE),
RNN, Restricted Boltzmann Machine (RBM), DBN, Long Short-Term Memory
(LSTM), Convolutional Neural Network (CNN), Variational Auto-encoder (VAE),
Generative Adversarial Network (GAN) and Ladder Net. The DL models are
categorized in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] into three main groups: generative approaches (AE, RBM, DBN,
VAE), discriminative approaches (RNN, LSTM, CNN) and hybrid approaches (GAN, Ladder
Net) as a combination of the former two. This categorisation
mainly refers to the underlying learning method: generative approaches
basically follow the principle of unsupervised learning, whereas discriminative approaches
follow the principle of supervised learning. Besides the definition of the required number
of layers (complexity), the underlying learning method is a decisive factor in the
selection of a DL approach. The categorization into generative and discriminative
approaches chosen by [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] can be found in essentially the same form in many other works. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]
the different DL models are also categorized by their suitability for IoT applications. The
relevant characteristics mentioned in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] are the ability to work with (partially)
unlabelled data (feature extraction, feature discovery), the required size of the training
dataset, dimensionality reduction abilities, the ability to deal with noisy data and
time-series data, and the general performance classification. For the reduction of
high-dimensional data and to cope with unlabelled data, [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] recommends the combination
of RNN with DBN and AE. If the system is meant to make predictions, as in PdM
systems, DBNs and AEs are often used as an upfront layer providing classified data to
a subsequent RNN [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>
        In the case of spatio-temporal data such as mobility data, RNNs are recommended
because they show good results when data develops in a sequential way. However, if the data
also comprises long-term dependencies, plain RNNs are not a good choice because they
do not memorize states and results over long periods [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. An approach to handling sequential
data streams from human mobility and transportation transition models containing
long-term dependencies (behaviours) is described in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The described solution is a
combination of RNN with LSTM in the form of a specialized RNN architecture.
Besides the ability to handle long-term dependencies, the LSTM also adds labelling and
predictive functionality to that combination. The combination of RNN with LSTM to
cope with data streams or time-series data comprising long-term dependencies (such as
certain behaviours or the wear and tear of machinery) can be found in many other works
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
      <p>
        The paper “IoT Data Analytics Using Deep Learning” [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] describes how to select
the right ANN to derive predictions from data streams and time-series data. To
retrieve trends and predictions, and to validate those trends and predictions in parallel
by anomaly detection, a combination of LSTM with Naive Bayes models is proposed.
The LSTM produces the predictions on data streams, whereas the Naive Bayes model
performs anomaly detection on the results of the LSTM.
      </p>
      <p>
        This paper also reflects on the fact that simple Feedforward ANNs (FNN) such as the
Single-Layer Perceptron (SLP) and the Multi-Layer Perceptron (MLP), using standard
backpropagation (BP) for training, are often not a good choice because they do not
perform well in complex situations and on data streams with long-term dependencies.
This is especially true when data streams comprise time-series data and the aim of the
model is to predict future events or trends. Data streams and time-series data usually
have dependencies over time. Such dependencies are typical for IoT data and provide
relevant insights. In simple ANNs, data moves straight through the layers under the
assumption that input data is independent of output data. Because of this, there is
no way to remember previous input and output states (previous results), which is a
drawback if previous data is linked to current data. Using RNNs instead can achieve better results
on data streams and time-series data. Because the connections between nodes in an
RNN form sequences or loops, it is possible to remember previous states.
To avoid exploding gradients, normally only a few states are remembered; therefore
only short-term dependencies are recognized. Because of this, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] recommends the
application of LSTM in complex IoT environments to recognize long-term
dependencies in the data. LSTMs are a variant of RNNs introducing memory units. Those
memory units are able to remember important previous states and forget the
unimportant ones [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
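      <p>The memory units just described can be sketched as follows: a single LSTM cell step in plain Python, in which a forget gate decides how much of the old cell state to keep, an input gate decides how much new information to write, and an output gate decides how much of the state to expose. This is purely illustrative; all weight values are hypothetical and scalar for readability.</p>

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM cell step for scalar input and state (hypothetical weights w).

    f: forget gate - how much of the old cell state survives
    i: input gate  - how much of the new candidate is written
    o: output gate - how much of the cell state is exposed as output
    """
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])
    c_tilde = math.tanh(w["wc"] * x + w["uc"] * h_prev + w["bc"])
    c = f * c_prev + i * c_tilde  # important old states persist when f is near 1
    h = o * math.tanh(c)
    return h, c

# Feed a short time series (e.g. a sensor data stream) through the cell.
w = {k: 0.5 for k in ("wf", "uf", "bf", "wi", "ui", "bi",
                      "wo", "uo", "bo", "wc", "uc", "bc")}
h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.2]:
    h, c = lstm_step(x, h, c, w)
```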
      <p>
        To predict the behaviour of energy systems in the manner of smart grids, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
remarks that more intelligent systems are necessary to produce accurate predictions of
future energy consumption. In the paper “Deep learning for estimating building
energy consumption” [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] it is stated that ANN-based prediction methods are a
promising approach because of their ability to handle massive and highly non-linear
time-series data coming from different heterogeneous data sources (e.g. smart meters) and
containing a lot of uncertainty (unlabelled data). In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] the authors benchmarked
two different variants of the RBM, namely the Conditional Restricted Boltzmann
Machine (CRBM) and the Factored Conditional Restricted Boltzmann Machine (FCRBM),
on a synthetic benchmark dataset. Based on this experiment, the authors come to the
conclusion that the FCRBM outperforms RNN, Support Vector Machine
(SVM) and CRBM because of its added factored conditional history layer. An
RBM is a stochastic ANN consisting of two layers, a visible layer and a hidden layer.
In simple terms, the visible layer of an RBM contains a node for each possible value in
the input data, whereas the hidden layer defines categories of values. Because in an
RBM each visible layer node is connected to every hidden layer node, an RBM is good at
feature classification, feature extraction and complexity reduction (by identifying the
most important features). For DL, RBMs can be stacked. In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] the RBM is extended by
a conditional history layer (CRBM), enabling it to detect long-term
dependencies in time-series data. Additionally, the output of one stacked CRBM layer is
factored (FCRBM) to reduce the number of possible compositions.
      </p>
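      <p>The two-layer structure described above can be sketched in a few lines: every visible node connects to every hidden node through a weight matrix, and each hidden node (a "category" of features) activates with a probability given by a sigmoid of its weighted input. The weights and the input vector below are hypothetical toy values, not from the benchmark in the cited paper.</p>

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_probs(visible, W, hbias):
    """Activation probability of each hidden unit given a binary visible vector.

    Every visible node i is connected to every hidden node j via W[i][j].
    """
    return [sigmoid(hbias[j] + sum(v * W[i][j] for i, v in enumerate(visible)))
            for j in range(len(hbias))]

def sample(probs):
    """Stochastic binary sample of the hidden layer (the RBM is stochastic)."""
    return [1 if random.random() < p else 0 for p in probs]

visible = [1, 0, 1]                          # binary input vector
W = [[0.2, -0.1], [0.4, 0.3], [-0.5, 0.6]]   # 3 visible x 2 hidden (hypothetical)
hbias = [0.0, 0.0]

p_h = hidden_probs(visible, W, hbias)        # feature activation probabilities
h = sample(p_h)                              # sampled hidden state
```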
      <p>
        Another paper in the field of energy management also emphasizes the powerful
forecasting abilities of DL. In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] the application of AE and LSTM is described
for predicting the power generation of solar systems. The accuracy reached by a
combination of AE and LSTM (Auto-LSTM) is compared to other neural networks
(namely MLP) as well as to a physical model. The benchmark data is taken from 21
real solar power plants, and the benchmark follows an experimental setup
described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The following measurements are taken as benchmarks: average
root-mean-square deviation (RMSD), average mean absolute error (MAE), average
absolute deviation (Abs. Dev.), average BIAS and average correlation. The measured
results show that all ANN- and DL-based models perform far better than the
physical model. Among all ANN- and DL-based models, Auto-LSTM is the best choice for
this specific scenario and this specific data set. The capability to extract features from
unlabelled data is mentioned as a decisive factor in making predictions.
      </p>
      <p>
        The paper “An enhancement deep feature fusion method for rotating machinery
fault diagnosis” [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] points out the strengths of AEs in feature extraction and feature
learning. The paper describes how to further improve the feature learning ability, with
reduced influence of background noise, by stacking a deep AE (noise reduction) and
a contractive AE (enhanced feature recognition), called the deep feature fusion method.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Fast Predictions using DL</title>
      <p>
        In many IoT applications, real-time processing is essential. For example, in a PdM
system, high latency could lead to unintentional reactive maintenance because of
insufficient lead time to plan the maintenance tasks [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. How fast real-time processing
needs to be strongly depends on the application case. According to [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], in micro
manufacturing systems, where vast volumes of micro parts are manufactured at high
speed, the term real-time means microseconds. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] shows that with systems for fault
detection and PdM, the rejection rate of the manufactured micro parts decreases with
increasing processing speed [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In other scenarios, the term real-time can mean
seconds, minutes or hours. For example, in PdM applications for offshore wind
turbines, the frequency with which the data is available is mostly minutes and hours [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        The paper “Metro Density Prediction with Recurrent Neural Network on Streaming
CDR Data” [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] describes the implementation of a real-time public transportation
crowd prediction system using a weight-sharing recurrent neural network in
combination with parallel streaming analytical programming. Fast response to emergent
situations (e.g. entrance records in metro stations combined with telecommunication
data) demands real-time analysis. The use of a powerful neural network model with
strong learning capability offers a wide range of new insights but contrasts with the
need for fast response times. The way to meet this goal is described in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] in three
steps: a) adapting an RNN model to improve its ability to work on data streams, b)
implementing strategies for the parallelization of RNNs and c) using parallel streaming
analytical algorithms on a cloud-based stream processing platform. In the project
described in [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] each metro station is modelled by an independent RNN. Shared
layers are introduced to share weights from stations which are in similar “situations”
(e.g. a downtown station during rush hour) across several models dynamically.
Weight-sharing also enables co-training in parallel [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        The application of RNNs and their many variants for fast data analytics is also
recommended in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Especially on typical sensor data such as serial data, time-series
data and data streams, RNNs can provide better performance than other models. Such
sensor data dominates in most PdM applications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In order to develop and permanently adapt models on massive data comprising
the behaviour of people, their spatial and temporal attributes, and transportation
capacities, real-time processing and real-time learning capabilities are
essential. The paper [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] describes a multi-task deep LSTM learning architecture. The
basic idea of this concept is not to use a joint feature vector but various LSTM tasks
separated by their domain (e.g. a separate task each for mobility prediction and for
transportation mode prediction). This architecture performs parallel learning, and the
results are aggregated depending on the intended insights [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        Assistance systems in cars, such as traffic sign recognition, must deliver accurate results
with low latency. The paper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] describes how to apply DNNs in this field. The model
of the system is continuously updated (online learning) and fed only with completely
unlabelled data (raw images). A CNN with 9 layers is used for image recognition. To
improve the performance of the system, max-pooling layers are combined with
convolutional layers in an alternating way. The convolutional layers perform convolution on
2D input pixel maps. The max-pooling layer works like a pre-processor between two
convolutional layers, transforming the output of a preceding convolutional layer into the
input of a subsequent convolutional layer by eliminating overlapping regions in the
pixel maps. This eliminates redundant processing in the complex and time-consuming
convolutional layers. The approach described in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is referred to as a Multi-Column
DNN (MCDNN).
      </p>
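      <p>The downsampling effect of a max-pooling layer can be illustrated with a short sketch: a non-overlapping 2x2 pooling pass keeps only the maximum of each block, so the subsequent convolutional layer has a quarter of the values to process. The feature map below is a hypothetical example, not data from the cited work.</p>

```python
def max_pool_2x2(pixel_map):
    """Downsample a 2D pixel map by keeping the maximum of each 2x2 block."""
    rows, cols = len(pixel_map), len(pixel_map[0])
    return [[max(pixel_map[r][c], pixel_map[r][c + 1],
                 pixel_map[r + 1][c], pixel_map[r + 1][c + 1])
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]

# A 4x4 feature map shrinks to 2x2; only the strongest activation
# of each block is passed on to the next convolutional layer.
fmap = [[1, 3, 2, 0],
        [4, 2, 1, 5],
        [0, 1, 8, 6],
        [2, 3, 4, 7]]
pooled = max_pool_2x2(fmap)  # [[4, 5], [3, 8]]
```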
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] describes a real-time-oriented solution for traffic sign detection and
recognition. The primary focus is on the need for parallel processing because diverse
traffic signs have to be detected at the same time. This approach also uses a CNN
for image processing, in combination with AdaBoost and parallel GPU processing
to improve performance.
      </p>
      <p>
        Because of their memory cells, LSTM models are a good choice if data comprises long-term
dependencies. If the data structure allows the separation of single entities with their
specific behaviour, as well as the formation of groups of entities, it may be
possible to process each entity and each group with its own neural network. This opens
up possibilities for parallel processing of the single neural networks. Normally, each
single, parallel-processed neural network provides its result to an aggregation layer
that combines all outputs into an overall result. The paper “A Hierarchical Deep Temporal
Model for Group Activity Recognition” [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] describes how to recognize situations in
a volleyball match. One LSTM model per player predicts the behaviour of this player,
remembering its previous behaviour in the match (long-term dependencies). Each
single situation of the match is then modelled as a group of the players. The LSTMs
are hierarchically ordered, with the LSTM models of all involved players
subordinated to a scene. The scenes and the players' behaviour are extracted from images
using a CNN [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
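      <p>The per-entity-plus-aggregation pattern described above can be sketched in a few lines: one tiny stateful model per entity (e.g. per player, or per machine in a PdM setting) processes the shared input stream in parallel, and an aggregation layer combines the per-entity outputs into an overall result. The "model" here is a deliberately trivial exponential-memory stand-in with hypothetical weights, not the hierarchical LSTM of the cited paper.</p>

```python
def make_entity_model(weight):
    """A toy stateful per-entity model that remembers a decaying history."""
    state = {"memory": 0.0}
    def step(x):
        state["memory"] = 0.8 * state["memory"] + 0.2 * x  # long-term memory
        return weight * state["memory"]
    return step

def aggregate(outputs):
    """Aggregation layer: combine the parallel per-entity results."""
    return sum(outputs) / len(outputs)

# One model per entity, all fed from the same data stream.
entities = [make_entity_model(w) for w in (0.9, 1.1, 1.0)]
stream = [1.0, 2.0, 3.0]
for x in stream:
    overall = aggregate([m(x) for m in entities])  # group-level result
```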
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] mentions that, because of the demands for real-time processing, the
organization of layers and connections has changed. Fully connected networks,
where each node of a layer is connected to all nodes of the subsequent layer, can
handle complex problems but also demand a lot of computing power. Dropping out all
connections that do not really influence the result is a strategy to reduce the complexity of a DL
network, and therefore its computing demand, without affecting accuracy in a relevant
manner. Besides dropout, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] also mentions max-pooling layers, batch normalization
and transfer learning as additional strategies for performance optimization.
      </p>
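      <p>The dropout strategy can be sketched as follows (the common "inverted dropout" formulation: randomly zero a fraction of activations during training and rescale the survivors so the expected output is unchanged, leaving inference untouched). The dropout rate and activation values below are hypothetical.</p>

```python
import random

random.seed(42)

def dropout(activations, rate=0.5, training=True):
    """Zero a random fraction of activations during training.

    Survivors are scaled by 1/(1-rate) so the expected sum stays the
    same and no extra scaling is needed at inference time.
    """
    if not training:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if random.random() < keep else 0.0 for a in activations]

layer_out = [0.3, 1.2, -0.7, 0.9, 0.1, -1.1]
thinned = dropout(layer_out, rate=0.5)  # roughly half the units are zeroed
```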
      <p>
        Despite all the mentioned papers discussing performance enhancements and
real-time abilities of DL models, [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] observes that highest accuracy still stands above all else
in almost all current DL projects. The paper “An Analysis of Deep Neural Network
Models for Practical Applications” [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] argues that numerous DL approaches
described in the literature are simply not suitable for practical use, for example
because of their long processing times or excessive power consumption. The authors
demand more attention to performance issues because these are key factors
in practical DL applications. The paper compares 14 different specific DL projects,
such as AlexNet or GoogLeNet, by comparing their accuracy, memory footprint,
parameters, operation counts, inference time and power consumption. The paper shows that
a small increase in accuracy leads to an enormous increase in computational power and
computation time. It is recommended to define a maximum energy consumption for
each DL project and adjust the accuracy accordingly [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper we provided a narrative review of selected literature applying DL
techniques in the field of IIoT to produce fast predictions of maintenance issues. The
papers have shown that the use of DL in IoT and PdM is a vital topic in industry. Many
different applications are in practical use and are constantly being developed and
improved.</p>
      <p>Frequently reported are combinations of different DL models that unite different
advantages and strengths in one application. The need for real-time processing of
complex data and data streams has also been demonstrated in certain application scenarios.
This includes in particular applications for predictions such as PdM. In order to
increase the real-time capability, concepts of parallel DL networks using a final
aggregation layer, or intermediate layers for the reduction of complexity, are frequently
used. Although many activities can be observed in the area of real-time processing of
DL models, there are also critical voices criticizing the absolute focus on accuracy
and calling for a greater focus on performance and on lighter applications suitable for
practical use. Almost all reports agree that a lot of research is still needed in this area.</p>
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Overview of the reviewed papers with the DL methods mentioned, their characteristics, and recommended application areas.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Paper</th>
              <th>DL methods</th>
              <th>Characteristics (strengths and weaknesses)</th>
              <th>Application areas</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref8">8</xref>]</td>
              <td>LSTM, RNN</td>
              <td>LSTM for data containing long-term dependencies, time-series and IoT data streams; LSTM adds labelling and predictive functionality in combination with RNN; RNN good for sequential data and data streams</td>
              <td>IoT, transport and mobility predictions because of long-term dependencies in data</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref9">9</xref>]</td>
              <td>LSTM, RNN</td>
              <td>LSTM and RNN suitable for time-series and IoT data streams</td>
              <td>RNN for short-term IoT applications like condition monitoring</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref10">10</xref>] Mocanu et al., 2016</td>
              <td>RBM, CRBM, FCRBM</td>
              <td>RBM for feature extraction, dimensionality reduction and classification; CRBM extends RBM with long-term predictions by adding a conditional history layer; FCRBM improves performance by reducing the number of possible compositions of each output layer in a stacked (C)RBM</td>
              <td>Predictive IoT applications, e.g. for smart cities or smart energy grids</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref11">11</xref>] Gensler et al., 2016</td>
              <td>DBN, Auto-LSTM</td>
              <td>DBN performs well for predictions on time-series data; Auto-LSTM (combination of AE and LSTM) for predictions on time-series data</td>
              <td>Predictive IoT applications like power generation forecasts</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref12">12</xref>] Shao et al., 2017</td>
              <td>AE</td>
              <td>Good for feature extraction, unsupervised learning, noise reduction and compression (relevant feature detection); often used as a pre-processing layer for complexity reduction; short-term dependencies only, not good for predictions</td>
              <td>IoT applications like fault diagnosis</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref15">15</xref>] Liang et al., 2016</td>
              <td>RNN</td>
              <td>Adapted RNNs used for data streams and weight-sharing, as well as co-training in parallel</td>
              <td>Applications running parallel RNNs with shared layers; cloud-based stream processing</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref16">16</xref>] Ciresan et al., 2012</td>
              <td>CNN</td>
              <td>Image recognition in real time in combination with max-pooling layers; good for short-term dependencies, not good for predictions</td>
              <td>Real-time and parallel processing IoT applications like traffic sign recognition</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref17">17</xref>] Lim et al., 2017</td>
              <td>CNN</td>
              <td>Image recognition in real time in combination with max-pooling layers; good for short-term dependencies, not good for predictions</td>
              <td>Real-time and parallel processing IoT applications like traffic sign recognition</td>
            </tr>
            <tr>
              <td>[<xref ref-type="bibr" rid="ref18">18</xref>] Ibrahim et al., 2016</td>
              <td>CNN, LSTM</td>
              <td>CNN for image recognition; LSTM for predictions considering long-term dependencies; hierarchical LSTM model for individuals and group behaviours</td>
              <td>Recognition of individuals and groups, e.g. to determine current behaviour or dynamics</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        Table 1 gives an overview of the reviewed papers with the DL methods
mentioned. For each paper, the characteristics (or strengths and weaknesses) as well as the
recommended application areas (such as predictions) of the DL methods mentioned in
the corresponding paper are summarized. Table 1 makes no statement regarding the
validity of results in a quantitative way. The categorisation of the different DL models
is made only in a qualitative way, because among all reviewed papers concrete measured
values are reported only in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. All other papers solely provide qualitative
statements. How to measure and evaluate the validity and quality of the results of
different DL methods is an open question [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. So far, few approaches for measuring,
evaluating and benchmarking have been developed. Moreover, those approaches are
usually not verifiably valid in general. For instance, in the case of classifications,
accuracy estimation techniques such as the "holdout method" or "n-fold
cross-validation" can be used to evaluate performance, predictive ability and model
accuracy [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Such techniques divide a data set, via varying approaches,
into areas for learning and for validation. For most models, no measuring, evaluating
and benchmarking concept has yet been defined; in general, the evaluation is done
by expert opinions [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The paper [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] points out that there is a demand for
improved measuring and benchmarking methods. Proven measurement methods to
generate representative benchmarks are needed in order to be able to assess DL models.
      </p>
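      <p>
        The "holdout method" and "n-fold cross-validation" mentioned above can be sketched in a few lines of plain Python. The toy majority-class model and all names below are our own illustration, not taken from [20].
      </p>

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 deterministically and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, predict, k=5):
    """Mean accuracy over k train/validation splits (n-fold cross-validation)."""
    folds = kfold_indices(len(xs), k)
    accuracies = []
    for fold in folds:
        held_out = set(fold)  # this fold validates; the rest trains
        train_x = [xs[j] for j in range(len(xs)) if j not in held_out]
        train_y = [ys[j] for j in range(len(ys)) if j not in held_out]
        model = fit(train_x, train_y)
        hits = sum(predict(model, xs[j]) == ys[j] for j in fold)
        accuracies.append(hits / len(fold))
    return sum(accuracies) / k

# Toy data and a majority-class "model" that always predicts the most
# frequent training label (a stand-in for a real classifier).
xs = list(range(20))
ys = [0] * 15 + [1] * 5
fit = lambda X, Y: max(set(Y), key=Y.count)
predict = lambda model, x: model
print(cross_validate(xs, ys, fit, predict, k=5))  # 0.75, the majority-class base rate
```

      <p>
        A single holdout split corresponds to training on all folds but one and validating once, rather than averaging over all k rotations.
      </p>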
      <p>
        The papers [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] to [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] are not part of Table 1 because they serve as references for basic statements and explanations made in this paper and do not themselves focus on DL methods and techniques.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Pusala</surname>
          </string-name>
          ,
          <string-name>
            <surname>Murali</surname>
          </string-name>
          , et al.
          <year>2016</year>
          .
          <article-title>Massive Data Analysis: Tasks, Tools, Applications and Challenges. Big Data Analytics</article-title>
          . s.l. : Springer Verlag,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2. Zhang, Liangwei, et al.
          <year>2017</year>
          .
          <article-title>Sliding Window-Based Fault Detection From High-Dimensional Data Streams</article-title>
          .
          <source>IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS</source>
          .
          <year>2017</year>
          , Vol.
          <volume>47</volume>
          , 2.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Krawczyk</surname>
          </string-name>
          , Bartosz and Wozniak, Michal.
          <year>2015</year>
          .
          <article-title>Data stream classification and big data analytics</article-title>
          .
          <source>Neurocomputing</source>
          .
          <year>2015</year>
          ,
          <volume>150</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ait-Alla</surname>
          </string-name>
          , Abderrahim, et al.
          <year>2015</year>
          .
          <article-title>Real-time fault detection for advanced maintenance of sustainable technical systems</article-title>
          .
          <source>Procedia CIRP</source>
          .
          <year>2015</year>
          ,
          <volume>41</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Bauer</surname>
          </string-name>
          , Dennis, Stock, Daniel and Bauernhansl, Thomas.
          <year>2017</year>
          .
          <article-title>Movement towards service-orientation and app-orientation in manufacturing IT</article-title>
          .
          <source>10th CIRP Conference on Intelligent Computation in Manufacturing Engineering - CIRP ICME '16</source>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Mohammadi</surname>
          </string-name>
          ,
          <string-name>
            <surname>Mehdi</surname>
          </string-name>
          , et al.
          <year>2018</year>
          .
          <article-title>Deep Learning for IoT Big Data and Streaming Analytics: A Survey</article-title>
          ,
          <source>IEEE COMMUNICATIONS SURVEYS &amp; TUTORIALS</source>
          , arXiv:1712.04301v2
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>L. N.</given-names>
          </string-name>
          , et al.
          <year>2015</year>
          .
          <article-title>Risk Perceptions for Wearable Devices</article-title>
          . Cornell University Library. [Online]
          <year>2015</year>
          . http://arxiv.org/pdf/1504.05694.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <surname>Xuan</surname>
          </string-name>
          , et al.
          <year>2016</year>
          ,
          <article-title>DeepTransport: Prediction and Simulation of Human Mobility and Transportation Mode at a Citywide Level, Center for Spatial Information Science</article-title>
          , The University of Tokyo, Japan
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <surname>Xiaofeng</surname>
          </string-name>
          , et al.
          <year>2017</year>
          ,
          <article-title>IoT Data Analytics Using Deep Learning, Key Laboratory for Embedded and Networking Computing of Hunan Province</article-title>
          , Hunan University.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Mocanu</surname>
          </string-name>
          ,
          <string-name>
            <surname>Elena</surname>
          </string-name>
          , et al.
          <year>2016</year>
          ,
          <article-title>Deep learning for estimating building energy consumption</article-title>
          , Department of Electrical Engineering, Eindhoven University of Technology, The Netherlands
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Gensler</surname>
          </string-name>
          ,
          <string-name>
            <surname>André</surname>
          </string-name>
          , et al.
          <year>2016</year>
          ,
          <article-title>Deep Learning for Solar Power Forecasting - An Approach Using Autoencoder and LSTM Neural Networks</article-title>
          ,
          <source>2016 IEEE International Conference on Systems, Man, and Cybernetics • SMC 2016 | October 9-12</source>
          ,
          <year>2016</year>
          • Budapest, Hungary
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <surname>Haidong</surname>
          </string-name>
          , et al.
          <year>2017</year>
          ,
          <article-title>An enhancement deep feature fusion method for rotating machinery fault diagnosis</article-title>
          , School of Aeronautics, Northwestern Polytechnical University, 710072 Xi'an, China
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. Rippel, Daniel, Lütjen, Michael and Freitag, Michael.
          <year>2015</year>
          .
          <article-title>SIMULATION OF MAINTENANCE ACTIVITIES FOR MICRO-MANUFACTURING SYSTEMS BY USE OF PREDICTIVE QUALITY CONTROL CHARTS</article-title>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Freitag</surname>
          </string-name>
          ,
          <string-name>
            <surname>Michael</surname>
          </string-name>
          , et al.
          <year>2015</year>
          .
          <article-title>A Concept for the Dynamic Adjustment of Maintenance Intervals by Analysing Heterogeneous Data</article-title>
          .
          <source>Applied Mechanics and Materials</source>
          .
          <volume>794</volume>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Liang</surname>
          </string-name>
          , Victor C., et al.
          <year>2016</year>
          ,
          <article-title>Mercury: Metro Density Prediction with Recurrent Neural Network on Streaming CDR Data</article-title>
          ,
          <source>ICDE 2016 Conference 978-1-5090-2020-1/16</source>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ciresan</surname>
          </string-name>
          ,
          <string-name>
            <surname>Dan</surname>
          </string-name>
          , et al.
          <year>2012</year>
          ,
          <article-title>Multi-Column Deep Neural Network for Traffic Sign Classification, IDSIA - USI - SUPSI | Galleria 2</article-title>
          , Manno - Lugano 6928, Switzerland
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Lim</surname>
          </string-name>
          ,
          <string-name>
            <surname>Kwangyong</surname>
          </string-name>
          , et al.
          <year>2017</year>
          ,
          <article-title>Real-time traffic sign recognition based on a general purpose GPU and deep-learning</article-title>
          , Department of Computer Science, Yonsei University, 50 Yonsei-ro Seodaemun-gu, Seoul, Republic of Korea
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Ibrahim</surname>
          </string-name>
          , Mostafa S., et al.
          <year>2016</year>
          ,
          <article-title>A Hierarchical Deep Temporal Model for Group Activity Recognition</article-title>
          , School of Computing Science, Simon Fraser University, Burnaby, Canada
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Canziani</surname>
          </string-name>
          ,
          <string-name>
            <surname>Alfredo</surname>
          </string-name>
          , et al.
          <year>2016</year>
          ,
          <article-title>AN ANALYSIS OF DEEP NEURAL NETWORK MODELS FOR PRACTICAL APPLICATIONS</article-title>
          , Weldon School of Biomedical Engineering, Purdue University; Faculty of Mathematics, Informatics and Mechanics, University of Warsaw; arXiv:1605.07678v4
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Krawczyk</surname>
          </string-name>
          , Bartosz and Wozniak, Michal.
          <year>2015</year>
          .
          <article-title>Data stream classification and big data analytics</article-title>
          .
          <source>Neurocomputing</source>
          .
          <year>2015</year>
          ,
          <volume>150</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Bhatia</surname>
          </string-name>
          ,
          <string-name>
            <surname>Nidhi</surname>
          </string-name>
          , et al. (
          <year>2015</year>
          ),
          <article-title>Deep Learning Techniques and its Various Algorithms and Techniques</article-title>
          ,
          <source>International Journal of Engineering Innovation &amp; Research</source>
          , Volume
          <volume>4</volume>
          , Issue 5, ISSN: 2277-5668
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Chatfield</surname>
          </string-name>
          , Ken, et al. (
          <year>2014</year>
          ),
          <article-title>Return of the Devil in the Details: Delving Deep into Convolutional Nets</article-title>
          , Visual Geometry Group, Department of Engineering Science, University of Oxford, arXiv:1405.3531v4
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23. DEVCOONS Website, http://www.devcoons.com/literature-review-of-deep-machinelearning-for-feature-extraction/
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>