<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>(A. B. Diallo);
nakagawa@ist.osaka-u.ac.jp (H. Nakagawa);
t-tutiya@ist.osaka-u.ac.jp (T. Tsuchiya)</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Preemptive Anomaly Prediction in IoT Components</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alhassan Boner Diallo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hiroyuki Nakagawa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tatsuhiro Tsuchiya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Osaka University</institution>
          ,
          <addr-line>Osaka</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0001</lpage>
      <abstract>
        <p>The Internet of Things (IoT) has become a very promising and fruitful area of research. The rapid development of IoT is revolutionizing the way we use technology in our daily lives. In the IoT paradigm, the devices making up an IoT system have resource constraints in terms of storage, computing, and energy consumption. This paradigm enables flexible and pervasive communication between devices that are bound to low resources. These constraints may lead to anomalies at the component level that can impact the whole system. Some innovative techniques have been proposed to quantify the reliability of these devices under the aforementioned constraints. However, there is a gap between the quantification of component reliability and the predictive and preemptive maintenance of these components. In this study, we propose an approach combining reliability quantification and reinforcement learning to build a mechanism that can achieve predictive maintenance for the components of an IoT system, such as devices and links. In the approach, a component-level mechanism is built to synthesize the reliability data and to determine the probability of anomaly occurrence for each component. The approach is being applied to DeltaIoT, a self-adaptive IoT system for smart environment monitoring.</p>
      </abstract>
      <kwd-group>
        <kwd>self-adaptive systems</kwd>
        <kwd>IoT</kwd>
        <kwd>reliability</kwd>
        <kwd>reinforcement learning</kwd>
        <kwd>q-learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Recently, the Internet of Things (IoT) has been one of the fastest growing fields in the computing domain. Its paradigm has been applied to many critical applications, such as early warning systems for earthquakes or tsunamis, smart home security, traffic management, healthcare, and education systems. Despite rapid development and improvement in the IoT research area, many challenges remain. The challenges faced in IoT are mainly related to the following properties: scalability, availability, reliability, interoperability, security, mobility, and performance.</p>
      <p>The IoT infrastructure is made up of low-resource devices, meaning that they have low storage and low computing power compared to other devices within the computing domain. This is the result of the desire to limit energy consumption, as most of the components rely on batteries for power [1][2]. Nowadays, the IoT paradigm is applied to many mission-critical systems, such as factory management, personal body sensors in healthcare, and surveillance systems in nuclear power plants. These areas of application require a failure-free system; otherwise, there will be disastrous consequences.</p>
      <p>We must be able to trust these systems in all conditions, as they impact the way we make numerous decisions based on the data they collect and provide. The reliability of IoT systems depends on the reliability of the components that make up the system. As IoT devices are constrained by nature, there must be some mechanism in place to ensure their reliability at all times, in order to have accurate decision models based on the data provided by the lower layer of the IoT architecture.</p>
      <p>IoT reliability is a critical domain of research that has seen many important contributions over the years. Multiple ways of quantifying the reliability of IoT components have been proposed. However, there is a gap between that quantified reliability and its application in predictive maintenance. In other words, how can we predict an accurate maintenance date for IoT components based on the reliability measurement? To achieve that, we must first build mechanisms that can synthesize the reliability information from anomalies to determine whether the system has become less reliable after an anomaly occurrence. The ability to reason about the quantified reliability of the IoT system is a valuable step towards achieving predictive maintenance. The idea here is to build a dynamic decision-making process that collects reliability data periodically and tries to estimate a future failure time.</p>
      <p>Fundamentally, we can define reliability as the study of failures. The reliability of a system or a computing device is its quality over a certain period of time. To quantify the reliability of a system or computing device, we use standard time-related metrics such as Mean Time To Failure (MTTF), Mean Time Between Failures (MTBF), and Mean Time To Repair (MTTR). Quantifying reliability is essential to assessing the continued success in the operation of an information system or a computing device.</p>
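      <p>To make the time-based metrics above concrete, the following minimal Python sketch derives MTTF, MTTR, and MTBF from a chronological log of outages of one repairable device. The Outage record, the reliability_metrics function, the log format, and the convention MTBF = MTTF + MTTR (with steady-state availability MTTF/MTBF) are illustrative assumptions, not part of the approach described in this paper.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Outage:
    """One failure of a repairable device (times in seconds since start-up)."""
    failed_at: float
    repaired_at: float


def reliability_metrics(outages: list[Outage]) -> dict[str, float]:
    """Compute MTTF, MTTR, MTBF and availability from a chronological outage log.

    Assumes the device starts operating at time 0 and outages do not overlap.
    """
    if not outages:
        raise ValueError("at least one outage is needed to estimate the metrics")
    up_times: list[float] = []      # operating periods that ended in a failure
    repair_times: list[float] = []  # down periods until normal operation resumed
    back_up = 0.0                   # start of the current operating period
    for outage in outages:
        up_times.append(outage.failed_at - back_up)
        repair_times.append(outage.repaired_at - outage.failed_at)
        back_up = outage.repaired_at
    mttf = mean(up_times)
    mttr = mean(repair_times)
    mtbf = mttf + mttr  # common convention for repairable devices
    return {"MTTF": mttf, "MTTR": mttr, "MTBF": mtbf, "availability": mttf / mtbf}


log = [Outage(100, 110), Outage(310, 320), Outage(620, 630)]
print(reliability_metrics(log))
```
]]></preformat>
      <p>For example, outages at 100–110 s, 310–320 s, and 620–630 s give an MTTF of 200 s, an MTTR of 10 s, an MTBF of 210 s, and an availability of about 0.95.</p>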
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Computing systems require a high degree of performance and availability, but above all, they must be reliable. The appropriate way of assessing the reliability of a computing system depends on the type and mission of the system. In their study, Xie et al. [3] addressed several key metrics for reliability quantification, among them the Mean Time To Failure (MTTF), the Mean Time Between Failures (MTBF), and the failure rate. The MTTF metric quantifies the expected operating time of a system before the occurrence of a failure. The MTBF metric, as the name indicates, quantifies the operating time from one failure occurrence to the next. The failure rate function quantifies the failures of a system within a specified window of time. The maintainability metric quantifies the probability that a system can return to normal operation after the occurrence of an anomaly or a failure. The availability metric quantifies the probability that the system is operating normally at a given time.</p>
      <p>The methods and techniques to analyze the reliability of computing systems depend on the domains that make up the system. There are mainly four domains, or levels: system, hardware, software, and network. The assessment of reliability at the system level results from the combined assessment of the hardware, software, and network levels. In the hardware domain, reliability assessment is related to the decay in quality of the physical components of the computing system over time. In the software domain, according to the study in [4], there is no concern over a physical decay of quality over time. As for network reliability, a network may suffer a decrease in performance over time due to internal and external factors affecting the hardware and software that make it up.</p>
      <p>In the case of IoT systems, reliability can be assessed by quantifying the reliability of the different layers of their architecture. In [5], the IoT functionalities are grouped into sensing and actuation, communication, and end-user applications and services. A basic architecture of an IoT system can be divided into three layers: a device layer, a network layer, and an application layer. The device layer is responsible for sensing and actuation. At the device level, reliability is constrained by the battery life and by the low capacity of both the memory and the CPU, which prevents devices from using complex encryption to protect the transmitted data [6]. Device reliability is further constrained by false reading events, which are common for sensors, when they collect and transmit data erroneously after an undetected failure [7].</p>
      <p>The network layer is responsible for the communication between the devices of the system. The application layer is responsible for the services and the interactions with the end-user applications. In most cases, its reliability depends on the reliability of the device layer and the network layer. For example, when the device layer collects and transmits anomalous data, those data are sent through the network to the application layer. Beyond being able to reason about the fitness of our IoT devices, we must also be able to attest to the reliability of the network infrastructure that forms the backbone of IoT communication. Network reliability studies follow two approaches: studies for enhancing QoS in networks, and studies aimed at quantifying reliability metrics for networks. Some research has also been conducted to evaluate IoT reliability at the system level. These approaches work at a high level and do not capture individual reliability details, such as which devices are responsible for failures or which parts of the network are responsible for traffic problems.</p>
      <p>3. Focus: Anomaly Prediction</p>
      <p>In our study, we classify anomalies according to where and how frequently they occur. Anomalies can occur on each layer of the architecture with different degrees of frequency. The device layer and the network layer are where anomalies occur most, whereas the application layer is less prone to anomalies. As for the frequency of occurrence, we consider two main forms of anomalies in IoT components: cyclic anomalies and random anomalies. Cyclic anomalies are linked to the nature of the component itself. Each component has a starting time and an ending time. The probability of anomaly occurrence is very small when the reliability is quantified close to the starting time; on the other hand, it is high when quantified towards the ending time. Random anomalies stem from random external as well as internal factors, such as noise or interference.</p>
      <p>Our approach combines reliability quantification and machine learning to solve the problem of predictive maintenance for the aforementioned anomalies. Reliability quantification is achieved using the metrics introduced in [3]. Even though the concept of component anomalies is mentioned throughout this paper, detecting anomalies is not the main focus of this study. In their review of IoT reliability and anomaly detection techniques, Moore et al. [8] noted that no study had explored the potential of synthesizing quantified reliability data. The study pointed out that a decrease in the reliability of a smart home system has different consequences than a decrease in the reliability of a power plant surveillance system. A decrease in the reliability of an IoT system increases the probability of anomaly occurrence within the system. As stated in the background, each layer of the IoT architecture has its own way of assessing reliability. In this study, we mainly cover anomaly occurrence at the device layer and the network layer of the architecture. The main goal of our study is to enable the IoT system to achieve predictive maintenance, i.e., to predict a probable failure time of one or more components and preemptively apply corrections to them, based on their quantified reliability. Based on this goal, we include in the study components to which corrections can be applied after a failure or an anomaly. Therefore, some components of the device layer, such as the battery, the memory, and the CPU, are out of the scope of this paper, because they cannot be automatically maintained after a failure or an anomaly. Once their reliability has decreased or a failure has occurred, these components require a human-in-the-loop for maintenance.</p>
      <p>Other components of an IoT system can be calibrated after a decrease in reliability or the occurrence of an anomaly, such as sensors at the device layer or links at the network layer. Therefore, our approach is applied to the sensor devices and the network links in order to achieve predictive maintenance.</p>
      <p>Ignoring undiagnosed anomalous data within the different layers of the IoT architecture has consequences. Therefore, to decrease the vulnerability of IoT-centered systems, there is a need to design lightweight solutions that are capable of handling anomaly detection tasks without impacting the resource-constrained systems.</p>
      <p>4. Motivating Example: DeltaIoT</p>
      <p>In this section, we describe the motivating example of our research, which is a self-adaptive IoT system named DeltaIoT [9]. Self-adaptive systems are able to modify their behavior at runtime, in response to a change in their operating environment, to achieve their goals. In this research, the study is not only about engineering reactive self-adaptivity; it is also about designing robust IoT systems that are subjected to environmental changes. A typical IoT network system is composed of devices with different types of sensors and actuators, usually linked together wirelessly through the internet [2]. The concept of the Internet of Things enables devices to operate under the constraints of energy consumption, low computing power, and low storage capacity. The networks connecting the devices are also prone to congestion, especially when there is a burst in demand, e.g., during an emergency situation for a system deployed to monitor large geographical areas to detect potential disasters as early as possible [10]. All these constraints make the engineering of dependable and reliable IoT systems more challenging. The next paragraph introduces the IoT system used in the case study of applying our approach.</p>
      <p>The DeltaIoT system is a platform for smart environment monitoring. The system, introduced in [9], is a self-adaptive system, which enables it to react to environmental changes. The DeltaIoT system “enables researchers to evaluate and compare new methods, techniques and tools for self-adaptation in Internet of Things”. The DeltaIoT system has been built in two versions, both deployed at the campus of KU Leuven. The two versions differ in the number of devices present in each network and in the geographical deployment of each version of the system. The DeltaIoT system is described in Figure 1. DeltaIoT has a multihop communication system that operates in cycles of 570 seconds. The system experiences external and internal stimuli that cause it to change its behavior to achieve its goals. There are two main causes for adaptation. The first cause is interference in the network, which causes the links to experience delay or packet loss. The second cause is the fluctuating load of messages, which can clog some or all links, creating delay and packet loss.</p>
      <p>There are three quality requirements that the system must fulfil. The first quality requirement concerns the average packet loss over 12 hours, which should not exceed 10% of the overall messages sent through the links. The second quality requirement concerns the average latency over 12 hours, which should not exceed 5% of the cycle time. The third quality requirement concerns the average energy consumption over 12 hours, which has to be minimized during that period.</p>
      <p>One of the main missions of Internet of Things systems is to collect and communicate data about the environment or the people around which they are deployed. DeltaIoT, like many other IoT systems, alternates sensing and actuation during its operation. In many cases, the actuation is performed based on the results of the sensing. Therefore, anomalies during data sensing and during data communication may have a negative effect on the system performance or operation. Anomalous data are typically collected at the device level by the sensors, for different reasons such as noise or defects due to environmental factors. When this happens, the sensors can be calibrated again to perform with high accuracy. Anomalies occurring on the links of the DeltaIoT system are related to a decrease in QoS; packet loss and latency are some of the manifestations of these anomalies.</p>
      <p>We have previously presented a mechanism for efficient configuration space reduction [11]. That mechanism focused on the analysis after an anomaly has happened at the component level. In this paper, the main focus of the study is to forecast an anomaly before it happens. It is important to reduce the time between anomaly occurrence and detection, and it is equally important to minimize the time from anomaly detection to correction. Moreover, a precise understanding of anomalies helps to construct a more precise probabilistic model of the system, which in turn helps to find more reliable configurations of the system using probabilistic model checking [12]. Many anomaly detection techniques have been proposed for computing devices in general, each with its advantages and drawbacks. However, techniques for anomaly forecasting are few. In the Internet of Things domain, to the best of our knowledge, our study is the only one that uses reliability quantification and machine learning to predict anomaly occurrence. As explained in the next section, if the time of anomaly occurrence can be predicted, then corrective measures can be applied in order to prevent the anomaly from happening.</p>
      <p>5. Approach</p>
      <p>In this section, we describe our approach and its practical implementation in detail. The goal of the approach is to determine a high-probability failure time or anomaly occurrence time in order to apply corrective measures. We build two mechanisms. The first mechanism operates at the component level, that is, the level of devices and links. It captures the behavior of each individual component and computes its reliability. The second mechanism operates at the level of the MAPE feedback loop. MAPE stands for Monitor, Analyzer, Planner, and Executor. The feedback loop is used in autonomic computing to achieve self-adaptation in software systems [13]. The system-level mechanism is connected to the monitor component of the feedback loop.</p>
      <p>The backbone of the component-level mechanism is an anomaly agent that is instantiated for each component of the IoT system. The quantified reliability is determined using mainly two metrics: the mean time between anomalies and the anomaly rate. The function of the anomaly agent is to predict an accurate anomaly time, depending on the quantified reliability of the component. It behaves according to the principles of reinforcement learning and is rewarded for an accurate prediction of the anomaly time. Figure 2 illustrates the component-level mechanism of the approach. According to [14], “reinforcement learning is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward”. The main motivation for using reinforcement learning is to record the different states of the system and their transitions [15][16]. The system has an optimal state, in which the reliability of each component is high; an in-between state, in which the reliability of the component is only average; and a critical state, in which an anomaly has already occurred or is very likely to occur. Capturing these different states and reasoning about them can help to discover an optimal time for predictive maintenance. To implement the anomaly agent, we use an approach that relies on Temporal-Difference learning [17]. The agent is implemented according to a Q-learning algorithm [18]. The approach is well suited for situations with a high degree of randomness and uncertainty. In the next subsections, we explain the two mechanisms in detail.</p>
      <p>Figure 2: Overview of the component-level mechanism.</p>
      <p>5.1. Component-level mechanism</p>
      <p>We formalize the prediction problem of the anomaly agent as a Markov Decision Process (MDP). The component, which is the environment interacting with the agent, is modeled as a Markov process. The Q-learning algorithm used to create the agent was chosen because it is a model-free, off-policy, and value-based algorithm.</p>
      <p>The MDP describing the environment for the learning process is a tuple of four elements. The first element is a finite set of states S. The second element is a finite set of actions A; the number of states depends on the number of actions. For each run, an action of the agent consists of adding an integer value to the current time and checking whether the resulting time corresponds to the anomaly time. The third element of the tuple is the reward R received after transitioning from a state s to a state s′ as a result of performing an action. The fourth element is the probability P of that transition under the performed action.</p>
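      <p>The four-element tuple can be mirrored directly in code. The minimal Python sketch below is only an illustration under stated assumptions: the states are the three reliability states described earlier for the anomaly agent (optimal, in-between, and critical), selected with hypothetical probability thresholds; the actions are a hypothetical set of integer offsets, in seconds, added to the current time; and the reward is an assumed +1 for a prediction within a tolerance of the observed anomaly time and −1 otherwise. The transition probability P is not represented explicitly, because Q-learning is model-free and learns from sampled transitions instead.</p>
      <preformat><![CDATA[
```python
from enum import Enum


class ReliabilityState(Enum):
    """States of a component as seen by the anomaly agent (illustrative names)."""
    OPTIMAL = 0     # reliability of the component is high
    IN_BETWEEN = 1  # reliability of the component is average
    CRITICAL = 2    # an anomaly has occurred or is very likely to occur


# Hypothetical action set A: seconds added to the current time.
ACTIONS = (0, 30, 60, 120, 300)


def state_from_probability(pa: float, low: float = 0.3, high: float = 0.7) -> ReliabilityState:
    """Map a probability of anomaly to a state; the thresholds are assumptions."""
    if pa >= high:
        return ReliabilityState.CRITICAL
    if pa >= low:
        return ReliabilityState.IN_BETWEEN
    return ReliabilityState.OPTIMAL


def reward(current_time: int, offset: int, anomaly_time: int, tolerance: int = 10) -> float:
    """Reward R for one run: add `offset` to the current time and compare the result
    with the observed anomaly time. The +1/-1 shape and the tolerance are assumptions."""
    predicted = current_time + offset
    return 1.0 if abs(predicted - anomaly_time) <= tolerance else -1.0


# One run: the component looks critical, and the agent predicts 60 s ahead.
state = state_from_probability(0.82)
print(state, reward(current_time=1000, offset=60, anomaly_time=1065))
```
]]></preformat>
      <p>A graded penalty proportional to the prediction error would be an equally plausible reward shape; the binary version simply keeps the example short.</p>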
      <p>The Q in Q-learning is a measure of the quality of a state-action combination. When a learning agent takes an action, the reward of that action, together with the learning rate, the discount factor, and the initial or previous value of Q, is used to determine the new value of Q for that state-action combination.</p>
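      <p>The update just described corresponds to the tabular Q-learning rule of Watkins and Dayan [18]. The minimal Python sketch below stores the Q values in a dictionary keyed by (state, action) pairs, all starting at 0, and applies that rule after each prediction. The epsilon-greedy action selection, the hyperparameter values, and the string state names in the example are assumptions for illustration, not settings reported for the anomaly agent.</p>
      <preformat><![CDATA[
```python
import random
from collections import defaultdict
from typing import Hashable, Iterable


class QLearningAgent:
    """Minimal tabular Q-learning agent."""

    def __init__(self, actions: Iterable[int], alpha: float = 0.1,
                 gamma: float = 0.9, epsilon: float = 0.1, seed: int = 0):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of exploring a random action
        self.q: defaultdict = defaultdict(float)  # Q[(state, action)], initially 0.0
        self.rng = random.Random(seed)

    def choose(self, state: Hashable) -> int:
        """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state: Hashable, action: int, reward: float, next_state: Hashable) -> float:
        """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); return the new value."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error
        return self.q[(state, action)]


agent = QLearningAgent(actions=[0, 60, 300], alpha=0.5, gamma=0.9, epsilon=0.0)
agent.update("critical", 60, 1.0, "optimal")   # Q = 0.5
agent.update("critical", 60, 1.0, "optimal")   # Q = 0.75
print(agent.choose("critical"))                # greedy choice: 60
```
]]></preformat>
      <p>Because the update uses the best next action rather than the action actually taken next, the agent is off-policy, which matches the reason given above for choosing Q-learning.</p>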
      <disp-formula>
        <label>(4)</label>
        <tex-math>Q_{t}(s,a) = Q_{t-1}(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q_{t-1}(s,a) \right]</tex-math>
      </disp-formula>
      <p>In Equation (4), α is the learning rate, γ is the discount factor, R is the reward received after taking action a in state s, s′ is the resulting state, and the maximum is taken over the actions a′ available in s′.</p>
      <p>The network of most IoT systems is composed of several heterogeneous devices, which sometimes possess different characteristics that can hinder their interoperability. Therefore, when designing a mechanism for anomaly prediction, each individual component of the network must have a self-centered module that captures its unique characteristics. The component-level mechanism is illustrated in Figure 2. The mechanism has two main parts. The first part is a reliability quantification algorithm, in which the reliability of the component is quantified based on the previously mentioned metrics.</p>
      <p>The IoT system operates in an environment where the quality of its components degrades over time. Some components, like the sensors and the network links, can be calibrated back to normal. However, in most cases the physical aspect of the system cannot be calibrated; therefore, that aspect is out of the scope of this study.</p>
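      <p>Because each component keeps its own self-centered module, the reliability quantification part can be sketched as one small object per sensor or link. The minimal Python sketch below assumes, as the names of the previously mentioned metrics suggest, that the anomaly rate of a cycle is the number of anomalies divided by the cycle time, and that the mean time between anomalies is the reciprocal of the anomaly rate accumulated over all observed cycles. The class name, the component identifier, and the use of DeltaIoT's 570-second cycle in the example are illustrative assumptions.</p>
      <preformat><![CDATA[
```python
import math
from dataclasses import dataclass, field


@dataclass
class ComponentReliability:
    """Self-centered reliability module kept by one component (sensor or link)."""
    name: str
    cycle_rates: list[float] = field(default_factory=list)  # anomaly rate of each cycle
    total_anomalies: int = 0
    total_time: float = 0.0

    def record_cycle(self, anomalies: int, cycle_time: float) -> float:
        """Store one cycle and return its anomaly rate (anomalies per second)."""
        if cycle_time <= 0:
            raise ValueError("cycle_time must be positive")
        rate = anomalies / cycle_time
        self.cycle_rates.append(rate)
        self.total_anomalies += anomalies
        self.total_time += cycle_time
        return rate

    @property
    def anomaly_rate(self) -> float:
        """Anomaly rate over every cycle recorded so far."""
        return self.total_anomalies / self.total_time if self.total_time else 0.0

    @property
    def mtba(self) -> float:
        """Mean time between anomalies in seconds; infinite until an anomaly is seen."""
        rate = self.anomaly_rate
        return 1.0 / rate if rate else math.inf


link = ComponentReliability("link-7")  # hypothetical component identifier
for anomalies in (2, 0, 1):            # anomalies observed in three 570 s cycles
    link.record_cycle(anomalies, 570.0)
print(link.anomaly_rate, link.mtba)     # MTBA is about 570 s
```
]]></preformat>
      <p>Keeping running totals, rather than averaging the per-cycle rates, weights every cycle by its length, which matters if cycle times ever differ.</p>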
      <p>We track each component based on three metrics: the mean time between anomalies (MTBA), the anomaly rate (AR), and the probability of anomaly (PA). First, we determine the anomaly rate AR, i.e., the number of anomalies per cycle of time, by dividing the number of anomalies by the cycle time:</p>
      <disp-formula>
        <label>(1)</label>
        <tex-math>AR = \frac{\text{number of anomalies in the cycle}}{\text{cycle time}}</tex-math>
      </disp-formula>
      <p>The MTBA is the time during which the system or the component operates normally before an anomaly occurs. It is determined by the following formula:</p>
      <disp-formula>
        <label>(2)</label>
        <tex-math>MTBA = \frac{1}{AR}</tex-math>
      </disp-formula>
      <p>The probability of anomaly occurrence PA is determined from the MTBA and the time t using the following formula:</p>
      <disp-formula>
        <label>(3)</label>
        <tex-math>PA = e^{\left(\left(-\frac{1}{MTBA}\right) \cdot t\right)}</tex-math>
      </disp-formula>
      <p>The second part of the component-level mechanism is a Q-learning agent, which learns the characteristics of the component based on the quantified reliability and the overall environment of the component. The agent must learn to predict an anomaly time. Therefore, the actions taken by the agent are predictions related to an anomaly time.</p>
      <p>5.2. System-level mechanism</p>
      <p>The system-level mechanism is implemented at the monitor level of the MAPE feedback loop. The monitor part of the MAPE feedback loop observes the system and the operating environment with which the system interacts, to check whether there are changes. We leverage this function of the monitor and append the system-level mechanism to it. The mechanism performs two main tasks. The first task is to check the results from each component during each cycle performed by the IoT system. The second task is to aggregate the results from the components over all the cycles performed by the system.</p>
      <p>5.3. Learning Process</p>
      <p>In the component-level mechanism, our method first determines an accurate quantification of the metrics, which gives a snapshot of the quality of the component at each period of the system operating cycle. This is most needed during data sensing and data forwarding. The components of the IoT system, i.e., sensors and links, operate differently in the environment. We have described earlier the kinds of anomalies to which the components are subjected. The sensors can have random anomalies, like noise, but also cyclic anomalies. The links of the network experience external interference or message clogging, which leads to anomalies. However, most of these anomalies are related to a decrease in the accuracy of the device and a decrease in the power settings of the link. The approach determines the number of anomalies occurring during each cycle, so a different anomaly rate can be observed for each cycle. That observation helps to determine and update the mean time between anomalies over all the cycles.</p>
      <p>We define the actions performed by the agent as adding time, in seconds, to the current time, because the current time is the time at which the agent decides to make a prediction about an anomaly time. The agent makes a prediction after obtaining the anomaly probability for that period. The number of seconds added to the current time is a function of the anomaly probability of that period: if the anomaly probability is high, the amount is small; if it is low, either no time or a large amount is added. After each prediction, the Q value of that state-action combination is updated according to the reward obtained.</p>
    </sec>
    <sec id="sec-3">
      <title>6. Conclusion</title>
      <p>In this research, we are investigating the possibility of preemptive forecasting of anomalies that occur at the device and network layers of an IoT architecture, by implementing an anomaly agent based on the Temporal-Difference learning method. As a next step, we plan to implement another anomaly agent based on the Monte Carlo method and to evaluate the performance of these two agents.</p>
      <sec id="sec-3-1">
        <title>References</title>
        <p>
[1] D. E. Kouicem, A. Bouabdallah, H. Lakhlef, Internet of things security: A top-down survey, Computer Networks 141 (2018) 199–221.</p>
        <p>[2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aledhari, M. Ayyash, Internet of things: A survey on enabling technologies, protocols, and applications, IEEE Communications Surveys &amp; Tutorials 17 (2015) 2347–2376.</p>
        <p>[3] M. Xie, Y.-S. Dai, K.-L. Poh, Computing system reliability: models and analysis, Springer Science &amp; Business Media, 2004.</p>
        <p>[4] A. Mavrogiorgou, A. Kiourtis, C. Symvoulidis, D. Kyriazis, Capturing the reliability of unknown devices in the IoT world, in: 2018 Fifth International Conference on Internet of Things: Systems, Management and Security, IEEE, 2018, pp. 62–69.</p>
        <p>[5] A. Rayes, S. Salam, Internet of things from hype to reality, Springer, 2017.</p>
        <p>[6] F. A. Alaba, M. Othman, I. A. T. Hashem, F. Alotaibi, Internet of things security: A survey, Journal of Network and Computer Applications 88 (2017) 10–28.</p>
        <p>[7] A. Karkouch, H. Mousannif, H. Al Moatassime, T. Noel, A model-driven architecture-based data quality management framework for the internet of things, in: 2016 2nd International Conference on Cloud Computing Technologies and Applications (CloudTech), IEEE, 2016, pp. 252–259.</p>
        <p>[8] S. J. Moore, C. D. Nugent, S. Zhang, I. Cleland, IoT reliability: a review leading to 5 key research directions, CCF Transactions on Pervasive Computing and Interaction (2020) 1–17.</p>
        <p>[9] M. U. Iftikhar, G. S. Ramachandran, P. Bollansée, D. Weyns, D. Hughes, DeltaIoT: A real world exemplar for self-adaptive internet of things (artifact), in: DARTS-Dagstuhl Artifacts Series, volume 3, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2017.</p>
        <p>[10] S. Y. Shin, S. Nejati, M. Sabetzadeh, L. C. Briand, C. Arora, F. Zimmer, Dynamic adaptation of software-defined networks for IoT systems: a search-based approach, in: Proceedings of the IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 2020, pp. 137–148.</p>
        <p>[11] A. B. Diallo, H. Nakagawa, T. Tsuchiya, Adaptation space reduction using an explainable framework, in: Proc. of the IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC 2021), IEEE, 2021, pp. 1654–1661.</p>
        <p>[12] H. Nakagawa, H. Toyama, T. Tsuchiya, Expression caching for runtime verification based on parameterized probabilistic models, The Journal of Systems and Software, Elsevier 156 (2019) 300–311.</p>
        <p>[13] J. O. Kephart, D. M. Chess, The vision of autonomic computing, Computer 36 (2003) 41–50.</p>
        <p>[14] J. Hu, H. Niu, J. Carrasco, B. Lennox, F. Arvin, Voronoi-based multi-robot autonomous exploration in unknown environments via deep reinforcement learning, IEEE Transactions on Vehicular Technology 69 (2020) 14413–14423.</p>
        <p>[15] M. Wiering, M. Van Otterlo, Reinforcement learning, Adaptation, learning, and optimization 12 (2012).</p>
        <p>[16] R. Riveret, Y. Gao, G. Governatori, A. Rotolo, J. Pitt, G. Sartor, A probabilistic argumentation framework for reinforcement learning agents, Autonomous Agents and Multi-Agent Systems 33 (2019) 216–274.</p>
        <p>[17] R. S. Sutton, A. G. Barto, Temporal-difference learning, Reinforcement learning: an introduction (1998) 167–200.</p>
        <p>[18] C. J. Watkins, P. Dayan, Q-learning, Machine learning 8 (1992) 279–292.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>