=Paper=
{{Paper
|id=Vol-2978/casa-paper2
|storemode=property
|title=Preemptive Anomaly Prediction in IoT Components (short paper)
|pdfUrl=https://ceur-ws.org/Vol-2978/casa-paper2.pdf
|volume=Vol-2978
|authors=Alhassan Boner Diallo,Hiroyuki Nakagawa,Tatsuhiro Tsuchiya
|dblpUrl=https://dblp.org/rec/conf/ecsa/DialloNT21
}}
==Preemptive Anomaly Prediction in IoT Components (short paper)==
<pdf width="1500px">https://ceur-ws.org/Vol-2978/casa-paper2.pdf</pdf>
<pre>
Preemptive Anomaly Prediction in IoT Components
Alhassan Boner Diallo, Hiroyuki Nakagawa and Tatsuhiro Tsuchiya
Osaka University, Osaka, Japan


Abstract
The Internet-of-Things (IoT) has become a very promising and fruitful area of research. The rapid development of IoT is
revolutionizing our daily use of technology in every way. Under the IoT paradigm, the devices making up an IoT
system have resource constraints on storage, computing power and energy consumption. That paradigm enables
flexible and pervasive communication between devices with limited resources. These constraints may lead to
anomalies at the component level that can impact the whole system. Some innovative
techniques have been proposed to quantify the reliability of these devices under the aforementioned constraints. However,
there is a gap between the quantification of the component reliability and the predictive and preemptive maintenance of
these components. In this study, we propose an approach combining reliability quantification and reinforcement learning to
build a mechanism that can achieve predictive maintenance for the components of an IoT system such as devices and links.
In the approach, a component-level mechanism is built to synthesize the reliability data, and to determine the probability of
anomaly occurrence for each component. The approach is being applied to a self-adaptive IoT system for smart environment
monitoring named DeltaIoT.

Keywords
self-adaptive systems, IoT, reliability, reinforcement learning, q-learning


CASA: 4th Context-aware, Autonomous and Smart Architectures International Workshop, ECSA’21, 15-17 September 2021
a-bonerdiallo@ist.osaka-u.ac.jp (A. B. Diallo); nakagawa@ist.osaka-u.ac.jp (H. Nakagawa); t-tutiya@ist.osaka-u.ac.jp (T. Tsuchiya)
ORCID: 0000-0001-5280-4113 (H. Nakagawa)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

Recently, the Internet of Things (IoT) has been one of the fastest growing fields in the computing
domain. Its paradigm has been applied to many critical applications such as early warning systems
for earthquakes or tsunamis, smart home security, traffic management, healthcare, and education
systems. Despite rapid development and improvement in the IoT research area, many challenges
remain. The challenges faced in IoT are related mainly to the following properties: scalability,
availability, reliability, interoperability, security, mobility, and performance.
   The IoT infrastructure is made up of low-resource devices, meaning that they have low storage
and low computing power compared to other devices within the computing domain. This is the result
of the need to limit energy consumption, as most of the components rely on batteries for
power [1][2]. Nowadays, the IoT paradigm is applied to many mission-critical systems, such as
factory management, personal body sensors in healthcare, and surveillance systems in nuclear power
plants. These areas of application require a failure-free system; otherwise there will be
disastrous consequences. We must be able to trust these systems in all conditions as they impact
the way we make numerous decisions based on the data they collect and provide. The reliability of
IoT systems depends on the reliability of the components that make up the system. As IoT devices
are constrained by nature, there must be some mechanism in place to ensure their reliability at
all times, in order to have accurate decision models based on the data provided by the lower layer
of the IoT architecture.
   IoT reliability is a critical domain of research that has seen a lot of important contributions
over the years. Multiple ways of quantifying the reliability of IoT components have been proposed.
However, there is a gap between that quantified reliability and its application in predictive
maintenance. In other words, how can we predict an accurate maintenance date for IoT components,
based on the reliability measurement? To achieve that, we must first build mechanisms that can
synthesize the reliability information from anomalies to determine whether the system has become
less reliable as a result of an anomaly occurrence. The ability to reason about the quantified
reliability of the IoT system is a valuable step towards achieving predictive maintenance. The
idea here is to build a dynamic decision-making process that can collect reliability data in a
periodic manner and try to estimate a future failure time.
   Fundamentally, we can define reliability as the study of failures. The reliability of a system
or a computing device is its quality over a certain period of time. To quantify the reliability of
a system or computing device, we use standard time-related metrics such as Mean Time To Failure,
Mean Time Between Failures, and Mean Time To Repair. Quantifying reliability is essential to
assessing the continued success in the operation of an information system or a computing device.
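
As a rough illustration of the time-based metrics named in the introduction, the following Python
sketch derives a mean time between failures, a mean time to repair, and a steady-state availability
from a log of failure and repair timestamps. The log format, the function name
`reliability_metrics`, and the sample values are assumptions made for this example only.
Definitions of these metrics vary between sources, and this is not the authors' implementation.

```python
from statistics import mean


def reliability_metrics(events, start=0.0):
    """Summarize a failure log.

    events: (failed_at, repaired_at) pairs in seconds, sorted by time.
    start:  time at which the device first went into operation.
    """
    uptimes = []   # operating time before each failure
    repairs = []   # time spent restoring the device after each failure
    last_up = start
    for failed_at, repaired_at in events:
        uptimes.append(failed_at - last_up)
        repairs.append(repaired_at - failed_at)
        last_up = repaired_at
    mtbf = mean(uptimes)
    mttr = mean(repairs)
    return {
        "mtbf": mtbf,
        "mttr": mttr,
        # Share of time the device is expected to be operating.
        "availability": mtbf / (mtbf + mttr),
    }


# Hypothetical log: three failures, each repaired after 10 s.
log = [(90.0, 100.0), (190.0, 200.0), (290.0, 300.0)]
metrics = reliability_metrics(log)
print(metrics)
```

Here MTBF is taken as the mean operating time between a repair and the next failure, which is one
common convention; a log with overlapping or unsorted events would need validation first.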
2. Background

Computing systems require a high degree of performance and availability, but above all, they must
be reliable. The appropriate way of assessing the reliability of a computing system depends on the
type and mission of the system. In their study, Xie et al. [3] addressed several key metrics for
reliability quantification. Some of these key metrics are Mean Time To Failure (MTTF), Mean Time
Between Failures (MTBF), and failure rate. The MTTF metric quantifies the expected operating time
of a system before the occurrence of a failure. The MTBF metric, as the name indicates, quantifies
the operating time between one failure occurrence and the next. The failure rate function helps to
quantify the failures of a system within a specified window of time. The maintainability metric
quantifies the probability that a system can go back to operating normally after the occurrence of
an anomaly or a failure. The availability metric quantifies the probability that the system is
operating normally as expected.
   The methods and techniques to analyze the reliability of computing systems depend on the domains
that make up the system. There are mainly four domains or levels: system, hardware, software, and
network. The assessment of the reliability at a system level is the result of the combined
assessment of the hardware, software, and network levels. In the hardware domain, the reliability
assessment is related to the decay of the quality over time of the physical components of the
computing system. In the software domain, according to the study in [4], there is no concern over
a physical decay of the quality over time. As for the network reliability, it may be subject to a
decrease of performance over time due to internal and external factors on the hardware and
software that make up the network.
   In the case of IoT systems, their reliability can be assessed by quantifying the reliability of
the different layers of their architecture. In [5], the IoT functionalities are grouped into the
sensing and actuation, the communication, and the end-user application and services. A basic
architecture of an IoT system can be divided into three layers: a device layer, a network layer,
and an application layer. The device layer is responsible for the sensing and actuation. At the
device level, the reliability is constrained by the battery life and the low capacity of both the
memory and the CPU, which prevent the devices from using complex encryption to protect the
transmitted data [6]. The device reliability is further constrained by false reading events that
are common for sensors, when they collect and transmit data erroneously after an undetected
failure [7].
   The network layer is responsible for the communication between the devices of the system. The
application layer is responsible for the services and the interactions with the end-user
applications. In most cases, its reliability depends on the reliability of the device layer and
the network layer. For example, the device layer collects and transmits anomalous data, which are
sent through the network to the application layer. Beyond being able to reason about the fitness
of our IoT devices, we must also be able to attest to the reliability of the network
infrastructure that forms the backbone of IoT communication. Two approaches to network reliability
studies are discussed in this section: studies for enhancing QoS in networks, and studies aimed at
quantifying reliability metrics for networks. Some research has also been conducted to evaluate
IoT reliability at a system level. These approaches are at a high level and do not capture the
individual detail for reliability, such as which devices are responsible for failures, or which
parts of the network are responsible for traffic problems.

3. Focus: Anomaly Prediction

In our study, we consider the types of anomalies according to where and how frequently they occur.
Anomalies can occur on each layer of the architecture with different degrees of frequency. The
device layer and network layer of the architecture are where anomalies occur the most, whereas the
application layer is less prone to anomalies. As for the occurrence frequency, we consider two
main forms of occurrence in the IoT components: cyclic anomalies and random anomalies. The former
type of anomaly is linked to the nature of the component itself. Each component has a starting
time and an ending time. The probability of anomaly occurrence is very small when the reliability
is quantified close to the starting time. On the other hand, the probability is great when
quantified towards the ending time. The latter type of anomalies, called random anomalies, stem
from random external as well as internal factors, like noise, interference, etc.
   Our approach combines reliability quantification and machine learning to solve the problem of
predictive maintenance from the aforementioned anomalies. Reliability quantification is achieved
using the metrics introduced in [3]. Even though the concept of component anomalies is mentioned
throughout this paper, detecting anomalies is not the main focus of this study. In their review of
IoT reliability and anomaly detection techniques, Moore et al. [8] noted that no study had
explored the potential of synthesizing quantified reliability data. The study pointed out that the
decrease of reliability of a smart home system has different consequences than a decrease of
reliability of a power plant surveillance system. The decrease in reliability of the IoT system
increases the probability of anomaly occurrence within the system. As stated in the background,
each layer of the IoT architecture has its own way of assessing reliability. In this study, we
cover mainly anomaly occurrence at the device
layer and the network layer of the architecture. The main goal of our study is to enable the IoT
system to achieve predictive maintenance, i.e., to predict a probable failure time of one or more
components and preemptively apply corrections to the components, based on their quantified
reliability. Based on this goal, we include in the study components where corrections can be
applied after a failure or an anomaly. Therefore, some components of the device layer, such as the
battery, the memory and the CPU, are out of the scope of this paper. The reason is that they
cannot be automatically maintained after a failure or an anomaly occurrence. These components,
once the reliability has decreased or a failure has occurred, would require a system where a
human-in-the-loop is in place for maintenance.
   There are components of an IoT system that can be calibrated after the decrease of reliability
or occurrence of an anomaly. Such components can be sensors at the device layer or links at the
network layer. Therefore, our approach is applied to the sensor devices and the network links in
order to achieve predictive maintenance. Ignoring undiagnosed anomalous data has consequences
within the different layers of the IoT architecture. Therefore, to decrease the vulnerability of
IoT-centered systems, there is a need to design lightweight solutions that are capable of handling
the anomaly detection tasks without impacting the resource-constrained systems.

4. Motivating Example: DeltaIoT

In this section, we describe the motivating example of our research, which is a self-adaptive IoT
system named DeltaIoT [9]. Self-adaptive systems are able to modify their behavior at runtime, in
response to a change in their operating environment, to achieve their goals. In this research, the
study is not only about engineering reactive self-adaptivity, but also about designing robust IoT
systems that are subject to environmental changes. A typical IoT network system is composed of
devices with different types of sensors and actuators, usually linked together wirelessly through
the internet [2]. The concept of Internet-of-Things enables devices to operate with the
constraints of energy consumption, low computing power and low storage capacity. The networks
connecting the devices are also prone to congestion, especially when there is a burst in demand,
e.g., during an emergency situation for a system deployed to monitor large geographical areas to
detect potential disasters as early as possible [10]. All these constraints make the engineering
of dependable and reliable IoT systems more challenging. The next paragraph introduces an IoT
system which is used in the case study of applying our approach.

Figure 1: DeltaIoT network structure

   The DeltaIoT system is a platform for smart environment monitoring. The system, introduced in
[9], is a self-adaptive system, enabling it to react to environmental changes. The DeltaIoT system
“enables researchers to evaluate and compare new methods, techniques and tools for self-adaptation
in Internet of Things”. The DeltaIoT system has been built in two versions and they are deployed
at the campus of KU Leuven University. The two versions differ in the number of devices present in
each network and the geographical deployment of each version of the system. The DeltaIoT system is
described in Figure 1. DeltaIoT has a multihop communication system in cycles of 570 seconds. The
system experiences external and internal stimuli that cause it to change its behavior to achieve
its goals. There are two main causes for adaptation. The first cause for adaptation is an
interference in the network causing the links to experience delay or packet loss. The second cause
for adaptation is the fluctuating load of messages. This results in some or all links being
clogged, creating delay and packet loss. There are three quality requirements the system must
fulfil. The first quality requirement is about the average packet loss over 12 hours, which should
not exceed 10% of the overall messages sent through the links. The second quality requirement
concerns the average latency over 12 hours, which should not exceed 5% of the cycle time. The
third quality requirement concerns the average energy consumption over 12 hours. It has to be
minimized during that period.
   One of the main missions of Internet of Things systems is to collect and communicate data about
the environment or the people around which they are deployed. DeltaIoT, like many other IoT
systems, alternates sensing and actuation during its operation. In many cases, the actuation is
performed based on the results of the sensing. Therefore, anomalies during data sensing and during
data communication may have a negative effect on the system performance or operation. Collecting
anomalous data typically happens on the device level by the sensors. It can be caused by different
reasons like noise or defects due to environmental factors. When this happens, the sensors can be
calibrated again to perform with great accuracy. Anomalies occurring on the links of the DeltaIoT
system are related to the decrease in the QoS. The packet loss and the latency are some of the
manifestations of these anomalies occurring in those links.
   We have presented a mechanism for an efficient configuration space reduction [11]. The mechanism
focused on the analysis after an anomaly has happened at a component level. In this paper, the
main focus of the study is to forecast an anomaly before it happens. It is important to reduce the
time between anomaly occurrence and detection. It is equally important to minimize the time from
anomaly detection to correction. Moreover, precise anomaly understanding aids in constructing a
more precise probabilistic model of the system, which helps to find a more reliable configuration
of the system using probabilistic model checking [12]. Many anomaly detection techniques have been
proposed for computing devices in general, each with its advantages and drawbacks. However,
techniques for anomaly forecasting are few. In the Internet of Things domain, to the best of our
knowledge, our study is the only one that makes use of reliability quantification and a machine
learning approach to predict anomaly occurrence. As explained in the approach, if the time of
anomaly occurrence could be predicted, then corrective measures can be applied in order to prevent
the anomaly from happening.

5. Approach

In this section we describe in detail our approach and its practical implementation. The goal of
the approach is to determine a high-probability failure time or an anomaly occurrence time in
order to apply corrective measures. We build two mechanisms. The first mechanism is on the
component level, that is, the level of devices and links. It captures the behavior of each
individual component. The reliability of each component is computed by this mechanism. The second
mechanism is on the level of the MAPE feedback loop. MAPE stands for Monitor, Analyze, Plan and
Execute. The feedback loop is used in autonomic computing to achieve self-adaptation in software
systems [13]. The system-level mechanism is connected to the monitor component of the feedback
loop.

Figure 2: Overview of the component-level mechanism

   The backbone of the component-level mechanism is an anomaly agent that is instantiated by each
component of the IoT system. The quantified reliability is determined using mainly two metrics:
mean time between anomalies and anomaly rate. The function of the anomaly agent is to predict an
anomaly time, depending on the quantified reliability of the component. The anomaly agent has to
predict an accurate anomaly time. It behaves according to the principles of reinforcement
learning. It is rewarded for the accurate prediction of the anomaly time. Figure 2 illustrates the
component-level mechanism of the approach. According to [14], “reinforcement learning is an area
of machine learning concerned with how intelligent agents ought to take actions in an environment
in order to maximize the notion of cumulative reward”. The main motivation of using reinforcement
learning is to record the different states of the system and their transitions [15][16]. The
system has an optimal state in which the probability of each component’s reliability is high. The
next state is an in-between state where the component’s reliability is just average. Lastly, the
system has a critical state in which an anomaly has already occurred or is
very likely to occur. Capturing these different states and   actions related to an anomaly time. We formalize our
reasoning about them can be helpful in discovering an        problem as a Markov Decision Process or MDP. The com-
optimal time for predictive maintenance. To implement        ponent, which is the environment interacting with the
the anomaly agent, we use an approach that relies on         agent, is modeled as a Markov process. The Q-learning
Temporal Difference Learning [17]. The agent is implemented  algorithm used to create the agent was chosen because it
according to a Q-Learning algorithm [18]. The approach       is a model-free, off-policy, value-based algorithm.
is well suited for situations with many random variables        The MDP describing the environment for the learning
and high uncertainty. In the next subsections, we            process is a tuple of four elements. The first ele-
explain the two mechanisms in detail.                        ment is a finite set of states S. The second element is a
                                                             finite set of actions A. The number of states is a function
5.1. Component-level mechanism                               of the number of actions. The actions to be performed by the
                                                             agent are, for each run, adding an integer value to the
The network of most IoT systems is composed of several       current time and checking whether that time corresponds
heterogeneous devices. These heterogeneous devices           to the anomaly time. The third element of the tuple is the
sometimes possess different characteristics that can hin-    reward R to be received after transitioning from state S
der their interoperability. Therefore, when designing a      to state S’ as a result of performing an action. The fourth
mechanism for anomaly prediction, each individual com-       element of the tuple is the probability P related to the
ponent of the network must have a self-centered module       performed action.
that captures its unique characteristics. The component-        The Q in Q-learning is a measure of the quality of a
level mechanism is illustrated in Figure 2. The mechanism    state-action combination. When an action is taken by a
has two main parts. The first part is a reliability quan-    learning agent, the reward of that action along with the
tification algorithm, where the reliability of the module    learning rate, the discount factor and the initial condition
is quantified based on the previously mentioned metrics.     or previous value of Q, are used to determine the new
The IoT system operates in an environment where the          value of Q for that state.
quality of its components degrades over time. Some
components can be calibrated back to normal, like the sen-
sors and the network links. However, the physical aspect     $Q_t(s, a) = Q_{t-1}(s, a) + \alpha [ r + \gamma \max_{a'} Q(s', a') - Q_{t-1}(s, a) ]$   (4)
of the system in most cases cannot be calibrated.
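
The update in formula (4) can be sketched as a few lines of tabular Python. The state labels
(taken from the optimal, in-between and critical states described in Section 5, here encoded as
plain strings), the action set `ACTIONS`, the reward, and the values chosen for the learning rate
alpha and the discount factor gamma are all illustrative assumptions, not values reported by the
authors.

```python
from collections import defaultdict

# Hypothetical action set: seconds added to the current time to form a prediction.
ACTIONS = (0, 30, 60, 120)


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Greedy estimate of the best value reachable from the next state.
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q = defaultdict(float)        # unseen state-action pairs start at 0.0
q[("average", 30)] = 2.0      # pretend value learned in earlier cycles

new_q = q_update(q, "optimal", 60, reward=1.0, next_state="average")
print(new_q)
```

Because the maximum is taken over all actions in the next state regardless of which action the
agent will actually choose there, the rule is off-policy, which matches the property cited for
Q-learning in this section.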
Therefore, that aspect is out of the scope of this study.
We track the component based on the three metrics: the
mean time between anomalies (MTBA), the anomaly rate         5.2. System-level mechanism
(AR) and the probability of anomaly (PA). First we deter-    The mechanism is implemented on the monitor level of
mine the anomaly rate AR by determining the number of        the MAPE feedback loop. The monitor part of the MAPE
anomalies per cycle of time. It is calculated by dividing    feedback loop observes the system and the operating en-
the number of anomalies over the cycle of time.              vironment with which the system is interacting, to check
                                                             whether there are changes. We leverage this function
          $AR = \frac{Anomalies}{CycleTime}$        (1)        of the monitor and attach the system-level mechanism
                                                             to it. The mechanism performs two main tasks. The
   The MTBA is the time the system or the component is       first task of the mechanism is to check the results from
operating normally before an anomaly occurrence. The         the component during each cycle performed by the IoT
MTBA is determined by the following formula:                 system. The second task is to aggregate the results
                                                             from the components over all the cycles performed by
          $MTBA = \frac{1}{AR}$                     (2)        the system.
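
As a minimal sketch of how the per-component metrics could be computed, the following Python
functions evaluate formulas (1) and (2), together with the exponential expression that formula (3)
of this section uses for PA. The function names and the anomaly count are hypothetical; only the
570-second cycle length comes from the DeltaIoT description. As printed, the expression in formula
(3) shrinks as time grows, so the sketch evaluates it literally and leaves open whether it or its
complement is the intended anomaly probability.

```python
import math


def anomaly_rate(anomalies, cycle_time):
    """Formula (1): anomalies observed per unit of cycle time."""
    return anomalies / cycle_time


def mean_time_between_anomalies(ar):
    """Formula (2): reciprocal of the anomaly rate (infinite if none was seen)."""
    return math.inf if ar == 0 else 1.0 / ar


def exponential_term(mtba, elapsed):
    """Formula (3) as printed: e^((-1 / MTBA) * time)."""
    return math.exp(-elapsed / mtba)


CYCLE_TIME = 570.0                           # seconds, DeltaIoT cycle length
ar = anomaly_rate(3, CYCLE_TIME)             # 3 anomalies is a made-up count
mtba = mean_time_between_anomalies(ar)       # about 190 s
pa = exponential_term(mtba, elapsed=190.0)   # about e^-1
print(ar, mtba, pa)
```

Guarding the zero-rate case keeps a component with no observed anomalies from raising a division
error; how such a component should be scored is a design choice the paper does not specify.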
   The probability of anomaly occurrence, PA, is deter-
mined from the MTBA using the following formula:
                                                             5.3. Learning Process
     $PA = e^{\left(\frac{-1}{MTBA}\right) \cdot time}$  (3)
                                                             In the component-level mechanism, our method first de-
                                                             termines an accurate quantification of the metrics, that
   The second part of the component-level mechanism          can give a snapshot of the quality of the component at
is a Q-Learning agent, where the agent learns the char-      each period of the system operating cycle. This is most
acteristics of the component, based on the quantified        needed during the time of data sensing and data for-
reliability and the overall environment of the component.    warding. The components of the IoT system, i.e., sensors
The agent must learn to predict an anomaly time. There-      and links, operate differently in the environment. We
fore, the actions to be taken by the agent are prediction    have described earlier the kind of anomalies that the com-
                                                             ponents are subject to. The sensors can have random
anomalies such as noise, but also cyclic anomalies. The links
of the network suffer from external interference or message
clogging, which leads to anomalies. However, most of these
anomalies are related to a decrease in the accuracy of the
device and a decrease in the power settings of the link. The
approach determines the number of anomalies that occur
during each cycle, so each cycle can yield a different
anomaly rate. That observation is used to determine and update
the mean time between anomalies over all the cycles. We define
the actions performed by the agent as adding time, in seconds,
to the current time, because the current time is the time at
which the agent decides to make a prediction about an anomaly
time. The agent makes a prediction after obtaining the anomaly
probability for that period. The amount of time in seconds to
add to the current time is a function of the anomaly
probability of that period. If the anomaly probability is high,
the amount is small; if it is low, either no time is added or
a large amount is added. After each prediction, the Q value of
that state-action combination is updated according to the
reward obtained.
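
The loop described above (anomaly counts per cycle, mean time between
anomalies, anomaly probability, a time offset as the action, and a Q value
update) can be illustrated with a minimal sketch. It is not the authors'
implementation. The exponential probability model, the action set ACTIONS,
the reward shape, the state discretization and the values of alpha, gamma
and epsilon are all assumptions introduced here for illustration; the
update rule is the standard one-step Q-learning update.

```python
import math
import random
from collections import defaultdict

# Hypothetical action set: seconds added to the current time (assumption).
ACTIONS = [0, 5, 30, 120]


def mean_time_between_anomalies(cycle_counts, cycle_length_s):
    """Mean time between anomalies over all observed cycles (inf if none seen)."""
    total = sum(cycle_counts)
    if total == 0:
        return math.inf
    return len(cycle_counts) * cycle_length_s / total


def anomaly_probability(mtba_s, horizon_s):
    """Assumed exponential model: P(anomaly within horizon) = 1 - exp(-horizon / MTBA)."""
    if math.isinf(mtba_s):
        return 0.0
    return 1.0 - math.exp(-horizon_s / mtba_s)


def to_state(probability, bins=5):
    """Discretize the anomaly probability into one of `bins` states (assumption)."""
    return min(int(probability * bins), bins - 1)


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection of the time offset to add to the current time."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def reward_for(predicted_time, actual_time, tolerance_s=10.0):
    """Hypothetical reward: 1 inside the tolerance, otherwise a penalty that grows with the error."""
    error = abs(predicted_time - actual_time)
    return 1.0 if error <= tolerance_s else -error / 100.0


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning (temporal-difference) update of Q(state, action)."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
```

In this sketch a Q table can be created with defaultdict(float), so unseen
state-action pairs start at zero. A high anomaly probability maps to a high
state index, and the agent is expected to learn that small offsets earn
more reward there, which mirrors the behavior described in the text.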

6. Conclusion

In this research, we are investigating the possibility of
preemptive forecasting of anomalies that occur at the
device and network layers of an IoT architecture, by
implementing an anomaly agent based on the Temporal
Difference Learning method. As a next step, we plan to
implement another anomaly agent based on the Monte
Carlo method and evaluate the performance of the two
agents.
                                                                  caching for runtime verification based on parame-
                                                                  terized probabilistic models, The Journal of Systems
References                                                        and Software, Elsevier 156 (2019) 300–311.
                                                             [13] J. O. Kephart, D. M. Chess, The vision of autonomic
 [1] D. E. Kouicem, A. Bouabdallah, H. Lakhlef, Internet          computing, Computer 36 (2003) 41–50.
     of things security: A top-down survey, Computer         [14] J. Hu, H. Niu, J. Carrasco, B. Lennox, F. Arvin,
     Networks 141 (2018) 199–221.                                 Voronoi-based multi-robot autonomous exploration
 [2] A. Al-Fuqaha, M. Guizani, M. Mohammadi, M. Aled-             in unknown environments via deep reinforcement
     hari, M. Ayyash, Internet of things: A survey on             learning, IEEE Transactions on Vehicular Technol-
     enabling technologies, protocols, and applications,          ogy 69 (2020) 14413–14423.
     IEEE communications surveys & tutorials 17 (2015)       [15] M. Wiering, M. Van Otterlo, Reinforcement learn-
     2347–2376.                                                   ing, Adaptation, learning, and optimization 12
 [3] M. Xie, Y.-S. Dai, K.-L. Poh, Computing system re-           (2012).
     liability: models and analysis, Springer Science &      [16] R. Riveret, Y. Gao, G. Governatori, A. Rotolo, J. Pitt,
     Business Media, 2004.                                        G. Sartor, A probabilistic argumentation framework
 [4] A. Mavrogiorgou, A. Kiourtis, C. Symvoulidis,                for reinforcement learning agents, Autonomous
     D. Kyriazis, Capturing the reliability of unknown            Agents and Multi-Agent Systems 33 (2019) 216–274.
     devices in the iot world, in: 2018 Fifth Interna-       [17] R. S. Sutton, A. G. Barto, Temporal-difference learn-
     tional Conference on Internet of Things: Systems,            ing, Reinforcement learning: an introduction (1998)
     Management and Security, IEEE, 2018, pp. 62–69.              167–200.
 [5] A. Rayes, S. Salam, Internet of things from hype to     [18] C. J. Watkins, P. Dayan, Q-learning, Machine learn-
     reality, Springer (2017).                                    ing 8 (1992) 279–292.

</pre>