<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Hybrid method for protecting RTOS from failures and cyberattacks using compact Markov models⋆</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Piotr Gaj</string-name>
          <email>piotr.gaj@polsl.pl</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomas Sochor</string-name>
          <email>tomas.sochor@osu.cz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Korobchynskyi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bohdan Savenko</string-name>
          <email>bsavenko@khmnu.edu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr</string-name>
          <email>oleksandr.kozelskiy@khmnu.edu.ua</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yevhenii Bereznyak Military Academy</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ukraine</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Khmelnytskyi National University</institution>
          ,
          <addr-line>Khmelnytskyi, Instytutska street 11, 29016</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Prigo University College European Research University Vítězslava Nezvala 801/1</institution>
          <addr-line>736 01 Havířov</addr-line>
          <country>Czech Republic European Union</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Silesian University of Technology</institution>
          ,
          <addr-line>ul. Akademicka 2A, 44-100 Gliwice</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The paper presents a method of preventive restart of components in real-time operating systems, which combines a simplified Markov chain with a hybrid dual watchdog timer and is considered as a tool for enhancing cyber-resilience. Unlike traditional reactive strategies, which are activated only after a failure, the proposed approach provides a transition to a proactive model capable of counteracting both internal faults and external impacts, including denial-of-service (DoS) attacks, fault injections, or logic bombs. The use of compact Markov models makes it possible to quickly assess the risk of approaching a critical state without a significant increase in computational costs, which creates conditions for timely localization of the problem and initiation of the restart of a specific task, driver, or module at an early stage. This reduces the probability of losing the working context, minimizes the “window of vulnerability,” and decreases the risk of a complete reboot. The combination of software and hardware levels forms a multi-level protection mechanism: the software watchdog acts as the first line of defense, while the hardware one guarantees final recovery in case of deep lock-up or targeted attack. Such integration makes it possible to reduce the frequency of global restarts, limit downtime, and increase resilience to cyberattacks in resourceconstrained conditions. The method does not require complex machine learning algorithms or cumbersome formal models, it is based on minimalist stochastic schemes and can be adapted to different platforms and threat scenarios. Experimental results demonstrated a 3.1 reduction in recovery time from 2.7 s to 0.84 s, approximately a 70% decrease in total system downtime from 4.5% to 1.37% over 5 minutes, and an average CPU load increase of only 1.4%, confirming the method's high efficiency and low computational overhead. This ensures stability, predictability, and security of embedded and cyberphysical systems, where meeting timing constraints and continuity of task execution, as well as the ability to withstand modern cyber threats, are critically important. The obtained results demonstrate the practical effectiveness of the approach in industrial automation, robotics, automotive and medical systems, and confirm the prospects of the proposed solution for the development of next-generation cyber-defense systems.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;operating systems</kwd>
        <kwd>RTOS</kwd>
        <kwd>Markov chain</kwd>
        <kwd>preventive restart</kwd>
        <kwd>fault tolerance</kwd>
        <kwd>cyber-physical systems anomaly detection</kwd>
        <kwd>cyber resilience 1</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In real-time and embedded operating systems, there is a gradual increase in the number of
competing tasks and the degree of component integration. This significantly complicates ensuring
continuity of operation and compliance with strict timing constraints, known as deadlines. The
problem becomes particularly critical in cyber-physical systems (CPS), where any error or delay in
controlling physical processes can cause critical consequences — equipment failure, emergency
situations, or unacceptable delays in performing safety functions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In the case of targeted
cyberattacks, it creates an additional window of vulnerability that can be exploited by adversaries
to destabilize or block the operation of the system. Modern reviews emphasize that with the
growing complexity of RT environments, there is an increasing need for integrated software
faulttolerance mechanisms that take into account interactions with schedulers and mixed-criticality
[34].
      </p>
      <p>A key tool for improving reliability and security has traditionally been the watchdog timer. A
standard hardware watchdog performs a full reset of the microcontroller after failures, thus
implementing a purely reactive approach that often requires significant time and resource
overhead [3] and does not take into account the factor of intentional impacts, for example, fault
injections or logic attacks. The introduction of a combined hardware-software watchdog with the
ability to locally (software-based) restart individual components before a fatal failure occurs [5,45]
based on low-dimensional Markov chains potentially makes it possible to significantly increase
fault tolerance, reduce overall system downtime [6], and decrease the effectiveness of
denial-ofservice (DoS) attacks or logic-based blocking of critical tasks.</p>
      <p>The development of such solutions is complicated by limited computational resources, such as
RAM capacity and clock frequency, and the requirement to meet real-time constraints [2]. In-depth
system state analysis is usually based on computationally complex approaches, machine learning
methods [7] or extended formal models, which are poorly suited for low-power microcontrollers. In
contrast, simplified Markov chains with a small number of states make it possible to quickly, with
minimal resource consumption [8], estimate the probability of the system transitioning into a
critical state [4], including states that may be triggered by targeted attacks or abnormal behavior
simulating failure. Within this approach, a predictive software watchdog can detect not only
technical faults in advance but also suspicious activity and carry out a preventive restart of only
the problematic task, driver, or module.</p>
      <p>The aim of this work is the development and theoretical justification of a method for preventive
restart of embedded RTOS components, based on a simplified Markov model of failure risk and
integrated with a classical hardware watchdog timer. The proposed approach is intended to reduce
the number of full system reboots, shorten downtime, and improve the overall reliability and
cyber-resilience of resource-constrained CPS.</p>
      <p>A comparative review of existing fault-tolerance methods in embedded real-time operating
systems shows that the classical hardware watchdog guarantees recovery only post factum: the
restart is initiated after an actual hang and is accompanied by a complete loss of operational
context [9]. A software watchdog is potentially able to detect the failure of an individual process
earlier, but its effectiveness entirely depends on the correct functioning of the RTOS kernel, and it
can become a target for adversary bypass. Structural redundancy methods remain excessively
resource-intensive for most low-power platforms, while the application of formal models or
machine learning algorithms often exceeds the available computational capabilities of the
microcontroller or requires significant verification efforts [10].</p>
      <p>Under strict limitations on memory and clock frequency, it is advisable to combine a minimal
hardware emergency restart mechanism with a software intelligence layer capable of proactively
responding to symptoms of impending failure or signs of attack, and locally restoring the operation
of the faulty component, minimizing the number of full restarts. The key task in such a hybrid
scheme is the development of a lightweight criterion for evaluating the current system state
sufficiently sensitive for timely detection of increasing execution delays, queue overflows, or other
anomalies, including those caused by intentional actions, but not so coarse as to cause false
positives. Simplified, low-dimensional Markov chains meet these requirements, since they allow
real-time estimation of the probability of the system transitioning into a critical state with minimal
computational overhead [11].</p>
      <p>Accordingly, the scientific task is to develop a hybrid fault-tolerance and cyber-defense
mechanism that integrates a hardware watchdog with a predictive software module based on a
Markov model. This approach should ensure real-time assessment of the risk of a critical failure,
initiation of a soft restart of only the defective task or driver, maintenance of continuous and
secure system operation in most non-fatal scenarios, and transfer of control to the hardware timer
in cases of deep hang or successful attack. The expected result of implementing the proposed
scheme is a significant reduction of downtime, a decrease in the number of full system reboots, and
strengthened cyber-resilience in resource-constrained cyber-physical applications.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Similar works</title>
      <p>In recent years, there has been a rapid development of approaches to failure prediction in real-time
systems and embedded devices. Significant attention has been devoted to machine learning
methods and sensor data analysis. In particular, Ukrainian researchers have explored the
possibilities of using neural networks, decision trees, and other artificial intelligence algorithms for
failure prediction in order to improve the reliability and efficiency of operation, for example, in
smart homes [19]. Comprehensive reviews for real-time systems emphasize the appropriateness of
combining model-based and data-driven approaches to failure prediction [34].</p>
      <p>In such approaches, data are collected on system parameters, such as temperature, load, and
response time, and hidden patterns that precede failure are sought. It has been proven that timely
prediction of a fault before its actual occurrence makes it possible to initiate early self-diagnosis
and protective measures, thereby preventing catastrophic consequences. For example, neural
network-based methods are able to detect nonlinear dependencies in system behavior and predict
overheating or other force majeure situations in advance [12,13]. At the same time, it is noted that
purely reactive strategies, such as classical hardware resets in case of failure, or overly conservative
approaches, when the system excessively reduces performance for the sake of reliability, no longer
meet the needs of modern applications [14]. Additionally, emphasis is placed on the need for
lightweight anomaly monitoring tools for resource-constrained CPS/IoT platforms [44]. A
promising line for such constrained settings is deep echo state neural networks, a reservoir
computing approach that offers fast training and efficient inference for cyber-threat detection in
IoT-driven industrial control environments [51]. Complementary to runtime ML-based monitoring,
evolutionary optimization, particularly genetic algorithms has proved effective for design-time
architectural synthesis of security subsystems, including optimal sensor and coverage layout in
perimeter protection, as demonstrated in PSCAD [47]. In SDN-IoT environments, systematic
reviews document a spectrum of ML methods for detecting botnet anomalies and their practical
limitations [38]. Beyond surveys, recent SDN work reports real-time ML-based DDoS detection
integrated at the controller with automatic mitigation via flow-rule updates, achieving high
detection rates while preserving throughput and acceptable latency [50]. In addition to detection,
agent-based distributed defense mechanisms coordinate mitigation across nodes and demonstrate
favorable performance against DDoS while maintaining acceptable throughput and response time
[48].</p>
      <p>Failure prediction is particularly relevant in the context of multitasking RTOS, where faults may
manifest as task infinite loops, deadlocks, or missed critical deadlines. Research shows that the
most common causes of embedded system hangs include memory corruption and being stuck in an
infinite loop [15], hangs of external hardware components, mutual blocking of tasks due to
incorrect use of mutexes [16], or situations where a high-priority task monopolizes the CPU and
prevents other tasks from executing. At the network level, early detection of botnet activity is
achieved by analyzing DNS traffic [31]. To counter evasion of detection, specialized DNS
antievasion techniques are used [32]. In corporate networks, self-adaptive mechanisms increase
resilience during botnet attacks by localizing and isolating nodes [33]. Reviews of botnet-driven
DDoS attacks in IoT networks summarize the available datasets and methods, demonstrating the
evolution of attack vectors [39].</p>
      <p>Under such conditions, simply recording the fact of a failure is not enough, the system must be
able to predict the approach of these undesirable states. Among modern approaches to failure
prediction, model-based models [17] and data-driven models [18] can be distinguished. The first are
built on the use of analytical models of system components and the calculation of residuals, that is,
the deviations between the measured parameters and the nominal model. Their advantage lies in
the consideration of the physical properties of processes, but creating an accurate model of a
complex RTOS is a non-trivial task. The second approach is based on the analysis of large volumes
of sensor and telemetry data; statistical and ML algorithms study system behavior patterns and
detect signs of impending failure without an explicit physical model. Data-driven methods work
well for complex systems with many variables but require sufficiently informative data to train
models and detect correlations that precede failures [20]. In security scenarios, this is accompanied
by the transition to autonomous, privacy-oriented IDS with distributed decision-making [40]. In
the context of SDN-IoT, a spectrum of ML methods and key challenges of deployment at the
network edge have been outlined [41].</p>
      <p>In the practice of failure prediction, both approaches are often combined: relying on expert
models and machine learning algorithms to achieve maximum accuracy. Such integration is also
useful for protection tasks, the combination of knowledge about the system architecture with
telemetry data makes it possible to increase resilience to both accidental technical faults and
deliberate cyberattacks.</p>
      <p>The transition from reactive to proactive monitoring means that the system continuously tracks
key parameters and the state of components in order to detect deviations in advance. This
approach is implemented in the concept of Prognostics and Health Management (PHM), or
prognostic health systems [21]. Within PHM, each node or process of a real-time system can have
built-in sensors and control agents that evaluate its condition online. For example, response times
of tasks, execution cycle frequency [22], resource consumption, temperature modes [23], supply
voltage, and so on [24] can be monitored. Modern PHM surveys highlight the growing role of deep
learning and the need to adapt methods to resource-constrained CPS/RTOS platforms [42].
Systematizations of ML-PHM emphasize the combination of condition monitoring with
degradation prediction while taking into account real-time constraints [43].</p>
      <p>Based on these data, algorithms perform diagnostics, that is, the detection of already existing
deviations and prognostics, meaning the prediction of the time to failure or the probability of a
fault. The literature provides examples where the implementation of comprehensive monitoring
made it possible not only to promptly detect failures but also to predict their occurrence several
cycles before the actual failure, which enabled the planning of recovery measures without stopping
the system. This is particularly important for critical real-time systems, including avionics,
automotive controllers, and medical devices, where unexpected failure is unacceptable [25]. Taking
into account cyber threats, anomaly monitoring further limits the space for intentional impacts in
critical domains [44].</p>
      <p>Proactive monitoring minimizes downtime and ensures safe degraded functioning: the system
can switch in advance to a backup module or perform recovery procedures if the prognostic
module signals an increased risk of failure. One of the tools of proactive diagnostics is intelligent
Watchdog controllers, special mechanisms that monitor system operability and initiate recovery in
case of abnormal behavior [26]. Software-defined watchdog mechanisms make it possible to
formalize deadlines and response policies at the level of event-control models [37]. “Smart”
hardware detectors at the processor level increase coverage of transient and permanent faults
without significant overhead [36]. Heterogeneous integration with FPGA reduces monitoring
overhead without violating task deadlines [35].</p>
      <p>The classical Watchdog timer is a hardware or software timer that is regularly reset by the
executing program; if the program hangs and does not reload the timer in time, the timer generates
a signal for reset or another emergency action. Thus, the Watchdog traditionally serves as the last
line of defense of the system against hangs and unacceptable delays. However, modern trends go
beyond simple hardware reset, researchers propose multi-level Watchdog architectures that
include both software and hardware layers of control. In distributed IoT scenarios, decentralized
IDS with alert correlation are advisable, which strengthens resilience to software supply chain
attacks [40].</p>
      <p>Additionally, in related work on the security of multi-computer systems, a similar principle of
multi-level self-healing is proposed. In particular, systems for detecting metamorphic malware [27]
and hidden steganographic modifications [28] demonstrate how intelligent agents can
autonomously respond to anomalies without fully rebooting the server. Additional studies on the
criteria of centralization of traps and honeypots in multi-computer networks [29], analysis of
lowlevel DDoS attacks using the feature of traffic self-similarity [30], adaptive entropy metrics [46] and
self-adaptive mechanisms of corporate network resilience against botnet attacks emphasize the
effectiveness of the proactive, multi-level approach: first locally isolate or restart the affected
node/process, and only then, if necessary, employ more drastic hardware measures.
Complementary to runtime monitoring and recovery, evolutionary design-time optimization of
security architectures has proved effective: genetic algorithms deliver near-Pareto-optimal sensor
layouts and parameterizations for perimeter protection, as demonstrated in PSCAD [47], and have
been generalized to distributed security systems with multi-criteria objectives and deployment
constraints [49]. These results logically resonate with the development of intelligent Watchdog
controllers in embedded systems, which combine selective recovery for fault or attack localization
with the ultimate hardware line of defense.
3. Methodology and methods
3.1. Mathematical model and method description
The proposed method consists of combining the classical hardware watchdog timer as a hard layer
of protection with a predictive software layer, referred to as the Software Watchdog, which uses a
simplified Markov model to detect a probable failure before it occurs and to perform local
recovery , that is, a soft reset, of problematic components. If the software restart does not help or
the hang involves the kernel as a whole, the hardware watchdog timer, acting as the second stage,
executes a full system reset. Such a two-stage mechanism reduces the number of full resets and
shortens downtime, which is particularly important in CPS with strict timing requirements and
limited resources. The method provides a discrete assessment of the system state at time moments
t = k ⋅ Δt = k, where Δt is the execution period of the Predictive Watchdog Task in the RTOS. At
each step, the system is in one of a finite set of states, for example, {N , W , C , F }, corresponding
respectively to “Normal”, “Warning”, “Critical”, “Fail”), and transitions between these states are
described by a low-dimensional Markov chain.</p>
      <p>Let S = {N , W , C , F } be a finite set of states. At each tick t k = kΔt сthe system (or the
“Predictive Watchdog Task,” which reflects its health) is in one of the states x ( k )∈ S. For
simplicity, assume that, N (Normal) — the system is in the normal range (queues are not overfilled,
heartbeat signals from all tasks arrive on time), W (Warning) — degradation of indicators is
detected (increased load, delayed heartbeat signals), but not yet critical, C (Critical) — the risk of
imminent failure is high (significant delays, frequent missed deadlines, a task stops sending
heartbeat signals), F (Fail) — actual hang/failure (in the model, this is an absorbing state or an
indicator that a global reset has occurred).</p>
      <p>In implementation, more or fewer states may be defined, for example, “Recovery,” “Standby”,
but 4–5 active states are already sufficient for most simplified scenarios.</p>
      <p>We assume that x ( k + 1 ) depends only on x ( k ) and not on the states at earlier moments (the
Markov property). This can be written as:</p>
      <p>P =</p>
      <p>PNN
( PCN</p>
      <p>PWN
PFN</p>
      <p>PNW
PWW
PCW
PWF),∑ Pij = 1 ,∀ i∈ S,
PCF j∈ S
PFF
(1)
n(i→ j)≔ n(i→ j) + 1
pij ≈
n(i→ j)
= n(i→ j)</p>
      <p>ni
Additionally, we track ni = ∑ n(i→m), that is, how many times the system was in state i. Thus,
m∈ S
we have:
where the indices i and j denote the system states N (Normal), W (Warning), C (Critical), and F
(Fail).</p>
      <p>If F (Fail) is considered as an absorbing state, then the following holds PFF = 1 і PF j = 0 for j ≠ F
. In this case, reaching F is interpreted as an irreversible hang. However, if in the physical system a
hardware reset is executed after F (Fail), then it is logical to assume that the system state returns to
N (Normal) — but only after reboot.</p>
      <p>For estimating the elements pij, a simple frequency counting method is applied. Suppose that at
the k-th step the system is in state i = x ( k ), and at the x ( k + 1 )-th step it is in state j = x ( k + 1 ).
Then the counter n(i→ j) is incremented by 1:
(2)
(3)
(4)
(5)
∑ n(i→m)
m∈ S</p>
      <p>This ratio provides an approximation of the transition probability i → j . It allows updating P,
without significant additional CPU overhead. If the system runs for a long time, a finite sliding
window [21] or exponential smoothing can be applied to avoid the accumulation of outdated
statistics. For example, with a sliding window of size L, for instance, 50–100 steps, only the most
recent L transition observations are retained.</p>
      <p>At each time step Δt the Predictive Watchdog Task in the RTOS collects metrics as queue load,
delays, and missed deadlines, and, according to threshold logic, assigns the system to one of the
states:
1. N (Normal): for example, when the maximum queue fill is ≤ 50% and all heartbeat signals
from critical tasks are timely;
2. W (Warning): partial or periodic overload above permissible limits, but no sustained
deadline misses; e.g., max fill in the 50–80% range or at least one delayed heartbeat signal;
3. C (Critical): queue overflow above 80% or systematic deadline misses by one or more
critical tasks;
4. F (Fail): heartbeat signals are missing for one or more critical tasks over two–three
consecutive cycles, or when a fatal error occurs.</p>
      <p>Thus, the definitions of N , W , C , F depend on the application scenario and the established
threshold values. Such thresholds are determined empirically or based on the specifics of the
application.</p>
      <p>The transition probabilities between the states in the Predictive Watchdog are dynamic and can
be adjusted in real time depending on changes in key system metrics. The main influencing factors
are queue occupancy, delays in system activity signals, memory usage, and other system
performance parameters. For example, if the queue occupancy level exceeds a certain threshold,
this may indicate an approach to a critical state, increasing the probability of transition:
p ( W → C )= f 1( Q , H , M )</p>
      <p>p ( C → D )= f 2( H , T )</p>
      <p>Similarly, if the delay of system activity signals lasts longer than expected, the probability of
transition to the failure state increases:</p>
      <p>If after a Soft Reset the system quickly stabilizes, the probability increases that:
p ( C → N )= f 3( R ) ,
(6)
where Q is the queue occupancy. H is the delay of activity signals. M is the memory usage. T is
the response waiting time. R is the number of successful software resets, f 1 , f 2 , f 3 is the empirical
or statistically identified functions that map current system metrics into the probability of the
corresponding transition, p(…) is the the probability of the Predictive Watchdog making the given
transition.</p>
      <p>The time to failure is modeled as a random variable that follows an exponential distribution,
since system failures in real time are usually independent events occurring at a constant rate. The
exponential distribution is used to estimate the probability that the system will remain operational
during a certain period, i.e., that a failure will not occur before a given moment. Its key property is
memorylessness, which means that the probability of failure in the future does not depend on how
long the system has already been operating without faults. The failure rate is described by
parameter λ, which determines the average failure frequency, and the probability of system
survival during time t is calculated by the formula:</p>
      <p>P ( T &gt; t )= e- λt ,
p ( N → W )= 1 - P ( T &gt; t )
here T is the time to failure. λ is the failure rate. The probability of transition to state W or C is
defined as:</p>
      <p>If the calculated probability value exceeds a certain threshold, the watchdog timer switches to
proactive response, triggering a software reset or preparing a hardware reset.</p>
      <p>Let us describe failure prediction and the two-stage protection. If it is known that the current
state is x ( k )= i, then P ( x ( k + 1 )= Fail∣ x ( k )= i )= piF. If this probability piF exceeds the
predefined thresholdθ, we choose θ as the failure probability threshold. When:</p>
      <p>piF &gt; 0[ P2 ]iF &gt; 0θ∈ [ 0.5 ; 0.8 ]  (typically)
the system is considered close to a critical state, and the predictive layer decides to perform a
preventive software reset of the corresponding component, for example, by deleting the
problematic task and recreating it, without waiting for an actual hang.</p>
      <p>If it is necessary to consider the probability of reaching F (Fail) after several steps, the product of
the matrix Pr is used:</p>
      <p>Pr = ⏟×⋯ {P × P × P },</p>
      <p>r
where element [ Pr ]iF gives the probability of being in state F after r steps, given that currently
x ( k )= i. In practical microcontroller systems, usually 1–5 steps of prediction are sufficient due to
resource constraints and short periods Δt.</p>
      <p>In the proposed two-stage mechanism, if the Markov chain analysis indicates a high probability
P ≥ θ of rapid transition to F, the predictive layer locally recovers the problematic task, performing
a soft reset: TaskDel ( taskXHandle )→ ReInit ( )→ TCreate ( ) .</p>
      <p>If the local restart does not help, that is, when state C persists, the software layer does not call
feed(...) of the hardware watchdog timer. After time T HW , which represents the hardware WDT
timeout, a full system reset occurs. In the Markov model, this corresponds to the transition C → F,
which in practice means microcontroller reboot and return to the initial state N .
(7)
(8)
(9)
(10)</p>
      <p>Thus, if the deviation is not too deep, the soft reset allows avoiding a global reboot and reduces
downtime. The hardware WDT remains as the last safeguard if the system escapes software
control. Formally: if at step k the soft reset procedure is launched, then at step k + 1 we artificially
set x ( k + 1 )= N , provided that the reset was successful. For accuracy, we introduce a “Recovery”
state (R) and modify the transition matrix.</p>
      <p>Since we have Δt ≪ T HW , the software layer has several attempts, for example, 10–20 cycles
with a step interval of Δt to perform a software reset before the hardware watchdog timeout
expires. If the Predictive Watchdog Task concludes that the system has returned to Normal or to
Warning without progress to Critical, it invokes HW_WDT_Feed(), continuing normal operation. If
the local failure is not eliminated, the Predictive Watchdog Task deliberately does not send the
control signal to the hardware watchdog, which causes it to trigger and initiates a full system
reboot.</p>
      <p>If x ( k )= C,  the soft reset is ineffective  ⇒  no control signal is sent to the HW WDT  ⇒  global
reset.</p>
      <p>In summary, the mathematical scheme is defined by the following relations. The system state
x ( k ) is a discrete variable in a finite space S. The transition laws are determined by the matrix 
P, which is updated incrementally or within a sliding window:
pij = n(i→ j) . ,
ni
(11)
where pij is the probability of the system transitioning from state i to state j in one step, n(i→ j) is
the number of observed direct transitions from state i to state j within the current sampling
window, ni is the total number of times state i was recorded as the origin during that same
window. The probability of failure in one step is P(failure), which represents transitions from the
Critical state to the Fail state or from the Warning state to the Fail state. If the threshold θ is
exceeded, preventive actions such as a soft reset are triggered. If necessary, a prediction for k steps
ahead is made using the transition matrix P. A soft reset is equivalent to forcing the system back
from a dangerous state to the Normal or Recovery state. A full reset via the hardware watchdog
occurs if the system signals a critical problem but the software reset does not succeed.</p>
      <sec id="sec-2-1">
        <title>3.2. Markov Chain Predictive Watchdog Task Algorithm</title>
        <p>The algorithm of the Markov Chain Predictive Watchdog Task is presented in Figure 1 as a
block diagram showing the sequence of actions from initialization to the choice of recovery
strategy. It illustrates the monitoring task logic in FreeRTOS:
1. Start system monitoring.
2. Analyze the process.
3. Check, using the Markov model, whether the task shows prerequisites for a hang.
4. If yes – perform a task restart or recovery.
5. If after the restart the problem is resolved – terminate the watchdog task.
6. If after the restart the problem is not resolved – restart the system.
7. After the system restart, the monitoring cycle begins again.</p>
        <p>Thus, the proposed method combines hardware and software levels of control, using a
simplified Markov model for predicting failures and timely triggering of a local restart. This
approach makes it possible to react in advance to symptoms of an impending critical state, reduce
the number of global restarts, and shorten downtime. The two-stage strategy a preventive soft reset
and a guaranteed hardware reset ensures a balance between proactivity and reliability, making the
method an effective solution for embedded and cyber-physical systems with strict timing
constraints and limited resources.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.3. Integration with IDS and PHM for deeper cyber-resilience</title>
        <p>The proposed PredictiveWDT can be augmented with external signals from intrusion detection
systems (IDS) and from prognostics &amp; health management (PHM) platforms to obtain a deeper
assessment of cyber-resilience. IDS contributes anomaly scores derived from network or host
telemetry, and PHM contributes health-risk scores derived from sensor and operational data. These
signals are fused with the Markov estimator to form a unified risk used by the watchdog decision
logic:</p>
        <p>
          Rk = α Pk (C → F ) β A kIDS+ γ H kPHM,
(12)
where k is the sampling step with period Δt, α Pk (C → F ) is the estimated one-step probability of
transition from Critical to Fail at step k , A kIDS∈ [
          <xref ref-type="bibr" rid="ref1">0,1</xref>
          ] is the normalized IDS anomaly score at step k
, H kPHM∈ [
          <xref ref-type="bibr" rid="ref1">0,1</xref>
          ] is the normalized PHM health-risk score at step k , α , β , γ ≥ 0 are the weights with
unit sum that set the contribution of each term. A local recovery is triggered when Rk ≥ θ, while
the hardware watchdog remains the final recovery layer. Integration is lightweight: subscribe to
IDS and PHM events, normalize incoming scores to the unit range, update Rk once per polling
period, and reuse the existing queue, semaphore, timing, and task-control primitives of the
PredictiveWDT. This fusion improves early detection of both technical faults and cyber anomalies
without modifying the Markov core or the state machine.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Experiments</title>
      <p>The research was conducted on a real hardware platform STM32F4 under the control of the
FreeRTOS real-time operating system. The selected hardware platform belongs to the class of
microcontrollers with the ARM Cortex-M4 architecture, widely used in embedded systems, which
combines sufficient performance with strict limitations on power consumption and computational
resources. FreeRTOS, in turn, provides typical multitasking mechanisms, such as the scheduler,
message queues, and semaphores, which allow the reproduction of scenarios of critical application
systems. Such a combination of hardware and software components makes it possible to model
realistic operating conditions of resource-constrained cyber-physical systems and at the same time
experimentally verify the effectiveness of the proposed fault-tolerance mechanisms.</p>
      <p>The test CPS on which the experiments were conducted consisted of the following components:</p>
      <p>The central processing unit, performing task scheduling;
The SPI peripheral, on which overload scenarios were modeled;
The UART interface for logging and diagnostics;
GPIO events, which simulated external sensor signals and generated asynchronous
interrupts;
The PredictiveWDT software module, integrated into FreeRTOS for predictive diagnostics
and recovery.</p>
    </sec>
    <sec id="sec-4">
      <title>5. Results</title>
      <p>In the first series of experiments, the operation of the system was studied using only the
classical hardware watchdog. For this purpose, a hang of TaskA, which serviced the SPI peripheral,
was simulated, as well as a mutual blocking or deadlock in TaskA. In the case when TaskA stopped
responding, the hardware WDT was triggered only after the expiration of the preset timeout of 2.0
seconds. After that, a full microcontroller reset occurred with re-initialization of all system
components, which took another 0.6–0.8 seconds. Thus, the total recovery time averaged about 2.7
seconds. The operation of the system in the first series of experiments, which involved only the
hardware watchdog timer, is illustrated in Figure 2, left part, where the event logging is shown in
simple_watchdog.log.</p>
      <p>5 × 2.7 = 13.5 s
Percentage of downtime over 5 minutes (300 s):
(13)
13.5 × 100 % = 4.5 % (14)
300</p>
      <p>In the second series of tests, the operation of the system with the software watchdog timer was
studied. The same actions were carried out. TaskA could hang due to simulated SPI peripheral
overload or simulated mutual blocking. The system with PredictiveWDT detected the critical state
on average within 0.11–0.13 s, since the polling period was 100 ms. Usually, 1–2 cycles were
sufficient to change the state estimate from Normal/Warning to Critical. After transitioning to
Critical with a high probability of transition to F (Fail), in particular when the thresholds of critical
load were exceeded, PredictiveWDT performed a software reset of TaskA, reloading the associated
drivers. This operation took about 0.3–0.4 s, so the total return time to Normal did not exceed 0.5 s.
There was also a simulation of a mutual blocking in TaskA, where the deadlock was so deep that
even a software reset with task recreation did not solve the problem. In this case, the deep lock
occurred due to an error in SPI peripheral interaction, when one of the tasks held a semaphore for
SPI access while the DMA transmission remained stuck in a waiting state. As a result, other tasks
waiting for this resource could not proceed, and the FreeRTOS kernel effectively lost the ability to
perform context switching. Under such conditions, a software reset of an individual task is
ineffective because the blocking occurs at the scheduler level. This defines the applicability limit of
the PredictiveWDT method: when the scheduler or peripheral subsystem becomes unresponsive,
full recovery can be achieved only through the activation of the hardware watchdog timer. The
system operation can be seen in Figure 2, right part, where the event logging is shown in
predictive_watchdog.log.</p>
      <p>In a total of 5 attempts, only in one case did the system remain in a locally non-functional state.
In this case, PredictiveWDT deliberately did not send a control signal to the hardware watchdog
timer, which led to its triggering and a full system reset. The other 4 faults were corrected precisely
through the local restart. Thus, the proportion of global restarts was 100%/5*1 = 20%, as shown in
Figure 2 on the right. With a purely reactive approach, that is, in the absence of a software
watchdog timer, the real-time scheduling method in similar scenarios always results in 100% global
restarts, since the system waits only for the hardware watchdog timer.</p>
      <p>Total downtime:</p>
      <p>( 4 × 0. 35 )+( 1 × 3 )= 1 . 4 + 2.7 = 4. 1s
Percentage of downtime over 5 minutes (300 s):
4.5 % - 1.36666 % = 4.5 % - 1.36666 % = 3.13334 % (17)</p>
      <p>Let us calculate in relative terms the reduction of downtime between the first and the second
test:
4.5 - 1.36666 × 100 % = 69.62978 % (18)</p>
      <p>4.5</p>
      <p>Regarding CPU load, the worst measurement in FreeRTOS+Trace recorded an additional 1.6%
load when TaskA generated a large number of events and PredictiveWDT simultaneously
processed transition counters in the matrix. On average, this load remained below 1.4%, i.e., it did
not noticeably affect the execution of the remaining tasks considering that the processor core runs
at 180 MHz.</p>
      <p>Also the concise comparison Table 1 below summarizes response time and steady-state
CPU load across representative fault-tolerance methods, providing context for the
proposed hybrid approach [45].</p>
      <sec id="sec-4-1">
        <title>Formal methods and verification</title>
      </sec>
      <sec id="sec-4-2">
        <title>Steady-state CPU overhead (%) 0–0.2</title>
        <p>1-3
50–200
20-80
~0</p>
      </sec>
      <sec id="sec-4-3">
        <title>Description</title>
      </sec>
      <sec id="sec-4-4">
        <title>Last-resort global reset, reactive, no early localization</title>
      </sec>
      <sec id="sec-4-5">
        <title>Predictive local restart, HW WDT as final safeguard</title>
      </sec>
      <sec id="sec-4-6">
        <title>Immediate masking or fast takeover, high hardware and synchronization cost</title>
      </sec>
      <sec id="sec-4-7">
        <title>Feature extraction and inference on the target MCU, higher compute and energy demand</title>
      </sec>
      <sec id="sec-4-8">
        <title>Proves absence of classes of</title>
        <p>defects before deployment,
limited use for online detection</p>
        <p>It should be noted that the proposed approach is OS-independent. Only the monitoring and
recovery bindings are adapted, while the Markov-based decision logic, thresholds, sampling period,
and transition matrix remain unchanged. On Zephyr, a watchdog task can selectively use kernel
queues, semaphores, and thread synchronization primitives, as well as restart tasks at the task level
using standard thread management services; on VxWorks, the same functionality is implemented
through message queues, semaphore facilities, task management services, and a native watchdog
library for eventual recovery. The necessary changes are limited to replacing the queue/semaphore
checking, delay/synchronization, task management, and hardware watchdog interfaces with their
platform equivalents. With a polling period of 100 ms, the computational overhead remains low,
approximately one to two percent on Cortex-M-class target systems, and comparable behavior is
expected on Zephyr and VxWorks under similar configurations.</p>
        <p>The overall conclusions from the experiment show that the proposed strategy thanks to the
regular soft restart of problematic components provides a significant reduction in average
downtime five to eight times less compared to a full reboot. At the same time, the guaranteed hard
protection method against fatal failures is preserved. The execution time itself did not increase
significantly in percentage terms. Therefore, the combination of lightweight stochastic risk
assessment based on the Markov model and the two-stage software–hardware mechanism
contributes to a noticeable improvement in reliability and recovery speed in CPS, while not
requiring significant computational or memory resources.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6. Conclusions</title>
      <p>The paper presents a two-stage method of cyber-resilience and fault tolerance for embedded and
cyber-physical systems, which combines a hardware watchdog timer as the last line of defense
against fatal failures with a dedicated software watchdog module that operates on a simple Markov
model. This approach makes it possible to proactively detect a probable fault or anomaly caused
either by a technical malfunction or by malicious interference and to perform a local software reset
of the affected components. The main idea is not to wait for an actual system hang, but to evaluate
the risk of transition to F (Fail) by analyzing current operational conditions using a
lowdimensional Markov chain, and to initiate recovery measures in advance. In cases where preventive
actions are ineffective, the hardware watchdog is deliberately allowed to execute a full system
reset, ensuring the last line of defense against deep lockups or kernel-level cyberattacks. The
conducted experimental studies confirmed the following key advantages:
1. The system downtime was significantly reduced thanks to local soft restarts, and the
recovery time decreased by a factor of 3.1 compared to a full system reboot from 2.7 to 0.84
seconds.
2. Reduction in the frequency of global restarts: in a typical scenario, about 80% of potential
failures or anomalies were eliminated locally, without hardware resets.
3. The computational overhead remained low, as the CPU load did not exceed 1–2%, which
makes the proposed approach suitable for resource-constrained microcontrollers, even in
high-risk attack conditions.</p>
      <p>With the same number of faults over 5 minutes, the use of the Predictive Watchdog reduced the
total downtime percentage from 4.5% to 1.36666%. In relative terms, this represents approximately
a 69.62978% reduction in downtime.</p>
      <p>The proposed method demonstrates high practical feasibility in the domain of embedded and
real-time cyber-physical systems, where strict timing constraints must be combined with
guaranteed fault elimination and protection against malicious impacts. The use of a lightweight
Markov model ensures fast analysis without complex formal or ML-based solutions, while the
twostage design minimizes the number of hardware reboots, still retaining the hardware watchdog
timer as the ultimate protective mechanism. Thus, the combination of preventive software recovery
based on low-dimensional Markov chains with hard hardware resets creates a balance between the
reactive nature of classical watchdog timers and dynamic approaches to cyber defense.</p>
      <p>The research results confirm the effectiveness of applying this scheme across diverse fields —
from industrial automation and robotics to automotive and medical systems — where it is critically
important not only to maintain continuity and reliability of operation with minimal downtime, but
also to ensure the ability of systems to withstand modern cyber threats. Future work will focus on
integrating the PredictiveWDT with intrusion detection systems (IDS) and prognostics and health
management (PHM) frameworks to enable unified monitoring of both system faults and cyber
anomalies, thereby providing a more comprehensive assessment of cyber-resilience.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>AI tools were used exclusively for translation and proofreading purposes. All content was
originally written by the author.
[2] J. Navarro, Introduction to system reliability theory, Springer, New York, NY, 2021, 181 pp.</p>
      <p>doi:10.1007/978-3-030-86953-3.
[3] A. Gangolli, Q. H. Mahmoud, A. Azim, A systematic review of fault injection attacks on IoT
systems, Electronics 11 (2022) 2023. doi:10.3390/electronics11132023 .
[4] T. Zaytseva, L. Kravtsova, N. Kaminskaya, Markov processes in researching the probability of
cyber attacks on marine vessels, Journal of Information Technologies in Education 52 (2023)
20–32. doi:10.14308/ite000763.
[5] R. Mehalaine, et al., Watchdog timer for fault tolerance in embedded systems, J. Intelligent</p>
      <p>Embedded Systems Applications (2024). doi:10.18280/jesa.570619.
[6] Y. Yigit, L. Maglaras, M. A. Ferrag, N. Moradpoor, G. Lambropoulos, Reliability analysis of
fault-tolerant memory systems, in: Proceedings of the 8th South-East Europe Design
Automation, Computer Engineering, Computer Networks and Social Media Conference
(SEEDA-CECNSM), IEEE, Piraeus, Greece, 2023, pp. 1–6.
doi:10.1109/SEEDACECNSM61561.2023.10470763.
[7] A. Elhanashi, P. Dini, S. Saponara, Q. Zheng, Advancements in TinyML: applications,
limitations, and impact on IoT devices, Electronics 13 (2024) 3562.
doi:10.3390/electronics13173562.
[8] F. Michel, M. Siegle, Formal error bounds for the state space reduction of Markov chains,</p>
      <p>Performance Evaluation 167 (2025) 102464. doi:10.1016/j.peva.2024.102464.
[9] M. A. Solouki, S. Angizi, M. Violante, Dependability in embedded systems: a survey of
faulttolerance methods and software-based mitigation techniques, IEEE Access 12 (2024) 1–35.
doi:10.1109/ACCESS.2024.3509633.
[10] L. Bariah, Q. Zhao, H. Zou, et al., Tiny machine learning and on-device inference: a survey of
applications, challenges, and future directions, Sensors 25 (2025) 3191. doi:10.3390/s25103191
[11] M. Kobielnik, W. Kempa, On the time to buffer overflow in a queueing model with a general
independent input stream and working vacations, Sensors 21 (2021) 5507.
doi:10.3390/s21165507.
[12] X. Fan, W. Zhang, H. Qi, X. Zhou, Accurate battery temperature prediction using self-training
neural networks within embedded system, Energy 313 (2024) 134031.
doi:10.1016/j.energy.2024.134031.
[13] S. Maity, A. Majumder, R. Roy, A. Hota, S. Dey, Harnessing machine learning in dynamic
thermal management in embedded CPU–GPU platforms, ACM Transactions on Design
Automation of Electronic Systems 30 (2025) 19:1–19:32. doi:10.1145/3708890.
[14] S. Rana, AI-driven fault detection and predictive maintenance in electrical power systems: a
systematic review of data-driven approaches, digital twins, and self-healing grids, American
Journal of Advanced Technology and Engineering Solutions 1 (2025) 258–289.
doi:10.63125/4p25x993.
[15] E. Magliano, A. Savino, S. Di Carlo, Real-time embedded system fault injector framework for
micro-architectural state-based reliability assessment, Journal of Electronic Testing 41 (2025)
193–208. doi:10.1007/s10836-025-06170-w.
[16] R. Aalund, V. P. Paglioni, Enhancing reliability in embedded systems hardware: a literature
survey, IEEE Access (2025) 1–1. doi:10.1109/ACCESS.2025.3534138.
[17] P. Mercorelli, Recent advances in intelligent algorithms for fault detection and diagnosis,</p>
      <p>Sensors 24 (2024) 2656. doi:10.3390/s24082656.
[18] R. Mennilli, L. Mazza, A. Mura, Integrating machine learning for predictive maintenance on
resource-constrained PLCs: a feasibility study, Sensors 25 (2025) 537. doi:10.3390/s25020537.
[19] D. A. Santos, et al., Hybrid hardening approach for a fault-tolerant RISC-V system-on-chip,</p>
      <p>IEEE Transactions on Nuclear Science 71 (2024) 1722–1730. doi:10.1109/TNS.2024.3406021.
[20] M. Almeida, E. Pereira, G. Gonçalves, HyPredictor: hybrid failure prognosis approach
combining data-driven and knowledge-based methods, in: Proceedings of the 21st
International Conference on Informatics in Control, Automation and Robotics (ICINCO 2024),
2024, pp. 245–252. doi:10.5220/0012924300003822.
[21] E. Zio, Challenges to IoT-Enabled Predictive Maintenance for Industry 4.0, IEEE Internet of</p>
      <p>Things Journal 7 (2020) 4585–4597. doi:10.1109/JIOT.2019.2957029.
[22] S. A. Khajeh, M. Saberikamarposhti, A. M. Rahmani, Real-time scheduling in IoT applications:
a systematic review, Sensors 23 (2023) 232. doi:10.3390/s23010232.
[23] O. Ettahri, A. Oukaira, M. Ali, A. Hassan, M. Nabavi, Y. Savaria, A. Lakhssassi, Real-time
thermal monitoring using embedded sensors interfaces, Sensors 20 (2020) 5657.
doi:10.3390/s20195657.
[24] W. A. Cruz Castañeda, P. Bertemes Filho, Improvement of an edge-IoT architecture driven by
artificial intelligence for smart-health chronic disease management, Sensors 24 (2024) 7965.
doi:10.3390/s24247965.
[25] N. Jiang, C. Zhang, Y. Cao, R. Zhan, Prognostic and health management of critical aircraft
systems and components: an overview, Sensors 23 (2023) 8124. doi:10.3390/s23198124.
[26] R. Mehalaine, M. Djezzar, D. Nessah, Z. Saiad, A. Saidi, Watchdog timer for fault tolerance in
embedded systems, Journal Européen des Systèmes Automatisés 57 (2024) 1713–1720.
doi:10.18280/jesa.570619.
[27] A. Kashtalian, S. Lysenko, O. Savenko, A. Nicheporuk, T. Sochor, V. Avsiyevych,
Multicomputer malware detection systems with metamorphic functionality, Radioelectronic and
Computer Systems 2024 (1) (2024) 152–175. doi:10.32620/reks.2024.1.13.
[28] D. Denysiuk, O. Savenko, S. Lysenko, B. Savenko, A. Kashtalian, Method for detecting
steganographic changes in images using machine learning, in: 2023 13th International
Conference on Dependable Systems, Services and Technologies (DESSERT), IEEE, Athens,
Greece, 2023, pp. 1–6. doi:10.1109/DESSERT61349.2023.10416453.
[29] A. Kashtalian, S. Lysenko, A. Sachenko, B. Savenko, O. Savenko, A. Nicheporuk, Evaluation
criteria of centralization options in the architecture of multicomputer systems with traps and
baits, Radioelectronic and Computer Systems 2025 (1) (2025) 264–297.
doi:10.32620/reks.2025.1.18.
[30] S. Lysenko, K. Bobrovnikova, S. Matiukh, I. Hurman, O. Savenko, Detection of the botnets’
low-rate DDoS attacks based on self-similarity, International Journal of Electrical and
Computer Engineering 10 (2020) 3651–3659. ISSN: 2088-8708.
doi:10.11591/ijece.v10i4.pp36513659.
[31] O. Pomorova, O. Savenko, S. Lysenko, A. Kryshchuk, K. Bobrovnikova, A technique for the
botnet detection based on DNS-traffic analysis, in: Communications in Computer and
Information Science, vol. 522, Springer, 2015, pp. 127–138. ISSN: 1865-0929.
doi:10.1007/978-3319-19419-6_12.
[32] S. Lysenko, O. Pomorova, O. Savenko, A. Kryshchuk, K. Bobrovnikova, DNS-based
antievasion technique for botnets detection, in: Proceedings of the 8th IEEE International
Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology
and Applications, IEEE, Warsaw, Poland, 2015, pp. 453–458. doi:10.1109/IDAACS.2015.7340777.
[33] S. Lysenko, O. Savenko, K. Bobrovnikova, A. Kryshchuk, Self-adaptive system for the
corporate area network resilience in the presence of botnet cyberattacks, in: Communications
in Computer and Information Science, vol. 860, 2018, pp. 385–401.
doi:10.1007/978-3-31992459-5_31.
[34] F. Reghenzani, Z. Guo, W. Fornaciari, Software fault tolerance in real-time systems: identifying
the future research questions, ACM Computing Surveys 55 (2023) 306. doi:10.1145/3589950.
[35] F. Ratti, HeterogeneousRTOS: a CPU–FPGA real-time OS for fault tolerance on COTS at
nearzero timing cost, ACM Transactions on Embedded Computing Systems 24 (2025) 144.
doi:10.1145/3712062.
[36] D. Simpson, J. Harkin, M. McElholm, L. McDaid, Smart watchdog for RISC-V: a novel spiking
neural network approach to fault detection, IEEE Transactions on Circuits and Systems II:
Express Briefs 72 (2025) 1–5. doi:10.1109/TCSII.2025.3583042.
[37] B. Asch, E. Jellum, M. Lohstroh, E. A. Lee, Software-defined watchdog timers for
cyberphysical systems, IEEE Embedded Systems Letters 18 (2024) 115–118.
doi:10.1109/LES.2024.3467332 .
[38] A. Mehalaine, A. Ferroudji, M. Ziane, K. Benahmed, Watchdog timer for fault tolerance in
embedded systems, Journal Européen des Systèmes Automatisés 57 (2024) 1713–1720.
doi:10.18280/jesa.570619.
[39] S. Lee, S.-H. Kwon, IndWatch: industrial watchdogs based on blockchain for securing software
supply chains in IIoT, Sensors 21 (2021) 1170. doi:10.3390/s21041170.
[40] H. de Medeiros Neto, A. M. d. R. Araújo, J. J. P. C. Rodrigues, A. L. L. Aquino, P. H. J. Nardelli,
A survey on AI-based anomaly detection in IoT and sensor networks, Sensors 23 (2023) 6765.
doi:10.3390/s23156765.
[41] W. G. Negera, A. Abebe, Review of botnet attack detection in SDN-enabled IoT using machine
learning, Sensors 22 (2022) 9837. doi:10.3390/s22249837.
[42] I. Mutambik, An efficient flow-based anomaly detection system for enhanced security in IoT
networks, Sensors 24 (2024) 7408. doi:10.3390/s24227408.
[43] I. Apostol, M. Preda, C. Nila, I. Bica, IoT botnet anomaly detection using unsupervised deep
learning, Electronics 10 (2021) 1876. doi:10.3390/electronics10161876.
[44] M. Gelgi, Y. Guan, S. Arunachala, M. S. S. Rao, N. Dragoni, Systematic literature review of IoT
botnet DDoS attacks and evaluation of detection techniques, Sensors 24 (2024) 3571.
doi:10.3390/s24113571.
[45] O. Kozelskyi, A. Drozd, B. Savenko, P. Gaj, A model for probabilistic monitoring and proactive
restart of real-time operating systems under intensive state changes in cyber-physical systems,
in: CEUR Workshop Proceedings, vol. 4013, 2025. URL:
https://ceur-ws.org/Vol-4013/paper16.pdf.
[46] J. Dalou’, B. Al-Duwairi, M. Al-Jarrah, Adaptive entropy-based detection and mitigation of
DDoS attacks in software defined networks, International Journal of Computing 19 (2020) 399–
410. doi:10.47839/ijc.19.3.1889.
[47] P. Bykovyy, V. Kochan, A. Sachenko, G. Markowsky, Genetic algorithm implementation for
perimeter security systems CAD, in: 4th IEEE Workshop on Intelligent Data Acquisition and
Advanced Computing Systems: Technology and Applications (IDAACS), Dortmund, 2007, pp.
634–638. doi:10.1109/IDAACS.2007.4488498.
[48] K. Singh, K. Singh Dhindsa, B. Bhushan, Performance analysis of agent-based distributed
defense mechanisms against DDoS attacks, International Journal of Computing 17 (2018) 15–
24. doi:10.47839/ijc.17.1.945.
[49] P. Bykovyy, Y. Pigovsky, V. Kochan, A. Sachenko, G. Markowsky, S. Aksoy, Genetic algorithm
implementation for distributed security systems optimization, in: IEEE International
Conference on Computational Intelligence for Measurement Systems and Applications
(CIMSA), Istanbul, 2008, pp. 120–124. doi:10.1109/CIMSA.2008.4595845.
[50] S. R., A. Kanavalli, A. Gupta, A. Pattanaik, S. Agarwal, Real-time DDoS detection and
mitigation in software defined networks using machine learning techniques, International
Journal of Computing 21 (2022) 353–359. doi:10.47839/ijc.21.3.2691.
[51] S. Singaravelan, P. Velayutha Perumal, R. Arun, V. Selvakumar, D. Murugan, Deep
learningbased echo state neural network for cyber threat detection in IoT-driven IICS networks,
International Journal of Computing 23 (2024) 205–210. doi:10.47839/ijc.23.2.3538.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saraeian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shirazi</surname>
          </string-name>
          ,
          <article-title>Digital twin-based fault tolerance approach for cyber-physical production system</article-title>
          ,
          <source>ISA Transactions 130</source>
          (
          <year>2022</year>
          )
          <fpage>35</fpage>
          -
          <lpage>50</lpage>
          . doi:
          <volume>10</volume>
          .1016/j.isatra.
          <year>2022</year>
          .
          <volume>03</volume>
          .007.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>