<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>P. Saraiva, On Shannon entropy and its applications, Kuwait J. Sci.</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1109/NICS48868.2019.9023812</article-id>
      <title-group>
        <article-title>Byzantine Fault Tolerance in Distributed Systems: Advancing the Replica State Discovery Protocol v2.0</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Maksym Kotov</string-name>
          <email>maksym_kotov@ukr.net</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Toliupa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Nakonechnyi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Serhii Buchyk</string-name>
          <email>buchyk@knu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleksandr Buchyk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>60 Volodymyrska str., 01033 Kyiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>50</volume>
      <issue>3</issue>
      <fpage>0000</fpage>
      <lpage>0003</lpage>
      <abstract>
        <p>Contemporary demands for computational complexity and fault tolerance have led to the development of different approaches toward building replicated and clustered environments. The foundational aspect of these systems is a set of underlying consensus and management mechanisms and protocols. It is quite common to have a centralized management plane responsible for deployment, task assignment, and result aggregation to avoid the complexities intrinsic to the distributed consensus paradigm. However, the fault tolerance and trust decentralization capabilities of such an approach remain restricted. One of the most prominent examples of a distributed system is blockchain technology and its off-chain networks, which introduce capabilities for managing and coordinating the assignment of computational tasks. Blockchain has its limitations, mainly related to response times and throughput, as it necessitates consensus for every action and interaction. A lightweight cluster coordination and consensus management framework, the Replica State Discovery Protocol (RSDP), aims to provide rapid coordination of nodes. RSDP defines an interface for arbitrary logical extension of distributed computation modules and establishes a set of rules for nodes to follow to achieve consensus within the network. Nonetheless, RSDP was initially designed as a protocol for private coordination and was not constructed with Byzantine fault tolerance (BFT) in mind. The purpose of this paper is to advance this coordination method to incorporate practices that allow for secure decentralized computation coordination even in the presence of malicious actors. Firstly, this article defines methods to achieve strict state transitions that avoid trust exploitation and flooding techniques. Secondly, this paper presents multiple approaches toward building strict quorum state reducers within RSDP, which allow the initiation of BFT-compliant operations. Thirdly, an additional set of new mechanisms and methods is described in the context of RSDP to improve both efficiency and reliability. Finally, we propose a generalized BFT model for RSDP, which leverages blockchain as a trusted source and enables the establishment of a completely decentralized public coordinated computing environment.</p>
      </abstract>
      <kwd-group>
        <kwd>computer network</kwd>
        <kwd>network protocol</kwd>
        <kwd>coordination of distributed system</kwd>
        <kwd>distributed consensus</kwd>
        <kwd>Byzantine fault tolerance</kwd>
        <kwd>BFT</kwd>
        <kwd>Replica State Discovery Protocol</kwd>
        <kwd>RSDP</kwd>
        <kwd>BFT-compliant consensus within RSDP</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The popularity of Internet technology has increased rapidly over the past few decades.
Nowadays, online interactions between remotely located parties have become quite common in
both business and governance sectors. Distributed systems serve as a technical foundation for such
operations [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. This demand has led to numerous research and engineering efforts to improve the
security, reliability, and availability characteristics of these systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        One of the primary aspects that define architectural complexity is the coordination mechanism.
Depending on the nature of managed nodes, the task of cluster management tends to be nontrivial
and requires custom solutions to achieve state synchronization, failover, and availability. In that
context, there are two approaches based on the type of managed services: replication and
clusterization, which respectively correspond to stateless and stateful endpoints united under a
single abstract amalgamation [
        <xref ref-type="bibr" rid="ref4 ref5 ref6 ref7 ref8">4–11</xref>
        ].
      </p>
      <p>
        The replication approach applies to networks responsible for managing a set of homogeneous
multiagent systems [
        <xref ref-type="bibr" rid="ref4 ref5 ref6">4–6</xref>
        ]. Such installments are characterized by the interchangeability of the
constituents; that is, every participating node can be substituted by another without disrupting the
overall performance or operational stability of a distributed system. In that case, internal
architecture is commonly composed of an external-facing request forwarder and an internal pool of
replaceable nodes that perform the desired computation. A monitoring solution continuously
probes the active nodes, and upon detection of any inconsistency or response delays, signals the
control plane to simply redeploy the faulty instance.
      </p>
      <p>On the other hand, clusterization refers to a method of organizing, managing, and maintaining
coordination between a set of heterogeneous multiagent systems. Clustered environments often
comprise either a set of stateful nodes or nodes responsible for handling non-interchangeable
procedures [9–11]. This approach requires significant engineering effort to effectively manage
request distribution and consistent operation within the system. In that case, it is quite common to
represent the interactions in two abstract layers: coordination and execution.</p>
      <p>The coordination layer taxonomy includes decentralized and centralized systems, providing
publicly and privately governed mechanisms, respectively. Centralized systems are
characterized by a single coordinating service, where a server or a group of servers
oversees and reconciles operations within a network under regulated control [12, 13]. In turn,
decentralized systems represent a network of participating peers coordinated through a consensus
mechanism. These could be further divided into classes based on their adherence to Byzantine
fault tolerance principles, which address the inherent complexities associated with open networks.</p>
      <p>The Replica State Discovery Protocol (RSDP) is the first consensus framework
developed for coordination of clustered and replicated decentralized systems [14, 15]. It defines
the abstraction layers and foundations upon which arbitrary computational logic
could be implemented to achieve lightweight, consistent, and coordinated execution management
without requiring a central responsible entity.</p>
      <p>One of the prominent examples of contemporary decentralized technologies compliant with
BFT is the blockchain [16–19]. Blockchain has recently become widespread, with growing interest
and demand from a substantial worldwide user base. This technology defines a logical basis
for verifiable public asset transfer and computation, which is crucial to establishing a secure transactional
environment. However, it is quite limited in its scalability and overall throughput, which is an area
of active research [20–23].</p>
      <p>Initially, RSDP was designed as a decentralized coordination solution to be used within private
permissioned networks. The protocol assumes that a secure permissioned environment is
provisioned for the cluster. Its primary role is to rapidly reconcile cluster-wide state and, as an
implication, dynamically provide all the necessary context information for cluster members to
perform their procedures. RSDP was not designed to withstand any potential state transition
violations or malicious interactions and thus is not compliant with the BFT requirements.</p>
      <p>The purpose of this article is to advance RSDP to version 2.0, which introduces a
secure coordination layer required for both unstable and malicious environments. To achieve that,
this article first introduces a set of new phase transition controls that restrict possible abuse of
participating nodes. In addition, it outlines a new consensus quorum approach defined at
the level of state reducers. These mechanisms improve the protocol’s resistance to
both network losses and potential outlying coordination impact. Lastly, this article introduces a
novel approach toward open decentralized computation coordination based on RSDP and
blockchain technologies [24].</p>
    </sec>
    <sec id="sec-2">
      <title>2. Overview of the replica state discovery protocol</title>
      <p>At its core, RSDP relies on a communication layer responsible for sending and
receiving messages. Originally, the protocol defined an abstraction layer built on top of AMQP,
called a Simulated Local Area Network (SLAN). This abstraction simplifies the
development of cluster-wide network protocols and is described in a dedicated article [14].</p>
      <p>The SLAN logical topology is shown in Fig. 1.
It comprises participating nodes, their respective queues, and a
central coordination entity responsible for initiating and managing communication channels. This
central coordination entity is an AMQP server; one of the most popular and widely used
implementations is RabbitMQ [25–27].</p>
      <p>In this paradigm, the communication process is designed to be reliable, scalable, and
asynchronous. When initiating a message-passing procedure, the message producer addresses not the
recipient but a queue or a set of queues. SLAN leverages this capability
to establish both broadcast and direct message sending by assigning an anonymous
dynamic queue to each peer individually.</p>
      <p>To begin with, let us define the core entities of SLAN:
</p>
      <p>Nodes: N = {n<sub>1</sub>, …, n<sub>|N|</sub>}, where each n ∈ N is a network participant that can send and
receive messages.</p>
      <p>Messages: M = {m<sub>1</sub>, …, m<sub>|M|</sub>}, where each m ∈ M is a data payload transmitted among nodes.</p>
      <p>Queues: Q = {q<sub>1</sub>, …, q<sub>|Q|</sub>} with a bijection Q : N → Q. Each node n has a unique queue
Q(n) holding messages before consumption.</p>
      <p>In the context of SLAN, the individual queues are designed to be transient and accessible only
within the confines of the established connection between the peer and the AMQP server. That
approach allows for simplified dynamic handling of joining nodes but also lacks durability
guarantees for messages.</p>
      <p>Local area networks provide two basic communication operations: broadcast and direct
message passing. These operations make it possible to build advanced custom protocols. However,
it is quite a challenge to provide these primitives within distributed systems that span multiple
networks or even regions. SLAN establishes such a basis efficiently and securely by leveraging
the direct and fanout exchange types defined in AMQP.
We can define the routing structures as follows:</p>
      <p>Exchanges: E = {E<sub>fanout</sub>, E<sub>direct</sub>}, where E<sub>fanout</sub>
broadcasts messages to all queues and E<sub>direct</sub> routes them by a key.</p>
      <p>Fanout Bindings: ∀ q ∈ Q, (E<sub>fanout</sub>, q) ∈ B<sub>fanout</sub>; that is, every queue is bound to E<sub>fanout</sub>.</p>
      <p>Routing Keys: K is the set of keys, and R : N → K is injective, so each node n is
assigned a unique key R(n) for direct routing.</p>
      <p>Direct Bindings: (E<sub>direct</sub>, Q(n), R(n)) ∈ B<sub>direct</sub> for each n ∈ N; that is, the direct exchange
maps each key R(n) to the corresponding queue Q(n).</p>
      <p>Within AMQP, routing keys play a pivotal role in establishing complex interaction
schemas between exchanges and queues. For instance, the protocol defines
pattern-based routing through topic exchanges, where a queue receives each message whose routing key
matches a wildcard binding pattern. AMQP also defines other routing options, such as
matching on message headers, providing even greater flexibility in network management. However, since the goal of
SLAN is to provide basic primitives, direct matching against the key is sufficient.</p>
      <p>The SLAN supports the following operations:</p>
      <p>Broadcast Operation: f<sub>broadcast</sub>(n<sub>s</sub>, m) enqueues m into Q(n) for all n ∈ N; a broadcast
from n<sub>s</sub> via E<sub>fanout</sub> delivers m to every node.</p>
      <p>Direct Send Operation: f<sub>sendDirect</sub>(n<sub>s</sub>, n<sub>t</sub>, m) enqueues m into Q(n<sub>t</sub>); a direct send
uses E<sub>direct</sub> and R(n<sub>t</sub>) so that only the queue of n<sub>t</sub> receives m.</p>
      <p>Consumption Operation: f<sub>consume</sub>(n, m) removes m from Q(n) and records the pair (n, m) in C,
where C ⊆ N × M is the consumption relation: (n, m) ∈ C indicates that
node n has consumed message m. Once a node n processes
m, the message is marked as consumed and dequeued.</p>
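      <p>The entities, bindings, and operations above can be sketched as a small in-memory model. This is an illustration only, not the AMQP-backed implementation: the class name Slan, the "key-" routing key format, and the method names are assumptions made for the example, and a real deployment would delegate queues and exchanges to an AMQP server.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import deque


class Slan:
    """In-memory stand-in for SLAN: one queue per node, fanout and direct routing."""

    def __init__(self, nodes):
        # Q(n): every node owns exactly one queue.
        self.queues = {n: deque() for n in nodes}
        # R(n): one unique routing key per node (hypothetical key format).
        self.routing_key = {n: f"key-{n}" for n in nodes}
        # B_direct: the direct exchange maps each key R(n) to the owner of Q(n).
        self.direct_bindings = {key: n for n, key in self.routing_key.items()}
        # C: consumption relation, stored as (node, message) pairs.
        self.consumed = set()

    def broadcast(self, sender, message):
        # E_fanout: enqueue into every bound queue, the sender's included.
        for queue in self.queues.values():
            queue.append((sender, message))

    def send_direct(self, sender, target, message):
        # E_direct: resolve R(target) and enqueue into that single queue.
        owner = self.direct_bindings[self.routing_key[target]]
        self.queues[owner].append((sender, message))

    def consume(self, node):
        # Dequeue the oldest message and record (node, message) in C.
        sender, message = self.queues[node].popleft()
        self.consumed.add((node, message))
        return sender, message
```
]]></preformat>
      <p>In this sketch, a broadcast from n1 followed by a direct send from n1 to n3 leaves one message in the queues of n1 and n2 and two in the queue of n3, and consuming from n3 returns them in arrival order.</p>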
      <p>The simulated network could be perceived as an integral system that has its state and a set of
internal processes that actively modify it. This approach allows us to gain a holistic understanding
of the observed Decentralized Coordination Network (DCN) built on top of the underlying
communication media.</p>
      <p>The state, denoted S, and its transitions could be defined in the following way:</p>
      <p>Broadcast Transition: δ<sub>broadcast</sub>(S, n<sub>s</sub>, m) adds m to all Q(n). This transition signifies the state
change during the call of f<sub>broadcast</sub>(n<sub>s</sub>, m).</p>
      <p>Direct Send Transition: δ<sub>direct</sub>(S, n<sub>s</sub>, n<sub>t</sub>, m) adds m to Q(n<sub>t</sub>). This models the state change
when a message is sent directly with f<sub>sendDirect</sub>(n<sub>s</sub>, n<sub>t</sub>, m).</p>
      <p>Consume Transition: δ<sub>consume</sub>(S, n, m) removes m from Q(n) and adds (n, m) to C,
representing the state change when a node finishes processing a message.</p>
      <p>Reliability management, congestion control, message persistence, and network tunneling
are delegated to the SLAN layer. This abstraction simplifies both the interface and the
functional concerns of RSDP, making it more digestible and robust. Further
research could explore different approaches to establishing communication media
between RSDP nodes appropriate for different environments.</p>
      <p>A set of nodes V = {v<sub>1</sub>, v<sub>2</sub>, …, v<sub>n</sub>} and a set of directed edges E ⊆ V × V representing communication links
between nodes form the topology of a distributed system. In such a system, each node v<sub>i</sub> has an
initial state s<sub>i</sub> and a dedicated mutex to ensure consistency between state
transitions [15].</p>
      <p>Let us first define two utility functions: the node identifier f<sub>id</sub>(v<sub>i</sub>), which derives the
sender’s address for a given node v<sub>i</sub>, and the metadata extraction f<sub>meta</sub>(v<sub>i</sub>), which derives the
initial or meta information for a given node.</p>
      <p>Every state mutation operation starts with an f<sub>acquire</sub>(v<sub>i</sub>) call and ends with an f<sub>release</sub>(v<sub>i</sub>) call, which
lock and release the local state transition mutex, respectively. These procedures are necessary to
guarantee that no two state mutation functions, such as f<sub>agg</sub> or f<sub>update</sub>, are executed at the same
time and interfere with each other’s results.</p>
      <p>The “DEBATE” phase includes the following steps:</p>
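      <p>Send “HELLO” Messages: each v<sub>i</sub> sends a “HELLO” message to its out-neighbors N<sup>+</sup>(v<sub>i</sub>),
where M<sub>hello</sub>(v<sub>i</sub>) = (f<sub>id</sub>(v<sub>i</sub>), f<sub>meta</sub>(v<sub>i</sub>)), with the directed message propagation
denoted as M<sub>hello</sub>(v<sub>i</sub>) → N<sup>+</sup>(v<sub>i</sub>).</p>
      <p>Receive “STATUS” Messages: upon receiving M<sub>hello</sub>, each v<sub>j</sub> ∈ N<sup>+</sup>(v<sub>i</sub>) sends its initial state:
M<sub>status</sub>(v<sub>j</sub>) = (f<sub>id</sub>(v<sub>j</sub>), f<sub>meta</sub>(v<sub>j</sub>), s<sub>j</sub>).</p>
      <p>Aggregate States: each v<sub>i</sub> aggregates the received states {s<sub>j</sub> ∣ v<sub>j</sub> ∈ N<sup>−</sup>(v<sub>i</sub>)} into a local
aggregated state: s<sub>i</sub><sup>*</sup> = f<sub>agg status</sub>(s<sub>i</sub>, {M<sub>status</sub>(v<sub>j</sub>) ∣ v<sub>j</sub> ∈ N<sup>−</sup>(v<sub>i</sub>)}).</p>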
      <p>This process introduces replicas to each other within the system. Its purpose is to share the
initial configuration or metadata to derive the initial view of the system’s state.</p>
      <p>The “SHARE” phase includes the following steps:</p>
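      <p>Broadcast “SHARE” Messages: each v<sub>i</sub> broadcasts its aggregated state s<sub>i</sub><sup>*</sup> to its
out-neighbors: M<sub>share</sub>(v<sub>i</sub>) = (f<sub>id</sub>(v<sub>i</sub>), f<sub>meta</sub>(v<sub>i</sub>), s<sub>i</sub><sup>*</sup>), with the directed message propagation
denoted as M<sub>share</sub>(v<sub>i</sub>) → N<sup>+</sup>(v<sub>i</sub>).</p>
      <p>Validate and Aggregate: each receiving node v<sub>j</sub> validates and merges the received states
using ŝ<sub>j</sub><sup>*</sup> = f<sub>agg share</sub>(s<sub>j</sub><sup>*</sup>, {M<sub>share</sub>(v<sub>i</sub>) ∣ v<sub>i</sub> ∈ N<sup>−</sup>(v<sub>j</sub>)}), where f<sub>agg share</sub> extracts the state components s<sub>i</sub><sup>*</sup>
from each M<sub>share</sub>(v<sub>i</sub>) and performs the operations declared within the reducer to derive a final
operational state ŝ<sub>j</sub><sup>*</sup>.</p>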
      <p>After validation and aggregation processes are finished, the local state gets updated and
represents a holistic view of the target cluster state. At this point, all the necessary steps were
taken, and the participating nodes could execute their operations based on the derived states.</p>
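      <p>The two aggregation steps can be sketched as follows. The sketch assumes that node states are dictionaries, that both reducers merge by dictionary union, and that messages arrive from the in-neighbors as in the aggregation formulas; the function names and the metadata value are hypothetical, and the “HELLO” exchange and the mutex are omitted for brevity.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def in_neighbors(edges, node):
    """N^-(v): nodes that have a directed edge pointing at `node`."""
    return sorted(src for src, dst in edges if dst == node)


def agg_status(own_state, status_messages):
    """Hypothetical f_agg_status: merge peers' initial states into a local view."""
    view = dict(own_state)
    for _sender_id, _meta, state in status_messages:
        view.update(state)
    return view


def agg_share(own_view, share_messages):
    """Hypothetical f_agg_share: merge peers' aggregated views into a final state."""
    final = dict(own_view)
    for _sender_id, _meta, view in share_messages:
        final.update(view)
    return final


def run_round(edges, initial_states):
    """Run one DEBATE + SHARE round and return (aggregated views, final states)."""
    nodes = sorted(initial_states)
    meta = {"role": "replica"}  # placeholder for f_meta, an assumption

    # DEBATE: each node aggregates STATUS messages (id, meta, s_j).
    views = {}
    for v in nodes:
        status = [(u, meta, initial_states[u]) for u in in_neighbors(edges, v)]
        views[v] = agg_status(initial_states[v], status)

    # SHARE: each node aggregates SHARE messages (id, meta, s_i*).
    finals = {}
    for v in nodes:
        share = [(u, meta, views[u]) for u in in_neighbors(edges, v)]
        finals[v] = agg_share(views[v], share)
    return views, finals
```
]]></preformat>
      <p>On a directed ring a → b → c → a, the “DEBATE” phase leaves each node with a partial view (its own state plus that of its single in-neighbor), and the “SHARE” phase completes the view on every node.</p>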
      <p>The “CLOSE” phase includes the following steps:
</p>
      <p>Send “CLOSE” Messages: v<sub>k</sub> sends a “CLOSE” message containing its state s<sub>k</sub> to its
out-neighbors: M<sub>close</sub>(v<sub>k</sub>) = (f<sub>id</sub>(v<sub>k</sub>), f<sub>meta</sub>(v<sub>k</sub>), s<sub>k</sub>), with the directed message propagation denoted
as M<sub>close</sub>(v<sub>k</sub>) → N<sup>+</sup>(v<sub>k</sub>).</p>
      <p>Update Internal State: each receiving node v<sub>i</sub> removes references to v<sub>k</sub> and updates its state:
s'<sub>i</sub> = f<sub>agg close</sub>(s<sub>i</sub>, M<sub>close</sub>(v<sub>k</sub>)), where f<sub>agg close</sub> is a function defined within the reducer to
update the internal state s<sub>i</sub> based on the incoming records within s<sub>k</sub>.</p>
      <p>This phase is optional and introduces dynamic participation handling capabilities for
RSDP. It allows the cluster state to be adjusted dynamically when a subset of nodes departs
from the network.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Reinforcement of the phase transition control</title>
      <p>Phase transition control is one of the primary aspects of the protocol’s consistency guarantees. In
its initial version, RSDP defined mutexes as the main instrument for isolating state manipulation logic
between phases [15, 28]. Mutexes create so-called critical sections that restrict access to a single
abstract execution entity at a time. The approach allows for coordinating multiple such entities, viz.,
processes, their threads, or asynchronous tasks in the case of event-driven architectures.</p>
      <p>Nonetheless, the defined transition restrictions are not sufficient for handling outlying
operations. Critical sections forbid parallel execution of the restricted code segments with
programmatically defined boundaries. To enter the critical section and leave it, the parallel
execution entity must acquire and release the mutex, respectively. If there are multiple entry points
to the critical section and some of them omit the ingress boundary, parallelism and inconsistencies
are still quite possible.</p>
      <p>In particular, RSDP in its original definition does not verify whether “STATUS” messages
were received during the “DEBATE” phase. The controlling mutex was, by definition, acquired
before sending the initial “HELLO” message and released after the “SHARE” message was sent.
That approach, while protecting against parallel processing of consistent sequential messages,
allowed some delayed messages to be processed after the “DEBATE” phase had already
finished.</p>
      <p>To resolve this issue, a new state control mechanism is proposed that leverages the principles of the
finite state machine (FSM) [29]. Let us first define a set of possible protocol execution statuses
G = {g<sub>INITIAL</sub>, g<sub>DEBATE</sub>, g<sub>IDLE</sub>, g<sub>SHARE</sub>, g<sub>CLOSE</sub>}, representing the zones associated with the respective
phases and the inter-phase execution states.</p>
      <p>In such a machine, transitions are defined as a set T, where each transition t<sub>n</sub> ∈ T is
represented by t<sub>n</sub> = (g<sub>i</sub>, σ<sub>n</sub>, g<sub>j</sub>), where g<sub>i</sub>, g<sub>j</sub> ∈ G and σ<sub>n</sub> is the event that triggers the transition.
The transition rules could be defined as follows:</p>
      <p>INITIAL → DEBATE: t<sub>1</sub> = (g<sub>INITIAL</sub>, σ<sub>start</sub>, g<sub>DEBATE</sub>), triggered by a start signal from the
joining node. The g<sub>INITIAL</sub> status does not allow the execution of any operations besides the
initiation of the protocol.</p>
      <p>DEBATE → IDLE: t<sub>2</sub> = (g<sub>DEBATE</sub>, σ<sub>status agg</sub>, g<sub>IDLE</sub>), triggered when the aggregation of the
“STATUS” messages is finalized. The g<sub>DEBATE</sub> status does not allow the execution of any
operations besides those within the “DEBATE” phase.</p>
      <p>IDLE → DEBATE: t<sub>3</sub> = (g<sub>IDLE</sub>, σ<sub>hello</sub>, g<sub>DEBATE</sub>), triggered before sending the initial “HELLO”
message of the “DEBATE” phase. The g<sub>IDLE</sub> status does not allow the execution of any
operations besides a transition to the next status.</p>
      <p>IDLE → SHARE: t<sub>4</sub> = (g<sub>IDLE</sub>, σ<sub>share</sub>, g<sub>SHARE</sub>), triggered before sending or upon receiving the
“SHARE” message. The g<sub>SHARE</sub> status does not allow the execution of any operations
besides those within the “SHARE” phase.</p>
      <p>IDLE → CLOSE: t<sub>5</sub> = (g<sub>IDLE</sub>, σ<sub>close</sub>, g<sub>CLOSE</sub>), triggered upon receiving the “CLOSE” message.
The g<sub>CLOSE</sub> status does not allow the execution of any operations besides those within the
“CLOSE” phase.</p>
      <p>SHARE → IDLE: t<sub>6</sub> = (g<sub>SHARE</sub>, σ<sub>share agg</sub>, g<sub>IDLE</sub>), triggered when the aggregation of the
“SHARE” messages is finalized. The g<sub>SHARE</sub> status does not allow the execution of any
operations besides those within the “SHARE” phase.</p>
      <p>CLOSE → IDLE: t<sub>7</sub> = (g<sub>CLOSE</sub>, σ<sub>close agg</sub>, g<sub>IDLE</sub>), triggered when the aggregation of the
“CLOSE” messages is finalized. The g<sub>CLOSE</sub> status does not allow the execution of any
operations besides those within the “CLOSE” phase.</p>
      <p>Let g<sub>i</sub> ∈ G represent the current status. The transition function δ is defined as δ(g<sub>i</sub>, σ) = g<sub>j</sub> and is used
to formally change the current execution status. For example, δ(g<sub>INITIAL</sub>, σ<sub>start</sub>) = g<sub>DEBATE</sub>.</p>
      <p>Based on the defined rules, a state transition matrix could be constructed that visualizes
available operations.</p>
      <p>In the following matrix, states are indexed as follows:
1 → g<sub>INITIAL</sub>, 2 → g<sub>DEBATE</sub>, 3 → g<sub>IDLE</sub>, 4 → g<sub>SHARE</sub>, 5 → g<sub>CLOSE</sub>.</p>
      <sec id="sec-3-4">
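        <p>As a sketch of such a matrix, assume that rows index the current status, columns index the next status, and an entry of 1 marks a permitted transition. Under that layout convention, which is an assumption of this example (as is the symbol A), the rules t<sub>1</sub> to t<sub>7</sub> above yield:</p>
        <disp-formula><tex-math><![CDATA[
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}
]]></tex-math></disp-formula>
        <p>Row 3 (IDLE) is the only row with more than one permitted target, which reflects that every phase is entered from, and returns to, the IDLE status.</p>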
        <p>The transition rules thereby forbid the execution of outlying operations and provide additional
consistency guarantees for the protocol in addition to the already established mutex system. Before
adding messages to the buffers or executing state mutating operations, the engine now also has to
verify the validity of its status.</p>
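        <p>The status check described above can be sketched as a lookup table over (status, event) pairs. The event names and the classes PhaseMachine and InvalidTransition are assumptions made for this example; only the five statuses and the seven transitions come from the rules defined earlier.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from enum import Enum


class Status(Enum):
    INITIAL = 1
    DEBATE = 2
    IDLE = 3
    SHARE = 4
    CLOSE = 5


# (current status, event) -> next status, mirroring transitions t1..t7.
TRANSITIONS = {
    (Status.INITIAL, "start"): Status.DEBATE,
    (Status.DEBATE, "status_agg"): Status.IDLE,
    (Status.IDLE, "hello"): Status.DEBATE,
    (Status.IDLE, "share"): Status.SHARE,
    (Status.IDLE, "close"): Status.CLOSE,
    (Status.SHARE, "share_agg"): Status.IDLE,
    (Status.CLOSE, "close_agg"): Status.IDLE,
}


class InvalidTransition(Exception):
    """Raised when an event is not permitted in the current status."""


class PhaseMachine:
    def __init__(self):
        self.status = Status.INITIAL

    def fire(self, event):
        """Apply delta(status, event); reject events with no matching rule."""
        key = (self.status, event)
        if key not in TRANSITIONS:
            raise InvalidTransition(f"{event!r} not allowed in {self.status.name}")
        self.status = TRANSITIONS[key]
        return self.status
```
]]></preformat>
        <p>With this table, a delayed “STATUS” aggregation event that arrives while the engine is already in the IDLE status raises InvalidTransition instead of mutating the state.</p>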
        <p>Having resolved the state transition inconsistencies, another key aspect in stochastic
environments is to handle infrequent but still possible scenarios. One such scenario is an indefinite
holding of a mutex due to unforeseen physical or logical interference. Since the protocol engine
leverages such locks to modify the internal representation of the cluster state, it could potentially
get stuck if for some reason the mutex was not released.</p>
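        <p>One way to make such a stuck lock detectable is to attach a deadline to every acquisition. The following is a minimal single-process sketch under assumed names (TimedMutex, MutexTimeout, check); it models only the bookkeeping of holder, timer, and timeout, and is neither thread-safe nor a distributed lock.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import time


class MutexTimeout(Exception):
    """Raised when a holder kept the mutex for longer than its timeout."""


class TimedMutex:
    """Bookkeeping model of a mutex with a release deadline (not thread-safe)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout    # maximum allowed holding time, in seconds
        self.clock = clock        # injectable time source, useful for testing
        self.holder = None        # None when unlocked, else the holder's id
        self.acquired_at = None   # moment the current holder took the mutex

    def acquire(self, holder_id):
        """Take the mutex if it is free and start the timer."""
        if self.holder is not None:
            return False
        self.holder = holder_id
        self.acquired_at = self.clock()
        return True

    def release(self, holder_id):
        """Free the mutex; report an overrun if the deadline was missed."""
        if self.holder != holder_id:
            raise RuntimeError("mutex is not held by this holder")
        elapsed = self.clock() - self.acquired_at
        self.holder = None
        self.acquired_at = None
        if elapsed >= self.timeout:
            raise MutexTimeout(f"held for {elapsed:.3f}s, limit {self.timeout}s")

    def check(self):
        """Watchdog hook: free an overdue mutex and raise MutexTimeout."""
        if self.holder is None:
            return
        if self.clock() - self.acquired_at >= self.timeout:
            overdue = self.holder
            self.holder = None
            self.acquired_at = None
            raise MutexTimeout(f"holder {overdue!r} exceeded {self.timeout}s")
```
]]></preformat>
        <p>A watchdog could call check() periodically; how the engine reacts to MutexTimeout (recovery or reinitialization) is outside the scope of this sketch.</p>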
        <p>First, let us define the mutex μ as a variable where:</p>
        <disp-formula><tex-math><![CDATA[\mu = \begin{cases} i, & \text{if locked by thread } H_i \\ 0, & \text{if unlocked} \end{cases}]]></tex-math></disp-formula>
        <p>Here, μ represents the mutex state and i is the identifier of the thread that locked the mutex.
The mutex can have the following state transitions:</p>
        <p>Lock: μ = 0 → μ = i (mutex acquired by the thread i).</p>
        <p>Release: μ ← 0 (mutex available to be acquired again).</p>
        <p>A mutex isolates resource access between multiple cooperating parties that are
trying to perform a critical operation simultaneously. In that context, a critical section is a part of a
program that ensures mutual exclusion on the shared resource:</p>
        <disp-formula><tex-math><![CDATA[\forall t:\quad \sum_{i=1}^{n} \chi_i(t) \le 1]]></tex-math></disp-formula>
        <p>where t is some point in time, n is the number of threads, and χ<sub>i</sub>(t) is the characteristic function: χ<sub>i</sub>(t) = 1 if H<sub>i</sub> is in
the critical section at time t, and χ<sub>i</sub>(t) = 0 otherwise.</p>
        <p>To handle possible inconsistencies and deadlocks, the newer version of the protocol leverages
time-based mutexes. Essentially, this type of mutex enters a critical section with a commitment to
release it within a certain time interval. A timeout-based mutex μ<sub>t</sub> introduces a timer x<sub>i</sub> and a
timeout τ<sub>i</sub>. The following state transitions and rules apply to μ<sub>t</sub>:</p>
        <p>Acquire Mutex: μ<sub>t</sub> = 0 ⟹ μ<sub>t</sub> ← i, x<sub>i</sub> ≔ 0.</p>
        <p>Release Mutex: μ<sub>t</sub> = i ∧ x<sub>i</sub> &lt; τ<sub>i</sub> ⟹ μ<sub>t</sub> ← 0.</p>
        <p>Timeout Exception: μ<sub>t</sub> = i ∧ x<sub>i</sub> ≥ τ<sub>i</sub> ⟹ exception raised, μ<sub>t</sub> ← 0.</p>
        <p>Such a mutex does not resolve the underlying unexpected issues that led to the deadlock in the
first place but allows us to detect the issues and react to them. The protocol engine could either go
into recovery and try to reconcile the execution state or abort all operations and reinitialize itself,
going through each synchronization phase again, starting with g<sub>INITIAL</sub>.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Status consensus quorum introduction within RSDP</title>
      <p>The initial descriptions of RSDP did not cover the mechanisms behind the consistency of the reducers’ internal
operation. To begin a discussion about the importance of a quorum-based approach for a subset
of distributed synchronization tasks, let us first define the failure and validity conditions for the
protocol, its managed nodes, and the system as a whole.</p>
      <p>The validity conditions could be evaluated from multiple perspectives and for different abstract
objects. Firstly, there is the perspective of every node v<sub>i</sub> ∈ V within the system G<sub>cluster</sub> = (V, E)
defined in the previous sections. The failure condition could also be evaluated from the perspective
of G<sub>cluster</sub> itself based on different protocol stages.</p>
      <p>Let us begin the definition of validity conditions from the perspective of participating nodes.
The “Local Validity” implies repeatable results of the aggregation methods. Formally, v<sub>i</sub> is
locally valid if:</p>
      <disp-formula><tex-math><![CDATA[\forall t \in \mathbb{N}:\quad f_{\text{agg share}}\left(s_i^{*}, \{ M_{\text{share}}(v_j) \mid v_j \in N^{-}(v_i) \}\right) = \hat{s}_i^{*}]]></tex-math></disp-formula>
      <p>This means that repeated aggregator calls on the same data yield the same result. Local validity
implies determinism: the aggregation function does not perform any unexpected side
effects that would lead to desynchronization.</p>
      <p>The other part of the validity conditions from the replica’s perspective describes its relative
operational quality in comparison with other participants. The “Global Validity” could be
described formally as:</p>
      <disp-formula><tex-math><![CDATA[f_{\text{agg status}}\left(s_i, \{ M_{\text{status}}(v_j) \mid v_j \in N^{-}(v_i) \}\right) = f_{\text{agg share}}\left(s_i^{*}, \{ M_{\text{share}}(v_j) \mid v_j \in N^{-}(v_i) \}\right)]]></tex-math></disp-formula>
      <p>That could also be expressed as s<sub>i</sub><sup>*</sup> = ŝ<sub>i</sub><sup>*</sup>: the locally aggregated view of the cluster state is
representative and conforms with the holistic representation gained by merging the final states
from participating nodes.</p>
      <p>The perspective of a single node could be limited and thus biased. A simple binary validity
condition is not indicative within a distributed consensus-based system. Hence,
validation should be treated as a gauged estimation representing the degree of validity at a
particular point in time. The first validation condition from the system’s perspective, called
“Transitional System Validity”, defines the degree of divergence based on the last sharing session of
aggregated states. The estimation process could be described in the following steps:</p>
      <p>Collect Broadcasted States: S<sup>*</sup> = {s<sub>i</sub><sup>*</sup> ⊆ M<sub>share</sub>(v<sub>i</sub>) ∣ v<sub>i</sub> ∈ V};</p>
      <p>Group States: partition S<sup>*</sup> by identical values into classes {P<sub>1</sub>, …, P<sub>m</sub>};</p>
      <p>Measure Divergence: apply one of the validity estimation functions defined below.</p>
      <p>The measurement could be performed by any participating node or external interceptor that has
access to the communication media. The following base functions could be used for such an
estimation:</p>
      <p>Magnitude: represents the normalized maximum proportion, where a higher value means lower fragmentation:</p>
      <disp-formula><tex-math><![CDATA[D_{mg} = \frac{\max_{l} |P_l|}{n}]]></tex-math></disp-formula>
      <p>Entropy: the evaluation is based on entropy, where a higher value means higher fragmentation [30]:</p>
      <disp-formula><tex-math><![CDATA[D_{en} = -\sum_{l=1}^{m} \frac{|P_l|}{n} \log\left(\frac{|P_l|}{n}\right)]]></tex-math></disp-formula>
      <p>Concentration: measures the squared distribution of elements, where a higher value means lower fragmentation:</p>
      <disp-formula><tex-math><![CDATA[D_{co} = \sum_{l=1}^{m} \left(\frac{|P_l|}{n}\right)^{2}]]></tex-math></disp-formula>
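      <p>The three measures can be computed directly from the class sizes. The following is a minimal sketch, assuming that states are represented by hashable values (for example, state hashes), that n is the number of collected states, and that the logarithm is natural; the function names are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from collections import Counter


def class_sizes(states):
    """Partition states by identical value and return the class sizes |P_l|."""
    return list(Counter(states).values())


def magnitude(states):
    """Share of the largest class; higher means lower fragmentation."""
    return max(class_sizes(states)) / len(states)


def entropy(states):
    """Entropy of the class proportions; higher means higher fragmentation."""
    n = len(states)
    return -sum((size / n) * math.log(size / n) for size in class_sizes(states))


def concentration(states):
    """Sum of squared class proportions; higher means lower fragmentation."""
    n = len(states)
    return sum((size / n) ** 2 for size in class_sizes(states))
```
]]></preformat>
      <p>For four collected states of which three are identical, this gives a magnitude of 0.75 and a concentration of 0.625; when all states agree, magnitude and concentration equal 1 and the entropy is 0.</p>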
      <p>A similar approach could be leveraged to build the “Operational System Validity”, which
represents the divergence between the finalized states. The same evaluation steps apply, with
Ŝ<sup>*</sup> = {ŝ<sub>i</sub><sup>*</sup> ∣ v<sub>i</sub> ∈ V, ŝ<sub>i</sub><sup>*</sup> belongs to v<sub>i</sub>} being the initial set. The method requires access to the
participating nodes to gather the finalized states.</p>
      <p>Having discussed the validity conditions and their estimation, it is quite important to emphasize
that the main metric representing the cluster’s state is “Operational System Validity”. That is
because it describes the system’s consistency rather than the individual node. Additionally, it is less
susceptible to the biases of a single node. Thus, minimization of divergence is the primary goal of
establishing a robust distributed consensus framework.</p>
      <p>The protocol could leverage its redundancy to improve consistency. To achieve that, an
additional layer is proposed to automatically reconcile inconsistent state aggregations. It introduces
the popular and weighted voting mechanisms with a set of deterministic rules to be applied before
the finalized state s^i* is settled for the replica [31, 32].</p>
      <p>Let us start with the basis of the proposed voting consensus system. Suppose each node v<sub>j</sub>
proposes a hash h(s<sub>j</sub><sup>*</sup>), where s<sub>j</sub><sup>*</sup> is the state aggregation of v<sub>j</sub>. Then node v<sub>i</sub> receives the states and
hashes from N<sup>−</sup>(v<sub>i</sub>) = {v<sub>j</sub> ∈ V ∣ (v<sub>j</sub>, v<sub>i</sub>) ∈ E}, verifies them, and selects one to represent its own
finalized state ŝ<sub>i</sub><sup>*</sup>.</p>
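      <p>The “Unweighted Popular Vote” starts with the collection of H<sub>i</sub> = {h(s<sub>j</sub><sup>*</sup>) ∣ v<sub>j</sub> ∈ N<sup>−</sup>(v<sub>i</sub>)}, where
each h(s<sub>j</sub><sup>*</sup>) represents the state aggregation from the perspective of a node v<sub>j</sub>. To evaluate the vote
result, for each candidate hash h'<sub>i,k</sub> ∈ H<sub>i</sub>, where k ∈ ℕ and 0 &lt; k ≤ |H<sub>i</sub>|, compute C<sub>i,k</sub>(h'<sub>i,k</sub>), which
is the number of times h'<sub>i,k</sub> appears in H<sub>i</sub>:</p>
      <disp-formula><label>(5)</label><tex-math><![CDATA[C_{i,k}(h'_{i,k}) = \left|\{ h(s_j^{*}) \in H_i : h(s_j^{*}) = h'_{i,k} \}\right|]]></tex-math></disp-formula>
      <p>After each corresponding C<sub>i,k</sub>(h'<sub>i,k</sub>) has been computed, the vote decision could be expressed as
follows:</p>
      <disp-formula><tex-math><![CDATA[C_i^{\max} = \max_{h'_{i,k} \in H_i} C_{i,k}(h'_{i,k}), \qquad T_i = \{ h'_{i,k} \in H_i \mid C_{i,k}(h'_{i,k}) = C_i^{\max} \}]]></tex-math></disp-formula>
      <p>where T<sub>i</sub> is the set of winning unique hash values. There could be a tie between votes
that has to be resolved. If |T<sub>i</sub>| = 1, choose the single hash h<sub>i</sub><sup>*</sup> in T<sub>i</sub>. Otherwise, pick h<sub>i</sub><sup>*</sup> = min<sub>≺</sub> T<sub>i</sub>, the minimum of T<sub>i</sub> under the order ≺.
The weighted vote differs in the approach to implementing C<sub>i,k</sub>(h'<sub>i,k</sub>): it introduces the weight term w<sub>j</sub> &gt; 0.
For each node v<sub>j</sub> ∈ V, let us define a weight w<sub>j</sub> ∈ ℝ<sub>&gt;0</sub>. These weights can be derived in multiple
different ways, for example, from trust relationships or resource capacities:</p>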
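      <p>Trust-Based Weights: <inline-formula><tex-math>w_j = \sum_{v_l \in V} T(v_l, v_j),\ T(v_l, v_j) \ge 0</tex-math></inline-formula>, where <inline-formula><tex-math>T(v_l, v_j)</tex-math></inline-formula> measures how
much <inline-formula><tex-math>v_l</tex-math></inline-formula> trusts <inline-formula><tex-math>v_j</tex-math></inline-formula>. Summing over all <inline-formula><tex-math>v_l \in V</tex-math></inline-formula> yields a strictly positive real number if each
node is trusted by at least one other node. The trust values could belong to <inline-formula><tex-math>\mathbb{R}_{\ge 0}</tex-math></inline-formula> or to a simpler binary
set <inline-formula><tex-math>\{0, 1\}</tex-math></inline-formula>.</p>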
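      <p>Resource-Based Weights: <inline-formula><tex-math>w_j = f_{res}(BW_j, CPU_j, RAM_j)</tex-math></inline-formula>, where
<inline-formula><tex-math>BW_j, CPU_j, RAM_j \in \mathbb{R}_{\ge 0}</tex-math></inline-formula> denote the bandwidth, CPU, and memory resources of <inline-formula><tex-math>v_j</tex-math></inline-formula>, and
<inline-formula><tex-math>f_{res} : \mathbb{R}_{\ge 0}^{3} \to \mathbb{R}_{&gt;0}</tex-math></inline-formula> is a function that returns a positive real number representing the node’s
capacity.</p>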
      <p>The weight function could be injected into the reducer, allowing for custom setups that adhere
to the unique requirements of the system. It is important to emphasize that the weight assignment
function should also be deterministic; otherwise, it would increase system state
fragmentation. The weighted count computation function could be represented as follows:</p>
      <disp-formula><label>(7)</label><tex-math>C_{i,k}(h'_{i,k}) = \sum_{v_j \in N^{-}(v_i)} \left[ w_j \cdot I\{h(s_j^{*}) = h'_{i,k}\} \right]</tex-math></disp-formula>
      <p>where <inline-formula><tex-math>I\{X\}</tex-math></inline-formula> returns 1 if <inline-formula><tex-math>X</tex-math></inline-formula> is true and 0 otherwise. Each vote for <inline-formula><tex-math>h'_{i,k}</tex-math></inline-formula> is multiplied by the weight
<inline-formula><tex-math>w_j</tex-math></inline-formula> of the voting node <inline-formula><tex-math>v_j</tex-math></inline-formula>. The model generalizes the popular voting mechanism, in which every
vote has the same weight. Thus, the reducer’s interface can simply expect a weight function, where
the default one would yield the same value for each node.</p>
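      <p>The voting rules of this section can be sketched in Python as follows. This is a minimal illustration rather than the protocol’s reference implementation: the function name vote, the dictionary-based input, and the lexicographic tie-break are assumptions made for the example (the text only requires some deterministic ordering).</p>
      <preformat>
```python
from collections import defaultdict


def vote(proposals, weight=lambda node_id: 1.0):
    """Pick the winning state hash among proposals {node_id: state_hash}.

    weight maps a node id to a positive number; the default gives every
    node the same weight, which reduces to the unweighted popular vote.
    """
    if not proposals:
        raise ValueError("no proposals received")
    counts = defaultdict(float)
    for node_id, state_hash in proposals.items():
        w = weight(node_id)
        if not w > 0:
            raise ValueError("weights must be strictly positive")
        counts[state_hash] += w  # weighted count for this candidate hash
    best = max(counts.values())
    winners = [h for h, c in counts.items() if c == best]  # tied candidates
    # Tie-break: lexicographic minimum (an assumed ordering that every
    # node must apply identically).
    return min(winners)


proposals = {"n1": "aa", "n2": "bb", "n3": "bb", "n4": "aa"}
weights = {"n1": 1.0, "n2": 2.0, "n3": 2.0, "n4": 1.0}
unweighted_winner = vote(proposals)
weighted_winner = vote(proposals, weight=lambda node_id: weights[node_id])
```
      </preformat>
      <p>With equal weights the two hashes tie and the tie-break selects "aa"; with the example weights the hash "bb" wins outright. Because both the weight function and the tie-break are deterministic, every replica that sees the same proposals reaches the same decision.</p>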
    </sec>
    <sec id="sec-5">
      <title>5. Enhancing protocol’s efficiency and reliability</title>
      <p>Moving on to the efficiency and reliability improvements, let us first consider the updates related
to the critical section management within the protocol. Critical sections are the base atomic units
of execution from the perspective of a parallelized system. These points are responsible for a
unified state update shared between multiple execution threads. As we’ve discussed in the previous
section, the critical sections are defined and managed by respective mutex instances responsible for
ensuring phase transition consistency.</p>
      <p>It is important to emphasize that, while controlled critical sections make it possible to manage
asynchronous access to a shared entity, they are one of the primary causes of bottlenecks
and deadlocks. That is because critical sections effectively merge multiple parallel
execution threads into a single queue of operations. Therefore, they should be reduced to the minimum
scope required to guarantee correct execution, avoiding excessive and
greedy locking strategies that would unnecessarily stall the protocol’s engine.</p>
      <p>The proposed modification provides a new approach to delimiting phase transitions and their
internal operations. One of the most important changes relates to the “DEBATE” phase and its critical
section management. RSDP initially defined this phase as starting with the mutex acquisition
before sending the “HELLO” message. Subsequent operations involved “STATUS” message
gathering, state aggregation, and broadcasting the “SHARE” message. The interphase mutex was
supposed to be released only after the broadcast operation, holding the entire system at a halt
during the message-passing process. Since the notion of engine statuses
had not been introduced in the initial conception, this approach made it possible to delay the processing of incoming “SHARE”
messages and avoid inconsistencies.</p>
      <p>In contrast with the initial version of RSDP, the proposed modification suggests acquiring a
mutex exclusively for the aggregation and update processes. The initial operations during the
“DEBATE” phase involving cluster status gathering could be executed outside the critical section
while still guaranteeing state transition consistency due to the introduction of the FSM principles
that postulate strict ordering of engine status transitions and hence executable operations. Such an
improvement allows us to avoid potential congestion related to superfluous critical sections.</p>
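      <p>The narrowed critical section can be sketched as follows. The class Engine, the Status values, and the callback parameters are hypothetical names chosen for this illustration; the sketch only shows the locking pattern (status gathering outside the mutex, aggregation and update inside it, with an FSM guard on the engine status) and omits all networking.</p>
      <preformat>
```python
import threading
from enum import Enum


class Status(Enum):
    IDLE = 1
    DEBATING = 2
    SHARING = 3


class Engine:
    def __init__(self, state):
        self.state = state
        self.status = Status.IDLE
        self._lock = threading.Lock()

    def _transition(self, expected, new):
        # FSM guard: only the declared predecessor status may move forward.
        with self._lock:
            if self.status is not expected:
                return False
            self.status = new
            return True

    def debate(self, gather_statuses, aggregate):
        if not self._transition(Status.IDLE, Status.DEBATING):
            return False  # a round is already in progress
        # Status gathering (network I/O) runs outside the critical section.
        statuses = gather_statuses()
        # The mutex covers only the aggregation and the state update.
        with self._lock:
            self.state = aggregate(self.state, statuses)
            self.status = Status.SHARING
        return True


engine = Engine(state=0)
started = engine.debate(lambda: [1, 2, 3], lambda state, xs: state + sum(xs))
```
      </preformat>
      <p>A second call to debate while the engine is not idle returns False instead of blocking, which reflects the strict ordering of status transitions described above.</p>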
      <p>Another aspect that must be addressed is the reasoning behind the “CLOSE” phase’s existence
and its purpose. This phase is responsible for gracefully and dynamically handling nodes that depart
from the cluster’s set. The same result could be achieved without
introducing a dedicated phase and its related logic, for example
by having the departing replica broadcast a “SHARE” message containing a modified state
instead of a “CLOSE” message. The dedicated “CLOSE” message was eventually chosen for the
following reasons:</p>
      <p>The addition of a “CLOSE” message allows for differentiation between state updates and
reduction requests. Separation of concerns greatly simplifies aggregation logic for the
incoming shared states.</p>
      <p>Broadcasting a “CLOSE” message is more efficient than repeatedly going through the
entire “DEBATE” and “SHARE” phases, which would be necessary if the reducer’s
logic is built on top of the newly proposed voting mechanism.</p>
      <p>The separated approach allows for the development of additional logic related to event
management. An example could involve side effects that are not related to state
management.</p>
      <p>Overall, the decision to include the “CLOSE” phase introduces greater modularity, simplifies the
codebase, and broadens the range of logic that can be added within the cluster. Any operation defined for
“CLOSE” phase handling should still comply with deterministic principles to avoid
desynchronization. If the reducer’s logic relies on side effects during that phase, a new synchronization
process starting with “DEBATE” will have to be executed.</p>
      <p>The proposed RSDP 2.0 does not guarantee finalized state consistency. It is still quite possible
for some replicas to diverge due to hardware, network, software conditions, or their combinations.
Mechanisms like voting, including its weighted variants, aim to reduce the state groups'
fragmentation degree, but they cannot guarantee eventual consistency.</p>
      <p>For that reason, the proposed version introduces the “Mandatory Resynchronization
Mechanism”. The MRM could be implemented in multiple different ways, each with its
implications:
</p>
      <p>Asynchronous Scheduling: in that case, each replica would have an independent resynch
period. The approach is straightforward but does not leverage the cluster’s capabilities to
coordinate its operations, leading to inefficient monitoring.</p>
      <p>Cron-Based Scheduling: this would lead to strict synchronization and result in
near-simultaneous RSDP status transitions on participating nodes, but detecting a discrepancy
could take the entire schedule period [33].</p>
      <p>Evenly Distributed Intervals: through RSDP itself, the participating replicas could
distribute the time slot and coordinate with each other in intervals to monitor the cluster’s
state.</p>
      <p>The last approach would start with the initial synchronization round, going through the
“DEBATE” and “SHARE” phases. Using the dedicated reducer, every replica could deduce the
network size and its participants, and deterministically assign the time slots for resynchronization periods. The
approach minimizes the detection time of an inconsistency by leveraging the cluster’s capabilities.</p>
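      <p>Recall that upon receiving <inline-formula><tex-math>M_{hello}</tex-math></inline-formula>, each <inline-formula><tex-math>v_j \in N^{-}(v_i)</tex-math></inline-formula> sends its initial state,
<inline-formula><tex-math>M_{status}(v_j) = (f_{id}(v_j), f_{meta}(v_i), s_j)</tex-math></inline-formula>, through the <inline-formula><tex-math>M_{status}(v_j)</tex-math></inline-formula> message that, among other things like
metadata or state, contains a routable sender’s address. That said, when the buffer of incoming
messages is being processed into an aggregation <inline-formula><tex-math>s_i^{*} = f_{agg}^{status}(s_i, \{M_{status}(v_j) \mid v_j \in N^{-}(v_i)\})</tex-math></inline-formula>, it
could be constructed as follows:</p>
      <disp-formula><label>(8)</label><tex-math>s_i^{*} = \left\{ \left( f_{id}(v_j),\ \frac{\Delta T}{|A|} \cdot rank(f_{id}(v_j), A) \right) \mid v_j \in N^{-}(v_i) \cup \{v_i\} \right\}</tex-math></disp-formula>
      <p>where <inline-formula><tex-math>\Delta T</tex-math></inline-formula> is a synchronization period provided as a parameter, <inline-formula><tex-math>|A|</tex-math></inline-formula> is the total number of
addresses in the sorted set <inline-formula><tex-math>A</tex-math></inline-formula>, where <inline-formula><tex-math>A = \{f_{id}(v_j) \mid v_j \in N^{-}(v_i) \cup \{v_i\}\}</tex-math></inline-formula>, and <inline-formula><tex-math>rank(f_{id}(v_j), A)</tex-math></inline-formula> returns
the position of <inline-formula><tex-math>f_{id}(v_j)</tex-math></inline-formula> within the sorted set <inline-formula><tex-math>A</tex-math></inline-formula> for a given node <inline-formula><tex-math>v_j</tex-math></inline-formula>. As a result, it would represent the
assigned timeslots for every participating node.</p>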
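      <p>The slot assignment of Eq. (8) can be sketched in Python as follows. The function name assign_time_slots is illustrative, and the rank is taken here as the zero-based position in the sorted address set, which is an assumption since the text does not state whether positions start at zero or one.</p>
      <preformat>
```python
def assign_time_slots(addresses, period):
    """Map every address to its resynchronization offset within one period.

    addresses: iterable of comparable node addresses (the set A).
    period: the synchronization period Delta T, in any time unit.
    """
    ordered = sorted(set(addresses))
    if not ordered:
        raise ValueError("at least one address is required")
    slot = period / len(ordered)  # Delta T divided by |A|
    # rank = zero-based position in the sorted set (assumption)
    return {address: slot * rank for rank, address in enumerate(ordered)}


slots = assign_time_slots(["node-c", "node-a", "node-b"], period=60)
```
      </preformat>
      <p>Since the result depends only on the sorted set of addresses and the period, every replica that aggregated the same set of status messages derives the same schedule, regardless of the order in which the messages arrived.</p>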
      <p>Another important issue that must be addressed pertains to potential attacks aimed at
service availability. The designed version of RSDP is still intended to be used within a
private network, where each node goes through authentication and authorization processes before it can
transmit messages inside the network. However, attacks are not limited to public services, so
it remains important to design countermeasures against potential abuse of the
synchronization mechanisms.</p>
      <p>Since every interaction process goes through the SLAN layer, which is comprised of a set of
intermediary nodes, additional monitoring and security measures could be installed to prevent
potential message spam abuse. This is possible since the media server is responsible for
authentication and the routing process which enables dynamic evaluation of requests. If it detects
frequent messages of the same type coming from the same node, it could easily isolate it to avoid
potential network congestion and malicious activities.</p>
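      <p>One possible shape of such a monitoring rule is sketched below. The class FloodGuard, its threshold, and its sliding-window policy are hypothetical choices made for this illustration; the paper does not prescribe a specific detection algorithm.</p>
      <preformat>
```python
from collections import defaultdict, deque


class FloodGuard:
    """Isolates a node that sends too many messages of one type in a window."""

    def __init__(self, max_messages, window_seconds):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._seen = defaultdict(deque)  # (node, message type) -> timestamps
        self.isolated = set()

    def allow(self, node_id, message_type, now):
        if node_id in self.isolated:
            return False
        stamps = self._seen[(node_id, message_type)]
        # Drop timestamps that have left the sliding window.
        while stamps and now - stamps[0] >= self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_messages:
            self.isolated.add(node_id)
            return False
        return True


guard = FloodGuard(max_messages=2, window_seconds=10.0)
decisions = [guard.allow("n1", "HELLO", t) for t in (0.0, 1.0, 2.0)]
```
      </preformat>
      <p>In this example the third “HELLO” message from the same node within ten seconds exceeds the limit, so the node is isolated and all of its later messages are rejected, while other nodes remain unaffected.</p>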
      <p>Additionally, since RSDP 2.0 introduces quorum-based consensus, the acceptance of an
incoming shared state is resolved through the BFT-compliant properties of the voting mechanism.
That is, for a new shared state to be accepted, it should be supported by the majority of the
network. Consequently, an incoming divergent state would lead to a new phase of the
synchronization period that would determine whether the transition should be accepted. Future
research could be aimed at detecting anomalous behavior.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Decentralized byzantine fault tolerance</title>
      <p>As previously stated, RSDP is designed for controlled environments and relies on the SLAN
layer for authentication and security measures. By leveraging voting-based consensus, it is possible
to limit a malicious node’s influence on the state distribution and coordination process, since any malicious intent
would be dismissed unless it comes from the majority of the network.</p>
      <p>Nonetheless, RSDP is designed as a cluster solution that coordinates a group of nodes
towards a common goal through common state management capabilities. The protocol does not
provide any side-effect verification mechanism and thus cannot guarantee that the tasks designated
through state mapping were accomplished correctly. Additionally, it relies on participating nodes
to provide status and aggregated data without malicious intent, since every other node would trust
them. For these reasons, blockchain technology could be used to achieve decentralization and apply the
protocol’s capabilities outside the controlled environment [16–20].</p>
      <p>The RSDP-Blockchain layer interactions are shown in Fig. 3.
Two distinct layers are defined: the blockchain network with
its validator nodes, and the RSDP cluster nodes. Every node within the RSDP cluster has to
establish connections with every other cluster node as well as with a validator node serving as an entry
point for interactions with the ledger. To explain the reasoning behind such a combination, let us
first discuss blockchain technology, its implications, limitations, and the problems it allows us to
solve.</p>
      <p>Blockchain is a decentralized network of publicly linked nodes responsible for managing the
ledger’s history. At its foundation, it relies on a linked list of hashes, each representing a succinct state
of its block. One of the core problems solved by blockchain is
historical consensus regarding the interactions that have occurred. Its operational consistency is guaranteed by a common
consensus mechanism and strict validation rules [16–20].</p>
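      <p>The “linked list of hashes” idea can be illustrated with a short Python sketch. It is a generic toy structure, not the data model of any particular blockchain: the block layout, the all-zero genesis hash, and the helper names are assumptions made for the example.</p>
      <preformat>
```python
import hashlib
import json


def block_hash(previous_hash, payload):
    # Canonical JSON keeps the digest independent of key order.
    body = json.dumps({"prev": previous_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"prev": previous_hash, "payload": payload,
                  "hash": block_hash(previous_hash, payload)})


def is_valid(chain):
    previous_hash = "0" * 64
    for block in chain:
        if block["prev"] != previous_hash:
            return False  # the link to the previous block is broken
        if block["hash"] != block_hash(block["prev"], block["payload"]):
            return False  # the block content no longer matches its hash
        previous_hash = block["hash"]
    return True


chain = []
append_block(chain, {"event": "HELLO"})
append_block(chain, {"event": "SHARE"})
valid_before = is_valid(chain)
chain[0]["payload"]["event"] = "CLOSE"  # tamper with recorded history
valid_after = is_valid(chain)
```
      </preformat>
      <p>Because each block’s hash covers the hash of its predecessor, altering any earlier entry invalidates the chain from that point on, which is what makes the recorded history auditable.</p>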
      <p>There are many approaches to establishing a voting mechanism, with the most common
being proof of stake, proof of authority, and proof of work. Their purpose is to achieve
network-wide consensus about the next block value. Comparing them with RSDP, we can draw a clear distinction
between the blockchain consensus layer and the designed protocol:</p>
      <p>RSDP is designed as a solution that provides distributed state management and
coordination, where every node contributes to the resulting aggregation, while blockchain
consensus protocols are aimed at the validation process that verifies the next block in the
ledger.</p>
      <p>Blockchain consensus protocols are designed to scale; they do not require a fully connected
network for their operation, which is a current limitation of RSDP and will be addressed in
future research.</p>
      <p>RSDP emphasizes distribution rather than decentralization. The proposed mechanisms
allow for mitigating the influence of uncertain network conditions, but the purpose of the
protocol is to coordinate a cluster rather than provide a trustless platform.</p>
      <p>That being said, blockchain technology could serve as a trust provision layer, responsible for
asset management and slashing penalties in case state inconsistency is detected. Every node within
the RSDP cluster should be registered on-chain, and a prearranged amount of assets should be
staked for both potential penalties and rewards. On top of that, every node must commit its state
transition events to the chain for historical audit purposes.</p>
      <p>Every interaction with the cluster would go through the following stages:</p>
      <p>A set of RSDP nodes register on the blockchain network, providing stakes to the designated
smart contract.</p>
      <p>Publishing a task on the ledger through the contract signifies the beginning of
cluster processing. The task could include execution parameters, target goals to be
achieved, and the chosen state reducer.</p>
      <p>Participating nodes register a cluster through the smart contract and sign up for task
execution.</p>
      <p>As the cluster moves through each state transition phase, such as “HELLO”, “SHARE”, or “CLOSE”, logs
are published on the ledger to verify operational correctness.</p>
      <p>Once the task is finalized due to either an end condition or a revoking command by the
client, the results and latest states are to be published on the ledger for verification.</p>
      <p>The approach allows the log trace to be followed at any point, since the ledger is a publicly available
structure. Recall that every state transition function within RSDP should be deterministic, and thus
every operation could be verified by outside entities. If an inconsistency is detected on a
node during its phases, slashing could be applied based on the impact degree, which could be
determined by the methods provided in previous sections, such as “Operational System Validity” based
on concentration, entropy, or the largest group.</p>
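      <p>Verification by an outside entity can be sketched as a replay of the published log through the same deterministic reducer. The log entry layout (state, messages, claimed_hash) and the helper names below are assumptions made for this illustration; the paper does not define a concrete log format.</p>
      <preformat>
```python
import hashlib
import json


def state_hash(state):
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def find_inconsistencies(log, reducer):
    """Replay logged transitions and return the indexes whose hash differs.

    Each entry holds "state" (input state), "messages" (inputs) and
    "claimed_hash" (hash of the resulting state reported by the node).
    """
    mismatches = []
    for index, entry in enumerate(log):
        expected = state_hash(reducer(entry["state"], entry["messages"]))
        if expected != entry["claimed_hash"]:
            mismatches.append(index)
    return mismatches


def sum_reducer(state, messages):
    # A toy deterministic reducer used only for this example.
    return {"total": state["total"] + sum(messages)}


log = [
    {"state": {"total": 0}, "messages": [1, 2],
     "claimed_hash": state_hash({"total": 3})},
    {"state": {"total": 3}, "messages": [4],
     "claimed_hash": state_hash({"total": 8})},  # should have been 7
]
suspicious = find_inconsistencies(log, sum_reducer)
```
      </preformat>
      <p>Here only the second entry is flagged. Such a check is possible only because the reducer is deterministic; how a flagged entry translates into a slashing amount is left open in this sketch.</p>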
      <p>Another concern that has to be addressed is state mutation caused by external information. It
could happen that the cluster must agree upon a value that has a significant impact on other
systems. For example, RSDP could be used for cluster-wide rate-limit compliance, where every
node tries to access a remote server and must avoid security policy violations. In such cases, it is
quite common to have dynamic policies that adjust to the current demand. Hence, the state should
periodically be updated to reflect that demand.</p>
      <p>Inside the controlled environment, it's possible to simply establish an additional communication
channel or leverage existing SLAN’s capabilities to broadcast a new event that would update the
system’s status and trigger a new “DEBATE” round. The said approach would not be applicable
within the public environment due to trust limitations and accountability. Hence, the ledger must
be utilized as a trustless event propagation medium that would record the originating address and
provide economic guarantees for participants.</p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>Modern requirements for available computational capabilities, their coordination, and security
inevitably lead to rising academic and engineering interest within the networking community. As a
result, a plethora of methods for achieving dynamic coordinated network extension and
maintenance have been developed, among which RSDP stands as a unique approach for abstracting
complexities related to distributed consensus management.</p>
      <p>Firstly, this article expounds on the definition of the updated RSDP model, including its protocol
phases and consensus process. In addition, it introduces new methods aimed at safeguarding
the phase transition process described within the protocol. This version of the protocol also defines
additional mechanisms for handling lost updates and flooding conditions that may arise from
hardware issues or intentional abuse. The proposed approach allows for a significant improvement in
the protocol’s resilience and reliability.</p>
      <p>Secondly, this article introduces a new BFT-compliant approach with RSDP by defining a set of
quorum state reducers. The quorum coordination allows for effectively discovering faulty
operations during consensus, both intentional and accidental, which is a primary property required
from shared distributed systems. The designed foundation for quorum support is flexible and easy
to modify due to the open definition of the weight control mechanism.</p>
      <p>Finally, this article defines a new computational coordination approach by incorporating both
RSDP and blockchain technologies. Such a design allows for efficient, rapid, and incentivized
collaboration of interlinked nodes within a decentralized system. Blockchain technology in that
case works as both an incentive and a governance platform, rewarding nodes that cooperate
correctly and applying slashing procedures otherwise. Future applications of this
technology could lead to an expansion of accessible, low-cost cloud computation engines suitable
for various tasks.</p>
      <p>Overall, the aim of this article is to expand what is possible within the
decentralized network coordination paradigm. This paper is intended to support the
decision-making processes of network engineers and architects. It is also a goal of this article to spark
a further surge in research and engineering efforts within decentralized coordination technology.</p>
      <p>Declaration on Generative AI</p>
      <p>While preparing this work, the authors used the AI programs Grammarly Pro to correct text
grammar and Strike Plagiarism to search for possible plagiarism. After using these tools, the authors
reviewed and edited the content as needed and took full responsibility for the publication’s content.
</p>
      <p>[9] A. Luntovskyy, et al., Highly-distributed systems: What is inside? in: 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&amp;T), 2020. doi:10.1109/PICST51311.2020.9467890</p>
      <p>[10] V. S. Pai, et al., Locality-aware request distribution in cluster-based servers, ACM SIGOPS OSR 32(5) (1998) 205–216. doi:10.1145/384265.291048</p>
      <p>[11] A. Verma, et al., Large-scale cluster management at Google with Borg, in: 10th European Conference on Computer Systems, EuroSys’15, 2015, 1–17. doi:10.1145/2741948.2741964</p>
      <p>[12] Y. Kostiuk, et al., Integrated protection strategies and adaptive resource distribution for secure video streaming over a Bluetooth network, in: Cybersecurity Providing in Information and Telecommunication Systems II, vol. 3826, 2024, 129–138.</p>
      <p>[13] P. Anakhov, et al., Evaluation method of the physical compatibility of equipment in a hybrid information transmission network, J. Theor. Appl. Inf. Technol. 100(22) (2022) 6635–6644.</p>
      <p>[14] M. Kotov, S. Toliupa, V. Nakonechnyi, Method of building local area network simulation based on AMQP and its support protocols suite, Telecommun. Inf. Technol. 3 (2024) 102–119. doi:10.31673/2412-4338.2024.039989</p>
      <p>[15] M. Kotov, S. Toliupa, V. Nakonechnyi, Replica state discovery protocol based on advanced message queuing protocol, Electron. Prof. Sci. J. Cybersecur. Educ. Sci. Tech. 3(23) (2024) 156–171. doi:10.28925/2663-4023.2024.23.156171</p>
      <p>[16] Z. Zheng, et al., An overview of blockchain technology: Architecture, consensus, and future trends, in: IEEE International Congress on Big Data (BigData Congress), 2017. doi:10.1109/BigDataCongress.2017.85</p>
      <p>[17] D. Yaga, et al., Blockchain technology overview, National Institute of Standards and Technology Internal Report, 2019. doi:10.48550/arXiv.1906.11078</p>
      <p>[18] J. Golosova, A. Romanovs, The advantages and disadvantages of the blockchain technology, in: IEEE 6th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE), 2018. doi:10.1109/AIEEE.2018.8592253</p>
      <p>[19] G. Habib, et al., Blockchain technology: Benefits, challenges, applications, and integration of blockchain technology with cloud computing, Future Internet 14(11) (2022). doi:10.3390/fi14110341</p>
      <p>[20] M. S. Kotov, Tree-based state sharding for scalability and load balancing in multichain systems, Electron. Prof. Sci. J. Cybersecur. Educ. Sci. Tech. 2(26) (2024) 392–408. doi:10.28925/2663-4023.2024.26.702</p>
      <p>[21] W. Liu, et al., Distributed and parallel blockchain: Towards a multi-chain system with enhanced security, IEEE Transact. Dependable Secur. Comput. 22(1) (2024) 1–16. doi:10.1109/tdsc.2024.3417531</p>
      <p>[22] S. I. Sion, et al., A comprehensive review of multi-chain architecture for blockchain integration in organizations, in: Business Process Management: Blockchain, Robotic Process Automation, Central and Eastern European, Educators and Industry Forum, BPM 2024, Lecture Notes in Business Information Processing, vol. 527, 2024. doi:10.1007/978-3-031-70445-1_1</p>
      <p>[23] F. Hashim, K. Shuaib, N. Zaki, Sharding for scalable blockchain networks, SN Comput. Sci. 4(2) (2023). doi:10.1007/s42979-022-01435-z</p>
      <p>[24] V. Zhebka, et al., Methodology for choosing a consensus algorithm for blockchain technology, in: Workshop on Digital Economy Concepts and Technologies Workshop, DECaT, vol. 3665 (2024) 106–113.</p>
      <p>[25] N. Naik, Choice of effective messaging protocols for IoT systems: MQTT, CoAP, AMQP and HTTP, in: 2017 IEEE International Systems Engineering Symposium (ISSE), 2017, 426–435. doi:10.1109/SysEng.2017.8088251</p>
      <p>[26] J. L. Fernandes, et al., Performance evaluation of RESTful web services and AMQP protocol, in: 2013 5th International Conference on Ubiquitous and Future Networks (ICUFN), 2013. doi:10.1109/ICUFN.2013.6614932</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Grechaninov</surname>
          </string-name>
          , et al.,
          <article-title>Models and Methods for Determining Application Performance Estimates in Distributed Structures</article-title>
          ,
          <source>in: Cybersecurity Providing in Information and Telecommunication Systems</source>
          , vol.
          <volume>3288</volume>
          ,
          <year>2022</year>
          ,
          <fpage>134</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
             
            <surname>Astapenya</surname>
          </string-name>
          , et al.,
          <article-title>Analysis of ways and methods of increasing the availability of information in distributed information systems</article-title>
          ,
          <source>in: 2021 IEEE 8th International Conference on Problems of Infocommunications, Science and Technology</source>
          (
          <year>2021</year>
          ). doi:<pub-id pub-id-type="doi">10.1109/picst54195.2021.9772161</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>H.</given-names>
            <surname> Hulak</surname>
          </string-name>
          , et al.,
          <article-title>Dynamic model of guarantee capacity and cyber security management in the critical automated system</article-title>
          ,
          <source>in: 2nd International Conference on Conflict Management in Global Information Networks</source>
          , vol.
          <volume>3530</volume>
          (
          <year>2023</year>
          )
          <fpage>102</fpage>
          -
          <lpage>111</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R. </given-names>
            <surname>Guerraoui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Schiper</surname>
          </string-name>
          ,
          <article-title>Fault-tolerance by replication in distributed systems</article-title>
          ,
          <source>in: Reliable Software Technologies-Ada-Europe'96, Lecture Notes in Computer Science</source>
          , vol.
          <volume>1088</volume>
          ,
          <year>1996</year>
          . doi:<pub-id pub-id-type="doi">10.1007/BFb0013477</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>K. P.</given-names>
            <surname>Birman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Joseph</surname>
          </string-name>
          ,
          <article-title>Exploiting replication in distributed systems</article-title>
          ,
          <source>Distrib. Syst.</source>
          (
          <year>1989</year>
          )
          <fpage>319</fpage>
          -
          <lpage>367</lpage>
          . doi:<pub-id pub-id-type="doi">10.1145/90417.90751</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
             
            <surname>Ciciani</surname>
          </string-name>
          , et al.,
          <article-title>Analysis of replication in distributed database systems</article-title>
          ,
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>2</volume>
          (
          <year>1990</year>
          )
          <fpage>247</fpage>
          -
          <lpage>261</lpage>
          . doi:<pub-id pub-id-type="doi">10.1109/69.54723</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
             
            <surname>Zaharia</surname>
          </string-name>
          , et al.,
          <article-title>Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing</article-title>
          ,
          <source>in: 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12)</source>
          ,
          <year>2012</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
             
            <surname>Cristian</surname>
          </string-name>
          ,
          <article-title>Understanding fault-tolerant distributed systems</article-title>
          ,
          <source>Commun. of the ACM</source>
          <volume>34</volume>
          (
          <issue>2</issue>
          ) (
          <year>1991</year>
          )
          <fpage>56</fpage>
          -
          <lpage>78</lpage>
          . doi:<pub-id pub-id-type="doi">10.1145/102792.102801</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>