Byzantine Fault Tolerance in Distributed Systems: Advancing the Replica State Discovery Protocol v2.0

Maksym Kotov

maksym_kotov@ukr.net

Serhii Toliupa

Volodymyr Nakonechnyi

Serhii Buchyk

buchyk@knu.ua

Oleksandr Buchyk

Taras Shevchenko National University of Kyiv, 60 Volodymyrska str., 01033 Kyiv, Ukraine

Contemporary demands for computational complexity and fault tolerance have led to the development of different approaches toward building replicated and clustered environments. The foundation of these systems is a set of underlying consensus and management mechanisms and protocols. It is quite common to have a centralized management plane responsible for deployment, task assignment, and result aggregation to avoid the complexities intrinsic to the distributed consensus paradigm. However, the fault tolerance and trust decentralization capabilities of such an approach remain restricted. One of the most prominent examples of a distributed system is blockchain technology and its off-chain networks, which introduce capabilities for managing and coordinating the assignment of computational tasks. Blockchain has its limitations, mainly related to response times and throughput, as it necessitates consensus for every action and interaction. A lightweight cluster coordination and consensus management framework, the Replica State Discovery Protocol (RSDP), aims to provide rapid coordination of nodes. RSDP defines an interface for arbitrary logical extension of distributed computation modules and establishes a set of rules for nodes to follow to achieve consensus within the network. Nonetheless, RSDP was initially designed as a protocol for private coordination and was not constructed with Byzantine fault tolerance (BFT) in mind. The purpose of this paper is to advance this coordination method to incorporate practices that allow for secure decentralized computation coordination even in the presence of malicious actors. Firstly, this article defines methods to achieve strict state transitions that avoid trust exploitation and flooding techniques. Secondly, it presents multiple approaches toward building strict quorum state reducers within RSDP, which allow the initiation of BFT-compliant operations. Thirdly, an additional set of new mechanisms and methods is described in the context of RSDP to improve both efficiency and reliability. Finally, we propose a generalized BFT model for RSDP, which leverages blockchain as a trusted source and enables the establishment of a completely decentralized public coordinated computing environment.

Keywords: computer network, network protocol, coordination of distributed system, distributed consensus, Byzantine fault tolerance, BFT, Replica State Discovery Protocol, RSDP, BFT-compliant consensus within RSDP

1. Introduction

The popularity of Internet technology has been rapidly increasing over the past few decades. Nowadays, online interactions between remotely located parties have become quite common in both business and governance sectors. Distributed systems serve as a technical foundation for such operations [1, 2]. This demand has led to numerous research and engineering efforts to improve the security, reliability, and availability characteristics of these systems [3].

One of the primary aspects that define architectural complexity is the coordination mechanism. Depending on the nature of managed nodes, the task of cluster management tends to be nontrivial and requires custom solutions to achieve state synchronization, failover, and availability. In that context, there are two approaches based on the type of managed services: replication and clusterization, which respectively correspond to stateless and stateful endpoints united under a single abstract amalgamation [4–11].

The replication approach applies to networks responsible for managing a set of homogeneous multiagent systems [4–6]. Such installations are characterized by the interchangeability of the constituents; that is, every participating node can be substituted by another without disrupting the overall performance or operational stability of a distributed system. In that case, the internal architecture is commonly composed of an external-facing request forwarder and an internal pool of replaceable nodes that perform the desired computation. A monitoring solution continuously probes the active nodes and, upon detection of any inconsistency or response delay, signals the control plane to simply redeploy the faulty instance.

On the other hand, clusterization refers to a method of organizing, managing, and maintaining coordination between a set of heterogeneous multiagent systems. Clustered environments often comprise either a set of stateful nodes or nodes responsible for handling non-interchangeable procedures [9–11]. This approach requires significant engineering effort to effectively manage request distribution and consistent operation within the system. In that case, it is quite common to represent the interactions in two abstract layers: coordination and execution.

The coordination layer taxonomy includes decentralized and centralized systems, providing publicly and privately governed mechanisms respectively. Centralized systems are also characterized by a single coordination responsibility service, where a server or a group of servers oversee and reconcile operations within a network under regulated control [12, 13]. In turn, decentralized systems represent a network of participating peers coordinated through a consensus mechanism. We could further divide these into classes based on their adherence to the Byzantine fault tolerance principles, which outline the inherent complexities associated with open networks.

Having said that, the Replica State Discovery Protocol (RSDP) is the first consensus framework developed for coordination of the clustered and replicated decentralized systems [14, 15]. It defines the abstraction layers and foundations, upon which a plethora of arbitrary computational logic could be implemented to achieve lightweight, consistent, and coordinated execution management without requirements for a central responsible entity.

One of the prominent examples of contemporary decentralized technologies compliant with BFT is the blockchain [16–19]. Blockchain has recently become widespread, growing in interest and demand from its substantial user base worldwide. This technology defines a logical basis for verifiable public asset transfer and computation crucial to establishing a secure transactional environment. However, it is quite limited in its scalability and overall throughput, which is an area of active research [20–23].

Initially, RSDP was designed as a decentralized coordination solution to be used within private permissioned networks. The protocol assumes that a secure permissioned environment is provisioned for the cluster. Its primary role is to rapidly reconcile cluster-wide state and, as an implication, dynamically provide all the necessary context information for cluster members to perform their procedures. RSDP was not designed to withstand any potential state transition violations or malicious interactions and thus is not compliant with the BFT requirements.

The purpose of this article is to advance RSDP capabilities to version 2.0, which introduces a secure coordination layer required for both unstable and malicious environments. To achieve that, firstly, this article introduces a set of new phase transition controls to restrict possible abuse of participating nodes. In addition, this paper outlines a new consensus quorum approach defined on the state reducers level. These mechanisms improve the protocol’s resistance to both network losses and potential outlying coordination impact. Lastly, this article introduces a novel approach toward open decentralized computation coordination based on RSDP and blockchain technologies [24].

2. Overview of the replica state discovery protocol

At its base, RSDP relies on a communication layer responsible for handling message sending and receiving procedures. Originally, the protocol defined an abstraction layer built on top of AMQP, called a Simulated Local Area Network (SLAN). This abstraction simplifies the development of cluster-wide network protocols and is described in a dedicated article [14].

The SLAN logical topology is shown in Fig. 1. It comprises participating nodes, their respective queues, and a central coordination entity responsible for initiating and managing communication channels. The central coordination entity is an AMQP server; one of the most popular and widely used implementations is RabbitMQ [25–27].

In its paradigm, the communication process is designed to be reliable, scalable, and asynchronous. The message producer does not address the recipient but rather a queue or a set of queues when initiating a message-passing procedure. In the case of SLAN, this capability is leveraged to establish both a broadcast basis and direct message sending by assigning anonymous dynamic queues to each peer individually.

To begin with, let us define the core entities of SLAN:

Nodes: $N = \{n_1, \dots, n_{|N|}\}$, where each $n \in N$ is a network participant that can send and receive messages.

Messages: $M = \{m_1, \dots, m_{|M|}\}$, where each $m \in M$ is a data payload transmitted among nodes.

Queues: $Q = \{q_1, \dots, q_{|Q|}\}$ with a bijection $Q : N \to Q$. Each node $n$ has a unique queue $Q(n)$ holding messages before consumption.

In the context of SLAN, the individual queues are designed to be transient and accessible only within the confines of the established connection between the peer and the AMQP server. That approach allows for simplified dynamic handling of joining nodes but also lacks durability guarantees for messages.

Local area networks provide two basic communication operations: broadcast and direct message passing. Having these operations allows us to build advanced custom protocols. However, it is quite a challenge to provide these primitives within distributed systems that span multiple networks or even regions. SLAN establishes such a basis efficiently and securely by leveraging the direct and fanout exchange types defined in the underlying AMQP protocol. We can define the routing structures as follows:

Exchanges: $E = \{E_{\text{fanout}}, E_{\text{direct}}\}$, where specific exchanges route messages: $E_{\text{fanout}}$ broadcasts to all queues, and $E_{\text{direct}}$ routes by a key.

Fanout Bindings: $\forall q \in Q,\ (E_{\text{fanout}}, q) \in B_{\text{fanout}}$, where every queue is bound to $E_{\text{fanout}}$.

Routing Keys: $K$ is the set of keys, and $R : N \to K$ is injective, where each node $n$ is assigned a unique key $R(n)$ for direct routing.

Direct Bindings: $(E_{\text{direct}}, Q(n), R(n)) \in B_{\text{direct}}$ for each $n \in N$, where the direct exchange maps each key $R(n)$ to the corresponding queue $Q(n)$.

Within AMQP, routing keys play a pivotal role in establishing complex interaction schemas between exchanges and queues. For instance, the protocol defines pattern-like routing through topic exchanges, where a queue receives each message whose routing key matches a wildcard binding pattern. AMQP also defines even more advanced routing types, such as headers exchanges, providing even greater flexibility in network management. However, since the goal of SLAN is to provide basic primitives, direct matching against the key is sufficient.

The SLAN supports the following operations:

Broadcast Operation: $f_{\text{broadcast}}(n_s, m)$ enqueues $m$ into all $Q(n)$, $n \in N$; a broadcast from $n_s$ via $E_{\text{fanout}}$ delivers $m$ to every node.

Direct Send Operation: $f_{\text{sendDirect}}(n_s, n_t, m)$ enqueues $m$ into $Q(n_t)$; a direct send uses $E_{\text{direct}}$ and $R(n_t)$ so that only the queue of $n_t$ receives $m$.

Consumption Operation: $f_{\text{consume}}(n, m)$ removes $m$ from $Q(n)$ and records $(n, m) \in C$, where $C \subseteq N \times M$ is a consumption relation in which $(n, m) \in C$ indicates that node $n$ has consumed message $m$. Once a node $n$ processes $m$, the message is marked as consumed and dequeued.
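The entities and operations above can be illustrated with a minimal in-memory sketch. It is not an AMQP client and involves no broker: the class name `Slan`, the method names, and the `key-<node>` routing key scheme are assumptions made purely for illustration.

```python
from collections import deque


class Slan:
    """In-memory stand-in for the SLAN model (no AMQP broker involved)."""

    def __init__(self):
        self.queues = {}       # Q(n): node id -> queue of pending messages
        self.routing = {}      # R(n): node id -> unique routing key
        self.direct = {}       # B_direct: routing key -> node id
        self.consumed = set()  # C: set of (node id, message) pairs

    def join(self, node):
        """Create the node's queue and bind it to both exchanges."""
        key = f"key-{node}"          # hypothetical key scheme, unique per node
        self.queues[node] = deque()  # every queue is implicitly fanout-bound
        self.routing[node] = key
        self.direct[key] = node

    def broadcast(self, sender, message):
        """f_broadcast: the fanout exchange enqueues into every queue."""
        for queue in self.queues.values():
            queue.append(message)

    def send_direct(self, sender, target, message):
        """f_sendDirect: the direct exchange routes by R(target) only."""
        node = self.direct[self.routing[target]]
        self.queues[node].append(message)

    def consume(self, node):
        """f_consume: dequeue the oldest message and record it in C."""
        message = self.queues[node].popleft()
        self.consumed.add((node, message))
        return message


slan = Slan()
for name in ("n1", "n2", "n3"):
    slan.join(name)

slan.broadcast("n1", "hello-from-n1")
slan.send_direct("n2", "n3", "status-from-n2")
first = slan.consume("n3")
```

In this sketch the broadcast also reaches the sender's own queue, which follows the definition of enqueuing into all $Q(n)$, $n \in N$.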

The simulated network could be perceived as an integral system that has its state and a set of internal processes that actively modify it. This approach allows us to gain a holistic understanding of the observed Decentralized Coordination Network (DCN) built on top of the underlying communication media.

The state $S$ of the network and its transitions could be defined in the following way:

Broadcast Transition: $\delta_{\text{broadcast}}(S, n_s, m)$ adds $m$ to all $Q(n)$. This transition signifies the state change during the call of $f_{\text{broadcast}}(n_s, m)$.

Direct Send Transition: $\delta_{\text{direct}}(S, n_s, n_t, m)$ adds $m$ to $Q(n_t)$. This models the state change when a message is sent directly with $f_{\text{sendDirect}}(n_s, n_t, m)$.

Consume Transition: $\delta_{\text{consume}}(S, n, m)$ removes $m$ from $Q(n)$ and adds $(n, m)$ to $C$, representing the state change when a node finishes processing a message.

The reliability management, congestion control, message persistence, and network tunneling are delegated to the SLAN layer. This abstraction simplifies both the interface and the functional concerns within RSDP, making it more digestible and robust. Further research could explore different approaches to establishing communication media between RSDP nodes appropriate for different environments.

A set of nodes $V = \{v_1, v_2, \dots, v_n\}$ and a set of directed edges $E \subseteq V \times V$ representing communication links between nodes form the topology $G_{\text{cluster}} = (V, E)$ of a distributed system. In such a system, each node $v_i$ has an initial state $s_i$ and a dedicated mutex to ensure consistency between state transitions [15].

Let us first define two utility functions: the node identifier $f_{\text{id}}(v_i)$, which derives the sender’s address for a given node $v_i$, and the metadata extraction $f_{\text{meta}}(v_i)$, which derives the initial or meta information for a given node $v_i$.

Every state mutation operation starts with an $f_{\text{acquire}}(v_i)$ call and ends with an $f_{\text{release}}(v_i)$ call, which lock and release the local state transition mutex, respectively. These procedures are necessary to guarantee that no two state mutation functions, such as $f_{\text{agg}}$ or $f_{\text{update}}$, could be executed at the same time and interfere with each other’s results.

The “DEBATE” phase includes the following steps:

Send “HELLO” Messages: each $v_i$ sends a “HELLO” message to its out-neighbors $N^+(v_i)$, where $M_{\text{hello}}(v_i) = (f_{\text{id}}(v_i), f_{\text{meta}}(v_i))$, with the directed message propagation denoted as $M_{\text{hello}}(v_i) \xrightarrow{\text{propagated to}} N^+(v_i)$.

Receive “STATUS” Messages: upon receiving $M_{\text{hello}}$, each $v_j \in N^+(v_i)$ sends its initial state: $M_{\text{status}}(v_j) = (f_{\text{id}}(v_j), f_{\text{meta}}(v_j), s_j)$.

Aggregate States: each $v_i$ aggregates the received states $\{s_j \mid v_j \in N^-(v_i)\}$ into a local aggregated state: $s_i^* = f_{\text{agg}}^{\text{status}}(s_i, \{M_{\text{status}}(v_j) \mid v_j \in N^-(v_i)\})$.

This process introduces replicas to each other within the system. Its purpose is to share the initial configuration or metadata to derive the initial view of the system’s state.

The “SHARE” phase includes the following steps:

Broadcast “SHARE” Messages: each $v_i$ broadcasts its aggregated state $s_i^*$ to its out-neighbors: $M_{\text{share}}(v_i) = (f_{\text{id}}(v_i), f_{\text{meta}}(v_i), s_i^*)$, with the directed message propagation denoted as $M_{\text{share}}(v_i) \xrightarrow{\text{propagated to}} N^+(v_i)$.

Validate and Aggregate: each receiving node $v_j$ validates and merges the received states using $\hat{s}_j^* = f_{\text{agg}}^{\text{share}}(s_j^*, \{M_{\text{share}}(v_i) \mid v_i \in N^-(v_j)\})$, where $f_{\text{agg}}^{\text{share}}$ extracts the state components $s_i^*$ from each $M_{\text{share}}(v_i)$ and performs the operations declared within the reducer to derive a final operational state $\hat{s}_j^*$.

After validation and aggregation processes are finished, the local state gets updated and represents a holistic view of the target cluster state. At this point, all the necessary steps were taken, and the participating nodes could execute their operations based on the derived states.

The “CLOSE” phase includes the following steps: 

Send “CLOSE” Messages: $v_k$ sends a “CLOSE” message containing its state $s_k$ to its out-neighbors: $M_{\text{close}}(v_k) = (f_{\text{id}}(v_k), f_{\text{meta}}(v_k), s_k)$, with the directed message propagation denoted as $M_{\text{close}}(v_k) \xrightarrow{\text{propagated to}} N^+(v_k)$.

Update Internal State: each receiving node $v_i$ removes references to $v_k$ and updates its state, $s'_i = f_{\text{agg}}^{\text{close}}(s_i, M_{\text{close}}(v_k))$, where $f_{\text{agg}}^{\text{close}}$ is a function defined within the reducer to update the internal state $s_i$ based on incoming records within $s_k$.

This phase is optional and is used to introduce dynamic participation handling capabilities for RSDP. The process makes it possible to dynamically adjust the cluster state when a subset of nodes departs from the network.
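The three phases can be traced end to end with a small simulation. The sketch below makes several simplifying assumptions that are not part of the protocol definition: the topology is fully connected, all nodes run “DEBATE” and “SHARE” in lockstep, the state is a membership map from node identifier to metadata, the reducer functions merge or prune that map, and the “CLOSE” message carries only the departing node's own record. All class and function names are hypothetical.

```python
def agg_status(own_state, status_messages):
    """f_agg^status: merge the initial states carried by STATUS messages."""
    merged = dict(own_state)
    for _sender_id, _meta, state in status_messages:
        merged.update(state)
    return merged


def agg_share(aggregated_state, share_messages):
    """f_agg^share: merge the aggregated states carried by SHARE messages."""
    merged = dict(aggregated_state)
    for _sender_id, _meta, state in share_messages:
        merged.update(state)
    return merged


def agg_close(state, close_message):
    """f_agg^close: drop every record announced by the departing node."""
    _sender_id, _meta, departing_state = close_message
    return {k: v for k, v in state.items() if k not in departing_state}


class Node:
    def __init__(self, node_id, meta):
        self.node_id = node_id        # f_id(v_i)
        self.meta = meta              # f_meta(v_i)
        self.state = {node_id: meta}  # s_i: the node initially knows itself

    def message(self):
        """Payload shape shared by M_status and M_share."""
        return (self.node_id, self.meta, dict(self.state))

    def close(self):
        """M_close, carrying only this node's own record (assumption)."""
        return (self.node_id, self.meta, {self.node_id: self.meta})


def synchronize(nodes):
    """Run DEBATE and then SHARE over a fully connected topology."""
    # DEBATE: every node answers HELLO with STATUS; each v_i aggregates.
    statuses = {n.node_id: n.message() for n in nodes}
    for node in nodes:
        received = [m for nid, m in statuses.items() if nid != node.node_id]
        node.state = agg_status(node.state, received)
    # SHARE: every node broadcasts s_i*; receivers merge the shared states.
    shares = {n.node_id: n.message() for n in nodes}
    for node in nodes:
        received = [m for nid, m in shares.items() if nid != node.node_id]
        node.state = agg_share(node.state, received)


def depart(leaving, remaining):
    """CLOSE: the leaving node announces itself; the others prune it."""
    message = leaving.close()
    for node in remaining:
        node.state = agg_close(node.state, message)


cluster = [Node("v1", "eu"), Node("v2", "us"), Node("v3", "ap")]
synchronize(cluster)
view_after_share = dict(cluster[0].state)
depart(cluster[2], cluster[:2])
view_after_close = dict(cluster[0].state)
```

After `synchronize`, every node holds the same three-entry view; after `depart`, the remaining two nodes hold a view without `v3`.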

3. Reinforcement of the phase transition control

Phase transition control is one of the primary aspects of the protocol’s consistency guarantees. In its initial version, RSDP defined mutexes as a main instrument to isolate state manipulation logic between phases [15, 28]. Mutexes create so-called critical sections that restrict access to a single abstract execution entity. The approach allows for coordinating multiple such entities, viz., processes, their threads, or asynchronous threads in the case of event-driven architectures.

Nonetheless, the defined transition restrictions are not sufficient for handling outlying operations. Critical sections forbid parallel execution of the restricted code segments with programmatically defined boundaries. To enter the critical section and leave it, the parallel execution entity must acquire and release the mutex, respectively. If there are multiple entry points to the critical section and some of them omit the ingress boundary, parallelism and inconsistencies are still quite possible.

Having said that, RSDP in its original definition does not verify whether “STATUS” messages were received during the “DEBATE” phase or not. The controlling mutex by definition was taken before sending the initial “HELLO” message and released after the “SHARE” message was sent. That approach, while protecting against parallel processing of consistent sequential messages, allowed for some delayed messages to be reprocessed after the “DEBATE” phase was already finished.

To resolve this issue, a new state control mechanism is proposed leveraging the principles of the finite state machine (FSM) [29]. Let us first define a set of possible protocol execution statuses $G = \{g_{\text{INITIAL}}, g_{\text{DEBATE}}, g_{\text{IDLE}}, g_{\text{SHARE}}, g_{\text{CLOSE}}\}$, representing zones associated with the respective phases and inter-phase execution states.

In such a machine, transitions are defined as a set $T$, where each transition $t_n \in T$ is represented by $t_n = (g_i, \sigma_n, g_j)$, where $g_i, g_j \in G$ and $\sigma_n$ is the event that triggers the transition. The transition rules could be defined as follows:

INITIAL → DEBATE: $t_1 = (g_{\text{INITIAL}}, \sigma_{\text{start}}, g_{\text{DEBATE}})$, triggered by a start signal from the joining node. The $g_{\text{INITIAL}}$ status does not allow the execution of any operations besides the initiation of the protocol.

DEBATE → IDLE: $t_2 = (g_{\text{DEBATE}}, \sigma_{\text{status agg}}, g_{\text{IDLE}})$, triggered when the aggregation from the “STATUS” messages is finalized. The $g_{\text{DEBATE}}$ status does not allow the execution of any operations besides those within the “DEBATE” phase.

IDLE → DEBATE: $t_3 = (g_{\text{IDLE}}, \sigma_{\text{hello}}, g_{\text{DEBATE}})$, triggered before sending the initial “HELLO” message during the “DEBATE” phase. The $g_{\text{IDLE}}$ status does not allow the execution of any operations besides the transition to the next status.

IDLE → SHARE: $t_4 = (g_{\text{IDLE}}, \sigma_{\text{share}}, g_{\text{SHARE}})$, triggered before sending or upon receiving the “SHARE” message. The $g_{\text{SHARE}}$ status does not allow the execution of any operations besides those within the “SHARE” phase.

IDLE → CLOSE: $t_5 = (g_{\text{IDLE}}, \sigma_{\text{close}}, g_{\text{CLOSE}})$, triggered upon receiving the “CLOSE” message. The $g_{\text{CLOSE}}$ status does not allow the execution of any operations besides those within the “CLOSE” phase.

SHARE → IDLE: $t_6 = (g_{\text{SHARE}}, \sigma_{\text{share agg}}, g_{\text{IDLE}})$, triggered when the aggregation from the “SHARE” messages is finalized. The $g_{\text{SHARE}}$ status does not allow the execution of any operations besides those within the “SHARE” phase.

CLOSE → IDLE: $t_7 = (g_{\text{CLOSE}}, \sigma_{\text{close agg}}, g_{\text{IDLE}})$, triggered when the aggregation from the “CLOSE” messages is finalized. The $g_{\text{CLOSE}}$ status does not allow the execution of any operations besides those within the “CLOSE” phase.

Let $g_i \in G$ represent the current status. The transition function $\delta$ is defined as $\delta(g_i, \sigma) = g_j$ and is used to formally change the current execution status. For example, $\delta(g_{\text{INITIAL}}, \sigma_{\text{start}}) = g_{\text{DEBATE}}$.

Based on the defined rules, a state transition matrix could be constructed that visualizes available operations.

In the following matrix, states are indexed as follows: 1 → $g_{\text{INITIAL}}$, 2 → $g_{\text{DEBATE}}$, 3 → $g_{\text{IDLE}}$, 4 → $g_{\text{SHARE}}$, 5 → $g_{\text{CLOSE}}$.
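Written out from rules $t_1$ to $t_7$, the matrix takes the following form. This is a reconstruction under the assumption that rows denote the current status, columns denote the next status, and an entry of 1 marks a permitted transition; the label $A$ is introduced here only for illustration.

```latex
A =
\begin{pmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 & 0
\end{pmatrix}
```

Under this reading, row 3 ($g_{\text{IDLE}}$) is the only row with more than one permitted target, and no column entry leads back to $g_{\text{INITIAL}}$.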

The transition rules thereby forbid the execution of outlying operations and provide additional consistency guarantees for the protocol in addition to the already established mutex system. Before adding messages to the buffers or executing state mutating operations, the engine now also has to verify the validity of its status.
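A status check of this kind can be sketched as a lookup table keyed by the current status and the triggering event. The event names mirror the $\sigma$ symbols of the rules; the `ACCEPTED` map of message types per status, as well as all class and method names, are assumptions made for illustration rather than part of the protocol definition.

```python
from enum import Enum


class Status(Enum):
    INITIAL = 1
    DEBATE = 2
    IDLE = 3
    SHARE = 4
    CLOSE = 5


# delta: (current status, event) -> next status, one entry per rule t1..t7.
TRANSITIONS = {
    (Status.INITIAL, "start"): Status.DEBATE,
    (Status.DEBATE, "status_agg"): Status.IDLE,
    (Status.IDLE, "hello"): Status.DEBATE,
    (Status.IDLE, "share"): Status.SHARE,
    (Status.IDLE, "close"): Status.CLOSE,
    (Status.SHARE, "share_agg"): Status.IDLE,
    (Status.CLOSE, "close_agg"): Status.IDLE,
}

# Message types that may be buffered in each status (assumed mapping).
ACCEPTED = {
    Status.DEBATE: {"STATUS"},
    Status.SHARE: {"SHARE"},
    Status.CLOSE: {"CLOSE"},
}


class IllegalTransition(Exception):
    """Raised when an event is not permitted in the current status."""


class Engine:
    def __init__(self):
        self.status = Status.INITIAL

    def fire(self, event):
        """Apply delta, rejecting events that no rule permits."""
        key = (self.status, event)
        if key not in TRANSITIONS:
            raise IllegalTransition(f"{event!r} not allowed in {self.status.name}")
        self.status = TRANSITIONS[key]

    def accepts(self, message_type):
        """Check the status before buffering a message or mutating state."""
        return message_type in ACCEPTED.get(self.status, set())


engine = Engine()
engine.fire("start")                           # INITIAL -> DEBATE
accepted_in_debate = engine.accepts("STATUS")
engine.fire("status_agg")                      # DEBATE -> IDLE
accepted_when_idle = engine.accepts("STATUS")
```

In this sketch a “STATUS” message that arrives after the aggregation has finished is rejected by `accepts`, which corresponds to the delayed-message scenario the status check is meant to rule out.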

Having resolved the state transition inconsistencies, another key aspect in stochastic environments is to handle infrequent but still possible scenarios. One such scenario is an indefinite holding of a mutex due to unforeseen physical or logical interference. Since the protocol engine leverages such locks to modify the internal representation of the cluster state, it could potentially get stuck if for some reason the mutex was not released.

First, let us define a mutex $\mu$ as a binary variable where:

$$\mu = \begin{cases} i, & \text{if locked by thread } H_i \\ 0, & \text{if unlocked} \end{cases}$$

In that case, $\mu$ represents the mutex state and $i$ is the identifier of the thread that locked the mutex. The mutex can have the following state transitions:

Lock: $\mu = 0 \to \mu = i$ (mutex acquired by thread $i$).

Release: $\mu \leftarrow 0$ (mutex available to be acquired again).
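With only the lock and release transitions, a holder that never releases blocks the engine indefinitely. One way to make that situation detectable is to attach a time limit to each hold. The following is a minimal sketch of that idea, not the protocol's implementation: `LeaseMutex`, `MutexTimeout`, and `FakeClock` are hypothetical names, acquisition is non-blocking for brevity, and the clock is injectable so the example runs without sleeping.

```python
import threading
import time


class MutexTimeout(Exception):
    """Raised when a holder exceeded its timeout and lost the mutex."""


class LeaseMutex:
    def __init__(self, timeout, clock=time.monotonic):
        self._guard = threading.Lock()  # protects the fields below
        self._timeout = timeout         # longest allowed hold, in seconds
        self._clock = clock
        self._holder = 0                # 0 when unlocked, else the holder id
        self._acquired_at = 0.0

    def _expired(self, now):
        return self._holder != 0 and now - self._acquired_at >= self._timeout

    def try_acquire(self, holder_id):
        """Take the mutex if it is free or its current hold has timed out."""
        with self._guard:
            now = self._clock()
            if self._holder == 0 or self._expired(now):
                self._holder = holder_id
                self._acquired_at = now
                return True
            return False

    def release(self, holder_id):
        """Release in time, or raise if the hold timed out in the meantime."""
        with self._guard:
            now = self._clock()
            if self._holder != holder_id:
                raise MutexTimeout(f"holder {holder_id} no longer owns the mutex")
            overran = self._expired(now)
            self._holder = 0
            if overran:
                raise MutexTimeout(f"holder {holder_id} exceeded the timeout")


class FakeClock:
    """Manually advanced clock so the example runs without sleeping."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
mutex = LeaseMutex(timeout=5.0, clock=clock)
first = mutex.try_acquire(1)       # holder 1 takes the free mutex
blocked = mutex.try_acquire(2)     # still held and not yet timed out
clock.now = 6.0                    # holder 1 overstays its timeout
taken_over = mutex.try_acquire(2)  # the stale hold is discarded
try:
    mutex.release(1)
    stale_detected = False
except MutexTimeout:
    stale_detected = True
```

The exception gives the engine a hook at which it can decide between recovery and reinitialization.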

A mutex isolates resource access between multiple remote cooperating parties that are trying to perform a critical operation simultaneously. In that context, a critical section is a part of a program that ensures mutual exclusion on the shared resource:

$$\forall t : \sum_{i=1}^{n} \chi_i(t) \le 1$$

where $t$ is some point in time, $n$ is the number of threads, and $\chi_i(t)$ is a characteristic function: $\chi_i(t) = 1$ if $H_i$ is in the critical section at time $t$, and $\chi_i(t) = 0$ otherwise.

To handle possible inconsistencies and deadlocks, the newer version of the protocol leverages time-based mutexes. Essentially, this type of mutex enters a critical section with a commitment to release it within a certain time interval. A timeout-based mutex $\mu_t$ introduces a timer $x_i$ and a timeout $\tau_i$. The following state transitions and rules apply to $\mu_t$:

Acquire Mutex: $\mu_t = 0 \implies \mu_t \leftarrow i,\ x_i := 0$.

Release Mutex: $\mu_t = i \land x_i < \tau_i \implies \mu_t \leftarrow 0$.

Timeout Exception: $\mu_t = i \land x_i \ge \tau_i \implies$ exception raised, $\mu_t \leftarrow 0$.

Such a mutex does not resolve the underlying unexpected issues that led to the deadlock in the first place but allows us to detect the issues and react to them. The protocol engine could either go into recovery and try to reconcile the execution state or abort all operations and reinitialize itself, going through each synchronization phase again, starting with $g_{\text{INITIAL}}$.

4. Status consensus quorum introduction within RSDP

The initial descriptions of RSDP left the mechanisms of the reducers’ internal operation consistency uncovered. To begin a discussion about the importance of a quorum-based approach for a subset of distributed synchronization tasks, let us first define the failure and validity conditions for the protocol, its managed nodes, and the system as a whole.

The validity conditions could be evaluated from multiple perspectives and for different abstract objects. Firstly, there are perspectives of every node $v_i \in V$ within the system $G_{\text{cluster}} = (V, E)$ defined in the previous sections. The failure condition could also be evaluated from the perspective of $G_{\text{cluster}}$ itself based on different protocol stages.

Let us begin the definition of validity conditions from the perspective of participating nodes. “Local Validity” implies repeatable results of the aggregation methods. Formally, $v_i$ is locally valid if:

$$\forall t \in \mathbb{N} : f_{\text{agg}}^{\text{share}}(s_i^*, \{M_{\text{share}}(v_j) \mid v_j \in N^-(v_i)\}) = \hat{s}_i^*$$

This means that repeated aggregator calls on the same data yield the same result. The validity implies determinism and that the aggregation function does not perform any unexpected side effects that would lead to desynchronization.
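A simple spot check for this property is to call the reducer several times on the same input and compare the results. The sketch below is illustrative only: the function names are hypothetical, the state is assumed to be a plain dictionary, and passing the check does not prove determinism, it only fails to find a counterexample.

```python
def is_locally_valid(reducer, own_state, messages, repeats=3):
    """Check that repeated reducer calls on the same input agree."""
    expected = reducer(own_state, messages)
    return all(reducer(own_state, messages) == expected for _ in range(repeats))


def merge_reducer(own_state, messages):
    """Deterministic: the result depends only on the arguments."""
    merged = dict(own_state)
    for state in messages:
        merged.update(state)
    return merged


_calls = []


def counting_reducer(own_state, messages):
    """Not deterministic: a hidden call counter leaks into the result."""
    _calls.append(None)
    merged = merge_reducer(own_state, messages)
    merged["revision"] = len(_calls)
    return merged


inputs = ({"v1": "eu"}, [{"v2": "us"}, {"v3": "ap"}])
valid = is_locally_valid(merge_reducer, *inputs)
invalid = is_locally_valid(counting_reducer, *inputs)
```

The second reducer fails the check because its output depends on how many times it has been called, which is the kind of side effect that would desynchronize replicas.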

The other part of the validity conditions from the replica’s perspective describes its relative operational quality in comparison with other participants. The “Global Validity” could be described formally as:

$$f_{\text{agg}}^{\text{status}}(s_i, \{M_{\text{status}}(v_j) \mid v_j \in N^-(v_i)\}) = f_{\text{agg}}^{\text{share}}(s_i^*, \{M_{\text{share}}(v_j) \mid v_j \in N^-(v_i)\})$$

That could also be expressed as $s_i^* = \hat{s}_i^*$: the locally aggregated view of the cluster state is representative and conforms with the holistic representation gained by merging the final states from participating nodes.

The perspective of a single node could be limited and thus biased. A simple binary validity condition is not indicative within the ambit of a distributed consensus-based system. Hence, validation should be considered a gauged estimation representing the degree of validity at some particular point in time. The first validation condition from the system’s perspective, called “Transitional System Validity”, defines the divergence degree based on the last sharing session of aggregated states. The estimation process could be described in the following steps:

Collect Broadcasted States: $S^* = \{s_i^* \in M_{\text{share}}(v_i) \mid v_i \in V\}$;

Group States: partition $S^*$ by identical values into classes $\{P_1, \dots, P_m\}$;

Measure Divergence: apply one of the validity estimation functions defined below.

The measure could be performed by any participating node or external interceptor that has access to the communication media. The following base functions could be used for such estimation, where $n$ is the number of collected states:

Magnitude: $D_{\text{mg}} = \frac{\max_l |P_l|}{n}$, represents the normalized maximum proportion, where a higher value means lower fragmentation.

Entropy: $D_{\text{en}} = -\sum_{l=1}^{m} \frac{|P_l|}{n} \log\left(\frac{|P_l|}{n}\right)$, where evaluation is based on entropy and a higher value means higher fragmentation [30].

Concentration: $D_{\text{co}} = \sum_{l=1}^{m} \left(\frac{|P_l|}{n}\right)^2$, measures the squared distribution of elements, where a higher value means lower fragmentation.

A similar approach could be leveraged to build the “Operational System Validity”, which represents the divergence between the finalized states. The same evaluation steps apply, with $\hat{S}^* = \{\hat{s}_i^* \mid v_i \in V\}$, where $\hat{s}_i^*$ belongs to $v_i$, being the initial set. The method requires access to the participating nodes to gather the finalized states.
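The three estimation functions can be computed directly from the class sizes. The sketch below assumes that states are represented by hashable values (for example, digests) and uses the natural logarithm for the entropy, since the base is not fixed above; the function names are hypothetical.

```python
import math
from collections import Counter


def class_sizes(states):
    """Partition states by identical value and return the class sizes."""
    return list(Counter(states).values())


def magnitude(states):
    """Share of the largest class; higher means lower fragmentation."""
    return max(class_sizes(states)) / len(states)


def entropy(states):
    """Shannon entropy of the class shares; higher means more fragmentation."""
    n = len(states)
    return -sum((size / n) * math.log(size / n) for size in class_sizes(states))


def concentration(states):
    """Sum of squared class shares; higher means lower fragmentation."""
    n = len(states)
    return sum((size / n) ** 2 for size in class_sizes(states))


# Hashable stand-ins for the collected states (illustrative values).
agreed = ["a", "a", "a", "a"]
split = ["a", "a", "b", "b"]
```

For the fully agreed sample, the magnitude and concentration are both 1 and the entropy is 0; for the evenly split sample, the magnitude and concentration drop to 0.5 and the entropy rises to $\log 2$. The same functions apply unchanged to the finalized states.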

Having discussed the validity conditions and their estimation, it is quite important to emphasize that the main metric representing the cluster’s state is “Operational System Validity”. That is because it describes the system’s consistency rather than the individual node. Additionally, it is less susceptible to the biases of a single node. Thus, minimization of divergence is the primary goal of establishing a robust distributed consensus framework.

The protocol could leverage its redundancy to improve consistency. To achieve that, an additional layer is proposed to automatically reconcile inconsistent state aggregations. It introduces the popular and weighted voting mechanisms with a set of deterministic rules to be applied before the finalized state s^i* is settled for the replica [31, 32].

Let’s start with the basis of the proposed voting consensus system. Suppose each node $v_j$ proposes a hash $h(s_j^*)$, where $s_j^*$ is the state aggregation of $v_j$. Then node $v_i$ receives the states and hashes from $N^-(v_i) = \{v_j \in V \mid (v_j, v_i) \in E\}$, verifies them, and selects one to represent its own finalized state $\hat{s}_i^*$.

The “Unweighted Popular Vote” starts with the collection of $H_i = \{h(s_j^*) \mid v_j \in N^-(v_i)\}$, where each $h(s_j^*)$ represents the state aggregation from the perspective of a node $v_j$. To evaluate the vote result, for each candidate hash $h'_{i,k} \in H_i$, where $k \in \mathbb{N}$ and $0 < k \le |H_i|$, compute $C_{i,k}(h'_{i,k})$, which is the number of times $h'_{i,k}$ appears in $H_i$:

$$C_{i,k}(h'_{i,k}) = |\{h(s_j^*) \in H_i : h(s_j^*) = h'_{i,k}\}| \quad (5)$$

After each corresponding $C_{i,k}(h'_{i,k})$ has been computed, the vote decision could be expressed as follows:

$$C_i^{\max} = \max_{h'_{i,k} \in H_i} C_{i,k}(h'_{i,k}), \quad T_i = \{h'_{i,k} \in H_i \mid C_{i,k}(h'_{i,k}) = C_i^{\max}\}$$

where $T_i$ is the set of winning unique hash values. A tie between votes is possible and has to be resolved. If $|T_i| = 1$, choose that single hash $h_i^*$ in $T_i$. Otherwise, pick $h_i^* = \min_{\prec} T_i$, the minimum of $T_i$ under a fixed total order $\prec$ on hash values.

The weighted variant of the vote follows the same steps. What is different is the approach to implementing $C_{i,k}(h'_{i,k})$ by introducing the weight term $w_j > 0$. For each node $v_j \in V$, let us define a weight $w_j \in \mathbb{R}_{>0}$. These weights can be derived in multiple different ways, for example, from trust relationships or resource capacities:

Trust-Based Weights: $w_j = \sum_{v_l \in V} T(v_l, v_j)$, $T(v_l, v_j) \ge 0$, where $T(v_l, v_j)$ measures how much $v_l$ trusts $v_j$. Summing over all $v_l \in V$ yields a strictly positive real number if each node is trusted by at least one other node. The trust values could belong to $\mathbb{R}_{\ge 0}$ or to a simpler binary version $\{0, 1\}$.

Resource-Based Weights: $w_j = f_{\text{res}}(BW_j, CPU_j, RAM_j)$, where $BW_j, CPU_j, RAM_j \in \mathbb{R}_{\ge 0}$ denote the bandwidth, CPU, and memory resources of $v_j$, and $f_{\text{res}} : \mathbb{R}^3_{\ge 0} \to \mathbb{R}_{>0}$ is a function that returns a positive real number representing the node’s capacity.

The weight function could be injected into the reducer, allowing for custom setups that adhere to the unique requirements of the system. It is important to emphasize that the weight assignment function should also be deterministic; otherwise, it would negatively influence system state fragmentation. The weighted count computation function could be represented as follows:

$$C_{i,k}(h'_{i,k}) = \sum_{v_j \in N^-(v_i)} \left[ w_j \cdot I\{h(s_j^*) = h'_{i,k}\} \right]$$

where $I\{X\}$ returns 1 if $X$ is true and 0 otherwise. Each vote for $h'_{i,k}$ is multiplied by $v_j$’s weight value $w_j$. The model could be generalized to include the popular voting mechanism, where every vote has the same weight. Thus, the reducer’s interface can simply expect a weight function, where the default one would yield the same value for each node.
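Both voting variants can be sketched with a single function that takes an injectable weight function. The choices below are assumptions made for illustration and are not prescribed by the protocol: states are JSON-serializable dictionaries, the hash is SHA-256 over a canonical JSON encoding, and ties are broken by taking the lexicographically smallest digest. All names are hypothetical.

```python
import hashlib
import json


def state_hash(state):
    """Digest of a canonical JSON encoding (the encoding is an assumption)."""
    encoded = json.dumps(state, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def uniform_weight(node_id):
    """Default weight function: every vote counts the same."""
    return 1.0


def vote(proposals, weight=uniform_weight):
    """Return the winning digest among {node_id: state} proposals."""
    counts = {}
    for node_id, state in proposals.items():
        digest = state_hash(state)
        counts[digest] = counts.get(digest, 0.0) + weight(node_id)
    best = max(counts.values())
    winners = [digest for digest, total in counts.items() if total == best]
    # Tie-break: smallest digest in lexicographic order (assumed order).
    return min(winners)


state_a = {"v1": "eu", "v2": "us"}
state_b = {"v1": "eu"}
proposals = {"v1": state_a, "v2": state_a, "v3": state_b}

popular = vote(proposals)                     # two of three nodes back state_a
trust = {"v1": 1.0, "v2": 1.0, "v3": 5.0}     # illustrative trust-based weights
weighted = vote(proposals, weight=trust.get)  # v3 alone outweighs v1 and v2
```

Because the weight function is passed in, the unweighted vote is simply the weighted one with the default `uniform_weight`. The exact comparison `total == best` is adequate for this example; with arbitrary real-valued weights, a tolerance-based comparison may be more appropriate.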

5. Enhancing protocol’s efficiency and reliability

Moving on to the efficiency and reliability improvements, let us first consider the updates related to the critical section management within the protocol. Critical sections are the base atomic units of execution from the perspective of a parallelized system. These points are responsible for a unified state update shared between multiple execution threads. As we’ve discussed in the previous section, the critical sections are defined and managed by respective mutex instances responsible for ensuring phase transition consistency.

It is quite important to emphasize that while controlled critical sections make it possible to manage asynchronous access to the shared entity, they are one of the primary causes of bottlenecks and deadlocks in such systems. That is because critical sections effectively merge multiple parallel execution threads into a single queue of operations. Therefore, they should be reduced to the minimal span required to guarantee successful code execution while avoiding excessive and greedy locking strategies that would unnecessarily stall the protocol’s engine.

The proposed modification provides a new approach to delimit phase transitions and their internal operations. One of the most important changes is related to the “DEBATE” phase and the management of its critical sections. RSDP initially defined this phase as starting with the mutex acquisition before sending the “HELLO” message. Subsequent operations involved “STATUS” message gathering, state aggregation, and broadcasting the “SHARE” message. The interphase mutex was supposed to be released only after the broadcast operation, holding the entire system at a halt during the message-passing process. Since the notion of engine statuses had not been introduced in the initial conception, such an approach made it possible to delay the processing of incoming “SHARE” messages and avoid inconsistencies.

In contrast with the initial version of RSDP, the proposed modification suggests acquiring a mutex exclusively for the aggregation and update processes. The initial operations during the “DEBATE” phase involving cluster status gathering could be executed outside the critical section while still guaranteeing state transition consistency due to the introduction of the FSM principles that postulate strict ordering of engine status transitions and hence executable operations. Such an improvement allows us to avoid potential congestion related to superfluous critical sections.
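A minimal sketch of the narrowed critical section is given below. The `Engine` class, its `Status` values, and the callback signatures are hypothetical. The short lock around the status check is an addition of this sketch that makes the FSM guard atomic; the message gathering itself runs without holding the mutex.

```python
import threading
from enum import Enum, auto
from typing import Any, Callable, List


class Status(Enum):
    IDLE = auto()
    DEBATE = auto()
    SHARE = auto()


class Engine:
    """Hypothetical engine: the mutex covers only aggregation and state update."""

    def __init__(self, state: Any, reducer: Callable[[Any, List[Any]], Any]):
        self._lock = threading.Lock()
        self._reducer = reducer
        self.status = Status.IDLE
        self.state = state

    def debate(self, gather_statuses: Callable[[], List[Any]]) -> Any:
        # FSM guard: a round may only start from IDLE (short, atomic check).
        with self._lock:
            if self.status is not Status.IDLE:
                raise RuntimeError(f"cannot start DEBATE from {self.status.name}")
            self.status = Status.DEBATE

        # Slow message passing happens outside the critical section.
        statuses = gather_statuses()

        # Critical section: aggregation and update of the shared state only.
        with self._lock:
            self.state = self._reducer(self.state, statuses)
            self.status = Status.SHARE
        return self.state
```

Because the status already moved to `DEBATE`, a second round cannot start while messages are being gathered, which is the ordering guarantee the FSM is meant to provide in place of a long-held mutex.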

Another aspect that must be addressed is the reasoning behind the existence of the “CLOSE” phase and its purpose. This phase is responsible for gracefully and dynamically handling nodes that depart from the cluster’s set. The same result could be achieved without introducing a dedicated phase and its related logic, namely by having the departing replica broadcast a “SHARE” message containing a state it has modified itself instead of a “CLOSE” message. The dedicated “CLOSE” message was eventually chosen for the following reasons:

The addition of a “CLOSE” message allows for differentiation between state updates and reduction requests. Separation of concerns greatly simplifies aggregation logic for the incoming shared states.

Broadcasting a “CLOSE” message is more efficient than repeatedly going through the entire “DEBATE” and “SHARE” phases, which would be necessary if the reducer’s logic were built on top of the newly proposed voting mechanism.

The separated approach allows for the development of additional logic related to events management. An example could involve side effects that are not related to the state management.

Overall, the decision to include the “CLOSE” phase introduces greater modularity, simplifies the codebase, and broadens the range of possible logical extensions within the cluster. Any operation defined for “CLOSE” phase handling should still comply with deterministic principles to avoid desynchronization. If the reducer’s logic relies on side effects during that phase, a new synchronization process starting with “DEBATE” will have to be executed.
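The separation of concerns between “SHARE” and “CLOSE” can be sketched as a small dispatcher. The message layout (`type`, `sender`, `state` keys) and the representation of the shared state as a mapping from node identifier to value are assumptions made for illustration only.

```python
from typing import Any, Callable, Dict

# Hypothetical state shape: node id -> value shared by that node.
State = Dict[str, Any]


def handle_message(
    state: State,
    msg: Dict[str, Any],
    reducer: Callable[[State, State], State],
) -> State:
    """Route a message to the matching handler and return the new state."""
    if msg["type"] == "SHARE":
        # State update: delegated to the (deterministic) reducer.
        return reducer(state, msg["state"])
    if msg["type"] == "CLOSE":
        # Reduction request: deterministically drop the departing node.
        return {node: value for node, value in state.items() if node != msg["sender"]}
    raise ValueError(f"unsupported message type: {msg['type']!r}")
```

The “CLOSE” branch never invokes the reducer, so a departing node does not force a full aggregation round even when the reducer is vote-based.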

The proposed RSDP 2.0 does not guarantee finalized state consistency. It is still quite possible for some replicas to diverge due to hardware, network, software conditions, or their combinations. Mechanisms like voting, including its weighted variants, aim to reduce the state groups' fragmentation degree, but they cannot guarantee eventual consistency.

For that reason, the proposed version introduces the “Mandatory Resynchronization Mechanism”. The MRM could be implemented in multiple different ways, each with its implications: 

Asynchronous Scheduling: in that case, each replica would have an independent resynchronization period. The approach is straightforward but does not leverage the cluster’s capabilities to coordinate its operations, leading to inefficient monitoring.

Cron-Based Scheduling: this would lead to strict synchronization and result in near-simultaneous RSDP status transitions on participating nodes, but detecting a discrepancy could take the entire schedule period [33].

Evenly Distributed Intervals: through RSDP itself, the participating replicas could distribute the time slot and coordinate with each other in intervals to monitor the cluster’s state.

The last approach would start with the initial synchronization round, going through the “DEBATE” and “SHARE” phases. Using the dedicated reducer, every replica could deduce the network size and its participants and deterministically assign the time slots for resynchronization periods. The approach minimizes the detection time of an inconsistency by leveraging the cluster’s capabilities.

Recall that upon receiving $M_{hello}$, each $v_j \in N^-(v_i)$ sends its initial state through the message $M_{status}(v_j) = (f_{id}(v_j), f_{meta}(v_i), s_j)$, which, among other things such as metadata and state, contains the sender’s routable address. That said, when the buffer of incoming messages is processed into an aggregation $s_i^* = f_{agg}^{status}(s_i, \{ M_{status}(v_j) \mid v_j \in N^-(v_i) \})$, it could be constructed as follows:

$$s_i^* = \left\{ \left( f_{id}(v_j),\ \frac{\Delta T}{|A|} \cdot rank(f_{id}(v_j), A) \right) \mid v_j \in N^-(v_i) \cup \{v_i\} \right\} \tag{8}$$

where $\Delta T$ is a synchronization period provided as a parameter, $|A|$ is the total number of addresses in the sorted set $A = \{ f_{id}(v_j) \mid v_j \in N^-(v_i) \cup \{v_i\} \}$, and $rank(f_{id}(v_j), A)$ returns the position of $f_{id}(v_j)$ within the sorted set $A$ for a given node $v_j$. As a result, $s_i^*$ represents the assigned time slots for every participating node.
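The slot assignment of Eq. (8) can be sketched in a few lines of Python. The function name `assign_timeslots` is hypothetical, and the rank is assumed to be zero-based, so the first address in sorted order receives offset 0; the source does not state which indexing is intended.

```python
from typing import Dict, Iterable


def assign_timeslots(addresses: Iterable[str], period: float) -> Dict[str, float]:
    """Map every node address to its resynchronization offset within the period.

    addresses: identifiers of all participants, including the local node.
    period:    the synchronization period (Delta T), in any consistent time unit.
    """
    ordered = sorted(set(addresses))  # the sorted set A
    if not ordered:
        raise ValueError("at least one participant is required")
    slot = period / len(ordered)  # Delta T / |A|
    # Assumption: zero-based rank within the sorted set.
    return {address: slot * rank for rank, address in enumerate(ordered)}
```

Since sorting is deterministic, every replica that observed the same set of addresses derives the same schedule without any additional message exchange. With three nodes and a 60-second period, for instance, the offsets are 0, 20, and 40 seconds.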

Another important issue that must be addressed pertains to potential attacks aimed at service availability. The designed version of RSDP is still supposed to be used within a private network, where each node goes through authentication and authorization processes to transmit messages inside the network. However, such attacks are not limited to public services, so it is important to design countermeasures against the potential abuse of synchronization mechanisms in private deployments as well.

Since every interaction process goes through the SLAN layer, which comprises a set of intermediary nodes, additional monitoring and security measures could be installed to prevent potential message spam abuse. This is possible since the media server is responsible for authentication and the routing process, which enables dynamic evaluation of requests. If it detects frequent messages of the same type coming from the same node, it could isolate that node to avoid potential network congestion and malicious activities.
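One possible form of such monitoring is a sliding-window counter kept per sender and message type. The sketch below is an assumption about how an intermediary node might implement it; the class name `FloodGuard`, its thresholds, and the isolation policy are illustrative and not defined by RSDP or SLAN.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set, Tuple


class FloodGuard:
    """Isolate a sender that exceeds a message budget within a time window."""

    def __init__(self, max_messages: int, window_seconds: float):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self.isolated: Set[str] = set()
        self._seen: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, sender: str, msg_type: str, now: float) -> bool:
        """Return True if the message may be routed, False if it is dropped."""
        if sender in self.isolated:
            return False
        timestamps = self._seen[(sender, msg_type)]
        # Forget messages that fell out of the sliding window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        timestamps.append(now)
        if len(timestamps) > self.max_messages:
            # Too many messages of one type from one node: isolate it.
            self.isolated.add(sender)
            return False
        return True
```

The caller supplies the timestamp explicitly, which keeps the check deterministic and easy to test; a deployment would typically pass a monotonic clock reading.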

Additionally, since RSDP 2.0 introduces quorum-based consensus, the acceptance of an incoming shared state is resolved through the BFT-compliant properties of the voting mechanism. That is, for a new shared state to be accepted, it should be supported by the majority of the network. Consequently, an incoming divergent state would trigger a new synchronization round that determines whether the transition should be accepted. Future research could be aimed at detecting anomalous behavior.

6. Decentralized Byzantine fault tolerance

As was previously stated, RSDP is designed for controlled environments and relies on the SLAN layer for authentication and security measures. By leveraging voting-based consensus, it is possible to limit the influence of any single node on the state distribution and coordination process, since any malicious intent would be dismissed unless it comes from the majority of the network. Nonetheless, RSDP is designed as a cluster solution that coordinates a group of nodes towards a common goal through common state management capabilities. The protocol does not provide any side-effect verification mechanism and thus cannot guarantee that the tasks designated through state mapping were accomplished correctly. Additionally, it relies on participating nodes to provide status and aggregated data without malicious intent, since every other node would trust them. For these reasons, blockchain technology could be used to achieve decentralization and to leverage the protocol’s capabilities outside the controlled environment [16–20].

The RSDP-Blockchain layer interactions are shown in Fig. 3. There are two distinct layers defined, the first being the blockchain network with its validator nodes and the other being the RSDP cluster nodes. Every node within the RSDP cluster has to have connections established with every other cluster node as well as with the validator node serving as an entry point for interactions with the ledger. To explain the reasoning behind such a combination, let us first discuss blockchain technology, its implications, limitations, and the problems it allows us to solve.

Blockchain is a decentralized network of publicly linked nodes responsible for managing the ledger’s history. At its foundation, it relies on a linked list of hashes, each representing a succinct state of the corresponding block. One of the core problems solved by blockchain is historical consensus regarding the interactions that have occurred. Its operational consistency is guaranteed by a common consensus mechanism and strict validation rules [16–20].

There are myriad approaches to establishing a voting mechanism, the most common being proof of stake, proof of authority, and proof of work. Their purpose is to achieve network-wide consensus about the next block value. We can draw a clear distinction between the blockchain consensus layer and the designed protocol:

RSDP is designed as a solution that provides distributed state management and coordination, where every node contributes to the resulting aggregation while blockchain consensus protocols are aimed towards the validation process to verify the next block in the ledger.

Blockchain consensus protocols are designed to scale; they do not require a fully connected network for their operation, which is a current limitation of RSDP and will be addressed in future research.

RSDP emphasizes distribution rather than decentralization. The proposed mechanisms allow for mitigating the influence of uncertain network conditions, but the purpose of the protocol is to coordinate a cluster rather than provide a trustless platform.

That being said, blockchain technology could serve as a trust provision layer, responsible for asset management and slashing penalties in case state inconsistency is detected. Every node within the RSDP cluster should be registered on-chain, and a prearranged amount of assets should be staked for both potential penalties and rewards. On top of that, every node must commit its state transition events to the chain for historical audit purposes.
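The staking and slashing role of the trust provision layer can be illustrated with an in-memory model. This is a plain Python stand-in and not smart contract code; the class `StakeRegistry`, the minimum stake, and the basis-point penalty are hypothetical parameters chosen for the sketch.

```python
from typing import Dict


class StakeRegistry:
    """Toy model of on-chain node registration with stakes and slashing."""

    def __init__(self, min_stake: int):
        self.min_stake = min_stake
        self.stakes: Dict[str, int] = {}

    def register(self, node_id: str, stake: int) -> None:
        """Register a node; the stake must cover the prearranged minimum."""
        if stake < self.min_stake:
            raise ValueError("stake is below the required minimum")
        if node_id in self.stakes:
            raise ValueError("node is already registered")
        self.stakes[node_id] = stake

    def slash(self, node_id: str, penalty_bps: int) -> int:
        """Deduct a penalty given in basis points (1/100 of a percent).

        Integer arithmetic is used so that every verifier computes the same
        amount. Returns the number of units removed from the stake.
        """
        if not 0 <= penalty_bps <= 10_000:
            raise ValueError("penalty must be between 0 and 10000 basis points")
        penalty = self.stakes[node_id] * penalty_bps // 10_000
        self.stakes[node_id] -= penalty
        return penalty
```

In an actual deployment these rules would live in the designated smart contract, and the penalty size would be derived from the detected impact of the inconsistency.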

Every interaction with the cluster would proceed through the following stages:

A set of RSDP nodes register on the blockchain network, providing stakes to the designated smart contract.

A task is published on the ledger through the contract, which would signify the beginning of cluster processing. The task could include execution parameters, target goals to be achieved, and the chosen state reducer.

Participating nodes register a cluster through the smart contract and sign up for task execution.

As the nodes move through each state transition phase, such as “HELLO”, “SHARE”, or “CLOSE”, logs would be published on the ledger to verify operational correctness.

Once the task is finalized due to either an end condition or a revoking command by the client, the results and latest states are to be published on the ledger for verification.

The approach makes it possible to follow the log trace at any point since the ledger is a publicly available structure. Recall that every state transition function within RSDP should be deterministic, and thus, every operation could be verified by outside entities. If an inconsistency is detected on a node during its phases, slashing could be applied based on the degree of impact, which could be determined by the methods provided in previous sections, such as “Operational System Validity” based on concentration, entropy, or the largest group.
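Because the transition functions are deterministic, an outside verifier can replay the published log and compare state digests. The following sketch assumes a hypothetical log format in which each entry carries the transition input and the SHA-256 digest of the resulting state; neither the format nor the function names are prescribed by the protocol.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List, Optional


def state_digest(state: Any) -> str:
    """Canonical digest of a JSON-serializable state."""
    encoded = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def first_divergence(
    initial_state: Any,
    log: List[Dict[str, Any]],
    reducer: Callable[[Any, Any], Any],
) -> Optional[int]:
    """Replay the log and return the index of the first mismatching entry.

    Each entry is assumed to hold "input" (the transition input) and "digest"
    (the digest the node published after applying it). Returns None when the
    whole trace is consistent with the deterministic reducer.
    """
    state = initial_state
    for index, entry in enumerate(log):
        state = reducer(state, entry["input"])
        if state_digest(state) != entry["digest"]:
            return index
    return None
```

A non-`None` result pinpoints the transition at which a node deviated, which is the kind of evidence a slashing decision could be based on.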

Another concern that has to be addressed is state mutation due to external information. It could happen that the cluster must agree upon a value that has a significant impact on other systems. For example, RSDP could be used for cluster-wide rate-limit compliance, where every node tries to access a remote server and must avoid security policy violations. In such cases, it is quite common to have dynamic policies that adjust to the current demand. Hence, the state should periodically be updated to reflect that demand.

Inside the controlled environment, it is possible to simply establish an additional communication channel or leverage the existing SLAN capabilities to broadcast a new event that would update the system’s status and trigger a new “DEBATE” round. This approach would not be applicable within a public environment due to trust limitations and the lack of accountability. Hence, the ledger must be utilized as a trustless event propagation medium that would record the originating address and provide economic guarantees for participants.

Conclusions

Modern requirements towards available computational capabilities, their coordination, and security inevitably lead to rising academic and engineering interest within the networking community. As a result, a plethora of methods for achieving dynamic coordinated network extension and maintenance have been developed, among which RSDP stands as a unique approach for abstracting complexities related to distributed consensus management. Firstly, this article expounds on the definition of the updated RSDP model, including its protocol phases and consensus process. In addition to that, it introduces new methods aimed at safeguarding the phase transition process described within the protocol. This version of the protocol also defines additional mechanisms for handling lost updates and flooding conditions that may arise from hardware issues or from intentional abuse. The proposed approach allows for a significant improvement in the protocol’s resilience and reliability.

Secondly, this article introduces a new BFT-compliant approach with RSDP by defining a set of quorum state reducers. The quorum coordination allows for effectively discovering malicious operations during consensus, both intentional and accidental, which is a primary property required from shared distributed systems. The designed foundation for quorum support is flexible and easy to modify due to the open definition of the weight control mechanism.

Finally, this article defines a new computational coordination approach by incorporating both RSDP and blockchain technologies. Such a design allows for efficient, rapid, and incentivized collaboration of interlinked nodes within a decentralized system. Blockchain technology in that case works as both an incentive and a governance platform, rewarding nodes for correct cooperation and applying slashing procedures otherwise. Future applications of this technology could lead to an expansion of accessible, low-cost cloud computation engines suitable for various tasks.

Overall, the aim of this article is to further expand the horizon of possibilities within the decentralized network coordination paradigm. This paper is intended to improve the decision-making processes of network engineers and architects. It is also a goal of this article to spark a further surge in research and engineering efforts within decentralized coordination technology.

Declaration on Generative AI

While preparing this work, the authors used the AI programs Grammarly Pro to correct text grammar and Strike Plagiarism to search for possible plagiarism. After using these tools, the authors reviewed and edited the content as needed and took full responsibility for the publication’s content.

References

[1] Grechaninov, et al., Models and Methods for Determining Application Performance Estimates in Distributed Structures, in: Cybersecurity Providing in Information and Telecommunication Systems, vol. 3288, 2022, 134–141.

[2] Astapenya, et al., Analysis of ways and methods of increasing the availability of information in distributed information systems, in: 2021 IEEE 8th International Conference on Problems of Infocommunications, Science and Technology (2021). doi:10.1109/picst54195.2021.9772161

[3] Hulak, et al., Dynamic model of guarantee capacity and cyber security management in the critical automated system, in: 2nd International Conference on Conflict Management in Global Information Networks, vol. 3530 (2023) 102–111.

[4] Guerraoui, A. Schiper, Fault-tolerance by replication in distributed systems, in: Reliable Software Technologies-Ada-Europe'96, Lecture Notes in Computer Science, vol. 1088, 1996. doi:10.1007/BFb0013477

[5] K. P. Birman, T. A. Joseph, Exploiting replication in distributed systems, Distrib. Syst. (1989) 319–367. doi:10.1145/90417.90751

[6] Ciciani, et al., Analysis of replication in distributed database systems, IEEE Trans. Knowl. Data Eng. 2 (1990) 247–261. doi:10.1109/69.54723

[7] Zaharia, et al., Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing, in: 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI '12), 2012, 1–14.

[8] Cristian, Understanding fault-tolerant distributed systems, Commun. of the ACM 34(2) (1991) 56–78. doi:10.1145/102792.102801

[9] A. Luntovskyy, et al., Highly-distributed systems: What is inside? in: 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&T), 2020. doi:10.1109/PICST51311.2020.9467890

[10] V. S. Pai et al., Locality-aware request distribution in cluster-based servers, ACM SIGOPS OSR 32(5) (1998) 205–216. doi:10.1145/384265.291048

[11] A. Verma, et al., Large-scale cluster management at Google with Borg, in: 10th European Conference on Computer Systems, EuroSys’15, 2015, 1–17. doi:10.1145/2741948.2741964

[12] Y. Kostiuk, et al., Integrated protection strategies and adaptive resource distribution for secure video streaming over a Bluetooth network, in: Cybersecurity Providing in Information and Telecommunication Systems II, vol. 3826, 2024, 129–138.

[13] P. Anakhov, et al., Evaluation method of the physical compatibility of equipment in a hybrid information transmission network, J. Theor. Appl. Inf. Technol. 100(22) (2022) 6635–6644.

[14] M. Kotov, S. Toliupa, V. Nakonechnyi, Method of building local area network simulation based on AMQP and its support protocols suite, Telecommun. Inf. Technol. 3 (2024) 102–119. doi:10.31673/2412-4338.2024.039989

[15] M. Kotov, S. Toliupa, V. Nakonechnyi, Replica state discovery protocol based on advanced message queuing protocol, Electron. Prof. Sci. J. Cybersecur. Educ. Sci. Tech. 3(23) (2024) 156–171. doi:10.28925/2663-4023.2024.23.156171

[16] Z. Zheng, et al., An overview of blockchain technology: Architecture, consensus, and future trends, in: IEEE International Congress on Big Data (BigData Congress), 2017. doi:10.1109/BigDataCongress.2017.85

[17] D. Yaga, et al., Blockchain technology overview, National Institute of Standards and Technology Internal Report, 2019. doi:10.48550/arXiv.1906.11078

[18] J. Golosova, A. Romanovs, The advantages and disadvantages of the blockchain technology, in: IEEE 6th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE), 2018. doi:10.1109/AIEEE.2018.8592253

[19] G. Habib, et al., Blockchain technology: Benefits, challenges, applications, and integration of blockchain technology with cloud computing, Future Internet 14(11) (2022). doi:10.3390/fi14110341

[20] M. S. Kotov, Tree-based state sharding for scalability and load balancing in multichain systems, Electron. Prof. Sci. J. Cybersecur. Educ. Sci. Tech. 2(26) (2024) 392–408. doi:10.28925/2663-4023.2024.26.702

[21] W. Liu, et al., Distributed and parallel blockchain: Towards a multi-chain system with enhanced security, IEEE Transact. Dependable Secur. Comput. 22(1) (2024) 1–16. doi:10.1109/tdsc.2024.3417531

[22] S. I. Sion, et al., A comprehensive review of multi-chain architecture for blockchain integration in organizations, in: Business Process Management: Blockchain, Robotic Process Automation, Central and Eastern European, Educators and Industry Forum, BPM 2024, Lecture Notes in Business Information Processing, vol. 527, 2024. doi:10.1007/978-3-031-70445-1_1

[23] F. Hashim, K. Shuaib, N. Zaki, Sharding for scalable blockchain networks, SN Comput. Sci. 4(2) (2023). doi:10.1007/s42979-022-01435-z

[24] V. Zhebka, et al., Methodology for choosing a consensus algorithm for blockchain technology, in: Workshop on Digital Economy Concepts and Technologies Workshop, DECaT, vol. 3665 (2024) 106–113.

[25] N. Naik, Choice of effective messaging protocols for IoT systems: MQTT, CoAP, AMQP and HTTP, in: 2017 IEEE International Systems Engineering Symposium (ISSE), 2017, 426–435. doi:10.1109/SysEng.2017.8088251

[26] J. L. Fernandes, et al., Performance evaluation of RESTful web services and AMQP protocol, in: 2013 5th International Conference on Ubiquitous and Future Networks (ICUFN), 2013. doi:10.1109/ICUFN.2013.6614932