<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Optimizing Lock-Free Containers for Multithreaded Socially Oriented Information Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sergiy Yakovlev</string-name>
          <email>sergiy.yakovlev@p.lodz.pl</email>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrii Strelchenko</string-name>
          <email>andrii.strelchenko@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Artem Khovrat</string-name>
          <email>artem.khovrat@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Volodymyr Kobziev</string-name>
          <email>volodymyr.kobziev@nure.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National University of Radio Electronics</institution>
          ,
          <addr-line>14, Nauky, Ave., Kharkiv, 61166</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Lodz University of Technology</institution>
          ,
          <addr-line>90-924 Lodz</addr-line>
          ,
          <country country="PL">Poland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>V.N. Karazin Kharkiv National University</institution>
          ,
          <addr-line>4, Svobody, Sq., Kharkiv, 61022</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This research examines lock-free containers designed for computationally intensive multithreaded intelligent solutions in socially oriented information systems, particularly financial platforms and social networks during periods of peak demand. Effective handling of large, rapidly changing data streams necessitates efficient buffering mechanisms. An expert evaluation identified queues as the optimal data structure for such sequential data processing tasks due to their inherent structural suitability. The study critically evaluated traditional lock-based synchronization methods commonly used in multi-threaded environments, uncovering significant drawbacks, including susceptibility to deadlocks, priority inversion, increased latency, and poor scalability. Given these limitations, the investigation pivoted towards lock-free synchronization methods, leveraging hardware-supported atomic operations and Compare-And-Swap (CAS) loops to facilitate concurrency without explicit locking mechanisms. To further optimize performance, memory locality principles were applied to lock-free queue implementations. Techniques such as strategic memory alignment, padding, and sequence numbering were introduced, significantly reducing cache misses and improving efficiency. These enhancements aimed to minimize synchronization overhead, thus substantially increasing throughput and scalability under high contention scenarios. A rigorous benchmarking methodology was developed to evaluate the effectiveness of these optimizations, explicitly addressing multi-threaded measurement accuracy and correctness testing. Three distinct queue implementations were tested: a standard baseline lock-free queue, a volatile-based lock-free queue incorporating memory locality optimizations, and an atomic-based variant similarly optimized. Experimental results clearly indicated that the volatile-based optimized queue significantly outperformed the other implementations.
It demonstrated notably lower latency, decreased performance variability, and superior scalability, underscoring the effectiveness of memory locality optimizations. These findings provide critical insights for developing efficient, scalable, and reliable synchronization solutions essential for contemporary high-load computing environments.</p>
      </abstract>
      <kwd-group>
        <kwd>cache optimization</kwd>
        <kwd>computational intelligence</kwd>
        <kwd>concurrent processing</kwd>
        <kwd>high-loaded systems</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The rapid pace of globalization and the increasing digitalization of society have significantly
amplified the importance of intelligent high-performance socially oriented systems. These systems,
including social networks, financial platforms, and critical infrastructure applications, must
effectively handle substantial and variable data streams. Furthermore, the resilience and
responsiveness of these systems become particularly crucial during social crises, such as pandemics
or military conflicts, when informational demands peak sharply.</p>
      <p>A critical challenge faced by these systems is the management of high-load scenarios
characterized by extensive concurrent data processing demands. The inability to manage such
loads effectively results in back pressure, a phenomenon where data processing throughput cannot
match incoming data rates, leading to potential system delays and reliability issues. Therefore,
addressing high-load data processing requirements becomes vital for maintaining operational
efficiency and data integrity within socially critical applications.</p>
      <p>Traditionally, intelligent high-load systems have employed lock-based synchronization
approaches, extensively researched and widely implemented due to their straightforward semantics
and ease of use. However, lock-based mechanisms are inherently limited by issues such as
deadlocks, priority inversion, increased latency, and poor scalability under high concurrency
conditions. Conversely, lock-free synchronization strategies, which avoid explicit locking
mechanisms by leveraging atomic hardware operations, remain relatively underexplored in the
context of socially oriented high-load systems, despite their potential to offer significant
performance improvements.</p>
      <p>
        Given the aforementioned context, this study aims to comprehensively investigate the
applicability and optimization potential of lock-free queue implementations specifically tailored for
high-load socially oriented systems [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The primary objective is to evaluate and enhance lock-free
queue structures through strategic memory locality optimizations, ultimately aiming to achieve
maximal performance efficiency and data integrity. To achieve this goal, the following research
tasks were formulated:</p>
      <p>Identification and detailed characterization of the back pressure phenomenon in high-load
systems.</p>
      <p>Analysis and comparison of existing synchronization mechanisms (lock-based vs. lock-free)
regarding their performance and limitations.</p>
      <p>Exploration and implementation of memory locality optimization techniques within
lock-free data structures.</p>
      <p>Comprehensive benchmarking and correctness testing of optimized lock-free
implementations under realistic high-load scenarios.</p>
      <p>Importantly, the intent of this research is not to target specific throughput metrics; instead, the
goal is to develop a lock-free queue optimized to deliver maximal efficiency within clearly defined
operational constraints. Special emphasis is placed on guaranteeing robust data transfer,
maintaining sequence integrity, and reducing synchronization overhead. This analysis will
critically assess optimization strategies tailored to high-throughput data processing and evaluate
the contribution of lock-free data structures toward enhancing performance in relevant high-load
environments.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System description</title>
      <p>Before proceeding with further analysis, it is critical to establish a clear understanding of high-load
system characteristics, as their operational specifics significantly influence the selection and
efficiency of buffering mechanisms and subsequent data handling methodologies. High-load
environments often encounter substantial and unpredictable streams of data, necessitating
specialized strategies to ensure stable, responsive, and reliable performance.</p>
      <sec id="sec-2-1">
        <title>2.1. Back pressure problem</title>
        <p>A primary challenge encountered by high-load systems is "back pressure." This phenomenon arises
when consumer threads or processes fail to match the pace at which data producers generate
information. Consequently, unprocessed data accumulates, potentially leading to significant delays
or even system failures. Effectively managing back pressure is crucial, particularly in socially
oriented and critical infrastructure systems where data integrity, timeliness, and consistent
throughput are non-negotiable requirements.</p>
        <p>Typical software architectures address back pressure through several approaches:</p>
        <p>Producer Regulation: Slowing down data generation intentionally at the source. This is
practical primarily in scenarios involving user interaction but is ineffective for systems
reliant on automated, continuous data streams.</p>
        <p>Data Dropping: Discarding excess incoming data. Suitable for applications where minor
data loss does not substantially degrade overall functionality, yet inappropriate for critical
systems requiring strict data continuity.</p>
        <p>
          Data Buffering: Implementing intermediate buffers that temporarily store excess incoming
data, smoothing out transient data surges and preventing overload scenarios. This solution
is optimal for environments prioritizing data consistency and integrity [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Buffering received data</title>
        <p>Given the limitations of producer regulation and data dropping in critical system scenarios,
buffering received data (BRD) emerges as the most viable solution for managing back pressure
effectively. As depicted in Figure 1, BRD employs a dedicated producer interface thread responsible
for continuously retrieving and aggregating incoming data into a Shared Container.
Simultaneously, multiple consumer threads independently access and process data from this shared
container, distributing processing load effectively.</p>
        <p>This strategy allows for substantial scalability. Theoretically, the number of consumer threads
can increase indefinitely, provided sufficient hardware resources are available. Practically,
however, hardware constraints impose limits on the number of simultaneous threads, and scaling
beyond certain thresholds may result in diminishing returns or increased complexity related to
synchronization and thread management.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Buffering received data with memory pool</title>
        <p>
          A further optimization of the basic BRD method includes integrating an external memory pool into
the buffering architecture, as illustrated in Figure 2. By employing a memory pool, operations are
performed on memory pointers rather than directly on buffered data. This approach significantly
reduces container access bottlenecks and improves the overall efficiency and responsiveness of the
data handling process [
          <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
          ].
        </p>
        <p>Adopting this methodology necessitates clearly defining data sizes provided by producer
threads, ensuring consumer threads effectively retrieve and manage data via pointer-based access.
Common implementations utilize circular buffers with fixed capacities, overwriting old data with
new entries without explicit clearing operations. Despite potential security considerations inherent
in this approach, proper data management practices effectively mitigate these risks.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Buffer implementation</title>
      <p>For effectively managing buffered data in high-load systems, an appropriate data structure must be
selected based on its efficiency in handling concurrent data access and processing.</p>
      <sec id="sec-3-1">
        <title>3.1. Container selection</title>
        <p>To determine the most suitable container type, an expert assessment was conducted involving 100
specialists from various cities, including Kharkiv, Kyiv, Lviv, Vienna, Lisbon, Krakow, Dnipro,
Odesa, New York, Toronto, and Tbilisi. The survey aimed to identify the most frequently utilized
data structures in data flow scenarios. The majority of experts identified queues and stacks as the
most common, each receiving maximum support (100 votes).</p>
        <p>Considering the characteristics of typical data handling scenarios encountered in high-load
systems – particularly where orderly data processing is critical – the queue was selected as the
container type for further investigation. The fundamental principle guiding queue functionality is
"First In, First Out" (FIFO), where data items are inserted at one end (tail) and retrieved from the
opposite end (head). Common queue implementations include:</p>
        <p>Linked Lists: Sequentially linked nodes that allow constant-time operations for data
insertion and removal.</p>
        <p>Dynamic Arrays: Arrays expandable at both ends, requiring memory reallocation during
extension.</p>
        <p>Hybrid Models: Combining linked lists and dynamic arrays, typically in the form of
fixed-size array buckets that dynamically expand as needed.</p>
        <p>Due to their structured approach to data management and inherent efficiency in handling
sequential processing tasks, queues are widely utilized in information processing systems. Two
fundamental approaches exist for queue implementation in multi-threaded systems: lock-based and
lock-free paradigms.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Lock-based approach</title>
        <p>The lock-based approach involves securing exclusive access to critical code sections, thereby
preventing concurrent thread interference. This approach typically employs synchronization
mechanisms such as mutexes or locks. A representative scenario is illustrated in Figure 3,
demonstrating protection of a shared resource – such as a global counter – through mutex locking.
The process involves explicitly acquiring a lock before performing the operation and releasing it
afterward, ensuring data consistency at the cost of potential thread waiting times.</p>
        <p>Another generalized implementation of this approach is shown in Figure 4, highlighting a
lock-based queue implementation. Here, node creation and insertion operations are explicitly
encapsulated as critical sections. Each thread must obtain exclusive access before modifying shared
structures, thereby ensuring data integrity and preventing race conditions.</p>
        <p>Despite its straightforward implementation and clear synchronization semantics, the lock-based
method possesses several inherent drawbacks, including:</p>
        <p>Deadlocks and livelocks: Incorrect handling of locks can lead to situations where threads
become indefinitely blocked, effectively halting system functionality.</p>
        <p>Latency issues: Threads may experience significant delays as they wait for locked resources
to become available, resulting in reduced overall system responsiveness.</p>
        <p>Poor scalability: As the number of threads increases, contention for locks escalates, thereby
degrading performance significantly in high-load environments.</p>
        <p>Additionally, the lock-based approach can introduce priority inversion issues, where
lower-priority threads hold locks required by higher-priority threads, causing increased waiting times
and reduced predictability in real-time systems. The complexity of managing multiple locks can
also lead to higher chances of programmer errors, making system maintenance and debugging
more challenging.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Lock-Free approach</title>
        <p>
          To mitigate the limitations associated with lock-based methods, lock-free techniques employ
atomic operations provided by hardware, ensuring that data modifications occur indivisibly
without being visible in intermediate states. A fundamental concept in lock-free programming is
the Compare-And-Swap (CAS) operation [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], illustrated in Figure 5. CAS loops attempt atomic
updates to shared variables, repeatedly verifying that the expected state remains consistent before
applying modifications.
        </p>
        <p>The algorithm presented in Figure 6 is a basic example of a lock-free queue implementation using
CAS operations. Unlike lock-based approaches, this method significantly reduces thread wait times
and eliminates deadlock conditions by continuously retrying operations without explicit locking
mechanisms.</p>
        <p>
          Moreover, lock-free approaches inherently provide better adaptability to varying workloads.
Since threads do not wait on locks, they can immediately attempt retries after a failure, allowing
them to dynamically respond to system load fluctuations. This characteristic substantially
contributes to achieving improved throughput, especially in high-contention environments typical
of high-load applications. Another significant advantage of lock-free methodologies is the inherent
robustness against thread failures. Unlike lock-based mechanisms, where a thread failure while
holding a lock could stall or degrade the entire system's operation, lock-free techniques ensure that
individual thread failures do not adversely impact overall system performance [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. This feature is
particularly beneficial in critical real-time applications where uninterrupted service is essential.
        </p>
        <p>Nevertheless, lock-free programming introduces unique challenges, notably the ABA problem,
wherein memory states can revert to previously observed conditions undetected, causing
erroneous system behavior. Although these challenges require sophisticated handling techniques,
lock-free implementations generally provide superior scalability and responsiveness compared to
lock-based methods.</p>
        <p>The recognition of these nuanced challenges has encouraged further advancements, particularly
incorporating memory locality optimization into lock-free techniques. Subsequent sections will
thoroughly examine such advanced lock-free variations, emphasizing the critical role of memory
locality in enhancing performance in high-load, multi-threaded environments.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Optimization of lock-free approach</title>
      <sec id="sec-4-1">
        <title>4.1. Memory locality challenges in lock-free implementations</title>
        <p>
          Optimizing memory access patterns is critical to improving the performance of
lock-free algorithms in high-load, multi-threaded systems. Two core principles drive cache efficiency
and memory optimization [
          <xref ref-type="bibr" rid="ref7 ref8">7, 8</xref>
          ]:
        </p>
        <p>Temporal Locality: This principle implies that recently accessed data will likely be accessed
again shortly. Effective utilization of temporal locality requires ensuring frequently accessed
data remains available within fast cache memory, thus minimizing repetitive and costly
accesses to slower main memory.</p>
        <p>Spatial Locality: Spatial locality refers to the tendency for a program to access data locations
near previously accessed locations. Optimally leveraging spatial locality involves organizing
data so that related data elements are stored within the same cache lines, thereby
significantly reducing cache misses and enhancing overall processing speed [9].</p>
        <p>Lock-free data structures frequently utilize linked list architectures due to their inherent
flexibility. However, linked lists pose specific challenges for memory locality optimization. While
they inherently offer some spatial locality advantages when nodes are allocated contiguously,
practical constraints typically require separation of frequently updated components, such as queue
head and tail pointers, to different cache lines. This separation mitigates cache contention and the
false sharing phenomenon—an issue where multiple threads inadvertently cause cache
invalidations due to modifications of adjacent memory locations.</p>
        <p>To address false sharing, implementations commonly employ padding techniques, explicitly
isolating frequently modified data by placing them on distinct cache lines.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Enhancing lock-free algorithms with sequence numbers</title>
        <p>An advanced optimization strategy involves incorporating sequence numbers into the node
structure of lock-free queues. Sequence numbers function as synchronization tools, significantly
reducing unnecessary cache coherence traffic among threads. Proper alignment of node data and
sequence numbers ensures operations by different threads occur on separate cache lines, improving
parallel execution efficiency [10].</p>
        <p>The key advantages of integrating padding and sequence numbers into lock-free structures
include [11]:</p>
        <p>Prevention of false sharing by isolating updates to different memory regions.</p>
        <p>Enhanced cache efficiency through optimal data alignment.</p>
        <p>Improved scalability and performance in high-contention, multi-threaded scenarios due to
reduced synchronization overhead.</p>
        <sec id="sec-4-2-1">
          <title>Example implementation</title>
          <p>Example code illustrating this optimization is shown in Figure 7.</p>
          <p>By effectively employing padding and sequence numbers, lock-free algorithms can achieve
significant performance improvements through optimized memory locality strategies. With these
foundational concepts established, we can now proceed to a comprehensive evaluation of these
strategies. Before conducting performance evaluations, clearly defining benchmarking principles to
accurately measure and analyze algorithm efficiency under realistic high-load conditions is
essential [11].</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Benchmarking</title>
      <sec id="sec-5-1">
        <title>5.1. Methodology and challenges</title>
        <p>Accurately measuring performance metrics represents one of the critical challenges when
evaluating multi-threaded applications, particularly those utilizing lock-free structures. A
conventional approach for benchmarking performance involves capturing the execution time of a
particular operation or set of operations. Typically, this approach can be outlined as shown in
Figure 8:</p>
        <p>While effective in single-threaded scenarios, this straightforward measurement approach
becomes unreliable in multi-threaded contexts due to potential interference from concurrent
operations and system scheduling.</p>
        <p>Reliable performance measurement in multi-threaded scenarios necessitates addressing
concurrent execution challenges and operating system scheduling interference [12]. Techniques
such as thread affinity or "thread pinning" are employed to minimize inaccuracies by binding
individual threads to specific CPU cores, thereby preventing context switches and ensuring
consistent execution contexts. Furthermore, modern computing systems exhibit nondeterministic
behavior caused by scheduling and resource contention, complicating reliable performance
measurements. Thus, isolating benchmark tests from external system processes is essential to
achieving accurate, reproducible results. "Shielding," a strategy involving reserving specific CPU
cores exclusively for benchmark threads, significantly reduces variability in measurements
resulting from external interference.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Producer-consumer model</title>
        <p>Benchmarking producer-consumer scenarios, particularly those with multiple producers and
consumers, introduces additional complexities. To ensure accurate and meaningful benchmarking
results, the following considerations are essential:</p>
        <sec id="sec-5-2-1">
          <title>Benchmark requirements</title>
          <p>Minimizing interference from unrelated system processes.</p>
          <p>Consistent core allocation for benchmark threads.</p>
          <p>Capturing synchronization overhead, as contention among threads significantly affects performance.</p>
          <p>For a robust evaluation of performance in real-world settings, the benchmarking framework
should record comprehensive statistical data, including minimum and maximum latencies, standard
deviation, and percentile thresholds (such as the 95th and 99th percentiles). This detailed statistical
approach provides a deeper understanding of system behavior, particularly under worst-case
conditions and peak load scenarios. Additionally, integer-based data types are commonly utilized in
benchmarks due to their minimal memory overhead and simplicity, mimicking pointer operations
effectively. External memory allocation strategies, such as memory arenas, offer significant
performance benefits and simplify data management, further enhancing benchmarking reliability.</p>
          <p>Correctness testing is equally critical to maintaining data integrity in concurrent data
structures. Verifying the correctness typically involves validating the order and accuracy of
processed elements. For instance (see Figure 9), one-to-one (1:1) producer-consumer tests involve a
producer sequentially inserting values into a queue and a consumer subsequently retrieving and
verifying the integrity of these values. Such tests help identify race conditions, synchronization
issues, and unexpected behaviors emerging in concurrent execution environments.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Evaluation results</title>
      <p>To thoroughly assess the effectiveness of the lock-free queue implementations optimized for
memory locality, extensive experiments were conducted. The evaluation focused on insertion
(enqueue) and retrieval (dequeue) operations within queues, covering internal object sizes ranging
from 1 byte to 128 bytes. The chosen size increments followed powers of two, aligning with
common cache line sizes in processor architectures, specifically targeting ARM and comparable
systems [13].</p>
      <p>Figures 11 and 12 present the performance measurements for the dequeue and enqueue
operations, respectively, under the 2:2 producer-consumer configuration. Due to the structural
similarities between the two operations, both figures exhibit comparable performance trends.</p>
      <sec id="sec-6-1">
        <title>Key findings from the experimental data include:</title>
        <p>The distribution of minimum execution times across varying data sizes remained
consistently stable for each tested queue implementation, exhibiting only slight
performance degradation at the largest size (128 bytes).</p>
        <p>Notable peaks in maximum latency were observed for object sizes of 2 bytes (across all
queue implementations) and at 128 bytes specifically within atomic-based implementations.</p>
        <p>The performance degradation at the 2-byte size is attributed to suboptimal processor-level
handling, as this size does not align naturally with typical processor architectures [15]. The latency
increase observed at 128 bytes reflects objects exceeding the tested processor's cache line size (64
bytes). While volatile-based and traditional lock-free implementations appeared unaffected by this
discrepancy, atomic-based implementations were significantly impacted due to additional atomic
operation overhead. The experimental outcomes underline the considerable advantage of memory
locality optimizations in lock-free approaches. Specifically, improved data placement within
memory significantly enhances access efficiency, thereby directly influencing the fundamental
performance characteristics of lock-free algorithms [14]. The subsequent section will offer a
comprehensive discussion and interpretation of these performance results, considering their
implications for practical high-load system scenarios.</p>
        <p>Aside from synthetic experiments, this optimization approach was also applied to a system
performing statistical analysis of market data from an automated electronic exchange. The results
demonstrated a significant improvement in system stability, particularly in addressing the
back pressure problem.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>This research conducted a comprehensive analysis and evaluation of lock-free queue
implementations, emphasizing memory locality optimization strategies. Extensive experimental
studies were carried out, assessing performance across various internal queue object sizes, ranging
from 1 to 128 bytes. Three distinct implementations were systematically compared: a conventional
lock-free queue serving as a baseline, a volatile-based approach incorporating memory locality
optimizations, and an atomic-based implementation similarly enhanced by locality techniques.</p>
      <p>The principal findings indicate that while conventional lock-free queues maintain theoretical
non-blocking properties, they demonstrate significant performance variability under high
contention due to inherent cache coherence overhead and frequent synchronization needs.
Conversely, implementations optimized for memory locality – particularly those leveraging volatile
variables – exhibited marked performance improvements under intensive concurrent conditions,
attributed to reduced memory barriers and enhanced cache efficiency. However, atomic-based
queues, despite benefiting from structured memory alignment strategies, introduced additional
overhead linked to atomic synchronization operations, thus occasionally diminishing performance
compared to their volatile counterparts. These experimental outcomes underscore the critical role
of memory locality in lock-free queue optimization and highlight the delicate balance required
between cache optimization and synchronization overhead.</p>
      <p>Additionally, the study explored broader aspects relevant to practical deployment, including
expert evaluations identifying queue data structures as optimal for sequential data processing in
high-load systems. Benchmarking and correctness testing methodologies were rigorously defined
and applied, ensuring comprehensive assessment of performance and data integrity under realistic
multi-threaded scenarios.</p>
      <p>In conclusion, memory locality optimizations substantially enhance the scalability and
performance of lock-free queues. Further investigations should focus on developing hybrid
synchronization mechanisms dynamically adaptable to varying levels of contention, thereby
further enhancing scalability and efficiency in real-world, high-load, multi-threaded environments.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors would like to thank the Armed Forces of Ukraine for the opportunity to carry out this
work during the full-scale invasion of the Russian Federation on the territory of Ukraine. The
authors also wish to extend their gratitude to Kharkiv National University of Radio Electronics for
providing licences for the additional software used to prepare the algorithms and the paper.</p>
    </sec>
    <sec id="sec-9">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly Edu and a submodule of Microsoft
365 in order to check grammar and spelling. After using these services, the authors reviewed and
edited the content as needed and take full responsibility for the publication’s content.</p>
      <p>[8] P. Bilokon, B. Gunduz, C++ Design Patterns for Low-latency Applications Including
High-frequency Trading, Arxiv, available at: https://arxiv.org/abs/2309.04259 (last accessed
31.03.2025).
[9] Y. Lin, W. Lin, J. Xu, Y. Chen, Z. Jin, J. Qin, J. He, S. Cai, Y. Zhang, Z. Wang, W. Chen, PARS: A
Pattern-Aware Spatial Data Prefetcher Supporting Multiple Region Sizes. IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems 43/11 (2024) pp. 3638–3649.
doi: 10.1109/TCAD.2024.3442981.
[10] Q. C. Liu, J. Shun, I. Zablotchi, Lock-free Fill-in Queue. in: Proceedings of the 29th ACM
SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, Edinburgh,
UK, 2–5 March 2024, pp. 286–300. doi: 10.5281/zenodo.10253798.
[11] A. Blot, J. Petke, A Comprehensive Survey of Benchmarks for Improvement of Software's
Non-Functional Properties. ACM Computing Surveys 57/7 (2025) pp. 1–36. doi: 10.1145/3711119.
[12] M. Herlihy, N. Shavit, The Art of Multiprocessor Programming, Morgan Kaufmann
Publishers (2021), San Francisco, USA.
[13] Intel, Intel® Xeon® CPU Max Series Configuration and Tuning Guide, IDZ Technical Library,
available at:
https://www.intel.com/content/www/us/en/content-details/769060/intel-xeon-cpu-max-series-configuration-and-tuning-guide.html
(last accessed 31.03.2025).
[14] K. Klenk, M. M. Moayeri, J. Guo, M. P. Clark, R. J. Spiteri, Mitigating synchronization
bottlenecks in high-performance actor-model-based software. in: Workshops of the
International Conference for High Performance Computing, Atlanta, USA, 17–22 November
2024, pp. 1274–1287. doi: 10.1109/SCW63240.2024.00168.
[15] P. Moreno, M. Areias, R. Rocha, On the implementation of memory reclamation methods in a
lock-free hash trie design. Journal of Parallel and Distributed Computing 155 (2021) pp. 1–13.
doi: 10.1016/j.jpdc.2021.04.007.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Khovrat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kobziev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Uzlov</surname>
          </string-name>
          ,
          <article-title>Decision support algorithm in the development of information sensitive socially oriented systems</article-title>
          .
          <source>in: Proceedings of the 4th International Workshop of IT-professionals on Artificial Intelligence</source>
          , Cambridge, USA, 25-27 September
          <year>2024</year>
          , pp.
          <fpage>315</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Honorat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dardaillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Miomandre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-F.</given-names>
            <surname>Nezan</surname>
          </string-name>
          ,
          <article-title>Automated Buffer Sizing of Dataflow Applications in a High-Level Synthesis Workflow</article-title>
          .
          <source>ACM Transactions on Reconfigurable Technology and Systems</source>
          <volume>17</volume>
          /1 (
          <year>2024</year>
          ) pp.
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          . doi: 10.1145/3626103.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pohnl</surname>
          </string-name>
          ,
          <article-title>Shared-Memory-Based Lock-Free Queues: The Key to Fast and Robust Communication on Safety-Critical Edge Devices</article-title>
          .
          <source>in: Proceedings of Cyber-Physical Systems and Internet of Things Week</source>
          , San Antonio, USA, 9-12 May
          <year>2023</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ando</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kadobayashi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Takakura</surname>
          </string-name>
          ,
          <article-title>Clustering Massive Packets using a Lock-Free Algorithm of Tree-Based Reduction on GPGPU</article-title>
          .
          <source>International Journal of Computer Science and Information Security</source>
          <volume>19</volume>
          /3 (
          <year>2021</year>
          ) pp.
          <fpage>19</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Ben-David</surname>
          </string-name>
          ,
          <article-title>Lock-free locks revisited</article-title>
          .
          <source>in: Proceedings of Symposium on Principles and Practice of Parallel Programming, Seoul, Republic of Korea, 2-6 April</source>
          <year>2022</year>
          , pp.
          <fpage>278</fpage>
          -
          <lpage>293</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Assiri</surname>
          </string-name>
          ,
          <article-title>Lock-free Fill-in Queue</article-title>
          .
          <source>in: Proceedings of International Conference on Computer and Information Sciences, Sakaka</source>
          , Saudi Arabia, 13-15 October
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bakhvalov</surname>
          </string-name>
          ,
          <article-title>Performance Analysis and Tuning on Modern CPUs</article-title>
          , Easyperf, available at: https://faculty.cs.niu.edu/~winans/notes/patmc.pdf (last accessed 31.03.2025).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>