Benchmarking AyraDB Next-Generation Database on Davinci-1 Super-Computer

Carlo Cavazzoni², Chiara Francalanci³, Paolo Giacomazzi³, Nicolò Magini², Roberto Morelli² and Paolo Ravanelli¹

¹ Cherrydata srl, Via Abano 9, Milano, 20131, Italy
² Leonardo S.p.A, Torre Finmeccanica, Via Raffaele Pieragostini, Genova, 16149, Italy
³ Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, Milano, 20133, Italy

Abstract
This paper presents the results of a 1-year project aimed at testing the performance and scalability of AyraDB, a next-generation database, on the Davinci-1 super-computer. AyraDB is unique in that it is fully peer-to-peer, with no central node coordinating the other storage nodes and no need for caching data in memory, with consequent linear scalability. Competing solutions, including the most popular databases such as MongoDB and Redis, are not fully peer-to-peer: they have central coordination nodes and/or cache data in memory, which creates a bottleneck to scalability. These unique features make AyraDB particularly suitable to store and retrieve satellite data, which are notoriously challenging due to their size and layered structure, and position it as an enabler of new use cases in the space economy. In this paper, we present the results of a large-scale test aimed at verifying and measuring AyraDB's performance on Davinci-1's HPC infrastructure. An HPC infrastructure is in fact designed for data- and processing-intensive applications and can support scalability without infrastructural bottlenecks limiting software performance. Tests have been performed with a total of 500 runs, on a number of 8-core servers growing from 1 to 20 and a data size growing from 10 GByte to 500 GByte. Results show how AyraDB reaches 1 million requests/s with 13 servers, with linear scalability and consistent read/write performance, beating the state of the art by a factor of 5.

Keywords
Big data, big-data infrastructure, database, horizontal scalability, Davinci-1, AyraDB.

ITADATA2022: The 1st Italian Conference on Big Data and Data Science, September 20–21, 2022, Milan, Italy
EMAIL: carlo.cavazzoni@leonardo.com (A. 1), chiara.francalanci@polimi.it (A. 2), paolo.giacomazzi@polimi.it (A. 3), nicolo.magini@leonardo.com (A. 4), roberto.morelli.ext@leonardo.com (A. 5), paolo.ravanelli@cherry-data.com (A. 6)
ORCID: 0000-0002-0373-8065 (A. 2); 0000-0001-9584-3200 (A. 3)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org).

1. Introduction
Estimates indicate that the growth trend for big data projects is still exponential [3]. There is a clear market need for new and more efficient data management technologies to enable a variety of big data applications. For example, satellite data are notoriously big data and raise several challenges, not only from the perspective of analytic applications (such as statistics or machine learning), but also from the perspective of efficient storage and retrieval (see, for example, [6]). These challenges have led to the design of a new database, called AyraDB, that can be an enabler of big-data applications from both a technical and an economic point of view [7]. The key requirements in the design of AyraDB have been: 1) horizontal scalability, to accommodate growing capacity requirements by adding nodes to the database infrastructure; 2) on-disk data storage with no caching, to work efficiently with all types of data access loads; 3) thrifty hardware consumption through greater performance, to reduce costs and, thus, increase economic scalability.
We believe that, by satisfying these fundamental requirements, AyraDB can be effectively applied to the management of large datasets, providing performance and economic benefits with respect to competing databases.

The goal of the research presented in this paper is to perform a large-scale test of AyraDB to verify and measure AyraDB's performance edge. A large-scale test requires an HPC infrastructure, which is designed for data- and processing-intensive applications and can support scalability without infrastructural bottlenecks limiting software performance. We have executed a large-scale test of AyraDB's performance and scalability on the Davinci-1 super-computer, located in Genova and belonging to Leonardo S.p.A. Davinci-1 is an ideal infrastructure for these tests: it is managed in house with total control of the infrastructure, enabling the design of a test with no bottlenecks and providing dependable performance benchmarks. These tests are key to demonstrating the technical and business readiness of AyraDB, since efficient management of large datasets represents a critical success factor.

Finally, these tests have helped us estimate the energy savings enabled by AyraDB. We have estimated that AyraDB can significantly reduce hardware and, thus, energy consumption. With the tests presented in this paper, we have gained a deeper understanding of the green benefits of AyraDB and verified our estimates. In terms of long-term sustainability, we believe that the test can act as a proof of concept of the technical features of AyraDB and of its suitability for challenging applications. A number of application opportunities could open up, particularly in the space industry, enabled by the mix of performance, scalability, and favourable economics of AyraDB.

The presentation is organized as follows. The next section reviews the state of the art. Section 3 presents the main technical features of AyraDB. Section 4 describes the characteristics of the Davinci-1 super-computer. Section 5 presents the results of our tests. Conclusions are finally drawn in Section 6.

2. State of the art
An extensive survey of benchmarking efforts of big data systems can be found in [8]. Most scientific benchmarking studies focus on: a) comparative analyses of selected market solutions with a limited number of nodes (typically 3), which rarely test the maximum performance of databases or their scalability limits, understandably due to the cost of such large-scale tests; and b) the design of fair and objective benchmarking tools that execute the right set of tests to obtain dependable results. This indicates a general focus of scientific studies on methodological issues and a lack of scientific (particularly open access) studies pushing performance to its limit. Business owners of database solutions and market service providers tend to provide more extensive benchmarks, sometimes executed by (paid) third parties. The clear conflict of interest makes these benchmarks potentially optimistic towards the marketed solution and against competitors. However, for the same reason, we would not expect the marketed solution to perform better than the corresponding benchmarks provided by the owner/service provider. In other words, we can consider these benchmarks as best-case scenarios.
The focus of this paper's benchmarking effort is a key-value read/write load. With this focus, looking at the top performing solutions, we have found that the highest performance reported is approximately 300k operations/s. For example, Couchbase has published benchmarking results showing how Couchbase can reach 300k operations/s with a cluster of 10 servers, with a negligible increase when the number of servers is doubled [9]. According to the same benchmarking effort, MongoDB can reach only 200k operations/s with 20 servers. ScyllaDB's website provides a benchmark showing a maximum per-core throughput of 6.2k operations/s and an overall maximum throughput of (again) 300k operations/s [10]. This throughput halves when write operations are performed and is reduced by a factor of 3 with a load that limits the benefits of in-memory caching. In [11], the authors provide benchmarks for the Riak database, showing that Riak's performance settles around 300k operations/s with an enterprise-significant number of keys in the database.

A thorough market analysis has been conducted as part of the Hippj project (SME Instrument phase 1, [2]). The resulting market position of AyraDB is shown in Figure 1. AyraDB represents the only database that can offer both technical and economic scalability. While in-memory databases have a technical scalability that is comparable with that of AyraDB, their economics are far less favorable. On-disk databases are less expensive, but they fall short in latency, throughput, or both. This indicates that AyraDB involves no trade-off between cost and performance and no limitations on the application scenarios.

Figure 1: Market position of AyraDB

A distinctive feature of AyraDB is that data are stored on disk and do not need to be cached in memory to obtain sub-millisecond response time; a sub-millisecond response time can be reached with commodity SSD hard disks. These features position the economics of AyraDB in the high-performance range. At the same time, costs are reduced by a factor of 20, as memory is far more expensive than disk. A competing solution with a technical and economic scalability comparable to that of AyraDB could not be found in [2].

A best-in-class cost-to-performance ratio is only one dimension of AyraDB's competitiveness. AyraDB has linear scalability: if hardware capacity doubles, AyraDB's performance doubles accordingly. Competing solutions do not scale linearly, as doubling their performance requires more than double the capacity (see, for example, [9]). Above a certain amount of capacity, their performance stops increasing altogether. This means that even if the cloud provider can provide additional capacity, the database cannot exploit it to scale up. In contrast, AyraDB can use additional capacity efficiently, with no upper bounds to scalability.

3. Technical features of AyraDB
The core of AyraDB is a key-value database. Data are stored in tables, on disk, where a table is a collection of records indexed by a key. An AyraDB cluster can have one or multiple servers; in a multi-server cluster all servers have the same role, and there is no centralized server that manages the cluster, that is, the AyraDB cluster is fully peer-to-peer. The records of a table are distributed among all the servers of a cluster, based on the hash of the key that indexes the record. Therefore, all servers statistically store the same number of records.
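As an illustration of this placement principle, the following Python sketch distributes records across M servers by hashing their keys. The actual hash function and placement logic of AyraDB are not disclosed in this paper, so this is a minimal, hypothetical reconstruction:

```python
import hashlib

def server_for_key(key: str, num_servers: int) -> int:
    """Map a record key to a server index by hashing (hypothetical scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer
    # and reduce it modulo the cluster size.
    return int.from_bytes(digest[:8], "big") % num_servers

# With a uniform hash, each of the 13 servers statistically receives
# about 1/13 of the records.
counts = [0] * 13
for i in range(1_000_000):
    counts[server_for_key(f"record-{i}", 13)] += 1
print(counts)  # roughly 1,000,000 / 13 ≈ 76,900 records per server
```

Note that a plain modulo scheme would force large-scale data movement whenever a server is added or removed (cf. the ADD_SERVER/REMOVE_SERVER APIs below); production systems typically use consistent hashing for this reason, although the paper does not specify which approach AyraDB takes.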
AyraDB implements three types of tables:
• fixed_length: this table has a column scheme, that is, the user can configure the number of fields (columns) of the record and the labels of the columns. All records in the same fixed_length table have the same column structure. Moreover, in a fixed_length table the maximum length in bytes of each field is explicitly defined. Fixed_length tables are particularly efficient in use cases such as time series and satellite imagery.
• padded: the difference between padded and fixed_length tables is that in padded tables there is no maximum limit on the length of a field.
• nosql: nosql tables have no column scheme. Each record can have an arbitrary number of fields, with arbitrary labels.

Each table has a configurable replication factor R. With R=1, a record is stored on one server only; with R=2, a record is stored on two servers, and so on. In a cluster with M servers, the maximum replication factor is R=M, meaning that each record is stored on all servers. Each table may have a different replication factor. Replication is used to make the database more reliable: if R=2, the failure of a single server does not prevent us from reaching all records; in general, with replication factor R, the system is fully functional even with the failure of R-1 servers.

AyraDB exposes a set of HTTP-REST APIs, normally wrapped in higher-level APIs in various programming languages such as C, Java, Python, and NodeJS (a usage sketch follows the API lists below). The basic record-based APIs are:
• READ: to read a record (or a set of fields), given the target table and key.
• INSERT/UPDATE: to write/update a record (or a set of fields), given the target table and key.
• DELETE: to delete a record, given the target table and key. In nosql tables, the DELETE API can also be used to remove a field from a record.

The table-based APIs are:
• CREATE_TABLE: to create a new table.
• DELETE_TABLE: to completely remove a table.
• TRUNCATE_TABLE: to delete all the records of a table.
• RESTRUCTURE_TABLE: to modify some features of a table, such as the replication factor or the column scheme (for fixed_length and padded tables).

The server-based APIs are:
• ADD_SERVER: to add a server to a cluster.
• REMOVE_SERVER: to remove a server from a cluster.
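The following Python sketch illustrates how the record-based APIs might be invoked over HTTP-REST. The endpoint paths, parameter names, port, and response format shown here are hypothetical, since the paper does not document the wire protocol; in practice one would use the official client wrappers mentioned above:

```python
import requests

AYRADB_ENDPOINT = "http://ayradb-node:10021"  # hypothetical node address and port

def read_record(table: str, key: str, fields=None):
    """READ: fetch a record (or selected fields) by table and key."""
    params = {"table": table, "key": key}
    if fields:
        params["fields"] = ",".join(fields)  # hypothetical field-selection syntax
    response = requests.get(f"{AYRADB_ENDPOINT}/record", params=params)
    response.raise_for_status()
    return response.json()

def upsert_record(table: str, key: str, fields: dict):
    """INSERT/UPDATE: write or update a record (or a set of fields)."""
    response = requests.put(
        f"{AYRADB_ENDPOINT}/record",
        params={"table": table, "key": key},
        json=fields,
    )
    response.raise_for_status()

# Example: one 100-byte field in a fixed_length table, as in the benchmarks below.
upsert_record("benchmark_table", "record-42", {"field0": "x" * 100})
print(read_record("benchmark_table", "record-42", fields=["field0"]))
```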
Replication is managed at record level. With a replication factor equal to R, each record has R copies on R different servers. If at least one of the servers storing a record is active, a read query for that record is successful; hence, with a replication factor of R, the AyraDB cluster can operate transparently and consistently with up to R-1 failed servers. Consistency is guaranteed through a synchronization operation that is performed automatically when the failed servers become available again. AyraDB provides automatic synchronization of data when one or multiple servers fail or are shut down for maintenance reasons: write/delete operations that would be performed on unavailable servers are logged persistently on disk and, when the server becomes available again, the logged operations are replayed in the order in which they were logged.

4. Davinci-1 super-computer
The benchmarks performed in this work were run on a private cloud infrastructure provisioned through the OpenStack® cloud technology [4]. This framework allows easy deployment of the servers needed for the benchmarks. The computing nodes used to instantiate the servers are part of the high-performance computing cluster Davinci-1, located in Genoa and belonging to the Leonardo company. The HPC cluster is equipped with 56 CPU nodes and 80 GPU nodes. Both InfiniBand and 10 Gbps Ethernet connections are available for node communication, but for the benchmark only Ethernet was used. A detailed schema of the infrastructure is reported in Figure 3. The nodes used for the cloud tenant belong to the CPU family and are equipped with two 24-core CPUs (Intel® Xeon® Platinum 8268 CPU @ 2.90GHz) and 768 GiB of RAM. The OpenStack® layer virtualizes the servers that are used for the scaling of the benchmarks described in this work.

Figure 3: Davinci-1 HPC infrastructure schema.

For benchmark execution, three different kinds of virtual machines are deployed: metaclient, clients, and servers. While the metaclient has no specific hardware requirements, clients and servers need adequate computational resources. Each of these virtual machines was deployed with 8 VCPUs, 64 GiB of RAM, and 200 GiB of disk storage. The block devices used for disk storage on these VMs are provisioned on a Ceph [5] storage cluster equipped with a mixture of solid-state drives and rotational hard disks, configured to replicate the data 3 times across its 4 servers, and interconnected to the compute nodes through the 10 Gbps Ethernet network.

5. Empirical testing
5.1. The AyraDB Benchmarking Backend
The AyraDB Benchmarking Backend (ABB) is a system designed to enable a simple and straightforward benchmarking procedure for AyraDB clusters of arbitrary configuration and size. ABB is designed for simplicity and ease of use: a single command suffices to benchmark any configuration of AyraDB. Figure 4 shows the architecture of the benchmarking system. It is a hierarchical three-level system, where the top level (the benchmarking backend) is the orchestrator of the benchmarking activity. The machine where the benchmarking backend runs is conventionally named metaclient. The benchmarking backend receives as input the specifications of the requested benchmark and implements it by sending commands to the clients, in the client subsystem. The clients, in turn, translate the commands from the metaclient into a workload for the servers of the AyraDB cluster. The system is designed to work with different hardware configurations, including a multi-site distributed infrastructure. In this benchmarking effort, we have focused on a single-site infrastructure, where the network latency is minimal and negligible with respect to the database latency. This facilitates the interpretation of results, by isolating the performance of the database from other infrastructural constraints.

Figure 4: The architecture of the AyraDB Benchmarking Backend

5.1.1. Configuration of a benchmark run
A benchmark run is configured with the following parameters (an example configuration is sketched after the list):
• M: the number of servers (in the AyraDB cluster).
• B (GByte): the size of the table, per server. With one server the size of the table is B GByte, with 2 servers it is 2B GByte, and in general with M servers it is MB GByte.
• L (byte): the size of the table's records. The total number of records of the table is \( MB \times 10^9 / L \).
• C: the number of clients (in the client subsystem). The number of clients is determined with the simple rule of one client for every three servers: with 1, 2, and 3 servers we use 1 client; with 4, 5, and 6 servers we use 2 clients; and so on.
• The type of operation:
a. read: with the read operation, a record (or a selection of fields of the record) is read from the table and sent to the requesting client. Records are selected randomly and uniformly.
b. update: with the update operation, a record (or a selection of fields of the record) is modified in the table, according to the values provided by the requesting client. Records are selected randomly and uniformly.
• The addressed fields of the record (both for read and update).
• N: the number of operations (read/update) per connection per round (see Section 5.1.2).
• P: the pipeline size of read/update operations.
• R: the replication factor.
• T: the table type (fixed_length, padded, nosql).
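As a concrete illustration, the following Python snippet encodes one benchmark configuration (the values match the largest run of Section 5.2) and derives the resulting table and record counts. The parameter names mirror the list above, while the dictionary format itself is only illustrative, as ABB's actual input format is not specified in the paper:

```python
config = {
    "M": 13,             # servers in the AyraDB cluster
    "B": 40,             # GByte of table data per server
    "L": 1500,           # record size in bytes
    "C": (13 + 2) // 3,  # clients: one per three servers -> 5
    "operation": "read",
    "fields": ["field0"],  # one 100-byte field addressed per operation
    "N": 1_000_000,      # operations per connection per round
    "P": 16,             # pipeline size
    "R": 1,              # replication factor
    "T": "fixed_length",
}

total_size_gb = config["M"] * config["B"]                       # 520 GByte
num_records = config["M"] * config["B"] * 10**9 // config["L"]  # M*B*10^9/L
print(total_size_gb, num_records)  # 520, 346_666_666 records
```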
5.1.2. Execution of a benchmark run
A benchmark run has two phases:
• Phase 1: initial table loading. In this phase the clients load the table on the servers, by creating records and loading each record on the AyraDB cluster.
• Phase 2: measurement of the maximum throughput of read/update operations. This phase is executed as a series of rounds, where in the i-th round a total of iM connections from the clients to the AyraDB servers is set up. On each connection, N read or update operations are executed and the throughput of each connection is registered. When all the connections have executed N operations, the total throughput is calculated. As the number of connections per server increases, throughput grows, until the maximum throughput of the AyraDB cluster is reached. When the measured throughput becomes stationary, the benchmark run is stopped and the maximum throughput is stored.

At round i, each server has i connections from the clients. The time required by each connection to complete the requested N operations can vary slightly. Therefore, if the throughputs of the connections were simply added up to calculate the total throughput, the result could have an optimistic bias, because there may be periods in which not all the iM connections are active (some connections finish their job earlier than others) and, with a smaller number of active connections, the throughput of each individual connection is larger.

In order to avoid this possible bias, each connection produces a time series of throughput measurements, formatted as a list of \((t_i, r_i)\) pairs, where \(t\) is the absolute time and \(r\) is throughput. In particular, \(r_i\) is the throughput of the connection in the time interval \([t_i, t_{i+1}]\). At the end of one round, the metaclient receives from the clients iM time series of throughput, \((t_{i,j}, r_{i,j})\), where \(j\) is the connection index, ranging from 0 to iM-1. Since the clients are synchronized with the Network Time Protocol (NTP), the time values in the time series are consistent. The metaclient determines the time interval \([t_s, t_e]\) in which all the iM connections are active. The value of \(t_s\) is easily calculated as
\[ t_s = \max_{j \in \{0, \ldots, iM-1\}} t_{0,j} \]
For the calculation of \(t_e\), the last time value of each series, referred to as \(t_{l,j}\), is considered:
\[ t_e = \min_{j \in \{0, \ldots, iM-1\}} t_{l,j} \]
The metaclient does not consider the throughput statistics outside the time interval \([t_s, t_e]\); therefore, it is guaranteed that the throughput statistics associated with a group of iM connections are calculated on data collected when exactly iM connections are actually active.
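The windowing rule just described can be summarized in a few lines of Python. This is a simplified sketch of the computation, not ABB's actual implementation, assuming each connection reports its throughput series as a list of (timestamp, throughput) pairs:

```python
def aggregate_throughput(series_per_connection):
    """Total cluster throughput over the window in which ALL connections are active.

    series_per_connection: one list per connection of (t, r) pairs, where r is
    the throughput measured by that connection in the interval starting at
    absolute time t (clocks are NTP-synchronized across clients).
    """
    # t_s: the latest first timestamp; t_e: the earliest last timestamp.
    ts = max(series[0][0] for series in series_per_connection)
    te = min(series[-1][0] for series in series_per_connection)

    total = 0.0
    for series in series_per_connection:
        # Keep only the samples inside [ts, te], when all iM connections are active.
        window = [r for t, r in series if ts <= t <= te]
        if window:
            total += sum(window) / len(window)  # average throughput of this connection
    return total

# Two connections, slightly offset in time; the overlap window is [0.5, 2.0]:
conn_a = [(0.0, 900.0), (1.0, 950.0), (2.0, 940.0)]
conn_b = [(0.5, 880.0), (1.5, 930.0), (2.5, 920.0)]
print(aggregate_throughput([conn_a, conn_b]))  # 945.0 + 905.0 = 1850.0
```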
5.2. Benchmarking results
For the benchmarking activity carried out so far we have used, for both clients and servers, the following machines:
• 8 CPUs
• 32 GByte RAM
• 256 GByte SSD
The parameters of the benchmarks are:
• M: the number of servers (in the AyraDB cluster), ranging from 1 to 13.
• B (GByte): the size of the table, per server. We have used 10, 20, 30, and 40 GByte/server.
• L (byte): the size of the table's records. We have used 1500 byte.
• C: the number of clients (in the client subsystem), a function of the number of servers, as specified in Section 5.1.1.
• The type of operation: read/update.
• The addressed fields of the record: one field of 100 byte.
• N: the number of operations (read/update) per connection per round: 1 million.
• P: pipeline size, equal to 16.
• R: replication factor, equal to 1.
• T: table type, fixed_length.

Figure 5 shows the throughput of the read operation as a function of the number of connections, for an AyraDB cluster of 1 server, with table size equal to 20 GByte. The throughput grows steadily as the number of connections increases, until it reaches the system's maximum throughput, where it saturates.

Figure 5: Measured throughput of the read operation as a function of the number of connections, 1 server, table size 20 GByte/server.

The same behavior, on a different scale, is shown in Figure 6, reporting the measured throughput of the read operation as a function of the number of connections for an AyraDB cluster of 13 servers. Qualitatively, the behaviors shown in the two figures are quite similar; however, with 13 servers the throughput exceeds 1 million reads/s.

Figure 6: Measured throughput of the read operation as a function of the number of connections, 13 servers, table size 20 GByte/server.

Figure 7 shows the throughput of the update operation as a function of the number of connections, for an AyraDB cluster of 1 server, with table size equal to 20 GByte. As with read operations, the throughput of update operations grows steadily as the number of connections increases, until it reaches the system's maximum throughput, where it saturates. Similar results are obtained with 13 servers, as shown in Figure 8.

Figure 7: Measured throughput of the update operation as a function of the number of connections, 1 server, table size 20 GByte/server.

Figure 8: Measured throughput of the update operation as a function of the number of connections, 13 servers, table size 20 GByte/server.
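Figures 5 to 8 all show the same qualitative pattern: throughput rises with the number of connections and then saturates. The stopping rule of Section 5.1.2 (halt when the measured throughput becomes stationary) can be sketched as follows; the window size and relative-change threshold are illustrative choices, not ABB's documented values, and run_round stands in for the ABB-side round driver:

```python
import itertools

def is_stationary(round_throughputs, window=3, tolerance=0.02):
    """True when the last `window` per-round totals deviate from their mean
    by less than `tolerance` (relative), i.e., throughput has saturated."""
    if len(round_throughputs) < window:
        return False
    recent = round_throughputs[-window:]
    mean = sum(recent) / window
    if mean == 0:
        return False
    return all(abs(r - mean) / mean < tolerance for r in recent)

# Drive rounds until the measured throughput saturates (run_round is hypothetical):
# throughputs = []
# for i in itertools.count(1):            # round i opens i connections per server
#     throughputs.append(run_round(i))    # returns total throughput of round i
#     if is_stationary(throughputs):
#         break
# max_throughput = max(throughputs)
```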
The scalability of throughput as a function of the cluster size is shown in Figure 9, reporting the maximum throughput as a function of the number of servers. The AyraDB cluster size ranges from 1 to 13 servers, and the table size ranges from 10 GByte/server to 40 GByte/server; therefore, with 13 servers and 40 GByte/server the absolute table size is 520 GByte. The figure includes both read and update benchmarks, for all the considered cluster sizes and table sizes.

Figure 9: Maximum throughput of the read/update operations, 1 to 13 servers, table size equal to 10 GByte/server, 20 GByte/server, 30 GByte/server, and 40 GByte/server.

Figure 9 shows that AyraDB has good linear scalability of throughput as the cluster size grows. Moreover, the throughputs of the read and update operations are quite similar. Finally, the table size does not seem to impact throughput performance significantly.

6. Discussion and conclusions
Results show how AyraDB reaches 1 million operations/s with 13 nodes. As noted before, both InfiniBand and 10 Gbps Ethernet connections are available for node communication in Davinci-1, but for this paper's benchmark only Ethernet was used. Future work will benchmark AyraDB with InfiniBand, hopefully breaking the 1 million operations/s limit. Even so, comparing the current result with the state of the art suggests that AyraDB has linear scalability and a per-core throughput that, to the best of our knowledge, is 5 times greater than the best available benchmark.

Obtaining scalability at a fraction of the cost unlocks business benefits in a number of use cases: it reduces the barriers to entry for SMEs in data-intensive businesses, it enables new business models based on the exploitation of large datasets, it helps ensure the success of independent players in the data market, and it is environmentally sustainable.

In our professional activity, we have found that most companies experience technical scalability issues with their database and struggle to accommodate big data requirements. We have explored these issues and thoroughly analyzed business needs. We have responded to these needs by designing AyraDB, a next-generation database that has proved to be far less expensive and more scalable than the market average.

There is a strong market need for efficient big data technologies. We have repeatedly observed hardware scalability issues with most of our clients (including over 20 large corporations). Scalability issues seem to be so pervasive that we have started looking for a root cause that could be general and common to all our clients. We believe that the slowdown of Moore's law is a root cause of current scalability issues. In the past, growing storage and processing capacity demands were satisfied by continuous hardware innovation that was guaranteed by Moore's law, which states that the ratio of processing capacity to cost doubles every 18 months. Moore's law was formulated in 1965 and proved accurate for several decades. Lately, however, the semiconductor business has slowed down its innovation pace: Intel itself stated in 2015 that the pace of advancement has slowed, while also stating that hyperscaling would continue to guarantee scalability. Hyperscaling is the ability to seamlessly provision and add computing, memory, networking, and storage resources to a given computer or set of computers to meet growing capacity requirements. Hyperscaling is at the basis of cloud computing, which is in fact presented as scalable since computing resources can be added as needed with virtually no limits. However, there is a fundamental difference between Moore's law and hyperscaling.
While the former guarantees that unit capacity costs decrease over time, the latter provides additional capacity by adding new resources at the same unit cost. In other words, hyperscaling can guarantee technical scalability, but it does not guarantee economic scalability and environmental sustainability. With hyperscaling and, thus, cloud computing, if data grow in size, costs grow accordingly. Clearly, this is a threat for many industries, particularly for the space economy, as it creates a growing issue of economic scalability.

With this study, we have taken a first step towards building a scalable infrastructure that can take full advantage of hyperscaling, with linear scalability and no bottlenecks due to in-memory caching. Empirical results show how AyraDB's thrifty hardware consumption through greater performance translates into lower costs and, thus, greater economic scalability. Tests with a replication factor R>1 are ongoing work. Further work is needed to complete the tests by:
- using InfiniBand and an increasing number of servers,
- testing AyraDB with a growing dataset size,
- performing a comparison among the different types of tables supported by AyraDB (fixed_length, padded, and nosql),
- performing extensive AyraDB benchmarking on bare metal, that is, with non-virtualized architectures, in comparison with the virtualized architectures used in the benchmarks performed so far,
- building a "data processing pipeline" for key applicability areas with stringent performance requirements, among the applications relating to the analysis of geographic and satellite data.

7. Acknowledgements
This work was funded by the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 951732. This work expresses the opinions of the authors and not necessarily those of the European Commission. The European Commission is not liable for any use that may be made of the information contained in this work.

8. References
[1] https://scylladb.medium.com/cockroachdb-vs-scylla-benchmark-b0215e81144
[2] Hippj project, https://cordis.europa.eu/project/id/867276
[3] IDC, "Worldwide semiannual big data and analytics spending guide", 2021, https://www.idc.com/getdoc.jsp?containerId=US47485920
[4] OpenStack, https://www.openstack.org/
[5] Ceph, https://ceph.io/
[6] B. E. Boudriki Semlali, C. El Amrani, Satellite Big Data Ingestion for Environmentally Sustainable Development, in: M. Ben Ahmed, S. Mellouli, L. Braganca, B. Anouar Abdelhakim, K. A. Bernadetta (Eds.), Emerging Trends in ICT for Sustainable Development, Advances in Science, Technology & Innovation, Springer, Cham, 2021. https://doi.org/10.1007/978-3-030-53440-0_29
[7] AyraDB, www.ayradb.com
[8] F. Bajaber, S. Sakr, O. Batarfi, A. Altalhi, A. Barnawi, Benchmarking big data systems: A survey, Computer Communications 149 (2020) 241-251. ISSN 0140-3664
[9] Couchbase performance benchmarks, https://www.couchbase.com/benchmarks, 2022
[10] ScyllaDB performance benchmarks, https://www.scylladb.com/product/benchmarks/aws-i2-8xlarge-benchmark/
[11] A. E. Topcu, A. M. Rmis, Analysis and evaluation of the Riak cluster environment in distributed databases, Computer Standards & Interfaces 72 (2020). ISSN 0920-5489. https://doi.org/10.1016/j.csi.2020.103452