=Paper=
{{Paper
|id=Vol-3041/494-497-paper-91
|storemode=property
|title=Concept of Peer-To-Peer Caching Database for Transaction History Storage as an Alternative to Blockchain in Digital Economy
|pdfUrl=https://ceur-ws.org/Vol-3041/494-497-paper-91.pdf
|volume=Vol-3041
|authors=Mikhail Belov,Stanislav Grishko,Eugenia Cheremisina,Nadezhda Tokareva
}}
==Concept of Peer-To-Peer Caching Database for Transaction History Storage as an Alternative to Blockchain in Digital Economy==
Proceedings of the 9th International Conference "Distributed Computing and Grid Technologies in Science and Education" (GRID'2021), Dubna, Russia, July 5-9, 2021

M.A. Belov, S.I. Grishko, E.N. Cheremisina, N.A. Tokareva

System Analysis and Control Department, Dubna State University, Universitetskaya 19, 141980, Dubna, Russia

E-mail: belov@uni-dubna.ru

This paper discusses the concept of a distributed, horizontally scalable and cascadable peer-to-peer caching database optimized for the needs of the digital economy. It is suitable for storing the history of a large number of transactions for every citizen involved in business processes based on digital technologies, from receiving public and social services in electronic form to consuming electronic goods and services produced by e-business and e-commerce. We also offer an approach to organizing student teamwork on the development of the solution at Dubna State University, based on our innovative data center project «Virtual Computer Lab».

Keywords: database, NoSQL, peer-to-peer, blockchain, blockchain issues, smart contracts, digital economy, distributed computing systems, data management, virtual computer lab.

Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1. Introduction===
The development of the digital economy implies storing the history of a large number of transactions for every citizen involved in business processes based on digital technologies, from receiving public and social services in electronic form to consuming electronic goods and services produced by e-business and e-commerce.

===2. Blockchain issues===
Today, there is the idea of blockchain: a continuous, sequential chain of blocks containing information (transaction history), built according to certain rules, where blocks are stored and processed on many different computers (computing devices).

At certain stages of its development, however, any blockchain is subject to the so-called "51% attack". The idea of total decentralization embedded in blockchain means that a blockchain can be trusted only once an exceptionally large number of participants form a "controlling stake" of generating capacity, raising the blockchain complexity to a level at which attackers, who could undermine trust by introducing a fake chain of blocks, cannot realistically gain more computing power than all the other participants combined. If attackers with superior computing power manage to create a persistent chain of blocks (usually at least 6) and those blocks are replicated across all of the participants' personal computers, where block removal is not envisioned by the idea of blockchain itself, then the blockchain will be discredited [1].

The second problem with blockchain is that, in general, the client needs to store on his or her device all the blocks of blockchain data, which accumulate like an avalanche over time with no way to delete them (this is the original technological "highlight" of blockchain). Most of this data is of no interest to any given blockchain participant, and only the manufacturers of computers, telecommunications equipment and mobile devices benefit.
This poses an additional social problem. If the digital economy is to involve all segments of the population, all Russian citizens will have to be provided with expensive computer hardware and smartphones, which can be damaged or lost in use and will then require immediate replacement, because a person without an electronic gadget cannot be a full-fledged participant in the digital economy. The problems and difficulties of providing citizens with electronic devices can lead to a sharp increase in social tension and/or social inequality. Given the above arguments, the idea of adopting blockchain in healthcare, postal, transportation and other socially important services seems utopian today.

The third problem with blockchain is the lack of a built-in technology for fast search of transactions within a block of data. This runs counter to the principles of the digital economy, whose paradigm relies on quick access to relevant data and transaction history as part of the business processes in which the user is involved.

In addition, it should be noted that smart contracts are only a paradigm, one that requires a decentralized data repository and in which all actions are represented as mathematical rules.

===2. Background===
If we look carefully at the data structure within a digital economy system, we can see that transactions are grouped by a natural unique identifier of a citizen. This allows blocks to be distributed efficiently, based on the hash of that identifier, across the node-segments of a scalable peer-to-peer NoSQL DBMS, which eliminates "hotspots". The data itself can easily be represented as "key-value" tuples and stored in a columnar structure. Search is fast because the gossip protocol allows a request to be redirected to the node whose responsibility range includes the hash of a specific unique identifier.
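
The hash-based distribution described above can be illustrated with a minimal sketch of a hash ring. This is not the authors' implementation: the SHA-256 hash, the number of virtual positions per node, and the names `token`, `HashRing`, `node_for`, `node-a` and `citizen-0001` are all assumptions made for the example.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on a 2**64 ring (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical ring: each node owns the token ranges ending at its positions."""

    def __init__(self, nodes, vnodes=64):
        # Several virtual positions per node spread the load and limit hotspots.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, citizen_id: str) -> str:
        """Return the node whose responsibility range includes hash(citizen_id)."""
        # First position at or after the key's token; wrap around past the end.
        idx = bisect.bisect_left(self._tokens, token(citizen_id)) % len(self._ring)
        return self._ring[idx][1]


# Illustrative usage with made-up node names and a made-up identifier.
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("citizen-0001")
```

Any node that knows the ring layout (for example, after learning it through gossip) can compute `node_for` locally and forward a request to the owner, so all transactions of one citizen end up on the same node.
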
Since we are talking about storing transaction history within the business processes of the digital economy, the key-value relationship is essentially a one-to-many relationship, in the context of a design focused on personalized information output for each specific user. Communication between users within groups and communities (a many-to-many relationship) is also possible. It can be implemented by means of secondary indexes, materialized views, or partial data redundancy through denormalization, depending on the cardinality of the data set, to ensure acceptable performance of select queries.

The task of quickly obtaining summary or aggregated statistical information is not difficult to solve: the necessary data can be loaded into a YARN cluster of the open technology platform Apache Hadoop, for example into the in-memory processing environment Apache Spark, applying the principle of Resilient Distributed Datasets and the basic concepts of building a pipeline of map, shuffle, sort and reduce operations within the framework of functional programming.

However, the relative simplicity of horizontally scaling disk space, processing power and RAM does not provide transactional scaling, because simultaneous access by many users to the central database nodes would make the bandwidth of the data network a bottleneck. Therefore, we need a peer-to-peer caching database that stores all data relevant to a particular user on their device and on the closest peer-to-peer servers, chosen by proximity criteria according to a given set of features and attributes.

At an empirical level, from the perspective of participants in the digital economy, the task is to store a set of facts.
Facts in a database are immutable: once stored, they do not change. However, old facts may be superseded by new facts over time or due to circumstances. The state of the database is the value determined by the set of facts in effect at a given point in time.

This analysis allows us to move on to a more detailed consideration of the architecture of the proposed peer-to-peer caching database [2-4].

===3. Solution Architecture===
A peer-to-peer client library (a peer-to-peer access library) is embedded into the client application. It allows the application to get data from the peer-to-peer servers, to cache data on the client device (reducing the load on the peer-to-peer servers) while preserving such an important property as "final immutability", and to exchange peer-to-peer server lists with other clients.

The peer-to-peer server provides data access by caching the segments of the central database demanded by the connecting clients. Connection to a specific group (farm) of peer-to-peer servers is determined by specified criteria, which can be geolocation data, type of users, type of processes, type of transactions, etc. Peer-to-peer servers can exchange data segments with each other (peer-to-peer communications) and store as many data segments as the storage system quotas and limitations allow. In certain cases, a client application may also act as a peer-to-peer server, but this creates a threat to data integrity and validity from fake peer-to-peer servers created by hackers to discredit the network.
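
The fact model and the cascaded read path described above (client cache, then peer-to-peer server, then central database) can be sketched as follows. This is a simplified illustration under stated assumptions, not the proposed system: the names `Fact`, `FactStore`, `tx` and `upstream_reads` are hypothetical, the sample data is made up, and cache invalidation, server selection and authentication of peers are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Fact:
    """An immutable record: frozen=True forbids changing fields after creation."""
    citizen_id: str
    attribute: str
    value: str
    tx: int  # hypothetical monotonically increasing transaction number


class FactStore:
    """Hypothetical append-only tier; new facts supersede old ones, never overwrite."""

    def __init__(self, upstream: Optional["FactStore"] = None):
        self._facts: Dict[str, Tuple[Fact, ...]] = {}
        self._upstream = upstream  # next tier: peer-to-peer server or central database
        self.upstream_reads = 0    # counts cache misses, for illustration only

    def append(self, fact: Fact) -> None:
        self._facts[fact.citizen_id] = self._facts.get(fact.citizen_id, ()) + (fact,)

    def history(self, citizen_id: str) -> Tuple[Fact, ...]:
        """Read-through: serve locally if cached, else fetch upstream and keep a copy."""
        if citizen_id not in self._facts and self._upstream is not None:
            self.upstream_reads += 1
            self._facts[citizen_id] = self._upstream.history(citizen_id)
        return self._facts.get(citizen_id, ())

    def state(self, citizen_id: str, as_of: int) -> Dict[str, str]:
        """State = facts in effect at transaction `as_of` (latest value per attribute)."""
        current: Dict[str, str] = {}
        for fact in sorted(self.history(citizen_id), key=lambda f: f.tx):
            if fact.tx <= as_of:
                current[fact.attribute] = fact.value
        return current


# Illustrative three-tier setup with made-up data.
central = FactStore()
central.append(Fact("citizen-0001", "address", "Dubna", tx=1))
central.append(Fact("citizen-0001", "address", "Moscow", tx=2))

peer = FactStore(upstream=central)   # peer-to-peer server tier
client = FactStore(upstream=peer)    # cache on the client device

first = client.state("citizen-0001", as_of=1)
latest = client.state("citizen-0001", as_of=2)
```

Because facts never change once written, a cached history can be reused safely for any earlier point in time. In this sketch the second `state` call is answered from the client cache without touching the peer-to-peer server. A real design would still need a way to learn about facts appended after the copy was taken.
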
Records in the central database (and, if developers wish, in parallel on the peer-to-peer servers) can be made by means of transactors, which accept write transactions and process them serially. Integrity is guaranteed until successful synchronization with the central database by the replication factor of the distributed network file system (an odd number of servers greater than 3 (three) is recommended to ensure a write quorum), for which open technology solutions based on Apache Hadoop HDFS or Apache Cassandra can be selected as the basis. However, HDFS fault tolerance will require additional components such as ZooKeeper, the ZooKeeper Failover Controller and the Quorum Journal Manager.

Access to the transactor is recommended as part of a service-oriented architecture, through REST services that can be scaled with the standard load-balancing technologies used in web server deployments. This approach provides access to the transactor over the usual HTTP protocol, while the transactors themselves and the centralized database reside in an isolated network, access to which should be routed using modern encryption technologies. Hacker attacks over HTTP can be prevented by modern IPS systems that combine signature and heuristic approaches to detecting malicious activity. Access to the central data repository can easily be organized according to the same principles as access to the transactor. The proposed approach makes it possible to implement tiered isolation of the central database and cascading of network traffic using peer-to-peer server farms and a service-oriented architecture.

===4. Environment for teamwork and collaboration===
To give our students the opportunity to develop the peer-to-peer caching database solution, we replaced the physical computers with virtual machines in the Virtual Computer Lab. The Virtual Computer Lab provides a set of software and hardware-based virtualization and containerization tools that enable flexible, on-demand provision and use of computing resources in the form of cloud Internet services. It includes an integrated knowledge management system built on the principles of self-organization and functions as a homogeneous environment with elements of cognitive representation of internal operational resources, based on visual models, and with partial automation of fundamental technological operations by an expert system. It is intended for carrying out research projects, resource-intensive computations and tasks related to the development of sophisticated corporate and other distributed information systems.

The self-organization of the Virtual Computer Lab makes possible the transition from a complex system of granular group security policies with many restrictions to the formation of personal responsibility and respect for colleagues, which should be a solid foundation for strengthening and developing classical cultural values in the educational environment [5–8].

===5. Conclusion===
In conclusion, the proposed concept of a distributed, horizontally scalable and cascadable peer-to-peer caching database could become the basis for a modern, efficient technological platform, easy to implement and maintain, for digital economy services in the Russian Federation.

===References===
[1] Imran Bashir, Mastering Blockchain: A deep dive into distributed ledgers, consensus protocols, smart contracts, DApps, cryptocurrencies, Ethereum, and more, 3rd Edition, Packt Publishing (August 31, 2020).
[2] Lori Jo Underhill, Defining the Digital Economy: The Structure of the Digital Economy in Focus, Independently published (February 15, 2019).

[3] Tim Jordan, The Digital Economy, Polity; 1st edition (January 28, 2020).

[4] Maria Wasastjerna, Competition, Data and Privacy in the Digital Economy: Towards a Privacy Dimension in Competition Policy? (International Competition Law), Wolters Kluwer (July 16, 2020).

[5] M.A. Belov, Y.A. Kryukov, M.A. Miheev, P.E. Lupanov, N.A. Tokareva, and E.N. Cheremisina, Sovremennye informatsionnye tekhnologii i IT-obrazovanie 14, 4, 823–832 (2018).

[6] M.A. Belov, Y.A. Krukov, M.A. Mikheev, N.A. Tokareva, and E.N. Cheremisina, CEUR Workshop Proceedings 2267, 207–212 (2018).

[7] E.N. Cheremisina, M.A. Belov, N.A. Tokareva, S.I. Grishko, and A.V. Sorokin, CEUR Workshop Proceedings 2023, 299–302 (2017).

[8] M.A. Belov, E.N. Cheremisina, and S.V. Potemkina, Journal of Emerging research and solutions in ICT 1, 2, 39–46 (2016).