Scalable zero-knowledge proofs for verifying cryptographic hashing in blockchain applications⋆ Oleksandr Kuznetsov1,2,3,∗,†, Anton Yezhov4,†, Vladyslav Yusiuk4,† and Kateryna Kuznetsova2,4,† 1 eCampus University, 10 Via Isimbardi, 22060 Novedrate, Italy 2 University of Macerata, 30/32 Via Crescimbeni, 62100 Macerata, Italy 3 V. N. Karazin Kharkiv National University, 4 Svobody sq., 61022 Kharkiv, Ukraine 4 Zpoken OÜ, Kesklinna linnaosa, Sakala tn 7-2, Harju maakond, 10141 Tallinn, Estonia Abstract Zero-knowledge proofs (ZKPs) have emerged as a promising solution to address the scalability challenges in modern blockchain systems. This study proposes a methodology for generating and verifying ZKPs to ensure the computational integrity of cryptographic hashing, specifically focusing on the SHA-256 algorithm. By leveraging the Plonky2 framework, which implements the PLONK protocol with the FRI commitment scheme, we demonstrate the efficiency and scalability of our approach for both random data and real data blocks from the NEAR blockchain. The experimental results show consistent performance across different data sizes and types, with the time required for proof generation and verification remaining within acceptable limits. The generated circuits and proofs maintain manageable sizes, even for real-world data blocks with a large number of transactions. The proposed methodology contributes to the development of secure and trustworthy blockchain systems, where the integrity of computations can be verified without revealing the underlying data. Further research is needed to assess the applicability of the approach to other cryptographic primitives and to evaluate its performance in more complex real-world scenarios. Keywords zero-knowledge proofs, blockchain, scalability, cryptographic hashing 1 1. Introduction tool for addressing the limitations of current blockchain implementations [6, 11]. The advent of blockchain technology has revolutionized The primary objective of this research is to investigate the various industries, offering a decentralized and secure application of zero-knowledge proofs for ensuring the approach to data management and transactions [1]. computational integrity of cryptographic hashing in However, as blockchain networks grow in size and blockchain systems. We aim to develop and evaluate a complexity, scalability has emerged as a critical challenge. methodology for generating and verifying ZKPs using the The increasing number of transactions and users on Plonky2 framework, a state-of-the-art ZKP toolkit. The main blockchain networks has led to slower transaction tasks of this study include: processing times and higher fees, hindering the widespread adoption of this technology [2]. Addressing the scalability 1. Generating ZKPs for cryptographic hashing of issue is crucial for the success and practical applicability of random data using Plonky2. modern blockchain projects [3–5]. 2. Testing the generated ZKPs to assess their Zero-knowledge Proofs (ZKPs) have gained significant correctness and efficiency. attention as a potential solution to the scalability problem 3. Applying the developed methodology to real data in blockchain networks [6, 7]. ZKPs allow one party (the blocks from the NEAR blockchain [12]. prover) to prove to another party (the verifier) that a given 4. Analyzing the performance and scalability of the statement is true without revealing any additional proposed approach for both random and real- information beyond the validity of the statement itself world data. [8, 9]. By enabling the verification of transactions without disclosing sensitive data, ZKPs can significantly reduce the By addressing these tasks, we seek to contribute to the computational burden on blockchain nodes and improve the development of efficient and scalable solutions for ensuring overall efficiency of the network [7, 10]. The integration of the integrity of computations in blockchain systems, ZKPs into blockchain systems has the potential to enhance ultimately supporting the broader adoption of this privacy, security, and scalability, making them a promising transformative technology. CQPC-2024: Classic, Quantum, and Post-Quantum Cryptography, August 0000-0003-2331-6326 (O. Kuznetsov); 0009-0004-6380-5233 6, 2024, Kyiv, Ukraine (A. Yezhov); 0009-0009-9662-9615 (V. Yusiuk); 0000-0002-5605-9293 ∗ Corresponding author. (K. Kuznetsova) † These authors contributed equally. © 2024 Copyright for this paper by its authors. Use permitted under oleksandr.kuznetsov@uniecampus.it (O. Kuznetsov); Creative Commons License Attribution 4.0 International (CC BY 4.0). anton.yezhov@zpoken.io (A. Yezhov); vladyslav.y@zpoken.io (V. Yusiuk); kateryna.k@zpoken.io (K. Kuznetsova) CEUR Workshop ceur-ws.org ISSN 1613-0073 25 Proceedings 2. Background systems. The Plonky2 framework, developed by Polygon Zero, is utilized to generate and verify ZKPs for SHA-256 ZKPs are cryptographic protocols that allow a prover to computations. Plonky2 is a modern ZKP toolkit that convince a verifier of the validity of a statement without implements the PLONK protocol [20] in conjunction with revealing any information beyond the truth of the statement the FRI (Fast Reed-Solomon Interactive Oracle Proofs) [21] itself [13]. The concept of ZKPs was first introduced by commitment scheme. Goldwasser, Micali, and Rackoff in their seminal paper “The The PLONK protocol is a universally updatable Knowledge Complexity of Interactive Proof Systems” [8]. A structured reference string scheme that enables efficient ZKP must satisfy three key properties: proof generation and verification for arbitrary arithmetic circuits [20]. The FRI commitment scheme provides a fast 1. Completeness: If the statement is true, an honest and scalable method for committing to polynomials and prover can convince an honest verifier of its verifying their evaluations [22]. validity. By leveraging the capabilities of the Plonky2 2. Soundness: If the statement is false, no cheating framework, we aim to develop an efficient and scalable prover can convince an honest verifier that it is methodology for generating and verifying ZKPs for SHA- true, except with a small probability. 256 computations, ultimately contributing to the 3. Zero-knowledge: The verifier learns nothing advancement of privacy-preserving and computationally beyond the truth of the statement. efficient blockchain systems. Mathematically, a ZKP for a statement x ∈ L can be represented as an interactive protocol between a prover P 3. Research methodology and a verifier V, where L is an NP language. The prover P In this section, we present a detailed description of the aims to convince the verifier V that x ∈ L without revealing methodology employed for generating and testing ZKPs any additional information. The protocol is described as to ensure the computational integrity of cryptographic follows [14]: hashing. Our approach consists of three main stages:  1, if x  L; generating ZKPs for random data, testing the obtained ( P,V )( x )   proofs, and applying the developed methodology to real 0, otherwise, data blocks from the NEAR blockchain [12]. where (P,V)(x) denotes the output of the interaction between P and V on input x. 3.1. Generating ZKPs for hashing random The construction of ZKPs relies on several data cryptographic primitives and techniques, such as To generate ZKPs, we utilized the Plonky2 framework commitment schemes, challenge-response protocols, and developed by Polygon Zero. Plonky2 implements the hash functions [15]. A common approach to designing ZKPs PLONK protocol in conjunction with FRI as a commitment is the Sigma protocol, which consists of three moves: commitment, challenge, and response [16, 17]. scheme, providing a robust and efficient verification mechanism. The generation process involved the 1. Commitment: The prover sends a commitment to following steps: the verifier, which binds the prover to a specific value without revealing it. 1. Generating random data of various lengths (10, 2. Challenge: The verifier sends a random challenge 100, 1000, 10000 bytes). to the prover. 2. Computing the SHA-256 hash function for the 3. Response: The prover computes a response based generated data. on the commitment, challenge, and private 3. Creating a ZKP to validate the correctness of the information related to the statement being proved. hash computation using Plonky2. 4. Storing the generated proof and the corresponding The verifier then checks the validity of the response and circuit for subsequent verification. accepts or rejects the proof accordingly. The Fiat-Shamir heuristic [18] can be used to convert a Sigma protocol into The experiments for generating ZKPs were conducted a non-interactive ZKP by replacing the verifier’s challenge on a server with an AMD Ryzen 9 7950X 16-Core Processor with a hash of the prover’s commitment and the statement running at 4.7 MHz. For each length of random data (10, 100, being proved. 1000, 10000 bytes), we measured the following parameters: Cryptographic hash functions play a vital role in  The complexity of native verification (computing blockchain systems, ensuring the integrity and the hash function and comparing the result with immutability of data. However, the computation of hash the hash code) in cycles per byte and seconds. functions can be time-consuming, especially for large datasets. ZKPs can be employed to prove the correctness of  The complexity of circuit generation in cycles per hash computations without revealing the input data, byte and seconds. thereby reducing the computational burden on blockchain  The complexity of proof generation in cycles per nodes [19]. byte and seconds. In this study, we focus on the application of ZKPs to the  The complexity of proof verification in cycles per SHA-256 hash function, which is widely used in blockchain byte and seconds. 26  The size of the generated circuit in gates. random data and real data blocks from the NEAR  The size of the generated proof in bytes. blockchain. The experiments were conducted using the methodology described in the previous section, and the 3.2. Testing the generated ZKPs results provide valuable insights into the efficiency and scalability of the proposed approach. To verify the correctness of the generated ZKPs, we employed the following methodology: 4.1. Results for random data 1. Loading the generated proof and the Table 1 summarizes the results of generating and testing corresponding circuit for each set of random data. ZKPs for random data of various lengths using the Plonky2 2. Verifying the proof using Plonky2 while framework. The table includes the complexity of native measuring the verification complexity in cycles verification, circuit generation, proof generation, and proof per byte and seconds. verification, as well as the sizes of the generated circuits and 3. Comparing the hash code obtained from the proofs. verification with the original hash code computed The results in Table 1 demonstrate the following key for the random data. observations: 3.3. Applying ZKPs to real data blocks from 1. The complexity of native verification, circuit the NEAR blockchain generation, proof generation, and proof verification increases with the length of the To assess the applicability of the developed methodology to random data. However, the increase in complexity real-world data, we utilized blocks from the NEAR is not linear, indicating the scalability of the blockchain of various heights and with different numbers of proposed approach. transactions: 2. The time required for native verification remains negligible (in the order of microseconds) even for  Block #121,114,606 at height 121,114,606, larger data lengths, highlighting the efficiency of containing 52 transactions (5677 bytes) [23]. the native verification process.  Block #121,136,789 at height 121,136,789, 3. The time required for circuit generation and proof containing 78 transactions (5092 bytes) [24]. generation increases with the data length but  Block #121,117,653 at height 121,117,653, remains within acceptable limits (less than 13 containing 102 transactions (4897 bytes) [25]. seconds for 10000 bytes of data).  Block #121,089,333 at height 121,089,333, 4. The time required for proof verification is containing 169 transactions (6262 bytes) [26]. significantly lower than that of proof generation, emphasizing the efficiency of the verification The selected blocks reflect the diversity of real data in process, which is crucial for the practical the NEAR blockchain and allow us to evaluate the application of ZKPs. performance of ZKPs generation and verification in various 5. The sizes of the generated circuits and proofs scenarios. increase with the data length but remain The process of generating and testing ZKPs for the manageable (less than 250 KB for 10,000 bytes of selected NEAR blocks involved the following steps: data), ensuring the feasibility of storing and 1. Obtaining the binary block data from the NEAR transmitting the generated proofs. blockchain using the provided block hashes. These results confirm the efficiency and scalability of 2. Generating ZKPs for each block using Plonky2 the proposed approach for generating and verifying ZKPs while measuring the complexity of the circuit and using the Plonky2 framework for random data of various proof generation. lengths. 3. Verifying the generated proofs while measuring To illustrate the relationship between the complexity of the verification complexity. proof verification and the length of random data, we present 4. Comparing the obtained results with the results Fig. 1, which shows the proof verification time as a function for random data to assess the applicability and of the input data size. scalability of the proposed approach. By following this structured methodology, we aim to thoroughly evaluate the efficiency and practicality of generating and verifying ZKPs using Plonky2 for both random data and real data blocks from the NEAR blockchain [27]. The results of these experiments will be presented and discussed in the following section. 4. Results and analysis In this section, we present and analyze the results Figure 1: Proof verification time as a function of random obtained from generating and testing ZKPs for both data length 27 As evident from Fig. 1, the proof verification time remains Plonky2 framework and the underlying cryptographic consistently low, even for larger data sizes, with a primitives, such as the PLONK protocol and the FRI verification time of approximately 0.0044 seconds for commitment scheme. The efficient arithmetic circuit random data of length 10,000 bytes. This observation representation and the optimized proof construction in highlights the efficiency of the verification process and its Plonky2 contribute to the fast verification times, even for potential for scalability in real-world applications. larger datasets. The linear relationship between the verification time and the data length can be attributed to the design of the Table 1 Results of generating and testing ZKPs for random data of various lengths Random Data Length (bytes) 10 100 1,000 10,000 Native Verification Complexity 196 250 1,752 17,022 (cycles/byte) Native Verification Complexity 0.00000004 0.00000005 0.0000003 0.000003 (seconds) Circuit Generation Complexity 197,896,186 432,109,936 5,653,509,584 58,641,652,143 (cycles/byte) Circuit Generation Complexity 0.04 0.09 1.23 12.70 (seconds) Proof Generation Complexity 255,670,545 465,505,001 3,730,111,317 38,720,856,965 (cycles/byte) Proof Generation Complexity 0.05 0.10 0.82 8.58 (seconds) Proof Verification Complexity 11,826,688 12,459,491 15,544,539 19,610,437 (cycles/byte) Proof Verification Complexity 0.0028 0.0029 0.0037 0.0044 (seconds) Circuit Size (gates) 1,419 2,842 22,739 223,148 Proof Size (bytes) 121,752 127,256 152,756 180,112 Table 2 Results of generating and testing ZKPs for real data blocks from the NEAR blockchain Block Height 121,114,606 121,136,789 121,117,653 121,089,333 Number of transactions 52 78 102 169 Block bytes 5,677 5,092 4,897 6,262 Native Verification Complexity 9,368 8,424 8,366 10,318 (cycles/byte) Native Verification Complexity 0.000001 0.000001 0.000001 0.000002 (seconds) Circuit Generation Complexity 27,010,322,753 26,380,158,107 26,791,174,445 56,830,642,601 (cycles/byte) Circuit Generation Complexity 5.87 5.71 5.80 12.18 (seconds) Proof Generation Complexity 18,633,537,207 19,172,519,712 18,015,459,404 38,191,422,267 (cycles/byte) Proof Generation Complexity 4.18 4.10 4.03 8.32 (seconds) Proof Verification Complexity 17,173,339 17,197,238 17,034,388 19,024,157 (cycles/byte) Proof Verification Complexity 0.004 0.004 0.004 0.004 (seconds) Circuit Size (gates) 126,498 113,704 109,442 139,289 Proof Size (bytes) 165,684 165,684 165,684 180,112 4.2. Results for real data blocks from the required for native verification, circuit generation, proof NEAR blockchain generation, and proof verification. The results in Table 2 lead to the following To assess the applicability of the developed observations: methodology to real-world scenarios, we generated and tested ZKPs for data blocks from the NEAR blockchain. 1. The complexity of native verification for real Table 2 presents the results obtained for the selected data blocks is comparable to that of random blocks, including the block height, the number of data of similar sizes, confirming the consistency transactions, the block size, and the complexity and time of the native verification process. 28 2. The time required for circuit generation and including blockchain systems. The ability to generate and proof generation for real data blocks is also verify ZKPs efficiently and scalably can contribute to the comparable to that of random data, development of more secure and trustworthy systems, demonstrating the applicability of the where the integrity of computations can be verified proposed approach to real-world scenarios. without revealing the underlying data. 3. The time required for proof verification However, it is important to note that the current remains consistently low (around 0.004 study focuses on the specific case of cryptographic seconds) for all the tested real data blocks, hashing using the SHA-256 algorithm. Further research regardless of the number of transactions or is needed to assess the applicability of the proposed block size, highlighting the efficiency of the approach to other cryptographic primitives and to verification process. evaluate its performance in more complex real-world 4. The sizes of the generated circuits and proofs scenarios. for real data blocks are similar to those of In conclusion, the experimental results presented in random data, indicating the feasibility of this section provide strong evidence for the efficiency storing and transmitting the proofs in real- and scalability of the proposed methodology for world applications. generating and verifying ZKPs using the Plonky2 framework. The approach demonstrates consistent These results validate the applicability and performance for both random and real data, highlighting scalability of the proposed methodology for generating its potential for practical applications in ensuring the and verifying ZKPs using Plonky2 for real data blocks computational integrity of cryptographic hashing, from the NEAR blockchain. particularly in the context of blockchain systems. The consistency in performance between random and real data suggests that the approach can be 6. Conclusions effectively utilized in practical scenarios, such as ensuring the computational integrity of cryptographic In this study, we proposed and evaluated a methodology hashing in blockchain applications. for generating and verifying ZKPs to ensure the computational integrity of cryptographic hashing in 5. Discussion blockchain systems. By leveraging the Plonky2 framework, we demonstrated the efficiency and The experimental results presented in this section scalability of our approach for both random data and demonstrate the efficiency and scalability of the real data blocks from the NEAR blockchain. proposed approach for generating and verifying ZKPs The experimental results showed that the proposed using the Plonky2 framework. The methodology methodology achieves consistent performance across exhibits consistent performance for both random data different data sizes and types, with the time required for and real data blocks from the NEAR blockchain, proof generation and verification remaining within highlighting its potential for practical applications. acceptable limits. The complexity of native verification, The complexity of native verification, circuit circuit generation, proof generation, and proof generation, proof generation, and proof verification verification scales well with increasing data lengths, scales well with increasing data lengths, ensuring the indicating the feasibility of applying the approach to feasibility of applying the approach to larger datasets. larger datasets. The time required for proof verification remains Moreover, the sizes of the generated circuits and consistently low, even for real data blocks with a large proofs remain manageable, even for real-world data number of transactions, emphasizing the efficiency of blocks with a large number of transactions. This is the verification process, which is crucial for the practical particularly important for blockchain applications, adoption of ZKPs. where the storage and transmission of proofs should not Moreover, the sizes of the generated circuits and introduce significant overhead. proofs remain manageable, even for larger data lengths The consistency in performance between random and real data blocks, indicating the feasibility of storing and real data suggests that the proposed methodology and transmitting the proofs in real-world scenarios. This can be effectively applied to ensure the computational is particularly important for blockchain applications, integrity of cryptographic hashing in various blockchain where the storage and transmission of proofs should not systems. The ability to generate and verify ZKPs introduce significant overhead. efficiently and scalably contributes to the development The consistency in performance between random and of more secure and trustworthy systems, where the real data suggests that the proposed methodology can be integrity of computations can be verified without effectively applied to ensure the computational integrity revealing the underlying data. of cryptographic hashing in various applications, 29 Further research is needed to assess the applicability of Netw. 8 (2022) 604–613. doi: 10.1016/j.dcan. the proposed approach to other cryptographic 2022.04.017. primitives and to evaluate its performance in more [12] NEAR | Blockchains, Abstracted. URL: complex real-world scenarios. Nonetheless, the results https://near.org/. presented in this study provide a solid foundation for the [13] O. Goldreich, S. Micali, A. Wigderson, Proofs That development of efficient and scalable solutions for Yield Nothing but Their Validity or All Languages ensuring the integrity of computations in blockchain in NP Have Zero-Knowledge Proof Systems, J. systems, ultimately supporting the broader adoption of ACM 38 (1991) 690–728. doi: 10.1145/11682 this transformative technology. 5.116852. [14] O. Goldreich, Y. Oren, Definitions and Properties References of Zero-Knowledge Proof Systems, J. Cryptology 7 (1994) 1–32. doi: 10.1007/BF00195207. [1] S. Nakamoto, Bitcoin: A Peer-to-Peer Electronic [15] U. Feige, A. Fiat, A. Shamir, Zero-Knowledge Cash System, (2008). Proofs of Identity, J. Cryptology 1 (1988) 77–94. [2] G. Kaur, C. Gandhi, Chapter 15. Scalability in doi: 10.1007/BF02351717. Blockchain: Challenges and Solutions, Handbook [16] I. Damgård, J. B. Nielsen, CPT Course Home Page of Research on Blockchain Technology, Academic (2010). URL: https://cs.au.dk/~ivan/CPT.html Press (2020) 373–406. doi: 10.1016/B978-0-12- [17] C. Bartoli, I. Cascudo, On Sigma-Protocols and 819816-2.00015-0. (packed) Black-Box Secret Sharing Schemes (2023). [3] V. Zhebka, et al., Methodology for Choosing a URL: https://eprint.iacr.org/2023/1652 Consensus Algorithm for Blockchain Technology, [18] A. Fiat, A. Shamir, How to Prove Yourself: in: Digital Economy Concepts and Technologies, Practical Solutions to Identification and Signature vol. 3665 (2024) 106–113. Problems, Advances in Cryptology-CRYPTO ‘86, [4] B. Bebeshko, et al., Application of Game Theory, Springer-Verlag (1987) 186–194. Fuzzy Logic and Neural Networks for Assessing [19] J. Eberhardt, S. Tai, ZoKrates - Scalable Privacy- Risks and Forecasting Rates of Digital Currency, J. Preserving Off-Chain Computations, IEEE Theor. Appl. Inf. Technol. 100(24) (2022) 7390– International Conference on Internet of Things 7404. (iThings) and IEEE Green Computing and [5] V. Zhebka, et al., Methodology for Predicting Communications (GreenCom) and IEEE Cyber, Failures in a Smart Home based on Machine Physical and Social Computing (CPSCom) and Learning Methods, in: Cybersecurity Providing in IEEE Smart Data (SmartData) (2018) 1084–1091. Information and Telecommunication Systems, vol. doi: 10.1109/Cybermatics_2018.2018. 00199. 3654 (2024) 322-332. [20] A. Gabizon, Z. J. Williamson, O.-M. Ciobotaru, [6] X. Yang, W. Li, A Zero-Knowledge-Proof-Based PLONK: Permutations Over Lagrange-bases for Digital Identity Management Scheme in Oecumenical Nonin-teractive Arguments of Blockchain, Comput. Secur. 99 (2020). doi: Knowledge, IACR Cryptol, ePrint Arch. (2019). 10.1016/j.cose.2020.1020 50. [21] E. Ben-Sasson, et al., Fast Reed-Solomon [7] A. Emami, et al., A Scalable Decentralized Privacy- Interactive Oracle Proofs of Proximity (2018). doi: Preserving E-Voting System Based on Zero- 10.4230/Lipics.Icalp.201 8.14. Knowledge Off-Chain Computations, J. Inf. Secur. [22] E. Ben-Sasson, et al., Scalable Zero Knowledge Appl. 79 (2023). doi: 10.1016/j.jisa.2023.103645. with No Trusted Setup, Advances in Cryptology – [8] S. Goldwasser, S. Micali, C. Rackoff, The CRYPTO 2019, Springer International Publishing, Knowledge Complexity of Interactive Proof- Cham (2019) 701–732. doi: 10.1007/978-3-030- Systems, Seventeenth Annual ACM Symposium 26954-8_23. on Theory of Computing, Association for [23] Near Block DnGLLWt6Q4MKv65uLLc2u Computing Machinery (1985) 291–304. doi: AB81eRbvS944f5Jkh2FF5US | Near Blocks. URL: 10.1145/22145. 22178. https://nearblocks.io/blocks/DnGLLWt6Q4MKv65 [9] E. Ben-Sasson, et al., Succinct {Non-Interactive} uLLc2uAB81eRbvS944f5Jkh2FF5US Zero Knowledge for a von Neumann Architecture, [24] Near Block CHNB17HdYWDbapLq5tv3y2Wwv 23rd USENIX Security Symposium (2014) 781–796. 755LUT4LttrHn6KtwHD | Near Blocks. URL: [10] M. Loporchio, et al., A Survey of Set Accumulators https://nearblocks.io/blocks/CHNB17HdYWDbap for Blockchain Systems, Comput. Sci. Rev. 49 Lq5tv3y2Wwv755LUT4LttrHn6KtwHD (2023) 100570. doi: 10.1016/j.cosrev.2023.100570. [25] Near Block 5qD3eZtUrkheHKEGhQw3oa [11] Y. Huang, et al., Blockchain-Based Continuous rPHsdjiAmWNASeZV9W1r5s | Near Blocks. URL: Data Integrity Checking Protocol with Zero- Knowledge Privacy Protection, Digital Commun. 30 https://nearblocks.io/blocks/5qD3eZtUrkheHKEG hQw3oarPHsdjiAmWNASeZV9W1r5s [26] Near Block 4oMRqMRD1P6wPtnkPURNpa6s nxUvMFMyDZCv7uSq53FX | Near Blocks. URL: https://nearblocks.io/blocks/4oMRqMRD1P6wPtn kPURNpa6snxUvMFMyDZCv7uSq53FX [27] K. Kuznetsova, et al., Solving Blockchain Scalability Problem Using ZK-SNARK, Advances in Artificial Systems for Logistics Engineering III, Springer Nature Switzerland, Cham (2023) 360– 371. doi: 10.1007/978-3-031-36115-9_33. 31