<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Blockchains Meet Distributed Hash Tables: Decoupling Validation from State Storage</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Matteo</forename><surname>Bernardini</surname></persName>
							<email>mat.bernardini@stud.uniroma3.it</email>
						</author>
						<author>
							<persName><forename type="first">Diego</forename><surname>Pennino</surname></persName>
							<email>pennino@ing.uniroma3.it</email>
						</author>
						<author>
							<persName><forename type="first">Maurizio</forename><surname>Pizzonia</surname></persName>
							<email>pizzonia@ing.uniroma3.it</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Ingegneria</orgName>
								<orgName type="institution">Università degli Studi Roma Tre</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="institution">Sezione Informatica e Automazione</orgName>
								<address>
									<addrLine>Via della Vasca Navale 79</addrLine>
									<postCode>00146</postCode>
									<settlement>Roma</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Blockchains Meet Distributed Hash Tables: Decoupling Validation from State Storage</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">EA1818FD2EC93A187CCBBA546D96D21F</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T19:54+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Blockchain</term>
					<term>Distributed Hash Table</term>
					<term>Synchronisation Efficiency</term>
					<term>Integrity</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The first obstacle that regular users encounter when setting up a node for a public blockchain is the time taken for downloading all the data needed for the node to start operating correctly. In fact, this may last from hours to weeks for the major networks.</p><p>Our contribution is twofold. Firstly, we show a design that enables mining and validation of new blocks keeping only a very small state. Secondly, we show that it is possible to store the state of the blockchain in a distributed hash table obtaining a wide spectrum of trade-offs between storage committed by the nodes and replication factor. Our proposal is independent from the consensus algorithm adopted, and copes well with transactions that involve smart contracts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Cryptocurrencies based on public blockchains started with the promise for the common people to gain some freedom from institutional and government-regulated payment instruments. Currently, this romantic objective is hardly met. The reasons are many. The adoption of a proof-of-work consensus algorithm is usually regarded as a prominent one due to its big requirements in terms of computational power. The needs are so high that only focused organisations can afford mining (e.g., mining pools). This is one of the reasons that prompt researchers and other communities to focus on the study of lighter consensus algorithms, e.g., based on some form of proof-of-stake. The effort in that direction is remarkable and it is likely that this problem will be practically overcome quite soon.</p><p>Even supposing to have a light consensus algorithm, when installing a node a common user stumbles upon another major difficulty: the time taken to download the data needed for the software to properly operate and, up to a certain extent, the amount of space required for that data. Note that this is true also with today's technology when the user decides to install a validate-only node, which may be useful, for example, to have an independent way of injecting transactions into the network. The time needed for the first synchronisation varies depending on the bandwidth, on the I/O speed of the mass storage and on the CPU speed of the node. This is currently in the order of 6-24 hours for Ethereum but may last weeks for Bitcoin. Further, if the node is shut down for a while, it needs a certain amount of time before becoming fully operational again.</p><p>In this paper, we present a blockchain design with certain notable properties with respect to the ability to quickly start a validating node, which might possibly be a miner. In particular, we show that it is possible to run a blockchain in which validating nodes are only mandated to keep the last n received blocks.</p><p>In our approach, nodes do not store the whole state of the blockchain but only the state changed by transactions in these blocks, while still being able to validate blocks with the same level of security of regular blockchains.</p><p>The data needed to perform validation of transactions are retrieved from untrusted storage by the creator of the transaction and conveyed to the nodes along with it. Security is obtained by equipping candidate transactions with proofs derived from authenticated data structures, similar to those already commonly used in blockchains. Since validation is decoupled from the storage of the bulk of the blockchain data, we are free to store these data where it is more convenient. Our proposal is to store them in a Distributed Hash Table (DHT) realised by the same nodes that perform validations. In principle, a node may choose how much storage to commit for storing blockchain data, from nothing to the whole blockchain dataset. The burden to query the DHT is left to the creator of the transaction. This allows us to obtain a wide spectrum of trade-offs between storage used by nodes and replication factor.</p><p>The rest of this paper is structured as follows. In Section 2 we review the state of the art. In Section 3 we briefly introduce basic concepts. In Section 4 we describe our decoupled approach. In Section 5 we discuss performances and security. In Section 6 we draw the conclusions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">State of the Art</head><p>In 2008 the blockchain technology was introduced as basis for the Bitcoin cryptocurrency <ref type="bibr" target="#b6">[7]</ref>. In that context, blockchain addressed the problem to verify the correct behaviour of untrusted nodes in a peer-to-peer network, with respect to the execution of payment transactions. Following the same approach, many other blockchain-based systems extended the application spectrum and the kind of supported transactions. One notable example is Ethereum <ref type="bibr" target="#b10">[11]</ref>, which supports transactions that can execute general purpose scripts called smart contracts.</p><p>Scalability is one of the prominent topics in research about blockchain. Most of the work is focused on obtaining scalable consensus algorithms in terms of number of nodes and throughput, keeping latency low. A good survey of this area is provided by <ref type="bibr" target="#b1">[2]</ref>. It compares various scalability issues with common blockchains and recent proposals to overcome scalability limits. It concludes suggesting that the sharding technique seems to be the most viable method for scaling the blockchain. Essentially, this technique divides the peer-to-peer network in several smaller networks dividing up the load (see for example <ref type="bibr" target="#b3">[4]</ref>).</p><p>We focus on a very specific practical problem. In current technologies, a validating node is required to store a large amount of data and this is one of the difficulties that hinder the wide adoption of the current blockchain technology. A Bitcoin "full node" needs to store the whole transaction history. Currently, the Bitcoin database of a full node occupies more that 150 GB. In Ethereum, blocks assert not only consensus on a valid set of transactions but also on an explicitly represented state of the system in terms of amount of money associated with addresses and content of persistent variables of smart contracts. For this reason, contrary to Bitcoin nodes, Ethereum nodes can validate and mine new blocks without storing the whole transaction history. Currently, for Ethereum the full history is larger that 1 TB but Ethereum does not mandate to store it for mining or validating blocks. A node storing only the current state needs about 150 GB of free space (using geth fast sync). The time taken to download and validate these data greatly varies depending on bandwidth, cpu speed and mass storage speed. However it may go from several hours to several weeks. This problem is particularly relevant if blockchain is adopted in IoT devices <ref type="bibr" target="#b2">[3]</ref>. The adoption of the sharding technique should mitigate this problem. However, for several reasons, sharding requires a complete re-engineering with respect to the most common blockchains. Further, the impact depends on the size of the shards, which may be affected by other considerations, and on how large the state associated with each shard is.</p><p>Blockchain is also used as a notary service. When notarised data is more than a handful of bytes, the document is usually stored by a different service and just its hash is recorded in the blockchain. One possible approach to keep the decentralised characteristic of the system intact is to store these documents in a peer-to-peer network, like IPFS <ref type="bibr" target="#b0">[1]</ref>. These solutions rely on Distributed Hash Tables (DHTs). A DHT is a key-value pair storing system that is decentralised and distributed and guarantees that any participating node can efficiently retrieve the value associated with a given key using a lookup service (see for example <ref type="bibr" target="#b5">[6]</ref>). In <ref type="bibr" target="#b9">[10]</ref> an authenticated DHT is proposed. In this paper, we adopt a DHT to store the state of the blockchain to relieve a validating node from storing the state associated with the addresses of the blockchain and permit higher flexibility in storage commitment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Background</head><p>In this section, we recall basic concepts, terminology and properties about authenticated data structures (ADS ), Blockchain and Distributed Hash Tables (DHT ), limiting the matter to what is strictly needed to understand the rest of this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Authenticated Data Structures (ADS)</head><p>For this paper, an ADS is a container of key-value pairs, which are also called elements. The ADS deterministically provides a constant-size digest of its content with the same properties of a cryptographic hash. We call it root-hash, denoted by r. If the value of any element of the set changes, r changes. An ADS provides two operations, the authenticated query of a key k and the authenticated update of a key k with a new value v . A query returns the value v and a proof of the result with respect to the current value of r. The update operation on k changes v associated with k into a provided v and changes r in r , as well. An interesting aspect is that a trusted entity that intends to update k can autonomously compute r starting from the proof of k, v obtained by a query.</p><p>A typical ADS is a Merkle Hash Tree in which each leaf stores the hash of k, v and each internal node stores the hash of the composition of the hashes of its children. In this case, a proof for k is constructed by considering the path from the leaf associated with k to the root. The proof contains the sequence of the hashes stored in the siblings of each node in that path labelled with the indication that the path is entering a node from the left or right child. The proof check is performed by computing the hashes on the path starting from the leaf and comparing the result with the root-hash. To update the root-hash, the same computation is performed using the new value when computing the hash of the leaf.</p><p>When we have a large set of elements stored in an ADS, but we only need authentication for a small number of them, known in advance, we can resort to the pruning technique. Pruning reduces the storage size of the tree, without changing the root-hash, by removing sections of the tree that are not needed for the expected queries. The basic idea is very simple. Whenever a subtree has only unneeded leaves, we can remove all the subtree maintaining only its root with its original hash. Pruning an ADS reduces the required space, preserves the root-hash, preserves the capability to produce proofs for the needed keys, and keeps security intact.</p><p>Further details can be found in <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b4">5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Blockchain</head><p>From our point of view, a blockchain is a data structure that stores a state and its evolution over time. The state is a sort of key-value store where keys are called (state) elements. The concept of state element is an abstraction that may be regarded as an address with a corresponding balance (following the Ethereum terminology) or as a variable of a contract account with its value.</p><p>A transaction is an atomic change of a number of involved state elements. A block is essentially a sequence of transactions. Each block is hash-chained with the previous one in a blockchain. In our model, each block is associated with a state before and after the execution of its transactions. Blocks are generated at regular intervals of length T , called block time. Each block is identified by an increasing number: its index. The block with index i is denoted b i . The depth of a block is the number of blocks that were mined after it plus one. The depth of the last mined block is 1. To be valid, a block should conform to a number of consensus rules, which may deal with specific semantic constraints (like accounting constraints or smart contract execution). Even if consensus rules are fundamental in practice, the rest of the paper is largely independent from the specific rules enforced by a blockchain. The consensus algorithm is the way nodes reach an agreement about the next block to be added to the blockchain. The rest of the paper is independent from the specific algorithm adopted by a blockchain. Certain algorithms may temporarily produce forks, that is more chains are grown at the same time for a while, then one of them is chosen (usually the longest one) by all nodes discarding the blocks of the other chains.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Distributed Hash Tables (DHT)</head><p>Distributed Hash Tables are distributed data structures supporting put() and get() primitives on key-value pairs. Most DHT implementations can locate an object in O(log n) network operations, where n is the number of nodes of the DHT, and provide a fault-tolerant way to access large amounts of data. The keyspace is the set of all possible keys. Each node stores a subset of key-value pairs among the keyspace. We say a node N is authority for the key k if it stores its data. Each node also gets assigned an identifier from the keyspace. The DHT defines a distance function between keys. Typically a node N is authority for keys close to its identifier according to that distance.</p><p>In our study, we do not need the put() operation. The get(k) operation returns the value associated with a key k, performing a lookup in the network, to locate a node that is authority for k. For that purpose a suitable routing algorithm is used, with each node storing a routing table based on the distance among node identifiers.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Block Validation and State Storage as Separate Roles</head><p>In our approach, each node has two distinct roles: the storage role and the validation role. For the storage role, each node stores values for a subset of the state elements, essentially acting as a DHT node. For the validation role, each node mines new blocks and validate blocks that are broadcasted in the network.</p><formula xml:id="formula_0">0 0 0 0 0 0 1 1 1 1 1 1 1 0 r ∅ ∅ ∅ ∅ e0 = 000 e1 = 001 e4 = 100 e6 = 110 e7 = 111 e2 = 010 e3 = 011 e5 = 101 v = 0 v = 0 v = 0 v = 0 v = 0 v = 0 v = 0 v = 0 this part is not stored in N1</formula><p>Node N1 is authority for these elements this part is stored in N1</p><p>(a) The pADS stored by N1.</p><formula xml:id="formula_1">0 0 0 0 0 0 1 1 1 1 1 1 1 0 r ∅ ∅ ∅ ∅ e0 = 000 e1 = 001 e4 = 100 e6 = 110 e7 = 111 e2 = 010 e3 = 011 e5 = 101 v = 0 v = 0 v = 0 v = 0 v = 0 v = 0 v = 0 v = 0 this part is not stored in N2 this part is stored in N2</formula><p>Node N2 is authority for these elements this part is not stored N2</p><p>(b) The pADS stored by N2. Contrary to the traditional approach, validation does not rely on local storage of the state. A node may not play the storage role at all. Further, any node can create a new transaction and broadcast it so that it can be included (if valid) in one of the next blocks during mining.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Storage Role of a Node</head><p>For this role, each node acts as a node of a DHT. The whole DHT stores the state of the blockchain (see Section 3.2) and is able to reply to each query for a state element e with its value v.</p><p>Since the storage is considered untrusted by the validation role, we equip the state with an ADS. For simplicity, we assume the ADS to be constructed as a binary prefix tree (see Figure <ref type="figure" target="#fig_0">1</ref>) where we suppose that elements are identified by strings of bits of the same length. Logically the state is represented as a complete binary tree where the leaves are all possible state elements. As for any DHT, in general, not all the state is kept by each node. A node that stores the value of an element e is called authority for e. If node N is not authority for e, we prune e from the ADS stored by N using the technique described in Section 3.1. We call this pruned version a pADS. Figure <ref type="figure" target="#fig_0">1</ref> depicts an example of an ADS, over state elements e 0 , . . . , e 7 , and how two nodes N 1 and N 2 store different pruned versions of it. All state elements are associated to a value. A special value, denoted ∅ in figure, refers to the hash associated with subtrees whose leaves store only null values, like all unused elements. In the example, N 1 is authority for e 0 and e 1 , while N 2 is authority for e 4 and e 5 . The part of the ADS delimited by the solid line is the structure stored by a specific node, which we call pADS, while the part delimited by the dashed line is pruned.</p><p>Each node N stores a pADS for the blockchain state in a certain instant of time. In a synchronous scenario, all pADSes are the pruned version of the same ADS. In a real scenario, this is no longer true because each node receives the updates with a different delay. For this reason, each node N stores a pADS for the blockchain state that is associated with a block b i (i.e., the state before the execution of the transactions in b i ). We call that block the pivot block for N .</p><p>When a new block is received and validated by N , it is added to the local view of the blockchain. At the same time, the pADS is updated by applying changes according to the execution of the transactions of b i , and b i+1 becomes the new pivot, where b i+1 may be the last block in the local view of N of the blockchain or an older block still stored by N (see Section 4.3). Since block propagation in the network takes some time, nodes can have a slightly skewed view of the state. This aspect is taken into account within the validation role.</p><p>A query for element e can be answered by a node N that is authority for e. It returns the value v of e, the proof p obtained by its pADS (see Section 3) and the index of the pivot block for N .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Transaction Creation</head><p>A transaction involves a number of state elements. These are the state elements read and updated by the operations executed in the transaction. Operations may be complex, like in an invocation of a smart contract. However, we constrain the operations to act only on a set of state elements that should be known before the execution of the transaction.</p><p>In our approach, a transaction is similar to one in traditional blockchains. It specifies sender(s), receiver(s), operations, and all involved state elements. As in traditional blockchains, for state elements that contain cryptocurrency balances to be charged, a signature is also provided to prove that the sender of the transaction is also the owner of charged state elements.</p><p>Suppose a node N c intends to create a new transaction t. The set of the state elements involved in t is denoted by E. To know the current value associated with each e ∈ E, N c queries the DHT obtaining, from a node N a authority for e, the tuple δ i (e) = v, p, i , where v is the value for e known to N a , p is the proof obtained from the pADS of N a , and i is the index of the pivot block of N a . When N c broadcasts t, it attaches δ i (e) to t for each e. These additional data are attached without any additional signature, since they will be verified against root-hashes taken from blocks, and will be used during mining by nodes executing the validation role. For the validation role, each node N stores only the last d + f blocks it received (see Figure <ref type="figure" target="#fig_1">2</ref>). We denote Λ N this truncated blockchain of N . At a certain instant of time, the truncated blockchain of distinct nodes may be different, since the propagation of new blocks in the network may take some time. We suppose that each branch of a fork can be at most f blocks long. Hence, the block at depth greater than f cannot be undone by a fork resolution. The pivot block for the storage role for a certain node N is chosen to be b l−f , where l is the index of last block in Λ N .</p><p>We suppose that d is big enough so that the proofs attached with each transaction are computed on states related to blocks that are still in Λ N when N validates the transaction for the mining of a new block. The dimensioning of d should take into account the time taken to create a transaction, comprising the query to the DHT, the propagation delay of the transaction, and the maximum time a transaction has to wait to be included in a block in case of a peak of requests (see also <ref type="bibr">Section 5)</ref>.</p><p>Each block b i contains a number of transactions. In traditional blockchains these are stored in a Merkle Tree whose root-hash is stored in the block. We depict it in this way in Figure <ref type="figure">3</ref>, even though this is not relevant for our approach. The union of all involved state elements in transactions associated with b i is denoted E i . The block b i also contains a pADS τ i representing the state of the blockchain before the application of the transactions in b i , but only elements in E i are explicitly represented in τ i , the rest is pruned. Note that, the root-hash of τ i is uniquely associated with the whole state at a certain time. We also denote by π i a distinct pADS (with the same topology of τ i but different content) obtained applying all transaction in b i to the state represented by τ i . Note that, a node can choose to store π i 's or re-compute them when needed. Since the pivot block is now changed, the pADS for the storage role of N is updated accordingly. Namely, τ l−f −1 is updated with the changes of the execution of transitions of b l−f −1 into a τ . Then, τ is used to update values of leaves or hashes of pruned subtrees in the pADS used for the storage role.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Creation of a block</head><p>To mine a new block, a node N chooses a set of selected transactions to be included in the candidate next block b l+1 , based on some criteria (e.g., prioritising higher fees). We call E l+1 the state elements that are involved in at least one of the selected transactions. For each e ∈ E l+1 , all δ i (e) = v, p, i attached to the selected transactions are checked for their integrity by comparing the root-hash computed from p and v against the roothash of τ i , which should be in Λ N if d is big enough.</p><p>We now compute the pADS τ = τ l+1 to be included into b l+1 representing the state before the execution of the selected transactions. Note that, π l can not be used as τ l+1 since they have a different pruning. We start by creating a τ with the final structure but with no hashes and values. This is done by merging all the proofs attached to the selected transactions considering only topological information and ignoring hashes and values. Then, we iteratively perform the following operations starting from x = l and decrementing x. We consider π x and, only for nodes with unset hash/value in τ , set their hashes/values with the content of the corresponding nodes in π x . We then do the same thing with nodes of proofs in δ x (e) that are attached to selected transactions, again filling only empty hashes/values. We iterate until τ is fully populated. At the end, τ turns out to be equal to τ l+1 . This procedure ends since, in the worst case, τ is populated only by all δ i (e) attached to selected transactions. Its correctness derives from the fact it gives precedence to the most up-to-date hashes/values and up-to-date hashes should take into account non updated hashes/values by the way transactions update the state.</p><p>Each selected transaction t is then validated considering its execution starting from the values of elements as in τ l+1 , according to the consensus rules. If all consensus rules are met, t is actually added to the block. The new block b l+1 contains all the valid transactions t and τ l+1 (pruned from elements only used by invalid transactions). Then N executes the consensus algorithm and, if/when successful, broadcasts the block to the blockchain network.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion</head><p>In this section, we discuss several aspects of a possible realisation of the described approach. We assess our approach on the basis of some parameters taken from the Ethereum network. Currently, in Ethereum, a broadcasted block has an average size of 18 kB and a transaction has an average size of 200 bytes. Supposing to adopt our approach, additional information should be attached to transactions and blocks. In both cases, its size depends on the number of involved state elements.</p><p>In our approach, each broadcasted transaction attaches values and proofs of the involved state elements, relative to a certain pivot block. Since values usually have negligible size, we focus on proofs. For each state element, in our very simple construction of a pADS, a proof has as many elements as the number of bits of the identifier of the state element. For example, if an identifier has 160 bits and a hash is 20 bytes, we obtain for each proof a size of 3200 bytes. However, depending on the design, many state-elements that are involved in a transaction may be close to each other and thus share most of the path to the root. For example, in Ethereum the state elements which represent the "storage" of a contract account are all in the same subtree. A set of proofs can be represented itself as a pADS. Further, the pADS may be a patricia trie compressing long chains. For this reason, we assume an average of 500 bytes for each involved state element. In the Ethereum case, most transactions are calls to smart contracts, generally involving more that two state elements. We assume 5 as an average number of state elements per transaction. With these assumptions, each transaction turns out to be about 2700 bytes, which is one order of magnitude larger than the standard one. We leave a more precise estimation as a future work.</p><p>A broadcasted block b i contains τ i , which is a pADS covering all state elements involved in transactions contained in b i . Hashes of inner nodes of τ i are omitted since they can be computed from their leaves (see Section 3.1). We estimate the size of τ i as the size of the union of all the proofs of involved state elements in b i . In Ethereum, a block has 90 transactions on average. Given the assumptions above, we estimate an average of 450 involved state elements per block. Thus, each block gets an average overhead of 225 kB, again one order of magnitude bigger than the standard block.</p><p>We now estimate the time needed by a node to synchronise. This is the time needed to receive d + f blocks. Usually, in Ethereum, a transaction is considered confirmed after 3 minutes which is about 12 blocks, hence, we set f = 12. Regarding the dimensioning of d, for simplicity, we consider a model in which propagation of broadcast communications in the network takes some time but is synchronised, in the sense that all nodes receive the same data at the same time. In this model, all nodes have the same pivot block. A node N c , to create a transaction θ, queries the DHT to obtain proofs of state elements involved in θ. Suppose that the first query is served at time t 0 with index i. Then, all queries are completed within time t 0 +∆t DHT . Supposing transactions propagate in the network in time ∆t pr , θ arrives to a miner N m at time t 1 = t 0 + ∆t DHT + ∆t pr . For N m to be able to check the proofs attached with θ, b i should still be in Λ Nm at t 1 . This is verified if ∆t DHT + ∆t pr &lt; (d − 1)T , since Λ Nm is updated every block time T . The work described in <ref type="bibr" target="#b7">[8]</ref> reports a look-up time for Kademlia of no more than 30 seconds. To take into account the time to broadcast a new transaction and the time it has to wait in queue, we conservatively set d = 8 (which is equivalent to 2 minutes). Thus, the data to be downloaded to set up a node from scratch is about 4.8 MB, which at the speed of 2 Mb/s takes about 19 seconds. A very high speed up with respect to the time currently taken by an Ethereum node, which is in the order of hours at best. This speed up is not totally for free. Our approach also increases the time taken by the sender node to prepare the transaction, due to queries to be performed on the DHT. We note that those queries, one for each state element involved in the transaction, can be performed in parallel. Further, for use cases in which nodes should perform a large amount of similar transactions, specific caching mechanisms can be designed.</p><p>We note that, a transaction involving a large number of state elements is a considerable burden for the network, due to the fact that corresponding proofs should be stored in the block (i.e., in τ i ). Hence, a blockchain adopting our approach can discourage this kind of transactions by suitable means, like higher fees or gas consumption. On the other hand, a reward may be given to nodes that commit more storage for the DHT or that serve more DHT requests.</p><p>Concerning security, if we assume that blocks in Λ N are trusted, the proofs that are received along with the transactions to be processed are enough to guarantee the authenticity of the involved state elements. Transactions containing proofs with indexes outside Λ N are discarded. Validation of broadcasted blocks only requires the previous block to be trusted, which is a common assumption for blockchains.</p><p>Since resynchronisation of a node is easy, we expect a high rate of disconnection and reconnection of nodes. During reconnection, a node needs a way to be sure that the first block it downloads belongs to the right blockchain, for example to the same blockchain that it was attached before reconnection. This problem is mitigated in regular blockchains by the fact that the nodes download the whole chain and have the hash of the first block hard-coded. In our approach, to avoid eclipse attacks during synchronisation, we may introduce checkpoint blocks, whose hash is supposed to be kept by nodes for some time and included in all (or some) of the following blocks. This method allows for safe resynchronisation after a disconnection for a limited amount of time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Conclusions</head><p>We have presented a method to significantly decrease the time taken by a node of a blockchain network to download the data needed to properly work. In particular, validation can be performed storing a small amount of data. To achieve this, we take advantage of authenticated data structures and flexibly distribute the storage of the blockchain state on a DHT made by the same nodes of the network. The main feature of this methodology is its speed, which allows the first synchronisation of a node to be performed in less than a minute. Additionally, our approach is independent from the adopted consensus algorithm and from the consensus rules.</p><p>As future works, we plan to prototypically implement this approach in a software derived from one of the major blockchain networks (e.g., Ethereum). We plan to perform an extensive test and possibly proposing the idea to the corresponding community. We also consider interesting to investigate the relation of our approach with other scaling techniques, like sharding.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example of pruned ADS (pADS ) for two nodes, each storing just a part of the same ADS.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: A node, for its validation role, stores only the last d + f blocks (see text). In the figure, l denotes the index of the last block.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>6 Figure 3 :</head><label>63</label><figDesc>Figure 3: Content of two consecutive blocks. Each block b i contains a pruned representation τ i of the state before the execution of the transactions in b i . In τ i , the only unpruned leaves are the state elements involved in the transactions of the block.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="1,0.00,159.54,612.00,472.91" type="bitmap" /></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">Juan</forename><surname>Benet</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1407.3561</idno>
		<title level="m">Ipfs-content addressed, versioned, p2p file system</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Blockchain and scalability</title>
		<author>
			<persName><forename type="first">Anamika</forename><surname>Chauhan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Om</forename><surname>Prakash Malviya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Madhav</forename><surname>Verma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Tejinder</forename><surname>Singh Mor</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018. 2018</date>
			<biblScope unit="page" from="122" to="128" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Analysis of the communication traffic for blockchain synchronization of iot devices</title>
		<author>
			<persName><forename type="first">Pietro</forename><surname>Danzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Anders</forename><forename type="middle">Ellersgaard</forename><surname>Kalor</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Cedomir</forename><surname>Stefanovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Petar</forename><surname>Popovski</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Communications (ICC)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018. 2018</date>
			<biblScope unit="page" from="1" to="7" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Omniledger: A secure, scale-out, decentralized ledger via sharding</title>
		<author>
			<persName><forename type="first">Eleftherios</forename><surname>Kokoris-Kogias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Philipp</forename><surname>Jovanovic</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Linus</forename><surname>Gasser</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nicolas</forename><surname>Gailly</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ewa</forename><surname>Syta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Bryan</forename><surname>Ford</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Symposium on Security and Privacy (SP)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2018">2018. 2018</date>
			<biblScope unit="page" from="583" to="598" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">A general model for authenticated data structures</title>
		<author>
			<persName><forename type="first">Charles</forename><surname>Martel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Glen</forename><surname>Nuckolls</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Premkumar</forename><surname>Devanbu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Michael</forename><surname>Gertz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">April</forename><surname>Kwong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Stuart</forename><forename type="middle">G</forename><surname>Stubblebine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Algorithmica</title>
		<imprint>
			<biblScope unit="volume">39</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="21" to="41" />
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Kademlia: A peer-to-peer information system based on the xor metric</title>
		<author>
			<persName><forename type="first">Petar</forename><surname>Maymounkov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">David</forename><surname>Mazieres</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Workshop on Peer-to-Peer Systems</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2002">2002</date>
			<biblScope unit="page" from="53" to="65" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Bitcoin: A peer-to-peer electronic cash system</title>
		<author>
			<persName><forename type="first">Satoshi</forename><surname>Nakamoto</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Evaluating and improving the content access in kad</title>
		<author>
			<persName><forename type="first">Moritz</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Damiano</forename><surname>Carra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ernst</forename><forename type="middle">W</forename><surname>Biersack</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Peer-to-peer networking and applications</title>
		<imprint>
			<biblScope unit="volume">3</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="115" to="128" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Authenticated data structures</title>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Tamassia</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">European Symposium on Algorithms</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="2" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Efficient content authentication over distributed hash tables</title>
		<author>
			<persName><forename type="first">Roberto</forename><surname>Tamassia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Nikos</forename><surname>Triandopoulos</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. Int&apos;l Conf. Applied Cryptography and Network Security (ACNS&apos;07)</title>
				<meeting>Int&apos;l Conf. Applied Cryptography and Network Security (ACNS&apos;07)</meeting>
		<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Ethereum: A secure decentralised generalised transaction ledger</title>
		<author>
			<persName><forename type="first">Gavin</forename><surname>Wood</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Ethereum project yellow paper</title>
		<imprint>
			<biblScope unit="volume">151</biblScope>
			<biblScope unit="page" from="1" to="32" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
