<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>SEBD</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Consistently Faster: A Survey and Fair Comparison of Consistent Hashing Algorithms</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Massimo Coluzzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amos Brocco</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiziano Leidi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Information Systems and Networking Institute, Department of Innovative Technologies, University of Applied Sciences and Arts of Southern Switzerland</institution>
          ,
          <addr-line>Lugano</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <volume>31</volume>
      <fpage>02</fpage>
      <lpage>05</lpage>
      <abstract>
        <p>Consistent hashing is a well-known hashing technique that allows for a minimal number of resources to be remapped when a cluster is scaled. It plays a fundamental role as a data router and a load balancer in various fields, such as distributed databases, cloud infrastructures, and peer-to-peer networks. Although studies of consistent hashing algorithms relating to diferent usage scenarios have been carried out, the literature does not provide a thorough evaluation and comparative assessment. Therefore, this paper surveys and empirically compares the most prominent consistent hashing algorithms for distributed databases and cloud infrastructures, published from 1997 to 2021, in a fair agnostic context. Comparison has been performed by implementing all algorithms in Java and benchmarking them on commodity hardware. The metrics involved in the comparison are memory usage, initialization, resize and lookup time, balance, and monotonicity. We found Jump, Anchor, and Dx to outmatch the other algorithms on all the considered metrics. The measured values match the asymptotic curves. Although, some asymptotically faster algorithms have been shown to be slower in practice due to memory accesses.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Consistent hashing</kwd>
        <kwd>load balancing</kwd>
        <kwd>distributed systems</kwd>
        <kwd>survey</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Hashing algorithms are deterministic functions which take an arbitrary amount of data as
input and produce a fixed length output called hash value or digest. In distributed database
clusters, tables sharding can be used to achieve horizontal scalability and improved performance.
Hashing algorithms can be used to distribute records among all shards, while the system can
eficiently determine which shard is responsible for a specific record.</p>
      <p>
        Hashing algorithms can be used to construct associative arrays that allow for referencing
data through an arbitrary key instead of a numerical index: the key is consumed by the hashing
algorithm to determine the index of the bucket where data resides. With Distributed Hash
Tables (DHTs) each bucket might reside on a diferent node in a computer network. Dynamically
resizing an hash table to change the number of buckets typically involves a considerable cost,
as the elements of the original table need to be inserted into the new table. On a DHT such
an operation would be required each time a node joins or leaves the network, and it incurs a
considerable transmission cost, as data would need to be transferred across the network should
the corresponding bucket change. To overcome this issue, consistent hashing solutions have
been developed. Consistent hashing is a class of distributed hashing algorithms that strives
to achieve balanced data placement and minimal reallocation costs. The goal of this paper is
to analyze state of the art to determine the most efective consistent hashing algorithm to use
to distribute data among the nodes of a cluster. Accordingly, the most prominent consistent
hashing algorithms published since the first papers on this matter published by Thaler and
Ravishankar in 1996 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and by David Karger et al. in 1997 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] were considered, namely:
- Consistent Hashing Ring: published by D. Karger et al. in 1997 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ][
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and
recognized as the first one utilizing the term consistent hashing to describe a solution to the
aforementioned problem.
- Rendezvous: published by Thaler and Ravishankar in 1996 [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
- Jump: published by Lamping and Veach in 2014 [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
- Multi-probe: published by Appleton and O’Reilly in 2015 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
- Maglev: published by D. E. Eisenbud in 2016 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
- Anchor: published by Gal Mendelson et al. in 2020 [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>
        - Dx: published by Chaos Dong and Fang Wang in 2021 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        Some of them (like Ring and Maglev) are widely used in popular tools and massively distributed
infrastructures like Amazon and Google. In contrast, Anchor and Dx have not yet seen a
significant adoption as their publication is relatively recent. Other research eforts led to the
development of consistent hashing algorithms with bounded loads [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ]. While relevant
in the field of distributed load balancing, such a type of algorithms is out of the scope of this
research work. Early surveys of consistent hashing were published in 2018 [12] and 2022 [13],
but they focus mainly on lookup time. More specifically, in this paper, we aim at finding the
best fitting algorithm to be used as the foundation of a smart partition scheme capable of:
uniform data distribution among nodes (to avoid ”hotspots”), elastic re-balancing when the
compute cluster scales, minimizing data reassignment during cluster scaling, identifying the
partitions where data is stored, optimal performance with respect to initialization time, resize
time, and memory usage. We implemented all the algorithms in Java to provide uniform and
comparable implementations publicly available on GitHub [14] together with the benchmarking
tools. The measured values match the asymptotic curves. Although, some asymptotically faster
algorithms have been shown to be slower in practice due to the number of memory accesses.
The considered metrics will be thoroughly discussed in Section 3.1.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Description of the algorithms</title>
      <p>In this section, we will summarize the problem addressed by consistent hashing algorithms
and then describe the solutions. It should be noted that the paper is not covering consistent
hashing algorithms designed to work in peer-to-peer environments, and all the algorithms
analyzed in this paper are designed to distribute resources among a set of known nodes. This
problem is common to many applications, including distributed storage systems, database and
data warehouse clusters. Every distributed system used to store and retrieve data where the
underlying hardware can manage to fail can take advantage of the proposed solutions. It is left
to the reader to choose which solution best fits a specific use case. This work aims to compare
the solutions in a fair agnostic context.</p>
      <p>Notation</p>
      <p>The following notations have been used throughout this paper:
- K: the number of looked-up keys;
- W: the number of working nodes in a cluster;
- A: the number of overall nodes in a cluster (both working and not working);
- V: the number of virtual nodes for each physical node in Ring;
- M: the number of positions in the lookup table for each node in Maglev;
- P: the number of probes in Multi-probe;
- N: a generic number of buckets (i.e., values between 0 and  − 1);
Problem statement Given a set of keys of size  and a set of buckets of size  , we aim
to distribute all the keys evenly among the available buckets. We can map each bucket to an
integer between 0 and  − 1. Then we can use a traditional hashing algorithm to map each key
to an integer value. Finally, we can map such a value to an integer between 0 and  − 1 with
modular arithmetic. A good hashing function will distribute the keys evenly so that each node
will get about / keys. Unfortunately, simplistic approaches are not suitable in a distributed
scenario, because adding or removing a bucket would require a remapping of almost all the keys.
In particular, suppose we identify the buckets with the nodes of a distributed system and the
keys with the resources stored in the system: it is not desirable to redistribute all the resources
when scaling up or down the cluster. Ideally, only the keys stored in the buckets involved in
the resizing operation should be moved. Consistent hashing algorithms aim to address this
situation and distribute the keys among the buckets so that adding or removing a bucket will
cause only / keys to move.</p>
      <p>
        Consistent Hashing Ring This algorithm was first introduced in 1997 by David Karger, who
addressed the problem of distributing keys such that hot spots are decreased or eliminated,
there is no need for constant communication between caches, and the initial key-to-bucket
assignment is minimally afected as buckets change [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The proposed solution was aimed at
improving caching in distributed networks without the need for complete information about
their state. The original paper introduced the term consistent hashing, and described three key
properties of consistent hashing, namely smoothness, spread and load, which are related to the
balance and monotonicity metrics used in our analysis. In particular, smoothness measures the
expected fraction of keys that must be moved to a new bucket in order to maintain balance,
spread concerns the total number of buckets to which a object is assigned, whereas load is the
number of distinct objects assigned to a particular bucket. As the name implies, this algorithm is
based on a (unit) ring structure. A traditional base hash function maps each key to an numerical
value on the unit interval [
        <xref ref-type="bibr" rid="ref1">0 − 1</xref>
        ] and, therefore, to a position in the ring. Buckets as well are
hashed and mapped to a position in the ring using a traditional hash function. The position of
a key, as determined by the Ring algorithm, is chosen as being the first bucket found moving
clockwise from the key point in the ring (this bucket is referred to as the successor to the key).
When a bucket is removed, all of its keys are remapped to the next bucket, traversing the
ring clockwise. As this situation will lead to an unbalance in the key distribution (a bucket
adjacent to the one being removed might get twice the amount keys), the basic algorithm can
be improved by mapping each bucket multiple times in diferent positions using virtual nodes,
as proposed by Dynamo [15]. A major drawback of this algorithm is related to the memory
required for storing the topology of the ring and the association between nodes and buckets.
Rendezvous This algorithm, also known as highest random weight (HRW), was initially
published in 1996 by Thaler and Ravishankar [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Even though it predates the term consistent
hashing, the proposed approach aims at the same goals of achieving balanced data distribution
and minimal disruption. In contrast to Ring, there is no need for virtual nodes in order to
address unbalance, dropping the requirement for a map between nodes and virtual nodes
(which would increase memory usage and would need to be kept continuously updated): in this
regard, Rendezvous can be considered as a stateless algorithm, which trades minimal memory
requirements for a computational time proportional to the number of buckets. The key and the
identifier of each node are hashed together and the node generating the smallest hash for the
given key is chosen. This algorithm uses a smaller amount of memory than Ring, but finding
the correct node for a given key takes ( ). It can be a reasonable solution for systems with a
few nodes, but might not scale well as the number of nodes grows.
      </p>
      <p>
        Jump Jump [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] works by computing when its output changes as the number of buckets
increases. In contrast to the previously mentioned algorithms, it is not possible to assign
arbitrary identifiers to a node. Instead, Jump assumes that buckets are numbered sequentially,
therefore new buckets can only be added after the existing ones. Furthermore, removal of
arbitrary buckets is not supported. The algorithm takes an integer key and the number of
buckets  as an input, and computes a bucket number in the range [0,  − 1]. Jump uses 64-bit
linear congruential generator [16] (a pseudo-random generator) with a seed corresponding
to the key to be hashed. It computes pseudo-random jumps among the buckets until it finds
a position at or past the number of buckets (the output of the algorithms corresponds to the
previous computed bucket). As previously mentioned, since there is no internal data structure
to keep track of the working nodes, this algorithm assumes all buckets in the range [0,  − 1]
referring to working nodes. During the scale down of the cluster, only the bucket  − 1 can
be removed. Furthermore, it cannot handle the failure of a random node. These might be
considered substantial limitations preventing the potential use of this algorithm in real-life
environments. Jump’s authors claim that such limitations make the algorithm more suitable for
data storage applications than for distributed web caching. Although, storage hardware can fail
as any other hardware. Google describes a test where they sorted 1PB of data on 48000 hard
drives [17], and at each run, at least one disk managed to break. Therefore, Jump seems not to
be a good fit even for data storage applications.
      </p>
      <p>
        Multi-probe Multi-probe [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] aims at addressing the main limitation of Jump, namely the
impossibility of removing arbitrary nodes. However, in order to achieve a good balance it
requires hashing the key multiple times, negatively afecting its lookup performance.
Multiprobe puts nodes on a hash ring as done by the Ring algorithm, but instead of creating many
virtual nodes to improve the balance, the key is rehashed multiple times and the node with the
minimal distance is subsequently chosen. This algorithm uses less memory than Ring (with
virtual nodes), however, as the authors suggest, rehashing each key 21 times is required to
optimize the trade-of between performance and balance.
      </p>
      <p>
        Maglev Maglev is the name of a network load balancer and its underlying consistent hashing
algorithm [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. A network load balancer typically comprises multiple devices logically located
between routers and service endpoints (generally TCP or UDP servers), and is responsible for
matching each packet to its corresponding service and forwarding it to one of that service’s
endpoints. Maglev creates a lookup table where each node is listed  times ( = 128 in
our benchmarks). Each node gets a preference list of all the lookup table positions. This list
is constructed by having all the nodes take turns filling their most-preferred table positions
that are still empty until the lookup table is completely filled in. This procedure gives to each
node an almost equal share of the lookup table. Mapping a key to its destination node is very
fast, since finding the entry in the table takes (1). On the other hand, the table needs to be
recreated when nodes are added or removed. Furthermore, the authors suggest replicating each
node at least 100 times, causing the table size to become significant.
      </p>
      <p>Anchor The Anchor algorithm keeps track of the currently working nodes, and maps each
key to a bucket between 0 and − 1, where  is the number of available nodes (working and
not working). If the corresponding node is not working, the algorithm will rehash the key to
hit another bucket. The lookup ends when the algorithm selects a bucket related to a working
node. The algorithm uses an intelligent approach to avoid selecting the same bucket more than
once. Given  the number of working nodes, the algorithm keeps a set of working nodes
called working set (). Initially, the working set contains all the buckets between 0 and  − 1.
If a bucket  is removed, the working set becomes ∖{}, and Anchor manages to remap
the keys such that all the keys initially mapped to a bucket in ∖{} will still be mapped
to the same bucket, while the keys mapped to the bucket  will be evenly spread among the
remaining buckets. The authors describe two implementations. The first creates a new copy
of the working set for every removed bucket using Θ( +  2) memory and performing the
lookup in ((  )). The second is an in-place version that uses four arrays of integers to keep
track of the state of the cluster. The in-place version uses Θ() memory and takes ((  )2)
for the lookup. For our comparison, we implemented the in-place version.</p>
      <p>
        Dx Dx [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] combines ideas from several of the aforementioned algorithms. For example, it
keeps track of all the available nodes (like Anchor) using a bit-array to mark whether nodes
are working or not, and leverages a pseudo-random generator like Jump to determine the
target bucket. In order to compute a position in the range [0, − 1], Dx uses a pseudo-random
function () initialized with the key as the seed. Accordingly, a sequence of buckets in the
form (), (()), ((()))... can be generated, and the first working bucket is chosen.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Benchmarks</title>
      <p>All benchmarks have been performed on the same hardware, using an Intel® Core™ i7-1065G7
CPU with 4 cores and 8 threads, as well as 32GB of main memory. Each analyzed algorithm
leverages a collision-resistant non-cryptographic hash function for mapping a key to a bucket
[18]. We repeated each test for the following hash functions: XX [19], MD5 [20], and MURMUR3
[21]. Since there are no significant diferences between the tested functions [ 22], only the
results concerning XX will be presented. Repeating tests for every possible combination of
parameters would generate a combinatorial explosion of results that would make the article
verbose and dificult to interpret. For this reason, we decided to configure each algorithm with
the parameters experimentally found to perform best. We used 1000 virtual nodes for each
physical node in Ring as suggested by Dynamo [15] and a lookup table with a size greater than
100 times the number of working nodes in Maglev as suggested by the authors. More precisely,
we used 128 (i.e., 27) entries per node. The authors of Anchor and Dx do not suggest a proper
amount for the cluster capacity (). We found it reasonable to set the overall capacity of the
cluster to be at least 10 times the number of initial nodes.</p>
      <p>Datasets The distribution of the keys can influence the balance of the algorithms. If the
keys follow a uniform distribution, it is reasonable to assume the distribution’s result to be
balanced. We expect the algorithms also to be balanced for clustered distributions. We tested
each algorithm with three diferent distributions, namely a uniform distribution, a normal or
Gaussian distribution, and a clustered distribution extracted from a real-life dataset [23]. Since
all the algorithms result in a good balance for the first two distributions, we will show only the
worst-case scenario results represented by the clustered distribution.</p>
      <sec id="sec-3-1">
        <title>3.1. Performance Metrics</title>
        <p>We tested how the performance of the considered algorithms scales as the number of nodes
grows, by repeating each test for clusters with 10, 100, 1000 and 10000 nodes. In the following,
we will briefly describe every analyzed metric.</p>
        <p>Memory usage Most of the algorithms keep an internal data structure. For clusters with a
considerable amount of nodes, memory consumption is undoubtedly a critical factor, therefore
we deem important to evaluate this metric in relation to the size of the network.
Time We analyzed the time taken by each algorithm to initialize its internal structures, as
well as while adding or removing buckets, and when performing a lookup.</p>
        <p>
          Balance With the term balance, we mean the ability to spread the keys evenly among the
buckets. Given  the number of keys and  the number of buckets, ideally, we expect each
bucket to get  keys. Let   ≤  be the minimum number of keys in a bucket,
and   ≥  be the maximum number of keys in a bucket; we define  % =
  *  as the minimum amount of keys in a bucket related to the expected  , and
 % =   *  as the maximum amount of keys in a bucket related to the
expected  . The balance is defined as the interval [ %,  %]. We expect a
well-balanced algorithm to have such an interval close to [
          <xref ref-type="bibr" rid="ref1 ref1">1, 1</xref>
          ].
        </p>
        <p>Resize balance We expect the keys to be evenly distributed also after adding or removing
buckets. Given a system with  keys evenly distributed among  buckets, after adding a

new bucket, we expect each bucket to have (+1) keys, whereas after removing a bucket,
we expect each remaining bucket to have (−  1) keys. We measured the distribution of the
keys after adding and removing buckets a certain amount of times using the aforementioned
[ %,  %] interval.</p>
        <p>Monotonicity We expect that only the keys related to the nodes involved in the resize will

move during the resize of a cluster. More precisely, only (+1) keys should move to the new
node when a new node is added, and only the keys belonging to such node should move when
a node is removed. We can measure this property by tracking the position of each key before
and after the resize, accounting for the number of keys that are not behaving as expected.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Asymptotic Results</title>
      <p>Table 1 shows the asymptotic complexity in space and time of each algorithm. The value
represented by each variable has been summarized in Section 2. Concerning memory usage,
Jump is expected to be the best performer, because no data structure is required to run the
algorithm. Conversely, Maglev and Ring shall require memory space proportional to the number
of repetitions  of each node in the lookup table, and the number  of virtual nodes for each
physical node respectively. Regarding the lookup time, Maglev should be able to perform this
operation in constant time, whereas the other algorithms should scale roughly logarithmically
with respect to the size of the cluster. With respect to the initialization time, Jump is clearly
favored by not depending on any data structure. Similarly, concerning resize time, Anchor, Dx,
Jump, and Rendezvous should scale without issues.</p>
      <p>(a) Memory usage
(b) Initialization time</p>
    </sec>
    <sec id="sec-5">
      <title>Empirical Results</title>
      <p>This section will describe the results of the benchmarks for each metric. As stated before, the
results are pretty similar for diferent hash functions and distributions, therefore we will show
only the results for the clustered distribution and the XX hash function. In the following sections,
we will discuss each metric showing the related benchmarks in detail. The data reported in the
graphs refers to an average over 10 runs: error bars will be used to represent the variability of
the reported measurement across all runs.</p>
      <p>Memory usage As expected, Ring and Maglev are the algorithms with the most significant
memory usage (Figure 1a). The reason for Ring is the creation of 1000 virtual nodes for each
physical node, while the reason for Maglev is the creation of a lookup table with size more
than 100 times the number of nodes. Even if Maglev is the second worst after Ring in memory
usage, it still uses 100 times less memory than Ring because we implemented the lookup table
of Maglev using an array while the internal structure of Ring leverages a tree-map which causes
many wrapping objects to be created. The implementation of Ring can be changed to optimize
memory usage, but the asymptotic complexity will not change. Dx and Jump use the least
amount of memory, while Multi-probe, Rendezvous, and Anchor are in the middle of the pack.
For all the algorithms except Jump, the memory usage is proportional to the number of buckets.
As shown in the last chart, Jump uses a constant amount of memory, while Dx uses an amount
(a) Best case
(b) Worst case
of memory proportional to the number of buckets, though, Dx’s internal data structure is a
bit-array, and the actual consumption of memory is limited.</p>
      <sec id="sec-5-1">
        <title>Initialization time</title>
        <p>The initialization time (Figure 1b) is the time needed by the algorithm to initialize its internal
data structure. As expected, Ring is the slowest algorithm because it needs to create a sorted
list of  *  virtual nodes, with a suggested value of V equal to 1000, which takes time
(  2(  )). The second slowest is Maglev, the reason is the population of a lookup
table with more than 100 times the number of buckets. Every node will choose a position in
the table. If the chosen position is already taken, another one will be evaluated. The process
continues until every entry in the table is filled. Anchor, Dx, and Multi-probe use an average
amount of time for creating the list of nodes. Multi-probe uses a sorted list whose population
takes ( 2( )). The initialization time for Anchor and Dx is proportional to the cluster’s
capacity (overall number of nodes), in our benchmark we assumed the capacity to be ten
times the number of the initial working nodes ( ). The best performing algorithms during
initialization are Rendezvous and Jump. Rendezvous uses a set of nodes that can be populated
in Θ( ). On the other hand, Jump does not have any data structure to initialize; therefore, it
initializes, as expected by the asymptotic analysis, in constant time.</p>
        <p>Lookup time The lookup time is the time needed to find the node a given key belongs to,
assuming the cluster is stable (no nodes added or removed). In the best case, as shown in
Figure 2a, Rendezvous is the slowest because it checks the key against every node to find the
best match. It causes the lookup to be linear in the number of working nodes Θ( );
Multiprobe and Ring confirm a complexity of (2( )). As we can see in the chart, Multi-probe
is slower than Ring by a constant factor. In the case of Multi-probe, we used 21 probes as
suggested in the paper, therefore the complexity is (212( )). The complexity of Ring is
(2(  )). We used  = 1000 as suggested by the authors, therefore, the complexity is
(2(1000 )) ≤ (10 + 2( )). The remaining algorithms are very fast, in the order of
nanoseconds. From a theoretical point of view, Jump should be slower than Anchor and Dx, but
(a) Resize time
(b) Balance of key distribution
practically it shows to be faster because it does not access the memory, and therefore it works
at CPU speed. On the other hand, Maglev should perform the lookup in constant time (Θ(1)),
but using a big lookup table causes many memory accesses that slow down the algorithm. The
asymptotic complexity for Anchor is ((  )) and the complexity for Dx is (  ); therefore,
the worst-case scenario for these two algorithms is when  is much smaller than  (i.e.,  is
big). We tested the worst-case scenario using a cluster of 10000 nodes and collecting the lookup
time after removing the nodes from 10% to 90%. As shown in Figure 2b for Dx, the lookup
time grows quite rapidly while the lookup time of Anchor is less afected by the removal of the
nodes. As expected, Jump is getting faster because its complexity is proportional to the number
of working nodes. Even in the worst-case scenario, these algorithms are faster in lookup than
all the others (except for Maglev).</p>
        <p>Resize time Resize time is the time needed by the algorithm to update the data structure
when adding or removing buckets. We tested diferent situations by adding and removing
buckets from 10% to 50% of the initial cluster size. We will show the average time for adding or
removing one bucket, all algorithms but Maglev do not allow multiple removals at the same time.
As expected, Ring and Maglev are the slowest due to the size of their internal data structures. In
particular, Maglev is very slow because every change in the cluster size causes the rebuild of the
lookup table. Therefore, even adding or removing one single bucket will have a complexity of
Θ(100 *  ). This cost could be amortized should multiple removals happen concurrently (as
the lookup table could be rebuilt only once). Nonetheless, as shown in Figure 3a, the diference
compared to other algorithms is so significant that such an optimization is unlikely to make
Maglev fair well in this metric. The second slowest resize time is obtained by Ring due to adding
or removing a certain number of virtual nodes (e.g., 1000) for each physical node. In particular,
the virtual nodes need to be added to a sorted list, therefore the complexity to add or remove
a node is ( 2(  )). Jump is the fastest because it does not have any data structure to
update. Anchor, Dx, and Rendezvous are slower but still resized in constant time. Multi-probe
keeps a sorted list of nodes, and adding or removing a node will take (2( )).
Balance The balance is conditioned by the way keys are distributed. As mentioned before,
we tested this metric using three diferent distributions: uniform (best-case scenario where keys
are already uniformly distributed), normal (average-case scenario where keys are distributed
following a Gaussian curve), and clustered (worst-case scenario where keys are concentrated in
some intervals leaving other areas empty). In the best-case and average-case scenarios, almost
all algorithms are well balanced except for Multi-probe, which seems to ignore some nodes
completely. In the worst-case scenario, represented by a clustered distribution of the keys,
there are algorithms able to keep a satisfying balance while others lose balance completely.
To represent the balance, we measured each node’s percentage of keys collected. Then, we
took the node with the lower percentage and the one with the highest. The chart in Figure
3b shows a list of intervals. A well-balanced algorithm shows an interval close to 100%. The
larger is the interval, and the worst is the balance. We can see that Anchor, Dx, and Jump are
pretty well balanced throughout our benchmark. On the contrary, Rendezvous, Ring, and Maglev
lose their balance as the cluster grows. In particular, Rendezvous is not balanced even on small
clusters. Finally, Multi-probe is the worst performing in balance. When the cluster grows in
size, Multi-probe ends by completely ignoring some nodes, which is an undesired behavior.
Balance after resizing We performed the same test after resizing the cluster. We added and
removed random nodes several times before executing the balance test. The results of this test
prove that the resize of the cluster does not afect the balance of the selected algorithms.
Monotonicity Finally, we tested the property of monotonicity. This property states that only
the keys related to the nodes involved in the resize should move during the resize of a cluster.
We tested this property by tracking the position of each key before and after the resize and by
counting the number of keys that are not behaving as expected. We sized the number of keys in
order to have 1000 keys for every node. Subsequently, starting with a stable cluster, we stored
the destination of each key. Next, we removed up to 50% of the nodes and stored the destination
of each key after the removal. Next, we restored the removed nodes and stored the destination
of each key after the restoration. Finally, we compared the stored values. Monotonicity proved
to hold for every algorithm, and when we add or remove up to the 50% of the nodes, only the
keys involved in the resize will move from one node to another. The nodes not involved in the
resize will not be afected.
After analyzing all the benchmarks, we graded each algorithm for each metric and grouped them
into three categories (best performing, average performing, and worst performing), as shown in
Table 3. The algorithms showing the best performance for all the analyzed metrics are Anchor,
Dx, and Jump. We compared in more detail those three algorithms and summarized the results
in Table 2. The comparison is based on a cluster of 10000 working nodes, assuming an overall
capacity () of ten times the number of working nodes. It is also based on the XX hash function
and the clustered distribution. These three algorithms are equal in balance, balance after resizing,
and monotonicity. Jump does not use any internal data structure; therefore, it uses minimal
memory and is the fastest in any metric. On the other hand, Jump has the remarkable limitation
that only the last inserted node can be removed. It means that Jump is not able to handle the
failure of a random node in the cluster, making it unsuitable for production environments and
jeopardizing its excellent performance. Anchor and Dx are addressing this limitation by using
an internal data structure to keep track of the cluster’s nodes (both working and removed)
which causes the algorithms to be slower during initialization and resize, but they reach similar
performance in lookup operations. Nevertheless, Anchor and Dx are the fastest after Jump,
performing lookups in the range of nanoseconds and avoiding Jump’s limitations which makes
them the best choice to handle consistent hashing in non-peer-to-peer environments.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>4. Conclusions</title>
      <p>This paper surveyed and compared the most prominent consistent hashing algorithms for
distributed databases and cloud infrastructures, published from 1997 to 2021. We analyzed Ring,
the first algorithm of this kind, published in 1997 but still adopted by many distributed systems.
We analyzed Rendezvous, published in 1996, which can be considered its principal competitor.
We compared them with Jump, Multi-probe, and Maglev, published by Google between 2014 and
2016. Maglev is the algorithm used by Google’s network load balancers. Finally, we analyzed
Anchor and Dx published in 2020 and 2021. These last two algorithms have not yet seen a
significant adoption, but they are very promising and have interesting asymptotic curves. The
metrics involved in the comparison were memory usage, initialization, resize and lookup time,
balance, and monotonicity. We performed several benchmarks using diefrent key distributions
and hashing algorithms. Our results match the asymptotic curves and show that Ring sufers
from higher than average memory usage due to virtual nodes and, similar to Maglev, is penalized
due to its algorithmic complexity during initialization and resizing. Moreover, Multi-probe is
the worst performer when it comes to balance and resize balance. Overall, we found Jump,
Anchor, and Dx to perform better than the other algorithms on all the considered metrics. In
particular, Jump is the best-performing algorithm because it does not use any internal data
structure, which allows it to work at CPU speed. On the other hand, using no internal data
structure makes Jump a stateless algorithm unable to handle random failures. This limit makes
Jump unsuitable for real-life environments despite its excellent performance, therefore in the
majority of use cases, only Anchor and Dx are worth considering.
[12] D. Gryski, Consistent hashing: Algorithmic tradeofs, 2018. URL: https://dgryski.medium.</p>
      <p>com/consistent-hashing-algorithmic-tradeofs-ef6b8e2fcae8.
[13] A. Slesarev, M. Mikhailov, G. Chernishev, Benchmarking hashing algorithms for load
balancing in a distributed database environment, in: P. Fournier-Viger, A. Hassan, L.
Bellatreche, A. Awad, A. Ait Wakrime, Y. Ouhammou, I. Ait Sadoune (Eds.), Advances in Model
and Data Engineering in the Digitalization Era, Springer Nature Switzerland, Cham, 2022,
pp. 105–118.
[14] Institute of Information Systems and Networking at SUPSI,
java-consistenthashing-algorithms, 2021. URL: https://github.com/SUPSI-DTI-ISIN/
java-consistent-hashing-algorithms.
[15] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S.
Sivasubramanian, P. Vosshall, W. Vogels, Dynamo: Amazon’s highly available key-value
store, SIGOPS Oper. Syst. Rev. 41 (2007) 205–220. doi:http://doi.acm.org/10.1145/
1323293.1294281.
[16] P. L’Ecuyer, Tables of linear congruential generators of diferent sizes and good lattice
structure, Math. Comput. 68 (1999) 249–260. doi:10.1090/S0025-5718-99-00996-5.
[17] G. Czajkowski, Sorting 1pb with mapreduce, 2008. URL: https://googleblog.blogspot.com/
2008/11/sorting-1pb-with-mapreduce.html.
[18] C. Estébanez, Y. Saez, G. Recio, P. Isasi, Performance of the most common
noncryptographic hash functions, Software: Practice and Experience 44 (2014) 681–698.
[19] xxhash, https://github.com/Cyan4973/xxHash, 2014. Accessed: 2022-05-02.
[20] R. L. Rivest, The MD5 message digest algorithm, RFC 1321, 1992. ftp://ftp.rfc-editor.org/
in-notes/rfc1321.txt.
[21] A. Appleby, Murmurhash3, https://github.com/aappleby/smhasher, 2015. Accessed:
202205-02.
[22] S. Priyamvada, Analysis of various hash function, International Journal of Innovative</p>
      <p>Science and Research Technology 3 (2018).
[23] M. Coluzzi, A. Brocco, P. Contu, T. Leidi, Clustered distribution for consistent hashing
algorithms, 2022. URL: https://dx.doi.org/10.21227/r6ps-dz05. doi:10.21227/r6ps-dz05.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Thaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. V.</given-names>
            <surname>Ravishankar</surname>
          </string-name>
          ,
          <article-title>A name-based mapping scheme for rendezvous</article-title>
          ,
          <source>in: Technical Report CSE-TR-316-96</source>
          , University of Michigan, University of Michigan,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          , E. Lehman,
          <string-name>
            <given-names>T.</given-names>
            <surname>Leighton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Panigrahy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Levine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lewin</surname>
          </string-name>
          ,
          <article-title>Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web</article-title>
          ,
          <source>in: Proceedings of the twenty-ninth annual ACM symposium on Theory of computing</source>
          ,
          <year>1997</year>
          , pp.
          <fpage>654</fpage>
          -
          <lpage>663</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sherman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Berkheimer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bogstad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Dhanidina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Iwamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Matkins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yerushalmi</surname>
          </string-name>
          ,
          <article-title>Web caching with consistent hashing</article-title>
          ,
          <source>Comput. Netw</source>
          .
          <volume>31</volume>
          (
          <year>1999</year>
          )
          <fpage>1203</fpage>
          -
          <lpage>1213</lpage>
          . URL: https://doi.org/10.1016/S1389-
          <volume>1286</volume>
          (
          <issue>99</issue>
          )
          <fpage>00055</fpage>
          -
          <lpage>9</lpage>
          . doi:
          <volume>10</volume>
          .1016/ S1389-
          <volume>1286</volume>
          (
          <issue>99</issue>
          )
          <fpage>00055</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D. G.</given-names>
            <surname>Thaler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. V.</given-names>
            <surname>Ravishankar</surname>
          </string-name>
          ,
          <article-title>Using name-based mappings to increase hit rates</article-title>
          ,
          <source>IEEE/ACM Transactions on networking 6</source>
          (
          <year>1998</year>
          )
          <fpage>1</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Lamping</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Veach</surname>
          </string-name>
          ,
          <article-title>A fast, minimal memory, consistent hash algorithm</article-title>
          ,
          <source>arXiv preprint arXiv:1406.2294</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>O'Reilly, Multi-probe consistent hashing</article-title>
          ,
          <source>arXiv preprint arXiv:1505.00062</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Eisenbud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Contavalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kononov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mann-Hielscher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cilingiroglu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cheyney</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Hosein</surname>
          </string-name>
          ,
          <article-title>Maglev: A fast and reliable software network load balancer</article-title>
          ,
          <source>in: 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>523</fpage>
          -
          <lpage>535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mendelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Vargaftik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Barabash</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. H.</given-names>
            <surname>Lorenz</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Keslassy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Orda</surname>
          </string-name>
          ,
          <article-title>Anchorhash: A scalable consistent hash</article-title>
          ,
          <source>IEEE/ACM Transactions on Networking</source>
          <volume>29</volume>
          (
          <year>2020</year>
          )
          <fpage>517</fpage>
          -
          <lpage>528</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C.</given-names>
            <surname>Dong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Dxhash: A scalable consistent hash based on the pseudo-random sequence</article-title>
          ,
          <source>arXiv preprint arXiv:2107.07930</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mirrokni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thorup</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zadimoghaddam</surname>
          </string-name>
          ,
          <article-title>Consistent hashing with bounded loads</article-title>
          ,
          <year>2016</year>
          . URL: https://arxiv.org/abs/1608.01350. doi:
          <volume>10</volume>
          .48550/ARXIV.1608.01350.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Coleman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Shrivastava</surname>
          </string-name>
          ,
          <article-title>Revisiting consistent hashing with bounded loads</article-title>
          ,
          <source>in: AAAI</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>