<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Workload Cost Optimization Using Dynamic Replication in Decentralized Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Ryoga</forename><surname>Yoshida</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Osaka University</orgName>
								<address>
									<addrLine>Yamadaoka</addrLine>
									<postCode>565-0871</postCode>
									<settlement>Suita</settlement>
									<region>Osaka</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chuan</forename><surname>Xiao</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Osaka University</orgName>
								<address>
									<addrLine>Yamadaoka</addrLine>
									<postCode>565-0871</postCode>
									<settlement>Suita</settlement>
									<region>Osaka</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Makoto</forename><surname>Onizuka</surname></persName>
							<affiliation key="aff0">
								<orgName type="institution">Osaka University</orgName>
								<address>
									<addrLine>Yamadaoka</addrLine>
									<postCode>565-0871</postCode>
									<settlement>Suita</settlement>
									<region>Osaka</region>
									<country key="JP">Japan</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Workload Cost Optimization Using Dynamic Replication in Decentralized Systems</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">551EE289B63BEC029D49DA440ABA439D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>dynamic replication</term>
					<term>decentralized systems</term>
					<term>transaction management</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Data replication plays a crucial role in decentralized systems by enhancing durability and availability. The ADR algorithm is a dynamic replication method that optimizes communication costs by adaptively adjusting the number of replicas. However, it overlooks workload costs, which are critical in real-world applications, leading to suboptimal performance, especially in parallel processing environments. To address this limitation, we propose an enhanced ADR algorithm that incorporates both communication and computational costs. Our method refines the cost model by considering the maximum-cost execution path in update transactions, ensuring a more accurate workload estimation. Additionally, we introduce an improved expansion-contraction test that efficiently optimizes replication placement. Experimental evaluations across various network topologies demonstrate that the proposed method achieves up to 12% higher throughput than the existing ADR algorithm, particularly in read-heavy environments. These results indicate that our approach provides a more balanced and efficient replication strategy, adapting to diverse workload patterns in decentralized systems.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Data replication is a fundamental technique in decentralized systems, where data is replicated and stored across multiple processors. When writing data, transactions synchronize update transactions on the replicas across multiple processors to ensure durability. When reading data, the data can be retrieved from any processor holding the latest version, thereby maintaining consistency.</p><p>The number of processors where the latest data is replicated (we call replication processors) is a critical factor in data replication and can significantly impact system performance; e.g., in systems with read-heavy workloads, if the number of replication processors is small, it leads to frequent data retrieval from remote replication processors, degrading performance. In contrast, in systems with update-heavy workloads, if the number of replication processors is large, it increases the update load and degrades performance. Thus, for read-heavy workloads, the number of replication processors should be large to reduce the number of data retrievals from remote replication processors, while for update-heavy workloads, the number of replication processors should be small to decrease the update loads.</p><p>The optimal number of replication processors typically depends on the frequency of read and update transactions on each processor. In many decentralized systems, system designers must define the number of replication processors statically during the design phase, and then manually adjust it during the production phase <ref type="bibr" target="#b0">[1]</ref>. However, this approach is suboptimal in environments with frequently fluctuating read/update transactions and is also inefficient due to the manual effort required by system designers. To overcome this, dynamic replication techniques are promising in the sense that they adaptively adjust the number of replication Envelope yoshida.ryoga@ist.osaka-u.ac.jp (R. Yoshida); chuanx@ist.osaka-u.ac.jp (C. Xiao); onizuka@ist.osaka-u.ac.jp (M. Onizuka) GLOBE https://sites.google.com/site/chuanxiao1983 (C. Xiao); http: //www-bigdata.ist.osaka-u.ac.jp/professor/onizuka/onizuka_en.html (M. Onizuka) processors.</p><p>Specifically, the ADR algorithm <ref type="bibr" target="#b0">[1]</ref> is one of these dynamic replication techniques. It adaptively changes the number of replication processors according to the read/update transactions by periodically making the expansion and contraction tests. However, it has two issues: (1) it focuses solely on optimizing communication cost and does not consider workload cost, which is more critical in real-world applications, and (2) prioritizes minimizing communication cost, which can lead to longer overall transaction execution times.</p><p>To overcome the above issues, we propose an enhanced ADR algorithm. Specifically, in addition to communication costs, we consider processor computational costs, allowing for a more accurate estimation of overall workload cost. Furthermore, since the execution time of update transactions in parallel environments is determined by the heaviest computational path, we redefine update transaction cost by focusing only on the maximum cost path, rather than summing up the weights of all paths.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Preliminaries</head><p>In decentralized systems, minimizing the workload cost across the entire system is a critical factor. We focus on applications that perform system-wide operations with the objective of reducing overall workload cost. Various dynamic replication algorithms have been proposed <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2,</ref><ref type="bibr" target="#b2">3,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b4">5]</ref>, among which the ADR algorithm <ref type="bibr" target="#b0">[1]</ref> is designed to optimize overall communication cost by adaptively modifying the replication scheme 𝑅. Given its objective, it is considered the most relevant algorithm for achieving the goal of this study.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Replication Scheme</head><p>A replication scheme 𝑅 represents the set of processors that hold the latest replicas and forms a variable-sized "amoeba" that shifts toward the center of the network of read/write (read/update) requests. 𝑅 is created for each data object. When the number of read requests increases, the ADR algorithm expands 𝑅 to reduce the communication cost by responding to read requests from a local processor or nearby processors. In contrast, when the number of write requests increases, the ADR algorithm shrinks 𝑅 to reduce the overhead of updating replicas in 𝑅. Hereafter, the processors included in 𝑅 are referred to as 𝑅 processors.</p><p>For data reading, if the processor where a read request occurs belongs to 𝑅, the processor reads the local replica. If the processor does not belong to 𝑅, the processor sequentially sends read requests to its neighboring processors. Once the request reaches an 𝑅 processor, it returns the replica to the requesting processor. For data writing, the replicas in all 𝑅 processors are updated synchronously by repeatedly sending the data to neighboring processors.</p><p>As an example, consider the communication network shown in Figure <ref type="figure" target="#fig_1">1</ref>. The replication scheme 𝑅 consists of processors 𝑝4 and 𝑝5, depicted in green. When reading data at processor 𝑝1, the nearest 𝑅 processor is processor 𝑝4, from which the data is fetched. When writing data at processor 𝑝1, the update is sequentially sent to 𝑝4 and 𝑝5 via 𝑝2, and the replicas are updated on those processors.</p><p>Under the ADR algorithm, the replication scheme 𝑅 always forms a single connected set of processors. In addition, 𝑅 is created in various object units, such as a tuple, block, or text file. It is guaranteed that when the read-write pattern at each processor -the number of reads and writes issued by each processor -is regular, the replication scheme converges to the optimal configuration, regardless of the initial scheme <ref type="bibr" target="#b0">[1]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Expansion and Contraction</head><p>𝑅 is periodically adjusted through expansion and contraction<ref type="foot" target="#foot_0">1</ref> every fixed period. Expansion occurs in systems with read-intensive workloads, increasing the size of 𝑅 (i.e., adding more processors to 𝑅) to reduce the communication cost between the requesting processor and the 𝑅 processors. In contrast, contraction occurs in systems with writeintensive workloads, decreasing the size of 𝑅 (i.e., removing processors from 𝑅) to reduce the communication cost between the 𝑅 processors. Whether to expand or contract 𝑅 is determined by executing the expansion and contraction tests, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Issues of the ADR Algorithm</head><p>The ADR algorithm focuses solely on optimizing communication cost and does not consider workload cost, which is more critical in real-world applications. As a result, it fails to consider disparities in processing costs among processors or disparities in the execution times of transactions between read and write operations.</p><p>Additionally, in parallel processing environments, there are cases where communication time increases, but trans-action execution time decreases. In such situations, the ADR algorithm prioritizes minimizing communication cost, which can inadvertently prolong overall transaction execution time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Proposed Method</head><p>This section introduces an improved version of the ADR algorithm. As noted earlier, the ADR algorithm optimizes only communication cost, neglecting workload cost, which is more crucial in real-world applications. To overcome this issue, the proposed method modifies the cost function of the ADR algorithm and redefines the optimization equation for a more realistic workload representation.</p><p>Specifically, in addition to communication costs, the proposed method considers processor computational costs, allowing for a more accurate estimation of overall workload cost. Furthermore, since update transaction processing time in parallel execution environments is determined by the heaviest computational path, the proposed method redefines update transaction cost by focusing only on the maximum cost path, rather than summing up the weights of all paths.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Optimization Formula</head><p>We revise the ADR algorithm to optimize the workload cost across the entire system. To optimize the workload cost rather than the communication cost, we extend the objective formula using not only the number of communications but also its associated read/update cost.</p><p>The workload cost and optimization formulation are defined as follows:</p><formula xml:id="formula_0">C workload (𝑅) ∶= ∑ 𝑣∈𝑉 (#U(𝑣, 𝑅) × C u (𝑣, 𝑅) + #F(𝑣, 𝑅) × C r (𝑣, 𝑅)) argmin 𝑅 C workload (𝑅)</formula><p>where 𝑣 denotes a processor in the network, 𝑉 denotes the set of all processors, 𝑅 denotes the replication scheme, #U(𝑣, 𝑅) denotes the number of update transactions, C u (𝑣, 𝑅) denotes the cost of an update transaction, #F(𝑣, 𝑅) denotes the number of fetch transactions, and C r (𝑣, 𝑅) denotes the cost of a read transaction. The goal of the optimization formula is to find the replication scheme 𝑅 that minimizes the workload cost. In practice, 𝑅 is gradually adjusted to progressively reduce the workload cost as much as possible.</p><p>In the proposed method, #F(𝑣, 𝑅), C r (𝑣, 𝑅), and C u (𝑣, 𝑅) are defined as follows:</p><formula xml:id="formula_1">C u (𝑣, 𝑅) ∶= L u (𝑣, 𝑅) #F(𝑣, 𝑅) ∶= #R(𝑣, 𝑅)𝛽(𝑣, 𝑅) C r (𝑣, 𝑅) ∶= L r (𝑣, 𝑅)</formula><p>where L u (𝑣, 𝑅) is the distance to the farthest 𝑅 processor from processor 𝑣, 𝛽(𝑣, 𝑅) is the cache miss rate, and L r (𝑣, 𝑅) is the distance from the nearest 𝑅 processor to processor 𝑣.</p><p>When a cache hit occurs, no fetch operation is triggered, allowing local reads with zero cost. Therefore, the workload cost at a processor considers only the read costs incurred by fetch operations, which is determined by multiplying the number of read transactions #R(𝑣, 𝑅) by the cache miss rate 𝛽(𝑣, 𝑅), resulting in #F(𝑣, 𝑅).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Expansion-Contraction Test</head><p>The workload cost is optimized through an expansioncontraction test, which is executed after every 𝑘 successfully completed transactions. In the expansion-contraction test, for each expansion-contraction pattern, the system determines whether to expand expandable processors or shrink shrinkable processors by calculating the differential workload cost.</p><p>Similar to the ADR algorithm, each expansioncontraction test restricts operations such as expanding or contracting beyond one hop and forming a discontinuous 𝑅. These restrictions are imposed due to computational complexity concerns and the potential excessive fluctuation in #F(𝑣) and #U(𝑣) before and after expansion-contraction.</p><p>Next, we explain the method for computing 𝛿(𝑅), the optimal expansion-contraction pattern set that minimizes the differential workload cost. 𝛿(𝑅) is calculated as follows:</p><formula xml:id="formula_2">𝛿(𝑅) = argmin 𝐸 𝑖 ⊆𝐸,𝐶 𝑗 ⊆𝐶 C workload (𝑅 𝐸 𝑖 ,𝐶 𝑗 ) − C workload (𝑅) = argmin 𝐸 𝑖 ⊆𝐸,𝐶 𝑗 ⊆𝐶 ∑ 𝑣∈𝑉 #U(𝑣, 𝑅 𝐸 𝑖 ,𝐶 𝑗 ) × L u (𝑣, 𝑅 𝐸 𝑖 ,𝐶 𝑗 ) + ∑ 𝑣∈𝑉 #F(𝑣, 𝑅) × Δ 𝐸 𝑖 ,𝐶 𝑗 L r (𝑣, 𝑅)) − ∑ 𝑣∈𝑉 #U(𝑣, 𝑅) × L u (𝑣, 𝑅)<label>(1)</label></formula><p>where 𝐸 denotes the set of expandable processors, and 𝐶 represents the set of contractible processors. 𝑅 𝐸 𝑖 ,𝐶 𝑗 denotes the replication scheme after expanding processors 𝐸 𝑖 and contracting processors 𝐶 𝑗 . Δ 𝐸 𝑖 ,𝐶 𝑗 𝑓 (𝑅) denotes 𝑓 (𝑅 𝐸 𝑖 ,𝐶 𝑗 ) − 𝑓 (𝑅). Here, assuming that Δ 𝐸 𝑖 ,𝐶 𝑗 𝑅 = 𝑅 𝐸 𝑖 ,𝐶 𝑗 − 𝑅 is sufficiently small, it is approximated that #U(𝑣, 𝑅) = #U(𝑣, 𝑅 𝐸 𝑖 ,𝐶 𝑗 ) and #F(𝑣, 𝑅) = #F(𝑣, 𝑅 𝐸 𝑖 ,𝐶 𝑗 ).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Reduction of the Search Space</head><p>Equation 1 requires evaluating all possible patterns, where each expandable processor can either be expanded or not, leading to 2 |𝐸| possibilities, and each contractible processor can either be contracted or not, leading to 2 |𝐶| possibilities. A straightforward computation results in an exponential search space of 𝑂(2 |𝐸|+|𝐶| ), which is impractical for scalability. Thus, reducing the search space is necessary. Figure <ref type="figure" target="#fig_3">2</ref> illustrates the concept of reducing the search space (processors and nodes are treated as equivalent in this figure). For simplicity, assume that updates originate only from the center processor of the 𝑅-tree (we call 𝑅-center processor) and that all processor-to-processor distances are 1. After expansion-contraction, processors can be grouped based on their distance from the 𝑅-center processor, referred to as the maximum 𝑅-center distance in this paper. Graphs with identical maximum 𝑅-center distances exhibit the same update costs, making total cost dependent solely on read operations. Since read costs decrease as 𝑅 expands, only the case with the largest 𝑅 set within each group needs to be considered. Thus, the number of such groups determines the search space, which corresponds to the possible values of the maximum 𝑅-center distance after expansion-contraction, resulting in a complexity of 𝑂(|𝐸| + |𝐶|).</p><p>Only 𝑅-leaf processors can become maximum 𝑅-center processors after expansion-contraction. The number of possible types of 𝑅-leaf processors after expansion-contraction consists of the original 𝑅-leaf processors (|𝐶|), newly expanded 𝑅-leaf processors (|𝐸|), and processors that became The one with the most is optimal</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Enumerate of expansion-contraction pattern</head><p>-center node 1 2</p><p>Assume that updates occur only from the node 𝑅-leaf processors due to contraction (|𝐶|). Since all preexpansion 𝑅-leaf processors must be contractible, are exactly |𝐶| such processors. Consequently, the worst-case search space is 𝑂(|𝐶| + |𝐸| + |𝐶|) = 𝑂(|𝐸| + |𝐶|). In Figure <ref type="figure" target="#fig_3">2</ref>, there are only two groups based on the maximum 𝑅-center distance: 1 and 2, meaning that only these two groups need to be considered for optimal expansion-contraction patterns. However, in real scenarios, updates originate from multiple processors, not just the center processor. In such cases, even if the maximum 𝑅-center distance remains unchanged, the 𝑅-eccentric distance (the maximum shortest distance from an 𝑅 processor to any other 𝑅 processor) may vary, requiring separate calculations. By treating these separately, the search space is proven (proof omitted) to be 𝑂((|𝐸| + |𝐶|)<ref type="foot" target="#foot_1">2</ref> |𝑁 𝑅 (𝜎 𝑅 )|) where 𝑁 𝑅 (𝜎 𝑅 ) denotes neighboring 𝑅 processors of the 𝑅-center processor. Additionally, using tree dynamic programming (DP) and the sliding window technique, the expansion-contraction test can be computed in</p><formula xml:id="formula_3">𝑂(|𝑉 | + |𝑁 𝑅 (𝜎 𝑅 )|(|𝐸| + |𝐶|) log(|𝐸| + |𝐶|)) time.</formula></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Experiments</head><p>This section presents experimental evaluations comparing the proposed method 2 with the ADR algorithm across three characteristic topologies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Experimental Setup</head><p>We conducted the experiments on an EC2 m5.16xlarge instance using Dejima <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b7">8,</ref><ref type="bibr" target="#b8">9,</ref><ref type="bibr" target="#b9">10]</ref>. Dejima is a decentralized data management system designed for flexible data integration at the database level with global consistency. Each processor was represented by deploying multiple Docker containers on a single machine. For concurrency control, the Two-Phase Locking (2PL) protocol <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13</ref>] was adopted. The evaluation criterion is throughput. Throughput was calculated by dividing the total number of successful transactions (reads and updates) executed across all processors by the execution time of 300 seconds. Additionally, the throughput was measured after the replication scheme 𝑅 had converged and stabilized. The replication scheme 𝑅 was created at the record level to minimize expansion cost. The expansion-contraction test was triggered every 𝑘 = 5 transactions. The topologies used in the experiments are shown in Figure <ref type="figure" target="#fig_4">3</ref>. The numbers in parentheses indicate the number of processors (nodes) in each topology.</p><p>We consider two types of transactions: (1) Update that modifies a column in a record, and (2) Read that reads all columns of a record. The table structure, update method, and read method in the RDBMS adhered to the YCSB <ref type="bibr" target="#b13">[14]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Star(4)</head><p>Line <ref type="bibr" target="#b8">(9)</ref> General( <ref type="formula">10</ref>)  In this experiment, 100 records were inserted into each processor as initial records, and these records were propagated across the entire system. For example, in General <ref type="bibr" target="#b9">(10)</ref>, 100 records are initially inserted into each processor, resulting in a total of 1,000 records.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Experimental Results</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.1.">Star Topology</head><p>The experimental results for the star topology are shown in Table <ref type="table" target="#tab_0">1</ref> and Table <ref type="table" target="#tab_1">2</ref>. Table <ref type="table" target="#tab_0">1</ref> compares the case where the replication scheme 𝑅 is minimized, meaning only the center processor is part of 𝑅, and the case where 𝑅 is maximized, meaning updates are propagated to all processors. Table <ref type="table" target="#tab_1">2</ref> shows the results of the existing method (the ADR algorithm) and the proposed method, along with their ratio (relative throughput).</p><p>For Star(4), as shown in Table <ref type="table" target="#tab_0">1</ref>, as the read ratio increases, max |𝑅| achieves higher throughput than min |𝑅|, with the performance gap widening at higher read ratios. As the proportion of read transactions increases, their impact becomes greater than that of update transactions, making it more effective to expand 𝑅 to reduce workload cost.</p><p>A comparison of the existing method and the proposed method in Star(4) is shown in Table <ref type="table" target="#tab_1">2</ref>. In the existing method, performance remains stable when the read ratio is low but degrades significantly as the read ratio increases. This is because the existing method tends to overestimate update costs in parallel processing environments, leading to an unnecessarily small |𝑅| and performance degradation in readheavy environments. This overestimation occurs because the existing method only considers communication cost, ignoring cases where updates can be executed concurrently without increasing execution time.</p><p>In contrast, the proposed method mitigates performance degradation due to its consideration of parallel execution costs. However, a slight performance decline was still observed, possibly due to the small 𝑘 value, which affects the accuracy of statistical data in the expansion-contraction test. Increasing 𝑘 could improve the accuracy and lead to better performance. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.2.">Linear and General Topologies</head><p>The experimental results for Linear <ref type="bibr" target="#b8">(9)</ref> and General <ref type="bibr" target="#b9">(10)</ref> topologies are shown in Table <ref type="table" target="#tab_2">3</ref> and Table <ref type="table" target="#tab_3">4</ref>.</p><p>In Linear <ref type="bibr" target="#b8">(9)</ref>, as shown in Table <ref type="table" target="#tab_2">3</ref>, as the read ratio increases, the throughput gap between min |𝑅| and max |𝑅| widens, favoring min |𝑅|. This is because a lower read ratio results in a higher proportion of update transactions, making a smaller |𝑅| more advantageous. Table <ref type="table" target="#tab_3">4</ref> shows that both methods achieve performance close to the optimal min |𝑅| case, demonstrating successful optimization. The reason for the lack of a significant difference between the two methods is that, unlike Star topology, Linear topology has a lower degree of parallelism, which reduces the performance gap between the methods.</p><p>In General <ref type="bibr" target="#b9">(10)</ref>, similar to Linear(9), as shown in Table <ref type="table" target="#tab_2">3</ref>, as the read ratio increases, the throughput gap between min |𝑅| and max |𝑅| widens, favoring min |𝑅|. Additionally, at a 10% read ratio, Table <ref type="table" target="#tab_3">4</ref> shows that both methods achieve optimal values with no notable difference. At 50% and 90% read ratios, the proposed method achieves higher throughput than the existing method. This result, as in Star topology, is attributed to the consideration of parallel computation and per-processor costs. At a 90% read ratio, the existing method underperforms both min |𝑅| and max |𝑅|, while the proposed method surpasses both. This indicates that the ADR algorithm sometimes converges to a worse solution than either min |𝑅| or max |𝑅|, whereas the proposed method has the potential to reach an optimal solution that is neither the smallest nor the largest |𝑅|.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>DOLAP 2025: 27th International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, co-located with EDBT/ICDT 2025, March 25, 2025, Barcelona, Spain * Corresponding author.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Replication scheme example, which consists of processors 𝑝4 and 𝑝5, depicted in green.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Illustration of search space reduction in expansioncontraction tests.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Topologies used in the experiments.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Comparison of throughput between min |𝑅| and max |𝑅| in Star topology. When generating the initial records for each table, record insertions into the base table of each processor were propagated to multiple processors via Dejima's data-sharing mechanism.</figDesc><table><row><cell></cell><cell></cell><cell>Star(4)</cell><cell></cell></row><row><cell>Read ratio</cell><cell>10</cell><cell>50</cell><cell>90</cell></row><row><cell>min |𝑅|</cell><cell cols="2">41.5 79.5</cell><cell>242.8</cell></row><row><cell>max |𝑅|</cell><cell>40.1</cell><cell cols="2">71.7 327.4</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Comparison of throughput between the existing method and the proposed method in Star topology.</figDesc><table><row><cell></cell><cell></cell><cell>Star(4)</cell><cell></cell></row><row><cell>Read ratio</cell><cell>10</cell><cell>50</cell><cell>90</cell></row><row><cell>Existing</cell><cell cols="2">41.8 77.2</cell><cell>297.4</cell></row><row><cell>Proposed</cell><cell cols="3">41.4 76.5 324.6</cell></row><row><cell>Ratio</cell><cell>-1%</cell><cell>-1%</cell><cell>+9%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Comparison of throughput between min |𝑅| and max |𝑅| in Linear and General topologies.</figDesc><table><row><cell></cell><cell></cell><cell>Line(9)</cell><cell></cell><cell></cell><cell>General(10)</cell><cell></cell></row><row><cell>Read ratio</cell><cell>10</cell><cell>50</cell><cell>90</cell><cell>10</cell><cell>50</cell><cell>90</cell></row><row><cell>min |𝑅|</cell><cell cols="3">57.9 99.4 298.9</cell><cell cols="3">38.0 76.8 279.8</cell></row><row><cell>max |𝑅|</cell><cell>34.2</cell><cell cols="2">62.5 286.5</cell><cell>32.8</cell><cell cols="2">59.5 284.5</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4</head><label>4</label><figDesc>Comparison of throughput between the existing method and the proposed method in Linear and General topologies.</figDesc><table><row><cell></cell><cell></cell><cell>Line(9)</cell><cell></cell><cell></cell><cell cols="2">General(10)</cell></row><row><cell>Read ratio</cell><cell>10</cell><cell>50</cell><cell>90</cell><cell>10</cell><cell>50</cell><cell>90</cell></row><row><cell>Existing</cell><cell>53.7</cell><cell cols="2">94.5 291.0</cell><cell>37.5</cell><cell>68.0</cell><cell>255.1</cell></row><row><cell>Proposed</cell><cell>54.4</cell><cell cols="2">95.2 293.6</cell><cell>37.8</cell><cell cols="2">74.9 286.1</cell></row><row><cell>Ratio</cell><cell cols="2">+1% +1%</cell><cell>+1%</cell><cell cols="2">+1% +10%</cell><cell>+12%</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">There is also an operation called "Switch". However, since the main algorithm consists primarily of expansion and contraction, it is omitted here for simplicity.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">source code is available at: https://github.com/OnizukaLab/dejimadynamic-replication</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgements</head><p>This work is supported by JSPS Kakenhi JP23K17456, JP23K25157, JP23K28096, and JST CREST JPMJCR22M2.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">An adaptive data replication algorithm</title>
		<author>
			<persName><forename type="first">O</forename><surname>Wolfson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Jajodia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Trans. Database Syst</title>
		<imprint>
			<biblScope unit="volume">22</biblScope>
			<biblScope unit="page" from="255" to="314" />
			<date type="published" when="1997">1997</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Modeling a dynamic data replication strategy to increase system availability in cloud computing environments</title>
		<author>
			<persName><forename type="first">D.-W</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G.-R</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-Z</forename><surname>Jin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X.-W</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Computer Science and Technology</title>
		<imprint>
			<biblScope unit="page" from="256" to="272" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Cdrm: A cost-effective dynamic replication management scheme for cloud storage cluster</title>
		<author>
			<persName><forename type="first">Q</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Veeravalli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zeng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Feng</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE International Conference on Cluster Computing</title>
				<imprint>
			<date type="published" when="2010">2010. 2010</date>
			<biblScope unit="page" from="188" to="196" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">A novel cost-effective dynamic data replication strategy for reliability in cloud data centres</title>
		<author>
			<persName><forename type="first">W</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yuan</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing</title>
				<imprint>
			<date type="published" when="2011">2011. 2011</date>
			<biblScope unit="page" from="496" to="502" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Qos-aware data replication for data-intensive applications in cloud computing systems</title>
		<author>
			<persName><forename type="first">J.-W</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C.-H</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Chang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Cloud Computing</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="101" to="115" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">O</forename><surname>Lab</surname></persName>
		</author>
		<ptr target="https://github.com/OnizukaLab/dejima-prototype" />
		<title level="m">Dejima architecture</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><forename type="first">Y</forename><surname>Asano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Hidaka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ishihara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ko</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nakano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Onizuka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sasaki</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Shimizu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Tran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Tsushima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yoshikawa</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1809.10357</idno>
		<title level="m">Making view update strategies programmable -toward controlling and sharing distributed data</title>
				<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Controlling and sharing distributed data for implementing service alliance</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Asano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ishihara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Onizuka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yoshikawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BigComp, IEEE</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Flexible framework for data integration and update propagation: System aspect</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Asano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Herr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Ishihara</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Kato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Nakano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Onizuka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sasaki</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">BigComp, IEEE</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Bidirectional collaborative data management</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Onizuka</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Yoshikawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Bidirectional Collaborative Data Management: Collaboration Frameworks for Decentralized Systems</title>
				<imprint>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="63" to="119" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Bernstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Hadzilacos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goodman</surname></persName>
		</author>
		<title level="m">Concurrency Control and Recovery in Database Systems</title>
				<imprint>
			<publisher>Addison-Wesley</publisher>
			<date type="published" when="1987">1987</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">The Theory of Database Concurrency Control</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Papadimitriou</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1986">1986</date>
			<publisher>Computer Science Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">The notions of consistency and predicate locks in a database system</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">P</forename><surname>Eswaran</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">A</forename><surname>Lorie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">L</forename><surname>Traiger</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Commun. ACM</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<biblScope unit="page" from="624" to="633" />
			<date type="published" when="1976">1976</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Benchmarking cloud serving systems with ycsb</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">F</forename><surname>Cooper</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Silberstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Tam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ramakrishnan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sears</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC &apos;10</title>
				<meeting>the 1st ACM Symposium on Cloud Computing, SoCC &apos;10<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2010">2010</date>
			<biblScope unit="page" from="143" to="154" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
