Supporting Insertion in Encrypted Multi-Maps with Volume Hiding using a Trusted Execution Environment Shunta Ishihara Chiemi Watanabe Toshiyuki Amagasa University of Tsukuba Tsukuba University of Technology University of Tsukuba Tsukuba, Japan Tsukuba, Japan Tsukuba, Japan ishihara@kde.cs.tsukuba.ac.jp chiemi@a.tsukuba-tech.ac.jp amagasa@cs.tsukuba.ac.jp ABSTRACT However, it has been pointed out that simply encrypting the There is a new type of threat to encrypted databases, called “vol- data cannot ensure data privacy and could cause potential se- ume leakage,” which causes the volume of data associated with curity threats. One such security threat is volume leakage. This some keys in a multi-map to be leaked. To counter this threat, security threat involves an adversary obtaining the size (or vol- several methods for encrypting multi-maps to secure the volume ume) of the query result, which in turn could reveal information of data they contain have been proposed. One such method sup- related to the database and/or the query. For example, Kellaris ports the insertion of data into encrypted multi-maps. However, it et al. [16] and Grubbs et al. [10] showed that it is possible for suffers from inefficiency caused by its protocol, which stipulates an adversary to reconstruct the histogram of key values in the that all data be sent from the server to the client to perturb the database only by using the volume. In addition, Grubbs et al. [10] insertion position. For this reason, it is not practical for handling reported that the distribution of queries can be inferred from the large amounts of data. To address this problem, we propose an reconstructed histogram with high accuracy. improved method for encrypted multi-maps that supports data To solve this problem, Sarvar et al. [22] proposed a method insertion. The proposed method exploits the trusted execution for volume hiding in an encrypted multi-map. However, their environment (TEE) supported by modern processors, e.g., Intel proposed method assumes that the database is stable, and no SGX, for executing perturbation. TEE is a secure area of a main updates are allowed, which limits its application domain. To allow processor that guarantees the confidentiality and integrity of data insertion in an encrypted multi-map with volume hiding, the code and data inside it. Thus, we can secure the volume and we previously proposed combining local differential privacy and insertion position in an encrypted multi-map without conduct- randomized response to enable volume hiding to support data ing costly perturbation on the client-side. We also overcome the insertion [12]. However, that method suffers from slow insertion challenge of the capacity of the TEE being typically limited and execution due to the protocol used, which stipulates that all data too small to accommodate the multi-map itself. Specifically, we be sent to the client for perturbation of the insertion location. divide the hash table of the multi-map into partitions so that they Consequently, it is not practical for large databases. can be loaded into the TEE and apply perturbation within each In the meantime, recent processors are being developed with partition. We provide a security analysis of the proposed method, a trusted execution environment (TEE) [8] to meet the growing w.r.t. the volume hiding and the insertion position hiding. Further, demands for secure computation, e.g., Intel SGX [4]. A TEE is an we provide the results of experiments conducted to evaluate the isolated execution environment in a processor that offers protec- feasibility of potential applications in terms of the processing tion over the code and data inside, ensuring confidentiality and time and the amount of noise. integrity. In this paper, we propose an encrypted multi-map that supports data insertion using a TEE. The idea is to securely per- KEYWORDS turb the insertion position using TEE, thereby achieving volume hiding without sending the data from the server to the client. One encrypted multi-map, privacy, secure database, Intel SGX, cuckoo of the challenges of handling large data using a TEE is that it typ- hashing, homomorphic encryption ically offers limited memory space (e.g., 96MB in Intel SGX). To solve this problem, we divide the data into smaller blocks accord- ing to the available memory in the TEE and apply perturbation 1 INTRODUCTION within the block where insertion occurs. Cloud database services are becoming increasingly popular ow- To verify the utility of the proposed method, we conducted ing to their many advantages, such as reduced costs in terms of a security analysis concerning hiding the volume and insertion server management, deployment, and operation. Consequently, position. We also experimentally evaluated its feasibility in poten- the number of cloud database services and the volume of data tial applications in terms of the execution time of insertion and stored in such databases have been drastically increasing. In a the amount of noise necessary to hide the volume and insertion cloud database service, the cloud service provider is responsible position. for database management, which may lead to a security issue In this work, our target is healthcare data management in a because cloud service providers are not always trustworthy; i.e., cloud database as the use case. It is common in healthcare appli- they may be curious about the private data and could leak the cations to share medical records about patients among multiple data [1]. To address this issue, many researchers have proposed medical institutions, and leaking the number of patients may methods for encrypted databases to realize secure cloud database cause unintended privacy disclosure. Similarly, when inserting a services [7, 11]. new patient record, it is necessary to hide the insertion position because it may cause unintended disclosure of privacy, such as the type of disease. © Copyright 2022 for this paper by its author(s). Use permitted under Creative The main contributions of this study are as follows: Commons License Attribution 4.0 International (CC BY 4.0) • We propose a method that allows data insertion into an encrypted multi-map while ensuring that the multi-map remains deferentially private [9] We use dummy insertion with dummy keys and entries to perturb the insertion position. More precisely, for each insertion re- quest, we generate dummy keys based on the randomized response, allowing us to achieve local differential privacy. • We propose a scheme that exploits the TEE to perturb data securely. To deal with large amounts of data that cannot be directly loaded into the TEE, we partition the database (multi-map) into smaller blocks so that we can Figure 1: The “mendacity operation” in cuckoo hashing load each block into the TEE and perturb the insertion location within each block. • We evaluate the security of our proposed method. We use the randomized response mechanism of local dif- By creating the hash table in this manner, it is guaranteed that ferential privacy (LDP) [15] to hide the volume of data, the value 𝑥 is always in either 𝑇1 [ℎ 1 (𝑥)] or 𝑇2 [ℎ 2 (𝑥)], ensuring and evaluate its security by finding the privacy budget that it can be searched for with a time complexity of 𝑂 (1). 𝜖 of differential privacy. Furthermore, we update multi- There is a possibility that the mendacity operation may take a ple insertion positions via the mendacity operation based very long time, or not be finished if we are unlucky. To address on cuckoo hashing [21] and evaluate the security of this this issue, Kirsch, Mitzenmacher and Wieder proposed stash- operation in a TEE. based cuckoo hashing [18]. Stash-based cuckoo hashing avoids • We provide experimental results that verify the feasi- the above problem by using the stash. Any value that is not bility of the proposed scheme using a TEE, Intel SGX. inserted into the table while iterating over the threshold level is We compared the performance of the proposed scheme inserted into the stash. and non-secure baselines in terms of the insertion exe- cution time and the memory occupation time. We also 2.2 Trusted Execution Environment (TEE) compared the required noise when inserting data under Recent processors support a trusted execution environment (TEE) [8] different privacy mechanisms, i.e., DP and LDP. Based on to meet the growing demands of secure computation. A TEE is the experimental results, we show our method is useful an isolated execution environment in a processor that offers pro- for the specified use cases. The experimental results show tection for the code and data inside, ensuring confidentiality and that the method is useful. integrity. The remainder of this paper is organized as follows: Section 2 Specifically, 6th generation (and higher) Intel CPUs support briefly overviews some preliminaries for this work. In Section 3, a TEE called SGX [4]. SGX allows us to create a small (e.g., 96 we give an overview of existing methods. In Section 4, we discuss MB) protected memory area called the enclave that is isolated the proposed method for insertion on the server and the improve- from the rest of the system. Hence, we can run programs that ment of volume hiding and also evaluate its security capability. are protected from the OS (which is controlled by a third party) In Section 6, we report on the experiments conducted to evaluate and numerous applications/system-level attacks. the insertion operation of our method for a large database. Fi- The existing SGX is vulnerable to side-channel attacks; e.g., nally, Section 7 concludes this paper and outlines possible future cache-lines [19], branch execution [25], and page-table access [26]. work. However, the T-SGX [23] and Sanctum [5] systems have evolved to overcome such attacks, and it is believed that future versions 2 PRELIMINARIES of SGX will be resilient to those attacks. In this paper, we do not consider side-channel attacks. 2.1 Cuckoo hashing and its properties In both this study and our preceding work [12, 22], we utilize 3 EXISTING WORKS cuckoo hashing [21]. Therefore, we will briefly explain it and In this section, we introduce several related works as follow- its properties. Cuckoo hashing utilizes two arrays and two hash ing. In subsection 3.1, we introduce various privacy leakages. functions and is known to be robust. Let us consider inserting a In subsection 3.2, we introduce several works of updating for key using cuckoo hashing. If the hash value collides in one of the encrypted databases. In subsection 3.3, we introduce the work of arrays, the algorithm moves the already stored entry to the other Sarvar et al. [22] on which our work is based. In subsection 3.4, array using the other hash function and inserts a new entry into we introduce our previous work. In subsection 3.5, we discuss the vacancy. In this paper, We call the above property of cuckoo the problems of our previous work. hashing the mendacity operation. Figure 1 shows the use of two hash functions (ℎ 1 and ℎ 2 ) and two arrays (𝑇1 and 𝑇2 ) in inserting the entry x. 3.1 Various privacy leakages First, the algorithm checks whether the position ℎ 1 (𝑥) in 𝑇1 is Many existing encrypted databases focus on keeping only the empty. If the position is empty, it inserts 𝑥 into 𝑇1 at ℎ 1 (𝑥) and content of the data secret. However, it has been reported that the completes the operation; otherwise, (as shown in Figure 1), it existing encrypted databases are not secure enough. Liu et al. [20] removes the existing value 𝑦 and inserts the value 𝑥 instead. For and Islam et al. [13] proposed an access pattern attack method. the removed value 𝑦, it checks the vacancy status of ℎ 2 (𝑦) in 𝑇2 , The access pattern attack is an attack technique that identifies and inserts 𝑦 in 𝑇2 at ℎ 2 (𝑦) if it is empty; otherwise, it recursively identical queries using data search logs and access frequency. It processes the same operation. is prevented by the use of Oblivious RAM (ORAM). ORAM is a technique to hide the access pattern by changing the storage loca- 3.3 Sarvar’s work tion of the encrypted data each time they are accessed. By using Sarvar et al. [22] proposed a method that preserves privacy in ORAM, an attacker cannot know which data has been accessed, multi-maps while achieving volume hiding. In their method, they the frequency of searches, or even the relationships between the exploit differential privacy (DP) to hide the volume of data and data. Kellaris et al. [16] and Gurbbs et al. [10] proposed the attack reduce the communication cost. method using the volume of search results. They proposed an Specifically, they originally added dummy entries when ap- attack method that can reconstruct the distribution of data in a plying DP, which, however, increases storage costs. To address database and queries by observing the response size (volume) to this problem, they reduce the server’s storage size using cuckoo a sequence of search queries. There are several types of volume hashing so that they do not always add dummy entries when attacks, such as attacks on encrypted multimaps and attacks on inserting a new entry. Let us consider inserting multiple values range queries. In this paper, we deal with attacks on encrypted with the same key 𝑘. Instead of inserting values with the same multimaps, and in the following, we introduce a method for hid- key, they generate (surrogate) keys, i.e., 𝑘 + 1, 𝑘 + 2, . . ., and in- ing volume leakage in encrypted multimaps. The naive method sert the key–value pairs at different hash addresses in the tables of volume hiding is to use padding so that the number of data according to the generated keys, thereby avoiding the insertion in all keys is equal. However, it has a large overhead in terms of of multiple values with the same key. Consequently, they do not both storage and communication costs. Kamara et al. [14] and need to insert dummy entries to achieve DP. Sarvar et al. [22] proposed a volume hiding method that reduces However they need to maintain the number of stored values storage and communication costs compared to naive methods. In for each key. To achieve this, they introduce a count table to particular, since this paper is based on Sarvar et al.’s work [22], maintain the volume of each key. Because it maintains the volume we introduce their method in detail in the subsection 3.3. information, they apply DP only on the count table, reducing the extra storage cost of dummy entries. When searching the multi-map, given a search key 𝑘, they first interrogate the count table to get the volume of 𝑘, and generate 3.2 Updating for EDBs search keys, 𝑘 + 1, 𝑘 + 2, etc. Then, they interrogate the hash table to get the results. Note that the results may contain false In this subsection, we introduce several works of updating for positives owing to conflicts in the hash tables, which could be encrypted databases. Chenghong et al. [24] proposed a frame- regarded as dummy results for protecting the volume of query work for an encrypted database that prevents privacy from being results. leaked depending on the time of updating. The privacy leakage Figure 2 illustrates an example of Sarvar’s method. In this due to update time is caused, for example, by event data updates example, there are two values associated with the key key1. In from smart sensors in the building (security cameras, smart light this case, the hash address for the first value (𝑥 1 ) with the key bulbs, Wi-Fi access points, etc.). Even if each event data is en- key1 is calculated as ℎ 1 (𝑘𝑒𝑦1 + 1) or ℎ 2 (𝑘𝑒𝑦1 + 1), followed by crypted to protect the privacy of the people in the building, the ℎ 1 (𝑘𝑒𝑦1 + 2) or ℎ 2 (𝑘𝑒𝑦1 + 2), etc. The volume of each key is IoT provider can know the private information about the activi- maintained in the count table, where the values are protected ties in the building from the the event occurrence time without by applying DP. When retrieving values associated with a (user- decrypting them. To prevent such privacy leakage, it is necessary specified) query key 𝑘𝑒𝑦1, the system first gets its volume (4) and to decouple the relationship between event occurrence time and generates search keys (𝑘𝑒𝑦1 + 1, 𝑘𝑒𝑦1 + 2, 𝑘𝑒𝑦1 + 3, and 𝑘𝑒𝑦1 + 4). update time. The easiest way to prevent privacy is to not upload Then, it retrieves the values by accessing the hash addresses the data, in which case data analysts will not be able to utilize computed by the generated keys and the hash functions (ℎ 1 and the data. The next possible solution is to update the data every ℎ 2 ). Note that the results may contain false positives caused by unit of time, regardless of the occurrence of events. However, hash collisions (e.g., ℎ 2 (𝑘𝑒𝑦1 + 2) retrieves (𝑘𝑒𝑦2, 𝑥 2 )), which when events occur infrequently, most updates are dummies, and can be seen as dummy entries that can be filtered out on the the provider has the problem of wasting resources for unneces- client-side. sary computations. Chenghong et al. proposed a framework that One major limitation of Sarvar’s method is that they do not provides a guarantee of differential privacy in a single update for support any updates on the multi-map. the update time problem, and can be arbitrarily customized by the user by changing parameters for the three trade-off issues of privacy, data accuracy, and processing performance. 3.4 Our previous method Natacha et al. [6] proposed a key-value store that achieves ORAM-based access pattern confidentiality while providing ACID To address the limitation of Sarvar’s method (i.e., not supporting transactions, which are required in many applications. Existing updates), we previously proposed an extension that enables it ORAMs cannot support transactions for the following two rea- to support data insertion [12] while preventing volume leakage. sons. The first is that ORAM is not fault-tolerant. The Second Specifically, our method supports data insertion while maintain- is that ORAM has limited or no support for concurrency. They ing the multi-map as being deferentially private by using random- delay transaction commits until the end of a fixed-size epoch ized response based on local differential privacy (LDP). It hides and buffer their execution in a reliable proxy to enhance con- the position of past insertions using the mendacity operation, a sistency and durability on a per-epoch basis. Since only the last feature of cuckoo hashing. To protect related operations from value of the key changed during an epoch is written back to the inside attacks (e.g., by the server administrator), we perform the ORAM, the number of ORAM operations required to commit a operation to hide the insertion positions on the client. transaction can be reduced, thus reducing the cost of CPU and More specifically, when inserting a key–value pair, we apply a bandwidth amortization without increasing contention. randomized response, which is a mechanism for achieving LDP, to maintain the differential privacy of the multi-map. The key is Furthermore, when inserting a new value into the multi-map, we must hide the insertion position so that the server adminis- trator cannot infer the updated data from the index in the tables. We use the mendacity operation in cuckoo hashing to hide the in- sertion position. The mendacity operation is a recursive process that moves existing values to the other side of the hash tables until the values are all stored in the hash tables. As a result, one insertion may cause multiple updates in different positions in the tables, which is useful for hiding the insertion position. If an insertion causes only a few mendacity operations, we insert dummy entries to hide the insertion position. Figure 2: Example illustrating Sarvar’s method based on cuckoo hashing. chosen based on the following formula: ( 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 with probability 𝑝 𝑘𝑒𝑦 = (1) 𝑘𝑒𝑦𝑑𝑢𝑚𝑚𝑦 with probability (1 − 𝑝) Figure 4: Inserting a data record whose key is 𝑘𝑒𝑦1. where 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 and 𝑘𝑒𝑦𝑑𝑢𝑚𝑚𝑦 are the keys of the non-dummy and dummy data, respectively. We iterate the randomized response Figure 4 shows the process flow for updating the table when until 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 is selected. (𝑘𝑒𝑦1, 𝑥 3 ) is inserted. First, the algorithm attempts to store the We do not use the generated keys for insertion but rather value at position ℎ 1 (𝑘𝑒𝑦1 + 3) in table 𝑇1 . Then, based on the for updating the count table. We also use it as done in Sarvar’s mendacity operation, the originally stored data are moved to method [22] to maintain the number of values (volume) associ- the other table, which results in (𝑘𝑒𝑦3, 𝑥 1 ) at ℎ 1 (𝑘𝑒𝑦1 + 3) being ated with each key. Note that the volume count includes both removed and stored at ℎ 2 (𝑘𝑒𝑦3 + 1) in table 𝑇2 . non-dummy and dummy data to guarantee differential privacy. In addition, by adding dummy data at position ℎ 1 (𝑘𝑒𝑦1 + 1), This process is performed on ciphertexts using additive homo- we intentionally increase the number of updates. We randomly morphic encryption. choose the position of the dummy entries. Note that if there are Figure 3 shows an example of updating the count table by in- data that we cannot store in the hash tables according to the serting key1. Note that key3 and key4 are selected as randomized mendacity operation, we store them in the stash on the client. responses. As can be seen, the volume of the dummy key is also updated at the same time so that we can prevent inferring of the 3.5 Drawbacks update on 𝑘𝑒𝑦1 from the difference between the volumes before Figure 5 gives an overview of our previous method, with an exam- and after the insertion. As the randomized response guarantees ple of inserting a new value into key1, depicting the mendacity LDP, privacy is maintained even after data insertion. operation for perturbing the updating positions. If the opera- tions are performed on the server, private information may be leaked from the operation history to inside attackers, such as system administrators. For this reason, in our previous method, we send the whole index back to the client to undergo mendacity operations to protect privacy. This incurs a high cost in terms of network traffic, particularly when the multi-map is large, and makes the process slow. This may cause several problems in real Figure 3: Updating volume when inserting a value with applications. One such problem is that the processing perfor- 𝑘𝑒𝑦1. mance heavily depends on the client’s performance, and in some cases, the client may not complete the process owing to insuffi- cient resources, such as memory. Another problem arises when Next, we discuss how to find the privacy budget 𝜖 of LDP. processing concurrent requests from two or more clients. When From the derivation of 𝜖 in the randomized response, it is found updating the multi-map, the map needs to be sent to the client 𝑝 using 𝑙𝑛( 1−𝑝 ) because and also locked until the update is completed, which significantly degrades the performance under concurrent access. 𝑄 (𝑘𝑒𝑦𝑖𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 ) 𝑝 = 𝑄 (𝑘𝑒𝑦𝑖𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑑𝑢𝑚𝑚𝑦 ) 1 − 𝑝 4 PROPOSED METHOD where 𝑄 (𝑘𝑒𝑦𝑖𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 ) and 𝑄 (𝑘𝑒𝑦𝑖 𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑑 𝑢𝑚𝑚𝑦) are We propose a new insertion method for encrypted multi-maps the probabilities that the key being inserted is selected and that with volume hiding. To eliminate the drawbacks of our previ- it is selected from the dummy, respectively. In addition, to ensure ous method, we do not rely on the client for perturbation of the 𝜖 is non-negative, 𝑝 >= 0.5 is required. Note here that dummy insertion position. Instead, we exploit the trusted execution envi- keys may not be selected by the randomized response. ronment (TEE) [8] provided by the main processor to perform Figure 5: Process flow of our previous method. multi-map perturbation securely. Specifically, we use Intel SGX 4.1 Hash table partitioning as the TEE. We use the enclave provided by Intel SGX to securely execute the Figure 6 shows an example of inserting (𝑘𝑒𝑦1, 𝑥 1 ). The main mendacity operation on the server side while hiding the updated difference from the previous method is that there is an enclave, position from inside attackers. One of the major challenges is a TEE provided by Intel SGX, on the server, and we update the that the available memory space in an enclave is limited (96 MB) hash tables using it. Thus, we update and perturb each hash table and is too small to load the hash tables entirely. To cope with without sending the entire hash table to the client. The values in this problem, we partition the cuckoo hash tables into smaller the hash table are encrypted and stored in the server’s storage. encrypted blocks (< 96 MB) and perform data insertions within The hash tables are partitioned into smaller blocks (< 96 MB) so each block in the enclave. Another factor is the characteristic of that they can be loaded into the enclave, because the size of the Intel SGX whereby the processing inside of an enclave is fast, enclave is limited up to 96 MB. When inserting a new dataset, it whereas calling a function in the enclave from outside (or vice is loaded into the enclave along with a hash block. We present versa) is slow (2x to 2000x slower than regular function calls) [2, the details in Section 4.1. 17]. Further, the argument size for a function call crossing the After copying the relevant hash block to the enclave, the men- border of the enclave is limited to 8 MB for one function call. dacity operation (Section 3.4) is applied. Because the process in Therefore, if the block size is set to 96 MB, we need to further the enclave is not visible to adversaries, they do not know which divide the block into smaller (< 8 MB) pieces and send from/to part of the hash table has been updated. The entries updated in the enclave via multiple function calls, which is time-consuming. the enclave are encrypted and written back to the main memory. Figure 7 shows how a new data record being inserted is as- Note that we can protect the process of perturbing hash entries signed to a block and data insertion performed on the enclave. As using the enclave. However, it is possible to observe the updated can be seen, the data being inserted are allocated to each block locations of the hash entries by comparing the table before and according to the hash address calculated in the enclave. Next, we after the insertion. In particular, if there is no hash collision, the copy the destination block into the enclave and also pass the data mendacity operation is not applied, leading to leakage of the being inserted to the enclave, followed by the execution of data updated location. Therefore, we perform the mendacity opera- insertion. In the enclave, the entire block is not decrypted; only tion by inserting dummy data when no hash collision occurs to the points where the mendacity operation occurs are decrypted. perturb the updated location. In addition, the dummy entries are Subsequently, the updated block is returned to the main memory, helpful because the adversary cannot guess how many values and the removed data are sent to the client. have been inserted from the update locations. After the completion of data insertion, the data removed from the cuckoo hash table are sent back to the client. Even when 4.2 Hiding the access locations in search the removed data are dummy data, they are sent back to the To perform a search, as in our previous method, we first obtain client, and the client can filter the dummies out. This prevents the volume of results associated with the search key from the the adversaries from correctly guessing whether the removed count table and calculate the hash addresses for accessing the data are dummy data or not by observing whether they are sent hash tables. If we assume that the hash table will not be updated, to the client. it is sufficiently safe to add noise to the count table based on DP The proposed method basically solves the problems of previous to hide the number of results. However, when insertions occur, methods by using the SGX. In summary, we make the following an attacker may infer what data are inserted by comparing the two extensions to our previous method. ciphertext before and after the insertion operation. Suppose that several values are associated with the same key 𝑘𝑒𝑦1 . An attacker • Hash table partitioning for handling large amounts of data. can infer the updated address by observing the access pattern • Hiding the access location against attacks that use the of queries with the same key (𝑘𝑒𝑦1 ) before and after the latest difference in the search results before and after data inser- insertion. Therefore, our method hides the access locations of tion. the data in the search. As in the case of insertion, the accessed We present the details of the proposed method below. data are located in the enclave to hide the access locations. The Figure 6: Process flow of the proposed method. The volume information is guaranteed to be deferentially pri- vate by adding noise. Let us assume that two accesses occur before and after an insertion, and the adversary observes the difference in the key’s volume. In this case, there are two possibilities: • One addition for an actual (non-dummy) value insertion into the key with probability p. • One addition for a selected dummy key by the randomized response with probability (1 − 𝑝). In other words, even if the number of reads increases by one, the adversary cannot know whether it is an increase by the insertion of actual or dummy data. Even if the adversary can see the increase in the number of reads before and after the insertion, Figure 7: Partitioning hash tables into blocks to perform it is not a problem because LDP is guaranteed. the mendacity operation in an enclave. Next, we describe the security of the mendacity operation to hide the position of the stored data. If there is only one updated position, the position where the actual (non-dummy) data are process flow of the search operation is as follows: First, the client inserted can be leaked by observing the corresponding position. sends a request {h(key)} to the server to retrieve the volume of the In our method, at least 𝑘 + 1(𝑘 ≥ 1) positions are updated, where key. Next, the client sends a request {Enc(key), Enc(volume)} to the 𝑘 is a parameter that is set by the user for the number of times server to retrieve the data from the key and the retrieved volume. to force a mendacity operation to occur. Therefore, the server The server computes the hash values to identify the block to administrator can only determine where the true data have been access and the hash addresses in the hash tables to retrieve the inserted with a probability of 1/(k+1) in the worst case. However, data. Then, the blocks containing the target data are copied into the adversary may be able to ascertain the block to which it the enclave and the search is performed in turn. is assigned. In our experiment, we found that the maximum capacity of one block is approximately 1000 records. Therefore, 5 SECURITY ANALYSIS it is possible to increase safety by perturbing the blocks as well. In this section, we evaluate the security of our proposed method. Specifically, the block that has the true key–value data inserted In our proposed method, we assume that the server adminis- can be kept a secret by faking the update of the block. trator is semi-honest. This semi-honest administrator follows established protocols but tries to steal the maximum amount of 6 EXPERIMENTAL EVALUATION data information they can. The adversary (e.g., server adminis- trator) can observe all systems except the data and operations in In this section, we present the details of the experiments con- the enclave. In addition, we assume that the adversary does not ducted to evaluate the proposed method. The specific purpose have any background knowledge on the stored data. was to evaluate the improvement in the performance of insertion With these assumptions as our basis, we analyze whether an and query by SGX and the influence of the noise addition under adversary can infer what the data are and where they are inserted LDP on the communication volume and processing time. when our proposed insertion operation is employed. Because To this end, we conducted the following experiments: the count table is encrypted via homomorphic encryption, the • Comparison of the processing times of data insertion adversary cannot know the value of the count table, although using the proposed method, our previous method, they may know that the value of the count table has been updated. and a non-secure method using one million artificial However, when a client gets the values, the client can access the data records and real-world data. In this experiment, index as many times as the volume of the key allows. Therefore, we first evaluated whether the processing time for insert- the administrator may infer the volume from the number of ing records using our proposed approach is sufficient for accesses to the index. practical use by comparing our previous method and the non-secure method. In addition, we evaluated the impact values. The proposed method is approximately 12x faster than of SGX and LDP on the processing time and the commu- the previous method. nication volume by comparing the proposed method with the non-secure method. Table 1: Processing time breakdown of the proposed • Comparison of the amount of noise between DP and method vs the non-secure method. LDP. In this experiment, we evaluated the impact of LDP on the amount of noise by comparing with DP. Specifically, Proposed Non-secure we compared the amount of noise in the volume when Noise addition (ms) 0.0024 0.0000 noise is added using DP after all data are inserted and when Updating count table (ms) 0.0221 0.0004 noise is added using LDP at the time of data insertion. Updating cuckoo hash table (ms) 10.9438 0.0032 • Comparison of the search times of the proposed Communication (ms) 2.4915 0.6229 method, Sarvar’s method, and the non-secure method Other (ms) 37.2359 0.0000 using one million artificial data records and real- Uniform world data. In this experiment, we first evaluated the Proposed Non-secure effect of the difference in noise between DP and LDP on Noise addition (ms) 0.0024 0.0000 the search time by comparing it with Sarvar’s method. Updating count table (ms) 0.0228 0.0005 Next, we evaluated the impact of SGX on the search time Updating cuckoo hash table (ms) 10.9699 0.0033 by comparing it with the non-secure method. Communication (ms) 2.5335 0.6334 The experimental environment was as follows: Other (ms) 36.7747 0.0000 Normal • Server: Microsoft Azure, Intel(R) Xeon(R) E-2288G CPU Proposed Non-secure 3.70 GHz, 4 GB RAM, Ubuntu 18.04.5 LTS Noise addition (ms) 0.0024 0.0000 • Client: MacBook Pro, dual-core Intel Core i7 3.1 GHz , 16 Updating count table (ms) 0.0225 0.0006 GB 1867 MHz DDR3, macOS (ver. 11.5.2) Updating cuckoo hash table (ms) 10.9600 0.0038 The datasets used in the experiments were artificial and real- Communication (ms) 2.4960 0.6240 world data. The artificial data comprised 1,000 keys and one Other (ms) 37.0466 0.0000 million data records. We used two artificial datasets: uniform Real-world and normal distributions. We used UCI Online Retail II [3] as real-world data. It has 5,305 keys and 1,067,371 data records. The total processing time of the proposed method for insertion is approximately 118x slower than that of the non-secure baseline. 6.1 Processing time of data insertion Note that in Table 1, the communication time is estimated based We compared the data insertion processing times of the proposed on the observation of the communication volume with the non- method, our previous work, and the non-secure method. We secure baseline in Figure 9. The communication time of the non- introduced our previous method Section 3.4. In the non-secure secure method was obtained by subtracting the processing time method, all records and indexes are stored in plain text and no from the total time. noise entry is added to the count table, although the processing flow is the same as that of the proposed method. Figure 8 shows the average time taken to insert one key–value pair when a client inserts 1,000 data items using the proposed and our previous method with artificial data following a uniform distribution, artificial data following a normal distribution, and real-world data. When a record is inserted into the multi-map, Figure 9: Communication volume comparison. The communication volume required to insert data is approxi- mately 4x greater than that of the non-secure method. Therefore, the communication time of the proposed method is also four times longer than that of the non-secure method. In Table 1, we can see that the proposed method has a sig- nificant difference in the cuckoo hash table update and other processing times compared to the non-secure method. The rea- Figure 8: Insertion: proposed method vs previous method. son for this large difference in updating the cuckoo hash table can be attributed to the characteristics of SGX. SGX is known the hash function is calculated by using the key and the num- for its fast processing inside the enclave, but very slow (2x to ber of stored records according to the key. Therefore, the time 2000x) [2, 17] processing of functions calling inside the enclave required for insertion is not related to the distribution of the key from outside the enclave (and vice versa) compared to normal function calls. The other processing time is obtained by subtract- Figure 10 compares the noise in DP and LDP for different ing each processing time from the overall processing time. This datasets: artificial data following a uniform distribution, artificial is considered to be the processing time for encryption and com- data following a normal distribution, and real-world data. The pounding. In the proposed method, the encryption-combination results show that the amount of noise in LDP is large compared is performed 11 times on average for each data insertion. This to DP. In the case of experiments using artificial data, about 1,000 results in a large overhead. noise units are assigned to each key compared to DP. In the case Thus, although there is more overhead in each process com- of experiments using real-world data, about 200 noise units are pared to the non-secure method, we believe that the fact that assigned to each key compared to DP. This is because, in LDP, the the time required for data insertion is kept at 0.5 seconds while amount of noise is proportional to the number of data records. In ensuring the high security of our method is practical. the proposed method, noise is only added to the number of counts in the count table, but not to the index. Therefore, the amount 6.2 Noise comparison between DP and LDP of this noise does not affect the amount of spatial computation. However, because the search is performed based on the number of count tables, this amount of noise will affect the search time. Therefore, in Section 6.3, we evaluate the impact on the search time. 6.3 Processing time of data search We conducted search experiments for the proposed method and Sarvar’s method. Figure 11 compares the average time and the average communication volume by the client to search all the keys for each method in which artificial data following a uniform distribution, artificial data following a normal distribution, and real-world data are stored. The figure shoiws that the amount Uniform distribution data Search time for uniform Communication volume of distribution the search for uniform distribution Normal distribution data Search time for normal Communication volume of distribution the search for normal distribution Search time for real-world Communication volume of data the search for real-world data Real-world data Figure 11: Average search times and average communica- tion volumes. Figure 10: Graph comparing the amount of noise between DP and LDP. of noise in the proposed method is about two times lesser than Sarvar’s method, whereas the search time of the proposed method ACKNOWLEDGEMENTS is about five times slower than Sarvar’s method. This is due to This work was supported by JSPS KAKENHI Grant Numbers the overhead caused by SGX. As mentioned in Section 4.2, the JP20362832. proposed method processes search operations in the enclave area to hide the access location during searches. Therefore, we compared the processing times incurred to search data from the REFERENCES [1] 2020. X-Force Threat Intelligence Index Reveals Top Cyberse- cuckoo hash table on the server side. Figure 12 shows the results. curity Risks of 2020. https://securityintelligence.com/posts/ As mentioned in Section 6.1, SGX has a characteristic wherein x-force-threat-intelligence-index-reveals-top-cybersecurity-risks-of-2020/. [2] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin, Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O’Keeffe, Mark L. Stillwell, David Goltzsche, Dave Eyers, Rüdiger Kapitza, Peter Pietzuch, and Christof Fetzer. 2016. SCONE: Secure Linux Containers with Intel SGX. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, Savannah, GA, 689–703. https://www.usenix. org/conference/osdi16/technical-sessions/presentation/arnautov [3] Daqing Chen. 2019. Online Retail II. UCI Machine Learning Repository. [4] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. Cryptology ePrint Archive, Report 2016/086. https://eprint.iacr.org/2016/086. [5] Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal Hardware Extensions for Strong Software Isolation. In 25th USENIX Security Symposium (USENIX Security 16). USENIX Association, Austin, TX, 857–874. https://www.usenix.org/conference/usenixsecurity16/ technical-sessions/presentation/costan [6] Natacha Crooks, Matthew Burke, Ethan Cecchetti, Sitar Harel, Rachit Agar- wal, and Lorenzo Alvisi. 2018. Obladi: Oblivious Serializable Transactions in the Cloud. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 727–743. https://www.usenix.org/conference/osdi18/presentation/crooks [7] Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky. 2011. Search- Figure 12: Comparison of cuckoo hash table search times. able symmetric encryption: Improved definitions and efficient constructions. Journal of Computer Security 19 (01 2011), 895–934. https://doi.org/10.1145/ 1180405.1180417 [8] Sanjeev Das, Jan Werner, Manos Antonakakis, Michalis Polychronakis, and the processing inside the enclave is fast, while calling a function Fabian Monrose. 2019. SoK: The Challenges, Pitfalls, and Perils of Using Hardware Performance Counters for Security. 20–38. https://doi.org/10.1109/ inside the enclave from outside the enclave (or vice versa) is SP.2019.00021 very slow. Thus, the processing time is slow when compared to [9] Cynthia Dwork. 2006. Differential Privacy. In Automata, Languages and Programming, Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo Sarvar’s method. However, Sarvar’s method searches for the key Wegener (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–12. based on a uniquely defined token generated from the key, and [10] Paul Grubbs, Marie-Sarah Lacharite, Brice Minaud, and Kenneth G. Pater- there is a risk that the provider may identify the key. In contrast, son. 2018. Pump up the Volume: Practical Database Reconstruction from Volume Leakage on Range Queries. In Proceedings of the 2018 ACM SIGSAC the proposed method performs the search process in the enclave, Conference on Computer and Communications Security (Toronto, Canada) (CCS and thus the search is performed in secrecy. ’18). Association for Computing Machinery, New York, NY, USA, 315–331. https://doi.org/10.1145/3243734.3243864 [11] Hakan Hacigümüş, Bala Iyer, Chen Li, and Sharad Mehrotra. 2002. Executing 7 CONCLUSION AND FUTURE WORK SQL over Encrypted Data in the Database-Service-Provider Model. In Proceed- ings of the 2002 ACM SIGMOD International Conference on Management of Data In this paper, we proposed a method that supports data insertion (Madison, Wisconsin) (SIGMOD ’02). Association for Computing Machinery, in encrypted multi-maps using a trusted execution environment New York, NY, USA, 216–227. https://doi.org/10.1145/564691.564717 [12] Shunta Ishihara, Chiemi Watanabe, and Toshiyuki Amagasa. 2021. Supporting (TEE), such as Intel SGX. In the proposed method, the tables are Insertion in Encrypted Multi-Maps with Volume Hiding. In IEEE International partitioned into blocks and each block is processed independently Conference on Smart Computing, SMARTCOMP 2021, Irvine, CA, USA, August 23-27, 2021. IEEE, 264–269. https://doi.org/10.1109/SMARTCOMP52413.2021. to cope with the limited memory space (96 MB) in Intel SGX. 00058 Our experimental results indicate that it takes about 50 ms to [13] Mohammad Saiful Islam, Mehmet Kuzu, and Murat Kantarcioglu. 2012. Ac- insert a key–value pair. This is a speedup of about 12 times when cess pattern disclosure on searchable encryption: Ramification, attack and mitigation. In in Network and Distributed System Security Symposium (NDSS. compared to our previous method. In addition, we think that it is [14] Seny Kamara and Tarik Moataz. 2019. Computationally Volume-Hiding practical to have the data insertion time at 50 ms while keeping Structured Encryption. 11477 (2019), 183–213. https://doi.org/10.1007/ the key volume and update location secret. 978-3-030-17656-3_7 [15] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhod- The time taken by the search process is about 5x longer than nikova, and Adam Smith. 2011. What Can We Learn Privately? SIAM that of Sarvar’s method. This is due to the search process using J. Comput. 40, 3 (2011), 793–826. https://doi.org/10.1137/090756090 arXiv:https://doi.org/10.1137/090756090 Intel SGX as well as the difference in the amount of noise. In [16] Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O’Neill. 2016. Sarvar’s method, the search positions are not secret, whereas in Generic Attacks on Secure Outsourced Databases. In Proceedings of the 2016 our proposed method, the search positions are secret. Therefore, ACM SIGSAC Conference on Computer and Communications Security (Vienna, Austria) (CCS ’16). Association for Computing Machinery, New York, NY, USA, we believe that our method is more useful when dealing with 1329–1340. https://doi.org/10.1145/2976749.2978386 highly sensitive data. [17] Taehoon Kim, Joongun Park, Jaewook Woo, Seungheun Jeon, and Jaehyuk In future work, we would like to make more effective use of the Huh. 2019. ShieldStore: Shielded In-Memory Key-Value Storage with SGX. In Proceedings of the Fourteenth EuroSys Conference 2019 (Dresden, Germany) remaining enclave area. In this method, only 8 MB of the enclave (EuroSys ’19). Association for Computing Machinery, New York, NY, USA, area is used, owing to the argument-size limitation of Intel SGX. Article 14, 15 pages. https://doi.org/10.1145/3302424.3303951 [18] Adam Kirsch, Michael Mitzenmacher, and Udi Wieder. 2009. More Robust We are considering using the remaining area as a cache. In real- Hashing: Cuckoo Hashing with a Stash. SIAM J. Comput. 39, 4 (Dec. 2009), world workloads, there are many cases where access to keys is 1543–1561. heavily skewed. Thus, we would like to cache frequently accessed [19] Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Mar- cus Peinado. 2017. Inferring Fine-grained Control Flow Inside SGX Enclaves keys in the enclave, instead of storing them in the cuckoo hash with Branch Shadowing. In 26th USENIX Security Symposium (USENIX Security table, to speed up insertion and retrieval. 17). USENIX Association, Vancouver, BC, 557–574. https://www.usenix.org/ conference/usenixsecurity17/technical-sessions/presentation/lee-sangho [20] Chang Liu, Liehuang Zhu, Mingzhong Wang, and Yu-An Tan. 2014. Search Pattern Leakage in Searchable Encryption: Attacks and New Construction. Inf. Sci. 265 (may 2014), 176–188. https://doi.org/10.1016/j.ins.2013.11.021 [21] R. Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. J. Algorithms 51 (2004), 122–144. [22] Sarvar Patel, Giuseppe Persiano, Kevin Yeo, and Moti Yung. 2019. Mitigating Leakage in Secure Cloud-Hosted Data Structures: Volume-Hiding for Multi- Maps via Hashing. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (London, United Kingdom) (CCS ’19). Association for Computing Machinery, New York, NY, USA, 79–93. https: //doi.org/10.1145/3319535.3354213 [23] Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. 2017. T- SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs. In Network and Distributed System Security Symposium 2017 (NDSS’17) (net- work and distributed system security symposium 2017 (ndss’17) ed.). In- ternet Society. https://www.microsoft.com/en-us/research/publication/ t-sgx-eradicating-controlled-channel-attacks-enclave-programs/ [24] Chenghong Wang, Johes Bater, Kartik Nayak, and Ashwin Machanavajjhala. 2021. DP-Sync: Hiding Update Patterns in Secure Outsourced Databases with Differential Privacy. Proceedings of the 2021 International Conference on Management of Data (Jun 2021). https://doi.org/10.1145/3448016.3457306 [25] Jinwen Wang, Yueqiang Cheng, Qi Li, and Yong Jiang. 2018. Interface-Based Side Channel Attack Against Intel SGX. arXiv:1811.05378 [cs.CR] [26] Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang, Vincent Bindschaedler, Haixu Tang, and Carl A. Gunter. 2017. Leaky Cauldron on the Dark Land. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (Oct 2017). https://doi.org/10.1145/3133956. 3134038