Supporting Insertion in Encrypted Multi-Maps
    with Volume Hiding using a Trusted Execution Environment
               Shunta Ishihara                                     Chiemi Watanabe                           Toshiyuki Amagasa
            University of Tsukuba                         Tsukuba University of Technology                    University of Tsukuba
               Tsukuba, Japan                                     Tsukuba, Japan                                 Tsukuba, Japan
        ishihara@kde.cs.tsukuba.ac.jp                       chiemi@a.tsukuba-tech.ac.jp                     amagasa@cs.tsukuba.ac.jp

ABSTRACT                                                                            However, it has been pointed out that simply encrypting the
There is a new type of threat to encrypted databases, called “vol-               data cannot ensure data privacy and could cause potential se-
ume leakage,” which causes the volume of data associated with                    curity threats. One such security threat is volume leakage. This
some keys in a multi-map to be leaked. To counter this threat,                   security threat involves an adversary obtaining the size (or vol-
several methods for encrypting multi-maps to secure the volume                   ume) of the query result, which in turn could reveal information
of data they contain have been proposed. One such method sup-                    related to the database and/or the query. For example, Kellaris
ports the insertion of data into encrypted multi-maps. However, it               et al. [16] and Grubbs et al. [10] showed that it is possible for
suffers from inefficiency caused by its protocol, which stipulates               an adversary to reconstruct the histogram of key values in the
that all data be sent from the server to the client to perturb the               database only by using the volume. In addition, Grubbs et al. [10]
insertion position. For this reason, it is not practical for handling            reported that the distribution of queries can be inferred from the
large amounts of data. To address this problem, we propose an                    reconstructed histogram with high accuracy.
improved method for encrypted multi-maps that supports data                         To solve this problem, Sarvar et al. [22] proposed a method
insertion. The proposed method exploits the trusted execution                    for volume hiding in an encrypted multi-map. However, their
environment (TEE) supported by modern processors, e.g., Intel                    proposed method assumes that the database is stable, and no
SGX, for executing perturbation. TEE is a secure area of a main                  updates are allowed, which limits its application domain. To allow
processor that guarantees the confidentiality and integrity of                   data insertion in an encrypted multi-map with volume hiding,
the code and data inside it. Thus, we can secure the volume and                  we previously proposed combining local differential privacy and
insertion position in an encrypted multi-map without conduct-                    randomized response to enable volume hiding to support data
ing costly perturbation on the client-side. We also overcome the                 insertion [12]. However, that method suffers from slow insertion
challenge of the capacity of the TEE being typically limited and                 execution due to the protocol used, which stipulates that all data
too small to accommodate the multi-map itself. Specifically, we                  be sent to the client for perturbation of the insertion location.
divide the hash table of the multi-map into partitions so that they              Consequently, it is not practical for large databases.
can be loaded into the TEE and apply perturbation within each                       In the meantime, recent processors are being developed with
partition. We provide a security analysis of the proposed method,                a trusted execution environment (TEE) [8] to meet the growing
w.r.t. the volume hiding and the insertion position hiding. Further,             demands for secure computation, e.g., Intel SGX [4]. A TEE is an
we provide the results of experiments conducted to evaluate the                  isolated execution environment in a processor that offers protec-
feasibility of potential applications in terms of the processing                 tion over the code and data inside, ensuring confidentiality and
time and the amount of noise.                                                    integrity. In this paper, we propose an encrypted multi-map that
                                                                                 supports data insertion using a TEE. The idea is to securely per-
KEYWORDS                                                                         turb the insertion position using TEE, thereby achieving volume
                                                                                 hiding without sending the data from the server to the client. One
encrypted multi-map, privacy, secure database, Intel SGX, cuckoo                 of the challenges of handling large data using a TEE is that it typ-
hashing, homomorphic encryption                                                  ically offers limited memory space (e.g., 96MB in Intel SGX). To
                                                                                 solve this problem, we divide the data into smaller blocks accord-
                                                                                 ing to the available memory in the TEE and apply perturbation
1    INTRODUCTION                                                                within the block where insertion occurs.
Cloud database services are becoming increasingly popular ow-                       To verify the utility of the proposed method, we conducted
ing to their many advantages, such as reduced costs in terms of                  a security analysis concerning hiding the volume and insertion
server management, deployment, and operation. Consequently,                      position. We also experimentally evaluated its feasibility in poten-
the number of cloud database services and the volume of data                     tial applications in terms of the execution time of insertion and
stored in such databases have been drastically increasing. In a                  the amount of noise necessary to hide the volume and insertion
cloud database service, the cloud service provider is responsible                position.
for database management, which may lead to a security issue                         In this work, our target is healthcare data management in a
because cloud service providers are not always trustworthy; i.e.,                cloud database as the use case. It is common in healthcare appli-
they may be curious about the private data and could leak the                    cations to share medical records about patients among multiple
data [1]. To address this issue, many researchers have proposed                  medical institutions, and leaking the number of patients may
methods for encrypted databases to realize secure cloud database                 cause unintended privacy disclosure. Similarly, when inserting a
services [7, 11].                                                                new patient record, it is necessary to hide the insertion position
                                                                                 because it may cause unintended disclosure of privacy, such as
                                                                                 the type of disease.
© Copyright 2022 for this paper by its author(s). Use permitted under Creative      The main contributions of this study are as follows:
Commons License Attribution 4.0 International (CC BY 4.0)
    • We propose a method that allows data insertion into
      an encrypted multi-map while ensuring that the
      multi-map remains deferentially private [9] We use
      dummy insertion with dummy keys and entries to perturb
      the insertion position. More precisely, for each insertion re-
      quest, we generate dummy keys based on the randomized
      response, allowing us to achieve local differential privacy.
    • We propose a scheme that exploits the TEE to perturb
      data securely. To deal with large amounts of data that
      cannot be directly loaded into the TEE, we partition the
      database (multi-map) into smaller blocks so that we can               Figure 1: The “mendacity operation” in cuckoo hashing
      load each block into the TEE and perturb the insertion
      location within each block.
    • We evaluate the security of our proposed method.
      We use the randomized response mechanism of local dif-                  By creating the hash table in this manner, it is guaranteed that
      ferential privacy (LDP) [15] to hide the volume of data,             the value 𝑥 is always in either 𝑇1 [ℎ 1 (𝑥)] or 𝑇2 [ℎ 2 (𝑥)], ensuring
      and evaluate its security by finding the privacy budget              that it can be searched for with a time complexity of 𝑂 (1).
      𝜖 of differential privacy. Furthermore, we update multi-                There is a possibility that the mendacity operation may take a
      ple insertion positions via the mendacity operation based            very long time, or not be finished if we are unlucky. To address
      on cuckoo hashing [21] and evaluate the security of this             this issue, Kirsch, Mitzenmacher and Wieder proposed stash-
      operation in a TEE.                                                  based cuckoo hashing [18]. Stash-based cuckoo hashing avoids
    • We provide experimental results that verify the feasi-               the above problem by using the stash. Any value that is not
      bility of the proposed scheme using a TEE, Intel SGX.                inserted into the table while iterating over the threshold level is
      We compared the performance of the proposed scheme                   inserted into the stash.
      and non-secure baselines in terms of the insertion exe-
      cution time and the memory occupation time. We also                  2.2    Trusted Execution Environment (TEE)
      compared the required noise when inserting data under                Recent processors support a trusted execution environment (TEE) [8]
      different privacy mechanisms, i.e., DP and LDP. Based on             to meet the growing demands of secure computation. A TEE is
      the experimental results, we show our method is useful               an isolated execution environment in a processor that offers pro-
      for the specified use cases. The experimental results show           tection for the code and data inside, ensuring confidentiality and
      that the method is useful.                                           integrity.
   The remainder of this paper is organized as follows: Section 2             Specifically, 6th generation (and higher) Intel CPUs support
briefly overviews some preliminaries for this work. In Section 3,          a TEE called SGX [4]. SGX allows us to create a small (e.g., 96
we give an overview of existing methods. In Section 4, we discuss          MB) protected memory area called the enclave that is isolated
the proposed method for insertion on the server and the improve-           from the rest of the system. Hence, we can run programs that
ment of volume hiding and also evaluate its security capability.           are protected from the OS (which is controlled by a third party)
In Section 6, we report on the experiments conducted to evaluate           and numerous applications/system-level attacks.
the insertion operation of our method for a large database. Fi-               The existing SGX is vulnerable to side-channel attacks; e.g.,
nally, Section 7 concludes this paper and outlines possible future         cache-lines [19], branch execution [25], and page-table access [26].
work.                                                                      However, the T-SGX [23] and Sanctum [5] systems have evolved
                                                                           to overcome such attacks, and it is believed that future versions
2 PRELIMINARIES                                                            of SGX will be resilient to those attacks. In this paper, we do not
                                                                           consider side-channel attacks.
2.1 Cuckoo hashing and its properties
In both this study and our preceding work [12, 22], we utilize             3     EXISTING WORKS
cuckoo hashing [21]. Therefore, we will briefly explain it and
                                                                           In this section, we introduce several related works as follow-
its properties. Cuckoo hashing utilizes two arrays and two hash
                                                                           ing. In subsection 3.1, we introduce various privacy leakages.
functions and is known to be robust. Let us consider inserting a
                                                                           In subsection 3.2, we introduce several works of updating for
key using cuckoo hashing. If the hash value collides in one of the
                                                                           encrypted databases. In subsection 3.3, we introduce the work of
arrays, the algorithm moves the already stored entry to the other
                                                                           Sarvar et al. [22] on which our work is based. In subsection 3.4,
array using the other hash function and inserts a new entry into
                                                                           we introduce our previous work. In subsection 3.5, we discuss
the vacancy. In this paper, We call the above property of cuckoo
                                                                           the problems of our previous work.
hashing the mendacity operation. Figure 1 shows the use of two
hash functions (ℎ 1 and ℎ 2 ) and two arrays (𝑇1 and 𝑇2 ) in inserting
the entry x.                                                               3.1    Various privacy leakages
    First, the algorithm checks whether the position ℎ 1 (𝑥) in 𝑇1 is      Many existing encrypted databases focus on keeping only the
empty. If the position is empty, it inserts 𝑥 into 𝑇1 at ℎ 1 (𝑥) and       content of the data secret. However, it has been reported that the
completes the operation; otherwise, (as shown in Figure 1), it             existing encrypted databases are not secure enough. Liu et al. [20]
removes the existing value 𝑦 and inserts the value 𝑥 instead. For          and Islam et al. [13] proposed an access pattern attack method.
the removed value 𝑦, it checks the vacancy status of ℎ 2 (𝑦) in 𝑇2 ,       The access pattern attack is an attack technique that identifies
and inserts 𝑦 in 𝑇2 at ℎ 2 (𝑦) if it is empty; otherwise, it recursively   identical queries using data search logs and access frequency. It
processes the same operation.                                              is prevented by the use of Oblivious RAM (ORAM). ORAM is a
technique to hide the access pattern by changing the storage loca-      3.3    Sarvar’s work
tion of the encrypted data each time they are accessed. By using        Sarvar et al. [22] proposed a method that preserves privacy in
ORAM, an attacker cannot know which data has been accessed,             multi-maps while achieving volume hiding. In their method, they
the frequency of searches, or even the relationships between the        exploit differential privacy (DP) to hide the volume of data and
data. Kellaris et al. [16] and Gurbbs et al. [10] proposed the attack   reduce the communication cost.
method using the volume of search results. They proposed an                  Specifically, they originally added dummy entries when ap-
attack method that can reconstruct the distribution of data in a        plying DP, which, however, increases storage costs. To address
database and queries by observing the response size (volume) to         this problem, they reduce the server’s storage size using cuckoo
a sequence of search queries. There are several types of volume         hashing so that they do not always add dummy entries when
attacks, such as attacks on encrypted multimaps and attacks on          inserting a new entry. Let us consider inserting multiple values
range queries. In this paper, we deal with attacks on encrypted         with the same key 𝑘. Instead of inserting values with the same
multimaps, and in the following, we introduce a method for hid-         key, they generate (surrogate) keys, i.e., 𝑘 + 1, 𝑘 + 2, . . ., and in-
ing volume leakage in encrypted multimaps. The naive method             sert the key–value pairs at different hash addresses in the tables
of volume hiding is to use padding so that the number of data           according to the generated keys, thereby avoiding the insertion
in all keys is equal. However, it has a large overhead in terms of      of multiple values with the same key. Consequently, they do not
both storage and communication costs. Kamara et al. [14] and            need to insert dummy entries to achieve DP.
Sarvar et al. [22] proposed a volume hiding method that reduces              However they need to maintain the number of stored values
storage and communication costs compared to naive methods. In           for each key. To achieve this, they introduce a count table to
particular, since this paper is based on Sarvar et al.’s work [22],     maintain the volume of each key. Because it maintains the volume
we introduce their method in detail in the subsection 3.3.              information, they apply DP only on the count table, reducing the
                                                                        extra storage cost of dummy entries.
                                                                             When searching the multi-map, given a search key 𝑘, they first
                                                                        interrogate the count table to get the volume of 𝑘, and generate
3.2    Updating for EDBs                                                search keys, 𝑘 + 1, 𝑘 + 2, etc. Then, they interrogate the hash
                                                                        table to get the results. Note that the results may contain false
In this subsection, we introduce several works of updating for
                                                                        positives owing to conflicts in the hash tables, which could be
encrypted databases. Chenghong et al. [24] proposed a frame-
                                                                        regarded as dummy results for protecting the volume of query
work for an encrypted database that prevents privacy from being
                                                                        results.
leaked depending on the time of updating. The privacy leakage
                                                                             Figure 2 illustrates an example of Sarvar’s method. In this
due to update time is caused, for example, by event data updates
                                                                        example, there are two values associated with the key key1. In
from smart sensors in the building (security cameras, smart light
                                                                        this case, the hash address for the first value (𝑥 1 ) with the key
bulbs, Wi-Fi access points, etc.). Even if each event data is en-
                                                                        key1 is calculated as ℎ 1 (𝑘𝑒𝑦1 + 1) or ℎ 2 (𝑘𝑒𝑦1 + 1), followed by
crypted to protect the privacy of the people in the building, the
                                                                        ℎ 1 (𝑘𝑒𝑦1 + 2) or ℎ 2 (𝑘𝑒𝑦1 + 2), etc. The volume of each key is
IoT provider can know the private information about the activi-
                                                                        maintained in the count table, where the values are protected
ties in the building from the the event occurrence time without
                                                                        by applying DP. When retrieving values associated with a (user-
decrypting them. To prevent such privacy leakage, it is necessary
                                                                        specified) query key 𝑘𝑒𝑦1, the system first gets its volume (4) and
to decouple the relationship between event occurrence time and
                                                                        generates search keys (𝑘𝑒𝑦1 + 1, 𝑘𝑒𝑦1 + 2, 𝑘𝑒𝑦1 + 3, and 𝑘𝑒𝑦1 + 4).
update time. The easiest way to prevent privacy is to not upload
                                                                        Then, it retrieves the values by accessing the hash addresses
the data, in which case data analysts will not be able to utilize
                                                                        computed by the generated keys and the hash functions (ℎ 1 and
the data. The next possible solution is to update the data every
                                                                        ℎ 2 ). Note that the results may contain false positives caused by
unit of time, regardless of the occurrence of events. However,
                                                                        hash collisions (e.g., ℎ 2 (𝑘𝑒𝑦1 + 2) retrieves (𝑘𝑒𝑦2, 𝑥 2 )), which
when events occur infrequently, most updates are dummies, and
                                                                        can be seen as dummy entries that can be filtered out on the
the provider has the problem of wasting resources for unneces-
                                                                        client-side.
sary computations. Chenghong et al. proposed a framework that
                                                                             One major limitation of Sarvar’s method is that they do not
provides a guarantee of differential privacy in a single update for
                                                                        support any updates on the multi-map.
the update time problem, and can be arbitrarily customized by
the user by changing parameters for the three trade-off issues of
privacy, data accuracy, and processing performance.
                                                                        3.4    Our previous method
    Natacha et al. [6] proposed a key-value store that achieves
ORAM-based access pattern confidentiality while providing ACID          To address the limitation of Sarvar’s method (i.e., not supporting
transactions, which are required in many applications. Existing         updates), we previously proposed an extension that enables it
ORAMs cannot support transactions for the following two rea-            to support data insertion [12] while preventing volume leakage.
sons. The first is that ORAM is not fault-tolerant. The Second          Specifically, our method supports data insertion while maintain-
is that ORAM has limited or no support for concurrency. They            ing the multi-map as being deferentially private by using random-
delay transaction commits until the end of a fixed-size epoch           ized response based on local differential privacy (LDP). It hides
and buffer their execution in a reliable proxy to enhance con-          the position of past insertions using the mendacity operation, a
sistency and durability on a per-epoch basis. Since only the last       feature of cuckoo hashing. To protect related operations from
value of the key changed during an epoch is written back to the         inside attacks (e.g., by the server administrator), we perform the
ORAM, the number of ORAM operations required to commit a                operation to hide the insertion positions on the client.
transaction can be reduced, thus reducing the cost of CPU and              More specifically, when inserting a key–value pair, we apply a
bandwidth amortization without increasing contention.                   randomized response, which is a mechanism for achieving LDP,
                                                                        to maintain the differential privacy of the multi-map. The key is
                                                                         Furthermore, when inserting a new value into the multi-map,
                                                                      we must hide the insertion position so that the server adminis-
                                                                      trator cannot infer the updated data from the index in the tables.
                                                                      We use the mendacity operation in cuckoo hashing to hide the in-
                                                                      sertion position. The mendacity operation is a recursive process
                                                                      that moves existing values to the other side of the hash tables
                                                                      until the values are all stored in the hash tables. As a result, one
                                                                      insertion may cause multiple updates in different positions in
                                                                      the tables, which is useful for hiding the insertion position. If
                                                                      an insertion causes only a few mendacity operations, we insert
                                                                      dummy entries to hide the insertion position.


Figure 2: Example illustrating Sarvar’s method based on
cuckoo hashing.


chosen based on the following formula:
              (
                𝑘𝑒𝑦𝑡𝑟𝑢𝑒      with probability 𝑝
        𝑘𝑒𝑦 =                                                   (1)
                𝑘𝑒𝑦𝑑𝑢𝑚𝑚𝑦 with probability (1 − 𝑝)                         Figure 4: Inserting a data record whose key is 𝑘𝑒𝑦1.
where 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 and 𝑘𝑒𝑦𝑑𝑢𝑚𝑚𝑦 are the keys of the non-dummy and
dummy data, respectively. We iterate the randomized response             Figure 4 shows the process flow for updating the table when
until 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 is selected.                                            (𝑘𝑒𝑦1, 𝑥 3 ) is inserted. First, the algorithm attempts to store the
   We do not use the generated keys for insertion but rather          value at position ℎ 1 (𝑘𝑒𝑦1 + 3) in table 𝑇1 . Then, based on the
for updating the count table. We also use it as done in Sarvar’s      mendacity operation, the originally stored data are moved to
method [22] to maintain the number of values (volume) associ-         the other table, which results in (𝑘𝑒𝑦3, 𝑥 1 ) at ℎ 1 (𝑘𝑒𝑦1 + 3) being
ated with each key. Note that the volume count includes both          removed and stored at ℎ 2 (𝑘𝑒𝑦3 + 1) in table 𝑇2 .
non-dummy and dummy data to guarantee differential privacy.              In addition, by adding dummy data at position ℎ 1 (𝑘𝑒𝑦1 + 1),
This process is performed on ciphertexts using additive homo-         we intentionally increase the number of updates. We randomly
morphic encryption.                                                   choose the position of the dummy entries. Note that if there are
   Figure 3 shows an example of updating the count table by in-       data that we cannot store in the hash tables according to the
serting key1. Note that key3 and key4 are selected as randomized      mendacity operation, we store them in the stash on the client.
responses. As can be seen, the volume of the dummy key is also
updated at the same time so that we can prevent inferring of the      3.5    Drawbacks
update on 𝑘𝑒𝑦1 from the difference between the volumes before         Figure 5 gives an overview of our previous method, with an exam-
and after the insertion. As the randomized response guarantees        ple of inserting a new value into key1, depicting the mendacity
LDP, privacy is maintained even after data insertion.                 operation for perturbing the updating positions. If the opera-
                                                                      tions are performed on the server, private information may be
                                                                      leaked from the operation history to inside attackers, such as
                                                                      system administrators. For this reason, in our previous method,
                                                                      we send the whole index back to the client to undergo mendacity
                                                                      operations to protect privacy. This incurs a high cost in terms
                                                                      of network traffic, particularly when the multi-map is large, and
                                                                      makes the process slow. This may cause several problems in real
Figure 3: Updating volume when inserting a value with                 applications. One such problem is that the processing perfor-
𝑘𝑒𝑦1.                                                                 mance heavily depends on the client’s performance, and in some
                                                                      cases, the client may not complete the process owing to insuffi-
                                                                      cient resources, such as memory. Another problem arises when
   Next, we discuss how to find the privacy budget 𝜖 of LDP.          processing concurrent requests from two or more clients. When
From the derivation of 𝜖 in the randomized response, it is found      updating the multi-map, the map needs to be sent to the client
           𝑝
using 𝑙𝑛( 1−𝑝 ) because                                               and also locked until the update is completed, which significantly
                                                                      degrades the performance under concurrent access.
                𝑄 (𝑘𝑒𝑦𝑖𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 )   𝑝
                                         =
               𝑄 (𝑘𝑒𝑦𝑖𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑑𝑢𝑚𝑚𝑦 ) 1 − 𝑝                        4     PROPOSED METHOD
where 𝑄 (𝑘𝑒𝑦𝑖𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑡𝑟𝑢𝑒 ) and 𝑄 (𝑘𝑒𝑦𝑖 𝑛𝑠𝑒𝑟𝑡 | 𝑘𝑒𝑦𝑑 𝑢𝑚𝑚𝑦) are     We propose a new insertion method for encrypted multi-maps
the probabilities that the key being inserted is selected and that    with volume hiding. To eliminate the drawbacks of our previ-
it is selected from the dummy, respectively. In addition, to ensure   ous method, we do not rely on the client for perturbation of the
𝜖 is non-negative, 𝑝 >= 0.5 is required. Note here that dummy         insertion position. Instead, we exploit the trusted execution envi-
keys may not be selected by the randomized response.                  ronment (TEE) [8] provided by the main processor to perform
                                            Figure 5: Process flow of our previous method.


multi-map perturbation securely. Specifically, we use Intel SGX          4.1    Hash table partitioning
as the TEE.                                                              We use the enclave provided by Intel SGX to securely execute the
    Figure 6 shows an example of inserting (𝑘𝑒𝑦1, 𝑥 1 ). The main        mendacity operation on the server side while hiding the updated
difference from the previous method is that there is an enclave,         position from inside attackers. One of the major challenges is
a TEE provided by Intel SGX, on the server, and we update the            that the available memory space in an enclave is limited (96 MB)
hash tables using it. Thus, we update and perturb each hash table        and is too small to load the hash tables entirely. To cope with
without sending the entire hash table to the client. The values in       this problem, we partition the cuckoo hash tables into smaller
the hash table are encrypted and stored in the server’s storage.         encrypted blocks (< 96 MB) and perform data insertions within
The hash tables are partitioned into smaller blocks (< 96 MB) so         each block in the enclave. Another factor is the characteristic of
that they can be loaded into the enclave, because the size of the        Intel SGX whereby the processing inside of an enclave is fast,
enclave is limited up to 96 MB. When inserting a new dataset, it         whereas calling a function in the enclave from outside (or vice
is loaded into the enclave along with a hash block. We present           versa) is slow (2x to 2000x slower than regular function calls) [2,
the details in Section 4.1.                                              17]. Further, the argument size for a function call crossing the
    After copying the relevant hash block to the enclave, the men-       border of the enclave is limited to 8 MB for one function call.
dacity operation (Section 3.4) is applied. Because the process in        Therefore, if the block size is set to 96 MB, we need to further
the enclave is not visible to adversaries, they do not know which        divide the block into smaller (< 8 MB) pieces and send from/to
part of the hash table has been updated. The entries updated in          the enclave via multiple function calls, which is time-consuming.
the enclave are encrypted and written back to the main memory.              Figure 7 shows how a new data record being inserted is as-
Note that we can protect the process of perturbing hash entries          signed to a block and data insertion performed on the enclave. As
using the enclave. However, it is possible to observe the updated        can be seen, the data being inserted are allocated to each block
locations of the hash entries by comparing the table before and          according to the hash address calculated in the enclave. Next, we
after the insertion. In particular, if there is no hash collision, the   copy the destination block into the enclave and also pass the data
mendacity operation is not applied, leading to leakage of the            being inserted to the enclave, followed by the execution of data
updated location. Therefore, we perform the mendacity opera-             insertion. In the enclave, the entire block is not decrypted; only
tion by inserting dummy data when no hash collision occurs to            the points where the mendacity operation occurs are decrypted.
perturb the updated location. In addition, the dummy entries are         Subsequently, the updated block is returned to the main memory,
helpful because the adversary cannot guess how many values               and the removed data are sent to the client.
have been inserted from the update locations.
    After the completion of data insertion, the data removed from
the cuckoo hash table are sent back to the client. Even when             4.2    Hiding the access locations in search
the removed data are dummy data, they are sent back to the               To perform a search, as in our previous method, we first obtain
client, and the client can filter the dummies out. This prevents         the volume of results associated with the search key from the
the adversaries from correctly guessing whether the removed              count table and calculate the hash addresses for accessing the
data are dummy data or not by observing whether they are sent            hash tables. If we assume that the hash table will not be updated,
to the client.                                                           it is sufficiently safe to add noise to the count table based on DP
    The proposed method basically solves the problems of previous        to hide the number of results. However, when insertions occur,
methods by using the SGX. In summary, we make the following              an attacker may infer what data are inserted by comparing the
two extensions to our previous method.                                   ciphertext before and after the insertion operation. Suppose that
                                                                         several values are associated with the same key 𝑘𝑒𝑦1 . An attacker
    • Hash table partitioning for handling large amounts of data.
                                                                         can infer the updated address by observing the access pattern
    • Hiding the access location against attacks that use the
                                                                         of queries with the same key (𝑘𝑒𝑦1 ) before and after the latest
      difference in the search results before and after data inser-
                                                                         insertion. Therefore, our method hides the access locations of
      tion.
                                                                         the data in the search. As in the case of insertion, the accessed
   We present the details of the proposed method below.                  data are located in the enclave to hide the access locations. The
                                            Figure 6: Process flow of the proposed method.


                                                                           The volume information is guaranteed to be deferentially pri-
                                                                        vate by adding noise. Let us assume that two accesses occur before
                                                                        and after an insertion, and the adversary observes the difference
                                                                        in the key’s volume. In this case, there are two possibilities:
                                                                            • One addition for an actual (non-dummy) value insertion
                                                                              into the key with probability p.
                                                                            • One addition for a selected dummy key by the randomized
                                                                              response with probability (1 − 𝑝).
                                                                        In other words, even if the number of reads increases by one,
                                                                        the adversary cannot know whether it is an increase by the
                                                                        insertion of actual or dummy data. Even if the adversary can see
                                                                        the increase in the number of reads before and after the insertion,
Figure 7: Partitioning hash tables into blocks to perform               it is not a problem because LDP is guaranteed.
the mendacity operation in an enclave.                                      Next, we describe the security of the mendacity operation to
                                                                        hide the position of the stored data. If there is only one updated
                                                                        position, the position where the actual (non-dummy) data are
process flow of the search operation is as follows: First, the client
                                                                        inserted can be leaked by observing the corresponding position.
sends a request {h(key)} to the server to retrieve the volume of the
                                                                        In our method, at least 𝑘 + 1(𝑘 ≥ 1) positions are updated, where
key. Next, the client sends a request {Enc(key), Enc(volume)} to the
                                                                        𝑘 is a parameter that is set by the user for the number of times
server to retrieve the data from the key and the retrieved volume.
                                                                        to force a mendacity operation to occur. Therefore, the server
The server computes the hash values to identify the block to
                                                                        administrator can only determine where the true data have been
access and the hash addresses in the hash tables to retrieve the
                                                                        inserted with a probability of 1/(k+1) in the worst case. However,
data. Then, the blocks containing the target data are copied into
                                                                        the adversary may be able to ascertain the block to which it
the enclave and the search is performed in turn.
                                                                        is assigned. In our experiment, we found that the maximum
                                                                        capacity of one block is approximately 1000 records. Therefore,
5   SECURITY ANALYSIS
                                                                        it is possible to increase safety by perturbing the blocks as well.
In this section, we evaluate the security of our proposed method.       Specifically, the block that has the true key–value data inserted
In our proposed method, we assume that the server adminis-              can be kept a secret by faking the update of the block.
trator is semi-honest. This semi-honest administrator follows
established protocols but tries to steal the maximum amount of          6   EXPERIMENTAL EVALUATION
data information they can. The adversary (e.g., server adminis-
trator) can observe all systems except the data and operations in       In this section, we present the details of the experiments con-
the enclave. In addition, we assume that the adversary does not         ducted to evaluate the proposed method. The specific purpose
have any background knowledge on the stored data.                       was to evaluate the improvement in the performance of insertion
   With these assumptions as our basis, we analyze whether an           and query by SGX and the influence of the noise addition under
adversary can infer what the data are and where they are inserted       LDP on the communication volume and processing time.
when our proposed insertion operation is employed. Because                 To this end, we conducted the following experiments:
the count table is encrypted via homomorphic encryption, the                • Comparison of the processing times of data insertion
adversary cannot know the value of the count table, although                  using the proposed method, our previous method,
they may know that the value of the count table has been updated.             and a non-secure method using one million artificial
However, when a client gets the values, the client can access the             data records and real-world data. In this experiment,
index as many times as the volume of the key allows. Therefore,               we first evaluated whether the processing time for insert-
the administrator may infer the volume from the number of                     ing records using our proposed approach is sufficient for
accesses to the index.                                                        practical use by comparing our previous method and the
        non-secure method. In addition, we evaluated the impact        values. The proposed method is approximately 12x faster than
        of SGX and LDP on the processing time and the commu-           the previous method.
        nication volume by comparing the proposed method with
        the non-secure method.                                         Table 1: Processing time breakdown of the proposed
      • Comparison of the amount of noise between DP and               method vs the non-secure method.
        LDP. In this experiment, we evaluated the impact of LDP
        on the amount of noise by comparing with DP. Specifically,                                        Proposed        Non-secure
        we compared the amount of noise in the volume when                    Noise addition (ms)           0.0024          0.0000
        noise is added using DP after all data are inserted and when      Updating count table (ms)         0.0221          0.0004
        noise is added using LDP at the time of data insertion.         Updating cuckoo hash table (ms)    10.9438          0.0032
      • Comparison of the search times of the proposed                       Communication (ms)             2.4915          0.6229
        method, Sarvar’s method, and the non-secure method                        Other (ms)               37.2359          0.0000
        using one million artificial data records and real-                                     Uniform
        world data. In this experiment, we first evaluated the                                            Proposed        Non-secure
        effect of the difference in noise between DP and LDP on               Noise addition (ms)           0.0024          0.0000
        the search time by comparing it with Sarvar’s method.             Updating count table (ms)         0.0228          0.0005
        Next, we evaluated the impact of SGX on the search time         Updating cuckoo hash table (ms)    10.9699          0.0033
        by comparing it with the non-secure method.                          Communication (ms)             2.5335          0.6334
  The experimental environment was as follows:                                    Other (ms)               36.7747          0.0000
                                                                                                  Normal
      • Server: Microsoft Azure, Intel(R) Xeon(R) E-2288G CPU                                             Proposed        Non-secure
        3.70 GHz, 4 GB RAM, Ubuntu 18.04.5 LTS                                Noise addition (ms)           0.0024          0.0000
      • Client: MacBook Pro, dual-core Intel Core i7 3.1 GHz , 16         Updating count table (ms)         0.0225          0.0006
        GB 1867 MHz DDR3, macOS (ver. 11.5.2)                           Updating cuckoo hash table (ms)    10.9600          0.0038
The datasets used in the experiments were artificial and real-               Communication (ms)             2.4960          0.6240
world data. The artificial data comprised 1,000 keys and one                      Other (ms)               37.0466          0.0000
million data records. We used two artificial datasets: uniform                                 Real-world
and normal distributions. We used UCI Online Retail II [3] as
real-world data. It has 5,305 keys and 1,067,371 data records.             The total processing time of the proposed method for insertion
                                                                       is approximately 118x slower than that of the non-secure baseline.
6.1     Processing time of data insertion                              Note that in Table 1, the communication time is estimated based
We compared the data insertion processing times of the proposed        on the observation of the communication volume with the non-
method, our previous work, and the non-secure method. We               secure baseline in Figure 9. The communication time of the non-
introduced our previous method Section 3.4. In the non-secure          secure method was obtained by subtracting the processing time
method, all records and indexes are stored in plain text and no        from the total time.
noise entry is added to the count table, although the processing
flow is the same as that of the proposed method.
   Figure 8 shows the average time taken to insert one key–value
pair when a client inserts 1,000 data items using the proposed
and our previous method with artificial data following a uniform
distribution, artificial data following a normal distribution, and
real-world data. When a record is inserted into the multi-map,


                                                                             Figure 9: Communication volume comparison.

                                                                          The communication volume required to insert data is approxi-
                                                                       mately 4x greater than that of the non-secure method. Therefore,
                                                                       the communication time of the proposed method is also four
                                                                       times longer than that of the non-secure method.
                                                                          In Table 1, we can see that the proposed method has a sig-
                                                                       nificant difference in the cuckoo hash table update and other
                                                                       processing times compared to the non-secure method. The rea-
Figure 8: Insertion: proposed method vs previous method.
                                                                       son for this large difference in updating the cuckoo hash table
                                                                       can be attributed to the characteristics of SGX. SGX is known
the hash function is calculated by using the key and the num-          for its fast processing inside the enclave, but very slow (2x to
ber of stored records according to the key. Therefore, the time        2000x) [2, 17] processing of functions calling inside the enclave
required for insertion is not related to the distribution of the key   from outside the enclave (and vice versa) compared to normal
function calls. The other processing time is obtained by subtract-      Figure 10 compares the noise in DP and LDP for different
ing each processing time from the overall processing time. This      datasets: artificial data following a uniform distribution, artificial
is considered to be the processing time for encryption and com-      data following a normal distribution, and real-world data. The
pounding. In the proposed method, the encryption-combination         results show that the amount of noise in LDP is large compared
is performed 11 times on average for each data insertion. This       to DP. In the case of experiments using artificial data, about 1,000
results in a large overhead.                                         noise units are assigned to each key compared to DP. In the case
   Thus, although there is more overhead in each process com-        of experiments using real-world data, about 200 noise units are
pared to the non-secure method, we believe that the fact that        assigned to each key compared to DP. This is because, in LDP, the
the time required for data insertion is kept at 0.5 seconds while    amount of noise is proportional to the number of data records. In
ensuring the high security of our method is practical.               the proposed method, noise is only added to the number of counts
                                                                     in the count table, but not to the index. Therefore, the amount
6.2    Noise comparison between DP and LDP                           of this noise does not affect the amount of spatial computation.
                                                                     However, because the search is performed based on the number
                                                                     of count tables, this amount of noise will affect the search time.
                                                                     Therefore, in Section 6.3, we evaluate the impact on the search
                                                                     time.

                                                                     6.3     Processing time of data search
                                                                     We conducted search experiments for the proposed method and
                                                                     Sarvar’s method. Figure 11 compares the average time and the
                                                                     average communication volume by the client to search all the
                                                                     keys for each method in which artificial data following a uniform
                                                                     distribution, artificial data following a normal distribution, and
                                                                     real-world data are stored. The figure shoiws that the amount


                    Uniform distribution data


                                                                           Search time for uniform        Communication volume of
                                                                                 distribution              the search for uniform
                                                                                                                 distribution


                     Normal distribution data
                                                                           Search time for normal         Communication volume of
                                                                                distribution                the search for normal
                                                                                                                 distribution


                                                                        Search time for real-world        Communication volume of
                                                                                  data                     the search for real-world
                                                                                                                     data

                         Real-world data                             Figure 11: Average search times and average communica-
                                                                     tion volumes.
Figure 10: Graph comparing the amount of noise between
DP and LDP.
                                                                     of noise in the proposed method is about two times lesser than
Sarvar’s method, whereas the search time of the proposed method        ACKNOWLEDGEMENTS
is about five times slower than Sarvar’s method. This is due to        This work was supported by JSPS KAKENHI Grant Numbers
the overhead caused by SGX. As mentioned in Section 4.2, the           JP20362832.
proposed method processes search operations in the enclave
area to hide the access location during searches. Therefore, we
compared the processing times incurred to search data from the
                                                                       REFERENCES
                                                                        [1] 2020.        X-Force Threat Intelligence Index Reveals Top Cyberse-
cuckoo hash table on the server side. Figure 12 shows the results.          curity Risks of 2020.                 https://securityintelligence.com/posts/
As mentioned in Section 6.1, SGX has a characteristic wherein               x-force-threat-intelligence-index-reveals-top-cybersecurity-risks-of-2020/.
                                                                        [2] Sergei Arnautov, Bohdan Trach, Franz Gregor, Thomas Knauth, Andre Martin,
                                                                            Christian Priebe, Joshua Lind, Divya Muthukumaran, Dan O’Keeffe, Mark L.
                                                                            Stillwell, David Goltzsche, Dave Eyers, Rüdiger Kapitza, Peter Pietzuch, and
                                                                            Christof Fetzer. 2016. SCONE: Secure Linux Containers with Intel SGX. In
                                                                            12th USENIX Symposium on Operating Systems Design and Implementation
                                                                            (OSDI 16). USENIX Association, Savannah, GA, 689–703. https://www.usenix.
                                                                            org/conference/osdi16/technical-sessions/presentation/arnautov
                                                                        [3] Daqing Chen. 2019. Online Retail II. UCI Machine Learning Repository.
                                                                        [4] Victor Costan and Srinivas Devadas. 2016. Intel SGX Explained. Cryptology
                                                                            ePrint Archive, Report 2016/086. https://eprint.iacr.org/2016/086.
                                                                        [5] Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016.             Sanctum:
                                                                            Minimal Hardware Extensions for Strong Software Isolation. In 25th
                                                                            USENIX Security Symposium (USENIX Security 16). USENIX Association,
                                                                            Austin, TX, 857–874. https://www.usenix.org/conference/usenixsecurity16/
                                                                            technical-sessions/presentation/costan
                                                                        [6] Natacha Crooks, Matthew Burke, Ethan Cecchetti, Sitar Harel, Rachit Agar-
                                                                            wal, and Lorenzo Alvisi. 2018. Obladi: Oblivious Serializable Transactions
                                                                            in the Cloud. In 13th USENIX Symposium on Operating Systems Design
                                                                            and Implementation (OSDI 18). USENIX Association, Carlsbad, CA, 727–743.
                                                                            https://www.usenix.org/conference/osdi18/presentation/crooks
                                                                        [7] Reza Curtmola, Juan Garay, Seny Kamara, and Rafail Ostrovsky. 2011. Search-
Figure 12: Comparison of cuckoo hash table search times.                    able symmetric encryption: Improved definitions and efficient constructions.
                                                                            Journal of Computer Security 19 (01 2011), 895–934. https://doi.org/10.1145/
                                                                            1180405.1180417
                                                                        [8] Sanjeev Das, Jan Werner, Manos Antonakakis, Michalis Polychronakis, and
the processing inside the enclave is fast, while calling a function         Fabian Monrose. 2019. SoK: The Challenges, Pitfalls, and Perils of Using
                                                                            Hardware Performance Counters for Security. 20–38. https://doi.org/10.1109/
inside the enclave from outside the enclave (or vice versa) is              SP.2019.00021
very slow. Thus, the processing time is slow when compared to           [9] Cynthia Dwork. 2006. Differential Privacy. In Automata, Languages and
                                                                            Programming, Michele Bugliesi, Bart Preneel, Vladimiro Sassone, and Ingo
Sarvar’s method. However, Sarvar’s method searches for the key              Wegener (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 1–12.
based on a uniquely defined token generated from the key, and          [10] Paul Grubbs, Marie-Sarah Lacharite, Brice Minaud, and Kenneth G. Pater-
there is a risk that the provider may identify the key. In contrast,        son. 2018. Pump up the Volume: Practical Database Reconstruction from
                                                                            Volume Leakage on Range Queries. In Proceedings of the 2018 ACM SIGSAC
the proposed method performs the search process in the enclave,             Conference on Computer and Communications Security (Toronto, Canada) (CCS
and thus the search is performed in secrecy.                                ’18). Association for Computing Machinery, New York, NY, USA, 315–331.
                                                                            https://doi.org/10.1145/3243734.3243864
                                                                       [11] Hakan Hacigümüş, Bala Iyer, Chen Li, and Sharad Mehrotra. 2002. Executing
7   CONCLUSION AND FUTURE WORK                                              SQL over Encrypted Data in the Database-Service-Provider Model. In Proceed-
                                                                            ings of the 2002 ACM SIGMOD International Conference on Management of Data
In this paper, we proposed a method that supports data insertion            (Madison, Wisconsin) (SIGMOD ’02). Association for Computing Machinery,
in encrypted multi-maps using a trusted execution environment               New York, NY, USA, 216–227. https://doi.org/10.1145/564691.564717
                                                                       [12] Shunta Ishihara, Chiemi Watanabe, and Toshiyuki Amagasa. 2021. Supporting
(TEE), such as Intel SGX. In the proposed method, the tables are            Insertion in Encrypted Multi-Maps with Volume Hiding. In IEEE International
partitioned into blocks and each block is processed independently           Conference on Smart Computing, SMARTCOMP 2021, Irvine, CA, USA, August
                                                                            23-27, 2021. IEEE, 264–269. https://doi.org/10.1109/SMARTCOMP52413.2021.
to cope with the limited memory space (96 MB) in Intel SGX.                 00058
Our experimental results indicate that it takes about 50 ms to         [13] Mohammad Saiful Islam, Mehmet Kuzu, and Murat Kantarcioglu. 2012. Ac-
insert a key–value pair. This is a speedup of about 12 times when           cess pattern disclosure on searchable encryption: Ramification, attack and
                                                                            mitigation. In in Network and Distributed System Security Symposium (NDSS.
compared to our previous method. In addition, we think that it is      [14] Seny Kamara and Tarik Moataz. 2019. Computationally Volume-Hiding
practical to have the data insertion time at 50 ms while keeping            Structured Encryption. 11477 (2019), 183–213. https://doi.org/10.1007/
the key volume and update location secret.                                  978-3-030-17656-3_7
                                                                       [15] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhod-
   The time taken by the search process is about 5x longer than             nikova, and Adam Smith. 2011. What Can We Learn Privately? SIAM
that of Sarvar’s method. This is due to the search process using            J. Comput. 40, 3 (2011), 793–826.          https://doi.org/10.1137/090756090
                                                                            arXiv:https://doi.org/10.1137/090756090
Intel SGX as well as the difference in the amount of noise. In         [16] Georgios Kellaris, George Kollios, Kobbi Nissim, and Adam O’Neill. 2016.
Sarvar’s method, the search positions are not secret, whereas in            Generic Attacks on Secure Outsourced Databases. In Proceedings of the 2016
our proposed method, the search positions are secret. Therefore,            ACM SIGSAC Conference on Computer and Communications Security (Vienna,
                                                                            Austria) (CCS ’16). Association for Computing Machinery, New York, NY, USA,
we believe that our method is more useful when dealing with                 1329–1340. https://doi.org/10.1145/2976749.2978386
highly sensitive data.                                                 [17] Taehoon Kim, Joongun Park, Jaewook Woo, Seungheun Jeon, and Jaehyuk
   In future work, we would like to make more effective use of the          Huh. 2019. ShieldStore: Shielded In-Memory Key-Value Storage with SGX.
                                                                            In Proceedings of the Fourteenth EuroSys Conference 2019 (Dresden, Germany)
remaining enclave area. In this method, only 8 MB of the enclave            (EuroSys ’19). Association for Computing Machinery, New York, NY, USA,
area is used, owing to the argument-size limitation of Intel SGX.           Article 14, 15 pages. https://doi.org/10.1145/3302424.3303951
                                                                       [18] Adam Kirsch, Michael Mitzenmacher, and Udi Wieder. 2009. More Robust
We are considering using the remaining area as a cache. In real-            Hashing: Cuckoo Hashing with a Stash. SIAM J. Comput. 39, 4 (Dec. 2009),
world workloads, there are many cases where access to keys is               1543–1561.
heavily skewed. Thus, we would like to cache frequently accessed       [19] Sangho Lee, Ming-Wei Shih, Prasun Gera, Taesoo Kim, Hyesoon Kim, and Mar-
                                                                            cus Peinado. 2017. Inferring Fine-grained Control Flow Inside SGX Enclaves
keys in the enclave, instead of storing them in the cuckoo hash             with Branch Shadowing. In 26th USENIX Security Symposium (USENIX Security
table, to speed up insertion and retrieval.                                 17). USENIX Association, Vancouver, BC, 557–574. https://www.usenix.org/
     conference/usenixsecurity17/technical-sessions/presentation/lee-sangho
[20] Chang Liu, Liehuang Zhu, Mingzhong Wang, and Yu-An Tan. 2014. Search
     Pattern Leakage in Searchable Encryption: Attacks and New Construction.
     Inf. Sci. 265 (may 2014), 176–188. https://doi.org/10.1016/j.ins.2013.11.021
[21] R. Pagh and Flemming Friche Rodler. 2004. Cuckoo hashing. J. Algorithms 51
     (2004), 122–144.
[22] Sarvar Patel, Giuseppe Persiano, Kevin Yeo, and Moti Yung. 2019. Mitigating
     Leakage in Secure Cloud-Hosted Data Structures: Volume-Hiding for Multi-
     Maps via Hashing. In Proceedings of the 2019 ACM SIGSAC Conference on
     Computer and Communications Security (London, United Kingdom) (CCS ’19).
     Association for Computing Machinery, New York, NY, USA, 79–93. https:
     //doi.org/10.1145/3319535.3354213
[23] Ming-Wei Shih, Sangho Lee, Taesoo Kim, and Marcus Peinado. 2017. T-
     SGX: Eradicating Controlled-Channel Attacks Against Enclave Programs.
     In Network and Distributed System Security Symposium 2017 (NDSS’17) (net-
     work and distributed system security symposium 2017 (ndss’17) ed.). In-
     ternet Society.     https://www.microsoft.com/en-us/research/publication/
     t-sgx-eradicating-controlled-channel-attacks-enclave-programs/
[24] Chenghong Wang, Johes Bater, Kartik Nayak, and Ashwin Machanavajjhala.
     2021. DP-Sync: Hiding Update Patterns in Secure Outsourced Databases
     with Differential Privacy. Proceedings of the 2021 International Conference on
     Management of Data (Jun 2021). https://doi.org/10.1145/3448016.3457306
[25] Jinwen Wang, Yueqiang Cheng, Qi Li, and Yong Jiang. 2018. Interface-Based
     Side Channel Attack Against Intel SGX. arXiv:1811.05378 [cs.CR]
[26] Wenhao Wang, Guoxing Chen, Xiaorui Pan, Yinqian Zhang, XiaoFeng Wang,
     Vincent Bindschaedler, Haixu Tang, and Carl A. Gunter. 2017. Leaky Cauldron
     on the Dark Land. Proceedings of the 2017 ACM SIGSAC Conference on Computer
     and Communications Security (Oct 2017). https://doi.org/10.1145/3133956.
     3134038