=Paper=
{{Paper
|id=Vol-1558/paper36
|storemode=property
|title=Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation
|pdfUrl=https://ceur-ws.org/Vol-1558/paper36.pdf
|volume=Vol-1558
|authors=Alexander Degitz,Jens Köhler,Hannes Hartenstein
|dblpUrl=https://dblp.org/rec/conf/edbt/DegitzKH16
}}
==Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation==
Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation Alexander Degitz, Jens Köhler, Hannes Hartenstein Karlsruhe Institute of Technology (KIT) - Steinbuch Centre for Computing & Institute of Telematics Karlsruhe, Germany {alexander.degitz, jens.koehler, hartenstein}@kit.edu ABSTRACT is a special case of confidential data outsourcing as the out- In this paper, we address the question whether access pat- sourced data does not only have to be protected but also tern confidentiality-preserving databases with an underlay- has to be efficiently searchable. To achieve that, a variety of ing B-tree index structure are feasible in practice by propos- confidentiality preserving indexing approaches (CPIs) were ing integrative deployment concepts that support important proposed that allow to efficiently evaluate database queries database query functionalities based on ORAM and shuffled on outsourced encrypted data [12]. Most existing CPIs only B-trees. Furthermore, we provide a rigorous efficiency eval- protect the confidentiality of data at rest, i.e., data that uation to determine the cost of the proposed concepts with is persisted by the SP. However, in reality, the SP is also regard to storage, network, and query latency as well as able to monitor database queries and the encrypted query to investigate the influence of scenario factors like database results, as well as database record inserts, updates, and dele- size, number of records, and network bandwidth. In par- tions. Such observations can be used to reveal confidential ticular, we show that ORAM-based concepts only cause an data. For instance, if the SP observes that two encrypted overhead of factor 5.9 for evaluating equality conditions on records 1 and 2 are the result of a query for all records which a database with up to 10 million records. contain an attribute value X, most CPIs guarantee that the plaintext attribute value X is hidden from the SP. However, the SP can deduce that the two returned records contain Categories and Subject Descriptors the same attribute value. The SP might then be able to ap- I.8 [Database and storage security]: Management and ply background knowledge like “record 1 contains attribute querying of encrypted data; H.2.4 [Information systems]: value X” to learn that “record 2 contains attribute value X”. Information storage systems; Cloud based storage In particular, to protect against SPs that can monitor queries, a CPI has to guarantee that the SP is not able to de- termine which encrypted records matched an executed query 1. INTRODUCTION and the SP is not able to correlate executed queries (access Many potential cloud customers are reluctant to make use pattern confidentiality). Both properties can be trivially of Database-as-a-Service (DaaS) offerings and to outsource achieved by retrieving the entire database for each query confidential data to cloud storage providers (SPs) due to the to make the queries indistinguishable. This induces signif- possibility that the SPs do not behave honestly or do not ap- icant efficiency costs in terms of latency, transmission, and ply sufficient protective measures to protect the data against computation. Oblivious RAM (ORAM) approaches [5, 9, third party attackers. Enforcing the confidentiality of data 13, 4, 15] allow to store and retrieve records based on fixed before it is outsourced to the SP mitigates the risk of a data identifiers and enforce pattern and access confidentiality by confidentiality breach. With the advent of the internet of selectively retrieving, re-encrypting, and re-submitting spe- things, it is expected that users are not even aware of who cific parts of the outsourced database in an interactive way. will store their data anymore [2]. Thus, techniques that en- However, ORAM schemes do not support the search capa- force data confidentiality in the trusted domain of the user bilities that are required by relational databases and, as of before the data is outsourced will gain even more impor- now, it is unclear whether their efficiency is acceptable for tance. For the sake of readability and without loss of gener- relational database scenarios1 . ality, we abstract from the specific nature of the attacker and In this paper, we investigate the question whether preserv- assume that the SP aims to breach data confidentiality in ing access PATtern CONFidentiality in relational Data- the following. Outsourcing confidential relational databases Bases is practically feasible by proposing PATCONFDB, a deployment concept for ORAM schemes in the database context that supports common database searches and pro- vides pattern and access confidentiality. We show that PAT- CONFDB provides confidentiality guarantees against hon- est-but-curious attackers who can view the persisted data and monitor the executed database queries, including record (c) 2016, Copyright is with the authors. Published in the Workshop Pro- insert, update, as well as delete operations. We show that by ceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bor- deaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this 1 paper is permitted under the terms of the Creative Commons license CC- http://outsourcedbits.org/2013/12/20/how-to-search-on- by-nc-nd 4.0 encrypted-data-part-4-oblivious-rams/ making well-considered use of ORAM protocols, it is possi- ORAM to encrypt relational databases is feasible with re- ble to achieve significant efficiency benefits compared to re- gard to efficiency. In this paper, we do not propose a new trieving the entire database for each query. Furthermore, we ORAM scheme, but PATCONFDB, a concept on how to use compare the efficiency of PATCONFDB to previously pro- existing ORAM schemes to enable the execution of search posed shuffled B-tree approaches [7, 17, 3] which also aim queries on relational databases. to provide access pattern confidentiality, but are not able to Encrypted B-trees [3] were proposed to enforce the con- provide strict security guarantees. In order to make a shuf- fidentiality of data at rest. To increase the computational fled B-tree approach comparable, we extend it to support cost of inference attacks based on access patterns, shuffling various query types that are prevalent in database query the B-tree after each database query was proposed [6, 7, workloads. The main contributions of this paper are: 17]. Note that, despite making access pattern based infer- ence attacks harder, the Shuffle Index does not provide strict • PATCONFDB, a concept to efficiently apply ORAM security guarantees like ORAM does. Furthermore, to the protocols in DaaS settings that supports searching best of our knowledge, no approach based on shuffled B-trees a database for records which match specified equality, provides the ability to insert, update or delete data. In this range, and prefix query conditions. paper, we extend the Shuffle Index approach [6] to support • A concept to enhance shuffled B-tree approaches them. Our extensions can be applied to all schemes that to be used in DaaS settings, so that update operations as make use of a shuffled B-tree. well as range and prefix queries can be executed. • An efficiency evaluation that shows that significant 3. DEPLOYMENT REQUIREMENTS performance increases are possible by considerately ap- plying ORAM protocols. Furthermore, we compare the 3.1 Database Functionality Requirements PATCONFDB approach to the less secure shuffled B- Databases typically support inserts, updates, and dele- tree approach and provide guidelines for choosing the tions of database records. Furthermore, typically it is re- most suitable indexing approach for a given out- quired to search databases for specific records. The struc- sourcing scenario. tured query language (SQL) specifies a variety of different query types. In the following, we consider the following In Section 2 related work is discussed. In Section 3, the types of queries: specific requirements of relational databases for deployment concepts of access pattern confidentiality-preserving schemes 1. Equality selection: Query for records that contain are summarized. In Section 4, we introduce PATCONFDB, a specific attribute value A. an ORAM-based access pattern confidentiality-preserving Example: SELECT * WHERE Name = ’Andy’; indexing approach, and show how shuffled B-tree indexes can 2. Prefix selection: Query for records containing an be extended to also satisfy the requirements. We provide an attribute value that starts with prefix A. efficiency evaluation of the proposed schemes in Section 5. Example: SELECT * WHERE Name LIKE ’An%’; Furthermore, we discuss possible functionality extensions for 3. Range selection: Query for records that contain an our deployment concepts Section 6 and conclude the paper attribute value that is smaller than value A and bigger in Section 7. than value B. Example: SELECT * WHERE Name Between ’An’ 2. RELATED WORK AND ’Ce’; Many protocols have been proposed to efficiently perform We discuss the challenges of providing access and pattern keyword searches over encrypted data, i.e. [11, 1, 12]. While confidentiality for additional query types in Section 6. these approaches ensure content confidentiality, they leak the access patterns of queries. Based on an exemplary data- 3.2 Confidentiality Requirements base, Islam et al. [10] showed that, with a small amount An approach to securely outsource confidential databases of background knowledge about the data, up to 80% of the has to enforce both content and access pattern confidentiality encrypted data can be revealed by an attacker who watches to protect against attackers that are able to monitor queries the query and the corresponding results through a frequency on the outsourced data. analysis. This brought forward the need for CPIs that also Content confidentiality: A database outsourcing ap- ensure access pattern confidentiality. proach provides content confidentiality if an attacker that Oblivious RAM (ORAM) was first proposed by Goldreich is able to view the outsourced data is not able to learn the and Ostrovsky as a way to ensure software protection [8]. content of the database’s records. ORAM prevents that an attacker who observes the RAM Access pattern confidentiality: We adopted our def- learns any information about the RAM access patterns of ex- inition of access pattern confidentiality from [16]. An ap- ecuted programs. Improved schemes were proposed over the proach provides access pattern confidentiality if an attacker last few years which have put a focus on ORAM to be a con- that monitors database queries and query results as well as siderable alternative when it comes to secure data outsourc- database insert, update, and deletions is not able to tell 1) ing [5, 9, 13, 4, 15], i.e., storing and retrieving data records which parts of the database were accessed by a database based on fixed identifiers. Outsourced relational databases, operation, 2) when a part of a database was last updated, on the other hand, are more complex and require efficient 3) whether specific data was repeatedly accessed, and 4) search for database records that contain a specific attribute whether the database was queried or updated. This partic- value or contain attribute values that fall in a specific range. ularly means that even a series of accesses to the same record ORAM schemes do not support such search queries and, to is unrecognizable for attackers and therefore does not leak the best of our knowledge, it is unclear whether applying any information. 4. DEPLOYMENT CONCEPTS J 4.1 PATCONFDB Index Layer 1 ORAM Instance In this section, we introduce the PATCONFDB concept D I O T that uses existing ORAM schemes as building blocks to sup- port searches on outsourced data while enforcing access pat- Index Layer 2 tern confidentiality. The interface of all existing ORAM ORAM Instance schemes is ORAM.get(ID) and ORAM.put(ID, block), i.e., A C F H J K Q S V X data blocks can be stored and retrieved based on a fixed ID. All existing ORAM schemes share the following char- acteristic. To retrieve a stored data block, first a set of r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16 encrypted data block containers (DC) has to be downloaded Record Store ORAM Instance and decrypted. The decrypted data blocks then have to be re-encrypted and the resulting encrypted DCs have to be Figure 1: Hierarchy of the ORAM instances in PAT- re-uploaded to the SP. PATCONFDB is agnostic to the un- CONFDB. derlying ORAM scheme and initializes multiple instances of the used ORAM scheme. These ORAM instances are inde- nism as described for equality selections. That is, for each pendent from each other. Each ORAM instance hides the attribute value that lies in the queried range, an equality content and the access pattern of the database from the SP. selection query is evaluated and the results of these queries In most database scenarios, records have to be searchable are aggregated to the result of the range/prefix selection. A based on record attributes. This creates the need for indexes prefix selection can be considered as a special kind of range to be outsourced to an external SP. We use a B-tree index selection, that spans over every attribute value of records structure for PATCONFDB so that the efficiency evalua- inside the queried prefix. tion is better comparable to the Shuffle Index, which also Insert / delete / update. To insert a new record, an uses a B-tree. To search for records that contain a specific equality selection for the attribute value of the new record attribute value, the highest level node of the B-tree is re- is performed, the set of identifiers of records that contain trieved. The retrieved node can be used to determine which the attribute value is retrieved and the identifier of the in- node of the next lower B-tree layer is closest to the queried serted record is added before re-encrypting and re-uploading value. This process is repeated until the leaf of the B-tree is the data to the corresponding index layers. Furthermore, reached, which contains the record identifiers of the records the records that matched the equality selection are retrieved that contain a matching attribute value. from the record store ORAM instance and the new record Each node of a B-tree can be encrypted to protect the con- is inserted in the record store when re-encrypting and re- fidentiality of the data at rest. However, the attacker would uploading the retrieved records. Deleting a record works still learn access patterns and distinguish different types of analogously, by removing the record before re-encrypting queries by monitoring which nodes were retrieved to eval- and re-uploading the retrieved records and the record iden- uate a given query. To hide which nodes were retrieved, tifiers. Updating a record is considered a concatenation of a PATCONFDB stores every layer of the B-tree in a separate delete and an insert operation. ORAM instance as shown in Figure 1. For the remainder of As ORAM schemes allow to store and retrieve data blocks this paper we call the ORAM instances that contain the in- based on a fixed ID but shuffle the stored encrypted data dex layers and the ORAM instance that contains the actual blocks within the DCs with every access, each ORAM in- data records record store. In the following, we describe how stance has to map the fixed block IDs to the current location PATCONFDB supports the database functionality that is within the ORAM instance to determine which encrypted described in Section 3. data blocks have to be returned. This is illustrated in Fig- Equality selections. To retrieve all records that contain ure 2a. For instance, to retrieve node 1 (containing “D” and a specific attribute value, first the root node of the B-tree has “I”) of index layer 1, the encrypted data block 2 has to be to be retrieved and decrypted. Based on the decrypted node retrieved from the ORAM instance. Thus, data block ID 1 it can be determined which node X has to be retrieved from has to be mapped on the encrypted data block ID 2 first. the next lower layer in the B-tree. To retrieve X, the ORAM To perform this mapping without harming access pattern instance that corresponds to X’s layer has to be queried. If confidentiality, more data has to be transmitted and further the B-tree is stored in n ORAM instances, n ORAM in- round trips are necessary between client and SP (see [14]). stances have to be accessed to retrieve the B-tree’s leaf node The overhead for mapping fixed data block IDs on ORAM that contains the identifier of the records that match the instance locations can be avoided in the PATCONFDB case. database query. Based on the record identifiers, the match- As shown in Figure 2b, the DC ID can be directly stored ing records can be retrieved from the record store ORAM in the corresponding B-tree node of the next upper layer. instance. If more than one record matches the query, the Thus, when executing a query, the IDs of DCs that have equality selection has to be evaluated again for each match- to be retrieved from the next layer are already known to ing record before retrieving it from the record store, even if the client without any further mapping. For instance, after all identifiers of matching records are already known. This retrieving and decrypting the root node (containing “J”), mechanism is needed to keep queries indistinguishable, as the client already knows that the left B-tree node of the explained in Section 4.2. next lower layer is contained in the encrypted data block 2. Prefix / range selections. To keep the type of queries As ORAM requires to shuffle data blocks and to store indistinguishable, a prefix selection is executed as a sequen- them in different DCs after each retrieval, data blocks of tial series of equality selections through the same mecha- the lower layers have to be re-encrypted and re-uploaded Data block ID J riodically so that bursts in queries could not be detected. Data block container ID 1 2 Of course, this also generates a large overhead in network Index Layer 1 ORAM Instance 1 2 traffic and query latency. 2 1 12 D I Position O T Insert / delete / update. Since insert, update, and Map 21 3 4 5 1 2 delete operations are achieved by equality selection queries, Index Layer 2 ORAM Instance 1 2 3 4 5 attackers are not able to distinguish them from equality se- 2 5 4 1 3 lection queries and, thus, prefix/range selection queries. Position 12345 Map 41532 F H V X Q S A C J K Since every query or write operation on PATCONFDB leads to the same pattern of successive indistinguishable op- (a) ORAM index layers erations on ORAM instances, the SP is not able to monitor J access patterns. Thus, PATCONFDB enforces access pat- 2 1 tern confidentiality. Index Layer 1 ORAM Instance 1 2 O T D I 4.3 Shuffled B-tree Index Extensions 5 3 2 4 1 To enhance access pattern confidentiality for the retrieval Index Layer 2 ORAM Instance 1 2 3 4 5 of a single record, three methods are used by existing shuf- F H V X Q S A C J K fled B-tree approaches [6, 7, 17]: 1) Cover searches, as seen in Figure 3, are randomly chosen nodes that are retrieved (b) Efficiency improved ORAM index layers in parallel to the nodes that are actually relevant for the executed query in each layer. For example, if two cover Figure 2: Efficiency optimized index layers. searches are executed, attackers see three retrieved nodes for every layer of the Shuffle Tree. Attackers cannot distin- prior to data blocks of the upper layers. When a query is guish between cover searches and nodes that are relevant for executed, first data blocks have to be retrieved from index the executed query. 2) Node caching at the trusted client is layer 1 to get the B-tree node that contains the ID of the used to increase access pattern obfuscation. After a node is DC that has to be retrieved from index layer 2. But the being accessed, it will replace the least recently used node data blocks in index layer 1 cannot be written back right in the client-side cache. All nodes on the path from the root away, as – due to shuffling – the data blocks of index layer node to the leaf node are stored inside the cache. If a node 2 are likely to be stored in different DCs than before. Only that is in the cache is retrieved, an additional cover search after new DCs have been assigned to the data blocks of the is executed so an attacker cannot infer that the node was record store ORAM instance, the data blocks in the upper inside the cache. 3) Node shuffling is used to increase the index layers can be updated and written back recursively. difficulty for attackers to learn the parent-child relationship of the nodes. After each query execution, the content of all 4.2 Security of PATCONFDB nodes stored on the client is shuffled using a random permu- The PATCONFDB concept consists of multiple ORAM tation before they are written back to the encrypted B-tree. instances. We assume that the utilized ORAM scheme to Node shuffling is outlined in Figure 3. create each ORAM instance satisfies the ORAM security In the remainder of this paper we denote the leaf nodes of notion, i.e., any read or write operations on the data within the Shuffle Tree as data nodes and the non-leaf nodes as nav- the instance are indistinguishable. We now examine the se- igation nodes. In the following, we show how the concept of curity implications of executing queries across multiple index shuffled B-trees can be extended to not only support equal- layers, i.e., multiple ORAM instances. ity selections, but also range and prefix selections as well as Selection queries. As shown in Section 4.1, sequential database modifications, i.e., insert, update, and deletion of equality selections are used to evaluate range and prefix se- database records. lections. Thus, it suffices to show that equality selections Insert / delete. Current Shuffle Index concepts [6, 17] are indistinguishable from the perspective of the SP. Every assume an upfront knowledge of all database records. Know- ORAM instance has to be queried exactly once to evaluate ing the exact distribution of records is not possible in most an equality selection, so honest-but-curious attackers see a real world scenarios. Thus, concept extensions that allow constant number of indistinguishable ORAM accesses. From to insert, update, and delete records are necessary. We pro- this, they learn the number of index layers, but they neither pose a modified shuffled B-tree scheme with a dynamically learn the parent-child-relationship of nodes in the B-tree nor expanding and shrinking B-tree that is initialized with only the total size of data stored in the database. If an equal- a small amount of navigation nodes. ity selection matches multiple records, the query is executed The concept to insert records is outlined in Figure 4. over all index layers for each matching record. Otherwise, Whenever a record is inserted, an equality selection query it would be possible for an attacker to observe the number for the attribute value of the new record is performed to find of matching records by counting the sequential accesses to the node X to store the record in. If this node X is full, a new the record store ORAM instance. Attackers could still mon- node N is created to store the record in. The reference to N itor bursts in queries, but this could as likely be caused by is stored in the parent node of node X. If the parent node is an equality selection that has matched many records as by full as well, another new node is inserted on the next higher a prefix or range selection. So the type of selection query layer of the B-tree. This works recursively to the root of the remains hidden from attackers. The frequency of queries tree, which does not have any size limitations, because it can still be observed by the attacker. Access frequency is does not need to be of a certain size to be indistinguishable an information leak that can be found in all current ORAM from any other node. Up to half of the records in the full schemes. One solution would be to query the database pe- node X are copied to the new node N while inserting the Storage position in shuffled B-tree 1 nodes is shuffled, padded, and encrypted with a new nonce Database query J Cover search 3 2 in every access. Yang et al. [17] published a proof that, by 2 3 using the shuffle mechanism, the probability that a node is O T D I the child of a parent node X is equal to the probability of 8 6 5 7 4 it being the child of any other parent node Y after a large enough number of accesses were performed after the last 4 5 6 7 8 retrieval of X. Since this number of accesses is not further F H V X Q S A C J K specified, it remains unclear when shuffled B-trees enforce access pattern confidentiality sufficiently. (a) Query execution with one cover search To use the Shuffle Index in relational databases, methods to insert, update and delete data had to be implemented. Storage position in shuffled B-tree 1 As described in Section 4.3, this leads to the expansion and Database query J Cover search 3 2 shrinkage of the Shuffle Tree over time. Therefore, informa- 2 3 tion on the size of the database is leaked and the insert, up- O T D I date, as well as delete operations are distinguishable. Thus, 8 7 5 6 4 the fourth access pattern confidentiality requirement defined in Section 3.2 is not satisfied. However, this does not lead 4 5 6 7 8 to the breach of the other three confidentiality requirements. F H V X A C Q S J K Besides the total size of the database, attackers can also infer how many records can be stored in one node by monitoring (b) Shuffled B-tree after node shuffling a series of inserts and track how often a new node is being created. Users have to decide if these information leaks are Figure 3: Shuffled B-tree: Query execution with cover acceptable for their specific scenarios. searches. Prefix / range selections. The performance optimiza- tion for prefix/range selection queries as proposed in Sec- new record. After the insertion, the nodes are re-encrypted, tion 4.3 introduces two minor information leaks. Attack- shuffled and re-uploaded in the same way they would have ers can distinguish range and prefix selection queries from been in case of a regular equality selection query. The dele- other database queries and they can estimate the amount of tion of a record works similar. When the last record in a records that matched a prefix/range query to a certain de- data node is deleted, the data node itself and its reference gree. Attackers can observe the total number of data nodes in the parent node is deleted. that were retrieved, but they do not know how many of those Update. To ensure the alphabetical sorting of all records, nodes are cover nodes. We argue that both information leaks an update that changes the attribute value of the record are acceptable since the type of query is distinguishable any- works as a deletion of the old record followed by an insertion way due to the dynamically expanding shuffled B-tree. The of a new one. amount of retrieved records would have also been observable Prefix / range selections. Since insert, update and to a certain degree without this optimization, because bursts delete operations are already distinguishable from selections, in queries correlate to the amount of retrieved records. it is not necessary to make prefix and range selections in- distinguishable from equality selections. Therefore, we can use the following method to enhance the performance of 5. EFFICIENCY EVALUATION prefix and range selections. Instead of retrieving only one data node, every data node inside one navigation node that 5.1 Query Latency Overhead matches the query can be downloaded. Even though x cover We evaluated the query latency overhead induced by the nodes have to be retrieved to obfuscate each data node that use of PATCONFDB and the modified shuffled B-tree by was actually retrieved, this method significantly reduces the measuring the query latency of equality, prefix, and range network traffic and the number of sequential data node re- selections. For this evaluation, we used the state-of-the- trievals, as discussed in Section 5.2. art Path ORAM [16] scheme as ORAM scheme for PAT- CONFDB. Path ORAM organizes the ORAM DCs in a bi- 4.4 Security of Shuffled B-trees nary tree. Each leaf of this binary tree is assigned a unique position and each data block is assigned to a position. The The security guarantees of the Shuffle Index are based invariant of Path ORAM states that every data block with on the assumption that attackers are not able to determine an assigned position p is stored on the path from the root DC the parent-child relationship of nodes in different layers of to the leaf DC with position p. So for every read or write the Shuffle Tree. To achieve this, the content of all queried operation on a Path ORAM instance with n DCs, log(n) DCs are retrieved from the SP. After the operation the DCs are re-encrypted with a nonce and re-uploaded to the SP. Al Ar Al An Ar We instantiated PATCONFDB with two index layers and Alex Armin Alex Andy Armin one record store. During our measurements, we optimized Alois Arthur Alois Anna Arthur the capacity of DCs in each layer, resulting in container Andy - Amy - - capacities between 10 and 100 data blocks. Our shuffled Anna - - - - B-tree extensions are implemented with a Shuffle Index [6] as described in Section 4.3 with two cover searches, a cache Figure 4: An excerpt of a Shuffle Tree before and after the size of five and a Shuffle Tree height of three. In our mea- insertion of the record ’Amy’. surements, we found these parameter choices to have the DB size Records Bandwidth and latency 100000 Baseline ShuffleIndex > time limit > time limit Scenario 1 1 GB 1 mio 1 Gbit/s and 5 ms DBCopy PATCONFDB Scenario 2 1 GB 1 mio 10 Mbit/s and 50 ms 10000 Scenario 3 1 GB 10.000 1 Gbit/s and 5 ms 1000 Scenario 4 1 GB 10.000 10 Mbit/s and 50 ms Scenario 5 10 GB 10 mio 1 Gbit/s and 5 ms 100 Scenario 6 10 GB 10 mio 10 Mbit/s and 50 ms Scenario 7 10 GB 100.000 1 Gbit/s and 5 ms 10 Scenario 8 10 GB 100.000 10 Mbit/s and 50 ms 1 Table 1: Summary of the scenario parameters chosen for 1 2 3 4 5 6 7 8 each scenario. Figure 6: Measured query latency of range selections over 100000 all scenarios on a logarithmic y scale. Baseline ShuffleIndex DBCopy PATCONFDB 10000 in case more records are queried (scen. 1,3,5,7 vs. 2,4,6,8). 1000 Note that the confidence intervals for the queries executed with PATCONFDB are significantly larger than the ones of 100 the other protocols. This is caused by different sized network overheads as explained in Section 5.2. For scenarios 6 and 10 8, PATCONFDB even exceeded our time limit of 7h for one query. The Shuffle Index is much less influenced by 1 the bandwidth of the network link, but its query latency 1 2 3 4 5 6 7 8 is already by a factor of 10 higher than the latency of the Baseline protocol. Figure 5: Measured query latency of equality selections over Our measurements show that DBCopy can outperform all scenarios on a logarithmic y scale. PATCONFDB for range selections if more than about 0,1% off all records are queried in range selections. The results in- dicate that the PATCONFDB performs better in scenarios best trade-off between security and performance. To better with databases that contain small amounts of records with a compare the query latency, the naive approach of enforc- large size, whereas the Shuffle Index performs better in sce- ing access pattern confidentiality by downloading the whole narios with databases that contain a large number of small encrypted database on every access is evaluated, hereafter records. For scenarios in which many records are queried referred to as DBCopy. To specify the overhead compared in range selections a network link with a high bandwidth to to an approach that does not provide any security mecha- the SP is needed to keep query latency low. nisms, an unencrypted database retrieval protocol was also evaluated, hereafter referred to as Baseline. 5.2 Network Overhead The query latency has been measured on a client computer In the following, we investigate the network overhead of with an Intel Core i7 CPU with 2,4GHz and 8GB RAM. As a PATCONFDB and shuffled B-trees, i.e., the amount of data database server a Microsoft SQL Server Express was used on that has to be transmitted between the client and the SP. a virtual machine in Hyper-V with 2 virtual CPUs and 16GB First, we provide analytical models to highlight the factors RAM. The test data has been created with a random number that influence the network overhead both for PATCONFDB generator and is evenly distributed. Before every test run, and shuffled B-trees. Then we present and interpret the each database is initialized so that the generated database network overheads we measured for the scenarios that we records fill 10% of it. In Table 1 the parametrization of our introduced in Section 5.1. 2k -factorial experimental design scenarios is shown. The network overhead NPs to retrieve a record from a The query latency of equality selections measured over PATCONFDB instance that is based on ORAM scheme s all scenarios is shown in Figure 5. It can be seen, that an can be calculated as follows: increase of the size of the database results in an increased XHI s query latency for all tested protocols (scen. 1-4 vs. 5-8). NPs = Nrs + N t + Ni The bandwidth of the network link is only a critical fac- i=1 tor for the DBCopy protocol, because an equality selection for HI = number of index layers, Nrs = network overhead only retrieves a small number of records, so the total net- induced by the retrieval of a record from the record store work traffic is low for all other protocols (scen. 1,3,5,7 vs. depending on the ORAM scheme s, N t = size of the root 2,4,6,8). The Baseline and the PATCONFDB protocol are node of the PATCONFDB B-tree, Nis = network overhead significantly influenced by the number of records (scen. 1,2 induced by the retrieval of a record from an index layer vs. 3,4). For the Baseline protocol this is the case, because depending on the ORAM scheme s. it does not have an index tree to efficiently search for at- For an implementation of PATCONFDB with Path O- tribute values. For the PATCONFDB protocol it takes a RAM, the retrieval of a record requires every Path ORAM long time to sequentially query index layers which contain a instance to be accessed two times (Retrieval and Upload of large number of identifiers. DCs). Since the DCs are stored in a binary tree and all DC The query latency of range selections measured over all on the path from the root to the leaf of the tree are retrieved, scenarios is shown in Figure 6. Now that the bandwidth of the network traffic for every ORAM instance equals the tree the network link influences the query latency significantly height h multiplied with the size of the DC d. Scenario SI(EQ) SI(PR) PC(EQ) PC(PR) database records’ size to the chosen maximum database size. 1+2 562,0 6,56 7080 6557 If the maximum database size is 100GB but only 10GB of 3+4 110,4 6,31 683,1 669,8 data is stored inside, the relative storage overhead to the 5+6 1460 6,57 5901 5761 actual data size is 10. The size of the index depends on the 7+8 111,3 6,26 1025 971,3 maximum number of records that can be stored in a PAT- CONFDB instance. Each different attribute value of any Table 2: Ratio of Shuffle Index (SI) and PATCONFDB (PC) record is stored inside of the lowest index layer to provide network overhead to the size of the queried records for equal- references to every record that is stored in the database. ity (EQ) and prefix (PR) selections. Since index layers are not allowed to grow in size, they have to be initialized to their maximum size as well. So, in the The network overhead NS to retrieve a record from a worst case of records that contain only the indexed attribute, Shuffle Index can be calculated as follows: the lowest index layer has the same size as the record store. NS = 2 · r + n · (h − 2) · (1 + a + 2 · c) + d · (1 + a + 2 · c) However, the number of DCs that have to be referenced de- creases fast for the upper index layers, so that the additional for h = height of Shuffle Tree, r = root node size, n = storage overhead for them is very low. In our evaluations the navigation node size, d = data node size, c = number of storage overhead for a PATCONFDB instance that is filled executed cover searches, a = cache size. The network traffic to its maximum capacity never exceeded a factor of 2,2 in is induced by three factors: I) Retrieval and upload of the comparison to the plain text database size with an aver- root node (2 · r), II) retrieval and upload of navigation nodes age factor of 2,08, which we argue is feasible for relational including cover nodes (n·(h−2)·(1+a+2·c)), III) read and databases. If the PATCONFDB instance is not filled to its write of the data nodes (d · (1 + a + 2 · c)). Note that actually maximum capacity, that factor has to be divided by the ratio needed nodes are retrieved and then uploaded as part of the of used space u to calculate the overall storage overhead. whole cache, inducing a total overhead of n · (1 + a) per In contrast to PATCONFDB, the Shuffle Index does B-tree layer. not need to be initialized to its maximum database size. The measured network overheads for the scenarios The storage overhead of the Shuffle Index is induced by the that we introduced in Section 5.1 are shown in Table 2 storage needed for navigation nodes and by the ratio of the for PATCONFDB and the Shuffle Index. The table shows average number to the maximum number of records stored the ratio of the induced network traffic of each approach to in data nodes. This used space of data nodes u influences the size of the queried records. For instance, if the queried the storage overhead similarly as seen with PATCONFDB. record has a size of 100B and a network traffic of 5,62KB is Since the database records are stored, so that they are sorted measured for the evaluation of the query, the relative over- alphabetically in the leaf nodes of the Shuffle Tree, an index head amounts to 562. The scenarios that only differ in net- layer that contains every attribute value as seen with the work bandwidth are paired in this table, since the network PATCONFDB approach is not necessary. This combined bandwidth does not affect the amount of network traffic. with the dynamic fan out of navigation nodes significantly For equality selections, the measurements indicate that reduces the storage overhead induced by navigation nodes. the total number of records stored in the database has a large In our evaluations the storage overhead for a Shuffle Index impact on the relative network overhead for both schemes instance that only contains data nodes which are filled to (scen. 1,2,5,6 vs. 3,4,7,8). Since the total network traffic for their maximum capacity, never exceeded a factor of 1,02 in equality selections in both schemes remains low (<1,5MB), comparison to the plain text database size. That factor has the large network overhead does not have a similarly large to be divided by the ratio u of used space of data nodes to impact on the query latency measured in Section 5.1. In calculate the overall storage overhead. the case of prefix selections, the performance optimization It can be seen, that the storage overhead induced by in- proposed in Section 4.3 for the Shuffle Index reduces the rel- dexing techniques is a low and constant factor for both ap- ative network overhead significantly. Since prefix selections proaches and therefore does not restrict the feasibility of ac- in the PATCONFDB are executed as a series of sequential cess pattern preserving relational databases. However, the equality selections, no significant reduction of the relative constraint of PATCONFDB that the database has to be ini- network overhead is measured. The relative overhead is only tialized to its maximum size, could be problematic in sce- reduced, if by chance multiple records that match the query narios where the size of the database fluctuates frequently. are retrieved within the same DC. Since the relative network overhead of PATCONFDB re- mains at a high level, it is not feasible to use PATCONFDB 6. FUNCTIONALITY EXTENSIONS in scenarios, in which a large number of records is queried at We address the challenges in complying to access pattern the same time or in a short time frame. The Shuffle Index, confidentiality in DaaS scenarios with multiple types of however, can also be used in scenarios which include large indexes, i.e. strings, number, and dates, for the use of the range or prefix selections. PATCONFDB approach. We discuss two situations in which PATCONFDB would leak information about the pattern of 5.3 Storage Overhead queries, if used in the same way as in a single index type In the following we investigate the storage overhead on the scenario. A basic equality selection results in one query to external SP. This storage overhead consists of the storage each index layer and one query to the record store. that is needed for the index as well as of the additionally Insert and delete operations have to access I (number required storage to store record, i.e., the padding of nodes of index types) DCs from the index layer to insert or delete or the initialization of a database to its maximum size. every attribute value of each record to the corresponding The storage overhead of the PATCONFDB approach data record. Therefore an attacker can differentiate between is predominantly influenced by the ratio u of the actual an equality selection and an insert or a delete operation. To prevent this information leak, I DCs from the index layer 8. REFERENCES have to be retrieved, before the record store is queried. [1] M. Bellare, A. Boldyreva, and A. O’Neill. Deterministic and efficiently searchable encryption. In Proc. of the Update operations have to access a DC of every index Annual Intl. Cryptology Conf. on Advances in Cryptology type, which is involved in the SQL operation, from the index (CRYPTO), pages 535–552, 2007. layer and then one DC of the record store. Since it is impos- [2] P. Brody and V. Pureswaran. Device democracy: Saving sible to know the current value before it is update, another the future of the internet of things. IBM Global Business DC has to be queried from the index layer, to delete the ref- Services Executive Report, 2014. erence that is no longer needed. Therefore an attacker can [3] A. Ceselli, E. Damiani, S. D. C. D. Vimercati, S. Jajodia, differentiate between an equality selection and an update S. Paraboschi, and P. Samarati. Modeling and assessing inference exposure in encrypted databases. ACM operation. To prevent this information leak, a randomly Transactions on Information and System Security, chosen set of DCs from the index structure would have to 8(1):119–152, 2005. be retrieved after the actual query is executed. [4] I. Damgård, S. Meldgaard, and J. B. Nielsen. Perfectly Further query types. In this paper, we investigated secure oblivious ram without random oracles. In Proc. of how equality, range, and prefix selections can be evaluated the Conf. on Theory of Cryptography (TCC), pages on access pattern confidentiality-preserving databases. We 144–163, 2011. introduced equality, range and prefix selections based on [5] J. Dautrich, E. Stefanov, and E. Shi. Burst ORAM: strings. However, our concepts can be seamlessly applied Minimizing ORAM response times for bursty access patterns. In Proc. of the USENIX Security Symposium, for range and prefix selections on other data types like inte- pages 749–764, 2014. gers. We argue that our investigated query types are already [6] S. De Capitani di Vimercati, S. Foresti, S. Paraboschi, sufficient for many scenarios in which relational databases G. Pelosi, and P. Samarati. Efficient and private access to are used. PATCONFDB and modified shuffled B-trees, as outsourced data. In Proc. of the IEEE Intl. Conf. on introduced in Section 4, can be extended to support table Distributed Computing Systems (ICDCS), pages 710–719, joins and nested queries. Again the queries are divided into 2011. sequential equality selections so that they are indistinguish- [7] S. De Capitani di Vimercati, S. Foresti, S. Paraboschi, G. Pelosi, and P. Samarati. Distributed shuffling for able from any other query. For this to work, all indexes preserving access confidentiality. In Proc. of the European of all tables have to be stored in the same B-tree. If more Symp. on Research in Computer Security (ESORICS), than one B-tree is used for indexing, every B-tree has to be pages 628–645, 2013. accessed every time a query is executed for the queries to [8] O. Goldreich and R. Ostrovsky. Software protection and be indistinguishable. In future work we plan to implement simulation on oblivious rams. Journal of the ACM, and evaluate these query types to give recommendations on 43(3):431–473, 1996. which setup performs best for which database scenario. [9] M. T. Goodrich and M. Mitzenmacher. Privacy-preserving access of outsourced data via oblivious RAM simulation. In Proc. of the Intl. Conf. on Automata, Languages and 7. CONCLUSIONS & FUTURE WORK Programming (ICALP), pages 576–587, 2011. To investigate the feasibility of access and pattern confi- [10] M. S. Islam, M. Kuzu, and M. Kantarcioglu. Access pattern dentiality-preserving relational databases with a B-tree in- disclosure on searchable encryption: Ramification, attack and mitigation. In Proc. of the Network and Distributed dex, we proposed PATCONFDB, an ORAM-based concept Systems Security (NDSS) Symposium, 2012. to achieve access pattern confidentiality, and extended exist- [11] S. Kamara, C. Papamanthou, and T. Roeder. Dynamic ing shuffled B-tree approaches to support essential database searchable symmetric encryption. In Proc. of the ACM operations. In particular, empirical measurements of these Conf. on Computer and Communications Security (CCS), concepts showed that enforcing access pattern confidential- pages 965–976, 2012. ity in relational databases only induces an overhead of factor [12] J. Köhler, K. Jünemann, and H. Hartenstein. Confidential 5.9 for evaluating equality conditions on a database with up database-as-a-service approaches: taxonomy and survey. Journal of Cloud Computing, 4(1), 2015. to 10 million records. The extended Shuffle Index induces an [13] R. Ostrovsky. Efficient computation on oblivious RAMs. In even smaller overhead of factor 2.3 for the same setup, but Proc. of the ACM Symposium on Theory of Computing provides no strict and well-defined access pattern confiden- (STOC), pages 514–523, 1990. tiality guarantees. Our performance evaluation showed to [14] L. Ren, X. Yu, C. W. Fletcher, M. van Dijk, and which extend the induced overhead depends on the network S. Devadas. Design space exploration and optimization of link to the SP, the structure of the data and the query work- path oblivious RAM in secure processors. In Proc. of the load of the outsourcing scenario. In particular, our results Intl. Symp. on Computer Architecture (ISCA), pages 571–582, 2013. showed that shuffled B-trees in general outperform ORAM- [15] E. Stefanov and E. Shi. ObliviStore: High performance based B-trees and, thus, can be used in a wider variety of oblivious distributed cloud data store. In Proc. of the DaaS scenarios, if no strict security guarantees are required. Network and Distributed Systems Security (NDSS) The findings of this paper highlight multiple future re- Symposium, 2013. search directions. It is worthwhile to aim for further effi- [16] E. Stefanov, M. van Dijk, E. Shi, C. Fletcher, L. Ren, ciency improvements of ORAM-based database indexes, as X. Yu, and S. Devadas. Path ORAM: An extremely simple well as to evaluate the performance of other index struc- oblivious RAM protocol. In Proc. of the ACM Conf. on tures like bitmaps. The superior efficiency of shuffled B-tree Computer and Communications Security (CCS), pages 299–310, 2013. approaches makes it also worthwhile to aim for a better un- [17] K. Yang, J. Zhang, W. Zhang, and D. Qiao. A light-weight derstanding of their security guarantees. Furthermore, we solution to preservation of access pattern privacy in plan to investigate how both ORAM-based and shuffle-based un-trusted clouds. In Proce. of the European Conf. on schemes can be extended to support relational databases Research in Computer Security (ESORICS), pages with multiple index types and queries that include more 528–547, 2011. complex operations.