=Paper= {{Paper |id=Vol-1558/paper36 |storemode=property |title=Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation |pdfUrl=https://ceur-ws.org/Vol-1558/paper36.pdf |volume=Vol-1558 |authors=Alexander Degitz,Jens Köhler,Hannes Hartenstein |dblpUrl=https://dblp.org/rec/conf/edbt/DegitzKH16 }} ==Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation== https://ceur-ws.org/Vol-1558/paper36.pdf

Access Pattern Confidentiality-Preserving Relational
Databases: Deployment Concept and Efficiency Evaluation

Alexander Degitz, Jens Köhler, Hannes Hartenstein
Karlsruhe Institute of Technology (KIT) - Steinbuch Centre for Computing & Institute of Telematics
Karlsruhe, Germany
{alexander.degitz, jens.koehler, hartenstein}@kit.edu

ABSTRACT is a special case of confidential data outsourcing as the out-
In this paper, we address the question whether access pat- sourced data does not only have to be protected but also
tern confidentiality-preserving databases with an underlay- has to be efficiently searchable. To achieve that, a variety of
ing B-tree index structure are feasible in practice by propos- confidentiality preserving indexing approaches (CPIs) were
ing integrative deployment concepts that support important proposed that allow to efficiently evaluate database queries
database query functionalities based on ORAM and shuffled on outsourced encrypted data [12]. Most existing CPIs only
B-trees. Furthermore, we provide a rigorous efficiency eval- protect the confidentiality of data at rest, i.e., data that
uation to determine the cost of the proposed concepts with is persisted by the SP. However, in reality, the SP is also
regard to storage, network, and query latency as well as able to monitor database queries and the encrypted query
to investigate the influence of scenario factors like database results, as well as database record inserts, updates, and dele-
size, number of records, and network bandwidth. In par- tions. Such observations can be used to reveal confidential
ticular, we show that ORAM-based concepts only cause an data. For instance, if the SP observes that two encrypted
overhead of factor 5.9 for evaluating equality conditions on records 1 and 2 are the result of a query for all records which
a database with up to 10 million records. contain an attribute value X, most CPIs guarantee that the
plaintext attribute value X is hidden from the SP. However,
the SP can deduce that the two returned records contain
Categories and Subject Descriptors the same attribute value. The SP might then be able to ap-
I.8 [Database and storage security]: Management and ply background knowledge like “record 1 contains attribute
querying of encrypted data; H.2.4 [Information systems]: value X” to learn that “record 2 contains attribute value X”.
Information storage systems; Cloud based storage In particular, to protect against SPs that can monitor
queries, a CPI has to guarantee that the SP is not able to de-
termine which encrypted records matched an executed query
1. INTRODUCTION and the SP is not able to correlate executed queries (access
Many potential cloud customers are reluctant to make use pattern confidentiality). Both properties can be trivially
of Database-as-a-Service (DaaS) offerings and to outsource achieved by retrieving the entire database for each query
confidential data to cloud storage providers (SPs) due to the to make the queries indistinguishable. This induces signif-
possibility that the SPs do not behave honestly or do not ap- icant efficiency costs in terms of latency, transmission, and
ply sufficient protective measures to protect the data against computation. Oblivious RAM (ORAM) approaches [5, 9,
third party attackers. Enforcing the confidentiality of data 13, 4, 15] allow to store and retrieve records based on fixed
before it is outsourced to the SP mitigates the risk of a data identifiers and enforce pattern and access confidentiality by
confidentiality breach. With the advent of the internet of selectively retrieving, re-encrypting, and re-submitting spe-
things, it is expected that users are not even aware of who cific parts of the outsourced database in an interactive way.
will store their data anymore [2]. Thus, techniques that en- However, ORAM schemes do not support the search capa-
force data confidentiality in the trusted domain of the user bilities that are required by relational databases and, as of
before the data is outsourced will gain even more impor- now, it is unclear whether their efficiency is acceptable for
tance. For the sake of readability and without loss of gener- relational database scenarios1 .
ality, we abstract from the specific nature of the attacker and In this paper, we investigate the question whether preserv-
assume that the SP aims to breach data confidentiality in ing access PATtern CONFidentiality in relational Data-
the following. Outsourcing confidential relational databases Bases is practically feasible by proposing PATCONFDB,
a deployment concept for ORAM schemes in the database
context that supports common database searches and pro-
vides pattern and access confidentiality. We show that PAT-
CONFDB provides confidentiality guarantees against hon-
est-but-curious attackers who can view the persisted data
and monitor the executed database queries, including record
(c) 2016, Copyright is with the authors. Published in the Workshop Pro- insert, update, as well as delete operations. We show that by
ceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bor-
deaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this 1
paper is permitted under the terms of the Creative Commons license CC- http://outsourcedbits.org/2013/12/20/how-to-search-on-
by-nc-nd 4.0 encrypted-data-part-4-oblivious-rams/
making well-considered use of ORAM protocols, it is possi- ORAM to encrypt relational databases is feasible with re-
ble to achieve significant efficiency benefits compared to re- gard to efficiency. In this paper, we do not propose a new
trieving the entire database for each query. Furthermore, we ORAM scheme, but PATCONFDB, a concept on how to use
compare the efficiency of PATCONFDB to previously pro- existing ORAM schemes to enable the execution of search
posed shuffled B-tree approaches [7, 17, 3] which also aim queries on relational databases.
to provide access pattern confidentiality, but are not able to Encrypted B-trees [3] were proposed to enforce the con-
provide strict security guarantees. In order to make a shuf- fidentiality of data at rest. To increase the computational
fled B-tree approach comparable, we extend it to support cost of inference attacks based on access patterns, shuffling
various query types that are prevalent in database query the B-tree after each database query was proposed [6, 7,
workloads. The main contributions of this paper are: 17]. Note that, despite making access pattern based infer-
ence attacks harder, the Shuffle Index does not provide strict
• PATCONFDB, a concept to efficiently apply ORAM security guarantees like ORAM does. Furthermore, to the
protocols in DaaS settings that supports searching best of our knowledge, no approach based on shuffled B-trees
a database for records which match specified equality, provides the ability to insert, update or delete data. In this
range, and prefix query conditions. paper, we extend the Shuffle Index approach [6] to support
• A concept to enhance shuffled B-tree approaches them. Our extensions can be applied to all schemes that
to be used in DaaS settings, so that update operations as make use of a shuffled B-tree.
well as range and prefix queries can be executed.
• An efficiency evaluation that shows that significant 3. DEPLOYMENT REQUIREMENTS
performance increases are possible by considerately ap-
plying ORAM protocols. Furthermore, we compare the 3.1 Database Functionality Requirements
PATCONFDB approach to the less secure shuffled B- Databases typically support inserts, updates, and dele-
tree approach and provide guidelines for choosing the tions of database records. Furthermore, typically it is re-
most suitable indexing approach for a given out- quired to search databases for specific records. The struc-
sourcing scenario. tured query language (SQL) specifies a variety of different
query types. In the following, we consider the following
In Section 2 related work is discussed. In Section 3, the types of queries:
specific requirements of relational databases for deployment
concepts of access pattern confidentiality-preserving schemes 1. Equality selection: Query for records that contain
are summarized. In Section 4, we introduce PATCONFDB, a specific attribute value A.
an ORAM-based access pattern confidentiality-preserving Example: SELECT * WHERE Name = ’Andy’;
indexing approach, and show how shuffled B-tree indexes can 2. Prefix selection: Query for records containing an
be extended to also satisfy the requirements. We provide an attribute value that starts with prefix A.
efficiency evaluation of the proposed schemes in Section 5. Example: SELECT * WHERE Name LIKE ’An%’;
Furthermore, we discuss possible functionality extensions for 3. Range selection: Query for records that contain an
our deployment concepts Section 6 and conclude the paper attribute value that is smaller than value A and bigger
in Section 7. than value B.
Example: SELECT * WHERE Name Between ’An’
2. RELATED WORK AND ’Ce’;
Many protocols have been proposed to efficiently perform We discuss the challenges of providing access and pattern
keyword searches over encrypted data, i.e. [11, 1, 12]. While confidentiality for additional query types in Section 6.
these approaches ensure content confidentiality, they leak
the access patterns of queries. Based on an exemplary data- 3.2 Confidentiality Requirements
base, Islam et al. [10] showed that, with a small amount An approach to securely outsource confidential databases
of background knowledge about the data, up to 80% of the has to enforce both content and access pattern confidentiality
encrypted data can be revealed by an attacker who watches to protect against attackers that are able to monitor queries
the query and the corresponding results through a frequency on the outsourced data.
analysis. This brought forward the need for CPIs that also Content confidentiality: A database outsourcing ap-
ensure access pattern confidentiality. proach provides content confidentiality if an attacker that
Oblivious RAM (ORAM) was first proposed by Goldreich is able to view the outsourced data is not able to learn the
and Ostrovsky as a way to ensure software protection [8]. content of the database’s records.
ORAM prevents that an attacker who observes the RAM Access pattern confidentiality: We adopted our def-
learns any information about the RAM access patterns of ex- inition of access pattern confidentiality from [16]. An ap-
ecuted programs. Improved schemes were proposed over the proach provides access pattern confidentiality if an attacker
last few years which have put a focus on ORAM to be a con- that monitors database queries and query results as well as
siderable alternative when it comes to secure data outsourc- database insert, update, and deletions is not able to tell 1)
ing [5, 9, 13, 4, 15], i.e., storing and retrieving data records which parts of the database were accessed by a database
based on fixed identifiers. Outsourced relational databases, operation, 2) when a part of a database was last updated,
on the other hand, are more complex and require efficient 3) whether specific data was repeatedly accessed, and 4)
search for database records that contain a specific attribute whether the database was queried or updated. This partic-
value or contain attribute values that fall in a specific range. ularly means that even a series of accesses to the same record
ORAM schemes do not support such search queries and, to is unrecognizable for attackers and therefore does not leak
the best of our knowledge, it is unclear whether applying any information.
4. DEPLOYMENT CONCEPTS J
4.1 PATCONFDB Index Layer 1
ORAM Instance
In this section, we introduce the PATCONFDB concept D I O T
that uses existing ORAM schemes as building blocks to sup-
port searches on outsourced data while enforcing access pat- Index Layer 2
tern confidentiality. The interface of all existing ORAM ORAM Instance
schemes is ORAM.get(ID) and ORAM.put(ID, block), i.e., A C F H J K Q S V X
data blocks can be stored and retrieved based on a fixed
ID. All existing ORAM schemes share the following char-
acteristic. To retrieve a stored data block, first a set of r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12 r13 r14 r15 r16
encrypted data block containers (DC) has to be downloaded Record Store
ORAM Instance
and decrypted. The decrypted data blocks then have to be
re-encrypted and the resulting encrypted DCs have to be Figure 1: Hierarchy of the ORAM instances in PAT-
re-uploaded to the SP. PATCONFDB is agnostic to the un- CONFDB.
derlying ORAM scheme and initializes multiple instances of
the used ORAM scheme. These ORAM instances are inde- nism as described for equality selections. That is, for each
pendent from each other. Each ORAM instance hides the attribute value that lies in the queried range, an equality
content and the access pattern of the database from the SP. selection query is evaluated and the results of these queries
In most database scenarios, records have to be searchable are aggregated to the result of the range/prefix selection. A
based on record attributes. This creates the need for indexes prefix selection can be considered as a special kind of range
to be outsourced to an external SP. We use a B-tree index selection, that spans over every attribute value of records
structure for PATCONFDB so that the efficiency evalua- inside the queried prefix.
tion is better comparable to the Shuffle Index, which also Insert / delete / update. To insert a new record, an
uses a B-tree. To search for records that contain a specific equality selection for the attribute value of the new record
attribute value, the highest level node of the B-tree is re- is performed, the set of identifiers of records that contain
trieved. The retrieved node can be used to determine which the attribute value is retrieved and the identifier of the in-
node of the next lower B-tree layer is closest to the queried serted record is added before re-encrypting and re-uploading
value. This process is repeated until the leaf of the B-tree is the data to the corresponding index layers. Furthermore,
reached, which contains the record identifiers of the records the records that matched the equality selection are retrieved
that contain a matching attribute value. from the record store ORAM instance and the new record
Each node of a B-tree can be encrypted to protect the con- is inserted in the record store when re-encrypting and re-
fidentiality of the data at rest. However, the attacker would uploading the retrieved records. Deleting a record works
still learn access patterns and distinguish different types of analogously, by removing the record before re-encrypting
queries by monitoring which nodes were retrieved to eval- and re-uploading the retrieved records and the record iden-
uate a given query. To hide which nodes were retrieved, tifiers. Updating a record is considered a concatenation of a
PATCONFDB stores every layer of the B-tree in a separate delete and an insert operation.
ORAM instance as shown in Figure 1. For the remainder of As ORAM schemes allow to store and retrieve data blocks
this paper we call the ORAM instances that contain the in- based on a fixed ID but shuffle the stored encrypted data
dex layers and the ORAM instance that contains the actual blocks within the DCs with every access, each ORAM in-
data records record store. In the following, we describe how stance has to map the fixed block IDs to the current location
PATCONFDB supports the database functionality that is within the ORAM instance to determine which encrypted
described in Section 3. data blocks have to be returned. This is illustrated in Fig-
Equality selections. To retrieve all records that contain ure 2a. For instance, to retrieve node 1 (containing “D” and
a specific attribute value, first the root node of the B-tree has “I”) of index layer 1, the encrypted data block 2 has to be
to be retrieved and decrypted. Based on the decrypted node retrieved from the ORAM instance. Thus, data block ID 1
it can be determined which node X has to be retrieved from has to be mapped on the encrypted data block ID 2 first.
the next lower layer in the B-tree. To retrieve X, the ORAM To perform this mapping without harming access pattern
instance that corresponds to X’s layer has to be queried. If confidentiality, more data has to be transmitted and further
the B-tree is stored in n ORAM instances, n ORAM in- round trips are necessary between client and SP (see [14]).
stances have to be accessed to retrieve the B-tree’s leaf node The overhead for mapping fixed data block IDs on ORAM
that contains the identifier of the records that match the instance locations can be avoided in the PATCONFDB case.
database query. Based on the record identifiers, the match- As shown in Figure 2b, the DC ID can be directly stored
ing records can be retrieved from the record store ORAM in the corresponding B-tree node of the next upper layer.
instance. If more than one record matches the query, the Thus, when executing a query, the IDs of DCs that have
equality selection has to be evaluated again for each match- to be retrieved from the next layer are already known to
ing record before retrieving it from the record store, even if the client without any further mapping. For instance, after
all identifiers of matching records are already known. This retrieving and decrypting the root node (containing “J”),
mechanism is needed to keep queries indistinguishable, as the client already knows that the left B-tree node of the
explained in Section 4.2. next lower layer is contained in the encrypted data block 2.
Prefix / range selections. To keep the type of queries As ORAM requires to shuffle data blocks and to store
indistinguishable, a prefix selection is executed as a sequen- them in different DCs after each retrieval, data blocks of
tial series of equality selections through the same mecha- the lower layers have to be re-encrypted and re-uploaded
Data block ID
J riodically so that bursts in queries could not be detected.
Data block container ID 1 2
Of course, this also generates a large overhead in network
Index Layer 1
ORAM Instance
1 2 traffic and query latency.
2 1
12 D I
Position

O T Insert / delete / update. Since insert, update, and
Map

21 3 4 5 1 2
delete operations are achieved by equality selection queries,
Index Layer 2
ORAM Instance 1 2 3 4 5 attackers are not able to distinguish them from equality se-
2 5 4 1 3 lection queries and, thus, prefix/range selection queries.
Position

12345
Map

41532 F H V X Q S A C J K Since every query or write operation on PATCONFDB
leads to the same pattern of successive indistinguishable op-
(a) ORAM index layers
erations on ORAM instances, the SP is not able to monitor
J access patterns. Thus, PATCONFDB enforces access pat-
2 1 tern confidentiality.
Index Layer 1
ORAM Instance 1 2
O T D I 4.3 Shuffled B-tree Index Extensions
5 3 2 4 1
To enhance access pattern confidentiality for the retrieval
Index Layer 2
ORAM Instance 1 2 3 4 5 of a single record, three methods are used by existing shuf-
F H V X Q S A C J K fled B-tree approaches [6, 7, 17]: 1) Cover searches, as seen
in Figure 3, are randomly chosen nodes that are retrieved
(b) Efficiency improved ORAM index layers in parallel to the nodes that are actually relevant for the
executed query in each layer. For example, if two cover
Figure 2: Efficiency optimized index layers. searches are executed, attackers see three retrieved nodes
for every layer of the Shuffle Tree. Attackers cannot distin-
prior to data blocks of the upper layers. When a query is guish between cover searches and nodes that are relevant for
executed, first data blocks have to be retrieved from index the executed query. 2) Node caching at the trusted client is
layer 1 to get the B-tree node that contains the ID of the used to increase access pattern obfuscation. After a node is
DC that has to be retrieved from index layer 2. But the being accessed, it will replace the least recently used node
data blocks in index layer 1 cannot be written back right in the client-side cache. All nodes on the path from the root
away, as – due to shuffling – the data blocks of index layer node to the leaf node are stored inside the cache. If a node
2 are likely to be stored in different DCs than before. Only that is in the cache is retrieved, an additional cover search
after new DCs have been assigned to the data blocks of the is executed so an attacker cannot infer that the node was
record store ORAM instance, the data blocks in the upper inside the cache. 3) Node shuffling is used to increase the
index layers can be updated and written back recursively. difficulty for attackers to learn the parent-child relationship
of the nodes. After each query execution, the content of all
4.2 Security of PATCONFDB nodes stored on the client is shuffled using a random permu-
The PATCONFDB concept consists of multiple ORAM tation before they are written back to the encrypted B-tree.
instances. We assume that the utilized ORAM scheme to Node shuffling is outlined in Figure 3.
create each ORAM instance satisfies the ORAM security In the remainder of this paper we denote the leaf nodes of
notion, i.e., any read or write operations on the data within the Shuffle Tree as data nodes and the non-leaf nodes as nav-
the instance are indistinguishable. We now examine the se- igation nodes. In the following, we show how the concept of
curity implications of executing queries across multiple index shuffled B-trees can be extended to not only support equal-
layers, i.e., multiple ORAM instances. ity selections, but also range and prefix selections as well as
Selection queries. As shown in Section 4.1, sequential database modifications, i.e., insert, update, and deletion of
equality selections are used to evaluate range and prefix se- database records.
lections. Thus, it suffices to show that equality selections Insert / delete. Current Shuffle Index concepts [6, 17]
are indistinguishable from the perspective of the SP. Every assume an upfront knowledge of all database records. Know-
ORAM instance has to be queried exactly once to evaluate ing the exact distribution of records is not possible in most
an equality selection, so honest-but-curious attackers see a real world scenarios. Thus, concept extensions that allow
constant number of indistinguishable ORAM accesses. From to insert, update, and delete records are necessary. We pro-
this, they learn the number of index layers, but they neither pose a modified shuffled B-tree scheme with a dynamically
learn the parent-child-relationship of nodes in the B-tree nor expanding and shrinking B-tree that is initialized with only
the total size of data stored in the database. If an equal- a small amount of navigation nodes.
ity selection matches multiple records, the query is executed The concept to insert records is outlined in Figure 4.
over all index layers for each matching record. Otherwise, Whenever a record is inserted, an equality selection query
it would be possible for an attacker to observe the number for the attribute value of the new record is performed to find
of matching records by counting the sequential accesses to the node X to store the record in. If this node X is full, a new
the record store ORAM instance. Attackers could still mon- node N is created to store the record in. The reference to N
itor bursts in queries, but this could as likely be caused by is stored in the parent node of node X. If the parent node is
an equality selection that has matched many records as by full as well, another new node is inserted on the next higher
a prefix or range selection. So the type of selection query layer of the B-tree. This works recursively to the root of the
remains hidden from attackers. The frequency of queries tree, which does not have any size limitations, because it
can still be observed by the attacker. Access frequency is does not need to be of a certain size to be indistinguishable
an information leak that can be found in all current ORAM from any other node. Up to half of the records in the full
schemes. One solution would be to query the database pe- node X are copied to the new node N while inserting the
Storage position in shuffled B-tree 1 nodes is shuffled, padded, and encrypted with a new nonce
Database query J
Cover search
3 2 in every access. Yang et al. [17] published a proof that, by
2 3 using the shuffle mechanism, the probability that a node is
O T D I the child of a parent node X is equal to the probability of
8 6 5 7 4 it being the child of any other parent node Y after a large
enough number of accesses were performed after the last
4 5 6 7 8 retrieval of X. Since this number of accesses is not further
F H V X Q S A C J K specified, it remains unclear when shuffled B-trees enforce
access pattern confidentiality sufficiently.
(a) Query execution with one cover search To use the Shuffle Index in relational databases, methods
to insert, update and delete data had to be implemented.
Storage position in shuffled B-tree 1 As described in Section 4.3, this leads to the expansion and
Database query J
Cover search
3 2 shrinkage of the Shuffle Tree over time. Therefore, informa-
2 3 tion on the size of the database is leaked and the insert, up-
O T D I date, as well as delete operations are distinguishable. Thus,
8 7 5 6 4 the fourth access pattern confidentiality requirement defined
in Section 3.2 is not satisfied. However, this does not lead
4 5 6 7 8 to the breach of the other three confidentiality requirements.
F H V X A C Q S J K Besides the total size of the database, attackers can also infer
how many records can be stored in one node by monitoring
(b) Shuffled B-tree after node shuffling a series of inserts and track how often a new node is being
created. Users have to decide if these information leaks are
Figure 3: Shuffled B-tree: Query execution with cover acceptable for their specific scenarios.
searches. Prefix / range selections. The performance optimiza-
tion for prefix/range selection queries as proposed in Sec-
new record. After the insertion, the nodes are re-encrypted, tion 4.3 introduces two minor information leaks. Attack-
shuffled and re-uploaded in the same way they would have ers can distinguish range and prefix selection queries from
been in case of a regular equality selection query. The dele- other database queries and they can estimate the amount of
tion of a record works similar. When the last record in a records that matched a prefix/range query to a certain de-
data node is deleted, the data node itself and its reference gree. Attackers can observe the total number of data nodes
in the parent node is deleted. that were retrieved, but they do not know how many of those
Update. To ensure the alphabetical sorting of all records, nodes are cover nodes. We argue that both information leaks
an update that changes the attribute value of the record are acceptable since the type of query is distinguishable any-
works as a deletion of the old record followed by an insertion way due to the dynamically expanding shuffled B-tree. The
of a new one. amount of retrieved records would have also been observable
Prefix / range selections. Since insert, update and to a certain degree without this optimization, because bursts
delete operations are already distinguishable from selections, in queries correlate to the amount of retrieved records.
it is not necessary to make prefix and range selections in-
distinguishable from equality selections. Therefore, we can
use the following method to enhance the performance of
5. EFFICIENCY EVALUATION
prefix and range selections. Instead of retrieving only one
data node, every data node inside one navigation node that
5.1 Query Latency Overhead
matches the query can be downloaded. Even though x cover We evaluated the query latency overhead induced by the
nodes have to be retrieved to obfuscate each data node that use of PATCONFDB and the modified shuffled B-tree by
was actually retrieved, this method significantly reduces the measuring the query latency of equality, prefix, and range
network traffic and the number of sequential data node re- selections. For this evaluation, we used the state-of-the-
trievals, as discussed in Section 5.2. art Path ORAM [16] scheme as ORAM scheme for PAT-
CONFDB. Path ORAM organizes the ORAM DCs in a bi-
4.4 Security of Shuffled B-trees nary tree. Each leaf of this binary tree is assigned a unique
position and each data block is assigned to a position. The
The security guarantees of the Shuffle Index are based
invariant of Path ORAM states that every data block with
on the assumption that attackers are not able to determine
an assigned position p is stored on the path from the root DC
the parent-child relationship of nodes in different layers of
to the leaf DC with position p. So for every read or write
the Shuffle Tree. To achieve this, the content of all queried
operation on a Path ORAM instance with n DCs, log(n)
DCs are retrieved from the SP. After the operation the DCs
are re-encrypted with a nonce and re-uploaded to the SP.
Al Ar Al An Ar We instantiated PATCONFDB with two index layers and
Alex Armin Alex Andy Armin one record store. During our measurements, we optimized
Alois Arthur Alois Anna Arthur the capacity of DCs in each layer, resulting in container
Andy - Amy - - capacities between 10 and 100 data blocks. Our shuffled
Anna - - - - B-tree extensions are implemented with a Shuffle Index [6]
as described in Section 4.3 with two cover searches, a cache
Figure 4: An excerpt of a Shuffle Tree before and after the size of five and a Shuffle Tree height of three. In our mea-
insertion of the record ’Amy’. surements, we found these parameter choices to have the
DB size Records Bandwidth and latency 100000
Baseline ShuffleIndex > time limit > time limit
Scenario 1 1 GB 1 mio 1 Gbit/s and 5 ms DBCopy PATCONFDB

Scenario 2 1 GB 1 mio 10 Mbit/s and 50 ms 10000

Scenario 3 1 GB 10.000 1 Gbit/s and 5 ms
1000
Scenario 4 1 GB 10.000 10 Mbit/s and 50 ms
Scenario 5 10 GB 10 mio 1 Gbit/s and 5 ms 100
Scenario 6 10 GB 10 mio 10 Mbit/s and 50 ms
Scenario 7 10 GB 100.000 1 Gbit/s and 5 ms 10
Scenario 8 10 GB 100.000 10 Mbit/s and 50 ms
1
Table 1: Summary of the scenario parameters chosen for
1 2 3 4 5 6 7 8
each scenario.
Figure 6: Measured query latency of range selections over
100000 all scenarios on a logarithmic y scale.
Baseline ShuffleIndex
DBCopy PATCONFDB

10000
in case more records are queried (scen. 1,3,5,7 vs. 2,4,6,8).
1000
Note that the confidence intervals for the queries executed
with PATCONFDB are significantly larger than the ones of
100 the other protocols. This is caused by different sized network
overheads as explained in Section 5.2. For scenarios 6 and
10 8, PATCONFDB even exceeded our time limit of 7h for
one query. The Shuffle Index is much less influenced by
1 the bandwidth of the network link, but its query latency
1 2 3 4 5 6 7 8
is already by a factor of 10 higher than the latency of the
Baseline protocol.
Figure 5: Measured query latency of equality selections over Our measurements show that DBCopy can outperform
all scenarios on a logarithmic y scale. PATCONFDB for range selections if more than about 0,1%
off all records are queried in range selections. The results in-
dicate that the PATCONFDB performs better in scenarios
best trade-off between security and performance. To better
with databases that contain small amounts of records with a
compare the query latency, the naive approach of enforc-
large size, whereas the Shuffle Index performs better in sce-
ing access pattern confidentiality by downloading the whole
narios with databases that contain a large number of small
encrypted database on every access is evaluated, hereafter
records. For scenarios in which many records are queried
referred to as DBCopy. To specify the overhead compared
in range selections a network link with a high bandwidth to
to an approach that does not provide any security mecha-
the SP is needed to keep query latency low.
nisms, an unencrypted database retrieval protocol was also
evaluated, hereafter referred to as Baseline. 5.2 Network Overhead
The query latency has been measured on a client computer
In the following, we investigate the network overhead of
with an Intel Core i7 CPU with 2,4GHz and 8GB RAM. As a
PATCONFDB and shuffled B-trees, i.e., the amount of data
database server a Microsoft SQL Server Express was used on
that has to be transmitted between the client and the SP.
a virtual machine in Hyper-V with 2 virtual CPUs and 16GB
First, we provide analytical models to highlight the factors
RAM. The test data has been created with a random number
that influence the network overhead both for PATCONFDB
generator and is evenly distributed. Before every test run,
and shuffled B-trees. Then we present and interpret the
each database is initialized so that the generated database
network overheads we measured for the scenarios that we
records fill 10% of it. In Table 1 the parametrization of our
introduced in Section 5.1.
2k -factorial experimental design scenarios is shown.
The network overhead NPs to retrieve a record from a
The query latency of equality selections measured over
PATCONFDB instance that is based on ORAM scheme s
all scenarios is shown in Figure 5. It can be seen, that an
can be calculated as follows:
increase of the size of the database results in an increased XHI s
query latency for all tested protocols (scen. 1-4 vs. 5-8). NPs = Nrs + N t + Ni
The bandwidth of the network link is only a critical fac- i=1

tor for the DBCopy protocol, because an equality selection for HI = number of index layers, Nrs = network overhead
only retrieves a small number of records, so the total net- induced by the retrieval of a record from the record store
work traffic is low for all other protocols (scen. 1,3,5,7 vs. depending on the ORAM scheme s, N t = size of the root
2,4,6,8). The Baseline and the PATCONFDB protocol are node of the PATCONFDB B-tree, Nis = network overhead
significantly influenced by the number of records (scen. 1,2 induced by the retrieval of a record from an index layer
vs. 3,4). For the Baseline protocol this is the case, because depending on the ORAM scheme s.
it does not have an index tree to efficiently search for at- For an implementation of PATCONFDB with Path O-
tribute values. For the PATCONFDB protocol it takes a RAM, the retrieval of a record requires every Path ORAM
long time to sequentially query index layers which contain a instance to be accessed two times (Retrieval and Upload of
large number of identifiers. DCs). Since the DCs are stored in a binary tree and all DC
The query latency of range selections measured over all on the path from the root to the leaf of the tree are retrieved,
scenarios is shown in Figure 6. Now that the bandwidth of the network traffic for every ORAM instance equals the tree
the network link influences the query latency significantly height h multiplied with the size of the DC d.
Scenario SI(EQ) SI(PR) PC(EQ) PC(PR) database records’ size to the chosen maximum database size.
1+2 562,0 6,56 7080 6557 If the maximum database size is 100GB but only 10GB of
3+4 110,4 6,31 683,1 669,8 data is stored inside, the relative storage overhead to the
5+6 1460 6,57 5901 5761 actual data size is 10. The size of the index depends on the
7+8 111,3 6,26 1025 971,3 maximum number of records that can be stored in a PAT-
CONFDB instance. Each different attribute value of any
Table 2: Ratio of Shuffle Index (SI) and PATCONFDB (PC) record is stored inside of the lowest index layer to provide
network overhead to the size of the queried records for equal- references to every record that is stored in the database.
ity (EQ) and prefix (PR) selections. Since index layers are not allowed to grow in size, they have
to be initialized to their maximum size as well. So, in the
The network overhead NS to retrieve a record from a
worst case of records that contain only the indexed attribute,
Shuffle Index can be calculated as follows:
the lowest index layer has the same size as the record store.
NS = 2 · r + n · (h − 2) · (1 + a + 2 · c) + d · (1 + a + 2 · c) However, the number of DCs that have to be referenced de-
creases fast for the upper index layers, so that the additional
for h = height of Shuffle Tree, r = root node size, n =
storage overhead for them is very low. In our evaluations the
navigation node size, d = data node size, c = number of
storage overhead for a PATCONFDB instance that is filled
executed cover searches, a = cache size. The network traffic
to its maximum capacity never exceeded a factor of 2,2 in
is induced by three factors: I) Retrieval and upload of the
comparison to the plain text database size with an aver-
root node (2 · r), II) retrieval and upload of navigation nodes
age factor of 2,08, which we argue is feasible for relational
including cover nodes (n·(h−2)·(1+a+2·c)), III) read and
databases. If the PATCONFDB instance is not filled to its
write of the data nodes (d · (1 + a + 2 · c)). Note that actually
maximum capacity, that factor has to be divided by the ratio
needed nodes are retrieved and then uploaded as part of the
of used space u to calculate the overall storage overhead.
whole cache, inducing a total overhead of n · (1 + a) per
In contrast to PATCONFDB, the Shuffle Index does
B-tree layer.
not need to be initialized to its maximum database size.
The measured network overheads for the scenarios
The storage overhead of the Shuffle Index is induced by the
that we introduced in Section 5.1 are shown in Table 2
storage needed for navigation nodes and by the ratio of the
for PATCONFDB and the Shuffle Index. The table shows
average number to the maximum number of records stored
the ratio of the induced network traffic of each approach to
in data nodes. This used space of data nodes u influences
the size of the queried records. For instance, if the queried
the storage overhead similarly as seen with PATCONFDB.
record has a size of 100B and a network traffic of 5,62KB is
Since the database records are stored, so that they are sorted
measured for the evaluation of the query, the relative over-
alphabetically in the leaf nodes of the Shuffle Tree, an index
head amounts to 562. The scenarios that only differ in net-
layer that contains every attribute value as seen with the
work bandwidth are paired in this table, since the network
PATCONFDB approach is not necessary. This combined
bandwidth does not affect the amount of network traffic.
with the dynamic fan out of navigation nodes significantly
For equality selections, the measurements indicate that
reduces the storage overhead induced by navigation nodes.
the total number of records stored in the database has a large
In our evaluations the storage overhead for a Shuffle Index
impact on the relative network overhead for both schemes
instance that only contains data nodes which are filled to
(scen. 1,2,5,6 vs. 3,4,7,8). Since the total network traffic for
their maximum capacity, never exceeded a factor of 1,02 in
equality selections in both schemes remains low (<1,5MB),
comparison to the plain text database size. That factor has
the large network overhead does not have a similarly large
to be divided by the ratio u of used space of data nodes to
impact on the query latency measured in Section 5.1. In
calculate the overall storage overhead.
the case of prefix selections, the performance optimization
It can be seen, that the storage overhead induced by in-
proposed in Section 4.3 for the Shuffle Index reduces the rel-
dexing techniques is a low and constant factor for both ap-
ative network overhead significantly. Since prefix selections
proaches and therefore does not restrict the feasibility of ac-
in the PATCONFDB are executed as a series of sequential
cess pattern preserving relational databases. However, the
equality selections, no significant reduction of the relative
constraint of PATCONFDB that the database has to be ini-
network overhead is measured. The relative overhead is only
tialized to its maximum size, could be problematic in sce-
reduced, if by chance multiple records that match the query
narios where the size of the database fluctuates frequently.
are retrieved within the same DC.
Since the relative network overhead of PATCONFDB re-
mains at a high level, it is not feasible to use PATCONFDB 6. FUNCTIONALITY EXTENSIONS
in scenarios, in which a large number of records is queried at We address the challenges in complying to access pattern
the same time or in a short time frame. The Shuffle Index, confidentiality in DaaS scenarios with multiple types of
however, can also be used in scenarios which include large indexes, i.e. strings, number, and dates, for the use of the
range or prefix selections. PATCONFDB approach. We discuss two situations in which
PATCONFDB would leak information about the pattern of
5.3 Storage Overhead queries, if used in the same way as in a single index type
In the following we investigate the storage overhead on the scenario. A basic equality selection results in one query to
external SP. This storage overhead consists of the storage each index layer and one query to the record store.
that is needed for the index as well as of the additionally Insert and delete operations have to access I (number
required storage to store record, i.e., the padding of nodes of index types) DCs from the index layer to insert or delete
or the initialization of a database to its maximum size. every attribute value of each record to the corresponding
The storage overhead of the PATCONFDB approach data record. Therefore an attacker can differentiate between
is predominantly influenced by the ratio u of the actual an equality selection and an insert or a delete operation.
To prevent this information leak, I DCs from the index layer 8. REFERENCES
have to be retrieved, before the record store is queried. [1] M. Bellare, A. Boldyreva, and A. O’Neill. Deterministic
and efficiently searchable encryption. In Proc. of the
Update operations have to access a DC of every index Annual Intl. Cryptology Conf. on Advances in Cryptology
type, which is involved in the SQL operation, from the index (CRYPTO), pages 535–552, 2007.
layer and then one DC of the record store. Since it is impos- [2] P. Brody and V. Pureswaran. Device democracy: Saving
sible to know the current value before it is update, another the future of the internet of things. IBM Global Business
DC has to be queried from the index layer, to delete the ref- Services Executive Report, 2014.
erence that is no longer needed. Therefore an attacker can [3] A. Ceselli, E. Damiani, S. D. C. D. Vimercati, S. Jajodia,
differentiate between an equality selection and an update S. Paraboschi, and P. Samarati. Modeling and assessing
inference exposure in encrypted databases. ACM
operation. To prevent this information leak, a randomly
Transactions on Information and System Security,
chosen set of DCs from the index structure would have to 8(1):119–152, 2005.
be retrieved after the actual query is executed. [4] I. Damgård, S. Meldgaard, and J. B. Nielsen. Perfectly
Further query types. In this paper, we investigated secure oblivious ram without random oracles. In Proc. of
how equality, range, and prefix selections can be evaluated the Conf. on Theory of Cryptography (TCC), pages
on access pattern confidentiality-preserving databases. We 144–163, 2011.
introduced equality, range and prefix selections based on [5] J. Dautrich, E. Stefanov, and E. Shi. Burst ORAM:
strings. However, our concepts can be seamlessly applied Minimizing ORAM response times for bursty access
patterns. In Proc. of the USENIX Security Symposium,
for range and prefix selections on other data types like inte- pages 749–764, 2014.
gers. We argue that our investigated query types are already [6] S. De Capitani di Vimercati, S. Foresti, S. Paraboschi,
sufficient for many scenarios in which relational databases G. Pelosi, and P. Samarati. Efficient and private access to
are used. PATCONFDB and modified shuffled B-trees, as outsourced data. In Proc. of the IEEE Intl. Conf. on
introduced in Section 4, can be extended to support table Distributed Computing Systems (ICDCS), pages 710–719,
joins and nested queries. Again the queries are divided into 2011.
sequential equality selections so that they are indistinguish- [7] S. De Capitani di Vimercati, S. Foresti, S. Paraboschi,
G. Pelosi, and P. Samarati. Distributed shuffling for
able from any other query. For this to work, all indexes preserving access confidentiality. In Proc. of the European
of all tables have to be stored in the same B-tree. If more Symp. on Research in Computer Security (ESORICS),
than one B-tree is used for indexing, every B-tree has to be pages 628–645, 2013.
accessed every time a query is executed for the queries to [8] O. Goldreich and R. Ostrovsky. Software protection and
be indistinguishable. In future work we plan to implement simulation on oblivious rams. Journal of the ACM,
and evaluate these query types to give recommendations on 43(3):431–473, 1996.
which setup performs best for which database scenario. [9] M. T. Goodrich and M. Mitzenmacher. Privacy-preserving
access of outsourced data via oblivious RAM simulation. In
Proc. of the Intl. Conf. on Automata, Languages and
7. CONCLUSIONS & FUTURE WORK Programming (ICALP), pages 576–587, 2011.
To investigate the feasibility of access and pattern confi- [10] M. S. Islam, M. Kuzu, and M. Kantarcioglu. Access pattern
dentiality-preserving relational databases with a B-tree in- disclosure on searchable encryption: Ramification, attack
and mitigation. In Proc. of the Network and Distributed
dex, we proposed PATCONFDB, an ORAM-based concept Systems Security (NDSS) Symposium, 2012.
to achieve access pattern confidentiality, and extended exist- [11] S. Kamara, C. Papamanthou, and T. Roeder. Dynamic
ing shuffled B-tree approaches to support essential database searchable symmetric encryption. In Proc. of the ACM
operations. In particular, empirical measurements of these Conf. on Computer and Communications Security (CCS),
concepts showed that enforcing access pattern confidential- pages 965–976, 2012.
ity in relational databases only induces an overhead of factor [12] J. Köhler, K. Jünemann, and H. Hartenstein. Confidential
5.9 for evaluating equality conditions on a database with up database-as-a-service approaches: taxonomy and survey.
Journal of Cloud Computing, 4(1), 2015.
to 10 million records. The extended Shuffle Index induces an
[13] R. Ostrovsky. Efficient computation on oblivious RAMs. In
even smaller overhead of factor 2.3 for the same setup, but Proc. of the ACM Symposium on Theory of Computing
provides no strict and well-defined access pattern confiden- (STOC), pages 514–523, 1990.
tiality guarantees. Our performance evaluation showed to [14] L. Ren, X. Yu, C. W. Fletcher, M. van Dijk, and
which extend the induced overhead depends on the network S. Devadas. Design space exploration and optimization of
link to the SP, the structure of the data and the query work- path oblivious RAM in secure processors. In Proc. of the
load of the outsourcing scenario. In particular, our results Intl. Symp. on Computer Architecture (ISCA), pages
571–582, 2013.
showed that shuffled B-trees in general outperform ORAM-
[15] E. Stefanov and E. Shi. ObliviStore: High performance
based B-trees and, thus, can be used in a wider variety of oblivious distributed cloud data store. In Proc. of the
DaaS scenarios, if no strict security guarantees are required. Network and Distributed Systems Security (NDSS)
The findings of this paper highlight multiple future re- Symposium, 2013.
search directions. It is worthwhile to aim for further effi- [16] E. Stefanov, M. van Dijk, E. Shi, C. Fletcher, L. Ren,
ciency improvements of ORAM-based database indexes, as X. Yu, and S. Devadas. Path ORAM: An extremely simple
well as to evaluate the performance of other index struc- oblivious RAM protocol. In Proc. of the ACM Conf. on
tures like bitmaps. The superior efficiency of shuffled B-tree Computer and Communications Security (CCS), pages
299–310, 2013.
approaches makes it also worthwhile to aim for a better un- [17] K. Yang, J. Zhang, W. Zhang, and D. Qiao. A light-weight
derstanding of their security guarantees. Furthermore, we solution to preservation of access pattern privacy in
plan to investigate how both ORAM-based and shuffle-based un-trusted clouds. In Proce. of the European Conf. on
schemes can be extended to support relational databases Research in Computer Security (ESORICS), pages
with multiple index types and queries that include more 528–547, 2011.
complex operations.