=Paper= {{Paper |id=Vol-1558/paper36 |storemode=property |title=Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation |pdfUrl=https://ceur-ws.org/Vol-1558/paper36.pdf |volume=Vol-1558 |authors=Alexander Degitz,Jens Köhler,Hannes Hartenstein |dblpUrl=https://dblp.org/rec/conf/edbt/DegitzKH16 }} ==Access Pattern Confidentiality-Preserving Relational Databases: Deployment Concept and Efficiency Evaluation== https://ceur-ws.org/Vol-1558/paper36.pdf
   Access Pattern Confidentiality-Preserving Relational
Databases: Deployment Concept and Efficiency Evaluation

                                  Alexander Degitz, Jens Köhler, Hannes Hartenstein
          Karlsruhe Institute of Technology (KIT) - Steinbuch Centre for Computing & Institute of Telematics
                                                 Karlsruhe, Germany
                                 {alexander.degitz, jens.koehler, hartenstein}@kit.edu


ABSTRACT                                                                  is a special case of confidential data outsourcing as the out-
In this paper, we address the question whether access pat-                sourced data does not only have to be protected but also
tern confidentiality-preserving databases with an underlay-               has to be efficiently searchable. To achieve that, a variety of
ing B-tree index structure are feasible in practice by propos-            confidentiality preserving indexing approaches (CPIs) were
ing integrative deployment concepts that support important                proposed that allow to efficiently evaluate database queries
database query functionalities based on ORAM and shuffled                 on outsourced encrypted data [12]. Most existing CPIs only
B-trees. Furthermore, we provide a rigorous efficiency eval-              protect the confidentiality of data at rest, i.e., data that
uation to determine the cost of the proposed concepts with                is persisted by the SP. However, in reality, the SP is also
regard to storage, network, and query latency as well as                  able to monitor database queries and the encrypted query
to investigate the influence of scenario factors like database            results, as well as database record inserts, updates, and dele-
size, number of records, and network bandwidth. In par-                   tions. Such observations can be used to reveal confidential
ticular, we show that ORAM-based concepts only cause an                   data. For instance, if the SP observes that two encrypted
overhead of factor 5.9 for evaluating equality conditions on              records 1 and 2 are the result of a query for all records which
a database with up to 10 million records.                                 contain an attribute value X, most CPIs guarantee that the
                                                                          plaintext attribute value X is hidden from the SP. However,
                                                                          the SP can deduce that the two returned records contain
Categories and Subject Descriptors                                        the same attribute value. The SP might then be able to ap-
I.8 [Database and storage security]: Management and                       ply background knowledge like “record 1 contains attribute
querying of encrypted data; H.2.4 [Information systems]:                  value X” to learn that “record 2 contains attribute value X”.
Information storage systems; Cloud based storage                             In particular, to protect against SPs that can monitor
                                                                          queries, a CPI has to guarantee that the SP is not able to de-
                                                                          termine which encrypted records matched an executed query
1.    INTRODUCTION                                                        and the SP is not able to correlate executed queries (access
   Many potential cloud customers are reluctant to make use               pattern confidentiality). Both properties can be trivially
of Database-as-a-Service (DaaS) offerings and to outsource                achieved by retrieving the entire database for each query
confidential data to cloud storage providers (SPs) due to the             to make the queries indistinguishable. This induces signif-
possibility that the SPs do not behave honestly or do not ap-             icant efficiency costs in terms of latency, transmission, and
ply sufficient protective measures to protect the data against            computation. Oblivious RAM (ORAM) approaches [5, 9,
third party attackers. Enforcing the confidentiality of data              13, 4, 15] allow to store and retrieve records based on fixed
before it is outsourced to the SP mitigates the risk of a data            identifiers and enforce pattern and access confidentiality by
confidentiality breach. With the advent of the internet of                selectively retrieving, re-encrypting, and re-submitting spe-
things, it is expected that users are not even aware of who               cific parts of the outsourced database in an interactive way.
will store their data anymore [2]. Thus, techniques that en-              However, ORAM schemes do not support the search capa-
force data confidentiality in the trusted domain of the user              bilities that are required by relational databases and, as of
before the data is outsourced will gain even more impor-                  now, it is unclear whether their efficiency is acceptable for
tance. For the sake of readability and without loss of gener-             relational database scenarios1 .
ality, we abstract from the specific nature of the attacker and              In this paper, we investigate the question whether preserv-
assume that the SP aims to breach data confidentiality in                 ing access PATtern CONFidentiality in relational Data-
the following. Outsourcing confidential relational databases              Bases is practically feasible by proposing PATCONFDB,
                                                                          a deployment concept for ORAM schemes in the database
                                                                          context that supports common database searches and pro-
                                                                          vides pattern and access confidentiality. We show that PAT-
                                                                          CONFDB provides confidentiality guarantees against hon-
                                                                          est-but-curious attackers who can view the persisted data
                                                                          and monitor the executed database queries, including record
(c) 2016, Copyright is with the authors. Published in the Workshop Pro-   insert, update, as well as delete operations. We show that by
ceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bor-
deaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this      1
paper is permitted under the terms of the Creative Commons license CC-      http://outsourcedbits.org/2013/12/20/how-to-search-on-
by-nc-nd 4.0                                                              encrypted-data-part-4-oblivious-rams/
making well-considered use of ORAM protocols, it is possi-         ORAM to encrypt relational databases is feasible with re-
ble to achieve significant efficiency benefits compared to re-     gard to efficiency. In this paper, we do not propose a new
trieving the entire database for each query. Furthermore, we       ORAM scheme, but PATCONFDB, a concept on how to use
compare the efficiency of PATCONFDB to previously pro-             existing ORAM schemes to enable the execution of search
posed shuffled B-tree approaches [7, 17, 3] which also aim         queries on relational databases.
to provide access pattern confidentiality, but are not able to        Encrypted B-trees [3] were proposed to enforce the con-
provide strict security guarantees. In order to make a shuf-       fidentiality of data at rest. To increase the computational
fled B-tree approach comparable, we extend it to support           cost of inference attacks based on access patterns, shuffling
various query types that are prevalent in database query           the B-tree after each database query was proposed [6, 7,
workloads. The main contributions of this paper are:               17]. Note that, despite making access pattern based infer-
                                                                   ence attacks harder, the Shuffle Index does not provide strict
• PATCONFDB, a concept to efficiently apply ORAM                   security guarantees like ORAM does. Furthermore, to the
  protocols in DaaS settings that supports searching               best of our knowledge, no approach based on shuffled B-trees
  a database for records which match specified equality,           provides the ability to insert, update or delete data. In this
  range, and prefix query conditions.                              paper, we extend the Shuffle Index approach [6] to support
• A concept to enhance shuffled B-tree approaches                  them. Our extensions can be applied to all schemes that
  to be used in DaaS settings, so that update operations as        make use of a shuffled B-tree.
  well as range and prefix queries can be executed.
• An efficiency evaluation that shows that significant             3.     DEPLOYMENT REQUIREMENTS
  performance increases are possible by considerately ap-
  plying ORAM protocols. Furthermore, we compare the               3.1      Database Functionality Requirements
  PATCONFDB approach to the less secure shuffled B-                  Databases typically support inserts, updates, and dele-
  tree approach and provide guidelines for choosing the            tions of database records. Furthermore, typically it is re-
  most suitable indexing approach for a given out-                 quired to search databases for specific records. The struc-
  sourcing scenario.                                               tured query language (SQL) specifies a variety of different
                                                                   query types. In the following, we consider the following
   In Section 2 related work is discussed. In Section 3, the       types of queries:
specific requirements of relational databases for deployment
concepts of access pattern confidentiality-preserving schemes           1. Equality selection: Query for records that contain
are summarized. In Section 4, we introduce PATCONFDB,                      a specific attribute value A.
an ORAM-based access pattern confidentiality-preserving                    Example: SELECT * WHERE Name = ’Andy’;
indexing approach, and show how shuffled B-tree indexes can             2. Prefix selection: Query for records containing an
be extended to also satisfy the requirements. We provide an                attribute value that starts with prefix A.
efficiency evaluation of the proposed schemes in Section 5.                Example: SELECT * WHERE Name LIKE ’An%’;
Furthermore, we discuss possible functionality extensions for           3. Range selection: Query for records that contain an
our deployment concepts Section 6 and conclude the paper                   attribute value that is smaller than value A and bigger
in Section 7.                                                              than value B.
                                                                           Example: SELECT * WHERE Name Between ’An’
2.   RELATED WORK                                                          AND ’Ce’;
   Many protocols have been proposed to efficiently perform          We discuss the challenges of providing access and pattern
keyword searches over encrypted data, i.e. [11, 1, 12]. While      confidentiality for additional query types in Section 6.
these approaches ensure content confidentiality, they leak
the access patterns of queries. Based on an exemplary data-        3.2      Confidentiality Requirements
base, Islam et al. [10] showed that, with a small amount              An approach to securely outsource confidential databases
of background knowledge about the data, up to 80% of the           has to enforce both content and access pattern confidentiality
encrypted data can be revealed by an attacker who watches          to protect against attackers that are able to monitor queries
the query and the corresponding results through a frequency        on the outsourced data.
analysis. This brought forward the need for CPIs that also            Content confidentiality: A database outsourcing ap-
ensure access pattern confidentiality.                             proach provides content confidentiality if an attacker that
   Oblivious RAM (ORAM) was first proposed by Goldreich            is able to view the outsourced data is not able to learn the
and Ostrovsky as a way to ensure software protection [8].          content of the database’s records.
ORAM prevents that an attacker who observes the RAM                   Access pattern confidentiality: We adopted our def-
learns any information about the RAM access patterns of ex-        inition of access pattern confidentiality from [16]. An ap-
ecuted programs. Improved schemes were proposed over the           proach provides access pattern confidentiality if an attacker
last few years which have put a focus on ORAM to be a con-         that monitors database queries and query results as well as
siderable alternative when it comes to secure data outsourc-       database insert, update, and deletions is not able to tell 1)
ing [5, 9, 13, 4, 15], i.e., storing and retrieving data records   which parts of the database were accessed by a database
based on fixed identifiers. Outsourced relational databases,       operation, 2) when a part of a database was last updated,
on the other hand, are more complex and require efficient          3) whether specific data was repeatedly accessed, and 4)
search for database records that contain a specific attribute      whether the database was queried or updated. This partic-
value or contain attribute values that fall in a specific range.   ularly means that even a series of accesses to the same record
ORAM schemes do not support such search queries and, to            is unrecognizable for attackers and therefore does not leak
the best of our knowledge, it is unclear whether applying          any information.
4.    DEPLOYMENT CONCEPTS                                                                                       J
4.1    PATCONFDB                                                     Index Layer 1
                                                                     ORAM Instance
   In this section, we introduce the PATCONFDB concept                                             D I                      O T
that uses existing ORAM schemes as building blocks to sup-
port searches on outsourced data while enforcing access pat-         Index Layer 2
tern confidentiality. The interface of all existing ORAM             ORAM Instance
schemes is ORAM.get(ID) and ORAM.put(ID, block), i.e.,                           A C               F H          J K         Q S         V X
data blocks can be stored and retrieved based on a fixed
ID. All existing ORAM schemes share the following char-
acteristic. To retrieve a stored data block, first a set of                  r1 r2 r3 r4 r5   r6   r7 r8   r9   r10   r11   r12   r13   r14 r15 r16
encrypted data block containers (DC) has to be downloaded            Record Store
                                                                     ORAM Instance
and decrypted. The decrypted data blocks then have to be
re-encrypted and the resulting encrypted DCs have to be             Figure 1: Hierarchy of the ORAM instances in PAT-
re-uploaded to the SP. PATCONFDB is agnostic to the un-             CONFDB.
derlying ORAM scheme and initializes multiple instances of
the used ORAM scheme. These ORAM instances are inde-                nism as described for equality selections. That is, for each
pendent from each other. Each ORAM instance hides the               attribute value that lies in the queried range, an equality
content and the access pattern of the database from the SP.         selection query is evaluated and the results of these queries
   In most database scenarios, records have to be searchable        are aggregated to the result of the range/prefix selection. A
based on record attributes. This creates the need for indexes       prefix selection can be considered as a special kind of range
to be outsourced to an external SP. We use a B-tree index           selection, that spans over every attribute value of records
structure for PATCONFDB so that the efficiency evalua-              inside the queried prefix.
tion is better comparable to the Shuffle Index, which also             Insert / delete / update. To insert a new record, an
uses a B-tree. To search for records that contain a specific        equality selection for the attribute value of the new record
attribute value, the highest level node of the B-tree is re-        is performed, the set of identifiers of records that contain
trieved. The retrieved node can be used to determine which          the attribute value is retrieved and the identifier of the in-
node of the next lower B-tree layer is closest to the queried       serted record is added before re-encrypting and re-uploading
value. This process is repeated until the leaf of the B-tree is     the data to the corresponding index layers. Furthermore,
reached, which contains the record identifiers of the records       the records that matched the equality selection are retrieved
that contain a matching attribute value.                            from the record store ORAM instance and the new record
   Each node of a B-tree can be encrypted to protect the con-       is inserted in the record store when re-encrypting and re-
fidentiality of the data at rest. However, the attacker would       uploading the retrieved records. Deleting a record works
still learn access patterns and distinguish different types of      analogously, by removing the record before re-encrypting
queries by monitoring which nodes were retrieved to eval-           and re-uploading the retrieved records and the record iden-
uate a given query. To hide which nodes were retrieved,             tifiers. Updating a record is considered a concatenation of a
PATCONFDB stores every layer of the B-tree in a separate            delete and an insert operation.
ORAM instance as shown in Figure 1. For the remainder of               As ORAM schemes allow to store and retrieve data blocks
this paper we call the ORAM instances that contain the in-          based on a fixed ID but shuffle the stored encrypted data
dex layers and the ORAM instance that contains the actual           blocks within the DCs with every access, each ORAM in-
data records record store. In the following, we describe how        stance has to map the fixed block IDs to the current location
PATCONFDB supports the database functionality that is               within the ORAM instance to determine which encrypted
described in Section 3.                                             data blocks have to be returned. This is illustrated in Fig-
   Equality selections. To retrieve all records that contain        ure 2a. For instance, to retrieve node 1 (containing “D” and
a specific attribute value, first the root node of the B-tree has   “I”) of index layer 1, the encrypted data block 2 has to be
to be retrieved and decrypted. Based on the decrypted node          retrieved from the ORAM instance. Thus, data block ID 1
it can be determined which node X has to be retrieved from          has to be mapped on the encrypted data block ID 2 first.
the next lower layer in the B-tree. To retrieve X, the ORAM         To perform this mapping without harming access pattern
instance that corresponds to X’s layer has to be queried. If        confidentiality, more data has to be transmitted and further
the B-tree is stored in n ORAM instances, n ORAM in-                round trips are necessary between client and SP (see [14]).
stances have to be accessed to retrieve the B-tree’s leaf node         The overhead for mapping fixed data block IDs on ORAM
that contains the identifier of the records that match the          instance locations can be avoided in the PATCONFDB case.
database query. Based on the record identifiers, the match-         As shown in Figure 2b, the DC ID can be directly stored
ing records can be retrieved from the record store ORAM             in the corresponding B-tree node of the next upper layer.
instance. If more than one record matches the query, the            Thus, when executing a query, the IDs of DCs that have
equality selection has to be evaluated again for each match-        to be retrieved from the next layer are already known to
ing record before retrieving it from the record store, even if      the client without any further mapping. For instance, after
all identifiers of matching records are already known. This         retrieving and decrypting the root node (containing “J”),
mechanism is needed to keep queries indistinguishable, as           the client already knows that the left B-tree node of the
explained in Section 4.2.                                           next lower layer is contained in the encrypted data block 2.
   Prefix / range selections. To keep the type of queries              As ORAM requires to shuffle data blocks and to store
indistinguishable, a prefix selection is executed as a sequen-      them in different DCs after each retrieval, data blocks of
tial series of equality selections through the same mecha-          the lower layers have to be re-encrypted and re-uploaded
      Data block ID
                                                      J                                                 riodically so that bursts in queries could not be detected.
      Data block container ID                     1       2
                                                                                                        Of course, this also generates a large overhead in network
  Index Layer 1
  ORAM Instance
                              1                                         2                               traffic and query latency.
                                  2                                         1
              12                                                                D I
   Position




                                      O T                                                                  Insert / delete / update. Since insert, update, and
     Map




              21                  3    4 5                                  1       2
                                                                                                        delete operations are achieved by equality selection queries,
  Index Layer 2
  ORAM Instance       1                 2                 3                 4             5             attackers are not able to distinguish them from equality se-
                          2                 5                 4                 1             3         lection queries and, thus, prefix/range selection queries.
   Position




              12345
     Map




              41532           F H               V X               Q S               A C           J K      Since every query or write operation on PATCONFDB
                                                                                                        leads to the same pattern of successive indistinguishable op-
                                      (a) ORAM index layers
                                                                                                        erations on ORAM instances, the SP is not able to monitor
                                                      J                                                 access patterns. Thus, PATCONFDB enforces access pat-
                                                  2       1                                             tern confidentiality.
  Index Layer 1
  ORAM Instance               1                                         2
                                      O T                                       D I                     4.3    Shuffled B-tree Index Extensions
                                  5    3 2                                  4       1
                                                                                                           To enhance access pattern confidentiality for the retrieval
  Index Layer 2
  ORAM Instance       1                 2                 3                 4             5             of a single record, three methods are used by existing shuf-
                              F H               V X               Q S               A C           J K   fled B-tree approaches [6, 7, 17]: 1) Cover searches, as seen
                                                                                                        in Figure 3, are randomly chosen nodes that are retrieved
                (b) Efficiency improved ORAM index layers                                               in parallel to the nodes that are actually relevant for the
                                                                                                        executed query in each layer. For example, if two cover
                Figure 2: Efficiency optimized index layers.                                            searches are executed, attackers see three retrieved nodes
                                                                                                        for every layer of the Shuffle Tree. Attackers cannot distin-
prior to data blocks of the upper layers. When a query is                                               guish between cover searches and nodes that are relevant for
executed, first data blocks have to be retrieved from index                                             the executed query. 2) Node caching at the trusted client is
layer 1 to get the B-tree node that contains the ID of the                                              used to increase access pattern obfuscation. After a node is
DC that has to be retrieved from index layer 2. But the                                                 being accessed, it will replace the least recently used node
data blocks in index layer 1 cannot be written back right                                               in the client-side cache. All nodes on the path from the root
away, as – due to shuffling – the data blocks of index layer                                            node to the leaf node are stored inside the cache. If a node
2 are likely to be stored in different DCs than before. Only                                            that is in the cache is retrieved, an additional cover search
after new DCs have been assigned to the data blocks of the                                              is executed so an attacker cannot infer that the node was
record store ORAM instance, the data blocks in the upper                                                inside the cache. 3) Node shuffling is used to increase the
index layers can be updated and written back recursively.                                               difficulty for attackers to learn the parent-child relationship
                                                                                                        of the nodes. After each query execution, the content of all
4.2            Security of PATCONFDB                                                                    nodes stored on the client is shuffled using a random permu-
   The PATCONFDB concept consists of multiple ORAM                                                      tation before they are written back to the encrypted B-tree.
instances. We assume that the utilized ORAM scheme to                                                   Node shuffling is outlined in Figure 3.
create each ORAM instance satisfies the ORAM security                                                      In the remainder of this paper we denote the leaf nodes of
notion, i.e., any read or write operations on the data within                                           the Shuffle Tree as data nodes and the non-leaf nodes as nav-
the instance are indistinguishable. We now examine the se-                                              igation nodes. In the following, we show how the concept of
curity implications of executing queries across multiple index                                          shuffled B-trees can be extended to not only support equal-
layers, i.e., multiple ORAM instances.                                                                  ity selections, but also range and prefix selections as well as
   Selection queries. As shown in Section 4.1, sequential                                               database modifications, i.e., insert, update, and deletion of
equality selections are used to evaluate range and prefix se-                                           database records.
lections. Thus, it suffices to show that equality selections                                               Insert / delete. Current Shuffle Index concepts [6, 17]
are indistinguishable from the perspective of the SP. Every                                             assume an upfront knowledge of all database records. Know-
ORAM instance has to be queried exactly once to evaluate                                                ing the exact distribution of records is not possible in most
an equality selection, so honest-but-curious attackers see a                                            real world scenarios. Thus, concept extensions that allow
constant number of indistinguishable ORAM accesses. From                                                to insert, update, and delete records are necessary. We pro-
this, they learn the number of index layers, but they neither                                           pose a modified shuffled B-tree scheme with a dynamically
learn the parent-child-relationship of nodes in the B-tree nor                                          expanding and shrinking B-tree that is initialized with only
the total size of data stored in the database. If an equal-                                             a small amount of navigation nodes.
ity selection matches multiple records, the query is executed                                              The concept to insert records is outlined in Figure 4.
over all index layers for each matching record. Otherwise,                                              Whenever a record is inserted, an equality selection query
it would be possible for an attacker to observe the number                                              for the attribute value of the new record is performed to find
of matching records by counting the sequential accesses to                                              the node X to store the record in. If this node X is full, a new
the record store ORAM instance. Attackers could still mon-                                              node N is created to store the record in. The reference to N
itor bursts in queries, but this could as likely be caused by                                           is stored in the parent node of node X. If the parent node is
an equality selection that has matched many records as by                                               full as well, another new node is inserted on the next higher
a prefix or range selection. So the type of selection query                                             layer of the B-tree. This works recursively to the root of the
remains hidden from attackers. The frequency of queries                                                 tree, which does not have any size limitations, because it
can still be observed by the attacker. Access frequency is                                              does not need to be of a certain size to be indistinguishable
an information leak that can be found in all current ORAM                                               from any other node. Up to half of the records in the full
schemes. One solution would be to query the database pe-                                                node X are copied to the new node N while inserting the
      Storage position in shuffled B-tree         1                                               nodes is shuffled, padded, and encrypted with a new nonce
      Database query                                      J
      Cover search
                                                      3       2                                   in every access. Yang et al. [17] published a proof that, by
                                2                                      3                          using the shuffle mechanism, the probability that a node is
                                        O T                                    D I                the child of a parent node X is equal to the probability of
                                    8       6 5                            7    4                 it being the child of any other parent node Y after a large
                                                                                                  enough number of accesses were performed after the last
  4                         5                     6                7                 8            retrieval of X. Since this number of accesses is not further
        F H                         V X                   Q S          A C                J K     specified, it remains unclear when shuffled B-trees enforce
                                                                                                  access pattern confidentiality sufficiently.
                (a) Query execution with one cover search                                            To use the Shuffle Index in relational databases, methods
                                                                                                  to insert, update and delete data had to be implemented.
      Storage position in shuffled B-tree         1                                               As described in Section 4.3, this leads to the expansion and
      Database query                                      J
      Cover search
                                                      3       2                                   shrinkage of the Shuffle Tree over time. Therefore, informa-
                                2                                      3                          tion on the size of the database is leaked and the insert, up-
                                        O T                                    D I                date, as well as delete operations are distinguishable. Thus,
                                    8       7 5                            6    4                 the fourth access pattern confidentiality requirement defined
                                                                                                  in Section 3.2 is not satisfied. However, this does not lead
  4                         5                     6                7                 8            to the breach of the other three confidentiality requirements.
        F H                         V X                   A C          Q S                J K     Besides the total size of the database, attackers can also infer
                                                                                                  how many records can be stored in one node by monitoring
                     (b) Shuffled B-tree after node shuffling                                     a series of inserts and track how often a new node is being
                                                                                                  created. Users have to decide if these information leaks are
Figure 3: Shuffled B-tree: Query execution with cover                                             acceptable for their specific scenarios.
searches.                                                                                            Prefix / range selections. The performance optimiza-
                                                                                                  tion for prefix/range selection queries as proposed in Sec-
new record. After the insertion, the nodes are re-encrypted,                                      tion 4.3 introduces two minor information leaks. Attack-
shuffled and re-uploaded in the same way they would have                                          ers can distinguish range and prefix selection queries from
been in case of a regular equality selection query. The dele-                                     other database queries and they can estimate the amount of
tion of a record works similar. When the last record in a                                         records that matched a prefix/range query to a certain de-
data node is deleted, the data node itself and its reference                                      gree. Attackers can observe the total number of data nodes
in the parent node is deleted.                                                                    that were retrieved, but they do not know how many of those
   Update. To ensure the alphabetical sorting of all records,                                     nodes are cover nodes. We argue that both information leaks
an update that changes the attribute value of the record                                          are acceptable since the type of query is distinguishable any-
works as a deletion of the old record followed by an insertion                                    way due to the dynamically expanding shuffled B-tree. The
of a new one.                                                                                     amount of retrieved records would have also been observable
   Prefix / range selections. Since insert, update and                                            to a certain degree without this optimization, because bursts
delete operations are already distinguishable from selections,                                    in queries correlate to the amount of retrieved records.
it is not necessary to make prefix and range selections in-
distinguishable from equality selections. Therefore, we can
use the following method to enhance the performance of
                                                                                                  5.    EFFICIENCY EVALUATION
prefix and range selections. Instead of retrieving only one
data node, every data node inside one navigation node that
                                                                                                  5.1    Query Latency Overhead
matches the query can be downloaded. Even though x cover                                             We evaluated the query latency overhead induced by the
nodes have to be retrieved to obfuscate each data node that                                       use of PATCONFDB and the modified shuffled B-tree by
was actually retrieved, this method significantly reduces the                                     measuring the query latency of equality, prefix, and range
network traffic and the number of sequential data node re-                                        selections. For this evaluation, we used the state-of-the-
trievals, as discussed in Section 5.2.                                                            art Path ORAM [16] scheme as ORAM scheme for PAT-
                                                                                                  CONFDB. Path ORAM organizes the ORAM DCs in a bi-
4.4        Security of Shuffled B-trees                                                           nary tree. Each leaf of this binary tree is assigned a unique
                                                                                                  position and each data block is assigned to a position. The
  The security guarantees of the Shuffle Index are based
                                                                                                  invariant of Path ORAM states that every data block with
on the assumption that attackers are not able to determine
                                                                                                  an assigned position p is stored on the path from the root DC
the parent-child relationship of nodes in different layers of
                                                                                                  to the leaf DC with position p. So for every read or write
the Shuffle Tree. To achieve this, the content of all queried
                                                                                                  operation on a Path ORAM instance with n DCs, log(n)
                                                                                                  DCs are retrieved from the SP. After the operation the DCs
                                                                                                  are re-encrypted with a nonce and re-uploaded to the SP.
              Al Ar                                               Al An Ar                           We instantiated PATCONFDB with two index layers and
   Alex                             Armin                 Alex     Andy                  Armin    one record store. During our measurements, we optimized
   Alois                            Arthur                Alois    Anna                  Arthur   the capacity of DCs in each layer, resulting in container
   Andy                             -                     Amy      -                     -        capacities between 10 and 100 data blocks. Our shuffled
   Anna                             -                     -        -                     -        B-tree extensions are implemented with a Shuffle Index [6]
                                                                                                  as described in Section 4.3 with two cover searches, a cache
Figure 4: An excerpt of a Shuffle Tree before and after the                                       size of five and a Shuffle Tree height of three. In our mea-
insertion of the record ’Amy’.                                                                    surements, we found these parameter choices to have the
                    DB size              Records       Bandwidth and latency     100000
                                                                                            Baseline        ShuffleIndex           > time limit       > time limit
 Scenario 1         1 GB                 1 mio         1 Gbit/s and 5 ms                    DBCopy         PATCONFDB

 Scenario 2         1 GB                 1 mio         10 Mbit/s and 50 ms        10000

 Scenario 3         1 GB                 10.000        1 Gbit/s and 5 ms
                                                                                     1000
 Scenario 4         1 GB                 10.000        10 Mbit/s and 50 ms
 Scenario 5         10 GB                10 mio        1 Gbit/s and 5 ms             100
 Scenario 6         10 GB                10 mio        10 Mbit/s and 50 ms
 Scenario 7         10 GB                100.000       1 Gbit/s and 5 ms              10
 Scenario 8         10 GB                100.000       10 Mbit/s and 50 ms
                                                                                       1
Table 1: Summary of the scenario parameters chosen for
                                                                                              1        2           3       4   5          6       7          8
each scenario.
                                                                               Figure 6: Measured query latency of range selections over
  100000                                                                       all scenarios on a logarithmic y scale.
              Baseline        ShuffleIndex
              DBCopy         PATCONFDB

   10000
                                                                               in case more records are queried (scen. 1,3,5,7 vs. 2,4,6,8).
    1000
                                                                               Note that the confidence intervals for the queries executed
                                                                               with PATCONFDB are significantly larger than the ones of
     100                                                                       the other protocols. This is caused by different sized network
                                                                               overheads as explained in Section 5.2. For scenarios 6 and
     10                                                                        8, PATCONFDB even exceeded our time limit of 7h for
                                                                               one query. The Shuffle Index is much less influenced by
      1                                                                        the bandwidth of the network link, but its query latency
                1        2           3       4     5      6    7    8
                                                                               is already by a factor of 10 higher than the latency of the
                                                                               Baseline protocol.
Figure 5: Measured query latency of equality selections over                      Our measurements show that DBCopy can outperform
all scenarios on a logarithmic y scale.                                        PATCONFDB for range selections if more than about 0,1%
                                                                               off all records are queried in range selections. The results in-
                                                                               dicate that the PATCONFDB performs better in scenarios
best trade-off between security and performance. To better
                                                                               with databases that contain small amounts of records with a
compare the query latency, the naive approach of enforc-
                                                                               large size, whereas the Shuffle Index performs better in sce-
ing access pattern confidentiality by downloading the whole
                                                                               narios with databases that contain a large number of small
encrypted database on every access is evaluated, hereafter
                                                                               records. For scenarios in which many records are queried
referred to as DBCopy. To specify the overhead compared
                                                                               in range selections a network link with a high bandwidth to
to an approach that does not provide any security mecha-
                                                                               the SP is needed to keep query latency low.
nisms, an unencrypted database retrieval protocol was also
evaluated, hereafter referred to as Baseline.                                  5.2      Network Overhead
   The query latency has been measured on a client computer
                                                                                 In the following, we investigate the network overhead of
with an Intel Core i7 CPU with 2,4GHz and 8GB RAM. As a
                                                                               PATCONFDB and shuffled B-trees, i.e., the amount of data
database server a Microsoft SQL Server Express was used on
                                                                               that has to be transmitted between the client and the SP.
a virtual machine in Hyper-V with 2 virtual CPUs and 16GB
                                                                               First, we provide analytical models to highlight the factors
RAM. The test data has been created with a random number
                                                                               that influence the network overhead both for PATCONFDB
generator and is evenly distributed. Before every test run,
                                                                               and shuffled B-trees. Then we present and interpret the
each database is initialized so that the generated database
                                                                               network overheads we measured for the scenarios that we
records fill 10% of it. In Table 1 the parametrization of our
                                                                               introduced in Section 5.1.
2k -factorial experimental design scenarios is shown.
                                                                                 The network overhead NPs to retrieve a record from a
   The query latency of equality selections measured over
                                                                               PATCONFDB instance that is based on ORAM scheme s
all scenarios is shown in Figure 5. It can be seen, that an
                                                                               can be calculated as follows:
increase of the size of the database results in an increased                                                      XHI s
query latency for all tested protocols (scen. 1-4 vs. 5-8).                                     NPs = Nrs + N t +      Ni
The bandwidth of the network link is only a critical fac-                                                                             i=1

tor for the DBCopy protocol, because an equality selection                     for HI = number of index layers, Nrs = network overhead
only retrieves a small number of records, so the total net-                    induced by the retrieval of a record from the record store
work traffic is low for all other protocols (scen. 1,3,5,7 vs.                 depending on the ORAM scheme s, N t = size of the root
2,4,6,8). The Baseline and the PATCONFDB protocol are                          node of the PATCONFDB B-tree, Nis = network overhead
significantly influenced by the number of records (scen. 1,2                   induced by the retrieval of a record from an index layer
vs. 3,4). For the Baseline protocol this is the case, because                  depending on the ORAM scheme s.
it does not have an index tree to efficiently search for at-                     For an implementation of PATCONFDB with Path O-
tribute values. For the PATCONFDB protocol it takes a                          RAM, the retrieval of a record requires every Path ORAM
long time to sequentially query index layers which contain a                   instance to be accessed two times (Retrieval and Upload of
large number of identifiers.                                                   DCs). Since the DCs are stored in a binary tree and all DC
   The query latency of range selections measured over all                     on the path from the root to the leaf of the tree are retrieved,
scenarios is shown in Figure 6. Now that the bandwidth of                      the network traffic for every ORAM instance equals the tree
the network link influences the query latency significantly                    height h multiplied with the size of the DC d.
      Scenario   SI(EQ)     SI(PR)    PC(EQ)       PC(PR)           database records’ size to the chosen maximum database size.
      1+2        562,0      6,56      7080         6557             If the maximum database size is 100GB but only 10GB of
      3+4        110,4      6,31      683,1        669,8            data is stored inside, the relative storage overhead to the
      5+6        1460       6,57      5901         5761             actual data size is 10. The size of the index depends on the
      7+8        111,3      6,26      1025         971,3            maximum number of records that can be stored in a PAT-
                                                                    CONFDB instance. Each different attribute value of any
Table 2: Ratio of Shuffle Index (SI) and PATCONFDB (PC)             record is stored inside of the lowest index layer to provide
network overhead to the size of the queried records for equal-      references to every record that is stored in the database.
ity (EQ) and prefix (PR) selections.                                Since index layers are not allowed to grow in size, they have
                                                                    to be initialized to their maximum size as well. So, in the
  The network overhead NS to retrieve a record from a
                                                                    worst case of records that contain only the indexed attribute,
Shuffle Index can be calculated as follows:
                                                                    the lowest index layer has the same size as the record store.
 NS = 2 · r + n · (h − 2) · (1 + a + 2 · c) + d · (1 + a + 2 · c)   However, the number of DCs that have to be referenced de-
                                                                    creases fast for the upper index layers, so that the additional
for h = height of Shuffle Tree, r = root node size, n =
                                                                    storage overhead for them is very low. In our evaluations the
navigation node size, d = data node size, c = number of
                                                                    storage overhead for a PATCONFDB instance that is filled
executed cover searches, a = cache size. The network traffic
                                                                    to its maximum capacity never exceeded a factor of 2,2 in
is induced by three factors: I) Retrieval and upload of the
                                                                    comparison to the plain text database size with an aver-
root node (2 · r), II) retrieval and upload of navigation nodes
                                                                    age factor of 2,08, which we argue is feasible for relational
including cover nodes (n·(h−2)·(1+a+2·c)), III) read and
                                                                    databases. If the PATCONFDB instance is not filled to its
write of the data nodes (d · (1 + a + 2 · c)). Note that actually
                                                                    maximum capacity, that factor has to be divided by the ratio
needed nodes are retrieved and then uploaded as part of the
                                                                    of used space u to calculate the overall storage overhead.
whole cache, inducing a total overhead of n · (1 + a) per
                                                                       In contrast to PATCONFDB, the Shuffle Index does
B-tree layer.
                                                                    not need to be initialized to its maximum database size.
   The measured network overheads for the scenarios
                                                                    The storage overhead of the Shuffle Index is induced by the
that we introduced in Section 5.1 are shown in Table 2
                                                                    storage needed for navigation nodes and by the ratio of the
for PATCONFDB and the Shuffle Index. The table shows
                                                                    average number to the maximum number of records stored
the ratio of the induced network traffic of each approach to
                                                                    in data nodes. This used space of data nodes u influences
the size of the queried records. For instance, if the queried
                                                                    the storage overhead similarly as seen with PATCONFDB.
record has a size of 100B and a network traffic of 5,62KB is
                                                                    Since the database records are stored, so that they are sorted
measured for the evaluation of the query, the relative over-
                                                                    alphabetically in the leaf nodes of the Shuffle Tree, an index
head amounts to 562. The scenarios that only differ in net-
                                                                    layer that contains every attribute value as seen with the
work bandwidth are paired in this table, since the network
                                                                    PATCONFDB approach is not necessary. This combined
bandwidth does not affect the amount of network traffic.
                                                                    with the dynamic fan out of navigation nodes significantly
   For equality selections, the measurements indicate that
                                                                    reduces the storage overhead induced by navigation nodes.
the total number of records stored in the database has a large
                                                                    In our evaluations the storage overhead for a Shuffle Index
impact on the relative network overhead for both schemes
                                                                    instance that only contains data nodes which are filled to
(scen. 1,2,5,6 vs. 3,4,7,8). Since the total network traffic for
                                                                    their maximum capacity, never exceeded a factor of 1,02 in
equality selections in both schemes remains low (<1,5MB),
                                                                    comparison to the plain text database size. That factor has
the large network overhead does not have a similarly large
                                                                    to be divided by the ratio u of used space of data nodes to
impact on the query latency measured in Section 5.1. In
                                                                    calculate the overall storage overhead.
the case of prefix selections, the performance optimization
                                                                       It can be seen, that the storage overhead induced by in-
proposed in Section 4.3 for the Shuffle Index reduces the rel-
                                                                    dexing techniques is a low and constant factor for both ap-
ative network overhead significantly. Since prefix selections
                                                                    proaches and therefore does not restrict the feasibility of ac-
in the PATCONFDB are executed as a series of sequential
                                                                    cess pattern preserving relational databases. However, the
equality selections, no significant reduction of the relative
                                                                    constraint of PATCONFDB that the database has to be ini-
network overhead is measured. The relative overhead is only
                                                                    tialized to its maximum size, could be problematic in sce-
reduced, if by chance multiple records that match the query
                                                                    narios where the size of the database fluctuates frequently.
are retrieved within the same DC.
   Since the relative network overhead of PATCONFDB re-
mains at a high level, it is not feasible to use PATCONFDB          6.   FUNCTIONALITY EXTENSIONS
in scenarios, in which a large number of records is queried at         We address the challenges in complying to access pattern
the same time or in a short time frame. The Shuffle Index,          confidentiality in DaaS scenarios with multiple types of
however, can also be used in scenarios which include large          indexes, i.e. strings, number, and dates, for the use of the
range or prefix selections.                                         PATCONFDB approach. We discuss two situations in which
                                                                    PATCONFDB would leak information about the pattern of
5.3     Storage Overhead                                            queries, if used in the same way as in a single index type
   In the following we investigate the storage overhead on the      scenario. A basic equality selection results in one query to
external SP. This storage overhead consists of the storage          each index layer and one query to the record store.
that is needed for the index as well as of the additionally            Insert and delete operations have to access I (number
required storage to store record, i.e., the padding of nodes        of index types) DCs from the index layer to insert or delete
or the initialization of a database to its maximum size.            every attribute value of each record to the corresponding
   The storage overhead of the PATCONFDB approach                   data record. Therefore an attacker can differentiate between
is predominantly influenced by the ratio u of the actual            an equality selection and an insert or a delete operation.
To prevent this information leak, I DCs from the index layer     8.   REFERENCES
have to be retrieved, before the record store is queried.         [1] M. Bellare, A. Boldyreva, and A. O’Neill. Deterministic
                                                                      and efficiently searchable encryption. In Proc. of the
   Update operations have to access a DC of every index               Annual Intl. Cryptology Conf. on Advances in Cryptology
type, which is involved in the SQL operation, from the index          (CRYPTO), pages 535–552, 2007.
layer and then one DC of the record store. Since it is impos-     [2] P. Brody and V. Pureswaran. Device democracy: Saving
sible to know the current value before it is update, another          the future of the internet of things. IBM Global Business
DC has to be queried from the index layer, to delete the ref-         Services Executive Report, 2014.
erence that is no longer needed. Therefore an attacker can        [3] A. Ceselli, E. Damiani, S. D. C. D. Vimercati, S. Jajodia,
differentiate between an equality selection and an update             S. Paraboschi, and P. Samarati. Modeling and assessing
                                                                      inference exposure in encrypted databases. ACM
operation. To prevent this information leak, a randomly
                                                                      Transactions on Information and System Security,
chosen set of DCs from the index structure would have to              8(1):119–152, 2005.
be retrieved after the actual query is executed.                  [4] I. Damgård, S. Meldgaard, and J. B. Nielsen. Perfectly
   Further query types. In this paper, we investigated                secure oblivious ram without random oracles. In Proc. of
how equality, range, and prefix selections can be evaluated           the Conf. on Theory of Cryptography (TCC), pages
on access pattern confidentiality-preserving databases. We            144–163, 2011.
introduced equality, range and prefix selections based on         [5] J. Dautrich, E. Stefanov, and E. Shi. Burst ORAM:
strings. However, our concepts can be seamlessly applied              Minimizing ORAM response times for bursty access
                                                                      patterns. In Proc. of the USENIX Security Symposium,
for range and prefix selections on other data types like inte-        pages 749–764, 2014.
gers. We argue that our investigated query types are already      [6] S. De Capitani di Vimercati, S. Foresti, S. Paraboschi,
sufficient for many scenarios in which relational databases           G. Pelosi, and P. Samarati. Efficient and private access to
are used. PATCONFDB and modified shuffled B-trees, as                 outsourced data. In Proc. of the IEEE Intl. Conf. on
introduced in Section 4, can be extended to support table             Distributed Computing Systems (ICDCS), pages 710–719,
joins and nested queries. Again the queries are divided into          2011.
sequential equality selections so that they are indistinguish-    [7] S. De Capitani di Vimercati, S. Foresti, S. Paraboschi,
                                                                      G. Pelosi, and P. Samarati. Distributed shuffling for
able from any other query. For this to work, all indexes              preserving access confidentiality. In Proc. of the European
of all tables have to be stored in the same B-tree. If more           Symp. on Research in Computer Security (ESORICS),
than one B-tree is used for indexing, every B-tree has to be          pages 628–645, 2013.
accessed every time a query is executed for the queries to        [8] O. Goldreich and R. Ostrovsky. Software protection and
be indistinguishable. In future work we plan to implement             simulation on oblivious rams. Journal of the ACM,
and evaluate these query types to give recommendations on             43(3):431–473, 1996.
which setup performs best for which database scenario.            [9] M. T. Goodrich and M. Mitzenmacher. Privacy-preserving
                                                                      access of outsourced data via oblivious RAM simulation. In
                                                                      Proc. of the Intl. Conf. on Automata, Languages and
7.   CONCLUSIONS & FUTURE WORK                                        Programming (ICALP), pages 576–587, 2011.
   To investigate the feasibility of access and pattern confi-   [10] M. S. Islam, M. Kuzu, and M. Kantarcioglu. Access pattern
dentiality-preserving relational databases with a B-tree in-          disclosure on searchable encryption: Ramification, attack
                                                                      and mitigation. In Proc. of the Network and Distributed
dex, we proposed PATCONFDB, an ORAM-based concept                     Systems Security (NDSS) Symposium, 2012.
to achieve access pattern confidentiality, and extended exist-   [11] S. Kamara, C. Papamanthou, and T. Roeder. Dynamic
ing shuffled B-tree approaches to support essential database          searchable symmetric encryption. In Proc. of the ACM
operations. In particular, empirical measurements of these            Conf. on Computer and Communications Security (CCS),
concepts showed that enforcing access pattern confidential-           pages 965–976, 2012.
ity in relational databases only induces an overhead of factor   [12] J. Köhler, K. Jünemann, and H. Hartenstein. Confidential
5.9 for evaluating equality conditions on a database with up          database-as-a-service approaches: taxonomy and survey.
                                                                      Journal of Cloud Computing, 4(1), 2015.
to 10 million records. The extended Shuffle Index induces an
                                                                 [13] R. Ostrovsky. Efficient computation on oblivious RAMs. In
even smaller overhead of factor 2.3 for the same setup, but           Proc. of the ACM Symposium on Theory of Computing
provides no strict and well-defined access pattern confiden-          (STOC), pages 514–523, 1990.
tiality guarantees. Our performance evaluation showed to         [14] L. Ren, X. Yu, C. W. Fletcher, M. van Dijk, and
which extend the induced overhead depends on the network              S. Devadas. Design space exploration and optimization of
link to the SP, the structure of the data and the query work-         path oblivious RAM in secure processors. In Proc. of the
load of the outsourcing scenario. In particular, our results          Intl. Symp. on Computer Architecture (ISCA), pages
                                                                      571–582, 2013.
showed that shuffled B-trees in general outperform ORAM-
                                                                 [15] E. Stefanov and E. Shi. ObliviStore: High performance
based B-trees and, thus, can be used in a wider variety of            oblivious distributed cloud data store. In Proc. of the
DaaS scenarios, if no strict security guarantees are required.        Network and Distributed Systems Security (NDSS)
   The findings of this paper highlight multiple future re-           Symposium, 2013.
search directions. It is worthwhile to aim for further effi-     [16] E. Stefanov, M. van Dijk, E. Shi, C. Fletcher, L. Ren,
ciency improvements of ORAM-based database indexes, as                X. Yu, and S. Devadas. Path ORAM: An extremely simple
well as to evaluate the performance of other index struc-             oblivious RAM protocol. In Proc. of the ACM Conf. on
tures like bitmaps. The superior efficiency of shuffled B-tree        Computer and Communications Security (CCS), pages
                                                                      299–310, 2013.
approaches makes it also worthwhile to aim for a better un-      [17] K. Yang, J. Zhang, W. Zhang, and D. Qiao. A light-weight
derstanding of their security guarantees. Furthermore, we             solution to preservation of access pattern privacy in
plan to investigate how both ORAM-based and shuffle-based             un-trusted clouds. In Proce. of the European Conf. on
schemes can be extended to support relational databases               Research in Computer Security (ESORICS), pages
with multiple index types and queries that include more               528–547, 2011.
complex operations.