Merkle Hash Tree based Techniques for Data Integrity of Outsourced Data

Muhammad Saqib Niaz, Dept. of Computer Science, Otto von Guericke University, Magdeburg, Germany, saqib@iti.cs.uni-magdeburg.de
Gunter Saake, Dept. of Computer Science, Otto von Guericke University, Magdeburg, Germany, gunter.saake@ovgu.de

ABSTRACT
One of the problems associated with outsourcing data to cloud service providers is the data integrity of the outsourced data. In this paper we present data integrity techniques for outsourced data. Data integrity encompasses the completeness, correctness and freshness of the data. This paper focuses on Merkle Hash Tree based data integrity techniques. It also presents techniques for the storage and retrieval of Merkle Hash Tree based authentication data to and from a cloud data service provider. A critical analysis of Radix Path Identifiers, a technique for the storage of Merkle Hash Trees in databases, is also presented.

General Terms
Cloud Databases, Security

Keywords
Database Security, Data Integrity, Outsourced Data

1. INTRODUCTION
Data outsourcing means storing your data with third party cloud data service providers. It is cheaper and easier to maintain data on a cloud data service than to maintain it on the data owner's own premises. Besides all these benefits, data outsourcing poses numerous security threats to the outsourced data. The list includes, but is not limited to, loss of data integrity, access privacy and unauthorized access to data. The focus of this paper is data integrity, which encompasses completeness, correctness and freshness.

There are three parties involved in these schemes: the data owner (DO), the data clients and the data service provider (DSP). A DSP provides all the data services and can be trusted with server availability, timely backups, replication and disaster recovery. However, the DSP cannot be trusted with the integrity of the outsourced data: a DSP has unlimited access to the data, which makes it possible for the DSP to forge the data in any way. It is assumed that the link between the DO and the DSP, as well as the links between the clients and the DSP, are secured using a technique such as SSL, so that forgery of the data over these links is easily detectable.

The following are some of the features that need to be considered in designing data integrity techniques for outsourced data:

• Computation overhead for the DO
• Computation overhead of the DSP
• Storage overhead of the DSP
• Computation overhead of the client
• Storage overhead of the client

The rest of the paper is organized as follows. A basic data integrity technique is presented in Section 2. The Merkle Hash Tree based data integrity scheme is presented in Section 3. Section 4 explains the storage and retrieval technique for Merkle Hash Tree based authentication data. Section 5 presents our analysis of the Radix Path Identifiers technique and the ongoing work on a new technique. Finally, the conclusions follow in Section 6.

2. BASIC TECHNIQUE
For the rest of the paper, we assume that a mechanism is in place to securely share some data between the DO and the clients. This data could be the public key of the DO or some hash data. Only the DO can modify the data, and the clients have read-only access to the data at the DSP.

The simplest data integrity technique is to individually sign all the tuples in a data table and store the signatures in a separate column of the same table. On a client's query, the signature is then sent to the client along with the tuple data. The client can check the integrity of the data by verifying the DO's signature on the associated tuple. This scheme imposes a huge computation overhead on the DO and the clients. Besides the computational overhead, it has a linear storage overhead, as a distinct signature needs to be stored with each tuple. Still, attacks are possible on this scheme.
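The per-tuple signing scheme can be sketched as follows. This is a minimal illustration only: a real deployment would use the DO's public-key signature (e.g., RSA), whereas here an HMAC with a shared key stands in for it, and the row encoding and helper names are ours, not part of any described system.

```python
import hashlib
import hmac

SIGNING_KEY = b"do-secret"  # stand-in for the DO's private signing key

def canonical(row):
    # Deterministic byte encoding of a tuple (ID, Name, Salary).
    return "|".join(str(v) for v in row).encode()

def sign_tuple(row):
    # DO side: one signature per tuple, stored in an extra column.
    return hmac.new(SIGNING_KEY, canonical(row), hashlib.sha256).hexdigest()

def verify_tuple(row, signature):
    # Client side: recompute the signature and compare.
    return hmac.compare_digest(sign_tuple(row), signature)

# The DSP stores (row, signature) pairs and returns both on a query.
table = [(10, "Alice", 1000), (20, "Bob", 2000)]
signed = [(row, sign_tuple(row)) for row in table]

row, sig = signed[0]
assert verify_tuple(row, sig)                      # genuine tuple passes
assert not verify_tuple((10, "Alice", 9999), sig)  # forged salary fails
```

Note that the sketch also exhibits the scheme's stated costs: one signature computation per tuple and one stored signature per tuple (linear storage overhead).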
The DSP can delete some valid tuples from the data, and the client would never be able to establish this fact. The DSP can also send an incomplete data set to the client, and this forgery too will go undetected at the client's end.

Data integrity schemes can be divided into two main categories, i.e., probabilistic and deterministic. Probabilistic approaches for data integrity have been suggested in [7, 11, 12]. These techniques do not require any changes at the DSP's end but, as the name suggests, the integrity results can sometimes be doubtful. The second category consists of deterministic approaches that are generally based on Authenticated Data Structures (ADS) [6, 5, 10, 1, 2]. ADS based schemes will be the focus of the rest of the paper.

27th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 26.05.2015 - 29.05.2015, Magdeburg, Germany. Copyright is held by the author/owner(s).

3. MERKLE HASH TREES
Authenticated Data Structures (ADS) is a technique in which some kind of authentication data is stored on the DSP. On a client's query, the DSP returns the queried data along with some extra authentication data, which the client then uses to verify the authenticity of the returned data.

Numerous techniques have been proposed that utilize ADS for checking data integrity. Signature aggregation based schemes have been proposed in [6, 5]. These approaches require modifying the signatures of all the records, which renders them impractical considering the number of signatures [10]. An authenticated skip list based approach has been proposed in [1]. A modified Merkle Hash Tree (MHT) based scheme, named the super-efficient data integrity scheme, has been proposed in [2]. In this scheme the main MHT is divided into smaller MHTs and the root hashes of these sub-trees are signed. The purpose of the division into smaller MHTs is to avoid the unnecessary calculation up to the root of the main hash tree.

The Merkle Hash Tree based data integrity techniques for outsourced data are based on a signature scheme proposed by Merkle in [4]. This scheme eliminated the need for digital signatures for data integrity purposes. MHT based data integrity techniques build on two sub-components, i.e., Merkle's Signature Scheme and B+ trees.

3.1 Merkle's Signature Scheme
Merkle proposed a signature scheme based on a binary tree of hashes in [4]. Figure 1 shows a typical example of an MHT.

Figure 1: Merkle Hash Tree

Each leaf node holds the hash of a data block, e.g., H(1) holds the hash of data block 1. Internal nodes hold the hash of the concatenated hashes of their children, e.g., H(1,2) = H(H(1) | H(2)), where '|' indicates concatenation. This scheme is based on the assumption that a safe/trusted way exists to share the root of the tree between the signer and the verifier. To verify the integrity of any data block, the whole tree of hashes does not need to be transmitted to the verifier. The signer transmits the hashes of only those nodes which are involved in the authentication path of the data block under consideration. For example, if the receiver needs to verify the integrity of data block 2, then only H(1), H(3,4) and H(5,8) need to be transferred to the receiver. The receiver can calculate H(2) from data block 2. H(1,2) can then be calculated using the received H(1) and the calculated H(2). In the same way, H(1,4) can be calculated and then H(1,8). The receiver can then compare the calculated H(1,8) with the already shared H'(1,8); if both hashes match, the integrity of data block 2 is confirmed.

Some important facts regarding Merkle's Signature Scheme are as follows:

• The security of this signature scheme depends on the security of the hash function.
• Only one hash needs to be maintained/shared securely.
• To authenticate any data block, only log2 n hashes need to be transferred, where n denotes the total number of data blocks.
• In case of integrity checking of a continuous range of blocks, even fewer than log2 n hashes need to be transferred.

3.2 B+ Trees
B+ trees are a special case of B trees, as shown in Figure 2.

Figure 2: B+ Tree of order 3

They are n-ary trees. The root node can be a leaf node or an internal node. Internal nodes only hold keys; they do not hold data. Data always stays in the leaf nodes. Leaf nodes are connected through pointers to form a kind of linked list. This linkage helps in sequential traversal of the data. Let n be the order of a B+ tree. The root node can hold 1 to n-1 keys when it is the only node in the tree. If the root node is an internal node, it can have 2 to n child nodes. Internal nodes can have ⌈n/2⌉ to n child nodes. Leaf nodes can hold ⌈n/2⌉ to n-1 keys [8].

3.3 Data Integrity based on Merkle Hash Trees
Data integrity schemes based on MHTs have been designed by replacing the binary tree of the original Merkle's Signature Scheme with a B+ tree. The B+ tree presented in Figure 2 is used with some modifications. Leaf nodes are linked with direct pointers. Besides keys, leaf nodes also hold the hashes of the data records pointed to by the corresponding keys. As an example, the leaf node (20, 30) also holds the hashes of the data records of keys 20 and 30. The pointers of internal nodes likewise hold the hashes of the concatenated hashes of their child nodes; e.g., the right pointer of the internal node 20 holds the hash of the concatenated hashes of the data records pointed to by keys 20 and 30, i.e., H(20,30) = H(H(20) | H(30)).

The security of Merkle Hash Tree based data integrity schemes depends on the security of the hash function, as in the original Merkle's signature scheme. This scheme also resolves the freshness issue of query results. Each time the DO updates the data at the DSP, a new root hash is calculated based on the newly updated state of the data. By sharing the new root hash with the clients, freshness can be ensured.

3.4 Implementation Issues
A problem associated with MHT based data integrity schemes is the efficient storage and retrieval of MHTs in the DSP's database. Numerous approaches exist to store hierarchical or tree-like data in a database, e.g., adjacency list, nested set, nested interval and closure table. Each approach has its pros and cons. For this specific problem of storing MHTs, a new technique named Radix Path Identifiers has been proposed in [10].

4. RADIX PATH IDENTIFIER
Consider a data table named Employee, as shown in Table 1.

Table 1: Employee Data Table
ID  Name   Salary
10  Alice  1000
20  Bob    2000
30  Cindy  3000
40  Dan    3000
50  Eva    2000
60  Felix  1000

A Merkle Hash Tree is created based on the data in the Employee table, as shown in Figure 3. Data is inserted into the MHT in ascending order. The fanout of this B+ tree is three, which is why every node holds either one or two keys.

Figure 3: Merkle Hash Tree based on Table 1

The basic idea is to assign numbers based on a radix to each pointer of an internal node and each key of a leaf node in order to uniquely identify them in the MHT. The radix can be any number equal to or greater than the fanout of the MHT. We take 3 as the radix for the MHT created for the Employee table. Radix Path Identifiers have been added to the MHT shown in Figure 3, and the modified MHT is shown in Figure 4.

Figure 4: MHT with Radix Path Identifiers

The Radix Path Identifier of a pointer or key (in a leaf node) depends on its level in the MHT and its position in a node. Let l be the level in the MHT; the level of the root node is 0 and the level of the leaf nodes is the maximum. r_b is the radix base, f denotes the fanout of the MHT, and i denotes the index of a pointer or key in a node, ranging from 0 to f. The Radix Path Identifier rpi can be computed using the following equation:

    rpi = i                         if l = 0
    rpi = rpi_parent * r_b + i      if l > 0        (1)

For example, to calculate the RPI of the key 60 in its leaf node, the level of the key is determined first. The level is not zero, so the lower part of the equation applies; note also that all the calculations are done in the ternary number system. Here i is 1, as 60 is the second key in the leaf node, the RPI of the parent is 12, and r_b is 3. Multiplying rpi_parent by r_b gives 120, and adding i gives 121, so the RPI of the key 60 in the leaf node is 121.

The proposed Radix Path Identifier scheme has several important properties:

1. RPIs are continuous within a node, but not continuous across two consecutive nodes. For example, the base-3 numbers 10, 11, 12 are continuous, but 110 and 120 are not, as shown in Figure 4.
2. From an RPI, we can easily find the RPI of its parent pointer, based on the fact that rpi_parent = ⌊rpi / r_b⌋.
3. From an RPI in a node, we can easily calculate the minimum and maximum RPIs in that node, which are ⌊rpi / r_b⌋ * r_b and ⌊rpi / r_b⌋ * r_b + (r_b - 1).
4. From an RPI in a node, we can easily compute the index i of the pointer or key in the node, which is rpi mod r_b.

4.1 MHT Storage in the Database
Two models have been suggested in [10] for the storage of Radix Path Identifier based MHTs in the database. The first method is to store the whole authentication data in one table, called the Single Authentication Table (SAT). The second method is to store each level of the MHT in an individual table, called a Level Based Authentication Table (LBAT).

4.1.1 Single Authentication Table
In this technique, one table holds the entire authentication data, as shown in Table 2. A tuple in this table represents either a pointer in an internal node or a key in a leaf node of the MHT. The authentication table has four columns, named ID, RPI, Hash and Level. The ID column in the authentication table corresponds to the values in the ID column of the Employee table. In the case of leaf nodes, each key corresponds to a tuple in the Employee table, so the mapping is straightforward. However, in the case of internal nodes, the number of pointers is always one more than the number of keys. Treating the leftmost pointer in each internal node as this extra pointer, -1 is assigned in the ID column for these pointers; the rest of the pointers are stored with the IDs of the corresponding keys in the node. The Hash column holds the hashes associated with the pointers in internal nodes and the keys in leaf nodes. The RPI column holds the Radix Path Identifier of the pointer in an internal node or of the key in a leaf node.

Table 2: Single Authentication Table (SAT)
ID   RPI  Hash  Level
-1    0   hash   2
30    1   hash   2
-1    0   hash   1
20    1   hash   1
-1    3   hash   1
40    4   hash   1
50    5   hash   1
10    0   hash   0
20    3   hash   0
30    9   hash   0
40   12   hash   0
50   15   hash   0
60   16   hash   0

The RPIs shown in Figure 4 are unique because these numbers are written in base 3 with their preceding zeros. However, for storage in the authentication table, preceding zeros are ignored and the RPIs are converted into base-10 numbers. This results in different base-3 numbers mapping to the same base-10 number. For example, 011 in the third level and 11 in the second level transform to the same base-10 number, i.e., 4. Consequently, the transformed RPIs are unique within a level but can be repeated among different levels of the tree. In order to distinguish between equal RPIs, a Level column is added to the authentication table.

4.1.2 Level Based Authentication Table
In the LBAT technique, the authentication data for each level of an MHT is stored in an individual table. Besides this, an extra table is created that holds the data about the authentication tables, i.e., the name of each table and its associated level. The LBATs for the MHT shown in Figure 4 are shown in Table 3. As every LBAT represents one level of the tree, there is no need for a Level column in these authentication tables.

Table 3: Level Based Authentication Tables (LBAT)

Emp_2 (Root)
ID   RPI  Hash
-1    0   hash
30    1   hash

Emp_1
ID   RPI  Hash
-1    0   hash
20    1   hash
-1    3   hash
40    4   hash
50    5   hash

Employee (Leaf Nodes)
ID  Name   Salary  RPI  Hash
10  Alice  1000     0   hash
20  Bob    2000     3   hash
30  Cindy  3000     9   hash
40  Dan    3000    12   hash
50  Eva    2000    15   hash
60  Felix  1000    16   hash

Authentication data extracted from the LBATs is used to compute the root hash of the MHT. For data extraction from the LBATs, four different methods have been presented in [9]: Multi-Join, Single-Join, Zero-Join and Range-Condition. In all of the following methods, data is extracted from the LBATs to verify the authenticity of the data record with ID 40. All the data involved in the authentication path of ID 40 needs to be extracted from the LBATs; the pointers involved in this authentication path are marked in black in Figure 4.

4.2.1 Multi-Join
In the Multi-Join approach, all the authentication data from the respective LBATs is retrieved in a single query. The following SQL statement retrieves the authentication data of the record with ID 40. In order to fetch all the authentication data in one query, multiple left outer joins are used, which introduces redundancy in the result.

select a0.RPI as RPI0, a0.hash as hash0,
       a1.RPI as RPI1, a1.hash as hash1,
       a2.RPI as RPI2, a2.hash as hash2
from Employee emp
left join Employee a0 on a0.RPI/3 = emp.RPI/3
left join Emp_1 a1 on a1.RPI/3 = emp.RPI/(3*3)
left join Emp_2 a2 on a2.RPI/3 = emp.RPI/(3*3*3)
where emp.ID = 40;
The Level 0 authentication table has exactly the same number of records as the Employee table, so both tables can be merged into one: RPI and Hash columns are added at the end of the Employee table to hold the authentication data for Level 0.

4.1.3 Performance comparison between both schemes
Considering the table-level locks during updates and inserts, it is easier/faster to update authentication data in an LBAT than in the SAT. In the LBAT scheme, as the leaf-level authentication data is stored along with the data records, it is straightforward to retrieve the authentication data for the leaf level along with the required table data.

4.2 Authentication Data Extraction

4.2.2 Single-Join
In the Single-Join approach, the data from each authentication table is retrieved separately. As an ID column does not exist in the authentication tables, in each query the authentication table is joined with the Employee table on the RPI column.

select e0.RPI, e0.hash
from Employee emp
left outer join Employee e0 on e0.RPI/3 = emp.RPI/3
where ID = 40;

select e1.RPI, e1.hash
from Employee emp
left outer join Emp_1 e1 on e1.RPI/3 = emp.RPI/(3*3)
where emp.ID = 40;

select e2.RPI, e2.hash
from Employee emp
left outer join Emp_2 e2 on e2.RPI/3 = emp.RPI/(3*3*3)
where emp.ID = 40;

4.2.3 Zero-Join
As is evident from the name, no tables are joined for querying the authentication data; each table is queried individually. To query each table without any joins, the RPI of the record under consideration has to be extracted first and stored in a variable. Afterwards, this stored RPI is used to extract the authentication data from the LBATs.

declare @rpid as int;
select @rpid = RPI from Employee where ID = 40;

select RPI, hash from Employee where RPI/3=@rpid/3;

select RPI, hash from Emp_1 where RPI/3=@rpid/(3*3);

select RPI, hash from Emp_2 where RPI/3=@rpid/(3*3*3);

4.2.4 Range-Condition
Execution of the queries presented in the Zero-Join section scans the RPI column for each query. This scan can be replaced with an index seek by creating an index on RPI and replacing the Zero-Join queries with range conditions. The following queries show how to utilize an index created on RPI to efficiently query the data from the LBATs for the authentication of the data record with ID 40.

declare @rpid as int;
select @rpid = RPI from Employee where ID = 40;

select RPI, hash from Employee
where RPI >= (@rpid/3)*3
and RPI < (@rpid/3)*3+3;

select RPI, hash from Emp_1
where RPI >= (@rpid/(3*3))*3
and RPI < (@rpid/(3*3))*3+3;

select RPI, hash from Emp_2
where RPI >= (@rpid/(3*3*3))*3
and RPI < (@rpid/(3*3*3))*3+3;

Each of the above queries retrieves authentication data from one specific level of the MHT. The range condition in each query encompasses all the RPIs of the elements present in a node. For example, at level three a node can have the RPIs 120, 121 and 122. The last digit of an RPI always stays less than the fanout of the tree, which is why 3 is used as the offset for the exclusive upper bound in the queries.

4.3 Data Operations
Four data operations, i.e., select, insert, update and delete, will be discussed.

4.3.1 Select
Selection of a single record along with its associated authentication data has already been discussed in detail. Authentication of a continuous range of records is a bit different from the authentication of a single record. Suppose our range query returns the set of records with IDs 20 to 40. The user needs to find the two bounding values of the range under consideration; in this case, the two bounding values are 10 and 50. The user just needs to fetch the range of records from ID 20 to 40 without their authentication data. In order to authenticate this range, only the authentication data of the two bounding values, i.e., 10 and 50, is required. By verifying the data integrity of the bounding values, the user can be assured of the data integrity of the whole range.

4.3.2 Update
Update is a more complicated operation than select. When updating a record, along with updating the data table, the user also needs to update the hashes in all the records involved in the authentication path of the updated record. For example, if the user updates the data record with ID 60, the hash of this record is updated in the Employee table along with the data update. In addition, the user needs to update the hash data in the pointers marked 12 and 1, as shown in Figure 5.

Figure 5: Updating a Record

4.3.3 Insert & Delete
Insert and delete are more complicated operations than updating a record. They can affect the associated MHT in three different ways. The simplest case is that the insertion or deletion of a record affects only a single leaf, i.e., a key is added to or deleted from a leaf node; in this case only the data involved in the authentication path of the affected leaf node needs to be updated. A little more complicated case is a change in a subtree of the MHT; in this case all the authentication records of that subtree need to be updated, and in addition the authentication path of the root of the updated subtree also needs to be updated. The most complex case is the addition or deletion of a level of the MHT. In the case of the addition of a new level, the following needs to be done:

1. Addition of a new LBAT table in the database for the newly inserted level of the MHT.
2. Insertion of the information regarding the new LBAT table into the table that holds the data about the LBATs.
3. Update of the data in all the LBATs.

5. ANALYSIS & ONGOING WORK

5.1 Analysis
In this section, we analyze the properties of Radix Path Identifiers and identify our next steps based on them. The Merkle Hash Tree based data integrity technique guarantees all three aspects of data integrity, i.e., completeness, correctness and freshness [3]. In doing so, it completely avoids digital signatures, which pose a large computation overhead. Our analysis of the Radix Path Identifier technique is as follows:

• It is assumed that only the DO can change the data.
• All the tests are performed on a traditional DBMS, i.e., SQL Server [10]. NoSQL databases may perform differently.
• The cached technique for updates results in the lowest overhead. Without caching of data at the DO's end, the overhead can go up to 100%.
• Insert at the end of the table gives better results than insert at the beginning of the table; both cases pose significant overhead. No results have been published for delete.
• In order to modify the data, the DO either has to download a whole copy of the table along with the authentication data, or has to keep the whole data cached at the DO's premises.

5.2 Ongoing Work
We are currently starting to work on a data integrity technique that is based on Merkle Hash Trees and Radix Path Identifiers. We want to achieve the following goals in this new technique:

• Multiple users should be able to manipulate the data.
• A user should not be required to keep a copy of the data in order to modify it, because keeping a copy of the data defeats the purpose of data outsourcing.
• The communication overhead should be reduced to a level close to that of performing operations on a normal database. For instance, to insert a row of data, a single insert statement should be sent to the database.
• The technique is being designed keeping NoSQL database concepts in view, too.

6. CONCLUSIONS
Numerous techniques have been proposed in the literature to check the data integrity of data outsourced to untrusted clouds. The focus of our paper was on Merkle Hash Tree based techniques. Despite a lot of research on MHT based techniques, insertion and deletion still pose a lot of communication and computation overhead. We have also discussed a technique named Radix Path Identifiers for storing and retrieving authentication data in the DSP's database. We plan to design a new technique to eliminate the shortcomings of the current data integrity techniques. The main purpose of our technique is to avoid keeping or fetching a copy of the whole data in order to run an insert or update statement. We also aim to simplify the process of inserting and deleting data to match what we are used to in traditional DBMSs.

7. REFERENCES
[1] G. Di Battista and B. Palazzi. Authenticated relational tables and authenticated skip lists. In Proceedings of the 21st Annual IFIP WG 11.3 Working Conference on Data and Applications Security, pages 31-46, Berlin, Heidelberg, 2007. Springer-Verlag.
[2] M. T. Goodrich, R. Tamassia, and N. Triandopoulos. Super-efficient verification of dynamic outsourced databases. In Proceedings of the 2008 Cryptographers' Track at the RSA Conference on Topics in Cryptology, CT-RSA'08, pages 407-424, Berlin, Heidelberg, 2008. Springer-Verlag.
[3] F. Li, M. Hadjieleftheriou, G. Kollios, and L. Reyzin. Dynamic authenticated index structures for outsourced databases. In Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, SIGMOD '06, pages 121-132, New York, NY, USA, 2006. ACM.
[4] R. C. Merkle. A certified digital signature. In Advances in Cryptology - CRYPTO '89, 9th Annual International Cryptology Conference, Santa Barbara, California, USA, August 20-24, 1989, Proceedings, pages 218-238, 1989.
[5] M. Narasimha and G. Tsudik. Authentication of outsourced databases using signature aggregation and chaining. In International Conference on Database Systems for Advanced Applications (DASFAA), pages 420-436, 2006.
[6] H. Pang, A. Jain, K. Ramamritham, and K.-L. Tan. Verifying completeness of relational query results in data publishing. In Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, SIGMOD '05, pages 407-418, New York, NY, USA, 2005. ACM.
[7] R. Sion. Query execution assurance for outsourced databases. In Proceedings of the 31st International Conference on Very Large Data Bases, VLDB '05, pages 601-612. VLDB Endowment, 2005.
[8] A. L. Tharp. File Organization and Processing. John Wiley & Sons, Inc., New York, NY, USA, 1988.
[9] W. Wei and T. Yu. Practical integrity assurance for big data processing deployed over open cloud, 2013.
[10] W. Wei and T. Yu. Integrity assurance for outsourced databases without DBMS modification. In Data and Applications Security and Privacy XXVIII - 28th Annual IFIP WG 11.3 Working Conference, DBSec 2014, Vienna, Austria, July 14-16, 2014, Proceedings, pages 1-16, 2014.
[11] M. Xie, H. Wang, J. Yin, and X. Meng. Integrity auditing of outsourced data. In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB '07, pages 782-793. VLDB Endowment, 2007.
[12] M. Xie, H. Wang, J. Yin, and X. Meng. Providing freshness guarantees for outsourced databases. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology, EDBT '08, pages 323-332, New York, NY, USA, 2008. ACM.