=Paper=
{{Paper
|id=Vol-2343/paper1
|storemode=property
|title=Efficient, Consistent and Secure Global-Scale Data Management
|pdfUrl=https://ceur-ws.org/Vol-2343/paper1.pdf
|volume=Vol-2343
|authors=Amr El Abbadi
|dblpUrl=https://dblp.org/rec/conf/bdcsintell/Abbadi18
}}
==Efficient, Consistent and Secure Global-Scale Data Management==
Sujaya Maiyya* Faisal Nawab† Cetin Sahin** Victor Zakhary*
Divyakant Agrawal* Amr El Abbadi*
*UC Santa Barbara, **SAP, † UC Santa Cruz
*{sujaya maiyya, victorzakhary, agrawal, elabbadi}@ucsb.edu, **{cetin08@gmail.com} † {fnawab@ucsc.edu}
Abstract—Processing and analyzing data is becoming increasingly ubiquitous and is the driving force behind the sustained growth of Internet applications and the emergence of Big Data Analytics. These applications typically adopt the cloud model, where they are hosted in a single datacenter. This introduces a fundamental limitation: communication with a centralized datacenter incurs significant latency. The utilization of edge nodes is inevitable for the future success and growth of many emerging low-latency and mobile applications. In this talk, we will explore various technologies that aim to facilitate building global-scale and edge-aware data management systems. These approaches are based on Geo-replication, where data is replicated across geographic locations to be closer to users, and Edge-awareness, where applications are deployed on edge locations to bypass the last-mile infrastructure. We propose novel consensus approaches that manage access to partitioned data across globally-distributed datacenters and edge nodes. The main objective is to reduce the latency of serving user requests while ensuring fault-tolerance and adapting gracefully to mobility. In addition to failures, datacenters are constantly exposed to an increasing number of non-trivial adversarial threats. Traditional cryptographic methods either limit the functionality of the data or significantly increase retrieval costs. We will highlight some novel approaches that ensure efficient privacy-preserving access to data in the Cloud.

Index Terms—Data Management, Cloud, Edge, Privacy.

The utilization of edge nodes is inevitable for the success and growth of emerging low-latency applications, such as Augmented and Virtual Reality (AR/VR) and vehicular networks. Such applications have stringent latency requirements that the current cloud model cannot satisfy. This is due to the large communication latency between users and their closest datacenter (up to 100 ms). This latency problem is exacerbated for applications that serve users across large geographical areas. In such cases, users incur wide-area latencies as large as hundreds of milliseconds to seconds. Placing data closer to users at edge nodes overcomes this fundamental communication latency limit. We envision, as others have, that the cloud model will extend to edge locations, similar to how content delivery networks utilize edge locations. However, rather than being used only for data caching, edge locations will also host data management components that allow manipulation and querying of local edge partitions. In this presentation, we focus on transaction processing as the data management task to be supported by the edge data management components. The model and focus of this work aim to serve web and cloud applications, such as online shops, social networks, and collaborative applications.

We present Dynamic Paxos (DPaxos), a Paxos-based consensus protocol [1] to manage access to partitioned data across globally-distributed datacenters and edge nodes. DPaxos is intended to implement a State Machine Replication component in data management systems for the edge. DPaxos targets the unique opportunities of utilizing edge computing resources to support emerging applications with stringent mobility and real-time requirements, such as Augmented and Virtual Reality and vehicular applications. The main objective of DPaxos is to reduce the latency of serving user requests, recovering from failures, and reacting to mobility. DPaxos achieves these objectives through a few changes to the traditional Paxos protocol. Most notably, DPaxos proposes a dynamic allocation of the quorums (i.e., groups of nodes) that are needed for Paxos Leader Election. Leader Election quorums in DPaxos are smaller than in traditional Paxos and expand only in the presence of conflicts.

DPaxos proposes Zone-centric Quorums as an alternative to majority-based techniques to avoid unnecessary wide-area communication. A zone denotes a collection of neighboring edge nodes. DPaxos restricts the communication corresponding to a data partition to the zones where its users are located. To do this, DPaxos distinguishes between the quorums needed for the two main tasks in Paxos: Leader Election (coordinating with other nodes to select a leader for a partition, typically invoked in reaction to failures or mobility) and Replication (committing data from a leader to secondary nodes, typically invoked for every transaction or request). In typical workloads, Replication is more frequent than Leader Election, and thus data management systems should prioritize optimizing its performance. Ideally, for performance, Replication would be performed within a zone rather than across a majority of nodes. Flexible Paxos, proposed by Howard et al. [2], shows that it is possible to assign arbitrarily small Replication quorums as long as they satisfy the condition: a Leader Election quorum must intersect all Replication quorums. This means that in Flexible Paxos-based approaches, the price of small Replication quorums within zones is an expensive Leader Election quorum that must span all zones.

We base DPaxos Zone-Centric Quorums on the theoretical foundation laid by Flexible Paxos and adapt its quorum allocation techniques to the practical application of data management on globally-distributed edge nodes. We then propose two approaches to overcome the significant Leader Election penalty of Flexible Paxos:
1) Expanding Quorums: this approach relaxes the Flexible Paxos intersection condition and allows both Leader Election and Replication quorums to be small. DPaxos is the first Paxos protocol that allows a Leader Election quorum not to intersect all Replication quorums. Rather, the Leader Election quorum starts small and then grows to intersect only the Replication quorums that are being used by other leaders.
2) Leader Handoff: this approach supports fast leader mobility. Mobility, unlike failures, is triggered by known user actions and hence can be exploited to optimize Leader Election. DPaxos exploits this to enable Leader Election via a lightweight, single round of messaging between the old and the new leaders.

In addition to failures, datacenters are constantly exposed to an increasing number of non-trivial adversarial threats. Traditional cryptographic methods either limit the functionality of the data or significantly increase retrieval costs. During the last decade, a large body of academic work has tackled the problem of outsourcing databases to an untrusted cloud while maintaining both confidentiality and SQL-like querying functionality (at least partially). We will highlight some novel approaches that ensure efficient privacy-preserving access to data in the Cloud. In particular, we briefly discuss TaoStore [3] and PINED-RQ [4].

TaoStore is an oblivious storage system that hides both the contents of the data and the access patterns from an untrusted cloud provider. The target scenario is one where multiple users from a trusted group (e.g., corporate employees) asynchronously access and edit potentially overlapping data sets through a trusted proxy mediating client-cloud communication. TaoStore is built on top of a new tree-based ORAM (Oblivious RAM) scheme that processes client requests concurrently and asynchronously in a non-blocking fashion. This results in a substantial gain in throughput, simplicity, and flexibility over previous systems.

PINED-RQ is a differentially private index for outsourced encrypted datasets that supports non-aggregate range queries on cloud-stored data while achieving both privacy and efficiency. Performing range queries efficiently in an untrusted cloud setting has not been addressed in a satisfactory manner. Range queries express a bounded restriction over the retrieved records, and they are fundamental database operations. PINED-RQ sends two complementary data structures to the cloud: an encrypted version of the database (e.g., under the AES encryption scheme), and an index built as a hierarchy of histograms; both structures are perturbed to satisfy differential privacy. Efficiency comes from disclosing the index in the clear to the cloud, where it guides the query execution strategy. No computation is ever performed on encrypted data. Privacy comes from the differential privacy guarantees of the function that computes the encrypted database and the index. Indeed, the differential privacy model is today's de facto standard for protecting personal information that needs to be partially disclosed. PINED-RQ executes queries at least an order of magnitude faster.

I. ACKNOWLEDGMENT

This work was partially funded by the NSF grants CNS-1528178, CNS-1703560 and CNS-1815733.

REFERENCES

[1] Faisal Nawab, Divyakant Agrawal, and Amr El Abbadi. DPaxos: Managing data closer to users for low-latency and mobile applications. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, pages 1221–1236, 2018.
[2] Heidi Howard, Dahlia Malkhi, and Alexander Spiegelman. Flexible Paxos: Quorum intersection revisited. In 20th International Conference on Principles of Distributed Systems, OPODIS 2016, Madrid, Spain, December 13-16, 2016, pages 25:1–25:14, 2016.
[3] Cetin Sahin, Victor Zakhary, Amr El Abbadi, Huijia Lin, and Stefano Tessaro. TaoStore: Overcoming asynchronicity in oblivious data storage. In IEEE Symposium on Security and Privacy, SP 2016, San Jose, CA, USA, May 22-26, 2016, pages 198–217, 2016.
[4] Cetin Sahin, Tristan Allard, Reza Akbarinia, Amr El Abbadi, and Esther Pacitti. A differentially private index for range query processing in clouds. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, pages 857–868, 2018.
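The Flexible Paxos condition behind DPaxos's Zone-centric Quorums (a Leader Election quorum must intersect all Replication quorums) can be checked mechanically. The following is a minimal sketch under stated assumptions: the three-zone deployment, the node names, and the helper functions are hypothetical and are not part of DPaxos. It enumerates zone-local Replication quorums of size k, derives the smallest Leader Election quorums that still intersect all of them, and shows why those election quorums must span every zone.

```python
from itertools import combinations, product


def all_quorums(nodes, size):
    """Every subset of `nodes` with exactly `size` members."""
    return [frozenset(c) for c in combinations(sorted(nodes), size)]


def satisfies_flexible_paxos(election_quorums, replication_quorums):
    """Flexible Paxos condition: every Leader Election quorum must
    intersect every Replication quorum."""
    return all(e & r for e in election_quorums for r in replication_quorums)


def zone_replication_quorums(zones, k):
    """Zone-centric Replication quorums: any k nodes inside a single zone."""
    return [q for members in zones.values() for q in all_quorums(members, k)]


def zone_election_quorums(zones, k):
    """Smallest Leader Election quorums that intersect every zone-local
    Replication quorum of size k: n - k + 1 nodes from EVERY zone of n nodes."""
    per_zone = [all_quorums(m, len(m) - k + 1) for m in zones.values()]
    return [frozenset().union(*choice) for choice in product(*per_zone)]


# Hypothetical deployment: three zones of three edge nodes each.
ZONES = {
    "A": {"a1", "a2", "a3"},
    "B": {"b1", "b2", "b3"},
    "C": {"c1", "c2", "c3"},
}

replication = zone_replication_quorums(ZONES, k=2)  # 2 nodes inside one zone
election = zone_election_quorums(ZONES, k=2)        # 6 nodes across all zones
print(satisfies_flexible_paxos(election, replication))  # True

# A plain majority (5 of 9 nodes) no longer suffices once Replication
# quorums shrink to two nodes in one zone: {b2, b3} is never intersected.
majority = frozenset({"a1", "a2", "a3", "b1", "c1"})
print(satisfies_flexible_paxos([majority], replication))  # False
```

The check makes the trade-off explicit: shrinking Replication to two nodes inside one zone forces every Leader Election quorum to include two nodes from each of the three zones.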
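The Expanding Quorums idea (a Leader Election quorum that starts small and grows only to intersect the Replication quorums used by other leaders) can be illustrated with a toy model. This is a hypothetical sketch, not DPaxos's protocol: it assumes the candidate already knows which Replication quorums are in use, whereas DPaxos must discover this during Leader Election, and the function name, seed quorum, and node names are all illustrative.

```python
def expand_election_quorum(seed_quorum, in_use_replication_quorums):
    """Toy model of an expanding Leader Election quorum.

    Starts from a small `seed_quorum` and adds nodes only until it
    intersects every Replication quorum currently used by another leader.
    Assumption: the candidate already knows which Replication quorums are
    in use; how DPaxos discovers them is not modeled here.
    """
    quorum = set(seed_quorum)
    for replication_quorum in in_use_replication_quorums:
        if not quorum & replication_quorum:
            # Any single member of the quorum is enough to intersect it.
            quorum.add(min(replication_quorum))
    return frozenset(quorum)


# Hypothetical scenario: a candidate seeded in zone B, while other leaders
# replicate on {a1, a2} in zone A and on {b2, b3} in zone B.
seed = {"b1", "b2"}
in_use = [frozenset({"a1", "a2"}), frozenset({"b2", "b3"})]
print(sorted(expand_election_quorum(seed, in_use)))  # ['a1', 'b1', 'b2']
```

Only the Replication quorum in zone A forces growth, so the resulting quorum has three nodes instead of the six that a Flexible Paxos election quorum spanning all zones would need in the three-zone example.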
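TaoStore builds on a tree-based ORAM. To show what "tree-based" means, the sketch below implements the classic sequential Path ORAM of Stefanov et al., not TaoStore's own concurrent, asynchronous scheme. It is minimal and assumes a client-side position map, an unbounded stash, and no encryption; a real deployment would re-encrypt every block it writes back. The class and parameter names are illustrative. Every access reads and rewrites one whole root-to-leaf path and remaps the block to a fresh random leaf, so the server sees only random paths.

```python
import random


class PathORAM:
    """Minimal non-recursive Path ORAM. The client keeps the position map
    and stash; the server holds a binary tree of buckets (heap-indexed,
    root = 1). Blocks are stored in plaintext here for readability."""

    def __init__(self, num_blocks, bucket_size=4, seed=None):
        self.rng = random.Random(seed)
        self.Z = bucket_size
        self.L = max(1, (num_blocks - 1).bit_length())  # tree height
        self.num_leaves = 1 << self.L
        self.tree = [[] for _ in range(2 * self.num_leaves)]  # (id, data) pairs
        self.pos = {b: self.rng.randrange(self.num_leaves) for b in range(num_blocks)}
        self.stash = {}
        self.path_reads = 0  # whole-path reads issued to the server

    def _node(self, leaf, level):
        """Heap index of the bucket at `level` (root = 0) on the path to `leaf`."""
        return (self.num_leaves + leaf) >> (self.L - level)

    def access(self, block_id, new_data=None):
        """Read block_id (and overwrite it if new_data is given); returns the
        value held before this access, or None if never written."""
        leaf = self.pos[block_id]
        self.pos[block_id] = self.rng.randrange(self.num_leaves)  # remap
        # 1. Read the entire root-to-leaf path into the stash.
        for level in range(self.L + 1):
            node = self._node(leaf, level)
            for b, d in self.tree[node]:
                self.stash[b] = d
            self.tree[node] = []
        self.path_reads += 1
        # 2. Serve the request from the stash.
        result = self.stash.get(block_id)
        if new_data is not None:
            self.stash[block_id] = new_data
        # 3. Write the same path back, deepest bucket first, placing each
        #    block as deep as its (possibly new) leaf allows.
        for level in range(self.L, -1, -1):
            node = self._node(leaf, level)
            fits = [b for b in self.stash if self._node(self.pos[b], level) == node]
            for b in fits[: self.Z]:
                self.tree[node].append((b, self.stash.pop(b)))
        return result


oram = PathORAM(num_blocks=16, bucket_size=4, seed=7)
oram.access(3, "hello")
print(oram.access(3))  # hello
```

The invariant that makes reads correct is that every block is either in the stash or in some bucket on the path to its current leaf. TaoStore's contribution is to let many such path operations proceed concurrently behind a trusted proxy, which this sequential sketch does not attempt.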
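PINED-RQ indexes the encrypted database with a hierarchy of histograms perturbed to satisfy differential privacy. The sketch below shows the generic building block: a binary hierarchical histogram with Laplace noise (the standard Laplace mechanism) and a range-count estimate from the noisy bins. It is not PINED-RQ's exact construction, which also perturbs the encrypted records so that they match the noisy index. The function names, the power-of-two domain, and the example keys are assumptions made for illustration.

```python
import random


def laplace(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def build_noisy_hierarchy(values, domain_size, epsilon, seed=None):
    """Hierarchy of histograms over integer keys in [0, domain_size).

    Level 0 is one bin covering the whole domain; each level halves the bin
    width, down to unit-width leaves. Every record falls into exactly one bin
    per level, so the L1 sensitivity equals the number of levels, and Laplace
    noise of scale levels / epsilon makes the released counts epsilon-DP.
    """
    if domain_size < 1 or domain_size & (domain_size - 1):
        raise ValueError("domain_size must be a power of two")
    rng = random.Random(seed)
    h = domain_size.bit_length() - 1  # depth of the leaf level
    counts = [[0] * (1 << level) for level in range(h + 1)]
    for v in values:
        for level in range(h + 1):
            counts[level][v >> (h - level)] += 1
    scale = (h + 1) / epsilon
    return [[c + laplace(scale, rng) for c in row] for row in counts]


def range_count(hierarchy, lo, hi):
    """Estimate how many keys fall in [lo, hi] by summing the noisy counts
    of the maximal bins that lie entirely inside the range."""
    h = len(hierarchy) - 1

    def visit(level, index):
        width = 1 << (h - level)
        start, end = index * width, (index + 1) * width - 1
        if end < lo or start > hi:
            return 0.0
        if lo <= start and end <= hi:
            return hierarchy[level][index]
        return visit(level + 1, 2 * index) + visit(level + 1, 2 * index + 1)

    return visit(0, 0)


ages = [0, 1, 1, 5, 6, 7, 7, 7]  # hypothetical keys in [0, 8)
index = build_noisy_hierarchy(ages, domain_size=8, epsilon=1.0, seed=42)
print(range_count(index, 1, 6))  # noisy estimate; the exact count is 4
```

Using the hierarchy means a range query touches only a logarithmic number of bins, which keeps the accumulated noise small; in PINED-RQ the cloud reads such a disclosed index in the clear to decide which encrypted records to return.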