=Paper=
{{Paper
|id=Vol-2929/poster3
|storemode=property
|title=Declarative Querying of Heterogeneous NoSQL Stores
|pdfUrl=https://ceur-ws.org/Vol-2929/poster3.pdf
|volume=Vol-2929
|authors=Nikolaos Koutroumanis,Nikolaos Kousathanas,Christos Doulkeridis,Akrivi Vlachou
|dblpUrl=https://dblp.org/rec/conf/vldb/KoutroumanisKDV21
}}
==Declarative Querying of Heterogeneous NoSQL Stores==
Nikolaos Koutroumanis, Dept. of Digital Systems, University of Piraeus, Piraeus, Greece, koutroumanis@unipi.gr

Nikolaos Kousathanas, Dept. of Digital Systems, University of Piraeus, Piraeus, Greece, nikolaos.kousathanas@gmail.com

Christos Doulkeridis, Dept. of Digital Systems, University of Piraeus, Piraeus, Greece, cdoulk@unipi.gr

Akrivi Vlachou, Dept. of Inf. & Com. Syst. Engineering, University of Aegean, Karlovasi, Greece, avlachou@aegean.gr

ABSTRACT

Nowadays, large quantities of data reside in different and heterogeneous NoSQL stores that accommodate the individual requirements of each application, such as scalability, efficiency and flexibility to schema changes. In contrast to the well-established relational model, NoSQL stores are still non-standardized and use heterogeneous languages and APIs for data access. As a consequence, big data developers and data analysts need to write customized code for data access, exploration and analysis over different NoSQL stores. We present a solution to this problem that allows seamless access to different NoSQL stores using a common programming API. Moreover, we show that this API can be exploited to provide declarative access to NoSQL stores using an SQL-like language.

Reference Format:
Nikolaos Koutroumanis, Nikolaos Kousathanas, Christos Doulkeridis, and Akrivi Vlachou. Declarative Querying of Heterogeneous NoSQL Stores. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

1 MOTIVATION & RESEARCH CHALLENGES

Despite their popularity in the development of scalable, big data applications, NoSQL stores [1] still rely on heterogeneous data models, languages and APIs. Even though this is considered a positive feature for modern, data-intensive applications (as we know nowadays that “one size does not fit all” when it comes to DBMS [6]), it also poses important problems. In particular, developers need to learn different query languages to access different NoSQL stores, a fact that also hinders portability of applications when a different storage system is chosen.

Existing solutions to this problem include polystores [2], database engines that use different systems (including NoSQL) for storage of different data types. However, polystores comprise yet another query engine (with components for query execution, optimization, etc.) that needs to interact with existing storage systems that include their own query engines. Another relevant approach is Facebook’s Presto [5] (recently known as Trino), which is an SQL-compliant query engine that operates on a wide variety of different data sources. Again, the general idea is to place a new query engine on top of existing systems that already provide native support for query processing, in order to unify query processing. Although this is meaningful for certain applications, it is not necessarily appealing for developers who need to use popular NoSQL stores in their big data architectures and query them using the same language.

Instead, we envision a unified approach for declarative querying of heterogeneous NoSQL stores using the same query language. Yet, our objective is to support this without building a new query engine. Our solution to this problem is a lightweight, unified API, called NoDA [3, 4] (https://github.com/the-noda-project), that consists of simple data access operators, such as filter, project, sort, limit and aggregate. Inspired by the ODBC/JDBC paradigm in relational databases, NoDA defines data access operators that are implemented for different NoSQL stores. Using NoDA, developers can express their queries in the same language, but target different NoSQL stores by changing only the connection to the underlying store. Perhaps most importantly, NoDA’s data access operators have enabled the provision of an SQL interface that takes an SQL statement as input and translates it to NoDA data access operators, which can then be directed to any of the supported NoSQL stores.

Currently, we have implemented NoDA [3] for diverse NoSQL stores: MongoDB (document store), HBase (wide-column store), Redis (key-value store) and Neo4J (graph database).

2 FUTURE RESEARCH DIRECTIONS

Several interesting research directions can be followed in the future:

• How our approach can be exploited to fetch data stored across multiple NoSQL stores and retrieve the combined results.

• Handling more complex data types is also challenging; currently, we work on spatio-temporal data, but other complex types are of interest, such as trajectories, graphs and textually annotated spatial data.

• Our approach focuses on analytical queries, so extending it towards supporting updates is also of interest.

• How to efficiently support joins of distributed data collections is another challenging direction, even more so across different NoSQL stores.
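
As a rough illustration of the operator-based design described in Section 1, the following Python sketch shows how a single store-agnostic query description might be rendered for two different stores, with only the selected connection changing. Everything in it is hypothetical: the `Query` class and the `to_mongo`, `to_cypher` and `translate` functions are names invented for this example and are not NoDA's actual API. The sketch covers only filter, project, sort and limit (aggregate is omitted), does not connect to any store, and does not escape values; it only emits a MongoDB-style query document and a Cypher string.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Query:
    """Hypothetical store-agnostic query description (not NoDA's real classes)."""
    collection: str
    predicates: List[Tuple[str, str, Any]] = field(default_factory=list)  # (name, op, value)
    fields: List[str] = field(default_factory=list)
    order: Optional[Tuple[str, bool]] = None  # (name, ascending)
    max_rows: Optional[int] = None

    # Each operator records its arguments and returns self so calls can be chained.
    def filter(self, name: str, op: str, value: Any) -> "Query":
        self.predicates.append((name, op, value))
        return self

    def project(self, *names: str) -> "Query":
        self.fields = list(names)
        return self

    def sort(self, name: str, ascending: bool = True) -> "Query":
        self.order = (name, ascending)
        return self

    def limit(self, n: int) -> "Query":
        self.max_rows = n
        return self


# Comparison operators supported by this sketch, mapped to MongoDB query operators.
MONGO_OPS = {"=": "$eq", ">": "$gt", "<": "$lt"}


def to_mongo(q: Query) -> Dict[str, Any]:
    """Render the query using MongoDB's query-document conventions."""
    flt: Dict[str, Any] = {}
    for name, op, value in q.predicates:
        flt.setdefault(name, {})[MONGO_OPS[op]] = value
    return {
        "collection": q.collection,
        "filter": flt,
        "projection": {name: 1 for name in q.fields},
        "sort": [(q.order[0], 1 if q.order[1] else -1)] if q.order else [],
        "limit": q.max_rows,
    }


def to_cypher(q: Query) -> str:
    """Render the same query as a Cypher string (values are not escaped here)."""
    parts = [f"MATCH (n:{q.collection})"]
    if q.predicates:
        conds = " AND ".join(f"n.{name} {op} {value!r}" for name, op, value in q.predicates)
        parts.append(f"WHERE {conds}")
    returned = ", ".join(f"n.{name}" for name in q.fields) or "n"
    parts.append(f"RETURN {returned}")
    if q.order:
        parts.append(f"ORDER BY n.{q.order[0]}" + ("" if q.order[1] else " DESC"))
    if q.max_rows is not None:
        parts.append(f"LIMIT {q.max_rows}")
    return " ".join(parts)


# Choosing the target store is the only thing that changes between backends.
TRANSLATORS = {"mongodb": to_mongo, "neo4j": to_cypher}


def translate(q: Query, store: str) -> Any:
    return TRANSLATORS[store](q)


# The same operator chain, directed at two different (assumed) stores.
q = Query("vessels").filter("speed", ">", 10).project("id", "speed").sort("speed", ascending=False).limit(5)
mongo = translate(q, "mongodb")
cypher = translate(q, "neo4j")
```

In this sketch the operators only record what was asked for, and each translator hands the work to the store's native query language, which is in the spirit of the stated goal of avoiding a new query engine. A real implementation would additionally need proper value escaping, an aggregate operator, and actual driver calls.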

ACKNOWLEDGMENTS

The research work was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: HFRI-FM17-81).

REFERENCES

[1] Ali Davoudian, Liu Chen, and Mengchi Liu. 2018. A Survey on NoSQL Stores. ACM Comput. Surv. 51, 2 (2018), 40:1–40:43.

[2] Jennie Duggan, Aaron J. Elmore, Michael Stonebraker, Magdalena Balazinska, Bill Howe, Jeremy Kepner, Sam Madden, David Maier, Tim Mattson, and Stanley B. Zdonik. 2015. The BigDAWG Polystore System. SIGMOD Record 44, 2 (2015), 11–16.

[3] Nikolaos Koutroumanis, Nikolaos Kousathanas, Christos Doulkeridis, and Akrivi Vlachou. 2021. A Demonstration of NoDA: Unified Access to NoSQL Stores. In Proceedings of the 47th International Conference on Very Large Data Bases (VLDB’21), Copenhagen, Denmark, August 16-20, 2021.

[4] Nikolaos Koutroumanis, Panagiotis Nikitopoulos, Akrivi Vlachou, and Christos Doulkeridis. 2019. NoDA: Unified NoSQL Data Access Operators for Mobility Data. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019, Vienna, Austria, August 19-21, 2019. 174–177.

[5] Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and Christopher Berner. 2019. Presto: SQL on Everything. In 35th IEEE International Conference on Data Engineering. 1802–1813.

[6] Michael Stonebraker. 2008. Technical perspective - One size fits all: an idea whose time has come and gone. Commun. ACM 51, 12 (2008), 76.