=Paper= {{Paper |id=Vol-2929/poster3 |storemode=property |title=Declarative Querying of Heterogeneous NoSQL Stores |pdfUrl=https://ceur-ws.org/Vol-2929/poster3.pdf |volume=Vol-2929 |authors=Nikolaos Koutroumanis,Nikolaos Kousathanas,Christos Doulkeridis,Akrivi Vlachou |dblpUrl=https://dblp.org/rec/conf/vldb/KoutroumanisKDV21 }} ==Declarative Querying of Heterogeneous NoSQL Stores== https://ceur-ws.org/Vol-2929/poster3.pdf
              Declarative Querying of Heterogeneous NoSQL Stores
Nikolaos Koutroumanis, Dept. of Digital Systems, University of Piraeus, Piraeus, Greece, koutroumanis@unipi.gr
Nikolaos Kousathanas, Dept. of Digital Systems, University of Piraeus, Piraeus, Greece, nikolaos.kousathanas@gmail.com
Christos Doulkeridis, Dept. of Digital Systems, University of Piraeus, Piraeus, Greece, cdoulk@unipi.gr
Akrivi Vlachou, Dept. of Inf. & Com. Syst. Engineering, University of Aegean, Karlovasi, Greece, avlachou@aegean.gr

ABSTRACT
Nowadays, large quantities of data reside in different and heterogeneous NoSQL stores that accommodate the individual requirements of each application, such as scalability, efficiency and flexibility to schema changes. In contrast to the well-established relational model, NoSQL stores are still non-standardized and use heterogeneous languages and APIs for data access. As a consequence, big data developers and data analysts need to write customized code for data access, exploration and analysis over different NoSQL stores. We present a solution to this problem that allows seamless access to different NoSQL stores using a common programming API. Moreover, we show that this API can be exploited to provide declarative access to NoSQL stores using an SQL-like language.

Reference Format:
Nikolaos Koutroumanis, Nikolaos Kousathanas, Christos Doulkeridis, and Akrivi Vlachou. Declarative Querying of Heterogeneous NoSQL Stores. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

1    MOTIVATION & RESEARCH CHALLENGES
Despite their popularity in the development of scalable, big data applications, NoSQL stores [1] still rely on heterogeneous data models, languages and APIs. Even though this is considered a positive feature for modern, data-intensive applications (as we know nowadays that “one size does not fit all” when it comes to DBMS [6]), it also poses important problems. In particular, developers need to learn different query languages to access different NoSQL stores, a fact that also hinders portability of applications when a different storage system is chosen.

Existing solutions to this problem include polystores [2], database engines that use different systems (including NoSQL) for storage of different data types. However, polystores comprise yet another query engine (with components for query execution, optimization, etc.) that needs to interact with existing storage systems that include their own query engines. Another relevant approach is Facebook’s Presto [5] (a fork of which was recently renamed Trino), which is an SQL-compliant query engine that operates on a wide variety of different data sources. Again, the general idea is to introduce a new query engine in order to unify query processing on top of existing systems that already provide native support for query processing. Although this is meaningful for certain applications, it is not necessarily appealing for developers who need to use popular NoSQL stores in their big data architectures and query them using the same language.

Instead, we envision a unified approach for declarative querying of heterogeneous NoSQL stores using the same query language. Yet, our objective is to support this without building a new query engine. Our solution to this problem is a lightweight, unified API, called NoDA [3, 4] (https://github.com/the-noda-project), that consists of simple data access operators, such as filter, project, sort, limit and aggregate. Inspired by the ODBC/JDBC paradigm in relational databases, NoDA defines data access operators that are implemented for different NoSQL stores. Using NoDA, developers can express their queries in the same language, but target different NoSQL stores by changing only the connection to the underlying store. Perhaps most importantly, NoDA’s data access operators have enabled the provision of an SQL interface, which takes an SQL statement as input and translates it to NoDA data access operators that can be directed to any of the supported NoSQL stores. Currently, we have implemented NoDA [3] for diverse NoSQL stores: MongoDB (document store), HBase (wide-column store), Redis (key-value store) and Neo4j (graph database).

2   FUTURE RESEARCH DIRECTIONS
Several interesting research directions can be followed in the future:
    • How our approach can be exploited to fetch data stored across multiple NoSQL stores and retrieve the combined results.
    • Handling more complex data types is also challenging; currently, we work on spatio-temporal data, but other complex types are of interest, such as trajectories, graphs and textually annotated spatial data.
    • Our approach focuses on analytical queries, so extending it towards supporting updates is also of interest.
    • How to efficiently support joins of distributed data collections is another challenging direction, even more so across different NoSQL stores.
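The operator-based design described in Section 1 can be illustrated with a minimal sketch. It is not NoDA's actual API: the names Query, DocumentStore, KeyValueStore, execute and sql_to_query are hypothetical, the two stores are in-memory stand-ins, and the operators are evaluated in Python rather than being translated to each store's native query language as a real implementation presumably would. The sketch covers filter, project, sort and limit only, and the SQL translator accepts just a single numeric WHERE condition.

```python
import operator
import re
from dataclasses import dataclass, field
from typing import Optional

# Comparison operators supported by this sketch.
_OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


@dataclass
class Query:
    """Store-agnostic chain of data access operators (hypothetical, not NoDA's API)."""
    predicates: list = field(default_factory=list)
    columns: list = field(default_factory=list)
    sort_key: Optional[str] = None
    max_rows: Optional[int] = None

    def filter(self, column, op, value):
        self.predicates.append((column, op, value))
        return self

    def project(self, *columns):
        self.columns = list(columns)
        return self

    def sort(self, column):
        self.sort_key = column
        return self

    def limit(self, n):
        self.max_rows = n
        return self


class DocumentStore:
    """Toy stand-in for a document store: a list of documents."""
    def __init__(self, docs):
        self.docs = docs

    def scan(self):
        return iter(self.docs)


class KeyValueStore:
    """Toy stand-in for a key-value store: key -> record."""
    def __init__(self, pairs):
        self.pairs = pairs

    def scan(self):
        return iter(self.pairs.values())


def execute(query, connection):
    """Evaluate the operator chain against whichever store is connected."""
    rows = [r for r in connection.scan()
            if all(_OPS[op](r[c], v) for c, op, v in query.predicates)]
    if query.sort_key is not None:
        rows.sort(key=lambda r: r[query.sort_key])
    if query.max_rows is not None:
        rows = rows[:query.max_rows]
    if query.columns:
        rows = [{c: r[c] for c in query.columns} for r in rows]
    return rows


# Very restricted SELECT grammar: one numeric WHERE condition at most.
_SQL = re.compile(
    r"SELECT\s+(?P<cols>.+?)\s+FROM\s+\w+"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*(?P<op>[=<>])\s*(?P<val>\d+))?"
    r"(?:\s+ORDER\s+BY\s+(?P<sort>\w+))?"
    r"(?:\s+LIMIT\s+(?P<limit>\d+))?\s*$",
    re.IGNORECASE,
)


def sql_to_query(sql):
    """Translate a restricted SELECT statement into a chain of operators."""
    m = _SQL.match(sql.strip())
    if m is None:
        raise ValueError("unsupported statement: " + sql)
    q = Query()
    if m["col"]:
        q.filter(m["col"], m["op"], int(m["val"]))
    if m["cols"].strip() != "*":
        q.project(*[c.strip() for c in m["cols"].split(",")])
    if m["sort"]:
        q.sort(m["sort"])
    if m["limit"]:
        q.limit(int(m["limit"]))
    return q


# Usage: the same statement runs on both stores; only the connection differs.
trips = [{"id": 1, "speed": 5}, {"id": 2, "speed": 30},
         {"id": 3, "speed": 12}, {"id": 4, "speed": 20}]
q = sql_to_query("SELECT id, speed FROM trips WHERE speed > 10 ORDER BY speed LIMIT 2")
from_documents = execute(q, DocumentStore(trips))
from_key_values = execute(q, KeyValueStore({t["id"]: t for t in trips}))
# Both results equal [{"id": 3, "speed": 12}, {"id": 4, "speed": 20}].
```

Keeping the query as plain data, separate from the connection, is what lets the same operator chain (or the same SQL statement) be redirected to another store without changes to the query itself.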
ACKNOWLEDGMENTS
The research work was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: HFRI-FM17-81).

REFERENCES
[1] Ali Davoudian, Liu Chen, and Mengchi Liu. 2018. A Survey on NoSQL Stores. ACM Comput. Surv. 51, 2 (2018), 40:1–40:43.
[2] Jennie Duggan, Aaron J. Elmore, Michael Stonebraker, Magdalena Balazinska, Bill Howe, Jeremy Kepner, Sam Madden, David Maier, Tim Mattson, and Stanley B. Zdonik. 2015. The BigDAWG Polystore System. SIGMOD Record 44, 2 (2015), 11–16.
[3] Nikolaos Koutroumanis, Nikolaos Kousathanas, Christos Doulkeridis, and Akrivi Vlachou. 2021. A Demonstration of NoDA: Unified Access to NoSQL Stores. In Proceedings of the 47th International Conference on Very Large Data Bases (VLDB’21), Copenhagen, Denmark, August 16-20, 2021.
[4] Nikolaos Koutroumanis, Panagiotis Nikitopoulos, Akrivi Vlachou, and Christos Doulkeridis. 2019. NoDA: Unified NoSQL Data Access Operators for Mobility Data. In Proceedings of the 16th International Symposium on Spatial and Temporal Databases, SSTD 2019, Vienna, Austria, August 19-21, 2019. 174–177.
[5] Raghav Sethi, Martin Traverso, Dain Sundstrom, David Phillips, Wenlei Xie, Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and Christopher Berner. 2019. Presto: SQL on Everything. In 35th IEEE International Conference on Data Engineering. 1802–1813.
[6] Michael Stonebraker. 2008. Technical perspective - One size fits all: an idea whose time has come and gone. Commun. ACM 51, 12 (2008), 76.



