<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Declarative Querying of Heterogeneous NoSQL Stores</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nikolaos Koutroumanis</string-name>
          <email>koutroumanis@unipi.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christos Doulkeridis</string-name>
          <email>cdoulk@unipi.gr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nikolaos Kousathanas</string-name>
          <email>nikolaos.kousathanas@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akrivi Vlachou</string-name>
          <email>avlachou@aegean.gr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Digital Systems, University of Piraeus</institution>
          ,
          <addr-line>Piraeus</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Inf. &amp; Com. Syst. Engineering, University of the Aegean</institution>
          ,
          <addr-line>Karlovasi</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Nowadays, large quantities of data reside in different, heterogeneous NoSQL stores that accommodate the individual requirements of each application, such as scalability, efficiency and flexibility to schema changes. In contrast to the well-established relational model, NoSQL stores are still non-standardized and use heterogeneous languages and APIs for data access. Consequently, big data developers and data analysts need to write customized code for data access, exploration and analysis over different NoSQL stores. We present a solution to this problem that allows seamless access to different NoSQL stores using a common programming API. Moreover, we show that this API can be exploited to provide declarative access to NoSQL stores using an SQL-like language.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>MOTIVATION &amp; RESEARCH CHALLENGES</title>
      <p>
Despite their popularity in the development of scalable, big data
applications, NoSQL stores [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] still rely on heterogeneous data
models, languages and APIs. Even though this is considered a
positive feature for modern, data-intensive applications (since it is
now well understood that “one size does not fit all” when it comes to DBMSs [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]),
it also poses important problems. In particular, developers need to
learn different query languages to access different NoSQL stores, a
fact that also hinders the portability of applications when a different
storage system is chosen.
      </p>
      <p>
        Existing solutions to this problem include polystores [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
database engines that use different systems (including NoSQL) for
storing different data types. However, a polystore constitutes yet
another query engine (with components for query execution,
optimization, etc.) that needs to interact with existing storage systems
that include their own query engines. Another relevant approach is
Facebook’s Presto [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (a fork of which is now known as Trino), which is an
SQL-compliant query engine that operates on a wide variety of different
data sources. Again, the general idea is to place a new query engine
on top of existing systems that already provide native support for
query processing, in order to unify query processing. Although this
is meaningful for certain applications, it is not necessarily
appealing for developers who need to use popular NoSQL stores in their
big data architectures and query them using the same language.
      </p>
      <p>Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021
for the volume as a collection by its editors. This volume and its papers are published
under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Published in the Proceedings of the 2nd Workshop on Search, Exploration, and
Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021,
Copenhagen, Denmark) on CEUR-WS.org.</p>
      <p>
        Instead, we envision a unified approach for declarative querying
of heterogeneous NoSQL stores using the same query language. Yet,
our objective is to support this without building a new query engine.
Our solution to this problem is a lightweight, unified API, called
NoDA [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ] (https://github.com/the-noda-project), which consists
of simple data access operators, such as filter, project, sort,
limit and aggregate. Inspired by the ODBC/JDBC paradigm in
relational databases, NoDA defines data access operators that are
implemented separately for each supported NoSQL store. Using NoDA,
developers express their queries once, in the same language, and target
different NoSQL stores by changing only the connection to the
underlying store. Perhaps most importantly, NoDA’s data access
operators have enabled the provision of an SQL interface, which takes
an SQL statement as input and translates it to NoDA data access
operators that can be directed to any of the supported NoSQL stores.
Currently, we have implemented NoDA [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for diverse NoSQL
stores: MongoDB (document store), HBase (wide-column store),
Redis (key-value store) and Neo4j (graph database).
      </p>
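      <p>The idea of store-independent operators with an SQL front end can be illustrated with a minimal Python sketch. It is not NoDA's actual API: the names Query, ToyDocumentStore, ToyKeyValueStore and sql_to_query are hypothetical, the two stores are in-memory stand-ins, and only equality filters are handled (aggregate is omitted for brevity). We also assume here that operators are evaluated client-side, whereas a real implementation would be expected to translate them into each store's native query language.</p>
      <preformat>
```python
# Hypothetical sketch: all names are illustrative and are not NoDA's API.
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Query:
    """Backend-neutral chain of data access operators."""
    collection: str
    predicates: list = field(default_factory=list)  # (column, value) equality tests
    columns: list = field(default_factory=list)     # empty list means all columns
    order_by: Optional[str] = None
    max_rows: Optional[int] = None

    def filter(self, column, value):
        self.predicates.append((column, value))
        return self

    def project(self, *columns):
        self.columns = list(columns)
        return self

    def sort(self, column):
        self.order_by = column
        return self

    def limit(self, n):
        self.max_rows = n
        return self


def apply_operators(rows, query):
    """Evaluate the operators client-side over rows fetched from a toy store."""
    for column, value in query.predicates:
        rows = [r for r in rows if r.get(column) == value]
    if query.order_by is not None:
        rows = sorted(rows, key=lambda r: r[query.order_by])
    if query.max_rows is not None:
        rows = rows[:query.max_rows]
    if query.columns:
        rows = [{c: r[c] for c in query.columns} for r in rows]
    return rows


class ToyDocumentStore:
    """Stand-in for a document store: one list of dicts per collection."""
    def __init__(self, collections):
        self.collections = collections

    def run(self, query):
        return apply_operators(self.collections[query.collection], query)


class ToyKeyValueStore:
    """Stand-in for a key-value store: keys have the form 'collection:id'."""
    def __init__(self, pairs):
        self.pairs = pairs

    def run(self, query):
        prefix = query.collection + ":"
        rows = [v for k, v in self.pairs.items() if k.startswith(prefix)]
        return apply_operators(rows, query)


# Tiny SQL subset: SELECT cols FROM t [WHERE c = 'v'] [ORDER BY c] [LIMIT n]
SQL = re.compile(
    r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)"
    r"(?:\s+WHERE\s+(\w+)\s*=\s*'([^']*)')?"
    r"(?:\s+ORDER\s+BY\s+(\w+))?"
    r"(?:\s+LIMIT\s+(\d+))?\s*$",
    re.IGNORECASE,
)


def sql_to_query(statement):
    """Translate the SQL subset above into a chain of operators."""
    match = SQL.match(statement)
    if match is None:
        raise ValueError("unsupported SQL statement: " + statement)
    cols, table, col, val, order, limit = match.groups()
    query = Query(table)
    if col is not None:
        query.filter(col, val)
    if cols.strip() != "*":
        query.project(*[c.strip() for c in cols.split(",")])
    if order is not None:
        query.sort(order)
    if limit is not None:
        query.limit(int(limit))
    return query


vessels = [
    {"name": "Thetis", "flag": "GR"},
    {"name": "Argo", "flag": "GR"},
    {"name": "Nordic", "flag": "DK"},
]
stores = [
    ToyDocumentStore({"vessels": vessels}),
    ToyKeyValueStore({"vessels:%d" % i: v for i, v in enumerate(vessels)}),
]
# The same statement runs unchanged against both stores; only the store object differs.
query = sql_to_query("SELECT name FROM vessels WHERE flag = 'GR' ORDER BY name")
results = [store.run(query) for store in stores]
```
      </preformat>
      <p>In this sketch, switching the target store amounts to choosing a different store object, which mirrors the idea of changing only the connection while the query itself stays the same.</p>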
    </sec>
    <sec id="sec-2">
      <title>FUTURE RESEARCH DIRECTIONS</title>
      <p>Several interesting research directions can be followed in the future:</p>
      <list list-type="bullet">
        <list-item><p>Exploiting our approach to fetch data stored across multiple NoSQL stores and return the combined results.</p></list-item>
        <list-item><p>Handling more complex data types is also challenging; currently, we work on spatio-temporal data, but other complex types are of interest, such as trajectories, graphs and textually annotated spatial data.</p></list-item>
        <list-item><p>Our approach focuses on analytical queries, so extending it towards supporting updates is also of interest.</p></list-item>
        <list-item><p>Efficiently supporting joins of distributed data collections is another challenging direction, even more so across different NoSQL stores.</p></list-item>
      </list>
    </sec>
    <sec id="sec-3">
      <title>ACKNOWLEDGMENTS</title>
      <p>This research work was supported by the Hellenic Foundation for
Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I.
Research Projects to support Faculty members and Researchers and
the procurement of high-cost research equipment grant” (Project
Number: HFRI-FM17-81).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Ali</given-names>
            <surname>Davoudian</surname>
          </string-name>
          , Liu Chen, and Mengchi Liu.
          <year>2018</year>
          .
          <article-title>A Survey on NoSQL Stores</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>51</volume>
          ,
          <issue>2</issue>
          (
          <year>2018</year>
          ),
          <fpage>40:1</fpage>
          -
          <lpage>40:43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jennie</given-names>
            <surname>Duggan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Aaron J.</given-names>
            <surname>Elmore</surname>
          </string-name>
          , Michael Stonebraker, Magdalena Balazinska, Bill Howe, Jeremy Kepner, Sam Madden, David Maier,
          <string-name>
            <given-names>Tim</given-names>
            <surname>Mattson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Stanley B.</given-names>
            <surname>Zdonik</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>The BigDAWG Polystore System</article-title>
          .
          <source>SIGMOD Record</source>
          <volume>44</volume>
          ,
          <issue>2</issue>
          (
          <year>2015</year>
          ),
          <fpage>11</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Nikolaos</given-names>
            <surname>Koutroumanis</surname>
          </string-name>
          , Nikolaos Kousathanas, Christos Doulkeridis, and
          <string-name>
            <given-names>Akrivi</given-names>
            <surname>Vlachou</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>A Demonstration of NoDA: Unified Access to NoSQL Stores</article-title>
          .
          <source>In Proceedings of the 47th International Conference on Very Large Data Bases (VLDB'21)</source>
          , Copenhagen, Denmark,
          August 16-20
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Nikolaos</given-names>
            <surname>Koutroumanis</surname>
          </string-name>
          , Panagiotis Nikitopoulos, Akrivi Vlachou, and
          <string-name>
            <given-names>Christos</given-names>
            <surname>Doulkeridis</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>NoDA: Unified NoSQL Data Access Operators for Mobility Data</article-title>
          .
          <source>In Proceedings of the 16th International Symposium on Spatial and Temporal Databases</source>
          ,
          SSTD
          <year>2019</year>
          , Vienna, Austria,
          August 19-21
          ,
          <year>2019</year>
          .
          <fpage>174</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Raghav</given-names>
            <surname>Sethi</surname>
          </string-name>
          , Martin Traverso, Dain Sundstrom, David Phillips,
          <string-name>
            <given-names>Wenlei</given-names>
            <surname>Xie</surname>
          </string-name>
          , Yutian Sun, Nezih Yegitbasi, Haozhun Jin, Eric Hwang, Nileema Shingte, and
          <string-name>
            <given-names>Christopher</given-names>
            <surname>Berner</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Presto: SQL on Everything</article-title>
          .
          <source>In 35th IEEE International Conference on Data Engineering</source>
          .
          <fpage>1802</fpage>
          -
          <lpage>1813</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Michael</given-names>
            <surname>Stonebraker</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Technical perspective - One size fits all: an idea whose time has come and gone</article-title>
          .
          <source>Commun. ACM</source>
          <volume>51</volume>
          ,
          <issue>12</issue>
          (
          <year>2008</year>
          ),
          <fpage>76</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>