<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Multi-model Query Processing Meets Category Theory and Functional Programming</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Valter Uotila</string-name>
          <email>first.last@helsinki.fi</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dieter Gawlick</string-name>
          <email>first.last@oracle.com</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Reference Format:</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gregory Pogossiants</string-name>
          <email>gregp_21@yahoo.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Jiaheng Lu, University of Helsinki</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SATS Technologies</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. Multi-model Query Processing Meets Category Theory and Functional Programming. In the 2nd Workshop on Search</institution>
          ,
          <addr-line>Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Zhen Hua Liu, Souripriya Das, Oracle Corporation</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Current multi-model database management systems (MMDBS) are becoming more complex. We propose category theory as a foundation for new query language design, query processing, and transformation frameworks for MMDBS. We describe recent challenges of MMDBS and present possible solutions to them. Finally, we propose a category theory-inspired prototype system.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The multi-model database management systems (MMDBS) [
        <xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>
        ]
are gradually becoming more complex, which creates an urgent
need for a better theory to formalize these systems. We observe that
the end-user’s experience is often poorly addressed in the design
and implementation of these systems. For example, NoSQL systems are mainly
targeted at developers. Technology is supposed to evolve according
to business and end-user needs. A higher level of abstraction can
simplify the systems and enable a better user experience.
      </p>
      <p>
        The theory should be a standard across different domains, and it
should be powerful enough to express a wide variety of concepts
at a suitable level of abstraction. We believe that category theory
is a candidate for such a theory. Liu et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed this role
for category theory: reasoning about declarative constructions and
transformations between various data models. The standard
introduction to category theory is Mac Lane [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; other good introductions are
[
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ].
      </p>
      <p>
        David Spivak [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] has applied category theory to model
relational databases in order to migrate relational data by
category-theoretic means. A commercial application of this category theory-based
relational database framework is implemented by Conexus [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
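      <p>As a minimal sketch of this idea, a schema can be viewed as a small category whose objects are tables and whose arrows are foreign keys, and an instance as an assignment of a set of rows to each object and a function to each arrow. The sketch below is written in Python for brevity and is not taken from Spivak's paper or from the Conexus tooling. All names in it (Instance, pull_back, and the Employee, Department, Person, and Group tables) are hypothetical, and it assumes that a schema mapping sends each arrow to a single arrow rather than to a path. The function pull_back illustrates the simplest kind of migration: precomposing an instance with a schema mapping.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Assigns a set of row ids to each object (table) and a function
    between those sets (here a dict) to each arrow (foreign key)."""
    rows: dict    # object name -> set of row ids
    arrows: dict  # arrow name  -> dict from source row id to target row id


def pull_back(obj_map, arrow_map, target):
    """Migrate an instance on the target schema to the source schema by
    precomposition: each source object or arrow reads the data of its image."""
    return Instance(
        rows={o: target.rows[obj_map[o]] for o in obj_map},
        arrows={a: target.arrows[arrow_map[a]] for a in arrow_map},
    )


# Hypothetical target schema: Employee --works_in--> Department
target = Instance(
    rows={"Employee": {1, 2, 3}, "Department": {10, 20}},
    arrows={"works_in": {1: 10, 2: 10, 3: 20}},
)

# Hypothetical source schema: Person --member_of--> Group (a plain renaming)
obj_map = {"Person": "Employee", "Group": "Department"}
arrow_map = {"member_of": "works_in"}

migrated = pull_back(obj_map, arrow_map, target)
```
]]></preformat>
      <p>Here the schema mapping is only a renaming, so the migrated instance holds the same rows under the new names. A mapping that identifies or omits tables would be handled by the same precomposition.</p>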
    </sec>
    <sec id="sec-2">
      <title>CHALLENGES IN MMDBS</title>
      <p>
        An MMDBS is characterized by the capability to handle multiple data
models against a single, unified backend. The models can include
relational, graph, hierarchical, text, images, audio, video, spatial,
expressions, and other complex data structures. An MMDBS is required
to implement a single declarative query language that
enables users to execute cross-model queries. Another desired feature
is a unified indexing mechanism that can index multiple data
instances across different models. An MMDBS should also be capable
of performing extensive data transformations that automatically
create views and materialize data between different models. Oracle
converged database [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is an example of a commercial MMDBS.
      </p>
      <p>Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021
for the volume as a collection by its editors. This volume and its papers are published
under the Creative Commons License Attribution 4.0 International (CC BY 4.0).
Published in the Proceedings of the 2nd Workshop on Search, Exploration, and
Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021,
Copenhagen, Denmark) on CEUR-WS.org.</p>
    </sec>
    <sec id="sec-3">
      <title>TOWARDS MODERN MMDBS</title>
      <p>Historically, we had hierarchical and network data models, and then
the relational data model. Now, in addition to the relational model,
we have re-invented the hierarchical models as JSON/XML, and
the network models as RDF and property graphs. NoSQL systems
complicate the matter by forcing users to access data without
a declarative language in a very loose transactional system. All of
these efforts have regressed the usability of DBMS.</p>
      <p>A guiding principle of DBMS design is that no single data model
is the best or the worst. Therefore, it is time to introduce the
concept of a virtual data model. Virtual data model design is similar
to the concept of virtual memory in classical OS design and of virtual
machines in modern cloud computing environment design.</p>
      <p>
        A modern DBMS needs to support both the schema-first and
schema-later paradigms and also the temporal aspects of data [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The
temporal dimension of data is often poorly implemented in DBMS.
For example, one part of temporality is event detection, which could be
tackled by developing a logical calculus on top of queries. A modern
DBMS would also benefit from the unification of meta-data and data to
define schema-flexible storing, indexing, and querying features [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
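      <p>One way to picture the unification of meta-data and data is to flatten schema-less documents into (path, value) pairs, so that structural information is stored as ordinary data that can be indexed and queried. The following Python sketch is only an illustration under that assumption. The function names flatten and build_path_index and the sample documents are hypothetical and do not describe any particular system.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Yield (path, value) pairs for every leaf of a nested dict/list document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(doc, list):
        for item in doc:
            yield from flatten(item, prefix + "[]")
    else:
        yield prefix, doc


def build_path_index(docs):
    """Map each (path, value) pair to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path, value in flatten(doc):
            index[(path, value)].add(doc_id)
    return index


# Hypothetical documents with different shapes and no declared schema.
docs = {
    1: {"name": "Alice", "orders": [{"item": "book"}, {"item": "pen"}]},
    2: {"name": "Bob", "city": "Helsinki"},
}

index = build_path_index(docs)

# The set of paths seen so far acts as a schema derived from the data itself.
paths = sorted({path for path, _ in index})
```
]]></preformat>
      <p>A lookup such as index[("orders[].item", "pen")] then answers a structural query without any schema having been declared in advance.</p>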
    </sec>
    <sec id="sec-4">
      <title>DEMO SYSTEM AND CONCLUSION</title>
      <p>
        We have developed a demonstration system called MultiCategory
[
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ] to demonstrate our solutions. The system’s backend is
implemented in Haskell. It offers a fold function-based query
processing mechanism, which is a way to model queries from a
category-theoretic perspective [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A multi-model schema is
represented as a category that is mapped to the multi-model instance.
Our formal approach to modeling MMDBS and data
transformations using category theory is presented in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
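      <p>To illustrate the idea of fold-based query processing, the following sketch expresses selection with projection, and aggregation, as folds over a collection. It is written in Python, with functools.reduce standing in for Haskell's fold functions. It is not code from MultiCategory, and the name fold_query, the customers data, and its fields are hypothetical.</p>
      <preformat><![CDATA[
```python
from functools import reduce


def fold_query(step, initial, collection):
    """Evaluate a query as a left fold: `step` combines the accumulated
    result with the next element of the collection."""
    return reduce(step, collection, initial)


# Hypothetical relational data: one dict per customer row.
customers = [
    {"id": 1, "name": "Alice", "credit": 500},
    {"id": 2, "name": "Bob", "credit": 1500},
    {"id": 3, "name": "Carol", "credit": 2500},
]

# SELECT name FROM customers WHERE credit > 1000, written as a fold
# that appends each matching name to the result list.
names = fold_query(
    lambda acc, row: acc + [row["name"]] if row["credit"] > 1000 else acc,
    [],
    customers,
)

# SELECT SUM(credit) FROM customers: the same fold with another step function.
total = fold_query(lambda acc, row: acc + row["credit"], 0, customers)
```
]]></preformat>
      <p>Changing only the step function and the initial value changes the query, which is what makes folds a convenient uniform representation for queries over different kinds of collections.</p>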
      <p>Our future work includes researching data integration, migration,
transformation, temporal, and virtual data model challenges using
category theory. Recent progress in applied category theory has
shown that category theory is a very powerful framework to model
and formally define complex systems.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENTS</title>
      <p>This paper is partially supported by Finnish Academy Project 310321
and Oracle ERO gift funding.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <year>2021</year>
          .
          <source>Categorical Databases</source>
          . https://www.categoricaldata.net/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <year>2021</year>
          . Conexus. https://conexus.com/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Arvind</given-names>
            <surname>Bhope</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Building a modern app with Oracle's Converged Database</article-title>
          . https://blogs.oracle.com/database/post/building-a-modern-app-withoracles-converged-database
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Dieter</given-names>
            <surname>Gawlick</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Querying the Past, the Present, and the Future</article-title>
          .
          <source>In Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA</source>
          ,
          <string-name>
            <given-names>Z. Meral</given-names>
            <surname>Özsoyoglu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stanley B.</given-names>
            <surname>Zdonik</surname>
          </string-name>
          (Eds.). IEEE Computer Society,
          <fpage>867</fpage>
          . https://doi.org/10.1109/ICDE.2004.1320094
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Torsten</given-names>
            <surname>Grust</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Monad Comprehensions: A Versatile Representation for Queries</article-title>
          . Springer Berlin Heidelberg, Berlin, Heidelberg,
          <fpage>288</fpage>
          -
          <lpage>311</lpage>
          . https://doi.org/10.1007/978-3-662-05372-0_12
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mac Lane</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Categories for the Working Mathematician</article-title>
          . Springer New York, 233 Spring St, New York, NY 10013, USA.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Zhen Hua</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gawlick</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL -</article-title>
          .
          <source>In CIDR.</source>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Zhen Hua</given-names>
            <surname>Liu</surname>
          </string-name>
          , Jiaheng Lu, Dieter Gawlick, Heli Helskyaho, Gregory Pogossiants, and
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multi-model Database Management Systems - A Look Forward</article-title>
          .
          <source>In Polystores VLDB 2018 Workshops</source>
          .
          <fpage>16</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jiaheng</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Irena</given-names>
            <surname>Holubová</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Multi-model Databases: A New Journey to Handle the Variety of Data</article-title>
          .
          <source>ACM Comput. Surv</source>
          .
          <volume>52</volume>
          ,
          <issue>3</issue>
          (
          <year>2019</year>
          ),
          <fpage>55:1</fpage>
          -
          <lpage>55:38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Jiaheng</given-names>
            <surname>Lu</surname>
          </string-name>
          , Irena Holubová, and
          <string-name>
            <given-names>Bogdan</given-names>
            <surname>Cautis</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Multi-Model Databases and Tightly Integrated Polystores: Current Practices, Comparisons, and Open Challenges</article-title>
          .
          <source>In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM '18)</source>
          .
          <publisher-name>Association for Computing Machinery</publisher-name>
          , New York, NY, USA,
          <fpage>2301</fpage>
          -
          <lpage>2302</lpage>
          . https://doi.org/10.1145/3269206.3274269
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Riehl</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Category Theory in Context</article-title>
          .
          <publisher-name>Dover Publications</publisher-name>
          , 31 2nd St, Mineola, NY 11501, USA. www.math.jhu.edu/~eriehl/context.pdf
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>David</given-names>
            <surname>Spivak</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Category Theory for the Sciences</article-title>
          .
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>David I.</given-names>
            <surname>Spivak</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Functorial Data Migration</article-title>
          .
          <source>CoRR</source>
          abs/1009.1166 (
          <year>2010</year>
          ). arXiv:1009.1166 http://arxiv.org/abs/1009.1166
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Valter</given-names>
            <surname>Uotila</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jiaheng</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>A Formal Categorical Theoretical Framework for Multi-Model Data Transformation</article-title>
          ,
          <source>In Poly: VLDB Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances. Poly</source>
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Valter</given-names>
            <surname>Uotila</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jiaheng</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>MultiCategory demo video</article-title>
          . https://youtu.be/uceIi91AGsg.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Valter</given-names>
            <surname>Uotila</surname>
          </string-name>
          , Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu,
          <string-name>
            <given-names>Souripriya</given-names>
            <surname>Das</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Gregory</given-names>
            <surname>Pogossiants</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>14</volume>
          ,
          <issue>12</issue>
          ,
          <fpage>2663</fpage>
          -
          <lpage>2666</lpage>
          . https://doi.org/10.14778/3476311.3476314
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>