=Paper= {{Paper |id=Vol-2929/poster6 |storemode=property |title=Multi-model Query Processing Meets Category Theory and Functional Programming |pdfUrl=https://ceur-ws.org/Vol-2929/poster6.pdf |volume=Vol-2929 |authors=Valter Uotila,Jiaheng Lu,Dieter Gawlick,Zhen Hua Liu,Souripriya Das,Gregory Pogossiants |dblpUrl=https://dblp.org/rec/conf/vldb/UotilaLGLDP21 }} ==Multi-model Query Processing Meets Category Theory and Functional Programming== https://ceur-ws.org/Vol-2929/poster6.pdf
Multi-model Query Processing Meets Category Theory and Functional Programming

Valter Uotila, Jiaheng Lu
University of Helsinki
first.last@helsinki.fi

Dieter Gawlick, Zhen Hua Liu, Souripriya Das
Oracle Corporation
first.last@oracle.com

Gregory Pogossiants
SATS Technologies
gregp_21@yahoo.com

ABSTRACT
The current multi-model database management systems (MMDBS) are becoming more complex. We
propose category theory as a foundation for a new query language design, query processing, and
transformation frameworks for MMDBS. We describe the recent challenges of MMDBS and present
possible solutions to them. Finally, we propose a category theory-inspired prototype system.

Reference Format:
Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants.
Multi-model Query Processing Meets Category Theory and Functional Programming. In the 2nd
Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021 for the
volume as a collection by its editors. This volume and its papers are published under the
Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings
of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located
with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

1    INTRODUCTION
Multi-model database management systems (MMDBS) [9, 10] are gradually becoming more complex,
which creates an urgent need for a better theory to formalize the systems. We observe that the
end-user’s experience is often poorly addressed in the design and implementation of the systems.
For example, NoSQL is mainly targeted at developers. Technology is supposed to evolve according
to the needs of the business and the end-user. Higher-level abstraction can simplify the systems
and enable a better user experience.

The theory should be a standard across different domains, and it should be powerful enough to
express a wide variety of concepts on a suitable abstraction level. We believe that category
theory is a candidate for such a theory. Liu et al. [8] proposed this role for category theory,
namely to reason about declarative constructions and transformations between various data
models. The standard introduction to category theory is Mac Lane [6]; other good introductions
are [11, 12].

David Spivak [13] has applied category theory to model relational databases in order to migrate
relational data by category-theoretic means. A commercial application of this category
theory-based relational database framework is implemented by Conexus [1, 2].

2    CHALLENGES IN MMDBS
An MMDBS is characterized by the capability to handle multiple data models against a single,
unified backend. The models can include relational, graph, hierarchical, text, images, audio,
video, spatial, expressions, and other complex data structures. An MMDBS is required to
implement a single declarative query language that enables users to execute cross-model queries.
Another desired feature is a unified indexing mechanism that can index multiple data instances
across different models. An MMDBS should also be able to perform extensive data transformations
which automatically create views and materialize data between different models. Oracle converged
database [3] is an example of a commercial MMDBS.

3    TOWARDS MODERN MMDBS
Historically, we had hierarchical and network data models, and then the relational data model.
Now, in addition to the relational model, we have re-invented the hierarchical models as
JSON/XML, and the network models as RDF and property graphs. NoSQL systems complicate the matter
by forcing users to access data without a declarative language in a very loose transactional
system. All of these efforts have regressed the usability of DBMS.

The principle of DBMS is that there is no single data model that is the best or the worst.
Therefore, it is time to introduce the concept of a virtual data model. Virtual data model
design is similar to the concept of virtual memory in classical OS design and of the virtual
machine in modern cloud computing environment design.

The modern DBMS needs to follow both the schema-first and the schema-later paradigms and also
support temporal aspects of data [4]. The temporal dimension of data is often poorly implemented
in DBMS. For example, a part of temporality is event detection, which could be tackled by
developing calculus logic on top of queries. The modern DBMS would benefit from the unification
of meta-data and data to define schema-flexible storing, indexing, and querying features [7].

4    DEMO SYSTEM AND CONCLUSION
We have developed a demonstration system called MultiCategory [15, 16] to demonstrate our
solutions. The system’s backend is implemented in Haskell. It offers a fold function-based query
processing mechanism, which is a method to model queries from a category theoretical
perspective [5]. A multi-model schema is represented as a category that is mapped to the
multi-model instance. Our approach to modeling MMDBS and data transformations using category
theory is presented formally in [14].

Our future work includes researching data integration, migration, transformation, temporal, and
virtual data model challenges using category theory. Recent progress in applied category theory
has shown that category theory is a very powerful framework to model and formally define complex
systems.
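
To make the fold-based query idea of Section 4 more concrete, the following is a minimal Haskell sketch. It is not taken from MultiCategory: the types Customer and Order, the functions select, customerOf, and totalByCustomer, and the sample data are all assumptions introduced for illustration only. The sketch treats a relational table and a collection of JSON-like documents as two objects of a small schema, interprets a schema morphism from orders to customers as a lookup function on the instance, and expresses every query as a fold consisting of a step function and an empty result.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical relational data: one row per customer.
data Customer = Customer
  { customerId   :: Int
  , customerName :: String
  , creditLimit  :: Int
  } deriving (Eq, Show)

-- Hypothetical document (JSON-like) data: one document per order.
data Order = Order
  { orderNo   :: String
  , orderedBy :: Int              -- customer id; stands for a schema morphism Order -> Customer
  , items     :: [(String, Int)]  -- (product name, price)
  } deriving (Eq, Show)

-- Sample instance data (illustrative values only).
customers :: [Customer]
customers =
  [ Customer 1 "Mary" 5000
  , Customer 2 "John" 3000
  , Customer 3 "Anne" 2000
  ]

orders :: [Order]
orders =
  [ Order "o1" 1 [("book", 40), ("pen", 2)]
  , Order "o2" 2 [("toy", 66)]
  , Order "o3" 1 [("laptop", 1200)]
  ]

-- A selection with projection, written as a single right fold:
-- the step function decides whether an element contributes to the result.
select :: (a -> Bool) -> (a -> b) -> [a] -> [b]
select keep project = foldr step []
  where
    step x acc
      | keep x    = project x : acc
      | otherwise = acc

-- Index of the relational instance by primary key.
customerIndex :: Map.Map Int Customer
customerIndex = Map.fromList [(customerId c, c) | c <- customers]

-- Instance-level interpretation of the schema morphism Order -> Customer.
customerOf :: Order -> Maybe Customer
customerOf o = Map.lookup (orderedBy o) customerIndex

-- Total price of one order, again as a fold over its items.
orderTotal :: Order -> Int
orderTotal = foldr (\(_, price) acc -> price + acc) 0 . items

-- A cross-model query: fold over the documents, follow the morphism into
-- the relational model, and accumulate the total spent per customer name.
totalByCustomer :: [Order] -> Map.Map String Int
totalByCustomer = foldr step Map.empty
  where
    step o acc = case customerOf o of
      Just c  -> Map.insertWith (+) (customerName c) (orderTotal o) acc
      Nothing -> acc
```

Loaded into GHCi, `select ((> 2500) . creditLimit) customerName customers` evaluates to `["Mary","John"]`, and `totalByCustomer orders` evaluates to `fromList [("John",66),("Mary",1242)]`. Changing only the step function and the empty result yields a different query over the same data, which is the property that makes folds a convenient uniform representation for queries across models.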
ACKNOWLEDGMENTS
This paper is partially supported by Finnish Academy Project 310321 and Oracle ERO gift funding.

REFERENCES
[1] 2021. Categorical Databases. https://www.categoricaldata.net/
[2] 2021. Conexus. https://conexus.com/
[3] Arvind Bhope. 2021. Building a modern app with Oracle’s Converged Database. https://blogs.oracle.com/database/post/building-a-modern-app-with-oracles-converged-database
[4] Dieter Gawlick. 2004. Querying the Past, the Present, and the Future. In Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA, Z. Meral Özsoyoglu and Stanley B. Zdonik (Eds.). IEEE Computer Society, 867. https://doi.org/10.1109/ICDE.2004.1320094
[5] Torsten Grust. 2004. Monad Comprehensions: A Versatile Representation for Queries. Springer Berlin Heidelberg, Berlin, Heidelberg, 288–311. https://doi.org/10.1007/978-3-662-05372-0_12
[6] S. Mac Lane. 1998. Categories for the Working Mathematician. Springer New York, 233 Spring St, New York, NY 10013, USA.
[7] Zhen Hua Liu and D. Gawlick. 2015. Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL -. In CIDR.
[8] Zhen Hua Liu, Jiaheng Lu, Dieter Gawlick, Heli Helskyaho, Gregory Pogossiants, and Zhe Wu. 2018. Multi-model Database Management Systems - A Look Forward. In Polystores VLDB 2018 Workshops. 16–29.
[9] Jiaheng Lu and Irena Holubová. 2019. Multi-model Databases: A New Journey to Handle the Variety of Data. ACM Comput. Surv. 52, 3 (2019), 55:1–55:38.
[10] Jiaheng Lu, Irena Holubová, and Bogdan Cautis. 2018. Multi-Model Databases and Tightly Integrated Polystores: Current Practices, Comparisons, and Open Challenges. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 2301–2302. https://doi.org/10.1145/3269206.3274269
[11] E. Riehl. 2017. Category Theory in Context. Dover Publications, 31 2nd St, Mineola, NY 11501, USA. www.math.jhu.edu/~eriehl/context.pdf
[12] David Spivak. 2014. Category Theory for the Sciences. (2014).
[13] David I. Spivak. 2010. Functorial Data Migration. CoRR abs/1009.1166 (2010). arXiv:1009.1166 http://arxiv.org/abs/1009.1166
[14] Valter Uotila and Jiaheng Lu. 2021. A Formal Categorical Theoretical Framework for Multi-Model Data Transformation. In Poly: VLDB Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances. Poly 2021.
[15] Valter Uotila and Jiaheng Lu. 2021. MultiCategory demo video. https://youtu.be/uceIi91AGsg
[16] Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. 2021. MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming. Proc. VLDB Endow. 14, 12 (2021), 2663–2666. https://doi.org/10.14778/3476311.3476314