Multi-model Query Processing Meets Category Theory and Functional Programming

Valter Uotila, Jiaheng Lu
University of Helsinki
first.last@helsinki.fi

Dieter Gawlick, Zhen Hua Liu, Souripriya Das
Oracle Corporation
first.last@oracle.com

Gregory Pogossiants
SATS Technologies
gregp_21@yahoo.com

ABSTRACT
The current multi-model database management systems (MMDBS) are becoming more complex. We propose category theory as a foundation for a new query language design, query processing, and transformation frameworks for MMDBS. We describe the recent challenges of MMDBS and present possible solutions to them. Finally, we propose a category theory-inspired prototype system.

Reference Format:
Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. Multi-model Query Processing Meets Category Theory and Functional Programming. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers' authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

1 INTRODUCTION
The multi-model database management systems (MMDBS) [9, 10] are gradually becoming more complex, which creates an urgent need for a better theory to formalize the systems. We observe that the end-user's experience is often poorly addressed in the design and implementation of the systems. For example, NoSQL is mainly targeted at developers. Technology is supposed to evolve according to the business and end-user's needs. Higher-level abstraction can simplify the systems and enable a better user experience.

The theory should be a standard across different domains and it should be powerful enough to express a wide variety of concepts on a suitable abstraction level. We believe that a candidate to be such a theory is category theory. Liu et al. [8] proposed this role for category theory to reason about declarative constructions and transformations between various data models. The standard introduction to category theory is Mac Lane [6], and other good introductions are [11, 12]. David Spivak [13] has applied category theory to model relational databases in order to migrate relational data category-theoretically. The commercial application of this category theory-based relational database framework is implemented by Conexus [1, 2].

2 CHALLENGES IN MMDBS
MMDBS is characterized by the capability to handle multiple data models against a single, unified backend. The models can include relational, graph, hierarchical, text, images, audio, video, spatial, expressions, and other complex data structures. It is required that MMDBS implement a single declarative query language that enables users to execute cross-model queries. Another desired feature is a unified indexing mechanism that can index multiple data instances across different models. MMDBS should have the capability to perform extensive data transformations which automatically create views and materialize data between different models. Oracle converged database [3] is an example of a commercial MMDBS.

3 TOWARDS MODERN MMDBS
Historically, we had hierarchical and network data models, and then the relational data model. Now, in addition to the relational model, we have re-invented the hierarchical models as JSON/XML, and the network models as RDF and property graphs. NoSQL systems complicate the matter by forcing users to access data without a declarative language in a very loose transactional system. All of these efforts have regressed the usability of DBMS.

The principle of DBMS is that there is no single data model that is the best or the worst. Therefore, it is time to introduce the concept of a virtual data model. Virtual data model design is similar to the concept of virtual memory in classical OS design and virtual machine in modern cloud computing environment design.

The modern DBMS needs to follow both schema-first and schema-later paradigms and also support temporal aspects of data [4]. The temporal dimension of data is often poorly implemented in DBMS. For example, a part of temporality is event detection, which could be tackled by developing calculus logic on top of queries. The modern DBMS would benefit from the unification of meta-data and data to define schema-flexible storing, indexing, and querying features [7].

4 DEMO SYSTEM AND CONCLUSION
We have developed a demonstration system called MultiCategory [15, 16] to demonstrate our solutions. The system's backend is implemented with Haskell. It offers a fold function-based query processing mechanism, which is a method to model queries from a category theoretical perspective [5]. A multi-model schema is represented as a category that is mapped to the multi-model instance. Formally, our approach for modeling MMDBS and data transformations using category theory is presented in [14].

Our future work includes researching data integration, migration, transformation, temporal, and virtual data model challenges using category theory. Recent progress in applied category theory has shown that category theory is a very powerful framework to model and formally define complex systems.

ACKNOWLEDGMENTS
This paper is partially supported by Finnish Academy Project 310321 and Oracle ERO gift funding.

REFERENCES
[1] 2021. Categorical Databases. https://www.categoricaldata.net/
[2] 2021. Conexus. https://conexus.com/
[3] Arvind Bhope. 2021. Building a modern app with Oracle's Converged Database. https://blogs.oracle.com/database/post/building-a-modern-app-with-oracles-converged-database
[4] Dieter Gawlick. 2004. Querying the Past, the Present, and the Future. In Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA, Z. Meral Özsoyoglu and Stanley B. Zdonik (Eds.). IEEE Computer Society, 867. https://doi.org/10.1109/ICDE.2004.1320094
[5] Torsten Grust. 2004. Monad Comprehensions: A Versatile Representation for Queries. Springer Berlin Heidelberg, Berlin, Heidelberg, 288–311. https://doi.org/10.1007/978-3-662-05372-0_12
[6] S. Mac Lane. 1998. Categories for the Working Mathematician. Springer New York, 233 Spring St, New York, NY 10013, USA.
[7] Zhen Hua Liu and D. Gawlick. 2015. Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL -. In CIDR.
[8] Zhen Hua Liu, Jiaheng Lu, Dieter Gawlick, Heli Helskyaho, Gregory Pogossiants, and Zhe Wu. 2018. Multi-model Database Management Systems - A Look Forward. In Polystores VLDB 2018 Workshops. 16–29.
[9] Jiaheng Lu and Irena Holubová. 2019. Multi-model Databases: A New Journey to Handle the Variety of Data. ACM Comput. Surv. 52, 3 (2019), 55:1–55:38.
[10] Jiaheng Lu, Irena Holubová, and Bogdan Cautis. 2018. Multi-Model Databases and Tightly Integrated Polystores: Current Practices, Comparisons, and Open Challenges. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM '18). Association for Computing Machinery, New York, NY, USA, 2301–2302. https://doi.org/10.1145/3269206.3274269
[11] E. Riehl. 2017. Category Theory in Context. Dover Publications, 31 2nd St, Mineola, NY 11501, USA. www.math.jhu.edu/~eriehl/context.pdf
[12] David Spivak. 2014. Category Theory for the Sciences. (2014).
[13] David I. Spivak. 2010. Functorial Data Migration. CoRR abs/1009.1166 (2010). arXiv:1009.1166 http://arxiv.org/abs/1009.1166
[14] Valter Uotila and Jiaheng Lu. 2021. A Formal Categorical Theoretical Framework for Multi-Model Data Transformation. In Poly: VLDB Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances. Poly 2021.
[15] Valter Uotila and Jiaheng Lu. 2021. MultiCategory demo video. https://youtu.be/uceIi91AGsg.
[16] Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. 2021. MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming. Proc. VLDB Endow. 14, 2663–2666. Issue 12. https://doi.org/10.14778/3476311.3476314
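
The fold function-based query processing mentioned in Section 4 can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Haskell that the MultiCategory backend uses, and it is not the system's actual API: the toy data (`customers`, `knows`) and the helpers `fold`, `select_project`, and `known_by_rich` are hypothetical names introduced only for this illustration. The sketch assumes a relational table and a graph edge list held in memory, and shows how selection, projection, and a simple cross-model query can each be expressed as a fold over a collection.

```python
from functools import reduce

# Hypothetical toy data (not from the paper): a relational table of
# customers and a graph edge list "knows" between customer ids.
customers = [
    {"id": 1, "name": "Alice", "credit": 500},
    {"id": 2, "name": "Bob", "credit": 1500},
    {"id": 3, "name": "Carol", "credit": 2500},
]
knows = [(1, 2), (2, 3)]  # (source id, target id)


def fold(step, initial, collection):
    """Left fold: combine the items of `collection` into an accumulator."""
    return reduce(step, collection, initial)


def select_project(predicate, projection, collection):
    """Selection and projection expressed as a single fold into a list."""
    def step(acc, item):
        # Keep the projected item only when the predicate holds.
        return acc + [projection(item)] if predicate(item) else acc
    return fold(step, [], collection)


def known_by_rich(customers, knows, threshold):
    """Cross-model query: names of customers known by a customer whose
    credit exceeds `threshold`. Each stage is a fold."""
    def collect_rich(acc, c):
        return acc | {c["id"]} if c["credit"] > threshold else acc

    rich_ids = fold(collect_rich, set(), customers)

    def collect_targets(acc, edge):
        return acc | {edge[1]} if edge[0] in rich_ids else acc

    target_ids = fold(collect_targets, set(), knows)
    return select_project(
        lambda c: c["id"] in target_ids, lambda c: c["name"], customers
    )


# Names of customers with credit above 1000.
rich_names = select_project(
    lambda c: c["credit"] > 1000, lambda c: c["name"], customers
)
print(rich_names)
print(known_by_rich(customers, knows, 1000))
```

The design point is that a query is determined by an initial accumulator and a step function, so queries over different models (here a table and an edge list) share one evaluation mechanism and compose by feeding the result of one fold into the next.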