=Paper=
{{Paper
|id=Vol-2929/poster6
|storemode=property
|title=Multi-model Query Processing Meets Category Theory and Functional Programming
|pdfUrl=https://ceur-ws.org/Vol-2929/poster6.pdf
|volume=Vol-2929
|authors=Valter Uotila,Jiaheng Lu,Dieter Gawlick,Zhen Hua Liu,Souripriya Das,Gregory Pogossiants
|dblpUrl=https://dblp.org/rec/conf/vldb/UotilaLGLDP21
}}
==Multi-model Query Processing Meets Category Theory and Functional Programming==
Valter Uotila, Jiaheng Lu (University of Helsinki, first.last@helsinki.fi)

Dieter Gawlick, Zhen Hua Liu, Souripriya Das (Oracle Corporation, first.last@oracle.com)

Gregory Pogossiants (SATS Technologies, gregp_21@yahoo.com)
===ABSTRACT===
The current multi-model database management systems (MMDBS) are becoming more complex. We propose category theory as a foundation for a new query language design, query processing, and transformation frameworks for MMDBS. We describe the recent challenges of MMDBS and present possible solutions to them. Finally, we propose a category theory-inspired prototype system.

Reference Format:
Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. Multi-model Query Processing Meets Category Theory and Functional Programming. In the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA Data 2021).

Copyright © 2021 for the individual papers by the papers’ authors. Copyright © 2021 for the volume as a collection by its editors. This volume and its papers are published under the Creative Commons License Attribution 4.0 International (CC BY 4.0). Published in the Proceedings of the 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores, co-located with VLDB 2021 (August 16-20, 2021, Copenhagen, Denmark) on CEUR-WS.org.

===1 INTRODUCTION===
The multi-model database management systems (MMDBS) [9, 10] are gradually becoming more complex, which creates an urgent need for a better theory to formalize the systems. We observe that the end-user’s experience is often poorly addressed in the design and implementation of the systems. For example, NoSQL is mainly targeted at developers. Technology is supposed to evolve according to the needs of the business and the end-user. Higher-level abstraction can simplify the systems and enable a better user experience.

The theory should be a standard across different domains and it should be powerful enough to express a wide variety of concepts on a suitable abstraction level. We believe that category theory is a candidate for such a theory. Liu et al. [8] proposed this role for category theory to reason about declarative constructions and transformations between various data models. The standard introduction to category theory is MacLane [6]; other good introductions are [11, 12].

David Spivak [13] has applied category theory to model relational databases in order to migrate relational data by category-theoretic means. The commercial application of this category theory-based relational database framework is implemented by Conexus [1, 2].

===2 CHALLENGES IN MMDBS===
An MMDBS is characterized by the capability to handle multiple data models against a single, unified backend. The models can include relational, graph, hierarchical, text, images, audio, video, spatial, expressions, and other complex data structures. An MMDBS is required to implement a single declarative query language that enables users to execute cross-model queries. Another desired feature is a unified indexing mechanism that can index multiple data instances across different models. An MMDBS should also be able to perform extensive data transformations that automatically create views and materialize data between different models. Oracle converged database [3] is an example of a commercial MMDBS.

===3 TOWARDS MODERN MMDBS===
Historically, we had hierarchical and network data models, and then the relational data model. Now, in addition to the relational model, we have re-invented the hierarchical models as JSON/XML, and the network models as RDF and property graphs. NoSQL systems complicate the matter by forcing users to access data without a declarative language in a very loose transactional system. All of these efforts have regressed the usability of DBMS.

The principle of DBMS is that there is no single data model that is the best or the worst. Therefore, it is time to introduce the concept of a virtual data model. Virtual data model design is similar to the concept of virtual memory in classical OS design and of the virtual machine in modern cloud computing environment design.

The modern DBMS needs to follow both the schema-first and the schema-later paradigms and also support temporal aspects of data [4]. The temporal dimension of data is often poorly implemented in DBMS. For example, a part of temporality is event detection, which could be tackled by developing calculus logic on top of queries. The modern DBMS would benefit from the unification of meta-data and data to define schema-flexible storing, indexing, and querying features [7].

===4 DEMO SYSTEM AND CONCLUSION===
We have developed a demonstration system called MultiCategory [15, 16] to demonstrate our solutions. The system’s backend is implemented in Haskell. It offers a fold function-based query processing mechanism, which is a method to model queries from a category theoretical perspective [5]. A multi-model schema is represented as a category that is mapped to the multi-model instance. Our formal approach to modeling MMDBS and data transformations using category theory is presented in [14].

Our future work includes researching data integration, migration, transformation, temporal, and virtual data model challenges using category theory. Recent progress in applied category theory has shown that category theory is a very powerful framework to model and formally define complex systems.
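The fold function-based query processing mentioned above can be illustrated with a minimal sketch. It is not MultiCategory's code or API: the demo system is written in Haskell, whereas this sketch uses Python's `functools.reduce` as a left fold purely for illustration. The toy data (`customers`, `knows`) and the helper names (`fold`, `select_names`, `known_by`) are assumptions introduced here. The idea shown is that a query is a step function folded over a collection, and that a cross-model query can fold over one model (graph edges) while consulting another (relational rows).

```python
from functools import reduce

# Hypothetical toy data, not taken from the paper:
# a relational table of customers and a graph of "knows" edges between them.
customers = [
    {"id": 1, "name": "Alice", "credit": 500},
    {"id": 2, "name": "Bob", "credit": 1500},
    {"id": 3, "name": "Carol", "credit": 2500},
]
knows = [(1, 2), (2, 3), (1, 3)]  # edge list: (source id, target id)


def fold(step, initial, collection):
    """Left fold: thread an accumulator through `collection` using `step`."""
    return reduce(step, collection, initial)


def select_names(min_credit):
    """Relational query as a fold: names of customers with credit above min_credit."""
    def step(acc, row):
        return acc + [row["name"]] if row["credit"] > min_credit else acc
    return fold(step, [], customers)


def known_by(min_credit):
    """Cross-model query as a fold over graph edges.

    Collects, without duplicates, the names of customers known by some
    customer whose credit exceeds min_credit.
    """
    by_id = {row["id"]: row for row in customers}

    def step(acc, edge):
        src, dst = edge
        target_name = by_id[dst]["name"]
        if by_id[src]["credit"] > min_credit and target_name not in acc:
            return acc + [target_name]
        return acc

    return fold(step, [], knows)
```

With this toy data, `select_names(1000)` returns `['Bob', 'Carol']` and `known_by(1000)` returns `['Carol']`. The step function plays the role of the query body and the initial accumulator fixes the result type, so changing either one yields a different query over the same data.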
===ACKNOWLEDGMENTS===
This paper is partially supported by Finnish Academy Project 310321 and Oracle ERO gift funding.

===REFERENCES===
[1] 2021. Categorical Databases. https://www.categoricaldata.net/

[2] 2021. Conexus. https://conexus.com/

[3] Arvind Bhope. 2021. Building a modern app with Oracle’s Converged Database. https://blogs.oracle.com/database/post/building-a-modern-app-with-oracles-converged-database

[4] Dieter Gawlick. 2004. Querying the Past, the Present, and the Future. In Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March - 2 April 2004, Boston, MA, USA, Z. Meral Özsoyoglu and Stanley B. Zdonik (Eds.). IEEE Computer Society, 867. https://doi.org/10.1109/ICDE.2004.1320094

[5] Torsten Grust. 2004. Monad Comprehensions: A Versatile Representation for Queries. Springer Berlin Heidelberg, Berlin, Heidelberg, 288–311. https://doi.org/10.1007/978-3-662-05372-0_12

[6] S.M. Lane. 1998. Categories for the Working Mathematician. Springer New York, 233 Spring St, New York, NY 10013, USA.

[7] Zhen Hua Liu and D. Gawlick. 2015. Management of Flexible Schema Data in RDBMSs - Opportunities and Limitations for NoSQL -. In CIDR.

[8] Zhen Hua Liu, Jiaheng Lu, Dieter Gawlick, Heli Helskyaho, Gregory Pogossiants, and Zhe Wu. 2018. Multi-model Database Management Systems - A Look Forward. In Polystores VLDB 2018 Workshops. 16–29.

[9] Jiaheng Lu and Irena Holubová. 2019. Multi-model Databases: A New Journey to Handle the Variety of Data. ACM Comput. Surv. 52, 3 (2019), 55:1–55:38.

[10] Jiaheng Lu, Irena Holubová, and Bogdan Cautis. 2018. Multi-Model Databases and Tightly Integrated Polystores: Current Practices, Comparisons, and Open Challenges. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (Torino, Italy) (CIKM ’18). Association for Computing Machinery, New York, NY, USA, 2301–2302. https://doi.org/10.1145/3269206.3274269

[11] E. Riehl. 2017. Category Theory in Context. Dover Publications, 31 2nd St, Mineola, NY 11501, USA. www.math.jhu.edu/~eriehl/context.pdf

[12] David Spivak. 2014. Category Theory for the Sciences. (2014).

[13] David I. Spivak. 2010. Functorial Data Migration. CoRR abs/1009.1166 (2010). arXiv:1009.1166 http://arxiv.org/abs/1009.1166

[14] Valter Uotila and Jiaheng Lu. 2021. A Formal Categorical Theoretical Framework for Multi-Model Data Transformation, In Poly: VLDB Workshop on Polystore Systems for Heterogeneous Data in Multiple Databases with Privacy and Security Assurances. Poly 2021.

[15] Valter Uotila and Jiaheng Lu. 2021. MultiCategory demo video. https://youtu.be/uceIi91AGsg.

[16] Valter Uotila, Jiaheng Lu, Dieter Gawlick, Zhen Hua Liu, Souripriya Das, and Gregory Pogossiants. 2021. MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming. Proc. VLDB Endow. 14, 2663 – 2666. Issue 12. https://doi.org/10.14778/3476311.3476314