=Paper=
{{Paper
|id=Vol-3186/paper1
|storemode=property
|title=Synergy between Quantum Computers and Databases
|pdfUrl=https://ceur-ws.org/Vol-3186/paper_1.pdf
|volume=Vol-3186
|authors=Valter Uotila
|dblpUrl=https://dblp.org/rec/conf/vldb/Uotila22
}}
==Synergy between Quantum Computers and Databases==
Synergy between Quantum Computers and Databases Valter Uotila supervised by Prof. Jiaheng Lu and Prof. Jukka K. Nurminen University of Helsinki, Finland Abstract Academia, industry, and societies are showing increasing interest in the possibilities of quantum computing. The research in the intersection of quantum computing and databases is still in its initial steps. This work represents several crucial data management and query processing problems that will benefit from quantum computing. We outline how quantum computing will tackle these challenges and what kind of outcomes and speed-ups we expect. We discuss the position of quantum computing in data management and raise awareness of possible security threats in encryption. We aim to be realistic and point out technical difficulties that currently restrict implementations. 1. Introduction tion [5]. The applicability of quantum computing on database query optimization is shortly outlined in [6]. There are multiple different computing paradigms be- Most of the discussion considers security issues [7, 8] sides conventional CPU-based computing. Nowadays, which are also the most urgent questions. The National the most exciting computing paradigm is quantum com- Institute of Standards and Technology (NIST) has started puting. It is based on quantum mechanics [1] although the post-quantum cryptography standardization process the modern quantum computing software [2, 3] can be [9] because quantum computers will break RSA public- used with almost no knowledge of quantum physics. key cryptosystem in the future. The quantum computers differ in their hardware. The The future collapse of RSA and other classical encryp- most common hardware implementations are super- tion methods should also concern the database commu- conducting (IBM, Google, Rigetti), photonic (Xanadu), nity. Databases and cloud infrastructures will need to trapped ion (IonQ, Honeywell), adiabatic (D-wave), and implement quantum-safe encryption methods. Never- silicon spin qubits (Intel, HRL). Amazon Braket, IBM theless, that does not mean that future security systems Quantum, Xanadu, and D-wave Leap offer access to quan- must be built on quantum computers. NIST evaluates tum computers and simulators in the cloud. The wide encryption methods based on mathematical construc- variety of hardware types shows that none of the types tions considered too hard to be solved even on quantum has yet become standard, and the competition between computers. the quantum hardware companies is still going on. The future will show which quantum computing hardware type will become dominant. 2. Universal quantum circuit Quantum computers will not take over classical com- model and quantum annealing puting. Instead, they will be computing units, like GPU processors or supercomputers, along with classical com- We can roughly divide the quantum computing field into puters and databases. We can send them specific and two parts depending on the hardware. These are NISQ computationally complex problems. Thus the hybrid (Noisy Intermediate-Scale Quantum) era hardware and approach will be the most realistic option for practical quantum annealers. Algorithms for NISQ devices are quantum computing. implemented with the universal quantum circuit model. The classical introduction to quantum circuit model- 1.1. Related work based quantum computing is [1]. Classical computing is based on bits, 0 and 1, whereas quantum computing is There is relatively little research that applies quantum based on qubits computing to databases and data management. Recent database research has applied quantum computing in π = πΌ|0β© + π½|1β©, transaction scheduling [4], and multiple query optimiza- where πΌ, π½ β C are complex numbers whose norms sat- isfy |πΌ|2 + |π½|2 = 1. The numbers πΌ and π½ are called PhD Workshop of the 48th International Conference on Very Large probability amplitudes. The requirement |πΌ|2 + |π½|2 = 1 Databases (VLDB 2022) can be interpreted probabilistic way: the outcome |0β© is Envelope-Open valter.uotila@helsinki (V. Uotila) determined by the amplitude πΌ so that the probability of Β© 2022 Proceedings of the VLDB 2022 PhD Workshop, September 5, 2022. Sydney, Australia. Copyright (C) 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). measuring |0β© is |πΌ|2 . The probability of obtaining either CEUR Workshop Proceedings CEUR Workshop Proceedings (CEUR-WS.org) http://ceur-ws.org ISSN 1613-0073 |0β© or |1β© must be 1. In general, a quantum system consists of multiple qubits. The system usually starts from the state |00 β― 0β©. The quantum computation proceeds by applying quan- tum logic gates to the system. Mathematically the gates are unitary matrices whose entries are complex numbers. Finally, the system is measured and ends up in a state determined by the probabilities associated with the am- plitudes. The measured outcome is the result of the algo- rithm, a sequence of classical bits. Quantum annealing is similar to simulated annealing but performed on a quantum computer utilizing certain quantum mechanical phenomena. Quantum annealing Figure 1: An example of the possibilities of joining three relations π 1 , π 2 and π 3 . is a heuristic method. To solve problems with quan- tum annealers, the problems are formulated in either the quadratic unconstrained binary optimization (QUBO) model or the Ising model, which are equivalent. As the Figure 1 represents the join order problem for three formulations in [10] indicate, we can express many fun- relations π 1 , π 2 and π 3 . The problem fast becomes in- damental and NP-hard computer science problems using feasible to solve exactly. Each edge carries a weight that Ising and QUBO models. That shows strong evidence describes the cost of joining the relations. The solution that many NP-hard problems in the database field can be to the problem is a join order tree with a minimum total solved using quantum computers. weight over all the join order trees. The tree must contain The comprehensive introduction on formulating prob- the single relations and the relation π 1 β π 2 β π 3 . lems using Ising and QUBO models is [10]. Let π₯π for When we express the problem with the QUBO model, π = 1, β¦ , π be binary variables. Then the objective func- we define the binary variables as the graphβs vertices in tion is Figure 1. The so-called couplers i.e. the variable pairs π π π π₯π π₯π in objective function (1) describe the edges. The π (π₯) = β β ππ,π π₯π π₯π + β ππ π₯π , (1) π=1 π=π+1 π=1 coefficients ππ,π in objective function (1) are determined by the cost function. The cost function estimates the where π₯π β {0, 1} and ππ,π , ππ β R for 1 β€ π, π β€ π. where cost of joining the relations. The solutionβs quality also coefficients ππ,π β R. The main goal of quantum annealing depends on how accurately the cost function can estimate is to find a point π₯ β such that π (π₯ β ) is a global minimum the joins. We impose constraints so that the result of the of the objective function. The leading company building minimization problem is a valid join order tree whose quantum annealers is D-wave. total weight over the selected edges is the minimum. In the final model, we need auxiliary variables to produce correct results. We are also planning to utilize a hybrid 3. Research directions approach since the number of variables and connectivity Although various optimization methods have been cen- is large in the problem. tral topics of database research since the beginning, there is still much room for improvement. This section out- 3.2. Size bound of conjunctive queries lines research directions on optimizing databases with quantum computing. Determining the size bound for conjunctive relational queries is another classical optimization problem for re- lational databases. The problem is defined and mostly 3.1. Join order optimization solved in [13, 14]. The most complicated and general case We propose a research direction to study the join order with arbitrary functional dependencies stays open since selection, a classical database optimization problem. The it is strongly related to an unsolved and hard problem problem is relatively well-studied [11]. The classical so- related to information-theoretical entropy vectors. lution to the problem uses dynamic programming. The More precisely the problem is to determine an ex- current research has tackled the problem with deep rein- ponent π (π) for a conjunctive query π = π 0 (π’0 ) β forcement learning [12]. π 1 (π’1 ) β§ β― β§ π π (π’π ) in a database π· which is given as These kinds of combinatorically hard problems are a sequence of relations π 1 , β¦ , π π . The size bound for the the best match for quantum computing. Studying the query π in the database π· is then problem using quantum computing, we aim to speed up |π(π·)| β€ rmax(π·)π (π) , the computation and improve the quality of the solution by extending the search space. where rmax(π·) is the largest relation in the database π·. The value π (π) is a solution to a linear program that parameterized circuits for SQL queries will be trained maximizes the entropy of the variable π’0 concerning cer- to estimate the query execution time, resultβs size, and tain information theoretic constraints. The constraints cost. We are studying if the circuits will perform faster arise from the functional dependencies in the database π·. than the current estimation methods [24] and if they will Unfortunately, to obtain the optimal solution, the linear provide better-quality estimations. program has infinitely many constraints and becomes Natural continuation for this work is also express infeasible. Since the solutions to the size bounds are rep- graph and document query languages with their diagram- resented as linear programs, there is already a known matic representations and study query transformations quantum algorithm [15] to solve them. between them. We have worked on data transforma- We believe that quantum computing will provide new tions [17] between relational, graph, and tree data mod- aspects to modeling the problem. Because the problem els. Query transformations are the next step, and they is probabilistic and classically very hard, the quantum are an essential part of multi-model databases. As cat- computing approach might produce good heuristic re- egory theory creates a consistent connection between sults and help understand the problem from a different schema and data [19], it will also create a solid connection angle. It is interesting to transfer the current solutions between queries and instances. Because the query trans- into a quantum computing format and research possible formation depends on the database schema and instance, speed-ups and quality improvements. category theory will be a suitable theoretical framework for defining the transformations. 3.3. Category theoretical methods for quantum computing and databases 4. Conclusion In our previous research [16, 17] we have developed a Academia, industry, and societies show growing interest conceptual framework to unify and model multi-model in the possibilities of quantum computing. The opti- databases using category theory [18]. Our work has been mization of data management and query processing will a continuation of the research, which proves that rela- benefit from quantum computing frameworks. Although tional databases are naturally category theoretical [19]. quantum computers are at an early stage, developing Category theory is a meta-mathematical field of mathe- algorithms to tackle the presented challenges is essential. matics, but it also has a lot of concrete applications, which The database community should be aware of the secu- are represented, for example, at the annual Applied Cate- rity threats to the encryption that quantum computing gory Theory conference. Multi-model databases [20] are eventually will create. a particular class of databases whose single, integrated This work proposes multiple research directions backend supports multiple data models and formats. In to study join order optimization, determine the size modern data management systems, it is crucial that we bound of conjunctive queries, transform queries into can handle a wide variety of data models seamlessly. parametrized circuits and perform query transformations In [21] authors develop a comprehensive diagrammatic using diagrammatic formalism. Quantum computers will theory to model quantum processes and quantum com- solve countless problems in database research, and we puting. The approach is based on category theory. Since are excited to be part of this new research area. our previous research on multi-model databases [16, 17] is also based on category theory, this approach appears promising to connect the fields theoretically. Category References theory as a high-level theoretical framework is needed since quantum computers are growing, and the gate- [1] M. A. Nielsen, I. L. Chuang, Quantum Compu- level design and coding of the algorithms are becoming tation and Quantum Information: 10th Anniver- harder. We need rigorous theoretical tools to model and sary Edition, Cambridge University Press, 2010. design higher-level quantum computing systems, and doi:10.1017/CBO9780511976667 . the category theory-based approach seems to tackle the [2] M. Treinish, J. Gambetta, P. N. et al., Qiskit/qiskit: challenge well. Qiskit 0.34.2, 2022. URL: https://doi.org/10.5281/ In [21, 22] authors use the category theory-based zenodo.6027041. doi:10.5281/zenodo.6027041 . ideas to develop quantum natural language processing [3] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, M. S. (QNLP) based on Lambekβs pregroup grammars [23] and Alam, S. Ahmed, J. M. Arrazola, C. Blank, A. Del- parametrized quantum circuits. We are working on ap- gado, S. Jahangiri, K. McKiernan, J. J. Meyer, Z. Niu, plying these diagrammatic and category theory-based A. SzΓ‘va, N. Killoran, Pennylane: Automatic dif- methods to represent SQL queries as parameterized quan- ferentiation of hybrid quantum-classical computa- tum circuits. SQL is based on context-free grammar, tions, 2020. arXiv:1811.04968 . equivalent to pregroup grammar in a certain sense. The [4] T. Bittner, S. Groppe, Avoiding blocking by schedul- URL: https://doi.org/10.1109/FOCS.2008.43. doi:10. ing transactions using quantum annealing, in: 1109/FOCS.2008.43 . Proceedings of the 24th Symposium on Interna- [14] G. Gottlob, S. T. Lee, G. Valiant, P. Valiant, Size tional Database Engineering & Applications, IDEAS and treewidth bounds for conjunctive queries, β20, Association for Computing Machinery, New J. ACM 59 (2012). URL: https://doi.org/10.1145/ York, NY, USA, 2020. URL: https://doi.org/10.1145/ 2220357.2220363. doi:10.1145/2220357.2220363 . 3410566.3410593. doi:10.1145/3410566.3410593 . [15] A. W. Harrow, A. Hassidim, S. Lloyd, Quantum [5] I. Trummer, C. Koch, Multiple query optimiza- algorithm for linear systems of equations, Phys. Rev. tion on the d-wave 2x adiabatic quantum computer, Lett. 103 (2009) 150502. URL: https://link.aps.org/ Proc. VLDB Endow. 9 (2016) 648β659. URL: https: doi/10.1103/PhysRevLett.103.150502. doi:10.1103/ //doi.org/10.14778/2947618.2947621. doi:10.14778/ PhysRevLett.103.150502 . 2947618.2947621 . [16] V. Uotila, J. Lu, D. Gawlick, Z. H. Liu, S. Das, [6] M. SchΓΆnberger, Applicability of quantum com- G. Pogossiants, Multicategory: Multi-model query puting on database query optimization, in: ACM processing meets category theory and functional SIGMOD Student Research Competition 2022 (un- programming, Proc. VLDB Endow. 14 (2021) dergraduate), 2022. 2663β2666. URL: https://doi.org/10.14778/3476311. [7] W. Liu, P. Gao, Z. Liu, H. Chen, M. Zhang, A 3476314. doi:10.14778/3476311.3476314 . quantum-based database query scheme for pri- [17] V. Uotila, J. Lu, A formal category theoretical frame- vacy preservation in cloud environment, Secu- work for multi-model data transformations, in: E. K. rity and Communication Networks 2019 (2019) Rezig, V. Gadepally, T. Mattson, M. Stonebraker, 1β14. URL: http://dx.doi.org/10.1155/2019/4923590. T. Kraska, F. Wang, G. Luo, J. Kong, A. Dubovit- doi:10.1155/2019/4923590 . skaya (Eds.), Heterogeneous Data Management, [8] V. Giovannetti, S. Lloyd, L. Maccone, Quantum pri- Polystores, and Analytics for Healthcare, Springer vate queries, Phys. Rev. Lett. 100 (2008) 230502. URL: International Publishing, Cham, 2021, pp. 14β28. https://link.aps.org/doi/10.1103/PhysRevLett.100. [18] E. Riehl, Category Theory in Context, Aurora: 230502. doi:10.1103/PhysRevLett.100.230502 . Dover Modern Math Originals, Dover Publications, [9] G. Alagic, J. Alperin-Sheriff, D. Apon, D. Cooper, 2017. URL: https://math.jhu.edu/~eriehl/context. Q. Dang, J. Kelsey, Y.-K. Liu, C. Miller, D. Moody, pdf. R. Peralta, R. Perlner, A. Robinson, D. Smith-Tone, [19] D. I. Spivak, Functorial data migration, 2013. Status Report on the Second Round of the NIST Post- arXiv:1009.1166 . Quantum Cryptography Standardization Process, [20] J. Lu, I. HolubovΓ‘, Multi-model databases: A new NIST Internal or Interagency Report (NISTIR) 8309, journey to handle the variety of data, ACM Com- 2020. URL: https://csrc.nist.gov/publications/detail/ put. Surv. 52 (2019). URL: https://doi.org/10.1145/ nistir/8309/final. doi:10.6028/NIST.IR.8309 . 3323214. doi:10.1145/3323214 . [10] A. Lucas, Ising formulations of many np problems, [21] B. Coecke, A. Kissinger, Picturing Quantum Pro- Frontiers in Physics 2 (2014). URL: http://dx.doi.org/ cesses: A First Course in Quantum Theory and 10.3389/fphy.2014.00005. doi:10.3389/fphy.2014. Diagrammatic Reasoning, Cambridge University 00005 . Press, 2017. doi:10.1017/9781316219317 . [11] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin, [22] R. Piedeleu, D. Kartsaklis, B. Coecke, M. Sadrzadeh, R. A. Lorie, T. G. Price, Access path selection in a Open system categorical quantum semantics in nat- relational database management system, in: Pro- ural language processing, arXiv:1502.00831 [cs, ceedings of the 1979 ACM SIGMOD International math] (2015). URL: http://arxiv.org/abs/1502.00831, Conference on Management of Data, SIGMOD β79, arXiv: 1502.00831. Association for Computing Machinery, New York, [23] J. Lambek, Type Grammar Revisited, volume 1582 NY, USA, 1979, p. 23β34. URL: https://doi.org/10. of Lecture Notes in Computer Science, Springer 1145/582095.582099. doi:10.1145/582095.582099 . Berlin Heidelberg, 1999, p. 1β27. URL: http:// [12] S. Krishnan, Z. Yang, K. Goldberg, J. Heller- link.springer.com/10.1007/3-540-48975-4_1. doi:10. stein, I. Stoica, Learning to optimize join 1007/3- 540- 48975- 4_1 . queries with deep reinforcement learning, 2019. [24] H. Lan, Z. Bao, Y. Peng, A survey on advanc- arXiv:1808.03196 . ing the dbms query optimizer: Cardinality esti- [13] A. Atserias, M. Grohe, D. Marx, Size bounds mation, cost model, and plan enumeration, Data and query plans for relational joins, in: Proceed- Science and Engineering 6 (2021) 86β101. doi:10. ings of the 2008 49th Annual IEEE Symposium 1007/s41019- 020- 00149- 7 . on Foundations of Computer Science, FOCS β08, IEEE Computer Society, USA, 2008, p. 739β748.