=Paper= {{Paper |id=Vol-3186/paper1 |storemode=property |title=Synergy between Quantum Computers and Databases |pdfUrl=https://ceur-ws.org/Vol-3186/paper_1.pdf |volume=Vol-3186 |authors=Valter Uotila |dblpUrl=https://dblp.org/rec/conf/vldb/Uotila22 }} ==Synergy between Quantum Computers and Databases== https://ceur-ws.org/Vol-3186/paper_1.pdf
Synergy between Quantum Computers and Databases
Valter Uotila
supervised by Prof. Jiaheng Lu and Prof. Jukka K. Nurminen
University of Helsinki, Finland


                                    Abstract
                                    Academia, industry, and societies are showing increasing interest in the possibilities of quantum computing. The research
                                    in the intersection of quantum computing and databases is still in its initial steps. This work represents several crucial
                                    data management and query processing problems that will benefit from quantum computing. We outline how quantum
                                    computing will tackle these challenges and what kind of outcomes and speed-ups we expect. We discuss the position of
                                    quantum computing in data management and raise awareness of possible security threats in encryption. We aim to be realistic
                                    and point out technical difficulties that currently restrict implementations.


1. Introduction                                                                             tion [5]. The applicability of quantum computing on
                                                                                            database query optimization is shortly outlined in [6].
There are multiple different computing paradigms be-                                           Most of the discussion considers security issues [7, 8]
sides conventional CPU-based computing. Nowadays,                                           which are also the most urgent questions. The National
the most exciting computing paradigm is quantum com-                                        Institute of Standards and Technology (NIST) has started
puting. It is based on quantum mechanics [1] although                                       the post-quantum cryptography standardization process
the modern quantum computing software [2, 3] can be                                         [9] because quantum computers will break RSA public-
used with almost no knowledge of quantum physics.                                           key cryptosystem in the future.
   The quantum computers differ in their hardware. The                                         The future collapse of RSA and other classical encryp-
most common hardware implementations are super-                                             tion methods should also concern the database commu-
conducting (IBM, Google, Rigetti), photonic (Xanadu),                                       nity. Databases and cloud infrastructures will need to
trapped ion (IonQ, Honeywell), adiabatic (D-wave), and                                      implement quantum-safe encryption methods. Never-
silicon spin qubits (Intel, HRL). Amazon Braket, IBM                                        theless, that does not mean that future security systems
Quantum, Xanadu, and D-wave Leap offer access to quan-                                      must be built on quantum computers. NIST evaluates
tum computers and simulators in the cloud. The wide                                         encryption methods based on mathematical construc-
variety of hardware types shows that none of the types                                      tions considered too hard to be solved even on quantum
has yet become standard, and the competition between                                        computers.
the quantum hardware companies is still going on. The
future will show which quantum computing hardware
type will become dominant.                                                                  2. Universal quantum circuit
   Quantum computers will not take over classical com-                                         model and quantum annealing
puting. Instead, they will be computing units, like GPU
processors or supercomputers, along with classical com-                                     We can roughly divide the quantum computing field into
puters and databases. We can send them specific and                                         two parts depending on the hardware. These are NISQ
computationally complex problems. Thus the hybrid                                           (Noisy Intermediate-Scale Quantum) era hardware and
approach will be the most realistic option for practical                                    quantum annealers. Algorithms for NISQ devices are
quantum computing.                                                                          implemented with the universal quantum circuit model.
                                                                                              The classical introduction to quantum circuit model-
1.1. Related work                                                                           based quantum computing is [1]. Classical computing is
                                                                                            based on bits, 0 and 1, whereas quantum computing is
There is relatively little research that applies quantum                                    based on qubits
computing to databases and data management. Recent
database research has applied quantum computing in                                                                           πœ‘ = 𝛼|0⟩ + 𝛽|1⟩,
transaction scheduling [4], and multiple query optimiza- where 𝛼, 𝛽 ∈ C are complex numbers whose norms sat-
                                                                                                      isfy |𝛼|2 + |𝛽|2 = 1. The numbers 𝛼 and 𝛽 are called
PhD Workshop of the 48th International Conference on Very Large                                       probability amplitudes. The requirement |𝛼|2 + |𝛽|2 = 1
Databases (VLDB 2022)                                                                                 can be interpreted probabilistic way: the outcome |0⟩ is
Envelope-Open valter.uotila@helsinki (V. Uotila)                                                      determined by the amplitude 𝛼 so that the probability of
                     Β© 2022 Proceedings of the VLDB 2022 PhD Workshop, September 5, 2022. Sydney,
                     Australia. Copyright (C) 2022 for this paper by its authors. Use permitted under
                     Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                                                                                      measuring |0⟩ is |𝛼|2 . The probability of obtaining either
 CEUR
 Workshop
 Proceedings         CEUR Workshop Proceedings (CEUR-WS.org)
               http://ceur-ws.org
               ISSN 1613-0073
                                                                                                      |0⟩ or |1⟩ must be 1.
   In general, a quantum system consists of multiple
qubits. The system usually starts from the state |00 β‹― 0⟩.
The quantum computation proceeds by applying quan-
tum logic gates to the system. Mathematically the gates
are unitary matrices whose entries are complex numbers.
   Finally, the system is measured and ends up in a state
determined by the probabilities associated with the am-
plitudes. The measured outcome is the result of the algo-
rithm, a sequence of classical bits.
   Quantum annealing is similar to simulated annealing
but performed on a quantum computer utilizing certain
quantum mechanical phenomena. Quantum annealing               Figure 1: An example of the possibilities of joining three
                                                              relations 𝑅1 , 𝑅2 and 𝑅3 .
is a heuristic method. To solve problems with quan-
tum annealers, the problems are formulated in either
the quadratic unconstrained binary optimization (QUBO)
model or the Ising model, which are equivalent. As the             Figure 1 represents the join order problem for three
formulations in [10] indicate, we can express many fun-        relations 𝑅1 , 𝑅2 and 𝑅3 . The problem fast becomes in-
damental and NP-hard computer science problems using           feasible to solve exactly. Each edge carries a weight that
Ising and QUBO models. That shows strong evidence              describes the cost of joining the relations. The solution
that many NP-hard problems in the database field can be        to the problem is a join order tree with a minimum total
solved using quantum computers.                                weight over all the join order trees. The tree must contain
   The comprehensive introduction on formulating prob-         the single relations and the relation 𝑅1 β‹ˆ 𝑅2 β‹ˆ 𝑅3 .
lems using Ising and QUBO models is [10]. Let π‘₯𝑖 for               When we express the problem with the QUBO model,
𝑖 = 1, … , 𝑛 be binary variables. Then the objective func-     we define the binary variables as the graph’s vertices in
tion is                                                        Figure 1. The so-called couplers i.e. the variable pairs
                       𝑛   𝑛            𝑛
                                                               π‘₯𝑖 π‘₯𝑗 in objective function (1) describe the edges. The
              𝑓 (π‘₯) = βˆ‘ βˆ‘ π‘Žπ‘–,𝑗 π‘₯𝑖 π‘₯𝑗 + βˆ‘ 𝑏𝑖 π‘₯𝑖 ,           (1)
                       𝑖=1 𝑗=𝑖+1           𝑖=1
                                                               coefficients 𝑐𝑖,𝑗 in objective function (1) are determined
                                                               by the cost function. The cost function estimates the
where π‘₯𝑖 ∈ {0, 1} and π‘Žπ‘–,𝑗 , 𝑏𝑖 ∈ R for 1 ≀ 𝑖, 𝑗 ≀ 𝑛. where cost of joining the relations. The solution’s quality also
coefficients 𝑐𝑖,𝑗 ∈ R. The main goal of quantum annealing depends on how accurately the cost function can estimate
is to find a point π‘₯ βˆ— such that 𝑓 (π‘₯ βˆ— ) is a global minimum the joins. We impose constraints so that the result of the
of the objective function. The leading company building minimization problem is a valid join order tree whose
quantum annealers is D-wave.                                   total weight over the selected edges is the minimum. In
                                                               the final model, we need auxiliary variables to produce
                                                               correct results. We are also planning to utilize a hybrid
3. Research directions                                         approach since the number of variables and connectivity
Although various optimization methods have been cen- is large in the problem.
tral topics of database research since the beginning, there
is still much room for improvement. This section out- 3.2. Size bound of conjunctive queries
lines research directions on optimizing databases with
quantum computing.                                             Determining the size bound for conjunctive relational
                                                               queries is another classical optimization problem for re-
                                                               lational databases. The problem is defined and mostly
3.1. Join order optimization                                   solved in [13, 14]. The most complicated and general case
We propose a research direction to study the join order        with   arbitrary functional dependencies stays open since
selection, a classical database optimization problem. The it is strongly related to an unsolved and hard problem
problem is relatively well-studied [11]. The classical so- related to information-theoretical entropy vectors.
lution to the problem uses dynamic programming. The                More precisely the problem is to determine an ex-
current research has tackled the problem with deep rein- ponent 𝑠(𝑄) for a conjunctive query 𝑄 = 𝑅0 (𝑒0 ) ←
forcement learning [12].                                       𝑅1 (𝑒1 ) ∧ β‹― ∧ 𝑅𝑛 (π‘’π‘š ) in a database 𝐷 which is given as
   These kinds of combinatorically hard problems are a sequence of relations 𝑅1 , … , 𝑅𝑛 . The size bound for the
the best match for quantum computing. Studying the query 𝑄 in the database 𝐷 is then
problem using quantum computing, we aim to speed up
                                                                                |𝑄(𝐷)| ≀ rmax(𝐷)𝑠(𝑄) ,
the computation and improve the quality of the solution
by extending the search space.                                where rmax(𝐷) is the largest relation in the database
𝐷. The value 𝑠(𝑄) is a solution to a linear program that        parameterized circuits for SQL queries will be trained
maximizes the entropy of the variable 𝑒0 concerning cer-        to estimate the query execution time, result’s size, and
tain information theoretic constraints. The constraints         cost. We are studying if the circuits will perform faster
arise from the functional dependencies in the database 𝐷.       than the current estimation methods [24] and if they will
Unfortunately, to obtain the optimal solution, the linear       provide better-quality estimations.
program has infinitely many constraints and becomes                Natural continuation for this work is also express
infeasible. Since the solutions to the size bounds are rep-     graph and document query languages with their diagram-
resented as linear programs, there is already a known           matic representations and study query transformations
quantum algorithm [15] to solve them.                           between them. We have worked on data transforma-
   We believe that quantum computing will provide new           tions [17] between relational, graph, and tree data mod-
aspects to modeling the problem. Because the problem            els. Query transformations are the next step, and they
is probabilistic and classically very hard, the quantum         are an essential part of multi-model databases. As cat-
computing approach might produce good heuristic re-             egory theory creates a consistent connection between
sults and help understand the problem from a different          schema and data [19], it will also create a solid connection
angle. It is interesting to transfer the current solutions      between queries and instances. Because the query trans-
into a quantum computing format and research possible           formation depends on the database schema and instance,
speed-ups and quality improvements.                             category theory will be a suitable theoretical framework
                                                                for defining the transformations.
3.3. Category theoretical methods for
     quantum computing and databases                            4. Conclusion
In our previous research [16, 17] we have developed a           Academia, industry, and societies show growing interest
conceptual framework to unify and model multi-model             in the possibilities of quantum computing. The opti-
databases using category theory [18]. Our work has been         mization of data management and query processing will
a continuation of the research, which proves that rela-         benefit from quantum computing frameworks. Although
tional databases are naturally category theoretical [19].       quantum computers are at an early stage, developing
Category theory is a meta-mathematical field of mathe-          algorithms to tackle the presented challenges is essential.
matics, but it also has a lot of concrete applications, which   The database community should be aware of the secu-
are represented, for example, at the annual Applied Cate-       rity threats to the encryption that quantum computing
gory Theory conference. Multi-model databases [20] are          eventually will create.
a particular class of databases whose single, integrated           This work proposes multiple research directions
backend supports multiple data models and formats. In           to study join order optimization, determine the size
modern data management systems, it is crucial that we           bound of conjunctive queries, transform queries into
can handle a wide variety of data models seamlessly.            parametrized circuits and perform query transformations
   In [21] authors develop a comprehensive diagrammatic         using diagrammatic formalism. Quantum computers will
theory to model quantum processes and quantum com-              solve countless problems in database research, and we
puting. The approach is based on category theory. Since         are excited to be part of this new research area.
our previous research on multi-model databases [16, 17]
is also based on category theory, this approach appears
promising to connect the fields theoretically. Category         References
theory as a high-level theoretical framework is needed
since quantum computers are growing, and the gate-               [1] M. A. Nielsen, I. L. Chuang, Quantum Compu-
level design and coding of the algorithms are becoming               tation and Quantum Information: 10th Anniver-
harder. We need rigorous theoretical tools to model and              sary Edition, Cambridge University Press, 2010.
design higher-level quantum computing systems, and                   doi:10.1017/CBO9780511976667 .
the category theory-based approach seems to tackle the           [2] M. Treinish, J. Gambetta, P. N. et al., Qiskit/qiskit:
challenge well.                                                      Qiskit 0.34.2, 2022. URL: https://doi.org/10.5281/
   In [21, 22] authors use the category theory-based                 zenodo.6027041. doi:10.5281/zenodo.6027041 .
ideas to develop quantum natural language processing             [3] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, M. S.
(QNLP) based on Lambek’s pregroup grammars [23] and                  Alam, S. Ahmed, J. M. Arrazola, C. Blank, A. Del-
parametrized quantum circuits. We are working on ap-                 gado, S. Jahangiri, K. McKiernan, J. J. Meyer, Z. Niu,
plying these diagrammatic and category theory-based                  A. SzΓ‘va, N. Killoran, Pennylane: Automatic dif-
methods to represent SQL queries as parameterized quan-              ferentiation of hybrid quantum-classical computa-
tum circuits. SQL is based on context-free grammar,                  tions, 2020. arXiv:1811.04968 .
equivalent to pregroup grammar in a certain sense. The
 [4] T. Bittner, S. Groppe, Avoiding blocking by schedul-          URL: https://doi.org/10.1109/FOCS.2008.43. doi:10.
     ing transactions using quantum annealing, in:                 1109/FOCS.2008.43 .
     Proceedings of the 24th Symposium on Interna-            [14] G. Gottlob, S. T. Lee, G. Valiant, P. Valiant, Size
     tional Database Engineering & Applications, IDEAS             and treewidth bounds for conjunctive queries,
     ’20, Association for Computing Machinery, New                 J. ACM 59 (2012). URL: https://doi.org/10.1145/
     York, NY, USA, 2020. URL: https://doi.org/10.1145/            2220357.2220363. doi:10.1145/2220357.2220363 .
     3410566.3410593. doi:10.1145/3410566.3410593 .           [15] A. W. Harrow, A. Hassidim, S. Lloyd, Quantum
 [5] I. Trummer, C. Koch, Multiple query optimiza-                 algorithm for linear systems of equations, Phys. Rev.
     tion on the d-wave 2x adiabatic quantum computer,             Lett. 103 (2009) 150502. URL: https://link.aps.org/
     Proc. VLDB Endow. 9 (2016) 648–659. URL: https:               doi/10.1103/PhysRevLett.103.150502. doi:10.1103/
     //doi.org/10.14778/2947618.2947621. doi:10.14778/             PhysRevLett.103.150502 .
     2947618.2947621 .                                        [16] V. Uotila, J. Lu, D. Gawlick, Z. H. Liu, S. Das,
 [6] M. SchΓΆnberger, Applicability of quantum com-                 G. Pogossiants, Multicategory: Multi-model query
     puting on database query optimization, in: ACM                processing meets category theory and functional
     SIGMOD Student Research Competition 2022 (un-                 programming, Proc. VLDB Endow. 14 (2021)
     dergraduate), 2022.                                           2663–2666. URL: https://doi.org/10.14778/3476311.
 [7] W. Liu, P. Gao, Z. Liu, H. Chen, M. Zhang, A                  3476314. doi:10.14778/3476311.3476314 .
     quantum-based database query scheme for pri-             [17] V. Uotila, J. Lu, A formal category theoretical frame-
     vacy preservation in cloud environment, Secu-                 work for multi-model data transformations, in: E. K.
     rity and Communication Networks 2019 (2019)                   Rezig, V. Gadepally, T. Mattson, M. Stonebraker,
     1–14. URL: http://dx.doi.org/10.1155/2019/4923590.            T. Kraska, F. Wang, G. Luo, J. Kong, A. Dubovit-
     doi:10.1155/2019/4923590 .                                    skaya (Eds.), Heterogeneous Data Management,
 [8] V. Giovannetti, S. Lloyd, L. Maccone, Quantum pri-            Polystores, and Analytics for Healthcare, Springer
     vate queries, Phys. Rev. Lett. 100 (2008) 230502. URL:        International Publishing, Cham, 2021, pp. 14–28.
     https://link.aps.org/doi/10.1103/PhysRevLett.100.        [18] E. Riehl, Category Theory in Context, Aurora:
     230502. doi:10.1103/PhysRevLett.100.230502 .                  Dover Modern Math Originals, Dover Publications,
 [9] G. Alagic, J. Alperin-Sheriff, D. Apon, D. Cooper,            2017. URL: https://math.jhu.edu/~eriehl/context.
     Q. Dang, J. Kelsey, Y.-K. Liu, C. Miller, D. Moody,           pdf.
     R. Peralta, R. Perlner, A. Robinson, D. Smith-Tone,      [19] D. I. Spivak, Functorial data migration, 2013.
     Status Report on the Second Round of the NIST Post-           arXiv:1009.1166 .
     Quantum Cryptography Standardization Process,            [20] J. Lu, I. HolubovΓ‘, Multi-model databases: A new
     NIST Internal or Interagency Report (NISTIR) 8309,            journey to handle the variety of data, ACM Com-
     2020. URL: https://csrc.nist.gov/publications/detail/         put. Surv. 52 (2019). URL: https://doi.org/10.1145/
     nistir/8309/final. doi:10.6028/NIST.IR.8309 .                 3323214. doi:10.1145/3323214 .
[10] A. Lucas, Ising formulations of many np problems,        [21] B. Coecke, A. Kissinger, Picturing Quantum Pro-
     Frontiers in Physics 2 (2014). URL: http://dx.doi.org/        cesses: A First Course in Quantum Theory and
     10.3389/fphy.2014.00005. doi:10.3389/fphy.2014.               Diagrammatic Reasoning, Cambridge University
     00005 .                                                       Press, 2017. doi:10.1017/9781316219317 .
[11] P. G. Selinger, M. M. Astrahan, D. D. Chamberlin,        [22] R. Piedeleu, D. Kartsaklis, B. Coecke, M. Sadrzadeh,
     R. A. Lorie, T. G. Price, Access path selection in a          Open system categorical quantum semantics in nat-
     relational database management system, in: Pro-               ural language processing, arXiv:1502.00831 [cs,
     ceedings of the 1979 ACM SIGMOD International                 math] (2015). URL: http://arxiv.org/abs/1502.00831,
     Conference on Management of Data, SIGMOD ’79,                 arXiv: 1502.00831.
     Association for Computing Machinery, New York,           [23] J. Lambek, Type Grammar Revisited, volume 1582
     NY, USA, 1979, p. 23–34. URL: https://doi.org/10.             of Lecture Notes in Computer Science, Springer
     1145/582095.582099. doi:10.1145/582095.582099 .               Berlin Heidelberg, 1999, p. 1–27. URL: http://
[12] S. Krishnan, Z. Yang, K. Goldberg, J. Heller-                 link.springer.com/10.1007/3-540-48975-4_1. doi:10.
     stein, I. Stoica, Learning to optimize join                   1007/3- 540- 48975- 4_1 .
     queries with deep reinforcement learning, 2019.          [24] H. Lan, Z. Bao, Y. Peng, A survey on advanc-
     arXiv:1808.03196 .                                            ing the dbms query optimizer: Cardinality esti-
[13] A. Atserias, M. Grohe, D. Marx, Size bounds                   mation, cost model, and plan enumeration, Data
     and query plans for relational joins, in: Proceed-            Science and Engineering 6 (2021) 86–101. doi:10.
     ings of the 2008 49th Annual IEEE Symposium                   1007/s41019- 020- 00149- 7 .
     on Foundations of Computer Science, FOCS ’08,
     IEEE Computer Society, USA, 2008, p. 739–748.