Development of a Parallel DBMS on the Basis of PostgreSQL ∗ c Constantin Pan South Ural State University kvapen@gmail.com M.Sc. advisor: Mikhail Zymbler Abstract to support distributed query processing. Several limita- tions in PostgreSQL’s query engine and corresponding The paper describes the architecture and the query execution techniques to improve performance of design of PargreSQL parallel database man- distributed query processing are presented. ParGRES [9] agement system (DBMS) for distributed mem- is an open-source database cluster middleware for high ory multiprocessors. PargreSQL is based upon performance OLAP query processing. ParGRES exploits PostgreSQL open-source DBMS and exploits intra-query parallelism on PC clusters and uses adaptive partitioned parallelism. virtual partitioning of the database. GParGRES [5] ex- ploits database replication and inter- and intra-query par- allelism to efficiently support OLAP queries in a grid. 1 Introduction The approach has two levels of query splitting: grid- Currently open-source PostgreSQL DBMS [12] is a re- level splitting, implemented by GParGRES, and node- liable alternative for commercial DBMSes. There are level splitting, implemented by ParGRES. many both practical database applications based upon In [1] building a hybrid between MapReduce and par- PostgreSQL and research projects devoted to extension allel database is explored. The authors created a proto- and improvement of PostgreSQL. type named HadoopDB on the basis of Hadoop and Post- One of the directions mentioned above is to adapt greSQL, that is as efficient as parallel DBMS, but as scal- PostgreSQL for parallel query processing. In this paper able, fault tolerant and flexible as MapReduce systems. we describe the architecture and design of PargreSQL PostgreSQL is used as the database layer and Hadoop as parallel DBMS for analytical data processing on dis- the communication layer. tributed multiprocessors. PargreSQL represents Post- Our contribution is embedding partitioned paral- greSQL with embedded partitioned parallelism. lelism [2] into PostgreSQL. We use methods for parallel The paper is organized as follows. Section 2 briefly query processing, proposed in [11] and [7]. discusses related work. Section 3 gives a description of the PostgreSQL DBMS architecture. Section 4 intro- 3 PostgreSQL Architecture duces design principles and architecture of PargreSQL DBMS. The results of experiments on the current partial PostgreSQL is based on the client-server model. A ses- implementation are shown in section 5. Section 6 con- sion involves three processes into interaction: a frontend, tains concluding remarks and directions for future work. a backend and a daemon (see fig. 1). connects 1 Daemon 2 Related work k 1 The research on extension and improvement of Post- Frontend <> greSQL DBMS includes the following. -user 1 k In [10] native XML type support in PostgreSQL is Backend queryexec 1 discussed. Adding data types to provide support of HL7 -executor medical information exchange standard in PostgreSQL is described in [4]. The authors of [3] propose an image- Figure 1: PostgreSQL processes handling extension to PostgreSQL. In [8] an approach The daemon handles incoming connections from to integration of PostgreSQL with the Semantic Web is frontends and creates a backend for each one. Each back- presented. end executes queries received from the related frontend. There are papers investigating adoption of Post- The activity diagram of a PostgreSQL session is shown greSQL for parallel query processing as well. In [6] in fig. 2. authors introduce their work on extending PostgreSQL There are following steps of query processing in Post- ∗ This paper is supported by the Russian Foundation for Basic Re- greSQL: parse, rewrite, plan/optimize, and execute. search (grant No. 09-07-00241-a). Respective PostgreSQL subsystems are depicted in Proceedings of the Spring Researcher’s Colloquium on Database fig. 3. Parser checks the syntax of the query string and and Information Systems, Moscow, Russia, 2011 builds a parse tree. Rewriter processes the tree according Frontend Daemon Backend S.S_ID P0 Result relation 00 ⋮ S0 09 connect accept 10 P1 Distribution ⋮ Merging 19 S1 fork ⋮ ⋮ ⋮ 90 ⋮ send query exec query Partitioning function ⋮ P9 99 S9 recv result send result Figure 5: Parallel query processing [more else queries] calculates the number of the processor node which this tuple should be placed at. A query is executed in parallel Figure 2: PostgreSQL session on all processor nodes as a set of parallel agents. Each agent processes its own fragment and generates a partial PostgreSQL query result. The partial results are merged into the re- sulting relation. Parser Storage The architecture of PargreSQL, in contrast with Post- greSQL, assumes that a client connects to two or more servers (see fig. 6). Rewriter Executor connects n Daemon Planner k 1 par_Frontend <> libpq -user 1 k par_Backend Backend libpq-be libpq-fe queryexec n -executor Figure 6: PargreSQL processes Figure 3: PostgreSQL subsystems The interaction sequence is shown in fig. 7. As op- to the rules specified by the user (e.g. view definitions). posed to PostgreSQL there are many daemons running in Planner creates an optimal execution plan for this query PargreSQL. A frontend connects to each of them, sends tree. Executor takes the execution plan and processes it the same query to many backends, and receives the result recursively from the root. Storage provides functions to relation. store and retrieve tuples and metadata. 2.1: create() d1 : Daemon b1 : par_Backend 1.1: connect() 3.1: sendquery() 5.1: sendresult() 4.n: exchange() Server Client f : par_Frontend Backend libpq-fe 1.n: connect() 4.1: exchange() 3.n: sendquery() 5.n: sendresult() libpq-be libpq-fe dn : Daemon bn : par_Backend 2.n: create() app Figure 7: Interaction of PargreSQL clients and servers Figure 4: PostgreSQL deployment Parallel query processing in PargreSQL is done in libpq implements frontend-backend interaction proto- more steps: parse, rewrite, plan/optimize, parallelize, col and consists of two parts: the frontend (libpq-fe) and execute, and balance. During the query execution each the backend (libpq-be). The former is deployed on the agent processes its own part of the relation independently client side and serves as an API for the end-user applica- so, to obtain the correct result, transfers of tuples are re- tion. The latter is deployed on the server side and serves quired. Parallelization stages creation of a parallel plan as an API for libpq-fe, as shown in fig. 4. by inserting special exchange operators into the corre- sponding places of the plan. Balance provides load- balancing of the server nodes. 4 PargreSQL Architecture PargreSQL subsystems are depicted in fig. 8. Post- PargreSQL utilizes the idea of partitioned parallelism [7] greSQL is one of them. PargreSQL development in- as shown in fig. 5. This form of parallelism supposes par- volves changes in Storage, Executor and Planner subsys- titioning relations among the disks of the multiprocessor tems of PostgreSQL. system. The changes in the old code are needed to integrate it The way the partitioning is done is defined by a frag- with the new subsystems. par Storage is responsible for mentation function, which for each tuple of the relation storing partitioning metadata of relations. par Exchange PargreSQL libpq-fe par_libpq-fe PostgreSQL PGconn par_PGconn <> Parser Storage par_Storage PQconnectdb() 1 par_PQconnectdb() * PQstatus() par_PQstatus() Rewriter Executor <> par_Exchange PQexec() par_PQexec() <> PQfinish() par_PQfinish() par_Balancer PGresult <> Planner par_Parallelizer Figure 10: PargreSQL libpq-fe wrapper libpq par_libpq libpq-be libpq-fe <> par_libpq-fe #define PGconn par_PGconn #define PQconnectdb(X) par_PQconnectdb() #define PQfinish(X) par_PQfinish(X) par_Compat #define PQstatus(X) par_PQstatus(X) #define PQexec(X,Y) par_PQexec(X,Y) Figure 11: PargreSQL compatibility macros Figure 8: PargreSQL subsystems 4.2 Exchange Operator Design encapsulates the exchange operator implementation. Ex- change operator is meant to compute the distribution Exchange operator [7, 11] serves to exchange tuples be- function ψ for each tuple of the relation, send “alien” tween parallel agents. It is inserted into execution plans tuples to the other nodes, and receive “own” tuples in by Parallelizer subsystem. The operator’s architecture is response. presented in fig. 12. There are however some new subsystems which do not require any changes in the old code: par libpq-fe and nge ha par Compat. par libpq-fe is a wrapper around libpq-fe, c ex it is needed to propagate queries from an application to merge many servers. par Compat makes this propagation trans- parent to the application. gather scatter split Server Client par_Backend libpq-fe par_libpq-fe libpq-be libpq-fe libpq-fe Figure 12: Exchange operator architecture app Fig. 13 shows new classes (grouped in par Exchange package) that implement exchange operator. Figure 9: PargreSQL deployment par_Exchange Executor The only difference of deployment schemes (see Exchange_Factory <> 1 par_Plan 1 Plan fig. 9) is that there is one more component on the client +make_exchange() MPS +frag_attr side — the lipq-fe wrapper. +init() 1 +next() +reset() * * * * 4.1 par libpq Design Split Merge Scatter Gather +init() -even -port -port par libpq subsystem consists of par libpq-fe library and +next() +isSending -NULLcnt +init() a set of macros (par Compat). +reset() +next() +init() +init() +reset() +next() +next() par libpq-fe is a library that is linked into frontend +reset() +reset() applications instead of original PostgreSQL libpq-fe, around which it is a wrapper. Its design is illustrated with a class diagram in fig. 10. The idea is to use original libpq-fe for connecting to Figure 13: Exchange operator classes many servers simultaneously. par Compat is a set of C preprocessor definitions for MPS subsystem (Message Passing System) is used by transparent usage of par libpq-fe. An example of what Scatter and Gather to transmit tuples. Its interface is like these macros are is given in fig. 11. MPI reduced to three methods: ISend, IRecv, and Test. Using these macros an application programmer can They are actually implemented on top of MPI. switch from PostgreSQL to PargreSQL without global Figs. 14, 15, 16, and 17 show algorithms for next() changes in the application code. method of four exchange subnodes. 5 Experimental Evaluation [right.isSending wait = TRUE] At the moment we have implemented par libpq and par Exchange subsystems of PargreSQL. The imple- left.next right.next mentation has been tested on the following query: [NULL] right.buffer := tuple select * from tab where tab.col % 10000 = 0 [tuple] [alien] ψ The query has been run against table tab consisting of [own] 108 tuples. The speedup relative to PostgreSQL is shown tuple in fig. 18. Figure 14: Split.next() method 6 Linear Actual 5 Split is meant to calculate fragmentation function for each tuple and choose whether to keep it on the processor Speedup 4 node or send it to other processor node. 3 2 even := not even [even] [odd] 1 1 2 3 4 5 6 (PostgreSQL) Nodes right.next left.next Figure 18: PargreSQL speedup [tuple] else else [tuple] [NULL] [NULL] tuple left.next right.next tuple 6 Conclusion [tuple] else else [tuple] In this paper we have described the architecture and de- [NULL] [NULL] sign of PargreSQL parallel DBMS for distributed mem- NULL NULL ory multiprocessors. PargreSQL is based upon Post- greSQL open-source DBMS and exploits partitioned par- Figure 15: Merge.next() method allelism. There are following issues in our future research. We Merge merges tuples from Gather and Split. plan to complete the implementation and to investigate its speedup and scalability. The future research is also [ok] NULL [isSending] going to be concentrated on implementing data updates, else Test wait transactions and fault tolerance. else to everyone [NULL] Isend(NULL) References ψ isSending := FALSE [1] Azza Abouzeid, Kamil Bajda-Pawlikowski, Isend(tuple, ψ) NULL Daniel J. Abadi, Alexander Rasin, and Avi Sil- berschatz. HadoopDB: An Architectural Hybrid isSending := TRUE NULL of MapReduce and DBMS Technologies for Analytical Workloads. PVLDB, 2(1):922–933, 2009. Figure 16: Scatter.next() method [2] David J. DeWitt and Jim Gray. Parallel Database Scatter sends tuples coming from Split to other pro- Systems: The Future of High Performance cessor nodes. Database Systems. Commun. ACM, 35(6):85–98, [all NULLs NULL 1992. gathered] else [tuple] Test else wait [3] Denise Guliato, Ernani V. de Melo, Ran- Irecv [NULL] garaj M. Rangayyan, and Robson C. Soares. NULLcnt++ Irecv POSTGRESQL-IE: An Image-handling Extension tuple for PostgreSQL. J. Digital Imaging, 22(2):149– 165, 2009. Figure 17: Gather.next() method [4] Yeb Havinga, Willem Dijkstra, and Ander de Kei- Gather does the opposite, receiving tuples from other jzer. Adding HL7 version 3 data types to Post- processor nodes. greSQL. CoRR, abs/1003.3370, 2010. [5] Nelson Kotowski, Alexandre A. B. Lima, Esther Pacitti, Patrick Valduriez, and Marta Mattoso. Par- allel query processing for OLAP in grids. Concur- rency and Computation: Practice and Experience, 20(17):2039–2048, 2008. [6] Rubao Lee and Minghong Zhou. Extending PostgreSQL to Support Distributed/Heterogeneous Query Processing. In Kotagiri Ramamohanarao, P. Radha Krishna, Mukesh K. Mohania, and Ekawit Nantajeewarawat, editors, DASFAA, volume 4443 of Lecture Notes in Computer Science, pages 1086– 1097. Springer, 2007. [7] Andrey V. Lepikhov and Leonid B. Sokolinsky. Query processing in a DBMS for cluster systems. Programming and Computer Software, 36(4):205– 215, 2010. [8] Dmitry V. Levshin and A. S. Markov. Algorithms for integrating PostgreSQL with the semantic web. Programming and Computer Software, 35(3):136– 144, 2009. [9] Melissa Paes, Alexandre A. B. Lima, Patrick Val- duriez, and Marta Mattoso. High-Performance Query Processing of a Real-World OLAP Database with ParGRES. In José M. Laginha M. Palma, Patrick Amestoy, Michel J. Daydé, Marta Mattoso, and João Correia Lopes, editors, VECPAR, volume 5336 of Lecture Notes in Computer Science, pages 188–200. Springer, 2008. [10] Nikolay Samokhvalov. XML Support in Post- greSQL. In Sergei D. Kuznetsov, Andrey Fomichev, Boris Novikov, and Dmitry Sha- porenkov, editors, SYRCoDIS, volume 256 of CEUR Workshop Proceedings. CEUR-WS.org, 2007. [11] Leonid B. Sokolinsky. Organization of Paral- lel Query Processing in Multiprocessor Database Machines with Hierarchical Architecture. Pro- gramming and Computer Software, 27(6):297–308, 2001. [12] Michael Stonebraker and Greg Kemnitz. The POSTGRES next generation database management system. Commun. ACM, 34:78–92, October 1991.