Development of a Parallel DBMS on the Basis of PostgreSQL ∗

                                                           c Constantin Pan
                                                      South Ural State University
                                                         kvapen@gmail.com

                                                   M.Sc. advisor: Mikhail Zymbler


                           Abstract                                     to support distributed query processing. Several limita-
                                                                        tions in PostgreSQL’s query engine and corresponding
     The paper describes the architecture and the                       query execution techniques to improve performance of
     design of PargreSQL parallel database man-                         distributed query processing are presented. ParGRES [9]
     agement system (DBMS) for distributed mem-                         is an open-source database cluster middleware for high
     ory multiprocessors. PargreSQL is based upon                       performance OLAP query processing. ParGRES exploits
     PostgreSQL open-source DBMS and exploits                           intra-query parallelism on PC clusters and uses adaptive
     partitioned parallelism.                                           virtual partitioning of the database. GParGRES [5] ex-
                                                                        ploits database replication and inter- and intra-query par-
                                                                        allelism to efficiently support OLAP queries in a grid.
1    Introduction                                                       The approach has two levels of query splitting: grid-
Currently open-source PostgreSQL DBMS [12] is a re-                     level splitting, implemented by GParGRES, and node-
liable alternative for commercial DBMSes. There are                     level splitting, implemented by ParGRES.
many both practical database applications based upon                        In [1] building a hybrid between MapReduce and par-
PostgreSQL and research projects devoted to extension                   allel database is explored. The authors created a proto-
and improvement of PostgreSQL.                                          type named HadoopDB on the basis of Hadoop and Post-
    One of the directions mentioned above is to adapt                   greSQL, that is as efficient as parallel DBMS, but as scal-
PostgreSQL for parallel query processing. In this paper                 able, fault tolerant and flexible as MapReduce systems.
we describe the architecture and design of PargreSQL                    PostgreSQL is used as the database layer and Hadoop as
parallel DBMS for analytical data processing on dis-                    the communication layer.
tributed multiprocessors. PargreSQL represents Post-                        Our contribution is embedding partitioned paral-
greSQL with embedded partitioned parallelism.                           lelism [2] into PostgreSQL. We use methods for parallel
    The paper is organized as follows. Section 2 briefly                query processing, proposed in [11] and [7].
discusses related work. Section 3 gives a description of
the PostgreSQL DBMS architecture. Section 4 intro-                      3   PostgreSQL Architecture
duces design principles and architecture of PargreSQL
DBMS. The results of experiments on the current partial                 PostgreSQL is based on the client-server model. A ses-
implementation are shown in section 5. Section 6 con-                   sion involves three processes into interaction: a frontend,
tains concluding remarks and directions for future work.                a backend and a daemon (see fig. 1).
                                                                                            connects     1
                                                                                                               Daemon
2    Related work                                                                          k                    1

The research on extension and improvement of Post-                                      Frontend                    <<create>>

greSQL DBMS includes the following.                                                   -user 1                   k

    In [10] native XML type support in PostgreSQL is                                                           Backend
                                                                                       queryexec         1
discussed. Adding data types to provide support of HL7                                             -executor

medical information exchange standard in PostgreSQL
is described in [4]. The authors of [3] propose an image-                           Figure 1: PostgreSQL processes
handling extension to PostgreSQL. In [8] an approach
                                                                            The daemon handles incoming connections from
to integration of PostgreSQL with the Semantic Web is
                                                                        frontends and creates a backend for each one. Each back-
presented.
                                                                        end executes queries received from the related frontend.
    There are papers investigating adoption of Post-                    The activity diagram of a PostgreSQL session is shown
greSQL for parallel query processing as well. In [6]                    in fig. 2.
authors introduce their work on extending PostgreSQL
                                                                            There are following steps of query processing in Post-
    ∗ This paper is supported by the Russian Foundation for Basic Re-   greSQL: parse, rewrite, plan/optimize, and execute.
search (grant No. 09-07-00241-a).                                           Respective PostgreSQL subsystems are depicted in
Proceedings of the Spring Researcher’s Colloquium on Database           fig. 3. Parser checks the syntax of the query string and
and Information Systems, Moscow, Russia, 2011                           builds a parse tree. Rewriter processes the tree according
             Frontend           Daemon                Backend                                              S.S_ID                   P0                   Result relation
                                                                                                               00
                                                                                                               ⋮                    S0
                                                                                                               09
              connect             accept                                                                       10
                                                                                                                                    P1


                                                                                                                    Distribution
                                                                                                               ⋮


                                                                                                                                                       Merging
                                                                                                               19
                                                                                                                                    S1
                                   fork
                                                                                                               ⋮                                                 ⋮
                                                                                                                                    ⋮
                                                                                                               90                               ⋮
            send query                               exec query            Partitioning function               ⋮                    P9
                                                                                                               99
                                                                                                                                    S9
            recv result                              send result


                                                                                     Figure 5: Parallel query processing
          [more
                    else
         queries]
                                                                   calculates the number of the processor node which this
                                                                   tuple should be placed at. A query is executed in parallel
               Figure 2: PostgreSQL session                        on all processor nodes as a set of parallel agents. Each
                                                                   agent processes its own fragment and generates a partial
                            PostgreSQL                             query result. The partial results are merged into the re-
                                                                   sulting relation.
                      Parser               Storage                    The architecture of PargreSQL, in contrast with Post-
                                                                   greSQL, assumes that a client connects to two or more
                                                                   servers (see fig. 6).
                     Rewriter             Executor
                                                                                    connects      n
                                                                                                           Daemon
                                           Planner                              k                              1
                                                                         par_Frontend                               <<create>>

                                  libpq                                   -user 1                              k

                                                                                                        par_Backend                              Backend
                       libpq-be           libpq-fe                         queryexec           n
                                                                                         -executor


                                                                                       Figure 6: PargreSQL processes
            Figure 3: PostgreSQL subsystems
                                                                      The interaction sequence is shown in fig. 7. As op-
to the rules specified by the user (e.g. view definitions).        posed to PostgreSQL there are many daemons running in
Planner creates an optimal execution plan for this query           PargreSQL. A frontend connects to each of them, sends
tree. Executor takes the execution plan and processes it           the same query to many backends, and receives the result
recursively from the root. Storage provides functions to           relation.
store and retrieve tuples and metadata.                                                        2.1: create()
                                                                        d1 : Daemon                             b1 : par_Backend

                                                                   1.1: connect()              3.1: sendquery()                    5.1: sendresult()
                                                                                                                                                            4.n: exchange()
           Server                              Client
                                                                                         f : par_Frontend
         Backend                           libpq-fe
                                                                   1.n: connect()                                                                           4.1: exchange()
                                                                                               3.n: sendquery()                    5.n: sendresult()
                           libpq-be
                                                      libpq-fe         dn : Daemon                              bn : par_Backend
                                                                                               2.n: create()
                                             app
                                                                    Figure 7: Interaction of PargreSQL clients and servers
            Figure 4: PostgreSQL deployment                           Parallel query processing in PargreSQL is done in
   libpq implements frontend-backend interaction proto-            more steps: parse, rewrite, plan/optimize, parallelize,
col and consists of two parts: the frontend (libpq-fe) and         execute, and balance. During the query execution each
the backend (libpq-be). The former is deployed on the              agent processes its own part of the relation independently
client side and serves as an API for the end-user applica-         so, to obtain the correct result, transfers of tuples are re-
tion. The latter is deployed on the server side and serves         quired. Parallelization stages creation of a parallel plan
as an API for libpq-fe, as shown in fig. 4.                        by inserting special exchange operators into the corre-
                                                                   sponding places of the plan. Balance provides load-
                                                                   balancing of the server nodes.
4   PargreSQL Architecture                                            PargreSQL subsystems are depicted in fig. 8. Post-
PargreSQL utilizes the idea of partitioned parallelism [7]         greSQL is one of them. PargreSQL development in-
as shown in fig. 5. This form of parallelism supposes par-         volves changes in Storage, Executor and Planner subsys-
titioning relations among the disks of the multiprocessor          tems of PostgreSQL.
system.                                                               The changes in the old code are needed to integrate it
    The way the partitioning is done is defined by a frag-         with the new subsystems. par Storage is responsible for
mentation function, which for each tuple of the relation           storing partitioning metadata of relations. par Exchange
                                 PargreSQL                                                      libpq-fe                                par_libpq-fe
               PostgreSQL
                                                                                                PGconn                                  par_PGconn
                                          <<use>>
      Parser               Storage                            par_Storage                  PQconnectdb()                         1   par_PQconnectdb()
                                                                                                         *
                                                                                           PQstatus()                                par_PQstatus()
      Rewriter            Executor        <<use>>
                                                             par_Exchange
                                                                                           PQexec()                                  par_PQexec()
                                          <<use>>                                          PQfinish()                                par_PQfinish()

                                                              par_Balancer
                                                                                                PGresult
                                          <<use>>
                           Planner                           par_Parallelizer

                                                                                                Figure 10: PargreSQL libpq-fe wrapper
                  libpq                                         par_libpq

       libpq-be           libpq-fe
                                          <<use>>
                                                              par_libpq-fe               #define PGconn par_PGconn
                                                                                         #define PQconnectdb(X) par_PQconnectdb()
                                                                                         #define PQfinish(X) par_PQfinish(X)
                                                              par_Compat                 #define PQstatus(X) par_PQstatus(X)
                                                                                         #define PQexec(X,Y) par_PQexec(X,Y)

                                                                                           Figure 11: PargreSQL compatibility macros
                Figure 8: PargreSQL subsystems
                                                                                 4.2      Exchange Operator Design
encapsulates the exchange operator implementation. Ex-
change operator is meant to compute the distribution                             Exchange operator [7, 11] serves to exchange tuples be-
function ψ for each tuple of the relation, send “alien”                          tween parallel agents. It is inserted into execution plans
tuples to the other nodes, and receive “own” tuples in                           by Parallelizer subsystem. The operator’s architecture is
response.                                                                        presented in fig. 12.
    There are however some new subsystems which do
not require any changes in the old code: par libpq-fe and                                               nge
                                                                                                      ha
par Compat. par libpq-fe is a wrapper around libpq-fe,
                                                                                                  c
                                                                                                ex


it is needed to propagate queries from an application to                                                            merge
many servers. par Compat makes this propagation trans-
parent to the application.                                                                             gather                            scatter


                                                                                                                        split
      Server                             Client
 par_Backend                         libpq-fe                par_libpq-fe
                      libpq-be                    libpq-fe
                                                                      libpq-fe
                                                                                               Figure 12: Exchange operator architecture
                                                                app
                                                                                    Fig. 13 shows new classes (grouped in par Exchange
                                                                                 package) that implement exchange operator.
                Figure 9: PargreSQL deployment
                                                                                                                 par_Exchange                          Executor
    The only difference of deployment schemes (see                                Exchange_Factory                                   <<entity>>    1
                                                                                                                                      par_Plan 1        Plan
fig. 9) is that there is one more component on the client                         +make_exchange()                  MPS               +frag_attr
side — the lipq-fe wrapper.                                                                                                                            +init()
                                                                                           1
                                                                                                                                                       +next()
                                                                                                                                                       +reset()
                                                                                        *               *              *               *
4.1    par libpq Design                                                                Split           Merge         Scatter          Gather

                                                                                   +init()            -even         -port            -port
par libpq subsystem consists of par libpq-fe library and                           +next()                          +isSending       -NULLcnt
                                                                                                      +init()
a set of macros (par Compat).                                                      +reset()           +next()       +init()          +init()
                                                                                                      +reset()      +next()          +next()
   par libpq-fe is a library that is linked into frontend                                                           +reset()         +reset()
applications instead of original PostgreSQL libpq-fe,
around which it is a wrapper. Its design is illustrated
with a class diagram in fig. 10.
   The idea is to use original libpq-fe for connecting to                                        Figure 13: Exchange operator classes
many servers simultaneously.
   par Compat is a set of C preprocessor definitions for                           MPS subsystem (Message Passing System) is used by
transparent usage of par libpq-fe. An example of what                            Scatter and Gather to transmit tuples. Its interface is like
these macros are is given in fig. 11.                                            MPI reduced to three methods: ISend, IRecv, and Test.
   Using these macros an application programmer can                              They are actually implemented on top of MPI.
switch from PostgreSQL to PargreSQL without global                                 Figs. 14, 15, 16, and 17 show algorithms for next()
changes in the application code.                                                 method of four exchange subnodes.
                                                                                                    5   Experimental Evaluation
                                  [right.isSending
                                                                    wait
                                            = TRUE]                                                 At the moment we have implemented par libpq and
                                                                                                    par Exchange subsystems of PargreSQL. The imple-
                   left.next                           right.next                                   mentation has been tested on the following query:
                                 [NULL]
                                              right.buffer := tuple                                 select * from tab where tab.col % 10000 = 0
                            [tuple]
                                                   [alien]
                            ψ                                                                           The query has been run against table tab consisting of
                                           [own]
                                                                                                    108 tuples. The speedup relative to PostgreSQL is shown
                                             tuple
                                                                                                    in fig. 18.
               Figure 14: Split.next() method                                                                       6
                                                                                                                           Linear
                                                                                                                           Actual
                                                                                                                    5
   Split is meant to calculate fragmentation function for
each tuple and choose whether to keep it on the processor


                                                                                                          Speedup
                                                                                                                    4
node or send it to other processor node.
                                                                                                                    3


                                                                                                                    2
                                          even := not even

                                 [even]                        [odd]                                                1
                                                                                                                  1            2    3           4   5   6
                                                                                                            (PostgreSQL)
                                                                                                                                        Nodes
                  right.next                                         left.next
                                                                                                                        Figure 18: PargreSQL speedup
              [tuple]                       else                else              [tuple]

                            [NULL]                                         [NULL]

      tuple        left.next                                         right.next             tuple
                                                                                                    6   Conclusion
              [tuple]                       else                else              [tuple]
                                                                                                    In this paper we have described the architecture and de-
                            [NULL]                                         [NULL]
                                                                                                    sign of PargreSQL parallel DBMS for distributed mem-
                                NULL                                          NULL
                                                                                                    ory multiprocessors. PargreSQL is based upon Post-
                                                                                                    greSQL open-source DBMS and exploits partitioned par-
              Figure 15: Merge.next() method
                                                                                                    allelism.
                                                                                                        There are following issues in our future research. We
   Merge merges tuples from Gather and Split.
                                                                                                    plan to complete the implementation and to investigate
                                                                                                    its speedup and scalability. The future research is also
                                                             [ok]      NULL
                            [isSending]
                                                                                                    going to be concentrated on implementing data updates,
                                                                    else
                                                     Test                     wait
                                                                                                    transactions and fault tolerance.
                         else                                 to everyone

                                 [NULL]
                                                   Isend(NULL)

                                                                                                    References
                        ψ                     isSending := FALSE
                                                                                                     [1] Azza Abouzeid, Kamil Bajda-Pawlikowski,
                   Isend(tuple, ψ)                              NULL                                     Daniel J. Abadi, Alexander Rasin, and Avi Sil-
                                                                                                         berschatz. HadoopDB: An Architectural Hybrid
                   isSending := TRUE                          NULL                                       of MapReduce and DBMS Technologies for
                                                                                                         Analytical Workloads. PVLDB, 2(1):922–933,
                                                                                                         2009.
              Figure 16: Scatter.next() method
                                                                                                     [2] David J. DeWitt and Jim Gray. Parallel Database
   Scatter sends tuples coming from Split to other pro-                                                  Systems: The Future of High Performance
cessor nodes.                                                                                            Database Systems. Commun. ACM, 35(6):85–98,
                                                          [all NULLs
                                                                              NULL
                                                                                                         1992.
                                                          gathered]
                                                   else
                                [tuple]
                                              Test            else
                                                                           wait                      [3] Denise Guliato, Ernani V. de Melo, Ran-
                        Irecv                      [NULL]                                                garaj M. Rangayyan, and Robson C. Soares.
                                           NULLcnt++                  Irecv                              POSTGRESQL-IE: An Image-handling Extension
                                tuple
                                                                                                         for PostgreSQL. J. Digital Imaging, 22(2):149–
                                                                                                         165, 2009.
              Figure 17: Gather.next() method
                                                                                                     [4] Yeb Havinga, Willem Dijkstra, and Ander de Kei-
   Gather does the opposite, receiving tuples from other                                                 jzer. Adding HL7 version 3 data types to Post-
processor nodes.                                                                                         greSQL. CoRR, abs/1003.3370, 2010.
 [5] Nelson Kotowski, Alexandre A. B. Lima, Esther
     Pacitti, Patrick Valduriez, and Marta Mattoso. Par-
     allel query processing for OLAP in grids. Concur-
     rency and Computation: Practice and Experience,
     20(17):2039–2048, 2008.
 [6] Rubao Lee and Minghong Zhou.            Extending
     PostgreSQL to Support Distributed/Heterogeneous
     Query Processing. In Kotagiri Ramamohanarao,
     P. Radha Krishna, Mukesh K. Mohania, and Ekawit
     Nantajeewarawat, editors, DASFAA, volume 4443
     of Lecture Notes in Computer Science, pages 1086–
     1097. Springer, 2007.
 [7] Andrey V. Lepikhov and Leonid B. Sokolinsky.
     Query processing in a DBMS for cluster systems.
     Programming and Computer Software, 36(4):205–
     215, 2010.
 [8] Dmitry V. Levshin and A. S. Markov. Algorithms
     for integrating PostgreSQL with the semantic web.
     Programming and Computer Software, 35(3):136–
     144, 2009.
 [9] Melissa Paes, Alexandre A. B. Lima, Patrick Val-
     duriez, and Marta Mattoso. High-Performance
     Query Processing of a Real-World OLAP Database
     with ParGRES. In José M. Laginha M. Palma,
     Patrick Amestoy, Michel J. Daydé, Marta Mattoso,
     and João Correia Lopes, editors, VECPAR, volume
     5336 of Lecture Notes in Computer Science, pages
     188–200. Springer, 2008.
[10] Nikolay Samokhvalov. XML Support in Post-
     greSQL.     In Sergei D. Kuznetsov, Andrey
     Fomichev, Boris Novikov, and Dmitry Sha-
     porenkov, editors, SYRCoDIS, volume 256 of
     CEUR Workshop Proceedings. CEUR-WS.org,
     2007.
[11] Leonid B. Sokolinsky. Organization of Paral-
     lel Query Processing in Multiprocessor Database
     Machines with Hierarchical Architecture. Pro-
     gramming and Computer Software, 27(6):297–308,
     2001.
[12] Michael Stonebraker and Greg Kemnitz. The
     POSTGRES next generation database management
     system. Commun. ACM, 34:78–92, October 1991.