Multi-Core Allocation Model for Database Systems

                                                           Simone Dominico
                                            Supervised by Eduardo Cunha de Almeida
                                               Federal University of Paraná, Brazil
                                                     sdominico@inf.ufpr.br


                                                                                                                                    Threads                                                      Core
ABSTRACT                                                                                                                                                                                        Number


                                                                                                                                                                                          T20
                                                                                                                                                      T14


                                                                                                                                                                              T18
                                                                                                                                                            T15
                                                                                                                                          T12


                                                                                                                                                                        T17
                                                                                                                                    T11


                                                                                                                                                                                    T19
                                                                                                                                                T13


                                                                                                                                                                  T16
                                                                                                                              T10
                                                                                                          T6
                                                                                                     T5


                                                                                                               T7


                                                                                                                         T9
                                                                                                                                                                                                  15


                                                                                                                    T8
Non-Uniform Memory Access (NUMA) architecture provides                                          0                                                                                                 14
a multi-task run in parallel with different access latency be-                                                                                                                                    13


                                                                              Timestamp (ms)
                                                                                               50                                                                                                 12
tween the nodes. The emergence of multi-core hardware                                                                                                                                             11
offers high processing power for multi-threaded Database                                                                                                                                          10
                                                                                               100                                                                                                9
Management Systems (DBMS). However, database threads                                                                                                                                              8
                                                                                               150                                                                                                7
run across multiple NUMA cores without exploring the hard-                                                                                                                                        6
ware to its full potential. The impact of data movement be-                                    200                                                                                                5
                                                                                                                                                                                                  4
tween NUMA nodes is a major challenge for multi-threaded                                                  OS/MonetDB                                                                              3
                                                                                               250                                                                                                2
DBMS. The goal of our thesis is to find out the efficient dis-                                                                                                                                    1
tribution of CPU-cores among the NUMA nodes in order to                                                                                                                                           0
mitigate the data movement. We propose an abstract core
allocation mechanism for query processing based on perfor-                    Figure 1: Evaluating the migration between cores of
mance metrics. In our preliminary results, our mechanism                      the threads generated by the TPC-H Q6 in a single-
is able to improve data traffic ratio between nodes in up to                  client execution in a 4x Quad-Core NUMA machine.
3.87x with increased memory throughput in up to 27%.

1.    INTRODUCTION                                                            threads to as many nodes and cores as possible to keep load
   The burgeoning ingestion of data requires new hardware                     balance, not considering the different access latencies in the
to allow real-time data analysis. In our thesis we focus on                   NUMA architecture. The result is the constant migration
Non-Uniform Memory Access (NUMA) hardware to boost                            of threads all over the nodes. Figure 1 shows the execution
throughput of multi-threaded Database Management Sys-                         of the TPC-H query 6 (Q6) in MonetDB: a multi-threaded
tems (DBMS) in Online Analytical Processing (OLAP). The                       DBMS. Our NUMA architecture consists of 4-nodes with
NUMA architecture is formed by multi-core nodes with pro-                     a Quad-Core AMD Opteron 8387 each. The OS tries to
cessors (or CPUs) attached to a memory bank. The memory                       keep the load balancing between the cores causing many
access latency varies according to the distance between the                   migration of threads with more data movement between the
node and the memory being accessed.                                           NUMA nodes. When running OLAP workloads, the current
   The varying memory access latency of NUMA impacts                          load balancing approach can cause many problems: resource
the performance of multi-threaded query processing. We                        contention, cache conflicts, cache invalidation and inefficient
observed the impact of NUMA in multi-threaded DBMS                            data movement.
executing the Volcano query parallelism model [4], like Mi-                      Many authors have investigated these problems by try-
crosoft SQLServer, and the materialization model, like Mon-                   ing to control the scheduling of threads and data movement
etDB. For instance, in the Volcano model, the parallelism                     in NUMA nodes [8, 3, 11, 5, 12, 7, 13] from within the
is encapsulated in “exchange” operators in the query exe-                     DBMS kernel. Our thesis follows a different direction where
cution: multiples threads execute the query plan and the                      we manage the allocation of NUMA cores from an abstract
Operating System (OS) scheduler is in charge of manag-                        model-based mechanism, aiming to accommodate the cur-
ing data and thread locality. In both models, however, the                    rent workload and mitigate the data movement. Abstract
conventional approach is letting the OS do the mapping of                     models allow supporting several underlying systems with-
                                                                              out modifications in the source-code. Besides, an abstract
                                                                              model allows different resources to be monitored, for exam-
                                                                              ple: memory throughput and CPU load. Our mechanism in
                                                                              a nutshell: a dynamic mechanism computes the local opti-
                                                                              mum number of cores with a rule-condition-action pipeline
                                                                              integrated with performance monitoring. This pipeline de-
                                                                              fines the performance-invariant through performance thresh-
Proceedings of the VLDB 2018 Ph.D. Workshop, August 27, 2018. Rio de          olds. If this performance-invariant holds, the local optimum
Janeiro, Brazil.
Copyright (C) 2018 for this paper by its authors. Copying permitted for       number of cores is set. The main challenges to present our
private and academic purposes.                                                mechanism are, as follows:


                                                                          1
                                                                                S0        S1              S0       S1           S0       S1              S0       S1
     1. Provide an elastic multi-core allocation mechanism for
                                                                            0    1    4    5          0    1   4    5       0    1   4    5          0    1   4    5
        the NUMA architecture.
                                                                            2    3    6        7      2    3   6        7   2    3   6        7      2    3   6        7


     2. Define an optimum number of cores based on the use                  8    9    12 13           8    9   12 13        8    9   12 13           8    9   12 13

        of computational resources to meet a workload.                      10 11     14 15           10 11    14 15        10 11    14 15           10 11    14 15
                                                                                S2        S3              S2       S3           S2       S3              S2       S3

     3. Manage the allocation of cores for the execution of                                        time                                           time
        mixed workloads.                                                         (a) Sparse Mode                                 (b) Dense Mode
  In this paper, we discuss our approach to tackle these
three challenges, the contribution already published and cur-         Figure 2: The Sparse and Dense modes over time.
rent work in progress.                                                Only the black boxes (i.e., cores) can be accessed by
                                                                      the OS. The red boxes are the next cores to allocate.

2.     RELATED WORK
                                                                      current workload. We define that the allocation of CPU-
   Over the last years, many different approaches were pre-
                                                                      cores should be performed dynamically to satisfy the de-
sented to improve the performance of DBMS multi-core par-
                                                                      mands of the workload and facilitate the OS thread schedul-
allelism in NUMA architecture. [10] presents a study of the
                                                                      ing with the least possible negative impact on performance.
NUMA effect when assigning threads to NUMA nodes in
                                                                      Our key idea is to consider data placement statistics from
OLTP. The authors present an approach of “hardware is-
                                                                      the OS side to define on which node the cores will be allo-
lands” to execute different OLTP deployments. They im-
                                                                      cated.
plement a prototype within the Shore-MT storage man-
ager that achieved robust OLTP throughput. However, the
“hardware islands” do not scale-out and the optimum size
                                                                      3.1       Elastic Multi-core Allocation Mechanism
of the island is yet undetermined.                                      The full description of our multi-core allocation mecha-
    [8] presents a NUMA-aware ring-based algorithm to coor-           nism is presented in [2]. Our mechanism is based on Predi-
dinate the movement of threads to catch up with data. The             cate/Transition PetriNets (PrT). Petri Nets are powerful ab-
algorithm uses two rings: an inner ring to represent the data         stract models to design concurrent and distributed systems.
partitions and an outer ring to represent the threads. The            Our mechanism leverages Petri Nets to model general prop-
outer ring rotates clockwise for all threads to access specific       erties of DBMSs up against concurrent OLAP workloads.
data partitions. The goal is to improve data placement and            Using the abstract model we have the flexibility to check
thread placement. In [3], NUMA cores are allocated one by             any resource usage by the database threads. The mecha-
one to mitigate access to remote memory when the OS tries             nism monitors the resource usage of the worker threads on
to keep data locality of MonetDB. Other storage and worker            top of OS kernel facilities to decide for the allocation of CPU
placement approaches are presented to mitigate the NUMA               cores (e.g., cgroups, mpstat, numactl, likwid).
effect in SAP Hana [11] or to build the ERIS storage engine
from scratch [5].                                                     3.2       The Allocation of Cores
   All of these contributions show important efforts into mak-          The challenge of the mechanism is to prevent both under-
ing the DBMSs explore multi-core CPUs to their full poten-            utilization or overutilization of the system by finding out the
tial, but they differ from our goal as they allocate NUMA             local optimum number of cores (LONC) to accommodate a
cores statically to perform data and worker placement strate-         given workload. In this section, we define the LONC and
gies. Recently, [12] presented an adaptive NUMA-aware                 also the allocation modes explored by the mechanism.
data placement mechanism. The mechanism decides be-
tween data placement and thread stealing when load im-
balance is detected. The number of working threads changes            3.2.1          The multi-core allocation modes
based on the core utilization, which differs from MonetDB [3]            In the mechanism, we define resource usage thresholds
and SQLServer [6] that statically define the number of threads        (e.g., CPU or data traffic in the memory controller) that
based on the number of available cores.                               are used to decide when it is necessary to allocate or release
   In [7] is presented a database scheduler to control the dis-       cores. An adaptive algorithm decides on which node to allo-
patching of query fragments, called “morsels”. The “morsels”          cate/release cores taking into account the accessed memory
are statically pinned in specific cores to take advantage of          addresses kept in a priority queue data structure.
the data location and avoid data movement between the                    Figure 2 shows the allocation of closed (Dense) and far
nodes. However, this scheduler does not take into account             apart cores (Sparse). The adaptive allocation is a combi-
the optimum number of cores to tackle the current workload.           nation of both w.r.t. the efficient subset of cores. To find
   In contrast to the related work, we propose a dynamic              the affinity between threads and data, our approach checks
multi-core allocation mechanism to mitigate the data move-            misses in the Translation Lookaside Buffer (TLB). When
ment providing to the OS the efficient sub-set of NUMA                a TLB miss occurs, the OS maps the thread accessing the
cores to perform database thread mapping.                             missed address to a node keeping this information in a data
                                                                      structure. Over time, new threads requesting the same ad-
                                                                      dress range are mapped to the same node where data is
3.     PROPOSED APPROACH                                              allocated. Therefore, we refer to as “the efficient subset of
  Our proposal involves an elastic multi-core allocation mech-        cores” the processing cores in the nodes with the requested
anism to define the optimum number of cores to treat the              address range.


                                                                  2
                                                           Threads                                                      Core
                                                                                                                       Number                                       5                                                             25


                                                                                                                 T20
                                                                             T14


                                                                                                     T18
                                                                                   T15
                                                           T11
                                                                 T12


                                                                                               T17
                                                                       T13


                                                                                         T16


                                                                                                           T19
                                                     T10
                                 T6
                            T5


                                      T7


                                                T9
                                                                                                                         15


                                                                                                                                          HT traﬃc (MB/s) (102)
                                           T8
                                                                                                                                                                  4.5


                                                                                                                                                                                                           L3 load misses (105)
                       0                                                                                                 14                                         4                                                             20
                                                                                                                         13                                       3.5
     Timestamp (ms)


                      50                                                                                                 12                                         3                                                             15
                                                                                                                         11                                       2.5
                                                                                                                         10
                      100                                                                                                                                           2                                                             10
                                                                                                                         9                                        1.5
                                                                                                                         8                                          1                                                             5
                      150                                                                                                7
                                                                                                                         6                                        0.5
                                                                                                                         5                                          0                                                             0
                      200
                                                                                                                         4


                                                                                                                                                                        OS


                                                                                                                                                                                De


                                                                                                                                                                                        Sp


                                                                                                                                                                                               Ad


                                                                                                                                                                                                                                         OS


                                                                                                                                                                                                                                                 De


                                                                                                                                                                                                                                                           Sp


                                                                                                                                                                                                                                                                    Ad
                                 Adaptive                                                                                3


                                                                                                                                                                                          ar


                                                                                                                                                                                                                                                             ar
                                                                                                                                                                                               ap


                                                                                                                                                                                                                                                                    ap
                                                                                                                                                                                   n


                                                                                                                                                                                                                                                    n
                                                                                                                                                                           /M


                                                                                                                                                                                                                                            /M
                                                                                                                                                                                   se


                                                                                                                                                                                                                                                      se
                      250


                                                                                                                                                                                          se


                                                                                                                                                                                                                                                             se
                                                                                                                         2


                                                                                                                                                                                                  t


                                                                                                                                                                                                                                                                       t
                                                                                                                                                                           on


                                                                                                                                                                                                                                            on
                                                                                                                                                                                                ive


                                                                                                                                                                                                                                                                     ive
                                                                                                                         1


                                                                                                                                                                            et


                                                                                                                                                                                                                                               et
                                                                                                                                                                              DB


                                                                                                                                                                                                                                                 DB
                                                                                                                         0

                                                                                                                                                                                                                             S0           S1          S2      S3
Figure 3: The migration of the threads spawn by
a single-client submitting the TPC-H Q6 supported
                                                                                                                                    Figure 4: Performance metrics of Q6 with a single
by the adaptive mode.
                                                                                                                                    client in 1 GB database in MonetDB.

                                                                                                                                                        4
                                                                                                                                                                                                                                                              OS/MonetDB
  In our adaptive mode, each entry of the priority queue
                                                                                                                                                        3
keeps the PIDs of the active threads with their address


                                                                                                                                          Mem TP (GB/s)
spaces and the number of pages per NUMA node. The cores                                                                                                 2

are allocated on nodes with more accessed pages and cores                                                                                               1
with the least number of accessed pages are released. In the
                                                                                                                                                        0
implementation of the affinity between threads and data,                                                                                                4
                                                                                                                                                                                                                                                           Adaptive/MonetDB
the data structure stores the node IDs used to pin threads
                                                                                                                                                        3
and allocate their address space.


                                                                                                                                          Mem TP (GB/s)
                                                                                                                                                        2
3.2.2                       The local optimum number of cores
                                                                                                                                                        1
   The number of allocated cores takes into account the arith-
metic CPU-load average of the active database threads. For-                                                                                             00                 50           100
                                                                                                                                                                                          S0
                                                                                                                                                                                                    150
                                                                                                                                                                                                      S1
                                                                                                                                                                                                                                  200
                                                                                                                                                                                                                                    S2
                                                                                                                                                                                                                                                 250
                                                                                                                                                                                                                                                   S3
                                                                                                                                                                                                                                                              300          350

mally, we define how to compute the number of cores, as
follows:
                                                                                                                                    Figure 5: Execution time (secs) and Memory
 ∀ w ∃ nalloc |(thmin < u < thmax )∧p(nalloc ) ≥ p(ntotal ) (1)                                                                     Throughput (GB/s) of the TPC-H execution with
                                                                                                                                    256 concurrent clients in 1 GB database.
  To any OLAP workload w, there is a certain number of
CPU cores nalloc such that the load of each core are be-
tween the minimum and maximum thresholds, in which the                                                                              our mechanism. Inside the mechanism, we coded the thresh-
database performance p(nalloc ) is equal or better than the                                                                         olds to thmin = 10 and thmax = 70 following the rules of
performance p(ntotal ) with all the CPU cores available in the                                                                      thumb in the literature [9] and they are kept in all the exper-
hardware. The performance function p(x) relies on system                                                                            iments. We experimented different thresholds, but decreas-
counters provided by the OS and the database.                                                                                       ing thmin lets too many cores in idle state, while increasing
                                                                                                                                    thmax leads to contention with too many busy cores.
3.3                   Mixed Workload
   In our initial efforts in this thesis, we focused on OLAP
                                                                                                                                    4.1                           Preliminary Results
due to the pressure to transfer and compute large volumes                                                                              Figure 3 shows the lifespan of threads of the TPC-H query
of data scattered across multiple NUMA nodes [1, 2]. How-                                                                           Q6 with a single client execution in a 1GB database. As
ever, the elastic multi-core mechanism can designate under-                                                                         expected, our mechanism limited a subset of cores required
used cores or NUMA nodes to serve different types of work-                                                                          to execute the query. The threads were executed in a single
loads. Our next direction is to investigate how to designate                                                                        NUMA node, while the OS expanded the MonetDB threads
underused cores to perform different workloads on different                                                                         on all nodes.
NUMA nodes. We expect the abstract nature of our mecha-                                                                                Figure 4 shows performance metrics to understand the
nism and the information of memory usage to facilitate the                                                                          impact of our mechanism in the migration of threads. The
choice of the optimal node to a different workload.                                                                                 result shows 2× more L3 cache misses and 9× more HT
                                                                                                                                    traffic in the OS scheduling than our adaptive mode. With
                                                                                                                                    less cores available for thread scheduling, the OS made good
4.                EXPERIMENTS                                                                                                       scheduling choices resulting in less remote access, less inter-
  In a preliminary study, we ran the experiments with OLAP                                                                          connection traffic and improved memory throughput.
workload on a NUMA machine formed by 4-node with a                                                                                     Figure 5 shows the result of the 22 queries of the TPC-
QuadCore AMD Opteron 8387 each. Nodes are intercon-                                                                                 H with 256 concurrent clients executing the queries in the
nected by Hyper-Transport (HT) link 3.x achieving 41.6                                                                              sequence of the TPC-H in 1GB database. We present the
GB/s maximum aggregate bandwidth. We implemented our                                                                                impact of our mechanism in the core allocation when the
prototype in C language and we compare our mechanism to                                                                             data access pattern changes. In the memory throughput
the Linux Debian 8 “Jessie” OS scheduling of threads spawn                                                                          results, our adaptive mechanism was 41% faster than the
by the MonetDB (v11.25.5) DBMS. We let all the 16 cores                                                                             OS/MonetDB. We observe that the system does not use all
available to the DBMS when running without the support of                                                                           the nodes all the time. For instance, initially the system


                                                                                                                                3
                                                   Execution Speedup (Adaptive Mode)
                                                                                                                                                                                   6.   ACKNOWLEDGMENTS
                                                                                                                                                                                     This work was partly funded by CAPES and CNPQ.
                     1.16
                            1.39
                                   1.15
                                          1.41
                                                 1.11
                                                        1.40
                                                               1.48
                                                                      1.17
                                                                             1.43
                                                                                    1.18
                                                                                           1.22
                                                                                                  1.33
                                                                                                         1.53
                                                                                                                1.33
                                                                                                                       1.17
                                                                                                                              1.35
                                                                                                                                     1.10
                                                                                                                                            1.35
                                                                                                                                                   1.29
                                                                                                                                                          1.34
                                                                                                                                                                 1.37
                                                                                                                                                                        1.32
                1

               0.8                                                                                                                                                                 7.   REFERENCES
HT/IMC ratio


               0.6
                                                                                                                                                                                    [1] S. Dominico, E. C. de Almeida, and J. A. Meira. A
                                                                                                                                                                                        petrinet mechanism for OLAP in NUMA. In DaMoN,
               0.4                                                                                                                                                                      2017.
               0.2                                                                                                                                                                  [2] S. Dominico, E. C. de Almeida, J. A. Meira, and
                                                                                                                                                                                        M. A. Z. Alves. An elastic multi-core allocation
                0
                                                                                                                                                                                        mechanism for database systems. In ICDE, 2018.
                     Q1
                            Q2
                                   Q3
                                          Q4
                                                 Q5
                                                        Q6
                                                               Q7
                                                                      Q8
                                                                             Q9
                                                                                    Q10
                                                                                           Q11
                                                                                                  Q12
                                                                                                         Q13
                                                                                                                Q14
                                                                                                                       Q15
                                                                                                                              Q16
                                                                                                                                     Q17
                                                                                                                                            Q18
                                                                                                                                                   Q19
                                                                                                                                                          Q20
                                                                                                                                                                 Q21
                                                                                                                                                                        Q22
                                                                                             Queries
                                                                                                                                                                                    [3] M. Gawade and M. L. Kersten. NUMA obliviousness
                                                                                                                                                                                        through memory mapping. In DaMoN, 2015.
        OS/MonetDB                                             Dense                                     Sparse                             Adaptive
                                                                                                                                                                                    [4] G. Graefe. Encapsulation of parallelism in the volcano
                                                                                                                                                                                        query processing system. In SIGMOD., pages 102–111,
Figure 6: Performance results for 1 GB TPC-H                                                                                                                                            1990.
queries with 256 concurrent clients. On the left,
                                                                                                                                                                                    [5] T. Kissinger, T. Kiefer, B. Schlegel, D. Habich,
the ratio HT/IMC shows how NUMA-friendly is the
                                                                                                                                                                                        D. Molka, and W. Lehner. ERIS: A NUMA-aware
system (the smaller, the better). On the top is the
                                                                                                                                                                                        in-memory storage engine for analytical workload. In
performance speedup for adaptive mode.
                                                                                                                                                                                        ADMS, 2014.
                                                                                                                                                                                    [6] P. Larson, C. Clinciu, C. Fraser, E. N. Hanson,
                                                                                                                                                                                        M. Mokhtar, M. Nowakiewicz, V. Papadimos, S. L.
used only the nodes S0 and S2.                                                                                                                                                          Price, S. Rangarajan, R. Rusanu, and M. Saubhasik.
   Figure 6 shows the results of interconnection traffic in re-                                                                                                                         Enhancements to SQL server column stores. In
lation with traffic HT/IMC ratio. In this experiment the                                                                                                                                SIGMOD, pages 1159–1168, 2013.
256 concurrent clients execute the 22 queries in random or-                                                                                                                         [7] V. Leis, P. A. Boncz, A. Kemper, and T. Neumann.
der. The results show the reduction in the local/remote                                                                                                                                 Morsel-driven parallelism: a NUMA-aware query
per-query data traffic ratio of up 3.87x (2.47x on average).                                                                                                                            evaluation framework for the many-core age. In
Overall, the adaptive mode presented the best results. For                                                                                                                              SIGMOD, pages 743–754, 2014.
instance, we observed 2x smaller HT/IMC ratio than the
                                                                                                                                                                                    [8] Y. Li, I. Pandis, R. Mueller, V. Raman, and G. M.
OS/MonetDB for queries such as Q9 that have the largest
                                                                                                                                                                                        Lohman. NUMA-aware algorithms: the case of data
number of joins operations. With less cores available, threads
                                                                                                                                                                                        shuffling. In CIDR, 2013.
locate data in the local node more often than the current ap-
                                                                                                                                                                                    [9] U. F. Minhas, R. Liu, A. Aboulnaga, K. Salem, J. Ng,
proach of letting all the cores available to the OS scheduler
                                                                                                                                                                                        and S. Robertson. Elastic scale-out for partition-based
(i.e., OS/MonetDB).
                                                                                                                                                                                        database systems. ICDEW12.
                                                                                                                                                                                   [10] D. Porobic, I. Pandis, M. Branco, P. Tözün, and
                                                                                                                                                                                        A. Ailamaki. OLTP on hardware islands. PVLDB,
5.                   CURRENT STATUS & FUTURE WORK                                                                                                                                       (11), 2012.
   This paper describes the motivation and challenges to our                                                                                                                       [11] I. Psaroudakis, T. Scheuer, N. May, A. Sellami, and
approach in the allocation of NUMA CPU-cores for paral-                                                                                                                                 A. Ailamaki. Scaling up concurrent main-memory
lel execution of OLAP. In our first evaluations, we discuss                                                                                                                             column-store scans: Towards adaptive NUMA-aware
the impact of data movement between NUMA nodes for                                                                                                                                      data and task placement. PVLDB, (12), 2015.
multi-threaded DBMS that hand over the mapping of query                                                                                                                            [12] I. Psaroudakis, T. Scheuer, N. May, A. Sellami, and
threads to the OS. We show the difficulty faced by the OS                                                                                                                               A. Ailamaki. Adaptive NUMA-aware data placement
scheduler to keep load balance, generating a vast amount of                                                                                                                             and task scheduling for analytical workloads in
data movement, interconnection traffic and cache invalida-                                                                                                                              main-memory column-stores. PVLDB, (2), 2016.
tion. In the initial contribution of this thesis [2], we present
                                                                                                                                                                                   [13] V. Raman, G. K. Attaluri, R. Barber, N. Chainani,
an abstract model-based mechanism to support the thread
                                                                                                                                                                                        D. Kalmuk, V. KulandaiSamy, J. Leenstra,
scheduling and data allocation across NUMA sockets. The
                                                                                                                                                                                        S. Lightstone, S. Liu, G. M. Lohman, T. Malkemus,
mechanism is the first part of our approach to mitigate the
                                                                                                                                                                                        R. Müller, I. Pandis, B. Schiefer, D. Sharpe, R. Sidle,
data movement in NUMA nodes. The preliminary results
                                                                                                                                                                                        A. J. Storm, and L. Zhang. DB2 with BLU
showed performance improvements when our mechanism of-
                                                                                                                                                                                        acceleration: So much more than just a column store.
fered to the OS only the local optimum CPU-cores, instead
                                                                                                                                                                                        PVLDB, 6(11):1080–1091, 2013.
of the traditional approach of making all the cores visible to
the OS all the time, like in current multi-threaded DBMSs:
MonetDB, SQL Server, SAP Hana and Hyper.
   As future directions, we plan to explore mixed workloads
as we observed that OLAP not always need the entire setup
of CPU-cores. Therefore, our agenda includes redesigning
our abstract model to accommodate concurrent OLTP and
OLAP. In particular, we plan to study the multi-core alloca-
tion in cloud computing environments that can particularly
benefit from our model.


                                                                                                                                                                               4