           WattDB—a Rocky Road to Energy Proportionality

                                                            Theo Härder
                                      Databases and Information Systems Group
                                        University of Kaiserslautern, Germany
                                                haerder@cs.uni-kl.de


Extended Abstract
Energy efficiency is becoming increasingly important in database design, i. e., the work delivered by a database server should be accomplished with minimal energy consumption. So far, a substantial number of research papers have examined and optimized the energy consumption of database servers or single components. Accordingly, our first efforts were exclusively focused on the use of flash memory or SSDs in a DBMS context to identify their performance potential for typical DB operations. In particular, we developed tailor-made algorithms to support caching for flash-based databases [3], albeit with limited success concerning the energy efficiency of the entire database server.
   A key observation made by Tsirogiannis et al. [5] concerning the energy efficiency of single servers is that the best-performing configuration is also the most energy-efficient one, because power use is not proportional to system utilization and, for this reason, the runtime needed to accomplish a computing task essentially determines its energy consumption. Based on our caching experiments for flash-based databases, we came to the same conclusion [2]. Hence, the server system must be fully utilized to be most energy efficient. However, real-world workloads do not stress servers continuously. Typically, their average utilization ranges between 20 and 50% of peak performance [1]. Therefore, traditional single-server DBMSs are chronically underutilized and operate below their optimal energy-consumption-per-query ratio. As a result, there is a substantial optimization opportunity to decrease energy consumption during off-peak times.
   Because the energy use of single-server systems is far from being energy proportional, we came up with the hypothesis that better energy efficiency may be achieved by a cluster of nodes whose size is dynamically adjusted to the current workload demand. For this reason, we shifted our research focus from inflexible single-server DBMSs to distributed clusters running on lightweight nodes. Although distributed systems impose some performance degradation compared to a single, brawny server, they offer a higher energy-saving potential in turn.
   Current hardware is not energy proportional, because a single server consumes, even when idle, a substantial fraction of its peak power [1]. Because typical usage patterns lead to a server utilization far below the maximum, the energy efficiency of a server operating away from peak performance is reduced [4]. To achieve energy proportionality using commodity hardware, we have chosen a clustered approach where each node can be powered independently. By turning whole nodes on or off, the overall performance and energy consumption can be fitted to the current workload. Unused servers could either be shut down or made available to other processes. If present in a cloud, those servers could be leased to other applications.
   We have developed a research prototype of a distributed DBMS called WattDB on a scale-out architecture, consisting of n wimpy computing nodes interconnected by a 1 Gbit/s Ethernet switch. The cluster currently consists of 10 identical nodes, each composed of an Intel Atom D510 CPU, 2 GB DRAM, and an SSD. The configuration is considered Amdahl-balanced, i. e., balanced between I/O and network throughput on the one hand and processing power on the other.
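   A rough back-of-envelope check of this balance can be sketched as follows; both throughput figures are assumed example values, not benchmark results for the cluster.

# Rough balance check between interconnect bandwidth and per-node processing
# capability. All throughput figures are assumed example values.

NET_BANDWIDTH_MB_S = 125.0    # 1 Gbit/s Ethernet corresponds to roughly 125 MB/s
NODE_SCAN_RATE_MB_S = 100.0   # assumed rate at which one wimpy node can filter/aggregate data

# A lightweight node can roughly keep up with what the network can deliver,
# so neither side is grossly over-provisioned, which is the intent behind an
# Amdahl-balanced configuration.
ratio = NET_BANDWIDTH_MB_S / NODE_SCAN_RATE_MB_S
print(f"network supply vs. node processing demand: {ratio:.2f}")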
   Compared to InfiniBand, the bandwidth of the interconnecting network is limited, but it is sufficient to supply the lightweight nodes with data. More expensive, yet faster connections would have required more powerful processors and more sophisticated I/O subsystems. Such a design would have pushed the cost beyond limits, especially because we would no longer have been able to use commodity hardware. Furthermore, by choosing lightweight components, the overall energy footprint is low, and the smallest configuration, i. e., the one with the fewest nodes, exhibits low power consumption. Moreover, experiments running on a small cluster can easily be repeated on a cluster with more powerful nodes.
   A dedicated master node handles incoming queries and coordinates the cluster. Some of the nodes each have four hard disks attached and act as storage nodes, providing persistent data storage to the cluster. The remaining nodes (without hard disk drives) are called processing nodes. Due to the lack of directly accessible storage, they can only operate on data provided by other nodes (see Figure 1).
   All nodes can evaluate (partial) query plans and execute DB operators, e. g., sorting, aggregation, etc., but only the storage nodes can access the DB storage structures, i. e., tables and indexes.
Figure 1: Overview of the WattDB cluster (master node, processing nodes, and storage nodes with attached disks)


Each storage node maintains a DB buffer to keep recently referenced pages in main memory, whereas a processing node does not cache intermediate results. As a consequence, each query must always fetch the qualified records from the corresponding storage nodes.
   Hence, our cluster design results in a shared-nothing architecture in which the nodes differ only in whether or not they have direct access to DB data on external storage. Each of the nodes is additionally equipped with a 128 GB solid-state disk (Samsung 830 SSD). These SSDs do not store DB data; they provide swap space to support external sorting as well as persistent storage for configuration files. We have chosen SSDs because their access latency is much lower than that of traditional hard disks; hence, they are better suited for temporary storage.
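   The resulting division of labor between storage and processing nodes can be sketched as follows; the plan representation, operator names, and node names are hypothetical illustrations, not WattDB's actual operator interfaces.

# Hypothetical sketch of how a query could be split across node roles in a
# shared-nothing cluster: storage nodes scan and filter their local partitions,
# a processing node combines the partial results. Names and structures are
# illustrative only, not WattDB's actual operator interfaces.
from dataclasses import dataclass

@dataclass
class PartialPlan:
    node: str        # node that executes this plan fragment
    operator: str    # DB operator, e. g., scan, sort, aggregate
    inputs: list     # plans whose results are shipped over the network

def distribute_aggregation(storage_nodes, processing_node):
    # Only storage nodes touch tables and indexes; they produce filtered partitions.
    scans = [PartialPlan(node=s, operator="scan+filter(lineitem)", inputs=[])
             for s in storage_nodes]
    # A processing node merges and aggregates the shipped partial results.
    return PartialPlan(node=processing_node,
                       operator="merge+aggregate",
                       inputs=scans)

plan = distribute_aggregation(["storage-1", "storage-2", "storage-3"], "proc-1")
print(plan.operator, "on", plan.node, "over", [p.node for p in plan.inputs])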
   In WattDB, a dedicated component running on the master node, called the EnergyController, controls the energy consumption of the cluster. This component monitors the performance of all nodes in the cluster. Depending on the current query workload and node utilization, the EnergyController activates and suspends nodes to guarantee a sufficiently high node utilization for the workload demand. Suspended nodes consume only a fraction of the idle power, but can be brought back online within a few seconds. The EnergyController also modifies query plans to dynamically distribute the current workload over all running nodes, thereby achieving a balanced utilization of the active processing nodes.
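   The control loop behind this behavior can be summarized by the following sketch; the utilization thresholds, the node names, and the decision interface are assumptions made for illustration, not the actual EnergyController implementation.

# Illustrative control loop for fitting the number of active nodes to the
# current load. Thresholds, the node interface, and the monitoring source are
# assumptions for this sketch, not WattDB's actual EnergyController code.
SCALE_UP_THRESHOLD = 0.8    # assumed: resume a node if average utilization exceeds this
SCALE_DOWN_THRESHOLD = 0.3  # assumed: suspend a node if average utilization drops below this

def control_step(active_nodes, suspended_nodes, utilization_of):
    """One decision step: returns the node to resume or suspend, if any."""
    avg = sum(utilization_of(n) for n in active_nodes) / len(active_nodes)
    if avg > SCALE_UP_THRESHOLD and suspended_nodes:
        return ("resume", suspended_nodes[0])    # add capacity before nodes saturate
    if avg < SCALE_DOWN_THRESHOLD and len(active_nodes) > 1:
        return ("suspend", active_nodes[-1])     # shed capacity, keep at least one node
    return ("keep", None)                        # utilization is in the target band

# Example: three active nodes running hot, one node currently suspended.
action = control_step(["proc-1", "proc-2", "proc-3"], ["proc-4"],
                      utilization_of=lambda n: 0.9)
print(action)   # ('resume', 'proc-4')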
   As data-intensive workloads, we submit specific TPC-H queries against this distributed shared-nothing DBMS, where time and energy use are captured by dedicated monitoring and measurement devices. We configure static clusters of varying sizes and show their influence on energy efficiency and performance. Furthermore, using the EnergyController and a load-aware scheduler, we verify the hypothesis that energy proportionality for database management tasks can be well approximated by dynamic clusters of wimpy computing nodes.

1. REFERENCES
[1] L. A. Barroso and U. Hölzle. The Case for Energy-Proportional Computing. IEEE Computer, 40(12):33–37, 2007.
[2] T. Härder, V. Hudlet, Y. Ou, and D. Schall. Energy Efficiency is not Enough, Energy Proportionality is Needed! In DASFAA Workshops, 1st Int. Workshop on FlashDB, LNCS 6637, pages 226–239, 2011.
[3] Y. Ou, T. Härder, and D. Schall. Performance and Power Evaluation of Flash-Aware Buffer Algorithms. In DEXA, LNCS 6261, pages 183–197, 2010.
[4] D. Schall, V. Höfner, and M. Kern. Towards an Enhanced Benchmark Advocating Energy-Efficient Systems. In TPCTC, LNCS 7144, pages 31–45, 2012.
[5] D. Tsirogiannis, S. Harizopoulos, and M. A. Shah. Analyzing the Energy Efficiency of a Database Server. In SIGMOD Conference, pages 231–242, 2010.

25th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 28.05.2013 - 31.05.2013, Ilmenau, Germany. Copyright is held by the author/owner(s).