=Paper=
{{Paper
|id=None
|storemode=property
|title=WattDB - a Rocky Road to Energy Proportionality
|pdfUrl=https://ceur-ws.org/Vol-1020/keynote_01.pdf
|volume=Vol-1020
|dblpUrl=https://dblp.org/rec/conf/gvd/Harder13
}}
==WattDB - a Rocky Road to Energy Proportionality==
Theo Härder, Databases and Information Systems Group, University of Kaiserslautern, Germany (haerder@cs.uni-kl.de)

''Extended Abstract''

Energy efficiency is becoming more important in database design, i.e., the work delivered by a database server should be accomplished with minimal energy consumption. So far, a substantial number of research papers have examined and optimized the energy consumption of database servers or single components. Likewise, our first efforts were exclusively focused on the use of flash memory or SSDs in a DBMS context, to identify their performance potential for typical DB operations. In particular, we developed tailor-made algorithms to support caching for flash-based databases [3], however with limited success concerning the energy efficiency of the entire database server.

A key observation made by Tsirogiannis et al. [5] concerning the energy efficiency of single servers is that the best-performing configuration is also the most energy-efficient one: because power use is not proportional to system utilization, the runtime needed to accomplish a computing task essentially determines its energy consumption. Based on our caching experiments for flash-based databases, we came to the same conclusion [2]. Hence, a server system must be fully utilized to be most energy-efficient. However, real-world workloads do not stress servers continuously; typically, their average utilization ranges between 20 and 50% of peak performance [1]. Therefore, traditional single-server DBMSs are chronically underutilized and operate below their optimal energy-consumption-per-query ratio. As a result, there is a big optimization opportunity to decrease energy consumption during off-peak times.

Because the energy use of single-server systems is far from energy proportional, we formulated the hypothesis that better energy efficiency may be achieved by a cluster of nodes whose size is dynamically adjusted to the current workload demand. For this reason, we shifted our research focus from inflexible single-server DBMSs to distributed clusters running on lightweight nodes. Although distributed systems impose some performance degradation compared to a single, brawny server, they offer a higher energy-saving potential in turn.

Current hardware is not energy proportional, because a single server consumes, even when idle, a substantial fraction of its peak power [1]. Because typical usage patterns lead to a server utilization far below its maximum, the energy efficiency of a server away from peak performance is reduced [4]. To achieve energy proportionality using commodity hardware, we have chosen a clustered approach where each node can be powered independently. By turning whole nodes on and off, the overall performance and energy consumption can be fitted to the current workload. Unused servers could either be shut down or made available to other processes; if present in a cloud, those servers could be leased to other applications.

We have developed a research prototype of a distributed DBMS called WattDB on a scale-out architecture, consisting of n wimpy computing nodes interconnected by a 1 GBit/s Ethernet switch. The cluster currently consists of 10 identical nodes, each composed of an Intel Atom D510 CPU, 2 GB DRAM, and an SSD. The configuration is considered Amdahl-balanced, i.e., balanced between I/O and network throughput on the one hand and processing power on the other. Compared to InfiniBand, the bandwidth of the interconnecting network is limited, but it is sufficient to supply the lightweight nodes with data. More expensive, yet faster connections would have required more powerful processors and more sophisticated I/O subsystems. Such a design would have pushed the cost beyond limits, especially because we would not have been able to use commodity hardware. Furthermore, by choosing lightweight components, the overall energy footprint is low, and the smallest configuration, i.e., the one with the fewest nodes, exhibits low power consumption. Moreover, experiments running on a small cluster can easily be repeated on a cluster with more powerful nodes.

A dedicated node, the master node, handles incoming queries and coordinates the cluster. Some of the nodes have four hard disks attached each and act as storage nodes, providing persistent data storage to the cluster. The remaining nodes (without hard disk drives) are called processing nodes; due to the lack of directly accessible storage, they can only operate on data provided by other nodes (see Figure 1). All nodes can evaluate (partial) query plans and execute DB operators, e.g., sorting, aggregation, etc., but only the storage nodes can access the DB storage structures, i.e., tables and indexes. Each storage node maintains a DB buffer to keep recently referenced pages in main memory, whereas a processing node does not cache intermediate results. As a consequence, each query always needs to fetch the qualified records from the corresponding storage nodes.

Figure 1: Overview of the WattDB cluster (master node, SSD-equipped processing nodes, and storage nodes with four disks each)

Hence, our cluster design results in a shared-nothing architecture in which the nodes differ only in whether or not they have direct access to DB data on external storage. Each of the nodes is additionally equipped with a 128 GB solid-state disk (Samsung 830 SSD). The SSDs do not store the DB data; they provide swap space to support external sorting as well as persistent storage for configuration files. We have chosen SSDs because their access latency is much lower than that of traditional hard disks; hence, they are better suited for temp storage.

In WattDB, a dedicated component running on the master node, called the EnergyController, controls the energy consumption. This component monitors the performance of all nodes in the cluster. Depending on the current query workload and node utilization, the EnergyController activates and suspends nodes to guarantee a sufficiently high node utilization. Suspended nodes consume only a fraction of the idle power, but can be brought back online in a matter of a few seconds. The EnergyController also modifies query plans to dynamically distribute the current workload across all running nodes, thereby achieving a balanced utilization of the active processing nodes.

As data-intensive workloads, we submit specific TPC-H queries against this distributed shared-nothing DBMS, where time and energy use are captured by dedicated monitoring and measurement devices. We configure static clusters of varying sizes and show their influence on energy efficiency and performance. Further, using the EnergyController and a load-aware scheduler, we verify the hypothesis that energy proportionality for database management tasks can be well approximated by dynamic clusters of wimpy computing nodes.

25th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 28.05.2013 - 31.05.2013, Ilmenau, Germany. Copyright is held by the author/owner(s).

==References==
[1] L. A. Barroso and U. Hölzle. The Case for Energy-Proportional Computing. IEEE Computer, 40(12):33-37, 2007.

[2] T. Härder, V. Hudlet, Y. Ou, and D. Schall. Energy Efficiency Is Not Enough, Energy Proportionality Is Needed! In DASFAA Workshops, 1st Int. Workshop on FlashDB, LNCS 6637, pages 226-239, 2011.

[3] Y. Ou, T. Härder, and D. Schall. Performance and Power Evaluation of Flash-Aware Buffer Algorithms. In DEXA, LNCS 6261, pages 183-197, 2010.

[4] D. Schall, V. Höfner, and M. Kern. Towards an Enhanced Benchmark Advocating Energy-Efficient Systems. In TPCTC, LNCS 7144, pages 31-45, 2012.

[5] D. Tsirogiannis, S. Harizopoulos, and M. A. Shah. Analyzing the Energy Efficiency of a Database Server. In SIGMOD Conference, pages 231-242, 2010.
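The abstract's core argument, that non-proportional power draw makes an underutilized server inefficient per unit of work, can be made concrete with a simple linear power model. The numbers below are illustrative assumptions, not measurements from the paper:

```python
# Illustrative sketch (assumed numbers, not WattDB measurements):
# why low utilization hurts energy efficiency when power draw
# is not proportional to load.

def power_watts(utilization, p_idle=100.0, p_peak=200.0):
    """Linear power model: a fixed idle floor plus a load-dependent part."""
    return p_idle + (p_peak - p_idle) * utilization

def energy_per_unit_work(utilization):
    """Energy spent per unit of completed work at a given utilization.

    Throughput is taken as proportional to utilization, so
    energy/work = power / utilization."""
    return power_watts(utilization) / utilization

# At 100% load: 200 W of power, full throughput -> 200 J per work unit.
# At  30% load: ~130 W of power, 30% throughput -> ~433 J per work unit,
# i.e., more than twice the energy for the same work.
print(energy_per_unit_work(1.0))
print(energy_per_unit_work(0.3))
```

This is exactly the gap a right-sized cluster attacks: instead of one server idling at 30% load, a smaller set of fully loaded wimpy nodes keeps each active machine near its efficient operating point.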
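The abstract describes the EnergyController only at a high level. The following is a minimal sketch of such a utilization-driven scale-up/scale-down policy; the class names, thresholds, and one-node-per-step strategy are assumptions for illustration, not WattDB's actual implementation:

```python
# Hypothetical sketch of a utilization-driven cluster controller in the
# spirit of WattDB's EnergyController; all names and thresholds are
# invented for illustration.

class Node:
    def __init__(self, name):
        self.name = name
        self.active = False
        self.utilization = 0.0  # fraction of peak load, 0.0 .. 1.0

class EnergyController:
    def __init__(self, nodes, low=0.3, high=0.8):
        self.nodes = nodes
        self.low = low    # below this average load, suspend a node
        self.high = high  # above this average load, wake a node

    def average_utilization(self):
        active = [n for n in self.nodes if n.active]
        if not active:
            return 0.0
        return sum(n.utilization for n in active) / len(active)

    def rebalance(self):
        """Wake or suspend at most one node per step to track the load."""
        avg = self.average_utilization()
        if avg > self.high:
            for n in self.nodes:
                if not n.active:      # wake the first suspended node
                    n.active = True
                    break
        elif avg < self.low:
            active = [n for n in self.nodes if n.active]
            if len(active) > 1:       # always keep at least one node up
                active[-1].active = False
```

Calling `rebalance()` periodically lets the active node count follow the workload: a cluster with one node at 90% load wakes a second node, while two nearly idle nodes shrink back to one. A real controller would additionally rewrite query plans to shift work onto the surviving nodes before suspending, as the abstract notes.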