=Paper=
{{Paper
|id=None
|storemode=property
|title=True Energy-efficient Data Processing is Only Gained by Energy-proportional DBMSs
|pdfUrl=https://ceur-ws.org/Vol-581/gvd2010_7_2.pdf
|volume=Vol-581
|dblpUrl=https://dblp.org/rec/conf/gvd/Hudlet10
}}
==True Energy-efficient Data Processing is Only Gained by Energy-proportional DBMSs==
True Energy-efficient Data Processing
is Only Gained by Energy-proportional DBMSs
Volker Hudlet
AG DBIS
TU Kaiserslautern, Germany
hudlet@cs.uni-kl.de
ABSTRACT
As energy consumption and related costs are becoming a critical component for operating a data center, system developers as well as database researchers have to deal with this fact and should come up with approaches that increase the energy efficiency of a data center.

Several proposals are already present in the literature which introduce approaches to increase the energy efficiency in a given situation. Nevertheless, a server may still consume more than 50% of its maximal power when running in idle mode. Therefore, we believe that only energy-proportional systems can deliver true energy efficiency, as the power consumption scales with the system load. This paper reviews the current state of research concerning energy efficiency in DB servers and presents our vision of an energy-proportional database system.

1. INTRODUCTION
Energy costs are a growing part of the total cost of ownership of servers, and it is expected that they (as well as the costs for cooling) will outpace the expenses for the server hardware and software in the near future (calculated over a period of three years) [4, 9].

In general, a server is constructed for an expected peak load, i.e., maximal throughput, which is limited by the storage subsystem (in the case of data-intensive applications) or by the CPU (in the case of computation-intensive applications).

Normally, the peak load corresponds with the maximal energy consumption. In the majority of application situations, this maximal throughput is hardly needed, because the server just utilizes a (small) share of its capacity; the average server utilization often is around 10% – 30% [16]. The remaining capacity is unused, while the server is still consuming (almost the full amount of) energy.

Given the public concern about energy waste, the exclusive focus on performance, where over-proportional energy consumption was acceptable even for only a tiny increase in performance, is not sufficient anymore for future generations of computer systems in general and DB servers in particular. Therefore, attention must be shifted from a solely performance-centric view to energy efficiency.

The buzzword Green IT is an umbrella term for the ongoing development of energy-efficient hardware and software as well as the marketing of resulting products. Unfortunately, some products that claim to be green merely pick up this buzzword to be more attractive on the market.

The remaining parts of this paper are structured as follows: Section 2 briefly defines energy efficiency and discusses why energy proportionality is a natural prerequisite for a truly energy-efficient system. Furthermore, related work is considered. Section 3 explains in more detail how energy proportionality can be achieved for (most of the) system components, whereas Section 4 discloses our vision of an energy-proportional database system which we are striving for. Finally, we conclude this paper and give an outlook to future work.

2. ENERGY EFFICIENCY REVISITED
In general, energy efficiency is defined as the quotient of the system’s work and the energy consumed while performing this work:

\[ \mathrm{EnergyEfficiency} = \frac{\mathrm{Work}}{\mathrm{EnergyConsumption}} \]

This generic model can be adapted to more concrete scenarios such as, in our case, applications of database systems. The following measure can be used to indicate the energy efficiency of a database system.

\[ \mathrm{EnergyEfficiency}(\mathrm{DBS}) = \frac{\#\mathrm{Transactions}}{\mathrm{Joule}} \]

Note that, depending on the transaction mix (varying numbers of long-running and short-running transactions), this measure can be misleading. Meaningful results can only be achieved by using well-defined benchmarks.
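The transactions-per-Joule measure defined above can be illustrated with a small calculation. This is a minimal sketch and not part of the original paper: the function names and the sample run (90,000 transactions, 250 W average power, 600 s duration) are hypothetical values chosen only for illustration.

```python
def energy_joules(avg_power_watts: float, duration_seconds: float) -> float:
    """Energy is average power times time (1 W sustained for 1 s is 1 J)."""
    return avg_power_watts * duration_seconds


def energy_efficiency_dbs(transactions: int, joules: float) -> float:
    """Completed transactions per Joule of consumed energy."""
    if joules <= 0:
        raise ValueError("consumed energy must be positive")
    return transactions / joules


# Hypothetical benchmark run (assumed numbers, not measurements).
consumed = energy_joules(avg_power_watts=250.0, duration_seconds=600.0)
efficiency = energy_efficiency_dbs(transactions=90_000, joules=consumed)
print(f"{consumed:.0f} J consumed, {efficiency:.3f} TA/Joule")
```

Because the result depends on how many long-running and short-running transactions are counted, two runs are only comparable when they execute the same benchmark mix.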
Copyright is held by the author/owner(s).
GvD Workshop’10, 25.-28.05.2010, Bad Helmstedt, Germany.

In the literature, several ideas have come up to improve energy efficiency. One such proposal advocates replacing the hard disks of the storage subsystem by flash disks or solid state disks (SSDs). While consuming significantly less energy (about 1/10 of the energy a hard disk consumes), SSDs nevertheless deliver substantially higher IOPS rates than hard disks (at least when read performance is compared). Therefore, SSDs are a natural candidate for achieving better energy efficiency. Until the recent past, SSD technology was still in its infancy and had to struggle with an unbalanced read/write asymmetry: random reads were much faster compared with those on hard disks, whereas random writes were much slower (approx. ten times slower than random reads). Moreover, the technology provided only limited write endurance, i.e., the underlying flash cells wore out and became unusable after a given number of rewrites. In the meantime, these disadvantages are almost eliminated. The Intel X-25E claims to be capable of performing one Petabyte of random writes (on a 32GB device) before wearing out (see footnote 1). Based on dedicated IO experiments on selected hard disks (HDDs) and SSDs, we come to the conclusion that the asymmetry becomes negligible for SSDs of the newest generation: We have confirmed the performance of random reads at ∼13K IOPS, while the random-write performance scores at a respectable 10K IOPS. Hence, we expect that SSDs will approach the sequential IO behavior of hard disks but, at the same time, provide dramatically better random IO.

Härder et al. [8] analyzed the impact of the replacement of HDDs with SSDs in a database system. They compare the energy efficiency in XTC [10], a native XML DBMS, by running a selected subset of the TPoX [15] benchmark. The results gained in these experiments show a slight increase of energy efficiency for CPU-bound DB applications (0.176 TA/Joule vs. 0.166 TA/Joule), whereas more than a doubling was obtained for IO-intensive DB applications, i.e., for an IO-bound DB server (0.850 TA/Joule vs. 0.368 TA/Joule). Hence, it is obvious that differing load situations may imply entirely different energy-efficiency levels. But this is not the desirable behavior of a DB server.

One could argue that switching the server completely off would be the most energy-efficient alternative, but again this is just another energy-efficiency level (namely the point of origin) and the cost of resuming operation cannot be neglected, e.g., loading the DB buffer anew.

The approach mentioned above is hindered by the fact that the capacity costs ($/GB) for SSDs still exceed the ones for HDDs by at least a factor of 10. Although analysts forecast a considerable price drop within the next two years [11], at the moment, SSDs might still be unattractive for a large data center.

To overcome this drawback, hybrid approaches, like those described in [12] or [13], have been proposed. These ideas combine the use of SSDs and hard disks and thereby make it possible to benefit from the advantages of both storage types while still having a cost-effective storage subsystem. Right now, these approaches just focus on the combination of several heterogeneous storage types for maximum performance. But it is conceivable to come up with a hybrid storage subsystem which focuses on the energy-efficiency aspect as well.

Apart from the storage subsystem, dedicated proposals aim at energy-efficient usage of the CPU. To evaluate the benefits gained from energy-efficient approaches, the energy delay product (EDP) [6] has been proposed as a reasonable measure. This factor is defined as energy · delay: for a constant EDP, a change in the energy consumed is therefore matched by an equal change in the response time. Lower EDP values are, of course, desirable as they embody a larger percentage of energy saving. In this case, however, system response time is likely to be increased, which may not be wanted by the user.

In [14], Lang and Patel propose two techniques, which are evaluated with respect to their resulting EDP. The first technique, called explicit query delay, delays queries and places them into a queue upon arrival. When the queue reaches a given threshold, all queries in the queue are examined to determine whether or not they can be aggregated into a small number of groups, such that the queries of a group can be jointly evaluated. Hence, this approach tries to minimize redundant evaluation of queries, thereby saving energy. It has been shown that, using a simplistic scenario, this kind of grouping could decrease the EDP by 26%.

Besides this technique, it is possible to influence the CPU behavior and thereby its energy consumption by processor voltage/frequency control (PVC) techniques, e.g., by underclocking the front-side bus or by downgrading the CPU voltage. Again, PVC techniques embody a static approach which could improve the energy efficiency only at a certain load level, but which could also impinge upon the query execution time and imply higher energy consumption than the default setting. Thus, in general, it is highly desirable to dynamically adjust the server’s energy consumption such that the best possible energy efficiency is accomplished at all load levels.

This is an objective where energy-proportional systems come into play. The notion of energy proportionality was first coined by Barroso and Hölzle [3] and characterizes the behavior of a server whose energy consumption scales proportionally with its load. An adaptive PVC would be an initial step towards this design goal. Nevertheless, the entire system architecture should be reconsidered, because building energy-proportional systems requires a holistic approach. Ranganathan [16] comes to the same conclusion: instead of having several small and local energy-aware optimizations, a holistic focus presumably results in an even more energy-efficient system.

Recently, Tsirogiannis et al. [20] claimed that, within a single node system (intended for use in scale-out architectures), the most energy-efficient configuration is typically the highest performing one. Obviously, their empirical “observation” is also closely dependent on the absence of energy-proportional runtime behavior in current servers. Furthermore, the authors hypothesize that better saving opportunities might be found when cross-node energy-efficiency techniques are applied.
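The energy delay product discussed earlier in this section can be made concrete with a short calculation. The following is a minimal sketch under stated assumptions: the function names and all Joule and second values are hypothetical, and the 26% case merely reproduces the size of the reduction reported for [14], not the measured energy or delay figures.

```python
def edp(energy_joules: float, delay_seconds: float) -> float:
    """Energy delay product: consumed energy multiplied by response time."""
    return energy_joules * delay_seconds


def relative_change(new: float, old: float) -> float:
    """Signed relative change of `new` with respect to `old`."""
    return (new - old) / old


# Hypothetical baseline: 1000 J consumed, 10 s response time.
baseline = edp(1000.0, 10.0)

# Constant EDP: energy scaled by 0.8 is offset by delay scaled by 1/0.8 = 1.25.
traded = edp(800.0, 12.5)

# Lower EDP: the energy saving outweighs the longer response time.
improved = edp(592.0, 12.5)

print(f"constant-EDP trade: {relative_change(traded, baseline):+.0%}")
print(f"improved setting:   {relative_change(improved, baseline):+.0%}")
```

The second case shows why a lower EDP can still be unwelcome to the user: the product drops although the response time grows by 25%.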
In the following section, the key components of a server are examined with respect to their ability to reach energy-proportional behavior.

Footnote 1: Using 3.3K IOPS of random 4KB writes (the maximum random-write speed specified by the manufacturer), a maximum write endurance of ≳ 8 × 10^7 sec is obtained. This is close to three years, approximately the lifetime of a hard disk.
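The endurance estimate in footnote 1 can be re-derived in a few lines. This is a sketch under the assumption that binary units are meant (1 Petabyte = 2^50 bytes, 4KB = 4096 bytes); with decimal units the result is somewhat lower, roughly 7.6 × 10^7 seconds. The constant and function names are hypothetical.

```python
PETABYTE = 2 ** 50            # assumption: binary units
WRITE_SIZE = 4 * 2 ** 10      # 4KB per random write, also assumed binary
WRITES_PER_SECOND = 3300      # 3.3K IOPS, the manufacturer's maximum
SECONDS_PER_YEAR = 365 * 24 * 3600


def endurance_seconds(total_bytes: int, iops: int, bytes_per_write: int) -> float:
    """Time until `total_bytes` have been written at a constant write rate."""
    return total_bytes / (iops * bytes_per_write)


seconds = endurance_seconds(PETABYTE, WRITES_PER_SECOND, WRITE_SIZE)
years = seconds / SECONDS_PER_YEAR
print(f"{seconds:.2e} s, about {years:.1f} years")
```

Under these assumptions the device would have to be written at its maximum random-write rate without pause for more than two and a half years before reaching the stated limit.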
Figure 1: Relative power consumption of a server at different activity levels derived by Google [18]
3. ENERGY PROPORTIONALITY OF A DB SERVER
Before we come up with a proposal for how an energy-proportional system should preferably be composed, it makes sense to examine existing (DB) server systems to find out how energy proportionality can be achieved.

When considering a server as a whole, Spector [18] as well as Tsirogiannis et al. [20] come to the conclusion that a normal server already consumes more than 50% of its maximal power (and much more, especially when a huge memory is present) when running in idle mode. Figure 1 illustrates what the power consumption looks like in different load situations. It is remarkable that the power consumption (starting already at 50%) quickly converges, with a small increase of utilization, close to the peak consumption, i.e., the 100% level. Obviously, a server in its default settings does not exhibit energy-proportional behavior at all. For these reasons, a closer look at the key components will be helpful.

Storage: In [20], experiments using hard disk RAIDs and SSD RAIDs show that, unlike hard disks, SSDs provide energy-proportional behavior. We also performed some load tests using a selected set of hard disks and SSDs of different generations (cf. Figure 2), but we draw another conclusion: SSDs just have a slightly better energy-proportional behavior, yet at a much lower power level (1/10 of that of hard disks).

In the recent past, several approaches have been proposed for hard-disk-based storage subsystems, which spin down idle disks in order to save energy [5, 21, 22]. Depending on the respective approach, data is relocated during run time in order to increase the idle time of a disk that is already spun down. Otherwise, there is a time penalty to spin up the disk again. As an overall effect, energy-proportional behavior can be approximated [7]. In order to further decrease the power consumption, these approaches could be adapted to hybrid or SSD-only storage subsystems.

CPU: Modern CPUs behave in an energy-proportional way to some degree. In addition to the control via PVC techniques, the current trend towards many-core processors favors energy-efficient operation. It is possible that unused cores enter a sleep mode where they just consume a fraction of the power needed in idle mode. The Intel Core i7 processor combines both techniques by disabling unused cores (especially in the case of single-threaded applications) and by increasing the clock rate of the remaining one. Finally, there are also low-energy processors (e.g., Intel Atom) available.

DRAM memory: Main memory is the primary concern when thinking about energy proportionality. As it permanently consumes a given amount of power (independent of the load), this component is not energy-aware at all. One current trend is to build large (in the range of Terabytes) main-memory databases (see footnote 2). This will result in just the opposite of an energy-proportional system, as RAM will be responsible for the overwhelming share of the energy consumed by the server, at a constant rate.

Therefore, it is critical to evaluate how much internal memory is needed to approximate energy proportionality without suffering drops in performance caused by an insufficient amount of memory.

In a nutshell, the previous methodology using large-scale servers (scale-up) is still burdened by large energy consumption in idle mode.

Another possibility for system engineering is scale-down / scale-out: Instead of using a single, large server, several small-scale servers are deployed. In the literature, there have been proposals for such a network of small-scale servers like Amdahl blades [19], FAWN (Fast array of wimpy nodes) [1] or TerraServer bricks [2]. As each server is independent, this is the appropriate granularity for scaling the whole system as well as the appropriate granularity to switch nodes on and off. In the end, this will result in a true energy-proportional system (to the extent possible).

Footnote 2: “SAP-Module gewinnen an Tempo” (Computer-Zeitung, June 22, 2009). Using main-memory data management, SAP tries to speed up the response times of applications by a factor of 100.
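The difference between scale-up and scale-out described in this section can be sketched with a toy power model. Everything in this example is an assumption made for illustration: the linear power curve, the 50% idle fraction applied to both server classes, the wattages, and the throughput capacities. Real servers converge to peak power faster than a linear model suggests, so the sketch is optimistic for the single large server.

```python
def server_power(utilization: float, peak_watts: float,
                 idle_fraction: float = 0.5) -> float:
    """Assumed linear power model between idle power and peak power."""
    idle_watts = idle_fraction * peak_watts
    return idle_watts + (peak_watts - idle_watts) * utilization


def scale_up_power(load_tps: int, capacity_tps: int, peak_watts: float) -> float:
    """One large server that is always switched on."""
    return server_power(load_tps / capacity_tps, peak_watts)


def scale_out_power(load_tps: int, node_capacity_tps: int,
                    node_peak_watts: float, total_nodes: int) -> float:
    """Small nodes; only as many as the load requires are switched on."""
    needed = -(-load_tps // node_capacity_tps)  # integer ceiling division
    if needed > total_nodes:
        raise ValueError("load exceeds cluster capacity")
    if needed == 0:
        return 0.0
    per_node_utilization = load_tps / (needed * node_capacity_tps)
    return needed * server_power(per_node_utilization, node_peak_watts)


# Hypothetical setup: one 1000 W server vs. ten 100 W nodes, same total capacity.
for load in (0, 200, 500, 1000):
    big = scale_up_power(load, capacity_tps=1000, peak_watts=1000.0)
    small = scale_out_power(load, node_capacity_tps=100,
                            node_peak_watts=100.0, total_nodes=10)
    print(f"load {load:4d} tps: scale-up {big:6.1f} W, scale-out {small:6.1f} W")
```

In this model the cluster's power tracks the load in steps of one node, while the single server never drops below its idle power. The sketch ignores the cost of switching nodes on and off and of moving data between them.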
[Figure 2 plot, two panels: energy consumption (0% to 125%) over load (0% to 100%) for SSD1 to SSD4 and HDD1 to HDD3, each panel with an ideal energy-proportional reference line.]
Figure 2: Energy proportionality of selected hard disks and SSDs
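One way to quantify how far a device is from the ideal line in a plot like Figure 2 is the mean deviation between its normalized power draw and its load. This metric is an assumption introduced here for illustration and is not used in the paper; the sample points are hypothetical and are not read off Figure 2.

```python
from typing import List, Tuple

# Each sample is (load, power), both as fractions of the device's peak.
Samples = List[Tuple[float, float]]


def proportionality_gap(samples: Samples) -> float:
    """Mean absolute deviation from the ideal line power == load."""
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(abs(power - load) for load, power in samples) / len(samples)


# Hypothetical devices (assumed values).
flat_device = [(0.0, 0.80), (0.5, 0.90), (1.0, 1.00)]   # high idle power
ideal_device = [(0.0, 0.00), (0.5, 0.50), (1.0, 1.00)]  # perfectly proportional

print(f"flat device gap:  {proportionality_gap(flat_device):.2f}")
print(f"ideal device gap: {proportionality_gap(ideal_device):.2f}")
```

A gap of zero corresponds to the ideal line. Because both axes are normalized to the device's own peak, the metric says nothing about the absolute power level, which is where SSDs and hard disks differ most according to the text.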
In the next section, we will explain our vision of an energy-proportional system in more detail.

4. OUR VISION
As it has become obvious in the preceding section that a single (large-scale) server node cannot establish energy-proportional behavior, we will focus on the scale-out approach consisting of several small-scale nodes connected via network adapters.

We envision a distributed database system which runs on several small-scale nodes. While FAWN tackles a distributed key-value store, we will focus on a traditional relational database system. Although much research work on distributed database systems has delivered substantial scientific results and engineering techniques during more than 20 years, it is nevertheless fundamental to reevaluate this “body of knowledge and experience” with respect to modern hardware and energy efficiency.

As every node is constructed in a small-scale manner and, thus, consumes little energy, we have at least energy proportionality at the granularity of nodes. Depending on the load situation, nodes can be switched on and off, so this approach will approximate the ideal energy-proportional system. We believe that small-scale distributed systems are the key concept to achieve energy proportionality. By applying ad-hoc adaptivity mechanisms, energy consumption will scale with the given load.

At the moment, we are about to implement a first software prototype in the context of the SIGMOD 2010 programming contest [17], whose goal is to come up with a distributed database engine. After having finished the contest, we will expand its functionality towards adaptivity and energy efficiency.

For the future, we consider an architecture which comprises two types of specialized nodes: data nodes for accessing the base relations and performing simple operations (e.g., selection and projection) and computation nodes for CPU-intensive operations like joins. Of course, there are many open and challenging questions while refining this approach, amongst others how the overall energy efficiency is affected by the data distribution or how to come up with an energy-efficient query optimizer for distributed systems.

Another issue that needs further investigation is how much energy consumption is introduced by the network infrastructure and the data transfer between nodes, and whether it scales proportionally with respect to the load as well.

5. CONCLUSION
As we have shown, the current trend towards energy efficiency and Green IT is relevant for the database research community as well. Several ideas of limited scope have already been proposed; nevertheless, we believe that only a holistic approach will be the road to success in the end.

Present approaches try to be energy-efficient under high workloads or even peak load situations (e.g., explicit query delays). Our approach aims especially at increasing the energy efficiency at low load levels by introducing the concept of energy proportionality.

Furthermore, we want to provide some evidence whether or not the claims of Tsirogiannis et al. [20] are true, i.e., whether our findings will support their hypothesis.

In the future, we will further explore how (distributed) database systems have to be designed to exploit the given system architecture best. By introducing adaptivity, the database system will dynamically interact with its underlying hardware to increase energy efficiency.

6. REFERENCES
[1] D. G. Andersen, J. Franklin, M. Kaminsky, A. Phanishayee, L. Tan, and V. Vasudevan. FAWN: a fast array of wimpy nodes. In SOSP, pages 1–14, 2009.
[2] T. Barclay, J. Gray, and W. Chong. TerraServer Bricks – A High Availability Cluster Alternative, Microsoft Research (MSR-TR-2004-107). Technical report, 2004.
[3] L. A. Barroso and U. Hölzle. The Case for Energy-Proportional Computing. Computer, 40(12):33–37, 2007.
[4] C. L. Belady. In the Data Center, Power and Cooling Costs More Than the IT Equipment it Supports. Electronics Cooling. vol. 13, no. 1; http://electronics-cooling.com/articles/2007/feb/a3/, 2007.
[5] D. Colarelli and D. Grunwald. Massive arrays of idle disks for storage archives. In ACM/IEEE conference on Supercomputing, pages 1–11. IEEE Computer Society Press, 2002.
[6] V. De and S. Borkar. Technology and design challenges for low power and high performance. In International Symposium on Low power electronics and design, pages 163–168, 1999.
[7] J. Guerra, W. Belluomini, J. Glider, K. Gupta, and H. Pucha. Energy proportionality for storage: impact and feasibility. SIGOPS Operating Systems Review, 44(1):35–39, 2010.
[8] T. Härder, K. Schmidt, Y. Ou, and S. Bächle. Towards Flash Disk Use in Databases - Keeping Performance While Saving Energy? In BTW, volume P-144 of LNI, pages 167–186, March 2009.
[9] S. Harizopoulos, M. A. Shah, J. Meza, and P. Ranganathan. Energy Efficiency: The New Holy Grail of Data Management Systems Research. In CIDR, 2009.
[10] M. P. Haustein and T. Härder. An Efficient Infrastructure for Native Transactional XML Processing. Data & Knowledge Engineering, 61(3):500–523, June 2007.
[11] J. Janukowicz, D. Reinsel, and J. Rydning. Worldwide Solid State Drive 2008-2012 Forecast and Analysis. Technical report, IDC, June 2008.
[12] S.-H. Kim, D. Jung, J.-S. Kim, and S. Maeng. HeteroDrive: Reshaping the Storage Access Pattern of OLTP Workload Using SSD. In International Workshop on Software Support for Portable Storage, October 2009.
[13] I. Koltsidas and S. Viglas. Flashing up the storage layer. PVLDB, 1(1):514–525, 2008.
[14] W. Lang and J. M. Patel. Towards Eco-friendly Database Management Systems. In CIDR, 2009.
[15] M. Nicola, I. Kogan, and B. Schiefer. An XML transaction processing benchmark. In SIGMOD, pages 937–948, 2007.
[16] P. Ranganathan. Recipe for efficiency: principles of power-aware computing. Communications of the ACM, 53(4):60–67, April 2010.
[17] P. Senellart, C. Genzmer, S. Abiteboul, M. Balazinska, S. Madden, and M. Stonebraker. SIGMOD 2010 Programming Contest – Distributed Query Engine. http://dbweb.enst.fr/events/sigmod10contest/, 2010.
[18] A. Z. Spector. Distributed Computing at Multi-dimensional Scale. In International Middleware Conference, 2008. Keynote.
[19] A. S. Szalay, G. C. Bell, H. H. Huang, A. Terzis, and A. White. Low-power amdahl-balanced blades for data intensive computing. SIGOPS Operating Systems Review, 44(1):71–75, 2010.
[20] D. Tsirogiannis, S. Harizopoulos, and M. Shah. Analyzing the Energy Efficiency of a Database Server. In SIGMOD, 2010.
[21] C. Weddle, M. Oldham, J. Qian, A.-I. A. Wang, P. Reiher, and G. Kuenning. PARAID: A gear-shifting power-aware RAID. ACM TOS, 3(3):13, 2007.
[22] Q. Zhu, Z. Chen, L. Tan, Y. Zhou, K. Keeton, and J. Wilkes. Hibernator: helping disk arrays sleep through the winter. In SOSP, pages 177–190, 2005.