CEUR Workshop Proceedings, Vol-1291. https://ceur-ws.org/Vol-1291/ewili14_13.pdf
Minimizing Energy Under Performance Constraints on Embedded Platforms

Resource Allocation Heuristics for Homogeneous and Single-ISA Heterogeneous Multi-Cores

Connor Imes, University of Chicago, ckimes@cs.uchicago.edu
Henry Hoffmann, University of Chicago, hankhoffmann@cs.uchicago.edu

ABSTRACT

This paper explores the problem of energy optimization in embedded platforms. Specifically, it studies resource allocation strategies for meeting performance constraints with minimal energy consumption. We present a comparison of solutions for both homogeneous and single-ISA heterogeneous multi-core embedded systems. We demonstrate that different hardware platforms have fundamentally different performance/energy tradeoff spaces. As a result, minimizing energy on these platforms requires substantially different resource allocation strategies. Our investigations reveal that one class of systems requires a race-to-idle heuristic to achieve optimal energy consumption, while another requires a never-idle heuristic to achieve the same. The differences are dramatic: choosing the wrong strategy can increase energy consumption by over 2× compared to optimal.

Categories and Subject Descriptors

C.4 [Performance of Systems]: Modeling Techniques; I.2.8 [Problem Solving, Control Methods, Search]: Heuristic Methods, Scheduling

General Terms

Experimentation, Measurement, Performance

Keywords

Power-aware computing, energy-aware computing, resource allocation, optimization, heterogeneous architectures

This work was funded by the U.S. Government under the DARPA UHPC program and by the Dept. of Energy under DOE DE-AC02-06CH11357. The views and conclusions contained herein are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.

EWiLi'14, November 2014, Lisbon, Portugal. Copyright retained by the authors.

1. INTRODUCTION

Embedded systems require predictable performance such as real-time or quality-of-service guarantees to meet their design requirements. At the same time, these systems are often limited by available energy. Embedded systems designers must therefore contend with the constrained optimization problem of meeting performance guarantees while minimizing energy consumption.

To support systems designers, embedded processors are becoming increasingly configurable. These processors expose a number of components which can be managed in software to change the system's performance/power tradeoffs. Embedded operating systems could potentially reduce the burden of energy minimization by automatically scheduling resource usage to meet the performance constraint and minimize energy.

Unfortunately, scheduling multiple resources, each with their own unique power and performance tradeoffs, is a difficult problem. In practice, most designers turn to heuristic solutions that ensure the performance targets are met and approximate the minimal energy solution. The race-to-idle heuristic, in particular, is often employed because it is easy to implement and has achieved acceptable results in practice. This heuristic simply makes all resources available to an application until it completes and then idles the system, taking advantage of low-power states.

As the diversity of embedded systems increases, it is increasingly unlikely that any single heuristic will perform well across all systems. In this paper, we investigate heuristics for scheduling on configurable embedded architectures to meet application performance constraints and minimize energy. We first discuss a formulation of the scheduling problem as a linear program which is general enough to capture scheduling across a variety of configurable systems. We then discuss two heuristics and describe their relationship to the linear program. Specifically, we compare race-to-idle to never-idle, which tries to keep the system busy until the task deadline. We capture empirical data from four benchmarks on two embedded systems and derive an oracle representing optimal energy consumption for each benchmark and system. We then model the energy consumption of the race-to-idle and never-idle heuristics and compare them to the oracles.

Our two systems are: a Sony Vaio tablet, with an Intel Haswell processor, and an ODROID development board, with an ARM big.LITTLE processor. Our results indicate that no single heuristic is effective on both platforms, and
the differences are striking. On the Vaio, the race-to-idle heuristic is near optimal, while the never-idle heuristic consumes 28% more energy than optimal on average. In contrast, on the ODROID, never-idle is within 15% of optimal, while race-to-idle consumes 2.43× more energy. These results hold despite the fact that the ODROID's idle power consumption is only a small fraction of the Vaio's. Although counter-intuitive, the race-to-idle approach is actually better on the system with higher idle power. Our study indicates that a general, portable resource allocator for embedded systems must be flexible enough to match the resource allocation strategy to the characteristics of the hardware.

2. BACKGROUND

Energy optimization under performance constraints is a widely studied problem both in theory and in practice.

Embedded hardware platforms support energy management by exposing configurable components. For example, dynamic voltage and frequency scaling (DVFS) gives software control over processor speed [20, 22], which can then be tuned to ensure that performance targets are met while energy is minimized. Theoretical results have produced optimal uniprocessor scheduling algorithms for DVFS [1].

As energy reduction has become increasingly important, processors have become increasingly configurable. For example, some processors augment DVFS with a sleep state, allowing a system with no work to reduce power consumption [12]. For even a single core system with a low-power idle state and DVFS, optimal energy scheduling is intractable, but approximation algorithms have been developed that are within a bounded distance from optimal [2].

Despite this difficulty, embedded systems now include not only DVFS and idle states, but also multiple processing cores and even heterogeneous multi-cores, each with different capabilities [13, 14]. Even for a single application, scheduling for multiple cores, core types, DVFS settings, and idle states is challenging, but worth pursuing. Empirical evidence has repeatedly shown that configuring across multiple resources provides greater energy efficiency than working with only a single component [7, 9, 10, 17, 18].

Case studies done in the early and middle part of the 2000s found that the complexity of theoretically optimal schedulers provided little to no benefit in practice [3, 16, 19, 21]. Instead, these studies indicated the well-known race-to-idle heuristic, which allocates all resources to complete a task and then idles until the next task is ready, is often close to optimal. More recent studies, however, suggest that this trend is starting to reverse; they call into question the efficacy of the race-to-idle heuristic [5, 8, 15]. These contradictory results suggest that it is time to re-evaluate resource allocation approaches and re-examine the potential energy savings of more sophisticated algorithms while keeping in mind practical implications.

In this paper, we express the problem of allocating resources in a configurable system as a linear optimization which minimizes energy subject to a performance constraint. We then consider two common heuristic solutions to the problem and compare their performance on two different embedded systems. Our results indicate that the effectiveness of the heuristics is entirely system dependent. Therefore, a generalized approach that works across a range of systems will need to be more sophisticated than either heuristic alone.

3. ENERGY OPTIMIZATION

Our goal is to generalize the problem of allocating system resources to complete a task by a deadline while minimizing energy. We assume a task consists of some amount of work to be accomplished. We assume that the system is configurable – a configuration represents an assignment of resources to a task. In a heterogeneous system, for example, a configuration could represent the assignment of a core type and clockspeed. As more resources can be assigned to a task, the number of possible configurations increases. Each configuration produces a different performance and power consumption. We assume that tasks can be assigned time with different system configurations. The energy optimization problem, then, is to assign a time to each possible configuration such that the total work accomplished is equal to the task's work and the total energy consumption is minimized. Some configurations may be assigned zero time, meaning the task will not make use of them.

3.1 Problem Formulation

We build on prior work that expressed this problem as a linear program [8]. We assume that a task starts at time 0, has a workload of W work units, and a deadline at time t. The system has C configurations such that c ∈ {0, . . . , C − 1}. Each configuration has a performance (work rate) r_c and a power consumption p_c. By convention, we denote c = 0 to be the system idle state with r_0 = 0 and p_0 = p_idle. This idle power consumption is a constant property of the system. Also by convention, we denote c = C − 1 as the configuration that allocates all resources to the task. For example, in a two core system with four different clock speeds, there are nine configurations (2 cores × 4 speeds + idle), and c = 8 is the assignment of all cores at the highest speed.

We schedule time t_c in each configuration where 0 ≤ t_c ≤ t, ∀c. Each configuration contributes t_c · r_c work and consumes t_c · p_c energy. Energy minimization under a performance constraint can then be formulated as the following linear program:

    minimize    Σ_c t_c · p_c                               (1)
    subject to
                Σ_c t_c · r_c = W                           (2)
                Σ_c t_c = t                                 (3)
                t ≥ t_c ≥ 0,  for c = 0, . . . , C − 1      (4)

Equation 1 is the objective function representing total energy consumption. Equation 2 is the constraint that all work is completed, while Equations 3 and 4 ensure the deadline is respected.

This is an entirely general formulation of the problem. It is applicable to any configurable system, deadline, and workload. The issue is that as system complexity (i.e., the number of configurable components) grows, the number of configurations increases exponentially. Thus, this problem is often difficult to solve in practice. The great irony is that solving this problem would reduce energy consumption, but embedded systems are often too resource constrained to tackle such a difficult problem in the first place.

3.2 Heuristic Scheduling Strategies

Given the difficulty of solving this problem in practice, we consider heuristic solutions. Interestingly, several well-known heuristics map directly onto the structure of this optimization problem. These heuristic solutions meet the constraints – they complete the work by the deadline – but may consume more energy than the true optimal solution. Specifically, this paper considers two heuristics:

• race-to-idle: Also known as race-to-complete or race-to-halt, this heuristic makes all resources available (i.e., schedules all time in configuration c = C − 1) until the task completes and then idles (i.e., schedules remaining time in c = 0) until the next task arrives. Formally, the time spent in each configuration is:

      t_{C−1} = W / r_{C−1}                                 (5)
      t_idle = t − t_{C−1}                                  (6)
      t_c = 0,  ∀c ≠ C − 1, idle                            (7)

• never-idle: This heuristic attempts to keep the system busy (but perhaps not fully utilized) and complete the work just at the deadline. This strategy schedules time in two configurations: the lowest power state that is just faster than necessary, hi, and the most efficient state that is slower than necessary, lo:

      hi = arg min_{k∈C} { p_k | r_k ≥ W/t }                (8)
      lo = arg max_{k∈C} { r_k / p_k | r_k < W/t }          (9)

  The time spent in each state is then:

      t_hi = (W − r_lo · t) / (r_hi − r_lo)                 (10)
      t_lo = (r_hi · t − W) / (r_hi − r_lo)                 (11)
      t_c = 0,  ∀c ≠ hi, lo                                 (12)

These two strategies place different burdens on implementers. Race-to-idle is exceptionally easy to implement. It simply allocates all resources and then transitions to the idle state when the work completes as defined in Equations 5 and 6. Implementing race-to-idle requires no insight into the performance or power delivered by the system – it only requires recognizing when the task has finished. This heuristic is also portable – the same strategy can be easily implemented on any system. In contrast, the never-idle heuristic requires more insight into the application's performance and system power in different configurations. Specifically, it requires selecting the hi and lo states and it may be harder to port from system to system as the interfaces to different configurations may change. In addition, some configurable components may not be available on different systems.

The next section evaluates these heuristics.

4. EMPIRICAL EVALUATION

In this section we describe two embedded platforms and our approach to characterizing their properties. We then analyze the results and discover that the two platforms have contrasting performance/energy tradeoff spaces that require different heuristics to achieve optimal energy efficiency.

Table 1: System power characteristics.

    System    Idle Power   Min Power   Max Power
    Vaio        2.50 W       3.04 W     12.20 W
    ODROID      0.12 W       0.17 W     10.16 W

Table 2: System configurations.

    Vaio
    Configuration        Settings   Max Speedup
    clock speed             11         2.72
    cores                    2         1.81
    hyperthreads             2         1.10

    ODROID
    Configuration        Settings   Max Speedup
    big cores                4         6.10
    big core speeds          9         1.97
    LITTLE cores             4         3.94
    LITTLE core speeds       8         2.40

4.1 Evaluation Platforms

Two different embedded platforms are evaluated - one with homogeneous multi-cores and the other with single-ISA, heterogeneous multi-cores. The homogeneous system is a Sony VAIO SVT11226CXB Tablet PC with a dual core Intel Haswell processor with hyperthreading that supports eleven DVFS settings ranging from 600 MHz to 1.501 GHz. The heterogeneous system is an ODROID-XU+E [6], an ARM big.LITTLE development platform with the Samsung Exynos5 Octa SoC containing Cortex-A15 and Cortex-A7 quad core CPUs. The Cortex-A15, the big cluster, supports nine DVFS settings ranging from 800 MHz to 1.6 GHz, while the Cortex-A7, the LITTLE cluster, supports eight DVFS settings ranging from 500 MHz to 1.2 GHz. Both devices run Ubuntu Linux 14.04 LTS, with the Vaio using Linux kernel 3.13.0 and the ODROID using Linux kernel 3.4.91.

To measure power on the Vaio we use the Haswell processor's Model-Specific Register (MSR) which tracks energy consumption. The ODROID has four integrated sensors [11] that we use to monitor the A15 cluster, the A7 cluster, the DRAM, and the GPU. Table 1 presents the power characteristics captured from both devices, including the idle power as well as the minimum and maximum power usage during execution. The cpufrequtils interface is used to control the DVFS settings on each device.

We use four benchmark applications from PARSEC [4] - bodytrack, ferret, swaptions, and x264. These benchmarks represent a variety of applications with performance requirements that one can reasonably expect to use on an embedded system. Both bodytrack and x264 process video data and could be required to match the performance of an on-board camera. Ferret is a content-based search engine for non-text data types. Swaptions is a financial pricing application that is often run with performance and energy constraints to ensure that the prices it outputs are timely while minimizing the electricity cost of producing them.

4.2 Methodology

For these experiments, a configuration is a unique combination of system components and their settings. Table 2 presents the configuration parameters for each system. In total there are 44 configurations on the Vaio and 68 on the ODROID. On both platforms a clockspeed setting is applied to all active cores concurrently, meaning there are no configurations that include different clockspeeds on different cores. The reason there are not more configurations on the ODROID is because the system does not support executing on
[Figure 1 — four panels: (a) bodytrack, (b) ferret, (c) swaptions, (d) x264.]

Figure 1: Speedup and normalized energy efficiency for different benchmarks on the Vaio and ODROID.


the big and LITTLE clusters simultaneously.

To collect performance and power metrics, we modify the Heartbeats API [10] to capture energy readings in addition to performance statistics. Each benchmark is then edited to produce heartbeats at intervals appropriate for their respective tasks. The modified Heartbeats library captures energy readings from the MSR on the Vaio and uses the ODROID's embedded power sensors to calculate energy consumption. The fact that one device records energy while the other offers power measurements already demonstrates some complexity in designing a general-purpose solution for meeting performance targets with power/energy constraints. Indeed, this additional complexity compelled us to abstract the component that reads (or calculates) energy from the rest of the Heartbeats library in order to build a more generic system capable of capturing these metrics on diverse platforms.

By varying the settings in Table 2, we execute the four benchmarks in all possible configurations on each system. For each benchmark the number of worker threads is fixed to match the maximum number of virtual cores (four in both cases). This choice is predicated on the assumption that the software either has a predetermined number of threads to match the platform it is designed for, or the software is able to determine the hardware configuration prior to performing its main task. In each execution, the Heartbeats library records the performance and power.

4.3 Performance/Energy Tradeoff Spaces

As mentioned previously, the four benchmarks are executed in all configurations on each device, and we measure power and performance. Our goal is to understand the relationship between energy efficiency and performance for each device.

In Figure 1 we plot the performance vs. energy efficiency curves for the two devices on the same chart for each benchmark. To make the charts readable, any state that delivers less performance for the same energy efficiency is discarded (so there are fewer points than configurations in the figures). The X-axis (Speedup) describes the performance improvement for each state, normalized to the lowest performing state along each curve. Using the x264 benchmark as an example, the amount of work to be done is the number of frames to process and the performance (work rate) is measured in frames per second (FPS). The Y-axis (Energy Efficiency) is defined as performance/power and is normalized to the least energy-efficient state for each curve. In the x264 benchmark, energy efficiency is categorized as the number of frames processed per Joule of energy (FPS / Watts = frames / Joule). For both axes, larger values are better.

In each chart of Figure 1 the Vaio's energy efficiency increases over the baseline along with its performance. In contrast, the ODROID's energy efficiency drops as it moves to higher performing states. The area without points approximately halfway through the ODROID curve on each plot is a result of the big.LITTLE architecture and the disconnect in performance and energy efficiency between the ODROID's smaller and larger cores. Clearly, these devices have completely different performance and energy efficiency
[Figure 2 — two panels: (a) Vaio, (b) ODROID.]

Figure 2: Never-idle and race-to-idle heuristics for the Vaio and ODROID with a performance target of 50% of their respective maximums. The Y-axis is normalized to optimal.


tradeoffs, so each will need a different resource allocation strategy to minimize energy.

The next section demonstrates how much energy can be saved by picking the right strategy.

4.4 Energy Optimization Using Heuristics

Using the prior results we derive an oracle that, given a performance target, calculates the minimum amount of energy required to complete each benchmark on each system. Energy consumption is then modeled for the never-idle and race-to-idle heuristics on both platforms for each of the benchmarks. Figure 2 presents the results for a performance target of 50% of each device's maximum performance on each benchmark. Energy is normalized to the oracle's minimum energy consumption for each benchmark and platform. Lower values are better.

Figure 2a indicates that the race-to-idle heuristic is near-optimal for the Vaio tablet. Figure 2b shows that the ODROID is more energy efficient using the never-idle heuristic than race-to-idle. Therefore, the ODROID requires a different approach to completing these tasks on time. These results follow from the tradeoff spaces in Figure 1. On the Vaio, speeding up increases energy efficiency, so it is better to complete work faster using race-to-idle. On the ODROID, slowing down increases energy efficiency, so it is better to complete the work just at the deadline. This relationship holds despite the fact the ODROID has much lower idle power.

The effects of choosing the right strategy are dramatic. On the Vaio, race-to-idle is only slightly worse than optimal for all benchmarks, but the never-idle strategy consumes 28% more energy than optimal on average. On the ODROID, never-idle consumes 14% more energy than optimal, but race-to-idle increases energy consumption by 2.43× on average. These results indicate that choosing the wrong strategy is not a trivial issue, but severely impacts energy consumption.

4.5 Sensitivity to Performance Target

In the previous section we set a performance target of 50% of each system's performance capacity. This section evaluates each heuristic's sensitivity to this target.

For simplicity, we begin by considering two simple theoretical systems. All else being equal, the energy efficiency of the first system scales proportionally to its resource consumption, and the energy efficiency of the second scales inversely to its resource consumption. Both of these theoretical systems obviously require different strategies for minimizing energy usage while meeting performance targets. For maximum energy efficiency, the first system will want to use its relatively efficient high-power states as long as possible, so it benefits from using race-to-idle. Conversely, the second system prefers to use its relatively efficient low-power states as long as possible, so it benefits from using never-idle.

For both theoretical systems, as the performance target increases toward capacity and a race-to-idle heuristic is used, more time is required in high-power execution states to complete the task. Less time is then spent in the low-power idle state. This is already known to be optimal for the first system. The second system, however, now spends less time in its relatively energy-inefficient low-power idle state, and a higher energy efficiency is achieved than at lower performance targets where more time is spent idling. Likewise, if a never-idle heuristic is used on both systems, a larger percentage of each system's resources are required in order to meet the performance target. The gap between the utilized resources and total available resources closes, resulting in fewer resources being unused. This is already known to be optimal for the second system. Now the first system uses its relatively energy-efficient high-power states and thus achieves a higher energy efficiency than at lower performance targets where less energy-efficient execution states are used. For both race-to-idle and never-idle heuristics, there is less opportunity to save energy at higher performance targets. We then conclude that the heuristic strategy becomes less crucial as the performance target increases toward system capacity.

To verify this conclusion, we model the energy usage for a variety of performance targets across the range of attainable targets for the Vaio and ODROID, each of which we believe exhibit similar properties to one of the two theoretical systems. Using the x264 benchmark as an example, Figure 3 presents the energy used by each heuristic relative to the oracle for performance targets ranging from 5% to 95% of each system's performance capacity (lower values are better). It is obvious again which heuristic is better for each system -
Figure 3: Normalized energy for various performance targets using the x264 benchmark. (a) Vaio; (b) ODROID.


the Vaio benefits from race-to-idle and the ODROID from never-idle. We see for both systems that the lower performance targets suffer the most excess energy consumption over the baseline when an inefficient heuristic is used. As the performance target approaches capacity, the two heuristics converge at the optimal energy usage for that target, as anticipated.
   Results for the other three benchmarks follow the same trends, but are omitted for space. This sensitivity analysis demonstrates that our conclusions hold across a range of performance targets. Picking the right strategy is essential for any system except those that are always fully utilized.
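These trends can be reproduced with a small analytical sketch. The model below is ours, not the paper's: the power curves and idle powers are hypothetical values chosen only to mimic one system whose high-power states are its most energy-efficient (as observed on the Vaio) and one whose low-power states are (as on the ODROID); they are not measurements from either platform.

```python
# Toy model of one job with a 1-second deadline: `target` is the required
# average speed as a fraction of maximum speed. All power values below are
# hypothetical, chosen only to mimic the two kinds of tradeoff spaces.

def power_a(s):
    """System A: large static power, near-linear dynamic power, so its
    high-power states are its most energy-efficient (Vaio-like)."""
    return 10.0 + 5.0 * s  # watts at speed fraction s

def power_b(s):
    """System B: small static power, cubic dynamic power, so its
    low-power states are its most energy-efficient (ODROID-like)."""
    return 0.5 + 8.0 * s ** 3  # watts at speed fraction s

def race_to_idle_energy(target, p_max, p_idle):
    """Run at full speed for `target` of the deadline, then idle."""
    return p_max * target + p_idle * (1.0 - target)

def never_idle_energy(target, power):
    """Run for the whole deadline at the slowest speed meeting the target."""
    return power(target)

for name, power, p_idle in (("A", power_a, 2.0), ("B", power_b, 0.2)):
    for target in (0.25, 0.50, 0.95):
        race = race_to_idle_energy(target, power(1.0), p_idle)
        pace = never_idle_energy(target, power)
        print(f"system {name}, target {target:.2f}: "
              f"race-to-idle {race:5.2f} J, never-idle {pace:5.2f} J")
```

In this sketch, race-to-idle uses less energy than never-idle at every target on system A, system B shows the opposite preference, and the gap between the two heuristics shrinks as the target approaches capacity, where the two strategies coincide.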
5.   CONCLUSIONS
   This paper evaluates two embedded platforms that demonstrate a diversity in performance and energy efficiency tradeoff spaces. One platform improves in energy efficiency as it moves to higher performance states, while the other experiences a reduction in energy efficiency as its performance increases. Applying a race-to-idle heuristic to the first achieves near-optimal energy consumption, while a never-idle heuristic achieves comparable results on the other. Different resource allocation strategies are clearly necessary for platforms exhibiting these differing characteristics in order to attain optimal energy efficiency while meeting performance targets. These results indicate that a general-purpose solution to this problem must be able to support inherently different performance/energy tradeoff spaces.
   It is also reasonable to expect that, even on a single platform, applications with different processing requirements may have different performance/energy tradeoff spaces. Therefore, a single heuristic may not be near-optimal for all applications on that platform. In such a scenario, a general-purpose solution must also support different behavior for various applications. Future work will identify applications and systems that exhibit behavior different from those presented here. We will then evaluate whether new behaviors require new heuristics or whether the heuristics are robust for a given platform.