<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Performance Evaluation of RUNT Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hiroyuki Chishiro</string-name>
          <email>chishiro@ny.ics.keio.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masayoshi Takasu</string-name>
          <email>takasu@ny.ics.keio.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rikuhei Ueda</string-name>
          <email>riku@ny.ics.keio.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nobuyuki Yamasaki</string-name>
          <email>yamasaki@ny.ics.keio.ac.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Information and Computer Science, Keio University</institution>
          ,
          <addr-line>Yokohama</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>This paper evaluates the performance of Reduction to Uniprocessor Transformation (RUNT) with Voltage and Frequency Scaling, called Static RUNT (S-RUNT) and Dynamic RUNT (D-RUNT), respectively. Simulation results show that how to assign tasks to servers in RUNT influences energy consumption and the worst-fit heuristic is the best in many cases. In addition, the idle task assignment policy saves more energy consumption in D-RUNT and D-RUNT outperforms S-RUNT if the actual case execution time of each task is shorter than its worst case execution time.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED
SYSTEMS]: Real-time and embedded systems</p>
    </sec>
    <sec id="sec-2">
      <title>General Terms</title>
      <p>Algorithms</p>
    </sec>
    <sec id="sec-3">
      <title>INTRODUCTION</title>
      <p>
        Real-time systems have required multiprocessors and there are
mainly two categories of multiprocessor real-time scheduling:
partitioned scheduling and global scheduling. Partitioned scheduling
assigns tasks to processors offline and there are no migratory tasks
but it guarantees only 50% processor utilization in the worst case
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In contrast, global scheduling can achieve 100% utilization
by migrating tasks among processors online but increases run-time
overhead. We are interested in optimal multiprocessor real-time
scheduling algorithms, which can achieve 100% utilization with
any implicit-deadline periodic task sets. Several optimal
multiprocessor real-time scheduling algorithms have been proposed and
Reduction to UNiprocessor (RUN) [
        <xref ref-type="bibr" rid="ref14">13</xref>
        ] outperforms other optimal
algorithms with respect to the number of preemptions/migrations.
      </p>
      <p>Since the small number of preemptions/migrations improves the
practicality of the scheduling, we focus on RUN.</p>
      <p>
        RUN transforms the multiprocessor scheduling problem into an
equivalent set of uniprocessor scheduling problems by the DUAL
and PACK operations and the detail of them is described in Section
3. After transforming offline, RUN uses Earliest Deadline First
(EDF) [
        <xref ref-type="bibr" rid="ref12">11</xref>
        ] to transform the uniprocessor scheduling into the
multiprocessor scheduling online because EDF is optimal in
implicitdeadline periodic task sets on uniprocessors. Using these
operations, RUN achieves the optimality with low overhead.
      </p>
      <p>
        Voltage and Frequency Scaling (VFS) is one of the most
popular techniques to reduce energy consumption in computer systems.
Especially, Real-Time Voltage and Frequency Scaling (RT-VFS)
can reduce energy consumption by scaling the operating frequency
and the supply voltage while meeting real-time constraints.
RTVFS is based on the essential characteristic of real-time tasks. All
tasks can be executed slowly as long as their deadlines are met.
In most of the CMOS-based modern processors, the dynamic
energy consumption E is proportional to the operating frequency f
and the square of the supply voltage V (i.e., E ∝ f V 2) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and
the maximum operating frequency depends on the supply voltage.
Therefore, RT-VFS can effectively reduce energy consumption at
a cubic order of the operating frequency. RT-VFS has two
following techniques: Real-Time Static Voltage and Frequency Scaling
(RT-SVFS) and Real-Time Dynamic Voltage and Frequency
Scaling (RT-DVFS). RT-SVFS determines the voltage and frequency
offline and does not adjust them after the system starts. RT-DVFS
can reduce energy consumption by adjusting the voltage and
frequency online, which potentially saves more energy consumption.
Changing the voltage and frequency takes some time due to I/O
operations. If the overhead of RT-DVFS is small, it is possible
to ignore overhead or incorporate it into execution time of tasks.
However, RT-DVFS may incur significant overhead in some
systems and RT-SVFS is a good solution for these systems. There is a
trade-off between energy consumption and overhead in RT-VFS.
      </p>
      <p>
        In our previous work, Reduction to UNiprocessor
Transformation (RUNT) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] was proposed to achieve optimal multiprocessor
real-time scheduling algorithm based on RUN with Voltage and
Frequency Scaling. Unfortunately, the performance of RUNT is
not evaluated.
      </p>
      <p>This paper evaluates the performance of RUNT with
RT-SVFS/RTDVFS, called Static RUNT (S-RUNT) and Dynamic RUNT
(DRUNT), respectively. Simulation results show that the saved energy
consumption strongly depends on the way to assign tasks to servers
in RUNT and the worst-fit heuristic is the most energy-efficient
in many cases. In addition, the idle task assignment policy saves
more energy consumption in D-RUNT and D-RUNT outperforms
S-RUNT if tasks are completed early.
2.1</p>
    </sec>
    <sec id="sec-4">
      <title>SYSTEM MODEL</title>
    </sec>
    <sec id="sec-5">
      <title>Processor Model</title>
      <p>The system has M processors Π = { P1, P2, ..., PM }. Each
processor Pj is characterized by the continuous normalized
frequency αj (0 ≤ αj ≤ 1). Here, we discuss the differences
between the system model and practical environments. (i) In
practical environments, the system has the discrete frequency values
F = { f1, ..., fL | fmin = f1 &lt; ... &lt; fL = fmax } and the
discrete voltage values V = { V1, ..., VL | V1 &lt; ... &lt; VL }. We
assume that Vk corresponds to fk and the voltage is also changed
at the same time as the corresponding frequency is changed. The
lowest frequency fi ∈ F such that αj ≤ fi/fmax will be selected
to achieve the lowest energy consumption while meeting real-time
constraints. (ii) The system model assumes that no overhead
occurs at run-time. In practical environments, the scaled frequency
interferes with the scheduling even if the frequency is not changed
dynamically. The worst case overhead is included in the worst case
execution time (WCET).
2.2</p>
    </sec>
    <sec id="sec-6">
      <title>Task Model</title>
      <p>The system has a task set T = { τ1, τ2, ..., τN }, which is a set
of N periodic tasks on M processors. Each task cannot be executed
in parallel among processors. Each task τi has its WCET Ci and
period Ti. The jth instance of task τi is called job τi,j . Task τi
executed on a processor Pj requires Ci/αj processor time at every
Ti interval. The relative deadline Di is equal to its period Ti (i.e.,
implicit-deadline). All tasks must complete the execution by their
deadlines. The utilization of each task is defined as Ui = Ci/Ti
and the system utilization is defined as U = M1 Pi Ui. We assume
that all tasks may be preempted and migrated among processors at
any time, and are independent (i.e., they do not share resources and
do not have any precedence).
3.</p>
    </sec>
    <sec id="sec-7">
      <title>THE RUNT ALGORITHM</title>
      <p>
        We introduce RUNT [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], which is an optimal multiprocessor
real-time scheduling algorithm based on RUN with VFS. RUNT
supports RT-SVFS/RT-DVFS techniques on uniform/independent
VFS multiprocessor systems, called Static Uniform RUNT
(SURUNT), Static Independent RUNT (SI-RUNT), Dynamic Uniform
RUNT (DU-RUNT), and Dynamic Independent RUNT (DI-RUNT),
respectively. Due to the space limitation, the detail of these
algorithms are explained in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. When the actual case execution time
(ACET) of each task is often shorter than its WCET [
        <xref ref-type="bibr" rid="ref7">6</xref>
        ], RUNT uses
Enhanced Cycle-Conserving EDF (ECC-EDF) [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ] to reclaim slack
for reducing energy consumption. RUNT achieves a small number
of preemptions/migrations compared to RUN because these
operations are performed when every scheduling event occurs. If a task
set does not satisfy the full system utilization, idle tasks are
inserted because RUN assumes the full system utilization. An idle
task assignment policy is an important factor to reduce energy
consumption. Therefore, the idle ratio-fit was proposed in our previous
work. We explain the overview of the RUN algorithm, the
ECCEDF algorithm, and the idle ratio-fit as follows.
3.1
      </p>
    </sec>
    <sec id="sec-8">
      <title>The RUN Algorithm</title>
      <p>
        RUN [
        <xref ref-type="bibr" rid="ref14">13</xref>
        ] is an optimal multiprocessor real-time scheduling
algorithm with a small number of preemptions/migrations. We
explain the detail of RUN in offline and online phases.
      </p>
      <p>Now we introduce the RUN’s specific model because RUN has
many original parameters and assumptions to explain itself. A
system is fully utilized if the system utilization U is one. Since RUN
assumes the full system utilization, idle tasks are inserted to fill in
the slack if U &lt; 1. The total utilization of idle tasks is defined
as Uidle = M − Pi Ui. Note that each idle task has just the
parameter of utilization and does not have other parameters including
WCET and period.</p>
      <p>RUN transforms the multiprocessor scheduling to the
uniprocessor scheduling by aggregating tasks into servers S. We treat servers
as tasks with a sequence of jobs but they are not actual tasks in the
system; each server is a proxy for a collection of client tasks. When
a server is running, the processor time is used by one of its clients.
Clients of a server are scheduled via an internal scheduling
mechanism. The utilization of each server Sk is Uksrv = Pτi∈Sk Ui,
where τi ∈ Sk means that task τi is first assigned to server Sk, and
Uksrv does not exceed one.
3.1.1</p>
      <sec id="sec-8-1">
        <title>Offline Phase</title>
        <p>In an offline phase, RUN reduces the multiprocessor scheduling
to the uniprocessor scheduling by the DUAL and PACK operations.
RUN uses EDF for uniprocessor scheduling because EDF is
optimal in implicit-deadline periodic task sets on uniprocessors.</p>
        <p>The DUAL operation transforms a task τi into the dual task τi∗,
whose execution time represents the idle time of τi (i.e., Ci∗ = Ti −
Ci). The relative deadline of dual task τi∗ is equal to that of task τi.
The dual task τi∗ is executed exactly when the original task τi is idle
and vice versa. A schedule for the original task set is obtained by
blocking τi whenever τi∗ executes in the dual schedule. In addition,
the sum of utilizations of task τi and dual task τi∗ is one and the
utilization of dual task τi∗ is Ui∗ = Ci∗/Ti. The DUAL operation
reduces the number of processors whenever N − M &lt; M .</p>
        <p>The PACK operation packs dual servers into packed servers whose
utilizations do not exceed one. When N − M ≥ M , the number
of servers can be reduced by aggregating them into fewer servers
by the PACK operation. The scheme how to pack servers to fewer
servers is heuristic and the PACK operation is similar to the
partitioning scheme. Note that if assigning tasks to processors is
successful, RUN generates the same schedule as Partitioned EDF
(PEDF) and does not perform the DUAL and PACK operations.
Otherwise the DUAL and PACK operations are performed to generate
the reduction tree offline, which is used to make server scheduling
decisions online. The detail of making scheduling decisions in the
reduction tree is shown in the next subsection.</p>
        <p>In order to explain the reduction tree, we define the following
terms with respect to servers as follows. A unit server is a server
whose utilization is one. A null server is a server whose utilization
is zero. A root server is a last packed server whose utilization is
one (unit server).</p>
        <p>Packing the dual servers of packed servers can reduce the
number of servers by at least (almost) half. We perform DUAL and
PACK operations repeatedly until all packed servers become unit
servers. Now we define a REDUCE operation to be their
composition.</p>
        <p>
          DEFINITION 1 (FROM DEFINITION IV.6. IN [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]). Given a
set of servers Γ and a packing π of Γ , a REDUCE operation on a
server S in Γ, denoted by ψ(S), is the composition of the DUAL
operation ϕ with the PACK operation σ for π (i.e., ψ(S) = ϕ(σ(S))).
In addition, we define reduction level/sequence to explain the
reduction tree as follows.
        </p>
        <p>
          DEFINITION 2 (FROM DEFINITION IV.7 IN [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]). Let i ≥ 1
be an integer, Γ be a set of servers, and S be a server in Γ. The
operator ψi is recursively defined by ψ0(S) = S and ψi(S) =
ψ ◦ ψi−1(S). {ψi}i is a reduction sequence, and the server system
ψi(Γ) is said to be at reduction level i.
        </p>
        <p>S14(1),{5N*,10N*,15N*}
Note that assigning tasks to servers is defined as reduction level 0
that does not perform the REDUCE operation.</p>
        <p>Figure 1 shows the reduction tree on three processors. τi(Ci,Ti)
expresses that task τi has WCET Ci and period Ti. Tasks τ1, τ2,
τ3, τ4, and τ5 are assigned to servers S1, S2, S3, S4, and S5 at
reduction level 0, respectively. The total utilization of idle tasks is
Uidle = M − Pi Ui = 3 − 5 × 0.4 = 1. In this example, idle
tasks are assigned to servers at reduction level 0 uniformly, i.e.,
the utilization of each server is added to Uidle/N = 1/5 = 0.2,
respectively.</p>
        <p>We represent a server as S(Uksrv),{ Dk }, where Uksrv is the
utik
lization of server Sk and Dk is the deadline set of server Sk. The
deadline set includes all (absolute) deadlines of tasks in the server.
Each server sets the earliest deadline in each deadline set when the
server is released. We assign deadline sets 5N ∗, 10N ∗, 15N ∗,
10N ∗, and 5N ∗ to servers at reduction level 0, respectively, where
N ∗ means natural numbers. Servers S6, S7, S8, S9, and S10 are
generated by the DUAL operation at reduction level 1 and their
utilizations are 0.4 because these servers are dual servers of servers
S1, S2, S3, S4, and S5 at reduction level 0, respectively. In this
example, servers S6 and S7 are packed, servers S8 and S9 are packed,
and server S10 is not packed by the PACK operation. Servers S11,
S12, and S13 are generated by the DUAL operation at reduction
level 2. Finally, server S14 is generated by the PACK operation at
reduction level 2 and its utilization is one, and hence the REDUCE
operation is finished and the reduction tree is completely generated.</p>
        <p>Note that the number of root servers may become more than one
because when all servers are unit servers at the highest reduction
level, then the REDUCE operation is finished. If one server is a
unit server, its dual server is a null server, which is packed into
another server when the next PACK operation is performed.
3.1.2</p>
      </sec>
      <sec id="sec-8-2">
        <title>Online Phase</title>
        <p>In an online phase, RUN schedules servers by the following rules
and we use Figure 1 for reference.</p>
        <p>
          RULE 3 (FROM RULE IV.2 IN [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]). If a packed server is
running (circled), execute the child node with the earliest deadline
among those children with work remaining; if a packed server is
not running (not circled), execute none of its children.
σ(ψ2(Γ))
        </p>
        <p>ψ2(Γ)
σ(ψ1(Γ))
σ
φ
σ
φ
σ
ψ0</p>
        <p>
          RULE 4 (FROM RULE IV.3 IN [
          <xref ref-type="bibr" rid="ref14">13</xref>
          ]). Execute (circle) the child
(packed server) of a dual server if and only if the dual server is not
running (not circled).
        </p>
        <p>In the reduction tree, a thick arrow represents a scheduled server
and a thin arrow represents a non-scheduled server by each
parent server. If a thick arrow from a server points a task, the server
schedules the task.</p>
        <p>In Figure 1, root server S14 is always running, regardless of these
rules, because a root server is always a unit server. Next, S14 makes
scheduling decisions in EDF order and server S12 is running at this
time. Since server S12 is running, S8 and S9 are not running by
Rule 3. Since servers S11 and S13 are not running, servers S7 and
S10 are running by Rule 4. Servers S6, S8, and S9 are not running,
and hence servers S1, S3, and S4 are running by Rule 4.</p>
        <p>Figure 2 shows an example of RUN scheduling on three
processors. Each server is executed on virtual processor V PR,v , where R
represents the reduction level and v represents the virtual processor
ID at each reduction level. The task set is shown in Figure 1 and
this example shows the scheduling decisions at time 4. This system
has three processors P1, P2, and P3, reduction level 0 has three
virtual processors V P0,1, V P0,2, and V P0,3, reduction level 1 has
two virtual processors V P1,1 and V P1,2, and reduction level 2 has
one virtual processor V P2,1.</p>
        <p>RUN uses the following task-to-processor assignment scheme;
(i) leave executing tasks on their current processors, (ii) assign idle
tasks to their last-used processor, when available, to avoid
unnecessary migrations, and (iii) assign remaining tasks to free processors
arbitrarily. By this scheme, each server assigns tasks to processors
P1, P2, or P3 in Figure 2. When each task completes its
execution on one processor, the processor becomes idle until the server
of each task exhausts its budget. For example, server S5 running on
V P0,1 completes task τ5 on processor P1 at time 3 and P1 becomes
idle (executes idle task) in time interval [3,4).
3.2</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>The ECC-EDF Algorithm</title>
      <p>
        ECC-EDF [
        <xref ref-type="bibr" rid="ref10">9</xref>
        ] is an RT-DVFS technique on uniprocessors and
ensured that any implicit-deadline periodic task set T with
utilization U ≤ 1 is successfully scheduled. In addition, ECC-EDF
outperforms CC-EDF theoretically and Look-Ahead EDF [
        <xref ref-type="bibr" rid="ref13">12</xref>
        ]
experimentally with respect to energy consumption. Therefore,
ECCEDF is used in RUNT to achieve optimal multiprocessor real-time
scheduling with RT-DVFS as well as EDF is used in RUN. To
improve CC-EDF, ECC-EDF takes the elapsed time of tasks into
consideration and finds the maximum utilization saved by the slack
on completion of the task by calculating the minimum utilization
needed to process the slack by its deadline using following
Equation 1.
,
where cci is the ACET of task τi and Ei is the elapsed time of task
τi.
3.3
      </p>
    </sec>
    <sec id="sec-10">
      <title>Idle Ratio-Fit</title>
      <p>
        The idle ratio-fit [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] assigns idle tasks to servers uniformly for
reducing energy consumption in an offline phase when the system
is not fully utilized. The idle ratio-fit is inspired by optimality on
energy-efficiency of the worst-fit. Aydin and Yang proved that a
task assignment that evenly divides the total utilization among all
the processors, if it exists, will minimize the total energy
consumption, and also showed that the worst-fit task assignment
heuristic outperforms other heuristics in energy-efficiency including the
first-, next-, and best-fit [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. We define the idle ratio of Sk denoted
by I dleRatio(Sk), which is the ratio of the utilization of idle tasks
assigned to server Sk to the utilization of Sk, as follows.
      </p>
      <p>I dleRatio(Sk) =</p>
      <p>
        Ukidle
U srv ,
k
where Uksrv is the utilization of each server Sk and Ukidle is the
utilization of idle tasks assigned to server Sk. The idle ratio-fit
lowers the idle ratio of each server on average to reduce energy
consumption. Due to the space limitation, the detail of the idle
ratio-fit is shown in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
    <sec id="sec-11">
      <title>SIMULATION STUDIES</title>
    </sec>
    <sec id="sec-12">
      <title>Simulation Setups</title>
      <p>In this section, we evaluate RUNT through simulation studies.
This system has 16 processors (M = 16) on static/dynamic and
uniform/independent VFS systems. We use three frequency sets as
follows: F1 = { 0.5, 0.75, 1.0 } , F2 = { 0.5, 0.75, 0.83, 1.0 } , F3 =
{ 0.36, 0.55, 0.64, 0.73, 0.82, 0.91, 1.0 }. When a processor goes
to idle, the processor sets its frequency to the minimum one. For
example, when a processor using F1 goes to idle, set the operating
frequency to 0.5. Due to space limitations, we only show the results
of independent VFS techniques.</p>
      <p>This simulation uses 1, 000 task sets in each system utilization.
The system utilization U is selected from [0.3, 0.35, 0.4, ..., 1.0].
Each Ui is selected within [0.01, 0.02, 0.03, ..., 1.0]. The period Ti
of each task τi is selected within [100, 200, 300, ..., 1600]. Tasks
in each task set are ordered by decreasing utilization. The ratio of
ACET to WCET is set to the range of [0.5, 1.0] or [0.75, 1.0], or
always 1.0, represented as DI-RUNT(50%), DI-RUNT(75%), and
DI-RUNT(100%), respectively. The simulation length is the
hyperperiod of each task set.
(1)
(2)</p>
      <p>The effectiveness of RUNT is in terms of energy ratio, which is
defined as follows.</p>
      <p>Energy Ratio =
1 Z T PPk∈Π fi3
T
0</p>
      <p>M
dt
4.2</p>
    </sec>
    <sec id="sec-13">
      <title>Simulation Results</title>
      <p>4.2.1</p>
      <sec id="sec-13-1">
        <title>Task Assignment Policy</title>
        <p>First, we evaluate the task assignment policy to examine which
heuristic is the most energy-efficient in RUNT. We use the idle
ratio-fit as the idle task assignment policy.</p>
        <p>
          Figure 3 shows the simulation results of task assignment
policy in SI-RUNT and DI-RUNT. In SI-RUNT, the worst-fit reduces
energy consumption the most when U ≤ 0.6 but other heuristics
outperform the worst-fit when U &gt; 0.6 (Figures 3(a), 3(b), and
3(c)). Since SI-RUNT selects the maximum frequency among all
frequencies of servers assigned to the processor, it is necessary to
decrease them as much as possible. A simple way to realize this is
to insert idle tasks to a server, and hence the utilization of the server
can be 100%. Servers with 100% utilization can use the processors
exclusively and if the server has slack generated by inserting idle
tasks, we can decrease the frequency of the processor. Even if the
original utilization of the server which uses the processor
exclusively and inserting idle tasks may not be effective, isolating the
server increases the potential of decreasing the frequency assigned
to other processors. This idea is similar to T-N Plane
Transformation (TNPT) [
          <xref ref-type="bibr" rid="ref8 ref9">7, 8</xref>
          ] that classifies tasks into two classes: heavy
tasks and light tasks, and a heavy task uses a processor exclusively.
In contrast, the first- and best-fit tend to increase the utilization of
each server up to 100%, and hence inserting idle tasks with low
utilization can make the utilization of each server be 100%. For
this reason, when the utilization of the task set becomes large and
few servers can decrease their frequency, the first- and best-fit can
outperform the worst-fit.
        </p>
        <p>In DI-RUNT, the worst-fit reduces energy consumption the most
(Figures 3(d), 3(e), and 3(f)). The worst-fit tends to uniform the
utilization of each server, which results in well-balanced load of
each processor that can decrease the frequency effectively. When
the level of frequency becomes more fine-grained, that is to say,
the number of selectable frequency values becomes large, the
actual frequency approaches theoretically optimal value. Therefore,
energy consumption in the frequency set F3 is the lowest in all
frequency sets.
4.2.2</p>
      </sec>
      <sec id="sec-13-2">
        <title>Idle Task Assignment Policy</title>
        <p>Next we measure the effectiveness of the idle task assignment
policy, called idle ratio-fit, compared to other idle task assignment
policies. We use the worst-fit as the task assignment policy because
the worst-fit outperforms other heuristics in many cases as shown
in Section 4.2.1.</p>
        <p>Figure 4 shows the simulation results of the idle task assignment
policy with the worst-fit in SI-RUNT and DI-RUNT. In SI-RUNT,
the first-, best-, and worst-fit reduce more energy consumption than
the idle ratio-fit (Figures 4(a), 4(b), and 4(c)). This is the similar
reason to the task assignment policy discussed in Section 4.2.1.
RTSVFS needs to decrease the frequency of all servers. In DI-RUNT,
on the other hand, the idle ratio-fit achieves the best results (Figures
4(d), 4(e), and 4(f)) because the idle ratio-fit is based on the idea of
the worst-fit and assigns idle tasks to servers uniformly. Especially,
the idle ratio-fit is superior to other idle task assignment policies in
a coarse-grained frequency set such as F1 or F2 and can save more
energy as shown in Figures 4(d) and 4(e).
0.2
0
1
0.2
0
1</p>
        <p>DI-RUNT is compared to SI-RUNT with respect to energy ratio.
Table 1 shows task and idle task assignment policies in each
algorithm. We choose these policies from the results of lowest energy
ratio in Sections 4.2.1 and 4.2.2.</p>
        <p>Figure 5 shows the simulation results of SI-RUNT and DI-RUNT.
The smaller the ratio of ACET to WCET is, the more DI-RUNT can
reduce energy consumption because small ACET/WCET means a
large amount of dynamic slack and the ratio of slack in the
utilization of each server is increased, which results in decreasing the
frequency. Note that even if the ACET of each task is always equal to
its WCET, DI-RUNT outperforms SI-RUNT except for U = 1.
SIRUNT determines the frequency of each processor before starting
the system and cannot change the frequency online. On the other
hand, DI-RUNT can make use of dynamic slack, which is produced
in the early completion of tasks, to decrease the frequency. In the
case of U = 1, the system has no idle time, and hence SI-RUNT
consumes the same energy as DI-RUNT(100%).</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>CONCLUSION</title>
      <p>This paper evaluated the performance of RUNT, which is an
optimal multiprocessor real-time scheduling algorithm based on
RUN with RT-VFS. Simulation results show that RUNT can reduce
energy consumption with the worst-fit task assignment heuristic,
compared to other task assignment heuristics, in many cases.
Interestingly, the idle ratio-fit does not reduce energy consumption
compared to traditional assignment heuristics in RT-SVFS but reduces
energy consumption the most in all idle task assignment policies in
RT-DVFS.</p>
      <p>
        In future work, we will compare RUNT to other
RT-SVFS/RTDVFS techniques such as TNPT [
        <xref ref-type="bibr" rid="ref8 ref9">7, 8</xref>
        ] with respect to the energy
consumption and the number of preemptions/migrations. We will
implement RUNT to evaluate energy consumption and overhead in
the RT-Est real-time operating system [
        <xref ref-type="bibr" rid="ref6">5</xref>
        ]. We will integrate the
static/dynamic power management techniques such as [
        <xref ref-type="bibr" rid="ref11">10</xref>
        ] with
RUNT to reduce energy consumption effectively.
      </p>
    </sec>
    <sec id="sec-15">
      <title>Acknowledgement</title>
      <p>This research was supported in part by Keio Gijuku Academic
Development Funds, Keio Kougakukai, and CREST, JST.
0.2
Idle First-Fit
Idle Best-Fit
Idle Worst-Fit
Idle Ratio-Fit
0.2
Idle First-Fit
Idle Best-Fit
Idle Worst-Fit
Idle Ratio-Fit
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1</p>
      <p>System Utilization
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1</p>
      <p>System Utilization
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1</p>
      <p>System Utilization
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1</p>
      <p>System Utilization
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1</p>
      <p>System Utilization
(a) Frequency Set F1
0.3 0.4 0.5 0.6 0.7 0.8 0.9 1</p>
      <p>System Utilization
(b) Frequency Set F2
0
0.2
Idle First-Fit
Idle Best-Fit
Idle Worst-Fit
Idle Ratio-Fit</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Andersson</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Jonsson</surname>
          </string-name>
          .
          <article-title>The Utilization Bounds of Partitioned and Pfair Static-Priority Scheduling on Multiprocessors are 50%</article-title>
          .
          <source>In Proceedings of the 15th Euromicro Conference on Real-Time Systems</source>
          , pages
          <fpage>33</fpage>
          -
          <lpage>40</lpage>
          ,
          <year>July 2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>H.</given-names>
            <surname>Aydin</surname>
          </string-name>
          and
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <article-title>Energy-Aware Partitioning for Multiprocessor Real-Time Systems</article-title>
          .
          <source>In Proceedings of the 17th International Parallel and Distributed Processing Symposium</source>
          , pages
          <fpage>113</fpage>
          -
          <lpage>121</lpage>
          , Apr.
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Burd</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. W.</given-names>
            <surname>Brodersen</surname>
          </string-name>
          .
          <article-title>Energy Efficient CMOS Microprocessor Design</article-title>
          .
          <source>In Proceedings of the 28th Annual Hawaii International Conference on System Sciences</source>
          , pages
          <fpage>288</fpage>
          -
          <lpage>297</lpage>
          , Jan.
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chishiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Takasu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Ueda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Yamasaki</surname>
          </string-name>
          .
          <article-title>Optimal Multiprocessor Real-Time Scheduling based on RUN with Voltage and Frequency Scaling</article-title>
          .
          <source>In Proceedings of the 18th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing</source>
          , pages
          <fpage>284</fpage>
          -
          <lpage>287</lpage>
          , Apr.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <source>tio 0.8 a R 0</source>
          .6 y g r en 0.4 E
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chishiro</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Yamasaki.</surname>
          </string-name>
          RT-Est:
          <article-title>Real-Time Operating System for Semi-Fixed-Priority Scheduling Algorithms</article-title>
          .
          <source>In Proceedings of the 2011 International Symposium on Embedded and Pervasive Systems</source>
          , pages
          <fpage>358</fpage>
          -
          <lpage>365</lpage>
          , Oct.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Ernst</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Ye</surname>
          </string-name>
          .
          <article-title>Embedded Program Timing Analysis Based on Path Clustering and Architecture Classification</article-title>
          .
          <source>In Proceedings of International Conference on Computer-Aided Design</source>
          , pages
          <fpage>598</fpage>
          -
          <lpage>604</lpage>
          , Nov.
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Funaoka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kato</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Yamasaki</surname>
          </string-name>
          .
          <article-title>Energy-Efficient Optimal Real-Time Scheduling on Multiprocessors</article-title>
          .
          <source>In Proceedings of the 11th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing</source>
          , pages
          <fpage>23</fpage>
          -
          <lpage>30</lpage>
          , May
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>K.</given-names>
            <surname>Funaoka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Takeda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kato</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Yamasaki</surname>
          </string-name>
          .
          <article-title>Dynamic Voltage and Frequency Scaling for Optimal Real-Time Scheduling on Multiprocessors</article-title>
          .
          <source>In Proceedings of the 3rd IEEE International Symposium on Industrial Embedded Systems</source>
          , pages
          <fpage>27</fpage>
          -
          <lpage>33</lpage>
          ,
          <year>June 2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.-S.</given-names>
            <surname>Lee</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.-H.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>Enhanced Cycle-Conserving Dynamic Voltage Scaling for Low-Power Real-Time Operating Systems</article-title>
          .
          <source>IEICE Transactions on Information and Systems</source>
          , 97-D(3):
          <fpage>480</fpage>
          -
          <lpage>487</lpage>
          , Mar.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Legout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Pautet</surname>
          </string-name>
          .
          <article-title>Scheduling algorithms to reduce the static energy consumption of real-time systems</article-title>
          .
          <source>Real-Time Systems</source>
          ,
          <volume>51</volume>
          (
          <issue>2</issue>
          ):
          <fpage>153</fpage>
          -
          <lpage>191</lpage>
          , Mar.
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C. L.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Layland</surname>
          </string-name>
          .
          <article-title>Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment</article-title>
          .
          <source>Journal of the ACM</source>
          ,
          <volume>20</volume>
          (
          <issue>1</issue>
          ):
          <fpage>46</fpage>
          -
          <lpage>61</lpage>
          , Jan.
          <year>1973</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Pillai</surname>
          </string-name>
          and
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Shin</surname>
          </string-name>
          .
          <article-title>Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems</article-title>
          .
          <source>In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles</source>
          , pages
          <fpage>89</fpage>
          -
          <lpage>102</lpage>
          , Dec.
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P.</given-names>
            <surname>Regnier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lima</surname>
          </string-name>
          , E. Massa, G. Levin, and
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          . RUN:
          <article-title>Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor</article-title>
          .
          <source>In Proceedings of the 32th IEEE Real-Time Systems Symposium</source>
          , pages
          <fpage>104</fpage>
          -
          <lpage>115</lpage>
          , Nov.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>