=Paper= {{Paper |id=Vol-1464/ewili15_3 |storemode=property |title=Performance Evaluation of RUNT Algorithm |pdfUrl=https://ceur-ws.org/Vol-1464/ewili15_3.pdf |volume=Vol-1464 |dblpUrl=https://dblp.org/rec/conf/ewili/ChishiroTUY15 }} ==Performance Evaluation of RUNT Algorithm== https://ceur-ws.org/Vol-1464/ewili15_3.pdf
                   Performance Evaluation of RUNT Algorithm

              Hiroyuki Chishiro, Masayoshi Takasu, Rikuhei Ueda, and Nobuyuki Yamasaki
                  Department of Information and Computer Science, Keio University, Yokohama, Japan
                                   {chishiro,takasu,riku,yamasaki}@ny.ics.keio.ac.jp




ABSTRACT                                                               Since the small number of preemptions/migrations improves the
This paper evaluates the performance of Reduction to Uniproces-        practicality of the scheduling, we focus on RUN.
sor Transformation (RUNT) with Voltage and Frequency Scaling,             RUN transforms the multiprocessor scheduling problem into an
called Static RUNT (S-RUNT) and Dynamic RUNT (D-RUNT),                 equivalent set of uniprocessor scheduling problems by the DUAL
respectively. Simulation results show that how to assign tasks to      and PACK operations and the detail of them is described in Section
servers in RUNT influences energy consumption and the worst-fit        3. After transforming offline, RUN uses Earliest Deadline First
heuristic is the best in many cases. In addition, the idle task as-    (EDF) [11] to transform the uniprocessor scheduling into the mul-
signment policy saves more energy consumption in D-RUNT and            tiprocessor scheduling online because EDF is optimal in implicit-
D-RUNT outperforms S-RUNT if the actual case execution time of         deadline periodic task sets on uniprocessors. Using these opera-
each task is shorter than its worst case execution time.               tions, RUN achieves the optimality with low overhead.
                                                                          Voltage and Frequency Scaling (VFS) is one of the most popu-
                                                                       lar techniques to reduce energy consumption in computer systems.
Categories and Subject Descriptors                                     Especially, Real-Time Voltage and Frequency Scaling (RT-VFS)
C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYS-                        can reduce energy consumption by scaling the operating frequency
TEMS]: Real-time and embedded systems                                  and the supply voltage while meeting real-time constraints. RT-
                                                                       VFS is based on the essential characteristic of real-time tasks. All
                                                                       tasks can be executed slowly as long as their deadlines are met.
General Terms                                                          In most of the CMOS-based modern processors, the dynamic en-
Algorithms                                                             ergy consumption E is proportional to the operating frequency f
                                                                       and the square of the supply voltage V (i.e., E ∝ f V 2 ) [3], and
Keywords                                                               the maximum operating frequency depends on the supply voltage.
                                                                       Therefore, RT-VFS can effectively reduce energy consumption at
Optimal Multiprocessor Real-Time Scheduling, Multiprocessor Sys-       a cubic order of the operating frequency. RT-VFS has two follow-
tems, Real-Time Systems, RUN Algorithm                                 ing techniques: Real-Time Static Voltage and Frequency Scaling
                                                                       (RT-SVFS) and Real-Time Dynamic Voltage and Frequency Scal-
1.    INTRODUCTION                                                     ing (RT-DVFS). RT-SVFS determines the voltage and frequency
    Real-time systems have required multiprocessors and there are      offline and does not adjust them after the system starts. RT-DVFS
mainly two categories of multiprocessor real-time scheduling: par-     can reduce energy consumption by adjusting the voltage and fre-
titioned scheduling and global scheduling. Partitioned scheduling      quency online, which potentially saves more energy consumption.
assigns tasks to processors offline and there are no migratory tasks   Changing the voltage and frequency takes some time due to I/O
but it guarantees only 50% processor utilization in the worst case     operations. If the overhead of RT-DVFS is small, it is possible
[1]. In contrast, global scheduling can achieve 100% utilization       to ignore overhead or incorporate it into execution time of tasks.
by migrating tasks among processors online but increases run-time      However, RT-DVFS may incur significant overhead in some sys-
overhead. We are interested in optimal multiprocessor real-time        tems and RT-SVFS is a good solution for these systems. There is a
scheduling algorithms, which can achieve 100% utilization with         trade-off between energy consumption and overhead in RT-VFS.
any implicit-deadline periodic task sets. Several optimal multi-          In our previous work, Reduction to UNiprocessor Transforma-
processor real-time scheduling algorithms have been proposed and       tion (RUNT) [4] was proposed to achieve optimal multiprocessor
Reduction to UNiprocessor (RUN) [13] outperforms other optimal         real-time scheduling algorithm based on RUN with Voltage and
algorithms with respect to the number of preemptions/migrations.       Frequency Scaling. Unfortunately, the performance of RUNT is
                                                                       not evaluated.
                                                                          This paper evaluates the performance of RUNT with RT-SVFS/RT-
                                                                       DVFS, called Static RUNT (S-RUNT) and Dynamic RUNT (D-
                                                                       RUNT), respectively. Simulation results show that the saved energy
                                                                       consumption strongly depends on the way to assign tasks to servers
                                                                       in RUNT and the worst-fit heuristic is the most energy-efficient
                                                                       in many cases. In addition, the idle task assignment policy saves
                                                                       more energy consumption in D-RUNT and D-RUNT outperforms
EWiLi ’15, October 8th, 2015, Amsterdam, The Netherlands.              S-RUNT if tasks are completed early.
Copyright retained by the authors.
2.    SYSTEM MODEL                                                        the slack if U < P  1. The total utilization of idle tasks is defined
                                                                          as Uidle = M − i Ui . Note that each idle task has just the pa-
2.1 Processor Model                                                       rameter of utilization and does not have other parameters including
   The system has M processors Π = { P1 , P2 , ..., PM }. Each            WCET and period.
processor Pj is characterized by the continuous normalized fre-              RUN transforms the multiprocessor scheduling to the uniproces-
quency αj (0 ≤ αj ≤ 1). Here, we discuss the differences be-              sor scheduling by aggregating tasks into servers S. We treat servers
tween the system model and practical environments. (i) In prac-           as tasks with a sequence of jobs but they are not actual tasks in the
tical environments, the system has the discrete frequency values          system; each server is a proxy for a collection of client tasks. When
F = { f1 , ..., fL | fmin = f1 < ... < fL = fmax } and the dis-           a server is running, the processor time is used by one of its clients.
crete voltage values V = { V1 , ..., VL | V1 < ... < VL }. We as-         Clients of a server are scheduled via an internal scheduling
                                                                                                                                    P mech-
sume that Vk corresponds to fk and the voltage is also changed            anism. The utilization of each server Sk is Uksrv =          τi ∈Sk Ui ,
at the same time as the corresponding frequency is changed. The           where τi ∈ Sk means that task τi is first assigned to server Sk , and
lowest frequency fi ∈ F such that αj ≤ fi /fmax will be selected          Uksrv does not exceed one.
to achieve the lowest energy consumption while meeting real-time
constraints. (ii) The system model assumes that no overhead oc-           3.1.1 Offline Phase
curs at run-time. In practical environments, the scaled frequency            In an offline phase, RUN reduces the multiprocessor scheduling
interferes with the scheduling even if the frequency is not changed       to the uniprocessor scheduling by the DUAL and PACK operations.
dynamically. The worst case overhead is included in the worst case        RUN uses EDF for uniprocessor scheduling because EDF is opti-
execution time (WCET).                                                    mal in implicit-deadline periodic task sets on uniprocessors.
                                                                             The DUAL operation transforms a task τi into the dual task τi∗ ,
2.2 Task Model                                                            whose execution time represents the idle time of τi (i.e., Ci∗ = Ti −
   The system has a task set T = { τ1 , τ2 , ..., τN }, which is a set    Ci ). The relative deadline of dual task τi∗ is equal to that of task τi .
of N periodic tasks on M processors. Each task cannot be executed         The dual task τi∗ is executed exactly when the original task τi is idle
in parallel among processors. Each task τi has its WCET Ci and            and vice versa. A schedule for the original task set is obtained by
period Ti . The j th instance of task τi is called job τi,j . Task τi     blocking τi whenever τi∗ executes in the dual schedule. In addition,
executed on a processor Pj requires Ci /αj processor time at every        the sum of utilizations of task τi and dual task τi∗ is one and the
Ti interval. The relative deadline Di is equal to its period Ti (i.e.,    utilization of dual task τi∗ is Ui∗ = Ci∗ /Ti . The DUAL operation
implicit-deadline). All tasks must complete the execution by their        reduces the number of processors whenever N − M < M .
deadlines. The utilization of each task is defined
                                                1
                                                   P as Ui = Ci /Ti          The PACK operation packs dual servers into packed servers whose
and the system utilization is defined as U = M       i Ui . We assume
                                                                          utilizations do not exceed one. When N − M ≥ M , the number
that all tasks may be preempted and migrated among processors at          of servers can be reduced by aggregating them into fewer servers
any time, and are independent (i.e., they do not share resources and      by the PACK operation. The scheme how to pack servers to fewer
do not have any precedence).                                              servers is heuristic and the PACK operation is similar to the parti-
                                                                          tioning scheme. Note that if assigning tasks to processors is suc-
3.    THE RUNT ALGORITHM                                                  cessful, RUN generates the same schedule as Partitioned EDF (P-
   We introduce RUNT [4], which is an optimal multiprocessor              EDF) and does not perform the DUAL and PACK operations. Oth-
real-time scheduling algorithm based on RUN with VFS. RUNT                erwise the DUAL and PACK operations are performed to generate
supports RT-SVFS/RT-DVFS techniques on uniform/independent                the reduction tree offline, which is used to make server scheduling
VFS multiprocessor systems, called Static Uniform RUNT (SU-               decisions online. The detail of making scheduling decisions in the
                                                                          reduction tree is shown in the next subsection.
RUNT), Static Independent RUNT (SI-RUNT), Dynamic Uniform
                                                                             In order to explain the reduction tree, we define the following
RUNT (DU-RUNT), and Dynamic Independent RUNT (DI-RUNT),
respectively. Due to the space limitation, the detail of these algo-      terms with respect to servers as follows. A unit server is a server
rithms are explained in [4]. When the actual case execution time          whose utilization is one. A null server is a server whose utilization
(ACET) of each task is often shorter than its WCET [6], RUNT uses         is zero. A root server is a last packed server whose utilization is
Enhanced Cycle-Conserving EDF (ECC-EDF) [9] to reclaim slack              one (unit server).
                                                                             Packing the dual servers of packed servers can reduce the num-
for reducing energy consumption. RUNT achieves a small number
                                                                          ber of servers by at least (almost) half. We perform DUAL and
of preemptions/migrations compared to RUN because these opera-
tions are performed when every scheduling event occurs. If a task         PACK operations repeatedly until all packed servers become unit
set does not satisfy the full system utilization, idle tasks are in-      servers. Now we define a REDUCE operation to be their composi-
serted because RUN assumes the full system utilization. An idle           tion.
task assignment policy is an important factor to reduce energy con-          D EFINITION 1 (F ROM D EFINITION IV.6. IN [13]). Given a
sumption. Therefore, the idle ratio-fit was proposed in our previous      set of servers Γ and a packing π of Γ , a REDUCE operation on a
work. We explain the overview of the RUN algorithm, the ECC-              server S in Γ, denoted by ψ(S), is the composition of the DUAL op-
EDF algorithm, and the idle ratio-fit as follows.                         eration ϕ with the PACK operation σ for π (i.e., ψ(S) = ϕ(σ(S))).
3.1 The RUN Algorithm                                                     In addition, we define reduction level/sequence to explain the re-
   RUN [13] is an optimal multiprocessor real-time scheduling al-         duction tree as follows.
gorithm with a small number of preemptions/migrations. We ex-
plain the detail of RUN in offline and online phases.                         D EFINITION 2 (F ROM D EFINITION IV.7 IN [13]). Let i ≥ 1
   Now we introduce the RUN’s specific model because RUN has              be an integer, Γ be a set of servers, and S be a server in Γ. The
many original parameters and assumptions to explain itself. A sys-        operator ψ i is recursively defined by ψ 0 (S) = S and ψ i (S) =
tem is fully utilized if the system utilization U is one. Since RUN       ψ ◦ ψ i−1 (S). {ψ i }i is a reduction sequence, and the server system
assumes the full system utilization, idle tasks are inserted to fill in   ψ i (Γ) is said to be at reduction level i.
σ(ψ2(Γ))                                            S14(1),{5N*,10N*,15N*}                                                                   Reduction Level 2
          σ                                                                                                       VP2,1 S11 S13                 S12      S11 S13
   ψ2(Γ)              S11(0.2),{5N*,10N*}            S12(0.2),{10N*,15N*}                   S13(0.6),{5N*}              0     1     2    3     4        5    6        7    8   9    10
                                                                                                                                             Reduction Level 1
          φ
                                                                                                                  VP1,1 S9                      S10 S10       S6               S7
σ(ψ (Γ))
     1
                      σ({S6,S7})                     σ({S8,S9})                             σ({S10})
          σ                                                                                                      VP1,2 S10 S6             S7                 S8
   ψ (Γ) S
     1
                 6
                  (0.4),{5N*}
                                S   7
                                     (0.4),{10N*}
                                                    S   8
                                                         (0.4),{15N*}
                                                                        S9   (0.4),{10N*}
                                                                                            S   10
                                                                                                  (0.4),{5N*}
                                                                                                                        0     1     2    3     4        5    6        7    8   9    10
          φ                                                                                                                                  Reduction Level 0
σ(ψ0(Γ)) σ({S1}) σ({S2}) σ({S3}) σ({S4}) σ({S5})                                                                 VP0,1 S1 S5                    S4
          σ                                                                                                      VP0,2 S2                 S1            S1            S5
         ψ0     S1(0.6),{5N*} S2(0.6),{10N*} S3(0.6),{15N*} S4(0.6),{10N*} S5(0.6),{5N*}                         VP0,3 S3                                        S2            S1

                  τ 1
                     (2,5)
                                τ   2
                                     (4,10)
                                                        τ  3
                                                               (6,15)
                                                                             τ 4
                                                                                (4,10)
                                                                                             τ   5
                                                                                                  (2,5)                 0     1     2    3     4        5    6        7    8   9    10
                                                                                                                                             Processor
              Figure 1: Reduction tree on three processors                                                           P1 τ1     τ5                  τ4
                                                                                                                     P2 τ2                τ1            τ1            τ5
Note that assigning tasks to servers is defined as reduction level 0                                                 P3 τ3                                       τ2
that does not perform the REDUCE operation.
                                                                     (C ,T )
   Figure 1 shows the reduction tree on three processors. τi i i                                                        0     1     2    3     4        5    6        7    8   9    10
expresses that task τi has WCET Ci and period Ti . Tasks τ1 , τ2 ,
τ3 , τ4 , and τ5 are assigned to servers S1 , S2 , S3 , S4 , and S5 at                                          Figure 2: An example of RUN scheduling on three processors
reduction level P   0, respectively. The total utilization of idle tasks is
Uidle = M − i Ui = 3 − 5 × 0.4 = 1. In this example, idle
tasks are assigned to servers at reduction level 0 uniformly, i.e.,                                                RULE 4 (F ROM RULE IV.3 IN [13]). Execute (circle) the child
the utilization of each server is added to Uidle /N = 1/5 = 0.2,                                                (packed server) of a dual server if and only if the dual server is not
respectively.                                                                                                   running (not circled).
                                   (U srv ),{ Dk }
   We represent a server as Sk k                   , where Uksrv is the uti-
lization of server Sk and Dk is the deadline set of server Sk . The                                             In the reduction tree, a thick arrow represents a scheduled server
deadline set includes all (absolute) deadlines of tasks in the server.                                          and a thin arrow represents a non-scheduled server by each par-
Each server sets the earliest deadline in each deadline set when the                                            ent server. If a thick arrow from a server points a task, the server
server is released. We assign deadline sets 5N ∗ , 10N ∗ , 15N ∗ ,                                              schedules the task.
10N ∗ , and 5N ∗ to servers at reduction level 0, respectively, where                                              In Figure 1, root server S14 is always running, regardless of these
N ∗ means natural numbers. Servers S6 , S7 , S8 , S9 , and S10 are                                              rules, because a root server is always a unit server. Next, S14 makes
generated by the DUAL operation at reduction level 1 and their uti-                                             scheduling decisions in EDF order and server S12 is running at this
lizations are 0.4 because these servers are dual servers of servers                                             time. Since server S12 is running, S8 and S9 are not running by
S1 , S2 , S3 , S4 , and S5 at reduction level 0, respectively. In this ex-                                      Rule 3. Since servers S11 and S13 are not running, servers S7 and
ample, servers S6 and S7 are packed, servers S8 and S9 are packed,                                              S10 are running by Rule 4. Servers S6 , S8 , and S9 are not running,
and server S10 is not packed by the PACK operation. Servers S11 ,                                               and hence servers S1 , S3 , and S4 are running by Rule 4.
S12 , and S13 are generated by the DUAL operation at reduction                                                     Figure 2 shows an example of RUN scheduling on three proces-
level 2. Finally, server S14 is generated by the PACK operation at                                              sors. Each server is executed on virtual processor V PR,v , where R
reduction level 2 and its utilization is one, and hence the REDUCE                                              represents the reduction level and v represents the virtual processor
operation is finished and the reduction tree is completely generated.                                           ID at each reduction level. The task set is shown in Figure 1 and
   Note that the number of root servers may become more than one                                                this example shows the scheduling decisions at time 4. This system
because when all servers are unit servers at the highest reduction                                              has three processors P1 , P2 , and P3 , reduction level 0 has three
level, then the REDUCE operation is finished. If one server is a                                                virtual processors V P0,1 , V P0,2 , and V P0,3 , reduction level 1 has
unit server, its dual server is a null server, which is packed into                                             two virtual processors V P1,1 and V P1,2 , and reduction level 2 has
another server when the next PACK operation is performed.                                                       one virtual processor V P2,1 .
                                                                                                                   RUN uses the following task-to-processor assignment scheme;
3.1.2 Online Phase                                                                                              (i) leave executing tasks on their current processors, (ii) assign idle
  In an online phase, RUN schedules servers by the following rules                                              tasks to their last-used processor, when available, to avoid unneces-
and we use Figure 1 for reference.                                                                              sary migrations, and (iii) assign remaining tasks to free processors
                                                                                                                arbitrarily. By this scheme, each server assigns tasks to processors
   RULE 3 (F ROM RULE IV.2 IN [13]). If a packed server is run-                                                 P1 , P2 , or P3 in Figure 2. When each task completes its execu-
ning (circled), execute the child node with the earliest deadline                                               tion on one processor, the processor becomes idle until the server
among those children with work remaining; if a packed server is                                                 of each task exhausts its budget. For example, server S5 running on
not running (not circled), execute none of its children.                                                        V P0,1 completes task τ5 on processor P1 at time 3 and P1 becomes
                                                                                                                idle (executes idle task) in time interval [3,4).
3.2 The ECC-EDF Algorithm                                                   The effectiveness of RUNT is in terms of energy ratio, which is
   ECC-EDF [9] is an RT-DVFS technique on uniprocessors and               defined as follows.
ensured that any implicit-deadline periodic task set T with utiliza-                                     Z P            3
                                                                                                        1 T      Pk ∈Π fi
tion U ≤ 1 is successfully scheduled. In addition, ECC-EDF out-                          Energy Ratio =                   dt
performs CC-EDF theoretically and Look-Ahead EDF [12] exper-                                            T 0        M
imentally with respect to energy consumption. Therefore, ECC-
EDF is used in RUNT to achieve optimal multiprocessor real-time           4.2 Simulation Results
scheduling with RT-DVFS as well as EDF is used in RUN. To im-
prove CC-EDF, ECC-EDF takes the elapsed time of tasks into con-           4.2.1 Task Assignment Policy
sideration and finds the maximum utilization saved by the slack              First, we evaluate the task assignment policy to examine which
on completion of the task by calculating the minimum utilization          heuristic is the most energy-efficient in RUNT. We use the idle
needed to process the slack by its deadline using following Equa-         ratio-fit as the idle task assignment policy.
tion 1.                                                                      Figure 3 shows the simulation results of task assignment pol-
                               Ci − cci                                   icy in SI-RUNT and DI-RUNT. In SI-RUNT, the worst-fit reduces
                       Uis =            ,                          (1)    energy consumption the most when U ≤ 0.6 but other heuristics
                               Ti − E i
                                                                          outperform the worst-fit when U > 0.6 (Figures 3(a), 3(b), and
where cci is the ACET of task τi and Ei is the elapsed time of task       3(c)). Since SI-RUNT selects the maximum frequency among all
τi .                                                                      frequencies of servers assigned to the processor, it is necessary to
                                                                          decrease them as much as possible. A simple way to realize this is
3.3 Idle Ratio-Fit                                                        to insert idle tasks to a server, and hence the utilization of the server
   The idle ratio-fit [4] assigns idle tasks to servers uniformly for     can be 100%. Servers with 100% utilization can use the processors
reducing energy consumption in an offline phase when the system           exclusively and if the server has slack generated by inserting idle
is not fully utilized. The idle ratio-fit is inspired by optimality on    tasks, we can decrease the frequency of the processor. Even if the
energy-efficiency of the worst-fit. Aydin and Yang proved that a          original utilization of the server which uses the processor exclu-
task assignment that evenly divides the total utilization among all       sively and inserting idle tasks may not be effective, isolating the
the processors, if it exists, will minimize the total energy consump-     server increases the potential of decreasing the frequency assigned
tion, and also showed that the worst-fit task assignment heuris-          to other processors. This idea is similar to T-N Plane Transfor-
tic outperforms other heuristics in energy-efficiency including the       mation (TNPT) [7, 8] that classifies tasks into two classes: heavy
first-, next-, and best-fit [2]. We define the idle ratio of Sk denoted   tasks and light tasks, and a heavy task uses a processor exclusively.
by IdleRatio(Sk ), which is the ratio of the utilization of idle tasks    In contrast, the first- and best-fit tend to increase the utilization of
assigned to server Sk to the utilization of Sk , as follows.              each server up to 100%, and hence inserting idle tasks with low
                                                                          utilization can make the utilization of each server be 100%. For
                                       Ukidle                             this reason, when the utilization of the task set becomes large and
                  IdleRatio(Sk ) =            ,                    (2)    few servers can decrease their frequency, the first- and best-fit can
                                       Uksrv
                                                                          outperform the worst-fit.
where Uksrv is the utilization of each server Sk and Ukidle is the           In DI-RUNT, the worst-fit reduces energy consumption the most
utilization of idle tasks assigned to server Sk . The idle ratio-fit      (Figures 3(d), 3(e), and 3(f)). The worst-fit tends to uniform the
lowers the idle ratio of each server on average to reduce energy          utilization of each server, which results in well-balanced load of
consumption. Due to the space limitation, the detail of the idle          each processor that can decrease the frequency effectively. When
ratio-fit is shown in [4].                                                the level of frequency becomes more fine-grained, that is to say,
                                                                          the number of selectable frequency values becomes large, the ac-
                                                                          tual frequency approaches theoretically optimal value. Therefore,
4.    SIMULATION STUDIES                                                  energy consumption in the frequency set F3 is the lowest in all fre-
                                                                          quency sets.
4.1 Simulation Setups
   In this section, we evaluate RUNT through simulation studies.          4.2.2 Idle Task Assignment Policy
This system has 16 processors (M = 16) on static/dynamic and                 Next we measure the effectiveness of the idle task assignment
uniform/independent VFS systems. We use three frequency sets as           policy, called idle ratio-fit, compared to other idle task assignment
follows: F1 = { 0.5, 0.75, 1.0 } , F2 = { 0.5, 0.75, 0.83, 1.0 } , F3 =   policies. We use the worst-fit as the task assignment policy because
{ 0.36, 0.55, 0.64, 0.73, 0.82, 0.91, 1.0 }. When a processor goes        the worst-fit outperforms other heuristics in many cases as shown
to idle, the processor sets its frequency to the minimum one. For         in Section 4.2.1.
example, when a processor using F1 goes to idle, set the operating           Figure 4 shows the simulation results of the idle task assignment
frequency to 0.5. Due to space limitations, we only show the results      policy with the worst-fit in SI-RUNT and DI-RUNT. In SI-RUNT,
of independent VFS techniques.                                            the first-, best-, and worst-fit reduce more energy consumption than
   This simulation uses 1, 000 task sets in each system utilization.      the idle ratio-fit (Figures 4(a), 4(b), and 4(c)). This is the similar
The system utilization U is selected from [0.3, 0.35, 0.4, ..., 1.0].     reason to the task assignment policy discussed in Section 4.2.1. RT-
Each Ui is selected within [0.01, 0.02, 0.03, ..., 1.0]. The period Ti    SVFS needs to decrease the frequency of all servers. In DI-RUNT,
of each task τi is selected within [100, 200, 300, ..., 1600]. Tasks      on the other hand, the idle ratio-fit achieves the best results (Figures
in each task set are ordered by decreasing utilization. The ratio of      4(d), 4(e), and 4(f)) because the idle ratio-fit is based on the idea of
ACET to WCET is set to the range of [0.5, 1.0] or [0.75, 1.0], or         the worst-fit and assigns idle tasks to servers uniformly. Especially,
always 1.0, represented as DI-RUNT(50%), DI-RUNT(75%), and                the idle ratio-fit is superior to other idle task assignment policies in
DI-RUNT(100%), respectively. The simulation length is the hyper-          a coarse-grained frequency set such as F1 or F2 and can save more
period of each task set.                                                  energy as shown in Figures 4(d) and 4(e).
                      1               First-Fit                                              1               First-Fit                                         1                First-Fit
                                      Next-Fit                                                               Next-Fit                                                           Next-Fit
                                      Best-Fit                                                               Best-Fit                                                           Best-Fit
                    0.8               Worst-Fit                                            0.8               Worst-Fit                                        0.8               Worst-Fit
     Energy Ratio




                                                                            Energy Ratio




                                                                                                                                               Energy Ratio
                    0.6                                                                    0.6                                                                0.6

                    0.4                                                                    0.4                                                                0.4

                    0.2                                                                    0.2                                                                0.2

                      0                                                                      0                                                                 0
                          0.3   0.4   0.5   0.6    0.7   0.8   0.9   1                           0.3   0.4   0.5   0.6   0.7   0.8   0.9   1                        0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
                                       System Utilization                                                     System Utilization                                                 System Utilization
                    (a) SI-RUNT in Frequency Set F1                                        (b) SI-RUNT in Frequency Set F2                                    (c) SI-RUNT in Frequency Set F3

                      1               First-Fit                                              1               First-Fit                                         1                First-Fit
                                      Next-Fit                                                               Next-Fit                                                           Next-Fit
                                      Best-Fit                                                               Best-Fit                                                           Best-Fit
                    0.8               Worst-Fit                                            0.8               Worst-Fit                                        0.8               Worst-Fit
     Energy Ratio




                                                                            Energy Ratio




                                                                                                                                               Energy Ratio
                    0.6                                                                    0.6                                                                0.6

                    0.4                                                                    0.4                                                                0.4

                    0.2                                                                    0.2                                                                0.2

                      0                                                                      0                                                                 0
                          0.3   0.4   0.5   0.6    0.7   0.8   0.9   1                           0.3   0.4   0.5   0.6   0.7   0.8   0.9   1                        0.3   0.4   0.5   0.6   0.7   0.8   0.9   1
                                       System Utilization                                                     System Utilization                                                 System Utilization
                    (d) DI-RUNT in Frequency Set F1                                        (e) DI-RUNT in Frequency Set F2                                    (f) DI-RUNT in Frequency Set F3

                                                                     Figure 3: Task Assignment Policy with Idle Ratio-Fit


                                                                                                                         energy consumption the most in all idle task assignment policies in
                     Table 1: Task and idle task assignment policies                                                     RT-DVFS.
        Algorithm                Frequency Set       Task Assignment     Idle Task Assignment
        SI-RUNT                               F1            Worst-Fit            Idle Worst-Fit
                                                                                                                            In future work, we will compare RUNT to other RT-SVFS/RT-
        SI-RUNT                               F2            Worst-Fit              Idle First-Fit                        DVFS techniques such as TNPT [7, 8] with respect to the energy
        SI-RUNT                               F3            Worst-Fit              Idle Best-Fit                         consumption and the number of preemptions/migrations. We will
        DI-RUNT                 F1 , F2 , and F3            Worst-Fit             Idle Ratio-Fit                         implement RUNT to evaluate energy consumption and overhead in
                                                                                                                         the RT-Est real-time operating system [5]. We will integrate the
                                                                                                                         static/dynamic power management techniques such as [10] with
4.2.3 Comparison of DI-RUNT with SI-RUNT                                                                                 RUNT to reduce energy consumption effectively.
   DI-RUNT is compared to SI-RUNT with respect to energy ratio.
Table 1 shows task and idle task assignment policies in each algo-                                                       Acknowledgement
rithm. We choose these policies from the results of lowest energy
ratio in Sections 4.2.1 and 4.2.2.                                                                                       This research was supported in part by Keio Gijuku Academic De-
   Figure 5 shows the simulation results of SI-RUNT and DI-RUNT.                                                         velopment Funds, Keio Kougakukai, and CREST, JST.
The smaller the ratio of ACET to WCET is, the more DI-RUNT can
reduce energy consumption because small ACET/WCET means a                                                                6. REFERENCES
large amount of dynamic slack and the ratio of slack in the utiliza-                                                      [1] B. Andersson and J. Jonsson. The Utilization Bounds of
tion of each server is increased, which results in decreasing the fre-                                                        Partitioned and Pfair Static-Priority Scheduling on
quency. Note that even if the ACET of each task is always equal to                                                            Multiprocessors are 50%. In Proceedings of the 15th
its WCET, DI-RUNT outperforms SI-RUNT except for U = 1. SI-                                                                   Euromicro Conference on Real-Time Systems, pages 33–40,
RUNT determines the frequency of each processor before starting                                                               July 2003.
the system and cannot change the frequency online. On the other                                                           [2] H. Aydin and Q. Yang. Energy-Aware Partitioning for
hand, DI-RUNT can make use of dynamic slack, which is produced                                                                Multiprocessor Real-Time Systems. In Proceedings of the
in the early completion of tasks, to decrease the frequency. In the                                                           17th International Parallel and Distributed Processing
case of U = 1, the system has no idle time, and hence SI-RUNT                                                                 Symposium, pages 113–121, Apr. 2003.
consumes the same energy as DI-RUNT(100%).                                                                                [3] T. D. Burd and R. W. Brodersen. Energy Efficient CMOS
                                                                                                                              Microprocessor Design. In Proceedings of the 28th Annual
5.                  CONCLUSION                                                                                                Hawaii International Conference on System Sciences, pages
   This paper evaluated the performance of RUNT, which is an                                                                  288–297, Jan. 1995.
optimal multiprocessor real-time scheduling algorithm based on                                                            [4] H. Chishiro, M. Takasu, R. Ueda, and N. Yamasaki. Optimal
RUN with RT-VFS. Simulation results show that RUNT can reduce                                                                 Multiprocessor Real-Time Scheduling based on RUN with
energy consumption with the worst-fit task assignment heuristic,                                                              Voltage and Frequency Scaling. In Proceedings of the 18th
compared to other task assignment heuristics, in many cases. Inter-                                                           IEEE International Symposium on
estingly, the idle ratio-fit does not reduce energy consumption com-                                                          Object/Component/Service-Oriented Real-Time Distributed
pared to traditional assignment heuristics in RT-SVFS but reduces                                                             Computing, pages 284–287, Apr. 2015.
                  1                Idle First-Fit                                       1                 Idle First-Fit                                    1                 Idle First-Fit
                                   Idle Best-Fit                                                          Idle Best-Fit                                                       Idle Best-Fit
                                   Idle Worst-Fit                                                         Idle Worst-Fit                                                      Idle Worst-Fit
                0.8                Idle Ratio-Fit                                      0.8                Idle Ratio-Fit                                   0.8                Idle Ratio-Fit
 Energy Ratio




                                                                       Energy Ratio




                                                                                                                                            Energy Ratio
                0.6                                                                    0.6                                                                 0.6

                0.4                                                                    0.4                                                                 0.4

                0.2                                                                    0.2                                                                 0.2

                  0                                                                     0                                                                   0
                      0.3    0.4   0.5   0.6   0.7   0.8   0.9   1                           0.3    0.4   0.5   0.6   0.7   0.8   0.9   1                        0.3    0.4   0.5   0.6   0.7   0.8   0.9   1
                                    System Utilization                                                     System Utilization                                                  System Utilization
                (a) SI-RUNT in Frequency Set F1                                       (b) SI-RUNT in Frequency Set F2                                      (c) SI-RUNT in Frequency Set F3

                  1                Idle First-Fit                                       1                 Idle First-Fit                                    1                 Idle First-Fit
                                   Idle Best-Fit                                                          Idle Best-Fit                                                       Idle Best-Fit
                                   Idle Worst-Fit                                                         Idle Worst-Fit                                                      Idle Worst-Fit
                0.8                Idle Ratio-Fit                                      0.8                Idle Ratio-Fit                                   0.8                Idle Ratio-Fit
 Energy Ratio




                                                                       Energy Ratio




                                                                                                                                            Energy Ratio
                0.6                                                                    0.6                                                                 0.6

                0.4                                                                    0.4                                                                 0.4

                0.2                                                                    0.2                                                                 0.2

                  0                                                                     0                                                                   0
                      0.3    0.4   0.5   0.6   0.7   0.8   0.9   1                           0.3    0.4   0.5   0.6   0.7   0.8   0.9   1                        0.3    0.4   0.5   0.6   0.7   0.8   0.9   1
                                    System Utilization                                                     System Utilization                                                  System Utilization
                (d) DI-RUNT in Frequency Set F1                                       (e) DI-RUNT in Frequency Set F2                                      (f) DI-RUNT in Frequency Set F3

                                                                 Figure 4: Idle Task Assignment Policy with Worst-Fit


                  1                SI-RUNT                                              1                 SI-RUNT                                           1                 SI-RUNT
                                   DI-RUNT(100%)                                                          DI-RUNT(100%)                                                       DI-RUNT(100%)
                                   DI-RUNT(75%)                                                           DI-RUNT(75%)                                                        DI-RUNT(75%)
                0.8                DI-RUNT(50%)                                        0.8                DI-RUNT(50%)                                     0.8                DI-RUNT(50%)
 Energy Ratio




                                                                       Energy Ratio




                                                                                                                                            Energy Ratio
                0.6                                                                    0.6                                                                 0.6

                0.4                                                                    0.4                                                                 0.4

                0.2                                                                    0.2                                                                 0.2

                  0                                                                     0                                                                   0
                      0.3    0.4   0.5 0.6 0.7 0.8         0.9   1                           0.3    0.4   0.5 0.6 0.7 0.8         0.9   1                        0.3    0.4   0.5 0.6 0.7 0.8         0.9   1
                                    System Utilization                                                     System Utilization                                                  System Utilization
                            (a) Frequency Set F1                                                   (b) Frequency Set F2                                                (c) Frequency Set F3

                                                                                      Figure 5: SI-RUNT and DI-RUNT


[5] H. Chishiro and N. Yamasaki. RT-Est: Real-Time Operating                                                           [9] M.-S. Lee and C.-H. Lee. Enhanced Cycle-Conserving
    System for Semi-Fixed-Priority Scheduling Algorithms. In                                                               Dynamic Voltage Scaling for Low-Power Real-Time
    Proceedings of the 2011 International Symposium on                                                                     Operating Systems. IEICE Transactions on Information and
    Embedded and Pervasive Systems, pages 358–365, Oct.                                                                    Systems, 97-D(3):480–487, Mar. 2014.
    2011.                                                                                                             [10] V. Legout, M. Jan, and L. Pautet. Scheduling algorithms to
[6] R. Ernst and W. Ye. Embedded Program Timing Analysis                                                                   reduce the static energy consumption of real-time systems.
    Based on Path Clustering and Architecture Classification. In                                                           Real-Time Systems, 51(2):153–191, Mar. 2015.
    Proceedings of International Conference on Computer-Aided                                                         [11] C. L. Liu and J. W. Layland. Scheduling Algorithms for
    Design, pages 598–604, Nov. 1997.                                                                                      Multiprogramming in a Hard Real-Time Environment.
[7] K. Funaoka, S. Kato, and N. Yamasaki. Energy-Efficient                                                                 Journal of the ACM, 20(1):46–61, Jan. 1973.
    Optimal Real-Time Scheduling on Multiprocessors. In                                                               [12] P. Pillai and K. G. Shin. Real-Time Dynamic Voltage Scaling
    Proceedings of the 11th IEEE International Symposium on                                                                for Low-Power Embedded Operating Systems. In
    Object/Component/Service-Oriented Real-Time Distributed                                                                Proceedings of the Eighteenth ACM Symposium on
    Computing, pages 23–30, May 2008.                                                                                      Operating Systems Principles, pages 89–102, Dec. 2001.
[8] K. Funaoka, A. Takeda, S. Kato, and N. Yamasaki. Dynamic                                                          [13] P. Regnier, G. Lima, E. Massa, G. Levin, and S. Brandt.
    Voltage and Frequency Scaling for Optimal Real-Time                                                                    RUN: Optimal Multiprocessor Real-Time Scheduling via
    Scheduling on Multiprocessors. In Proceedings of the 3rd                                                               Reduction to Uniprocessor. In Proceedings of the 32th IEEE
    IEEE International Symposium on Industrial Embedded                                                                    Real-Time Systems Symposium, pages 104–115, Nov. 2011.
    Systems, pages 27–33, June 2008.