Performance Evaluation of RUNT Algorithm Hiroyuki Chishiro, Masayoshi Takasu, Rikuhei Ueda, and Nobuyuki Yamasaki Department of Information and Computer Science, Keio University, Yokohama, Japan {chishiro,takasu,riku,yamasaki}@ny.ics.keio.ac.jp ABSTRACT Since the small number of preemptions/migrations improves the This paper evaluates the performance of Reduction to Uniproces- practicality of the scheduling, we focus on RUN. sor Transformation (RUNT) with Voltage and Frequency Scaling, RUN transforms the multiprocessor scheduling problem into an called Static RUNT (S-RUNT) and Dynamic RUNT (D-RUNT), equivalent set of uniprocessor scheduling problems by the DUAL respectively. Simulation results show that how to assign tasks to and PACK operations and the detail of them is described in Section servers in RUNT influences energy consumption and the worst-fit 3. After transforming offline, RUN uses Earliest Deadline First heuristic is the best in many cases. In addition, the idle task as- (EDF) [11] to transform the uniprocessor scheduling into the mul- signment policy saves more energy consumption in D-RUNT and tiprocessor scheduling online because EDF is optimal in implicit- D-RUNT outperforms S-RUNT if the actual case execution time of deadline periodic task sets on uniprocessors. Using these opera- each task is shorter than its worst case execution time. tions, RUN achieves the optimality with low overhead. Voltage and Frequency Scaling (VFS) is one of the most popu- lar techniques to reduce energy consumption in computer systems. Categories and Subject Descriptors Especially, Real-Time Voltage and Frequency Scaling (RT-VFS) C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYS- can reduce energy consumption by scaling the operating frequency TEMS]: Real-time and embedded systems and the supply voltage while meeting real-time constraints. RT- VFS is based on the essential characteristic of real-time tasks. All tasks can be executed slowly as long as their deadlines are met. General Terms In most of the CMOS-based modern processors, the dynamic en- Algorithms ergy consumption E is proportional to the operating frequency f and the square of the supply voltage V (i.e., E ∝ f V 2 ) [3], and Keywords the maximum operating frequency depends on the supply voltage. Therefore, RT-VFS can effectively reduce energy consumption at Optimal Multiprocessor Real-Time Scheduling, Multiprocessor Sys- a cubic order of the operating frequency. RT-VFS has two follow- tems, Real-Time Systems, RUN Algorithm ing techniques: Real-Time Static Voltage and Frequency Scaling (RT-SVFS) and Real-Time Dynamic Voltage and Frequency Scal- 1. INTRODUCTION ing (RT-DVFS). RT-SVFS determines the voltage and frequency Real-time systems have required multiprocessors and there are offline and does not adjust them after the system starts. RT-DVFS mainly two categories of multiprocessor real-time scheduling: par- can reduce energy consumption by adjusting the voltage and fre- titioned scheduling and global scheduling. Partitioned scheduling quency online, which potentially saves more energy consumption. assigns tasks to processors offline and there are no migratory tasks Changing the voltage and frequency takes some time due to I/O but it guarantees only 50% processor utilization in the worst case operations. If the overhead of RT-DVFS is small, it is possible [1]. In contrast, global scheduling can achieve 100% utilization to ignore overhead or incorporate it into execution time of tasks. by migrating tasks among processors online but increases run-time However, RT-DVFS may incur significant overhead in some sys- overhead. We are interested in optimal multiprocessor real-time tems and RT-SVFS is a good solution for these systems. There is a scheduling algorithms, which can achieve 100% utilization with trade-off between energy consumption and overhead in RT-VFS. any implicit-deadline periodic task sets. Several optimal multi- In our previous work, Reduction to UNiprocessor Transforma- processor real-time scheduling algorithms have been proposed and tion (RUNT) [4] was proposed to achieve optimal multiprocessor Reduction to UNiprocessor (RUN) [13] outperforms other optimal real-time scheduling algorithm based on RUN with Voltage and algorithms with respect to the number of preemptions/migrations. Frequency Scaling. Unfortunately, the performance of RUNT is not evaluated. This paper evaluates the performance of RUNT with RT-SVFS/RT- DVFS, called Static RUNT (S-RUNT) and Dynamic RUNT (D- RUNT), respectively. Simulation results show that the saved energy consumption strongly depends on the way to assign tasks to servers in RUNT and the worst-fit heuristic is the most energy-efficient in many cases. In addition, the idle task assignment policy saves more energy consumption in D-RUNT and D-RUNT outperforms EWiLi ’15, October 8th, 2015, Amsterdam, The Netherlands. S-RUNT if tasks are completed early. Copyright retained by the authors. 2. SYSTEM MODEL the slack if U < P 1. The total utilization of idle tasks is defined as Uidle = M − i Ui . Note that each idle task has just the pa- 2.1 Processor Model rameter of utilization and does not have other parameters including The system has M processors Π = { P1 , P2 , ..., PM }. Each WCET and period. processor Pj is characterized by the continuous normalized fre- RUN transforms the multiprocessor scheduling to the uniproces- quency αj (0 ≤ αj ≤ 1). Here, we discuss the differences be- sor scheduling by aggregating tasks into servers S. We treat servers tween the system model and practical environments. (i) In prac- as tasks with a sequence of jobs but they are not actual tasks in the tical environments, the system has the discrete frequency values system; each server is a proxy for a collection of client tasks. When F = { f1 , ..., fL | fmin = f1 < ... < fL = fmax } and the dis- a server is running, the processor time is used by one of its clients. crete voltage values V = { V1 , ..., VL | V1 < ... < VL }. We as- Clients of a server are scheduled via an internal scheduling P mech- sume that Vk corresponds to fk and the voltage is also changed anism. The utilization of each server Sk is Uksrv = τi ∈Sk Ui , at the same time as the corresponding frequency is changed. The where τi ∈ Sk means that task τi is first assigned to server Sk , and lowest frequency fi ∈ F such that αj ≤ fi /fmax will be selected Uksrv does not exceed one. to achieve the lowest energy consumption while meeting real-time constraints. (ii) The system model assumes that no overhead oc- 3.1.1 Offline Phase curs at run-time. In practical environments, the scaled frequency In an offline phase, RUN reduces the multiprocessor scheduling interferes with the scheduling even if the frequency is not changed to the uniprocessor scheduling by the DUAL and PACK operations. dynamically. The worst case overhead is included in the worst case RUN uses EDF for uniprocessor scheduling because EDF is opti- execution time (WCET). mal in implicit-deadline periodic task sets on uniprocessors. The DUAL operation transforms a task τi into the dual task τi∗ , 2.2 Task Model whose execution time represents the idle time of τi (i.e., Ci∗ = Ti − The system has a task set T = { τ1 , τ2 , ..., τN }, which is a set Ci ). The relative deadline of dual task τi∗ is equal to that of task τi . of N periodic tasks on M processors. Each task cannot be executed The dual task τi∗ is executed exactly when the original task τi is idle in parallel among processors. Each task τi has its WCET Ci and and vice versa. A schedule for the original task set is obtained by period Ti . The j th instance of task τi is called job τi,j . Task τi blocking τi whenever τi∗ executes in the dual schedule. In addition, executed on a processor Pj requires Ci /αj processor time at every the sum of utilizations of task τi and dual task τi∗ is one and the Ti interval. The relative deadline Di is equal to its period Ti (i.e., utilization of dual task τi∗ is Ui∗ = Ci∗ /Ti . The DUAL operation implicit-deadline). All tasks must complete the execution by their reduces the number of processors whenever N − M < M . deadlines. The utilization of each task is defined 1 P as Ui = Ci /Ti The PACK operation packs dual servers into packed servers whose and the system utilization is defined as U = M i Ui . We assume utilizations do not exceed one. When N − M ≥ M , the number that all tasks may be preempted and migrated among processors at of servers can be reduced by aggregating them into fewer servers any time, and are independent (i.e., they do not share resources and by the PACK operation. The scheme how to pack servers to fewer do not have any precedence). servers is heuristic and the PACK operation is similar to the parti- tioning scheme. Note that if assigning tasks to processors is suc- 3. THE RUNT ALGORITHM cessful, RUN generates the same schedule as Partitioned EDF (P- We introduce RUNT [4], which is an optimal multiprocessor EDF) and does not perform the DUAL and PACK operations. Oth- real-time scheduling algorithm based on RUN with VFS. RUNT erwise the DUAL and PACK operations are performed to generate supports RT-SVFS/RT-DVFS techniques on uniform/independent the reduction tree offline, which is used to make server scheduling VFS multiprocessor systems, called Static Uniform RUNT (SU- decisions online. The detail of making scheduling decisions in the reduction tree is shown in the next subsection. RUNT), Static Independent RUNT (SI-RUNT), Dynamic Uniform In order to explain the reduction tree, we define the following RUNT (DU-RUNT), and Dynamic Independent RUNT (DI-RUNT), respectively. Due to the space limitation, the detail of these algo- terms with respect to servers as follows. A unit server is a server rithms are explained in [4]. When the actual case execution time whose utilization is one. A null server is a server whose utilization (ACET) of each task is often shorter than its WCET [6], RUNT uses is zero. A root server is a last packed server whose utilization is Enhanced Cycle-Conserving EDF (ECC-EDF) [9] to reclaim slack one (unit server). Packing the dual servers of packed servers can reduce the num- for reducing energy consumption. RUNT achieves a small number ber of servers by at least (almost) half. We perform DUAL and of preemptions/migrations compared to RUN because these opera- tions are performed when every scheduling event occurs. If a task PACK operations repeatedly until all packed servers become unit set does not satisfy the full system utilization, idle tasks are in- servers. Now we define a REDUCE operation to be their composi- serted because RUN assumes the full system utilization. An idle tion. task assignment policy is an important factor to reduce energy con- D EFINITION 1 (F ROM D EFINITION IV.6. IN [13]). Given a sumption. Therefore, the idle ratio-fit was proposed in our previous set of servers Γ and a packing π of Γ , a REDUCE operation on a work. We explain the overview of the RUN algorithm, the ECC- server S in Γ, denoted by ψ(S), is the composition of the DUAL op- EDF algorithm, and the idle ratio-fit as follows. eration ϕ with the PACK operation σ for π (i.e., ψ(S) = ϕ(σ(S))). 3.1 The RUN Algorithm In addition, we define reduction level/sequence to explain the re- RUN [13] is an optimal multiprocessor real-time scheduling al- duction tree as follows. gorithm with a small number of preemptions/migrations. We ex- plain the detail of RUN in offline and online phases. D EFINITION 2 (F ROM D EFINITION IV.7 IN [13]). Let i ≥ 1 Now we introduce the RUN’s specific model because RUN has be an integer, Γ be a set of servers, and S be a server in Γ. The many original parameters and assumptions to explain itself. A sys- operator ψ i is recursively defined by ψ 0 (S) = S and ψ i (S) = tem is fully utilized if the system utilization U is one. Since RUN ψ ◦ ψ i−1 (S). {ψ i }i is a reduction sequence, and the server system assumes the full system utilization, idle tasks are inserted to fill in ψ i (Γ) is said to be at reduction level i. σ(ψ2(Γ)) S14(1),{5N*,10N*,15N*} Reduction Level 2 σ VP2,1 S11 S13 S12 S11 S13 ψ2(Γ) S11(0.2),{5N*,10N*} S12(0.2),{10N*,15N*} S13(0.6),{5N*} 0 1 2 3 4 5 6 7 8 9 10 Reduction Level 1 φ VP1,1 S9 S10 S10 S6 S7 σ(ψ (Γ)) 1 σ({S6,S7}) σ({S8,S9}) σ({S10}) σ VP1,2 S10 S6 S7 S8 ψ (Γ) S 1 6 (0.4),{5N*} S 7 (0.4),{10N*} S 8 (0.4),{15N*} S9 (0.4),{10N*} S 10 (0.4),{5N*} 0 1 2 3 4 5 6 7 8 9 10 φ Reduction Level 0 σ(ψ0(Γ)) σ({S1}) σ({S2}) σ({S3}) σ({S4}) σ({S5}) VP0,1 S1 S5 S4 σ VP0,2 S2 S1 S1 S5 ψ0 S1(0.6),{5N*} S2(0.6),{10N*} S3(0.6),{15N*} S4(0.6),{10N*} S5(0.6),{5N*} VP0,3 S3 S2 S1 τ 1 (2,5) τ 2 (4,10) τ 3 (6,15) τ 4 (4,10) τ 5 (2,5) 0 1 2 3 4 5 6 7 8 9 10 Processor Figure 1: Reduction tree on three processors P1 τ1 τ5 τ4 P2 τ2 τ1 τ1 τ5 Note that assigning tasks to servers is defined as reduction level 0 P3 τ3 τ2 that does not perform the REDUCE operation. (C ,T ) Figure 1 shows the reduction tree on three processors. τi i i 0 1 2 3 4 5 6 7 8 9 10 expresses that task τi has WCET Ci and period Ti . Tasks τ1 , τ2 , τ3 , τ4 , and τ5 are assigned to servers S1 , S2 , S3 , S4 , and S5 at Figure 2: An example of RUN scheduling on three processors reduction level P 0, respectively. The total utilization of idle tasks is Uidle = M − i Ui = 3 − 5 × 0.4 = 1. In this example, idle tasks are assigned to servers at reduction level 0 uniformly, i.e., RULE 4 (F ROM RULE IV.3 IN [13]). Execute (circle) the child the utilization of each server is added to Uidle /N = 1/5 = 0.2, (packed server) of a dual server if and only if the dual server is not respectively. running (not circled). (U srv ),{ Dk } We represent a server as Sk k , where Uksrv is the uti- lization of server Sk and Dk is the deadline set of server Sk . The In the reduction tree, a thick arrow represents a scheduled server deadline set includes all (absolute) deadlines of tasks in the server. and a thin arrow represents a non-scheduled server by each par- Each server sets the earliest deadline in each deadline set when the ent server. If a thick arrow from a server points a task, the server server is released. We assign deadline sets 5N ∗ , 10N ∗ , 15N ∗ , schedules the task. 10N ∗ , and 5N ∗ to servers at reduction level 0, respectively, where In Figure 1, root server S14 is always running, regardless of these N ∗ means natural numbers. Servers S6 , S7 , S8 , S9 , and S10 are rules, because a root server is always a unit server. Next, S14 makes generated by the DUAL operation at reduction level 1 and their uti- scheduling decisions in EDF order and server S12 is running at this lizations are 0.4 because these servers are dual servers of servers time. Since server S12 is running, S8 and S9 are not running by S1 , S2 , S3 , S4 , and S5 at reduction level 0, respectively. In this ex- Rule 3. Since servers S11 and S13 are not running, servers S7 and ample, servers S6 and S7 are packed, servers S8 and S9 are packed, S10 are running by Rule 4. Servers S6 , S8 , and S9 are not running, and server S10 is not packed by the PACK operation. Servers S11 , and hence servers S1 , S3 , and S4 are running by Rule 4. S12 , and S13 are generated by the DUAL operation at reduction Figure 2 shows an example of RUN scheduling on three proces- level 2. Finally, server S14 is generated by the PACK operation at sors. Each server is executed on virtual processor V PR,v , where R reduction level 2 and its utilization is one, and hence the REDUCE represents the reduction level and v represents the virtual processor operation is finished and the reduction tree is completely generated. ID at each reduction level. The task set is shown in Figure 1 and Note that the number of root servers may become more than one this example shows the scheduling decisions at time 4. This system because when all servers are unit servers at the highest reduction has three processors P1 , P2 , and P3 , reduction level 0 has three level, then the REDUCE operation is finished. If one server is a virtual processors V P0,1 , V P0,2 , and V P0,3 , reduction level 1 has unit server, its dual server is a null server, which is packed into two virtual processors V P1,1 and V P1,2 , and reduction level 2 has another server when the next PACK operation is performed. one virtual processor V P2,1 . RUN uses the following task-to-processor assignment scheme; 3.1.2 Online Phase (i) leave executing tasks on their current processors, (ii) assign idle In an online phase, RUN schedules servers by the following rules tasks to their last-used processor, when available, to avoid unneces- and we use Figure 1 for reference. sary migrations, and (iii) assign remaining tasks to free processors arbitrarily. By this scheme, each server assigns tasks to processors RULE 3 (F ROM RULE IV.2 IN [13]). If a packed server is run- P1 , P2 , or P3 in Figure 2. When each task completes its execu- ning (circled), execute the child node with the earliest deadline tion on one processor, the processor becomes idle until the server among those children with work remaining; if a packed server is of each task exhausts its budget. For example, server S5 running on not running (not circled), execute none of its children. V P0,1 completes task τ5 on processor P1 at time 3 and P1 becomes idle (executes idle task) in time interval [3,4). 3.2 The ECC-EDF Algorithm The effectiveness of RUNT is in terms of energy ratio, which is ECC-EDF [9] is an RT-DVFS technique on uniprocessors and defined as follows. ensured that any implicit-deadline periodic task set T with utiliza- Z P 3 1 T Pk ∈Π fi tion U ≤ 1 is successfully scheduled. In addition, ECC-EDF out- Energy Ratio = dt performs CC-EDF theoretically and Look-Ahead EDF [12] exper- T 0 M imentally with respect to energy consumption. Therefore, ECC- EDF is used in RUNT to achieve optimal multiprocessor real-time 4.2 Simulation Results scheduling with RT-DVFS as well as EDF is used in RUN. To im- prove CC-EDF, ECC-EDF takes the elapsed time of tasks into con- 4.2.1 Task Assignment Policy sideration and finds the maximum utilization saved by the slack First, we evaluate the task assignment policy to examine which on completion of the task by calculating the minimum utilization heuristic is the most energy-efficient in RUNT. We use the idle needed to process the slack by its deadline using following Equa- ratio-fit as the idle task assignment policy. tion 1. Figure 3 shows the simulation results of task assignment pol- Ci − cci icy in SI-RUNT and DI-RUNT. In SI-RUNT, the worst-fit reduces Uis = , (1) energy consumption the most when U ≤ 0.6 but other heuristics Ti − E i outperform the worst-fit when U > 0.6 (Figures 3(a), 3(b), and where cci is the ACET of task τi and Ei is the elapsed time of task 3(c)). Since SI-RUNT selects the maximum frequency among all τi . frequencies of servers assigned to the processor, it is necessary to decrease them as much as possible. A simple way to realize this is 3.3 Idle Ratio-Fit to insert idle tasks to a server, and hence the utilization of the server The idle ratio-fit [4] assigns idle tasks to servers uniformly for can be 100%. Servers with 100% utilization can use the processors reducing energy consumption in an offline phase when the system exclusively and if the server has slack generated by inserting idle is not fully utilized. The idle ratio-fit is inspired by optimality on tasks, we can decrease the frequency of the processor. Even if the energy-efficiency of the worst-fit. Aydin and Yang proved that a original utilization of the server which uses the processor exclu- task assignment that evenly divides the total utilization among all sively and inserting idle tasks may not be effective, isolating the the processors, if it exists, will minimize the total energy consump- server increases the potential of decreasing the frequency assigned tion, and also showed that the worst-fit task assignment heuris- to other processors. This idea is similar to T-N Plane Transfor- tic outperforms other heuristics in energy-efficiency including the mation (TNPT) [7, 8] that classifies tasks into two classes: heavy first-, next-, and best-fit [2]. We define the idle ratio of Sk denoted tasks and light tasks, and a heavy task uses a processor exclusively. by IdleRatio(Sk ), which is the ratio of the utilization of idle tasks In contrast, the first- and best-fit tend to increase the utilization of assigned to server Sk to the utilization of Sk , as follows. each server up to 100%, and hence inserting idle tasks with low utilization can make the utilization of each server be 100%. For Ukidle this reason, when the utilization of the task set becomes large and IdleRatio(Sk ) = , (2) few servers can decrease their frequency, the first- and best-fit can Uksrv outperform the worst-fit. where Uksrv is the utilization of each server Sk and Ukidle is the In DI-RUNT, the worst-fit reduces energy consumption the most utilization of idle tasks assigned to server Sk . The idle ratio-fit (Figures 3(d), 3(e), and 3(f)). The worst-fit tends to uniform the lowers the idle ratio of each server on average to reduce energy utilization of each server, which results in well-balanced load of consumption. Due to the space limitation, the detail of the idle each processor that can decrease the frequency effectively. When ratio-fit is shown in [4]. the level of frequency becomes more fine-grained, that is to say, the number of selectable frequency values becomes large, the ac- tual frequency approaches theoretically optimal value. Therefore, 4. SIMULATION STUDIES energy consumption in the frequency set F3 is the lowest in all fre- quency sets. 4.1 Simulation Setups In this section, we evaluate RUNT through simulation studies. 4.2.2 Idle Task Assignment Policy This system has 16 processors (M = 16) on static/dynamic and Next we measure the effectiveness of the idle task assignment uniform/independent VFS systems. We use three frequency sets as policy, called idle ratio-fit, compared to other idle task assignment follows: F1 = { 0.5, 0.75, 1.0 } , F2 = { 0.5, 0.75, 0.83, 1.0 } , F3 = policies. We use the worst-fit as the task assignment policy because { 0.36, 0.55, 0.64, 0.73, 0.82, 0.91, 1.0 }. When a processor goes the worst-fit outperforms other heuristics in many cases as shown to idle, the processor sets its frequency to the minimum one. For in Section 4.2.1. example, when a processor using F1 goes to idle, set the operating Figure 4 shows the simulation results of the idle task assignment frequency to 0.5. Due to space limitations, we only show the results policy with the worst-fit in SI-RUNT and DI-RUNT. In SI-RUNT, of independent VFS techniques. the first-, best-, and worst-fit reduce more energy consumption than This simulation uses 1, 000 task sets in each system utilization. the idle ratio-fit (Figures 4(a), 4(b), and 4(c)). This is the similar The system utilization U is selected from [0.3, 0.35, 0.4, ..., 1.0]. reason to the task assignment policy discussed in Section 4.2.1. RT- Each Ui is selected within [0.01, 0.02, 0.03, ..., 1.0]. The period Ti SVFS needs to decrease the frequency of all servers. In DI-RUNT, of each task τi is selected within [100, 200, 300, ..., 1600]. Tasks on the other hand, the idle ratio-fit achieves the best results (Figures in each task set are ordered by decreasing utilization. The ratio of 4(d), 4(e), and 4(f)) because the idle ratio-fit is based on the idea of ACET to WCET is set to the range of [0.5, 1.0] or [0.75, 1.0], or the worst-fit and assigns idle tasks to servers uniformly. Especially, always 1.0, represented as DI-RUNT(50%), DI-RUNT(75%), and the idle ratio-fit is superior to other idle task assignment policies in DI-RUNT(100%), respectively. The simulation length is the hyper- a coarse-grained frequency set such as F1 or F2 and can save more period of each task set. energy as shown in Figures 4(d) and 4(e). 1 First-Fit 1 First-Fit 1 First-Fit Next-Fit Next-Fit Next-Fit Best-Fit Best-Fit Best-Fit 0.8 Worst-Fit 0.8 Worst-Fit 0.8 Worst-Fit Energy Ratio Energy Ratio Energy Ratio 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0 0 0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 System Utilization System Utilization System Utilization (a) SI-RUNT in Frequency Set F1 (b) SI-RUNT in Frequency Set F2 (c) SI-RUNT in Frequency Set F3 1 First-Fit 1 First-Fit 1 First-Fit Next-Fit Next-Fit Next-Fit Best-Fit Best-Fit Best-Fit 0.8 Worst-Fit 0.8 Worst-Fit 0.8 Worst-Fit Energy Ratio Energy Ratio Energy Ratio 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0 0 0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 System Utilization System Utilization System Utilization (d) DI-RUNT in Frequency Set F1 (e) DI-RUNT in Frequency Set F2 (f) DI-RUNT in Frequency Set F3 Figure 3: Task Assignment Policy with Idle Ratio-Fit energy consumption the most in all idle task assignment policies in Table 1: Task and idle task assignment policies RT-DVFS. Algorithm Frequency Set Task Assignment Idle Task Assignment SI-RUNT F1 Worst-Fit Idle Worst-Fit In future work, we will compare RUNT to other RT-SVFS/RT- SI-RUNT F2 Worst-Fit Idle First-Fit DVFS techniques such as TNPT [7, 8] with respect to the energy SI-RUNT F3 Worst-Fit Idle Best-Fit consumption and the number of preemptions/migrations. We will DI-RUNT F1 , F2 , and F3 Worst-Fit Idle Ratio-Fit implement RUNT to evaluate energy consumption and overhead in the RT-Est real-time operating system [5]. We will integrate the static/dynamic power management techniques such as [10] with 4.2.3 Comparison of DI-RUNT with SI-RUNT RUNT to reduce energy consumption effectively. DI-RUNT is compared to SI-RUNT with respect to energy ratio. Table 1 shows task and idle task assignment policies in each algo- Acknowledgement rithm. We choose these policies from the results of lowest energy ratio in Sections 4.2.1 and 4.2.2. This research was supported in part by Keio Gijuku Academic De- Figure 5 shows the simulation results of SI-RUNT and DI-RUNT. velopment Funds, Keio Kougakukai, and CREST, JST. The smaller the ratio of ACET to WCET is, the more DI-RUNT can reduce energy consumption because small ACET/WCET means a 6. REFERENCES large amount of dynamic slack and the ratio of slack in the utiliza- [1] B. Andersson and J. Jonsson. The Utilization Bounds of tion of each server is increased, which results in decreasing the fre- Partitioned and Pfair Static-Priority Scheduling on quency. Note that even if the ACET of each task is always equal to Multiprocessors are 50%. In Proceedings of the 15th its WCET, DI-RUNT outperforms SI-RUNT except for U = 1. SI- Euromicro Conference on Real-Time Systems, pages 33–40, RUNT determines the frequency of each processor before starting July 2003. the system and cannot change the frequency online. On the other [2] H. Aydin and Q. Yang. Energy-Aware Partitioning for hand, DI-RUNT can make use of dynamic slack, which is produced Multiprocessor Real-Time Systems. In Proceedings of the in the early completion of tasks, to decrease the frequency. In the 17th International Parallel and Distributed Processing case of U = 1, the system has no idle time, and hence SI-RUNT Symposium, pages 113–121, Apr. 2003. consumes the same energy as DI-RUNT(100%). [3] T. D. Burd and R. W. Brodersen. Energy Efficient CMOS Microprocessor Design. In Proceedings of the 28th Annual 5. CONCLUSION Hawaii International Conference on System Sciences, pages This paper evaluated the performance of RUNT, which is an 288–297, Jan. 1995. optimal multiprocessor real-time scheduling algorithm based on [4] H. Chishiro, M. Takasu, R. Ueda, and N. Yamasaki. Optimal RUN with RT-VFS. Simulation results show that RUNT can reduce Multiprocessor Real-Time Scheduling based on RUN with energy consumption with the worst-fit task assignment heuristic, Voltage and Frequency Scaling. In Proceedings of the 18th compared to other task assignment heuristics, in many cases. Inter- IEEE International Symposium on estingly, the idle ratio-fit does not reduce energy consumption com- Object/Component/Service-Oriented Real-Time Distributed pared to traditional assignment heuristics in RT-SVFS but reduces Computing, pages 284–287, Apr. 2015. 1 Idle First-Fit 1 Idle First-Fit 1 Idle First-Fit Idle Best-Fit Idle Best-Fit Idle Best-Fit Idle Worst-Fit Idle Worst-Fit Idle Worst-Fit 0.8 Idle Ratio-Fit 0.8 Idle Ratio-Fit 0.8 Idle Ratio-Fit Energy Ratio Energy Ratio Energy Ratio 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0 0 0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 System Utilization System Utilization System Utilization (a) SI-RUNT in Frequency Set F1 (b) SI-RUNT in Frequency Set F2 (c) SI-RUNT in Frequency Set F3 1 Idle First-Fit 1 Idle First-Fit 1 Idle First-Fit Idle Best-Fit Idle Best-Fit Idle Best-Fit Idle Worst-Fit Idle Worst-Fit Idle Worst-Fit 0.8 Idle Ratio-Fit 0.8 Idle Ratio-Fit 0.8 Idle Ratio-Fit Energy Ratio Energy Ratio Energy Ratio 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0 0 0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 System Utilization System Utilization System Utilization (d) DI-RUNT in Frequency Set F1 (e) DI-RUNT in Frequency Set F2 (f) DI-RUNT in Frequency Set F3 Figure 4: Idle Task Assignment Policy with Worst-Fit 1 SI-RUNT 1 SI-RUNT 1 SI-RUNT DI-RUNT(100%) DI-RUNT(100%) DI-RUNT(100%) DI-RUNT(75%) DI-RUNT(75%) DI-RUNT(75%) 0.8 DI-RUNT(50%) 0.8 DI-RUNT(50%) 0.8 DI-RUNT(50%) Energy Ratio Energy Ratio Energy Ratio 0.6 0.6 0.6 0.4 0.4 0.4 0.2 0.2 0.2 0 0 0 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 System Utilization System Utilization System Utilization (a) Frequency Set F1 (b) Frequency Set F2 (c) Frequency Set F3 Figure 5: SI-RUNT and DI-RUNT [5] H. Chishiro and N. Yamasaki. RT-Est: Real-Time Operating [9] M.-S. Lee and C.-H. Lee. Enhanced Cycle-Conserving System for Semi-Fixed-Priority Scheduling Algorithms. In Dynamic Voltage Scaling for Low-Power Real-Time Proceedings of the 2011 International Symposium on Operating Systems. IEICE Transactions on Information and Embedded and Pervasive Systems, pages 358–365, Oct. Systems, 97-D(3):480–487, Mar. 2014. 2011. [10] V. Legout, M. Jan, and L. Pautet. Scheduling algorithms to [6] R. Ernst and W. Ye. Embedded Program Timing Analysis reduce the static energy consumption of real-time systems. Based on Path Clustering and Architecture Classification. In Real-Time Systems, 51(2):153–191, Mar. 2015. Proceedings of International Conference on Computer-Aided [11] C. L. Liu and J. W. Layland. Scheduling Algorithms for Design, pages 598–604, Nov. 1997. Multiprogramming in a Hard Real-Time Environment. [7] K. Funaoka, S. Kato, and N. Yamasaki. Energy-Efficient Journal of the ACM, 20(1):46–61, Jan. 1973. Optimal Real-Time Scheduling on Multiprocessors. In [12] P. Pillai and K. G. Shin. Real-Time Dynamic Voltage Scaling Proceedings of the 11th IEEE International Symposium on for Low-Power Embedded Operating Systems. In Object/Component/Service-Oriented Real-Time Distributed Proceedings of the Eighteenth ACM Symposium on Computing, pages 23–30, May 2008. Operating Systems Principles, pages 89–102, Dec. 2001. [8] K. Funaoka, A. Takeda, S. Kato, and N. Yamasaki. Dynamic [13] P. Regnier, G. Lima, E. Massa, G. Levin, and S. Brandt. Voltage and Frequency Scaling for Optimal Real-Time RUN: Optimal Multiprocessor Real-Time Scheduling via Scheduling on Multiprocessors. In Proceedings of the 3rd Reduction to Uniprocessor. In Proceedings of the 32th IEEE IEEE International Symposium on Industrial Embedded Real-Time Systems Symposium, pages 104–115, Nov. 2011. Systems, pages 27–33, June 2008.