-

Performance Evaluation of RUNT Algorithm

Hiroyuki Chishiro

chishiro@ny.ics.keio.ac.jp 0

Masayoshi Takasu

takasu@ny.ics.keio.ac.jp 0

Rikuhei Ueda

riku@ny.ics.keio.ac.jp 0

Nobuyuki Yamasaki

yamasaki@ny.ics.keio.ac.jp 0 0 Department of Information and Computer Science, Keio University , Yokohama , Japan

2015

This paper evaluates the performance of Reduction to Uniprocessor Transformation (RUNT) with Voltage and Frequency Scaling, called Static RUNT (S-RUNT) and Dynamic RUNT (D-RUNT), respectively. Simulation results show that how to assign tasks to servers in RUNT influences energy consumption and the worst-fit heuristic is the best in many cases. In addition, the idle task assignment policy saves more energy consumption in D-RUNT and D-RUNT outperforms S-RUNT if the actual case execution time of each task is shorter than its worst case execution time.

C.3 [SPECIAL-PURPOSE AND APPLICATION-BASED SYSTEMS]: Real-time and embedded systems

General Terms

Algorithms

INTRODUCTION

Real-time systems have required multiprocessors and there are mainly two categories of multiprocessor real-time scheduling: partitioned scheduling and global scheduling. Partitioned scheduling assigns tasks to processors offline and there are no migratory tasks but it guarantees only 50% processor utilization in the worst case [ 1 ]. In contrast, global scheduling can achieve 100% utilization by migrating tasks among processors online but increases run-time overhead. We are interested in optimal multiprocessor real-time scheduling algorithms, which can achieve 100% utilization with any implicit-deadline periodic task sets. Several optimal multiprocessor real-time scheduling algorithms have been proposed and Reduction to UNiprocessor (RUN) [ 13 ] outperforms other optimal algorithms with respect to the number of preemptions/migrations.

Since the small number of preemptions/migrations improves the practicality of the scheduling, we focus on RUN.

RUN transforms the multiprocessor scheduling problem into an equivalent set of uniprocessor scheduling problems by the DUAL and PACK operations and the detail of them is described in Section 3. After transforming offline, RUN uses Earliest Deadline First (EDF) [ 11 ] to transform the uniprocessor scheduling into the multiprocessor scheduling online because EDF is optimal in implicitdeadline periodic task sets on uniprocessors. Using these operations, RUN achieves the optimality with low overhead.

Voltage and Frequency Scaling (VFS) is one of the most popular techniques to reduce energy consumption in computer systems. Especially, Real-Time Voltage and Frequency Scaling (RT-VFS) can reduce energy consumption by scaling the operating frequency and the supply voltage while meeting real-time constraints. RTVFS is based on the essential characteristic of real-time tasks. All tasks can be executed slowly as long as their deadlines are met. In most of the CMOS-based modern processors, the dynamic energy consumption E is proportional to the operating frequency f and the square of the supply voltage V (i.e., E ∝ f V 2) [ 3 ], and the maximum operating frequency depends on the supply voltage. Therefore, RT-VFS can effectively reduce energy consumption at a cubic order of the operating frequency. RT-VFS has two following techniques: Real-Time Static Voltage and Frequency Scaling (RT-SVFS) and Real-Time Dynamic Voltage and Frequency Scaling (RT-DVFS). RT-SVFS determines the voltage and frequency offline and does not adjust them after the system starts. RT-DVFS can reduce energy consumption by adjusting the voltage and frequency online, which potentially saves more energy consumption. Changing the voltage and frequency takes some time due to I/O operations. If the overhead of RT-DVFS is small, it is possible to ignore overhead or incorporate it into execution time of tasks. However, RT-DVFS may incur significant overhead in some systems and RT-SVFS is a good solution for these systems. There is a trade-off between energy consumption and overhead in RT-VFS.

In our previous work, Reduction to UNiprocessor Transformation (RUNT) [ 4 ] was proposed to achieve optimal multiprocessor real-time scheduling algorithm based on RUN with Voltage and Frequency Scaling. Unfortunately, the performance of RUNT is not evaluated.

This paper evaluates the performance of RUNT with RT-SVFS/RTDVFS, called Static RUNT (S-RUNT) and Dynamic RUNT (DRUNT), respectively. Simulation results show that the saved energy consumption strongly depends on the way to assign tasks to servers in RUNT and the worst-fit heuristic is the most energy-efficient in many cases. In addition, the idle task assignment policy saves more energy consumption in D-RUNT and D-RUNT outperforms S-RUNT if tasks are completed early. 2.1

SYSTEM MODEL Processor Model

The system has M processors Π = { P1, P2, ..., PM }. Each processor Pj is characterized by the continuous normalized frequency αj (0 ≤ αj ≤ 1). Here, we discuss the differences between the system model and practical environments. (i) In practical environments, the system has the discrete frequency values F = { f1, ..., fL | fmin = f1 < ... < fL = fmax } and the discrete voltage values V = { V1, ..., VL | V1 < ... < VL }. We assume that Vk corresponds to fk and the voltage is also changed at the same time as the corresponding frequency is changed. The lowest frequency fi ∈ F such that αj ≤ fi/fmax will be selected to achieve the lowest energy consumption while meeting real-time constraints. (ii) The system model assumes that no overhead occurs at run-time. In practical environments, the scaled frequency interferes with the scheduling even if the frequency is not changed dynamically. The worst case overhead is included in the worst case execution time (WCET). 2.2

Task Model

The system has a task set T = { τ1, τ2, ..., τN }, which is a set of N periodic tasks on M processors. Each task cannot be executed in parallel among processors. Each task τi has its WCET Ci and period Ti. The jth instance of task τi is called job τi,j . Task τi executed on a processor Pj requires Ci/αj processor time at every Ti interval. The relative deadline Di is equal to its period Ti (i.e., implicit-deadline). All tasks must complete the execution by their deadlines. The utilization of each task is defined as Ui = Ci/Ti and the system utilization is defined as U = M1 Pi Ui. We assume that all tasks may be preempted and migrated among processors at any time, and are independent (i.e., they do not share resources and do not have any precedence). 3.

THE RUNT ALGORITHM

We introduce RUNT [ 4 ], which is an optimal multiprocessor real-time scheduling algorithm based on RUN with VFS. RUNT supports RT-SVFS/RT-DVFS techniques on uniform/independent VFS multiprocessor systems, called Static Uniform RUNT (SURUNT), Static Independent RUNT (SI-RUNT), Dynamic Uniform RUNT (DU-RUNT), and Dynamic Independent RUNT (DI-RUNT), respectively. Due to the space limitation, the detail of these algorithms are explained in [ 4 ]. When the actual case execution time (ACET) of each task is often shorter than its WCET [ 6 ], RUNT uses Enhanced Cycle-Conserving EDF (ECC-EDF) [ 9 ] to reclaim slack for reducing energy consumption. RUNT achieves a small number of preemptions/migrations compared to RUN because these operations are performed when every scheduling event occurs. If a task set does not satisfy the full system utilization, idle tasks are inserted because RUN assumes the full system utilization. An idle task assignment policy is an important factor to reduce energy consumption. Therefore, the idle ratio-fit was proposed in our previous work. We explain the overview of the RUN algorithm, the ECCEDF algorithm, and the idle ratio-fit as follows. 3.1

The RUN Algorithm

RUN [ 13 ] is an optimal multiprocessor real-time scheduling algorithm with a small number of preemptions/migrations. We explain the detail of RUN in offline and online phases.

Now we introduce the RUN’s specific model because RUN has many original parameters and assumptions to explain itself. A system is fully utilized if the system utilization U is one. Since RUN assumes the full system utilization, idle tasks are inserted to fill in the slack if U < 1. The total utilization of idle tasks is defined as Uidle = M − Pi Ui. Note that each idle task has just the parameter of utilization and does not have other parameters including WCET and period.

RUN transforms the multiprocessor scheduling to the uniprocessor scheduling by aggregating tasks into servers S. We treat servers as tasks with a sequence of jobs but they are not actual tasks in the system; each server is a proxy for a collection of client tasks. When a server is running, the processor time is used by one of its clients. Clients of a server are scheduled via an internal scheduling mechanism. The utilization of each server Sk is Uksrv = Pτi∈Sk Ui, where τi ∈ Sk means that task τi is first assigned to server Sk, and Uksrv does not exceed one. 3.1.1

Offline Phase

In an offline phase, RUN reduces the multiprocessor scheduling to the uniprocessor scheduling by the DUAL and PACK operations. RUN uses EDF for uniprocessor scheduling because EDF is optimal in implicit-deadline periodic task sets on uniprocessors.

The DUAL operation transforms a task τi into the dual task τi∗, whose execution time represents the idle time of τi (i.e., Ci∗ = Ti − Ci). The relative deadline of dual task τi∗ is equal to that of task τi. The dual task τi∗ is executed exactly when the original task τi is idle and vice versa. A schedule for the original task set is obtained by blocking τi whenever τi∗ executes in the dual schedule. In addition, the sum of utilizations of task τi and dual task τi∗ is one and the utilization of dual task τi∗ is Ui∗ = Ci∗/Ti. The DUAL operation reduces the number of processors whenever N − M < M .

The PACK operation packs dual servers into packed servers whose utilizations do not exceed one. When N − M ≥ M , the number of servers can be reduced by aggregating them into fewer servers by the PACK operation. The scheme how to pack servers to fewer servers is heuristic and the PACK operation is similar to the partitioning scheme. Note that if assigning tasks to processors is successful, RUN generates the same schedule as Partitioned EDF (PEDF) and does not perform the DUAL and PACK operations. Otherwise the DUAL and PACK operations are performed to generate the reduction tree offline, which is used to make server scheduling decisions online. The detail of making scheduling decisions in the reduction tree is shown in the next subsection.

In order to explain the reduction tree, we define the following terms with respect to servers as follows. A unit server is a server whose utilization is one. A null server is a server whose utilization is zero. A root server is a last packed server whose utilization is one (unit server).

Packing the dual servers of packed servers can reduce the number of servers by at least (almost) half. We perform DUAL and PACK operations repeatedly until all packed servers become unit servers. Now we define a REDUCE operation to be their composition.

DEFINITION 1 (FROM DEFINITION IV.6. IN [ 13 ]). Given a set of servers Γ and a packing π of Γ , a REDUCE operation on a server S in Γ, denoted by ψ(S), is the composition of the DUAL operation ϕ with the PACK operation σ for π (i.e., ψ(S) = ϕ(σ(S))). In addition, we define reduction level/sequence to explain the reduction tree as follows.

DEFINITION 2 (FROM DEFINITION IV.7 IN [ 13 ]). Let i ≥ 1 be an integer, Γ be a set of servers, and S be a server in Γ. The operator ψi is recursively defined by ψ0(S) = S and ψi(S) = ψ ◦ ψi−1(S). {ψi}i is a reduction sequence, and the server system ψi(Γ) is said to be at reduction level i.

S14(1),{5N*,10N*,15N*} Note that assigning tasks to servers is defined as reduction level 0 that does not perform the REDUCE operation.

Figure 1 shows the reduction tree on three processors. τi(Ci,Ti) expresses that task τi has WCET Ci and period Ti. Tasks τ1, τ2, τ3, τ4, and τ5 are assigned to servers S1, S2, S3, S4, and S5 at reduction level 0, respectively. The total utilization of idle tasks is Uidle = M − Pi Ui = 3 − 5 × 0.4 = 1. In this example, idle tasks are assigned to servers at reduction level 0 uniformly, i.e., the utilization of each server is added to Uidle/N = 1/5 = 0.2, respectively.

We represent a server as S(Uksrv),{ Dk }, where Uksrv is the utik lization of server Sk and Dk is the deadline set of server Sk. The deadline set includes all (absolute) deadlines of tasks in the server. Each server sets the earliest deadline in each deadline set when the server is released. We assign deadline sets 5N ∗, 10N ∗, 15N ∗, 10N ∗, and 5N ∗ to servers at reduction level 0, respectively, where N ∗ means natural numbers. Servers S6, S7, S8, S9, and S10 are generated by the DUAL operation at reduction level 1 and their utilizations are 0.4 because these servers are dual servers of servers S1, S2, S3, S4, and S5 at reduction level 0, respectively. In this example, servers S6 and S7 are packed, servers S8 and S9 are packed, and server S10 is not packed by the PACK operation. Servers S11, S12, and S13 are generated by the DUAL operation at reduction level 2. Finally, server S14 is generated by the PACK operation at reduction level 2 and its utilization is one, and hence the REDUCE operation is finished and the reduction tree is completely generated.

Note that the number of root servers may become more than one because when all servers are unit servers at the highest reduction level, then the REDUCE operation is finished. If one server is a unit server, its dual server is a null server, which is packed into another server when the next PACK operation is performed. 3.1.2

Online Phase

In an online phase, RUN schedules servers by the following rules and we use Figure 1 for reference.

RULE 3 (FROM RULE IV.2 IN [ 13 ]). If a packed server is running (circled), execute the child node with the earliest deadline among those children with work remaining; if a packed server is not running (not circled), execute none of its children. σ(ψ2(Γ))

ψ2(Γ) σ(ψ1(Γ)) σ φ σ φ σ ψ0

RULE 4 (FROM RULE IV.3 IN [ 13 ]). Execute (circle) the child (packed server) of a dual server if and only if the dual server is not running (not circled).

In the reduction tree, a thick arrow represents a scheduled server and a thin arrow represents a non-scheduled server by each parent server. If a thick arrow from a server points a task, the server schedules the task.

In Figure 1, root server S14 is always running, regardless of these rules, because a root server is always a unit server. Next, S14 makes scheduling decisions in EDF order and server S12 is running at this time. Since server S12 is running, S8 and S9 are not running by Rule 3. Since servers S11 and S13 are not running, servers S7 and S10 are running by Rule 4. Servers S6, S8, and S9 are not running, and hence servers S1, S3, and S4 are running by Rule 4.

Figure 2 shows an example of RUN scheduling on three processors. Each server is executed on virtual processor V PR,v , where R represents the reduction level and v represents the virtual processor ID at each reduction level. The task set is shown in Figure 1 and this example shows the scheduling decisions at time 4. This system has three processors P1, P2, and P3, reduction level 0 has three virtual processors V P0,1, V P0,2, and V P0,3, reduction level 1 has two virtual processors V P1,1 and V P1,2, and reduction level 2 has one virtual processor V P2,1.

RUN uses the following task-to-processor assignment scheme; (i) leave executing tasks on their current processors, (ii) assign idle tasks to their last-used processor, when available, to avoid unnecessary migrations, and (iii) assign remaining tasks to free processors arbitrarily. By this scheme, each server assigns tasks to processors P1, P2, or P3 in Figure 2. When each task completes its execution on one processor, the processor becomes idle until the server of each task exhausts its budget. For example, server S5 running on V P0,1 completes task τ5 on processor P1 at time 3 and P1 becomes idle (executes idle task) in time interval [3,4). 3.2

The ECC-EDF Algorithm

ECC-EDF [ 9 ] is an RT-DVFS technique on uniprocessors and ensured that any implicit-deadline periodic task set T with utilization U ≤ 1 is successfully scheduled. In addition, ECC-EDF outperforms CC-EDF theoretically and Look-Ahead EDF [ 12 ] experimentally with respect to energy consumption. Therefore, ECCEDF is used in RUNT to achieve optimal multiprocessor real-time scheduling with RT-DVFS as well as EDF is used in RUN. To improve CC-EDF, ECC-EDF takes the elapsed time of tasks into consideration and finds the maximum utilization saved by the slack on completion of the task by calculating the minimum utilization needed to process the slack by its deadline using following Equation 1. , where cci is the ACET of task τi and Ei is the elapsed time of task τi. 3.3

Idle Ratio-Fit

The idle ratio-fit [ 4 ] assigns idle tasks to servers uniformly for reducing energy consumption in an offline phase when the system is not fully utilized. The idle ratio-fit is inspired by optimality on energy-efficiency of the worst-fit. Aydin and Yang proved that a task assignment that evenly divides the total utilization among all the processors, if it exists, will minimize the total energy consumption, and also showed that the worst-fit task assignment heuristic outperforms other heuristics in energy-efficiency including the first-, next-, and best-fit [ 2 ]. We define the idle ratio of Sk denoted by I dleRatio(Sk), which is the ratio of the utilization of idle tasks assigned to server Sk to the utilization of Sk, as follows.

I dleRatio(Sk) =

Ukidle U srv , k where Uksrv is the utilization of each server Sk and Ukidle is the utilization of idle tasks assigned to server Sk. The idle ratio-fit lowers the idle ratio of each server on average to reduce energy consumption. Due to the space limitation, the detail of the idle ratio-fit is shown in [ 4 ].

SIMULATION STUDIES Simulation Setups

In this section, we evaluate RUNT through simulation studies. This system has 16 processors (M = 16) on static/dynamic and uniform/independent VFS systems. We use three frequency sets as follows: F1 = { 0.5, 0.75, 1.0 } , F2 = { 0.5, 0.75, 0.83, 1.0 } , F3 = { 0.36, 0.55, 0.64, 0.73, 0.82, 0.91, 1.0 }. When a processor goes to idle, the processor sets its frequency to the minimum one. For example, when a processor using F1 goes to idle, set the operating frequency to 0.5. Due to space limitations, we only show the results of independent VFS techniques.

This simulation uses 1, 000 task sets in each system utilization. The system utilization U is selected from [0.3, 0.35, 0.4, ..., 1.0]. Each Ui is selected within [0.01, 0.02, 0.03, ..., 1.0]. The period Ti of each task τi is selected within [100, 200, 300, ..., 1600]. Tasks in each task set are ordered by decreasing utilization. The ratio of ACET to WCET is set to the range of [0.5, 1.0] or [0.75, 1.0], or always 1.0, represented as DI-RUNT(50%), DI-RUNT(75%), and DI-RUNT(100%), respectively. The simulation length is the hyperperiod of each task set. (1) (2)

The effectiveness of RUNT is in terms of energy ratio, which is defined as follows.

Energy Ratio = 1 Z T PPk∈Π fi3 T 0

M dt 4.2

Simulation Results

4.2.1

Task Assignment Policy

First, we evaluate the task assignment policy to examine which heuristic is the most energy-efficient in RUNT. We use the idle ratio-fit as the idle task assignment policy.

Figure 3 shows the simulation results of task assignment policy in SI-RUNT and DI-RUNT. In SI-RUNT, the worst-fit reduces energy consumption the most when U ≤ 0.6 but other heuristics outperform the worst-fit when U > 0.6 (Figures 3(a), 3(b), and 3(c)). Since SI-RUNT selects the maximum frequency among all frequencies of servers assigned to the processor, it is necessary to decrease them as much as possible. A simple way to realize this is to insert idle tasks to a server, and hence the utilization of the server can be 100%. Servers with 100% utilization can use the processors exclusively and if the server has slack generated by inserting idle tasks, we can decrease the frequency of the processor. Even if the original utilization of the server which uses the processor exclusively and inserting idle tasks may not be effective, isolating the server increases the potential of decreasing the frequency assigned to other processors. This idea is similar to T-N Plane Transformation (TNPT) [ 7, 8 ] that classifies tasks into two classes: heavy tasks and light tasks, and a heavy task uses a processor exclusively. In contrast, the first- and best-fit tend to increase the utilization of each server up to 100%, and hence inserting idle tasks with low utilization can make the utilization of each server be 100%. For this reason, when the utilization of the task set becomes large and few servers can decrease their frequency, the first- and best-fit can outperform the worst-fit.

In DI-RUNT, the worst-fit reduces energy consumption the most (Figures 3(d), 3(e), and 3(f)). The worst-fit tends to uniform the utilization of each server, which results in well-balanced load of each processor that can decrease the frequency effectively. When the level of frequency becomes more fine-grained, that is to say, the number of selectable frequency values becomes large, the actual frequency approaches theoretically optimal value. Therefore, energy consumption in the frequency set F3 is the lowest in all frequency sets. 4.2.2

Idle Task Assignment Policy

Next we measure the effectiveness of the idle task assignment policy, called idle ratio-fit, compared to other idle task assignment policies. We use the worst-fit as the task assignment policy because the worst-fit outperforms other heuristics in many cases as shown in Section 4.2.1.

Figure 4 shows the simulation results of the idle task assignment policy with the worst-fit in SI-RUNT and DI-RUNT. In SI-RUNT, the first-, best-, and worst-fit reduce more energy consumption than the idle ratio-fit (Figures 4(a), 4(b), and 4(c)). This is the similar reason to the task assignment policy discussed in Section 4.2.1. RTSVFS needs to decrease the frequency of all servers. In DI-RUNT, on the other hand, the idle ratio-fit achieves the best results (Figures 4(d), 4(e), and 4(f)) because the idle ratio-fit is based on the idea of the worst-fit and assigns idle tasks to servers uniformly. Especially, the idle ratio-fit is superior to other idle task assignment policies in a coarse-grained frequency set such as F1 or F2 and can save more energy as shown in Figures 4(d) and 4(e). 0.2 0 1 0.2 0 1

DI-RUNT is compared to SI-RUNT with respect to energy ratio. Table 1 shows task and idle task assignment policies in each algorithm. We choose these policies from the results of lowest energy ratio in Sections 4.2.1 and 4.2.2.

Figure 5 shows the simulation results of SI-RUNT and DI-RUNT. The smaller the ratio of ACET to WCET is, the more DI-RUNT can reduce energy consumption because small ACET/WCET means a large amount of dynamic slack and the ratio of slack in the utilization of each server is increased, which results in decreasing the frequency. Note that even if the ACET of each task is always equal to its WCET, DI-RUNT outperforms SI-RUNT except for U = 1. SIRUNT determines the frequency of each processor before starting the system and cannot change the frequency online. On the other hand, DI-RUNT can make use of dynamic slack, which is produced in the early completion of tasks, to decrease the frequency. In the case of U = 1, the system has no idle time, and hence SI-RUNT consumes the same energy as DI-RUNT(100%).

CONCLUSION

This paper evaluated the performance of RUNT, which is an optimal multiprocessor real-time scheduling algorithm based on RUN with RT-VFS. Simulation results show that RUNT can reduce energy consumption with the worst-fit task assignment heuristic, compared to other task assignment heuristics, in many cases. Interestingly, the idle ratio-fit does not reduce energy consumption compared to traditional assignment heuristics in RT-SVFS but reduces energy consumption the most in all idle task assignment policies in RT-DVFS.

In future work, we will compare RUNT to other RT-SVFS/RTDVFS techniques such as TNPT [ 7, 8 ] with respect to the energy consumption and the number of preemptions/migrations. We will implement RUNT to evaluate energy consumption and overhead in the RT-Est real-time operating system [ 5 ]. We will integrate the static/dynamic power management techniques such as [ 10 ] with RUNT to reduce energy consumption effectively.

Acknowledgement

This research was supported in part by Keio Gijuku Academic Development Funds, Keio Kougakukai, and CREST, JST. 0.2 Idle First-Fit Idle Best-Fit Idle Worst-Fit Idle Ratio-Fit 0.2 Idle First-Fit Idle Best-Fit Idle Worst-Fit Idle Ratio-Fit 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

System Utilization 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

System Utilization (a) Frequency Set F1 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1

System Utilization (b) Frequency Set F2 0 0.2 Idle First-Fit Idle Best-Fit Idle Worst-Fit Idle Ratio-Fit

[1]

Andersson and

Jonsson . The Utilization Bounds of Partitioned and Pfair Static-Priority Scheduling on Multiprocessors are 50% . In Proceedings of the 15th Euromicro Conference on Real-Time Systems , pages 33 - 40 , July 2003 .

[2]

Aydin and

Yang . Energy-Aware Partitioning for Multiprocessor Real-Time Systems . In Proceedings of the 17th International Parallel and Distributed Processing Symposium , pages 113 - 121 , Apr. 2003 .

[3]

T. D.

Burd and

R. W.

Brodersen . Energy Efficient CMOS Microprocessor Design . In Proceedings of the 28th Annual Hawaii International Conference on System Sciences , pages 288 - 297 , Jan. 1995 .

[4]

Chishiro ,

Takasu ,

Ueda , and

Yamasaki . Optimal Multiprocessor Real-Time Scheduling based on RUN with Voltage and Frequency Scaling . In Proceedings of the 18th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing , pages 284 - 287 , Apr. 2015 .

tio 0.8 a R 0 .6 y g r en 0.4 E

[5]

Chishiro and

Yamasaki. RT-Est: Real-Time Operating System for Semi-Fixed-Priority Scheduling Algorithms . In Proceedings of the 2011 International Symposium on Embedded and Pervasive Systems , pages 358 - 365 , Oct. 2011 .

[6]

Ernst and

Ye . Embedded Program Timing Analysis Based on Path Clustering and Architecture Classification . In Proceedings of International Conference on Computer-Aided Design , pages 598 - 604 , Nov. 1997 .

[7]

Funaoka ,

Kato , and

Yamasaki . Energy-Efficient Optimal Real-Time Scheduling on Multiprocessors . In Proceedings of the 11th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing , pages 23 - 30 , May 2008 .

[8]

Funaoka ,

Takeda ,

Kato , and

Yamasaki . Dynamic Voltage and Frequency Scaling for Optimal Real-Time Scheduling on Multiprocessors . In Proceedings of the 3rd IEEE International Symposium on Industrial Embedded Systems , pages 27 - 33 , June 2008 .

[9]

M.-S.

Lee and

C.-H.

Lee . Enhanced Cycle-Conserving Dynamic Voltage Scaling for Low-Power Real-Time Operating Systems . IEICE Transactions on Information and Systems , 97-D(3): 480 - 487 , Mar. 2014 .

[10]

Legout ,

Jan , and

Pautet . Scheduling algorithms to reduce the static energy consumption of real-time systems . Real-Time Systems , 51 ( 2 ): 153 - 191 , Mar. 2015 .

[11]

C. L.

Liu and

J. W.

Layland . Scheduling Algorithms for Multiprogramming in a Hard Real-Time Environment . Journal of the ACM , 20 ( 1 ): 46 - 61 , Jan. 1973 .

[12]

Pillai and

K. G.

Shin . Real-Time Dynamic Voltage Scaling for Low-Power Embedded Operating Systems . In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles , pages 89 - 102 , Dec. 2001 .

[13]

Regnier ,

Lima , E. Massa, G. Levin, and

Brandt . RUN: Optimal Multiprocessor Real-Time Scheduling via Reduction to Uniprocessor . In Proceedings of the 32th IEEE Real-Time Systems Symposium , pages 104 - 115 , Nov. 2011 .