Cache-Aware Real-Time Scheduling Simulator: Implementation and Return of Experience

Hai Nam Tran, Frank Singhoff, Stéphane Rubini, Jalil Boukhobza
Univ. Bretagne Occidentale, UMR 6285, Lab-STICC, F-29200 Brest, France
{hai-nam.tran,singhoff,rubini,boukhobza}@univ-brest.fr

EWiLi'15, October 8th, 2015, Amsterdam, The Netherlands. Copyright retained by the authors.

ABSTRACT
Evaluating cache related preemption delay (CRPD) in a preemptive scheduling context for Real-Time Embedded Systems (RTES) remains an open issue despite its practical importance. Indeed, various parameters should be taken into account, such as cache utilization of tasks, memory layout of tasks, processor utilization and the priority assignment algorithm. In state-of-the-art work, dependencies amongst those parameters are not investigated with precision because of the lack of scheduling analysis tools taking them into account. In this article, we propose a tool to investigate and evaluate scheduling analysis of RTES with cache memory and various scheduling parameters. Both modeling guidelines and implementation are detailed. The implementation is made in Cheddar, an open-source scheduling analyzer, which is freely available. Experiments are conducted in order to illustrate the performance and applicability of our tool. Furthermore, we discuss implementation issues, problems raised and lessons learned from those experiments.

1. INTRODUCTION
Cache memory is a crucial hardware component used to reduce the performance gap between processors and main memories. In the context of real-time embedded systems (RTES), the popularization of processors with large, multi-level caches motivates the proposition of verification methods handling this hardware component [14], [5], [2].

Scheduling simulation is a classical verification method performed at the design step of a RTES. It provides means to investigate whether all timing constraints of a RTES are satisfied. To perform scheduling simulation, several assumptions are usually made in order to simplify system modeling. One of them is that preemption costs are negligible. Preemption cost is the additional time needed to process interrupts, manipulate task queues and actually perform context switches.

Integrating cache memory in a RTES generally enhances the overall performance in terms of execution time, but unfortunately it can lead to an increase in preemption cost and execution time variability [15]. When a task is preempted, memory blocks belonging to the task could be removed from the cache. Once the task resumes, previously removed memory blocks have to be reloaded into the cache. Thus, a new preemption cost named Cache Related Preemption Delay (CRPD) is introduced. By definition, CRPD is the additional time to refill the cache with the memory blocks evicted by the preemption. In [15], the authors showed that the cost of context switching rises from the range of 4.2 µs to 8.7 µs to the range of 38.6 µs to 203.2 µs when the data sets of the programs are larger than the cache. Thus, taking CRPD into account may be important when performing scheduling analysis of a RTES.

Problem statement: Scheduling analysis of RTES with cache memory in a fixed priority preemptive scheduling context is complex because there are many parameters affecting the outcome, and dependencies exist amongst them. For example, CRPD analysis, which is the process of evaluating the impact of CRPD on a RTES, cannot be based only on single-task analysis as it depends on correlations amongst tasks. For a set of tasks, CRPD analysis has to take into account parameters such as cache utilizations of tasks, memory layout of tasks and processor utilization, but also priority assignment algorithms and task WCETs.

Addressing all those parameters involves many tools from two domains: WCET/cache analysis and scheduling analysis. Each of those domains is dedicated to only a sub-part of those parameters. Unfortunately, there are no tools addressing the whole problem in state-of-the-art work.

Contributions: We propose a tool to investigate and evaluate scheduling analysis of RTES with cache memory and various scheduling parameters. The work is implemented in Cheddar, an open-source scheduling analyzer, which is freely available to researchers and practitioners. We propose an approach to reuse the analysis models and results of WCET/cache analysis tools in a scheduling analysis tool. The programming model we use is compliant with the existing one in [14], [5], [2] and [17]. Experiments are performed to illustrate the performance and applicability of our tool.

The rest of the article is organized as follows. Section 2 presents the background of our work. In Section 3, we give an overview of our approach. In Section 4, we present the development process and detailed information about the implementation of our work. In Section 5, an evaluation of our proposed scheduling simulator is given. Section 6 discusses related works and Section 7 concludes the article.
2. BACKGROUND
In this section, we introduce the system model and we explain how preemption cost and CRPD are computed. We assume a uniprocessor RTES with a direct-mapped instruction cache. As far as we know, instruction caches are common in practical implementations of RTES. The direct-mapped assumption is used to simplify the data flow analysis; it could easily be relaxed.

There are n independent tasks τ1, τ2, ..., τn scheduled by a preemptive scheduler. CRPD is bounded by g · BRT, where g is an upper bound on the number of cache blocks reloaded due to preemption, and BRT is an upper bound on the time necessary to reload a memory block in the cache (block reload time). To analyze the effect of preemption on a preempted task, Lee et al. [14] introduced the concept of useful cache block (UCB):

Definition 1. A memory block m is called a useful cache block (UCB) at program point P, if m may be cached at P and m may be reused at a program point P' after P that may be reached from P without eviction of m on this path.

The number of UCBs at program point P gives an upper bound on the number of additional reloads due to a preemption at P. The maximum possible preemption cost for a task is determined by the program point with the highest number of UCBs. In [22], the authors exploit the fact that for the i-th preemption, only the i-th highest number of UCBs has to be considered. However, as shown in [1] and [3], a significant reduction typically only occurs at a high number of preemptions. Thus, we only consider the program point with the highest number of UCBs.

The impact of the preempting task is given by the number of cache blocks that the task may evict during its execution. Busquets-Mataix et al. [5] introduced the concept of evicting cache block (ECB):

Definition 2. A memory block of the preempting task is called an evicting cache block (ECB) if it is accessed during the execution of the preempting task.

The notations UCBi and ECBi are used to denote the sets of UCBs and ECBs of a task τi. Assuming that the sets of UCBs and ECBs of each task are computed beforehand, UCB'i is the set of UCBs of the preempted task τi currently in the cache. γi,j is the preemption cost (i.e. the CRPD) when a task τj directly preempts task τi. In case of a preemption between two tasks τi and τj, γi,j is computed by:

γi,j = BRT · |UCB'i ∩ ECBj|    (1)

However, in case of nested preemptions, a task τj can preempt more than one task. Thus, the computation of CRPD must take all preempted tasks into account.
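To make Equation (1) and the nested preemption case concrete, the following is a minimal sketch in Python. It is purely illustrative and is not Cheddar's implementation; cache blocks are modeled as integers and all names (BRT, crpd, crpd_nested) are chosen for this example only.

BRT = 8   # block reload time, e.g. 8 microseconds as in Section 5

def crpd(ucb_in_cache, ecb_preempting):
    # gamma_ij = BRT * |UCB'_i inter ECB_j|  (Equation 1)
    return BRT * len(ucb_in_cache & ecb_preempting)

def crpd_nested(ucbs_of_preempted_tasks, ecb_preempting):
    # Nested preemptions: the preempting task may evict blocks of every
    # task it preempts, so each preempted task is charged separately.
    return sum(crpd(ucb, ecb_preempting) for ucb in ucbs_of_preempted_tasks)

# Example: UCB'_i = {1,2,3,4}, ECB_j = {3,4,5} -> 2 blocks to reload -> 16 us
print(crpd({1, 2, 3, 4}, {3, 4, 5}))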
3. OUR APPROACH
Our approach consists of three steps, as shown in Fig. 1. First, we model a RTES with the components required to apply the analysis methods stated in Section 2. Those components include hardware and software parts. Hardware components are the processor, cores and cache memory. Software components are tasks and the control flow graphs of tasks.

Second, from this model, we apply the data flow analysis presented in [14] in order to compute the set of UCBs and ECBs of each task, which is called the cache access profile in the sequel. A detailed description of these first and second steps can be found in [24], which is a preliminary work toward cache integration.

Third, the system model with the computed cache access profiles is loaded into the scheduling simulator. Scheduling simulation is performed and provides various outcomes such as feasibility of the system, worst case response time of tasks, number of preemptions, CRPD per task and total CRPD.

Figure 1: Approach. (1) The RTES is modeled as a system model in Cheddar-ADL; (2) data flow analysis computes the cache access profile (UCB, ECB) from this model; (3) both are input to the scheduling simulation.

In our work, the cache utilization of a program at step 1 is modeled both at a low level, which is the control flow graph of the program produced by a WCET analysis tool, and at a high level, which is a pre-computed set of UCBs and ECBs. We expect the user to re-use results produced by a WCET analysis tool within Cheddar. We can extract all information related to CRPD from the scheduling simulator, as written above, to fully investigate the impact of CRPD. The detailed implementation of each step is presented in the next section.
4. IMPLEMENTATION
In this section, we present the implementation of our approach. We introduce our framework, discuss the development process and point out several implementation issues. The source code of the presented work is available under the GNU GPL licence at http://beru.univ-brest.fr/svn/CHEDDAR/trunk/src/.

The work took place in the context of the Cheddar project [21]. Cheddar is an open source real-time scheduling analysis tool. Classical feasibility tests, scheduling algorithms and a scheduling simulator for real-time systems are implemented in Cheddar. System architecture is defined with the Cheddar Architecture Description Language (Cheddar-ADL). Cheddar class files are automatically generated by the tool Platypus [20] through a model-driven process. The Cheddar meta-model defines hardware components such as processor, core and shared resource, and software components such as task and task group [11].

The development process consisted of three steps. First, Cheddar-ADL was extended to model RTES with cache memory and cache access profiles. Second, we implemented UCB computation by data flow analysis. Third, a cache-aware scheduling simulator was implemented by extending the scheduling simulator of Cheddar. In [24], we presented how Cheddar-ADL is extended to take into account cache memory and how the data flow analysis of [14] is implemented. In this article, we provide a summary of our models and focus more on the implementation of the cache-aware scheduling simulator and its application.

4.1 Cache Model and Cache Access Profile
To support cache-aware scheduling analysis, the Cheddar meta-model has been extended with the entities below:

ENTITY Generic_Cache
  cache_size : Natural;
  line_size : Natural;
  associativity : Natural;
  miss_time : Natural;
  cache_category : Cache_Type;
ENTITY Generic_Task
  task_type : Natural;
  capacity : Natural;
  deadline : Natural;
  priority : Priority_Range;
  cache_access_profile_name : String;
  cfg_name : String;
ENTITY Cache_Access_Profile
  UCBs : Cache_Blocks_Table;
  ECBs : Cache_Blocks_Table;
ENTITY CFG
  nodes : CFG_Nodes_Table;
  edges : CFG_Edges_Table;
ENTITY Basic_Block SUBTYPE OF (CFG_Node);
  instruction_offset : INTEGER;
  instruction_capacity : INTEGER;

Cache memory is modeled by classical attributes such as cache size, line size, associativity and the block reloading time or miss time. From this model of the cache and the control flow graph of a task, we can compute the memory layout, the memory-to-cache mapping and the set of UCBs and ECBs of a task.

A task is modeled by an entity with classical attributes such as capacity, deadline, priority and offset. The task_type attribute specifies the type of a task, such as periodic, aperiodic, sporadic or Poisson [11].

Two attributes, UCBs and ECBs, are added to handle caches. They are the set of UCBs and ECBs of a task, respectively. The model is designed according to two assumptions. First, a task is assumed to have a fixed set of UCBs and ECBs; the UCB set is the one of the program point with the highest number of UCBs. Second, any partial execution of a task uses all its ECBs and UCBs; thus, the CRPD obtained from scheduling simulation is always an upper bound.

In order to compute the set of UCBs and ECBs of a task, we need its control flow graph (which is modeled by the CFG EXPRESS entity). Each node in the CFG stores information about the capacity and the location in main memory of its assembly instructions. At the moment, our analysis method takes into account direct-mapped instruction caches because the instruction access pattern is simpler to compute from the CFG of a program.
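For readers less familiar with EXPRESS, the shape of the extended meta-model can be illustrated by the following Python mirror. This is only an illustration under our own naming; the actual Cheddar classes are Ada code generated from the EXPRESS entities by Platypus, not this sketch.

# Illustrative Python mirror of the extended entities (not generated Cheddar code).
from dataclasses import dataclass, field

@dataclass
class GenericCache:
    cache_size: int
    line_size: int
    associativity: int       # 1 for the direct-mapped caches assumed here
    miss_time: int           # block reload time (BRT)
    cache_category: str      # e.g. "instruction"

@dataclass
class CacheAccessProfile:
    ucbs: set = field(default_factory=set)   # useful cache blocks
    ecbs: set = field(default_factory=set)   # evicting cache blocks

@dataclass
class GenericTask:
    task_type: str           # periodic, aperiodic, sporadic, poisson
    capacity: int
    deadline: int
    priority: int
    cache_access_profile_name: str = ""
    cfg_name: str = ""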
4.2 Computing Cache Access Profile
UCB analysis is implemented following the work in [14]. The input of the algorithm is an annotated CFG of a program. From the input CFG, our tool analyzes and computes the set of UCBs and ECBs.

4.3 The Cache-Aware Scheduling Simulator
The scheduling simulation in Cheddar works as follows. First, a system architecture model, including hardware and software components, is loaded. Then, the schedule is computed by three successive steps: computing priorities, inserting ready tasks into queues and electing a task. The elected task receives the processor for the next unit of time.

The scheduling simulator records the different events raised during the simulation, such as task releases, task completions and shared resource lockings or unlockings. The result of the scheduling analysis is the set of events produced at simulation time.

We extended the scheduling simulator of Cheddar as follows. First, we extended the set of events Cheddar can produce. For example, an event PREEMPTION, which is raised when a preemption occurs, is added. Second, the event RUNNING_TASK, which is raised when a task executes, is extended with the assumption about CRPD that any partial execution of a task uses all its ECBs and UCBs.

The pseudo code of the event handler is written below. The notation τi.cUCB represents the set of UCBs of task τi in the cache. It is computed from the system model at scheduling simulation time. The function Remove() at line 8 removes elements from a set.

1  event SCHED_START
2    for each task τi loop
3      τi.cUCB ← τi.UCB
4    end loop
5  event PREEMPTION
6    τj ← preempting task
7    for each task τi preempted loop
8      τi.cUCB ← Remove(τi.cUCB, τj.ECB)
9    end loop
10 event RUNNING_TASK
11   τi ← executing task
12   CRPD ← |τi.UCB − τi.cUCB| ∗ Miss_Time
13   τi.cUCB ← τi.UCB

At the start of the scheduling simulation, a SCHED_START event is raised. The WCET of a task is assumed to include the block reloading time when the task is executed non-preemptively. So, on event SCHED_START, the set of UCBs of a task is assumed to be fully loaded.

When a preemption occurs, a PREEMPTION event is raised and the simulator computes the evicted UCBs of the preempted tasks using the ECBs of the preempting task. The scheduler keeps track of the number of UCBs for each task.

When a task executes, a RUNNING_TASK event is raised. The scheduler first checks whether all the UCBs of this task are loaded in the cache. If so, the task continues its execution. If not, the task reloads the evicted UCBs. The CRPD is added to the remaining capacity of the task itself. In our implementation, CRPD is not added to the capacity of preempted tasks at the preemption point but at the instant at which those tasks resume execution.
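The event handling above can be transcribed almost line-for-line into executable code. The following Python sketch only illustrates the bookkeeping, with a simplified task structure and made-up names; it is not the Ada code of the Cheddar simulator.

# Executable transcription of the event-handler pseudo code, for illustration.
from dataclasses import dataclass, field

MISS_TIME = 8  # block reload time in simulation units (assumption)

@dataclass
class Task:
    name: str
    ucb: set             # fixed UCB set (program point with the most UCBs)
    ecb: set             # fixed ECB set
    cucb: set = field(default_factory=set)   # UCBs currently in the cache

def on_sched_start(tasks):
    # WCETs already include cold-cache reloads, so UCBs start loaded.
    for t in tasks:
        t.cucb = set(t.ucb)

def on_preemption(preempted_tasks, preempting):
    # ECBs of the preempting task evict UCBs of every preempted task.
    for t in preempted_tasks:
        t.cucb -= preempting.ecb           # Remove(cUCB, ECB)

def on_running_task(task):
    # CRPD is charged when the task resumes, onto its remaining capacity.
    crpd = len(task.ucb - task.cucb) * MISS_TIME
    task.cucb = set(task.ucb)
    return crpd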
4.4 Implementation Issues
Several issues were raised when designing and implementing the simulator. Most of them arose because we need to mix timing specifications of different orders of magnitude. Others are related to tool interoperability.

In practice, the cache block reload time is significantly smaller than the period or capacity of a task. In Cheddar, we do not prescribe that 1 unit of time is equivalent to 1 ms or 1 µs, which are the typical units of task periods and block reload times. Mixing timing specifications of different orders of magnitude makes the computation of the feasibility interval complex. We recall that a feasibility interval is an interval over which testing of task feasibility is needed [10]. The scheduling simulation interval needed to verify the schedulability of a task set can be significantly large if 1 µs is chosen as the time unit. A solution in practice is to design a system with a harmonic task set in order to reduce the feasibility interval; however, this is clearly not always possible. In addition, instead of using 1 µs, we use the cache block reload time as the base value for 1 unit of time, as in our experiments. Furthermore, a long scheduling simulation interval also raises issues regarding performance and scalability. Even with a harmonic task set, the tool must be able to perform scheduling simulation over a large interval to bridge the difference between cache block reload time and task periods, which may be CPU and memory expensive. As Cheddar stores scheduling simulation results in XML files, it can also be I/O intensive. To reduce memory and I/O overhead, we selected a subset of events the simulator has to handle and store.
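The effect of the time-unit choice can be illustrated with the values used in Section 5 (periods of a few hundred ms, a block reload time of 8 µs). This small sketch only illustrates the conversion, under the assumption that the block reload time is taken as the base unit.

# Illustration of the time-unit choice: using the block reload time (8 us in
# our experiments) as 1 simulation unit instead of 1 us.
BRT_US = 8

def ms_to_units(duration_ms, unit_us=BRT_US):
    return (duration_ms * 1000) // unit_us

print(ms_to_units(1000))      # 1000 ms -> 125000 units with BRT as the unit
print(ms_to_units(1000, 1))   # 1000 ms -> 1000000 units with 1 us as the unit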
A second issue we faced is tool interoperability. The input data of the CRPD analysis in our tool is designed to be compatible with the data provided by a WCET analysis tool. We also support importing data in XML format. At the moment, we do not enforce tool interoperability and we expect to investigate WCET tools in order to overcome this issue.

5. EXPERIMENT AND DISCUSSION
In this section, we show in a first experiment that our tool can handle parameters compliant with the existing works in [14], [5], [2]. In addition, we discuss the dependency between CRPD and scheduling parameters. Furthermore, in a second experiment, we point out that our tool can run CRPD optimization techniques, taking as an example memory layout optimization by simulated annealing following the work of [17]. We also provide performance and scalability tests of the tool in a third experiment.

Task period and cache utilization generation in our experiments is based on the existing work in [1]. Task sets are generated with the following configuration. Task periods are uniformly generated from 5 ms to 500 ms, as found in most automotive and aerospace hard real-time applications [1]. Generated task sets are harmonic in order to have a low feasibility interval and scheduling simulation period. Task deadlines are implicit, i.e. ∀i : Di = Ti. Processor utilization values (PU) are generated using the UUniFast algorithm [4]. Task execution times are set based on the processor utilizations and the generated periods: ∀i : Ci = Ui · Ti, where Ui is the processor utilization of task i. Task offsets are uniformly distributed from 1 to 30 ms.

Cache memory and cache access profiles of tasks are generated as follows. The cache is direct-mapped. The number of cache blocks is equal to 256. The block reload time is 8 µs. The cache usage of each task is determined by its number of ECBs. They are generated using the UUniFast algorithm for a total cache utilization (CU) of 5. UUniFast may produce values larger than 1, which means a task fills the whole cache. The ECBs of each task are arranged consecutively starting from a given cache set. For each task, the UCBs are generated according to a uniform distribution ranging from 0 to the number of ECBs multiplied by a reuse factor (RF). If the generated set of ECBs exceeds the number of cache sets, the set of ECBs is limited to the number of cache sets. For the generation of the UCBs, the original set of ECBs is used.
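The generation procedure described above can be sketched as follows. This is a hedged reconstruction for illustration only, assuming the standard UUniFast formulation [4] and a fixed harmonic period set covering the 5 ms to 500 ms range; it is not the exact generator used for the experiments.

# Sketch of the task-set generation described above (illustrative only).
import random

N_BLOCKS = 256                                       # direct-mapped cache, 256 blocks
HARMONIC_PERIODS = [5, 10, 20, 40, 80, 160, 320]     # assumption: harmonic periods in ms

def uunifast(n, total_u):
    # Standard UUniFast [4]: n utilisation values summing to total_u.
    utils, remaining = [], total_u
    for i in range(1, n):
        nxt = remaining * random.random() ** (1.0 / (n - i))
        utils.append(remaining - nxt)
        remaining = nxt
    utils.append(remaining)
    return utils

def generate_taskset(n, pu, cu, rf):
    tasks, next_block = [], 0
    for u_cpu, u_cache in zip(uunifast(n, pu), uunifast(n, cu)):
        period = random.choice(HARMONIC_PERIODS)
        n_ecb = min(round(u_cache * N_BLOCKS), N_BLOCKS)             # clip to the cache size
        ecb = {(next_block + k) % N_BLOCKS for k in range(n_ecb)}    # consecutive blocks
        next_block += n_ecb
        n_ucb = random.randint(0, int(rf * n_ecb)) if n_ecb else 0
        ucb = set(random.sample(sorted(ecb), n_ucb))                 # UCBs drawn from the ECBs
        tasks.append({"T": period, "C": max(1, round(u_cpu * period)), "D": period,
                      "offset": random.randint(1, 30), "ECB": ecb, "UCB": ucb})
    return tasks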
5.1 CRPD with Priority Assignment and Processor Utilization (PU)
In this experiment, we present CRPD analysis with different priority assignments or scheduling algorithms. In addition, we discuss the impact on CRPD of changing the priority assignment/scheduling algorithm and of increasing PU. PU is varied from 0.5 to 0.95 in steps of 0.05. RF is fixed at 0.3. For each value of PU, we performed scheduling simulations with 100 task sets and computed the average number of preemptions and the average total CRPD over a scheduling interval of 1000 ms. Experiments are conducted with two priority assignment algorithms: Rate Monotonic (RM) and one named PA*, which assigns the highest priority level to the task with the largest set of UCBs. In addition, we also take into account the Earliest Deadline First (EDF) scheduler.

Figure 2: Varying PU, RF = 0.3 (average number of preemptions and total CRPD in ms for RM, EDF and PA*, processor utilization from 50% to 95%).

The result of this experiment is shown in Fig. 2. As the graph illustrates, the number of preemptions and the preemption cost increase steadily from a processor utilization of 50% to 80%. After this point, there is a downward trend in the preemption cost and in the number of preemptions for EDF, while there is an upward trend in those data for RM and PA*. As observed from the scheduler, when PU is larger than 80%, many task sets are not schedulable.

In conclusion, first, when PU increases, the total number of preemptions and the CRPD also increase; however, the change is not linear. Second, a priority assignment algorithm with a lower number of preemptions tends to give a lower total CRPD. EDF and PA* generate fewer preemptions and less CRPD than RM. In fact, to enforce the fixed priority order, the number of preemptions that typically occur under RM is higher than under EDF [6]. From this experiment, we see that CRPD depends on the chosen priority assignment or scheduler.

In addition, this experiment shows that scheduling analysis and CRPD analysis should be performed jointly. PA*, a priority assignment taking CRPD into account, has a significantly lower total CRPD. The decrease in total CRPD of PA* compared with RM and EDF is roughly 30 ms over a scheduling interval of 1000 ms. However, compared to RM and EDF, feasibility constraints of tasks are not guaranteed with PA*; only the total CRPD is reduced. In [25], we proposed a priority assignment heuristic taking into account both feasibility constraints and CRPD.

5.2 CRPD with Memory Layout Optimization by Simulated Annealing
The objective of this experiment is to show that users can apply CRPD optimization approaches with our tool. We apply memory layout optimization by simulated annealing (SA) based on the work of [17] to our generated task sets. In our experiment, the objective of SA is to lower the total CRPD obtained by a scheduling simulation over a scheduling interval of 1000 ms.

For each iteration of SA, we perform a swap in memory layout between two random tasks. Changes are made to the layout of tasks in memory and then mapped to their cache layout for evaluation. The total CRPD is computed by scheduling simulation. The optimum layout is the layout with the lowest total CRPD. The initial temperature of SA is 1.0 and, after every iteration, it is reduced by multiplying it by a cooling rate of 0.5 until it reaches the target temperature of 0.2. The number of iterations for each temperature is 10.

The result of this experiment is shown in Fig. 3. From the graph, we can see the impact of memory layout optimization on the total CRPD. We can reduce the total CRPD by roughly 20-30 ms. To sum up, this experiment shows that our tool allows users to perform a specific optimization of CRPD.

Figure 3: Total CRPD with and without memory layout optimization (total CRPD in ms for RM and RM-SA, processor utilization from 50% to 95%).
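The SA loop described above can be sketched as follows. The cost function total_crpd_by_simulation() stands for a full scheduling simulation over 1000 ms and is assumed here, as is the neighbour move; the parameters follow the text (initial temperature 1.0, cooling rate 0.5, target 0.2, 10 iterations per temperature). This is a sketch of the technique, not the implementation used in the experiment.

# Sketch of the simulated-annealing loop described above (illustrative only).
import math, random

def optimize_layout(layout, total_crpd_by_simulation):
    current, current_cost = list(layout), total_crpd_by_simulation(layout)
    best, best_cost = list(current), current_cost
    temperature = 1.0
    while temperature > 0.2:
        for _ in range(10):                          # 10 iterations per temperature
            candidate = list(current)
            i, j = random.sample(range(len(candidate)), 2)
            candidate[i], candidate[j] = candidate[j], candidate[i]   # swap two tasks in memory
            cost = total_crpd_by_simulation(candidate)
            # Accept improvements; accept worse layouts with a probability that
            # decreases with the temperature (costs may need scaling in practice).
            if cost < current_cost or random.random() < math.exp((current_cost - cost) / temperature):
                current, current_cost = candidate, cost
                if cost < best_cost:
                    best, best_cost = list(candidate), cost
        temperature *= 0.5                           # cooling rate of 0.5
    return best, best_cost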
5.3 Performance/Scalability Analysis
The objective of this experiment is to test the performance and scalability of our tool when the scheduling simulation interval increases. In general, there are four factors affecting the performance of a scheduling simulator: (1) the number of tasks, (2) the scheduling simulation interval, (3) the cache size and (4) the number of events. The first three factors depend on the chosen RTES. The number of events depends on characteristics of the RTES; for example, a higher processor utilization means a higher number of preemption events. In this experiment, we choose to test a system model of 10 tasks and 256 cache blocks. Processor utilization is set to 70%. The scheduling simulation interval ranges from 100,000 to 1,000,000 units of time, where 1 unit = 8 µs.

Figure 4: Computation time of the simulator (average and maximum computation time in seconds as a function of the simulation interval, in steps of 10,000 units).

Fig. 4 displays the results of our experiment on a PC with an Intel Core i5-3360 CPU running Ubuntu 12.04. For each simulation interval, 100 task sets are generated. We perform scheduling simulation and compute the maximum and average computation times.

As we can see, while the maximum computation time increases slightly when the simulation interval increases, the average computation time only fluctuates around 6 seconds. This shows that the tool remains scalable when the simulation interval is high.

6. RELATED WORKS
In this section, we present several real-time scheduling analysis and WCET analysis tools.

MAST [12] is a modeling and analysis suite for real-time applications. The hardware component abstraction of the MAST model is generic and it includes processing resources and shared resources. The shared resource component is not supposed to model a cache memory unit. However, MAST considers overhead parameters of the components that may be used to model CRPD.

STORM [26] and YARTISS [7] are scheduling simulation tools mainly designed for evaluating and comparing real-time scheduling algorithms for multiprocessor architectures.

SymTA/S [13] and RealTime-at-Work (http://www.realtimeatwork.com/) are model-based scheduling analysis tools targeting the automotive industry. The hardware components supported in those tools are specific to their domains (ECU, CAN and AFDX networks). To the best of our knowledge, support for cache memory does not exist in the tools above.

SimSo [8] is a scheduling simulation tool. It supports cache sharing on multiprocessor systems. It takes into account the impact of the caches through statistical models and also direct overheads such as context switches and scheduling decisions. The memory behavior of a program is modeled with Stack Distance Profiles (SDP): the distribution of the stack distances for all the memory accesses of a task, where a stack distance is by definition the number of unique cache lines accessed between two consecutive accesses to the same line [18] (a small sketch of this notion is given at the end of this section). The difference between SDP and our model is that SDPs are obtained by on-line monitoring tools such as Valgrind [19], while UCBs and ECBs are obtained by off-line WCET analysis tools, as discussed below. At the moment, no comparison between the two has been published. UCB analysis with scheduling simulation could be more pessimistic but safer because it takes into account the program point with the largest number of UCBs.

Several WCET analysis tools allow designers to perform cache analysis. SymTA/P [23], HEPTANE [9], Chronos [16] and aiT (AbsInt Inc., http://www.absint.com/) are examples of them. UCB computation of a program is supported by aiT. The analyses of those tools are based on program path analysis of the control flow graph of the program, which is compliant with the requirements of our proposed tool.

In conclusion, WCET tools focus on the evaluation of a program's control flow graph to compute the WCET, and a few of them can compute cache access profiles. Their analysis results could be used as input for a scheduling analysis tool. In the domain of real-time scheduling analysis, the support for cache and CRPD when performing scheduling analysis is not very well developed. As far as we know, only SimSo clearly supports scheduling simulation with cache analysis, based on SDP. We therefore propose a tool, available to the community, which can either compute the cache access profile of a task from its control flow graph or re-use information obtained from a WCET/cache analysis tool to perform scheduling analysis. Our model is compliant with existing work in [14], [5], [2] and [17]. In addition, because Cheddar provides a large set of scheduling analysis methods, we can fully investigate the dependency between CRPD and other scheduling parameters in order to either adjust or optimize a RTES design to meet its timing constraints.
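For comparison with the UCB/ECB model, the stack distance notion behind SDP [18] can be illustrated by the following sketch. It only illustrates the definition quoted above on a memory access trace; it is not SimSo's implementation.

# Illustration of the stack-distance definition used by SDP [18]: the number
# of unique cache lines accessed between two consecutive accesses to the same
# line. First accesses (infinite stack distance) are skipped.
def stack_distances(trace):
    last_seen, distances = {}, []
    for i, line in enumerate(trace):
        if line in last_seen:
            distances.append(len(set(trace[last_seen[line] + 1 : i])))
        last_seen[line] = i
    return distances

print(stack_distances(["a", "b", "c", "a", "b"]))   # [2, 2]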
7. CONCLUSIONS
In this article, we presented an approach to implement a cache-aware scheduling simulator. The work was carried out in the context of the Cheddar real-time scheduling analyzer, which is open-source and freely available to researchers and practitioners who want to investigate scheduling analysis of RTES with cache memory. Our solution consists of three parts: modeling cache memory and cache access profiles, implementing several cache analysis methods and performing scheduling simulation. We extended Cheddar to be able to deal with cache memory and illustrated the dependency between the cache and other scheduling parameters.

There are open problems we aim to address in the future. The first one concerns the integration of other, refined CRPD analysis methods: several improvements have been proposed in [2] to reduce the upper bound of the CRPD. In addition, we plan to compare our approach with the approach based on Stack Distance Profiles in [8]. Second, we are going to study the problem of the feasibility interval of RTES with cache memory.

8. REFERENCES
[1] S. Altmeyer, R. I. Davis, and C. Maiza. Improved cache related pre-emption delay aware response time analysis for fixed priority pre-emptive systems. Real-Time Systems, 48(5):499–526, 2012.
[2] S. Altmeyer and C. Maiza Burguière. Cache-related preemption delay via useful cache blocks: Survey and redefinition. Journal of Systems Architecture, 57(7):707–719, 2011.
[3] M. Bertogna, O. Xhani, M. Marinoni, F. Esposito, and G. Buttazzo. Optimal selection of preemption points to minimize preemption overhead. In Proceedings of the 23rd Euromicro Conference on Real-Time Systems (ECRTS), pages 217–227. IEEE, 2011.
[4] E. Bini and G. C. Buttazzo. Measuring the performance of schedulability tests. Real-Time Systems, 30(1-2):129–154, 2005.
[5] J. V. Busquets-Mataix, J. J. Serrano, R. Ors, P. Gil, and A. Wellings. Adding instruction cache effect to schedulability analysis of preemptive real-time systems. In Proceedings of the 2nd IEEE Real-Time Technology and Applications Symposium (RTAS), pages 204–212, 1996.
[6] G. C. Buttazzo. Rate monotonic vs. EDF: judgment day. Real-Time Systems, 29(1):5–26, 2005.
[7] Y. Chandarli, F. Fauberteau, D. Masson, S. Midonnet, M. Qamhieh, et al. YARTISS: A tool to visualize, test, compare and evaluate real-time scheduling algorithms. In Proceedings of the 3rd International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems, pages 21–26, 2012.
[8] M. Chéramy, A.-M. Déplanche, P.-E. Hladik, et al. Simulation of real-time multiprocessor scheduling with overheads. In International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH), 2013.
[9] A. Colin and I. Puaut. Worst-case timing analysis of the RTEMS real-time operating system. Technical Report PI1277, IRISA, France, 1999.
[10] L. Cucu and J. Goossens. Feasibility intervals for fixed-priority real-time scheduling on uniform multiprocessors. In IEEE Conference on Emerging Technologies and Factory Automation (ETFA), pages 397–404, 2006.
[11] C. Fotsing, F. Singhoff, A. Plantec, V. Gaudel, S. Rubini, S. Li, H. N. Tran, L. Lemarchand, P. Dissaux, and J. Legrand. Cheddar architecture description language. 2014.
[12] M. González Harbour, J. Gutiérrez García, J. Palencia Gutiérrez, and J. Drake Moyano. MAST: Modeling and analysis suite for real time applications. In Proceedings of the 13th Euromicro Conference on Real-Time Systems, pages 125–134. IEEE, 2001.
[13] R. Henia, A. Hamann, M. Jersak, R. Racu, K. Richter, and R. Ernst. System level performance analysis: the SymTA/S approach. IEE Proceedings - Computers and Digital Techniques, 152(2):148–166, 2005.
[14] C.-G. Lee, H. Hahn, Y.-M. Seo, S. L. Min, R. Ha, S. Hong, C. Y. Park, M. Lee, and C. S. Kim. Analysis of cache-related preemption delay in fixed-priority preemptive scheduling. IEEE Transactions on Computers, 47(6):700–713, 1998.
[15] C. Li, C. Ding, and K. Shen. Quantifying the cost of context switch. In Proceedings of the 2007 Workshop on Experimental Computer Science. ACM, 2007.
[16] X. Li, Y. Liang, T. Mitra, and A. Roychoudhury. Chronos: A timing analyzer for embedded software. Science of Computer Programming, 69(1):56–67, 2007.
[17] W. Lunniss, S. Altmeyer, and R. I. Davis. Optimising task layout to increase schedulability via reduced cache related pre-emption delays. In Proceedings of the 20th International Conference on Real-Time and Network Systems, pages 161–170. ACM, 2012.
[18] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78–117, 1970.
[19] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In ACM SIGPLAN Notices, volume 42, pages 89–100, 2007.
[20] A. Plantec and F. Singhoff. Refactoring of an Ada 95 library with a meta CASE tool. In ACM SIGAda Ada Letters, volume 26, pages 61–70. ACM, 2006.
[21] F. Singhoff, J. Legrand, L. Nana, and L. Marcé. Cheddar: a flexible real time scheduling framework. ACM SIGAda Ada Letters, 24(4):1–8, 2004.
[22] J. Staschulat, S. Schliecker, and R. Ernst. Scheduling analysis of real-time systems with precise modeling of cache related preemption delay. In Proceedings of the 17th Euromicro Conference on Real-Time Systems (ECRTS), pages 41–48, 2005.
[23] J. Staschulat, S. Schliecker, and R. Ernst. Scheduling analysis of real-time systems with precise modeling of cache related preemption delay. In Euromicro Conference on Real-Time Systems (ECRTS), Palma de Mallorca, Spain, 2005.
[24] H. N. Tran, F. Singhoff, S. Rubini, and J. Boukhobza. Instruction cache in hard real-time systems: modeling and integration in scheduling analysis tools with AADL. In Proceedings of the 12th IEEE International Conference on Embedded and Ubiquitous Computing, Milan, Italy, 2014.
[25] H. N. Tran, F. Singhoff, S. Rubini, and J. Boukhobza. Addressing cache related preemption delay in fixed priority assignment. In Proceedings of the 20th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Luxembourg, 2015.
[26] R. Urunuela, A. Deplanche, and Y. Trinquet. STORM, a simulation tool for real-time multiprocessor scheduling evaluation. In IEEE Conference on Emerging Technologies and Factory Automation (ETFA), pages 1–8, 2010.