=Paper=
{{Paper
|id=Vol-1464/ewili15_10
|storemode=property
|title=Cache-Aware Real-Time Scheduling Simulator: Implementation and Return of Experience
|pdfUrl=https://ceur-ws.org/Vol-1464/ewili15_10.pdf
|volume=Vol-1464
|dblpUrl=https://dblp.org/rec/conf/ewili/TranSRB15
}}
==Cache-Aware Real-Time Scheduling Simulator: Implementation and Return of Experience==
<pdf width="1500px">https://ceur-ws.org/Vol-1464/ewili15_10.pdf</pdf>
<pre>
              Cache-Aware Real-Time Scheduling Simulator:
                Implementation and Return of Experience

                       Hai Nam Tran, Frank Singhoff, Stéphane Rubini, Jalil Boukhobza
                        Univ. Bretagne Occidentale, UMR 6285, Lab-STICC, F-29200 Brest, France
                              {hai-nam.tran,singhoff,rubini,boukhobza}@univ-brest.fr


ABSTRACT                                                         by the preemption. In [15], the authors showed that the
Evaluating cache related preemption delay (CRPD) in pre-         cost of context switching raises from the range of 4.2µs to
emptive scheduling context for Real-Time Embedded Sys-           8.7 µs to the range of 38.6µs to 203.2µs when the size of the
tem (RTES) stays an open issue despite of its practical im-      data sets of programs is larger than the cache. Thus, tak-
portance. Indeed, various parameters should be taken into        ing CRPD into account may be important when performing
account such as cache utilization of tasks, memory layout        scheduling analysis of a RTES.
of tasks, processor utilization and priority assignment algo-       Problem statement: Scheduling analysis of RTES with
rithm. In state-of-the-art work, dependencies amongst those      cache memory in fixed priority preemptive scheduling con-
parameters are not investigated with precision because of the    text is complex because there are many parameters affecting
lack of scheduling analysis tool taking them into account.       the outcome and dependencies amongst them. For example,
In this article, we propose a tool to investigate and eval-      CRPD analysis, which is the process of evaluating the im-
uate scheduling analysis of RTES with cache memory and           pact of CRPD on a RTES, cannot be only based on single-
various scheduling parameters. Both modeling guidelines          task analysis as it depends on correlations amongst tasks.
and implementation is detailed. Implementation is made in        For a set of tasks, CRPD analysis has to take into account
Cheddar, an open-source scheduling analyzer, which is freely     parameters such as cache utilizations of tasks, memory lay-
available; Experiments are conducted in order to illustrate      out of tasks, processor utilization but also priority assign-
performance and applicability of our tool. Furthermore, we       ment algorithms and WCET tasks.
discuss about implementation issues, problems raised and            To address all parameters, many tools are involved, in
lessons learned from those experiments.                          two domains: WCET/Cache analysis and scheduling analy-
                                                                 sis. Each of those domains is only dedicated to a sub-part
                                                                 of those parameters. Unfortunately, there are no tools ad-
1.    INTRODUCTION                                               dressing the whole problem in state-of-the-art work.
   Cache memory is a crucial hardware component used to             Contributions : We propose a tool to investigate and eval-
reduce the performance gap between processors and main           uate scheduling analysis of RTES with cache memory and
memories. In the context of real-time embedded system            various scheduling parameters. The work is implemented in
(RTES), the popularization of processors with large size and     Cheddar, an open-source scheduling analyzer, which is freely
multi level caches motivates the proposition of verification     available to researchers and practitioners. We propose an
methods handling this hardware component [14], [5], [2].         approach to use the analysis models and results of WCET/-
   Scheduling simulation is a classical verification method      cache analysis tools into a scheduling analysis tool. The
performed at design step of RTES. It provides means to in-       programming model we used is compliant with the existing
vestigate that all timing constraints of a RTES are satisfied.   one in [14], [5], [2] and [17]. Experiments are performed to
To perform scheduling simulation, several assumptions are        illustrate performance and applicability of our tool.
usually made in order to simplify system modeling. One of           The rest of the article is organized as follows. Section 2
them is that preemption costs are negligible. Preemption         presents background of our work. In Section 3, we give an
cost is the additional time to process interrupts, manipulate    overview of our approach. In Section 4, we present devel-
task queues and actually perform context switches.               opment process and detailed information about the imple-
   Integrating cache memory in RTES generally enhances the       mentation of our work. In Section 5, an evaluation of our
whole performance in term of execution time, but unfortu-        proposed scheduling simulator is given. Section 6 discusses
nately it can lead to an increase in preemption cost and         related works and Section 7 concludes the article.
execution time variability [15]. When a task is preempted,
memory blocks belonging to the task could be removed from        2.   BACKGROUND
the cache. Once the task resumes, previously removed mem-
                                                                    In this section, we introduce the system model and we
ory blocks have to be reloaded into the cache. Thus, a new
                                                                 explain how preemption cost and CRPD are computed. We
preemption cost named Cache Related Preemption Delay
                                                                 assume an uniprocessor RTES with direct-mapped instruc-
(CRPD) is introduced. By definition, CRPD is the addi-
                                                                 tion cache. As far as we know, instruction cache is popular in
tional time to refill the cache with memory blocks evicted
                                                                 practical implementation of RTES. The assumption about
EWiLi’15, October 8th, 2015, Amsterdam, The Netherlands.         direct-mapped is used to simplify the data flow analysis,
Copyright retained by the authors.                               it could be easily relaxed. There are n independent tasks,
                                                     3. Scheduling
                                                                           ECB of each task, which is called cache access profile in the
     RTES      1. Modeling
                                                      Simulation           sequel. A detailed description of these first and second steps
                                                                           could be found in [24], which is a preliminary work toward
                                                                           cache integration.
              System Model                            Cache Access
              (Cheddar-ADL)
                                2. Data Flow                                  Third, system model with computed cache access profiles
                                  Analysis          Profile (UCB, ECB)
                                                                           are loaded into the scheduling simulator. Scheduling simula-
                      : Input            : Output                          tion is done and provides various outcomes such as feasibility
                                                                           of the system, worst case response time of tasks, number of
                                                                           preemption, CRPD per task and total CRPD.
                    Figure 1: Approach
                                                                              In our work, the cache utilization of a program at step 1
τ1 , τ2 , ..., τn scheduled by a preemptive scheduler. CPRD is             is modeled at both low level, which is the control flow graph
bounded by g ·BRT , where g is an upper bound on the num-                  of program produced by a WCET analysis tool, and at high
ber of cache block reloaded due to preemption, and BRT is                  level, which is a pre-computed set of UCB and ECB. We
an upper-bound on the time necessary to reload a memory                    expect the user to re-use results produced by a WCET anal-
block in the cache (block reload time). To analyze the effect              ysis tool within Cheddar. We can extract all information
of preemption on a preempted task, Lee et al.[14] introduced               related to CRPD from the scheduling simulator as written
the concept of useful cache block (UCB ):                                  above to fully investigate the impact of CRPD. Detailed im-
                                                                           plementation of each step is presented in the next section.
  Definition 1. A memory block m is called a useful cache
block (UCB) at program point P , if m may be cached at P
and m may be reused at program point P 0 after P that may                  4.     IMPLEMENTATION
be reached from P without eviction of m on this path.                         In this section, we present the implementation of our ap-
                                                                           proach. We introduce our framework, discuss about the de-
The number of UCB at program point P gives an upper                        velopment process and point out several implementation is-
bound on the number of additional reloads due to a pre-                    sues. The source code of the presented work is available
emption at P . The maximum possible preemption cost for                    under GNU GPL licence at http://beru.univ-brest.fr/
a task is determined by the program point with the highest                 svn/CHEDDAR/trunk/src/.
number of UCB. In [22], the authors exploit the fact that for                 The work took place in the context of the Cheddar project
the i-th preemption, only the i-th highest number of UCB                   [21]. Cheddar is an open source real-time scheduling analy-
has to be considered. However, as shown in [1] and [3], a                  sis tool. Classical feasibility tests, scheduling algorithms and
significant reduction typically only occurs at a high number               scheduling simulator for real-time systems are implemented
of preemptions. Thus, we only consider the program point                   in Cheddar. System architecture is defined with Cheddar
with highest number of UCB.                                                Architecture Description Language (Cheddar-ADL). Ched-
   The impact of preempting task is given by the number                    dar class files are automatically generated by the tool Platy-
of cache blocks that the task may evict during its execu-                  pus [20] through a model-driven process. The Cheddar meta-
tion. Busquet et al.[5] introduce the concept of evicting                  model defines hardware components such as: processor, core
cache block (ECB ):                                                        and shared resource; and software components such as: task
                                                                           and task group [11].
  Definition 2. A memory block of the preempting task is
                                                                              The development process consisted of three steps. First,
called an evicting cache block (ECB), if it is accessed during
                                                                           Cheddar-ADL is extended to model RTES with cache mem-
the execution of the preempting task.
                                                                           ory and cache access profile. Second, we implemented UCB
   The notation UCBi and ECBi are used to present the set                  computation by data flow analysis. Third, a cache-aware
of UCBs and ECBs of a task τi . Assume that the sets of UCB                scheduling simulator is implemented by extending the schedul-
and ECB of each task are preliminary computed, UCB’i is                    ing simulator of Cheddar. In [24], we presented how Cheddar-
the set of UCBs currently in the cache of the preempted                    ADL is extended to take into account cache memory and
task. γi,j is the preemption cost (i.e. the CRPD) when a                   how data flow analysis in [14] is implemented. In this arti-
task τj directly preempt task τi . In case of a preemption                 cle, we provide a summary of our models and focus more on
between two tasks τi and τj , γi,j is computed by:                         the implementation of the cache-aware scheduling simulator
                                                                           and its application.
              γi,j = BRT · | UCB’i ∩ ECBj |                          (1)
                                                                           4.1     Cache Model and Cache Access Profile
However, in case of nested preemptions, a task τj can pre-
                                                                            To support cache aware scheduling analysis, the Cheddar
empt more than one task. Thus, computation of CRPD
                                                                           meta-model has been extended with the entities below:
must take all preempted tasks into account.
                                                                           1     ENTITY Generic Cache
3.    OUR APPROACH                                                         2       cache size : Natural;
                                                                           3       line size : Natural;
  Our approach consists of three steps as shown in Fig. 1.                 4       associativity : Natural;
First, we modeled a RTES with components required to ap-                   5       miss time : Natural;
ply analysis methods stated in Section 2. Those components                 6       cache category : Cache Type;
include hardware and software parts. Hardware components                   7     ENTITY Generic Task
are processor, core and cache memory. Software components                  8       task type : Natural;
                                                                           9       capacity : Natural;
are tasks and control flow graphs of tasks.                                10      deadline : Natural;
  Second, from this model, we applied data flow analysis                   11      priority : Priority Range;
presented in [14] in order to compute the set of UCB and                   12      cache access profile name : String;
13      cfg name : String;                                          τi in the cache. It is computed from a system model at
14    ENTITY Cache Access Profile                                   scheduling simulation time. The function Remove() at line
15      UCBs : Cache Blocks Table;                                  8 is used to remove an element from a set.
16      ECBs : Cache Blocks Table;
17    ENTITY CFG                                                    1     event SCHED START
18      nodes : CFG Nodes Table;                                    2         for each task τi loop
19      edges : CFG Edges Table;                                    3              τi .cUCB ← τi .U CB
20    ENTITY Basic Block SUBTYPE OF (CFG Node);                     4         end loop
21      instruction offset : INTEGER;                               5     event PREEMPTION
22      instruction capacity : INTEGER;                             6         τj ← preempting task
                                                                    7         for each task τi preempted loop
   Cache memory is modeled by classical attributes such as          8              τi .cUCB ← Remove(τi .cUCB, τj .ECB)
cache size, line size, associativity and block reloading time       9         end loop
or miss time. From this model of cache and control flow             10    event RUNNING TASK
                                                                    11        τi ← executing task
graph of a task, we can compute the memory layout, the
                                                                    12        CRPD ← (τi .U CB − τi .cUCB) ∗ Miss Time
memory-to-cache mapping and the set of UCB and ECB of               13        τi .cUCB ← τi .U CB
a task.
   A task is modeled by an entity with classical attributes            At the start of the scheduling simulation, a SCHED START
such as capacity, deadline, priority and of f sets. The task type   event is raised. The WCET of a task is assumed to in-
specifies the type of a task, such as periodic, aperiodic,          clude the block reloading time when a task is executed non-
sporadic or poisson [11].                                           preemptively. So, on event SCHED START, the set of UCBs
   Two attributes, UCBs and ECBs are added to handle                of a task is assumed to be filled.
caches. They are the set of UCBs and ECBs of a task, re-               When a preemption occurs, a PREEMPTION event is
spectively. The model is designed according to two assump-          raised and the simulator computes the evicted UCBs of pre-
tions: first, a task is assumed to have a fixed set of UCB and      empted tasks using the ECBs of preempting task. The
ECB. UCB is computed by the program point with highest              scheduler keeps track of the number of UCBs for each task.
number of UCB ; second, any partial execution of a task                When a task executes, a RUNNING TASK event is raised.
uses all its ECBs and UCBs ; thus, CRPD achieved from               The scheduler firstly checks if all the UCBs of this task are
scheduling simulation is always upper-bounded.                      loaded into the cache. If so, the task continues its execu-
   In order to compute the set of UCB and ECB of a task, we         tion. If not, the task reloads the evicted UCBs. The CRPD
need its control flow graph (which is modeled by the CFG            is added to the remaining capacity of the task itself. In our
EXPRESS entity). Each node in the CFG stores informa-               implementation, CRPD is not added to the capacity of pre-
tion about the capacity and the location in main memory             empted tasks at the preemption point but at the instant at
of its assembly instruction. At the moment, our analysis            which those tasks resume execution.
method takes into account direct mapped instruction cache
because instruction access pattern is simpler to be computed        4.4     Implementation Issues
from the CFG of a program.                                             Several issues were raised when designing and implement-
                                                                    ing the simulator. Most of them were raised because we need
4.2    Computing Cache Access Profile                               to mix timing specifications of different orders of magnitude.
  UCB analysis is implemented following the work in [14].           Others are related to tools interoperability.
The input of the algorithm is an annotated CFG of a pro-               In practice, cache block reload time is significantly smaller
gram. From the input CFG, our tool analyzes and computes            than period or capacity of a task. In Cheddar, we do not
the set of UCB and ECB.                                             prescribe 1 unit of time is equivalent to 1 ms or 1 µs, which
                                                                    are the unit of task period and block reload time. Mixing
4.3    The Cache-Aware Scheduling Simulator                         timing specifications of different orders of magnitude makes
   The scheduling simulation in Cheddar works as follows.           complex the computation of the feasibility interval. We re-
First, a system architecture model, including hardware/-            call that a feasibility interval is an interval for which testing
software components, is loaded. Then, the scheduling is             of task feasibility is needed [10]. The scheduling simula-
computed by three successive steps: computing priority, in-         tion interval needed to verify the schedulability of a task set
serting ready task into queues and electing task. The elected       could be significantly large if a µs is chosen as a time unit.
task will receive the processor for the next unit of time.          A solution in practice is to design a system with harmonic
   The scheduling simulator records different events raised         task set in order to reduce the feasibility interval; however,
during the simulation, such as task releases, task comple-          it is clearly not always possible. In addition, instead of us-
tions and shared resources lockings or unlockings. The re-          ing 1 µs, we use the cache block reload time as a base value
sult of the scheduling analysis is the set of events produced       for 1 unit of time as in our experiment. Furthermore, a
at simulation time.                                                 long scheduling simulation interval also raises issues regard-
   We extended the scheduling simulator of Cheddar as fol-          ing performance and scalability. Even with harmonic task
lows. First, we extended the set of events Cheddar can              set, the tool must be able to perform scheduling simulation
produce. For example, an event PREEMPTION, which is                 in a large interval to overcome the different between cache
raised when a preemption occurs, is added. Second, event            block reload time and task period, which may be CPU and
RUNNING TASK, which is raised when a task executes, is              memory expensive. As Cheddar stores scheduling simula-
extended with the assumption about CRPD that any partial            tion results into XML files, it can also be I/O intensive. To
execution of a task uses all its ECBs and UCBs.                     reduce memory and I/O overhead, we selected a subset of
   The pseudo code of the event handler is written below.           events the simulator has to handle and store.
The notation τi .cUCB represents the set of UCBs of task               A second issue we are facing is about tool interoperability.
The input data of the CRPD analysis in our tool is designed                                 300                                                                         100
                                                                                                                                                                        90
to be compatible with data provided by a WCET analysis                                      250                                                                         80


                                                                     Number of Preemption


                                                                                                                                                                              Total CRPD (ms)
                                                                                                                                                                        70
tool. We also support import data in XML format. At                                         200
                                                                                                                                                                        60
the moment, we do not enforce tool interoperability and we                                  150                                                                         50
                                                                                                                                                                        40
expect to investigate WCET tools in order to overcome this                                  100                                                                         30
issue.                                                                                       50                                                                         20
                                                                                                                                                                        10
                                                                                              0                                                                         0

5.    EXPERIMENT AND DISCUSSION                                                                    50    55    60      65     70       75       80
                                                                                                                        Processor Utilization (%)
                                                                                                                                                       85     90   95


   In this section, we show that our tool can handle param-                                                   RM (#)            EDF (#)              PA*(#)
eters compliant with the existing works in [14], [5], [2] in                                                  RM (CRPD)         EDF (CRPD)           PA*(CRPD)

the first experiment. In addition, we discuss about the de-
pendency between CRPD and scheduling parameters. Fur-                                                   Figure 2: Varying PU, RF=0.3
thermore, we point out that our tool can run CRPD opti-
mization techniques by taking an example of memory layout         to 80%. After this point, there is a downward trend in the
optimization by simulated annealing following the work of         preemption cost and in the number of preemptions of EDF
[17] in the second experiment. We also provide performance        while there is an upward trend in those data of RM and
and scalability tests of the tool in the third experiment.        PA*. Observed from the scheduler, when PU is larger than
   Task period and cache utilization generation of our ex-        80%, many task sets are not schedulable.
periments is based on the existing work in [1]. Task sets            In conclusion, first, when PU increases, the total num-
are generated with the following configuration. Task peri-        ber of preemption and CRPD also increase. However, the
ods are uniformly generated from 5 ms to 500 ms, as found         change is not linear. Second, a priority assignment algo-
in most automotive and aerospace hard real-time applica-          rithm with less number of preemption tends to give lower
tions [1]. Generated task sets are harmonic in order to have      total CRPD. EDF and PA* generate less preemption and
a low feasibility interval and scheduling simulation period.      CRPD than RM. In fact, to enforce the fixed priority order,
Task deadlines are implicit, i.e. ∀i : Di = Ti . Processor uti-   the number of preemptions that typically occur under RM
lization values (PU) are generated using the UUniFast algo-       is higher than under EDF [6]. From this experiment, we see
rithm [4]. Task execution times are set based on the proces-      that CRPD depends on the chosen priority assignment or
sor utilizations and the generated periods: ∀i : Ci = Ui · Ti ,   scheduler.
where Ui is the processor utilization of task i. Task offsets        In addition, this experiment shows that both scheduling
are uniformly distributed from 1 to 30 ms.                        analysis and CRPD analysis should be performed jointly.
   Cache memory and cache access profile of tasks are gener-      PA*, a priority assignment taking CRPD into account has a
ated as follows. The cache is direct mapped. The number of        significant lower total CRPD. The decrease in total CRPD
cache blocks is equal to 256. The block-reload time is 8 µs.      of PA* with RM and EDF is roughly 30ms on a scheduling
The cache usage of each task is determined by the number          interval of 1000ms. However, comparing to RM and DM,
of ECBs. They are generated using UUniFast algorithm for          feasibility constraints of tasks are not satisfied with PA*,
a total cache utilization (CU) of 5. UUniFast may produce         only total CRPD is reduced. In [25], we proposed a priority
values larger than 1 which means a task fills the whole cache.    assignment heuristic to take into account both feasibility
ECBs of each tasks are consecutively arranged from a cache        constraints and CRPD.
set. For each task, the UCBs are generated according to a
uniform distribution ranging from 0 to the number of ECB          5.2                             CRPD with Memory Layout Optimization
multiplied by a reuse factor (RF). If the set of ECBs gen-                                        by Simulated Annealing
erated exceeds the number of cache sets, the set of ECBs is          The objective of this experiment is to show that users can
limited to the number of cache sets. For the generation of        use CRPD optimization approaches with our tool. We apply
the UCBs, the original set of ECBs is used.                       memory layout optimization by simulated annealing (SA)
                                                                  based on the work of [17] with our generated task sets. In our
5.1    CRPD with Priority Assignment and Pro-                     experiment, the objective of SA is to lower the total CRPD
       cessor Utilization (PU)                                    after a scheduling simulation over a scheduling interval of
   In this experiment, we present CRPD analysis with dif-         1000 ms.
ferent priority assignments or scheduling algorithms. In ad-         For each iteration of SA, we perform a swap in memory
dition, we discuss about the impact of changing priority as-      layout between two random tasks. Changes are made to
signment/scheduling algorithm and increasing PU to CRPD.          the layout of tasks in memory, and then mapped to their
   PU is varied from 0.5 to 0.95 in steps of 0.05. RF is fixed    cache layout for evaluation. The total CRPD is computed
at 0.3. For each value of PU, we performed scheduling sim-        by scheduling simulation. The optimum layout is the layout
ulations with 100 task set and computed the average num-          which has the lowest total CRPD. Initial temperature of SA
ber of preemptions and average total CRPD in a scheduling         is 1.0, and after every iteration, it is reduced by multiplying
interval of 1000 ms. Experiments are conducted with two           it by a cooling rate of 0.5 until it reaches the target tem-
priority assignment algorithms: Rate Monotonic (RM) and           perature of 0.2. Number of iteration for each temperature
one named PA*, which assigns the highest priority level to        is 10.
the task with the largest set of UCB. In addition, we also           The result of this experiment is Fig. 3. From the graph,
take into account Earliest Deadline First (EDF) scheduler.        we can see the impact of memory layout optimization to to-
   The result of this experiment is Fig. 2. As the graph          tal CRPD. We can reduce roughly 20-30 ms of total CRPD.
illustrates, the number of preemptions and the preemption         To sum up, this experiment shows that our tool allows users
cost increases steadily from the processor utilization of 50%     to perform a specific optimization of CRPD.
                             70                                                                    tools mainly designed for evaluating and comparing real-
                             60                                                                    time scheduling algorithm for multiprocessor architectures.
        Total CRPD (ms)                                                                               SymTA/S[13] and RealTime-at-Work 1 are model-based
                             50

                             40                                                                    scheduling analysis tools targeting automotive industry. The
                             30                                                                    hardware components supported in those tools are specific
                             20
                                                                                                   to their domains (ECU, CAN and AFDX Networks).
                                                                                                      To the best of our knowledge, the support for cache mem-
                             10
                                                                                                   ory does not exists in the tools above.
                              0
                                   50    55    60    65    70       75      80   85   90      95      SimSo[8] is a scheduling simulation tool. It supports cache
                                                     Processor Utilization (%)                     sharing on multi-processor systems. It takes into account
                                                       RM         RM-SA                            impact of the caches through statistical models and also the
                                                                                                   direct overheads such as context switches and scheduling
Figure 3: Total CRPD with and without memory                                                       decisions. The memory behavior of a program is modeled
layout optimization                                                                                based on Stack Distance Profiles (SDP) - the distribution
                             16                                                                    of the stack distances for all the memory accesses of a task,
                             14                                                                    where a stack distance is by definition the number of unique
                                                                                                   cache lines accessed between two consecutive accesses to a
      Computation Time (s)


                             12
                             10                                                                    same line [18]. The difference between SDP and our model
                             8                                                                     is that SDP is achieved by on-line monitored counters such
                             6                                                                     as valgrind [19] while UCB and ECB are achieved by an off-
                             4                                                                     line WCET analysis tools as below. At the moment, there
                             2                                                                     is no archived comparison between the two. UCB analysis
                             0                                                                     with scheduling simulation could be more pessimistic but
                                  10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
                                                   Simulation Interval (x10000)                    safer because it takes into account the program point with
                                     Average Computation Time          Max Computation Time
                                                                                                   largest number of UCB.
                                                                                                      Several WCET analysis tools allow designers to perform
     Figure 4: Computation time of the simulator                                                   cache analysis. SymTA/P[23], HEPTANE[9], Chronos[16]
                                                                                                   and aiT2 are examples of them. UCB computation of a
5.3            Performance/Scalability Analysis                                                    program is supported by aiT. The analysis of those tools are
   The objective of this experiment is to test the performance                                     based on program path analysis of the control flow graph of
and the scalability of our tool when scheduling simulation                                         the program. It is compliant with the requirement of our
interval increases. In general, there are four factors affecting                                   proposed tool.
the performance of a scheduling simulator: (1) the number                                             In conclusion, WCET tools focus on the evaluation of pro-
of tasks, (2) the scheduling simulation interval, (3) the cache                                    gram’s control flow graph to compute the WCET and also a
size and (4) the number of events. The first three factors                                         few tools can compute cache access profile. The analysis re-
depend on a chosen RTES. The number of events depends on                                           sult could be used as an input for a scheduling analysis tool.
characteristics of the RTES; for example, a higher processor                                       In the domain of real-time scheduling analysis, the support
utilization means a higher number of preemption events. In                                         for cache and CRPD when performing scheduling analysis
this experiment, we choose to test a system model of 10 tasks                                      is not very well specified. As far as we know, only SimSo
and 256 cache blocks. Processor utilization is set to 70 %.                                        clearly supports scheduling simulation with cache analysis
Scheduling simulation is ranging from 100,000 to 1,000,000                                         based on SDP. Then, we proposed a tool available to the
units of time where 1 unit = 8 µs.                                                                 community which can either compute cache access profile
   Fig. 4 displays results of our experiment on a PC with                                          of a task from its control flow graph or re-use informa-
Intel Core i5-3360 CPU, running Ubuntu 12.04. For each                                             tion obtained from a WCET/cache analysis tool to perform
simulation interval, 100 task sets are generated. We perform                                       scheduling analysis. Our model is compliant with existing
scheduling simulation and compute the max and average                                              work in [14], [5], [2] and [17]. In addition, because Cheddar
computation time.                                                                                  provides a large set of scheduling analysis methods, we can
   As we can see, while maximum computation time increases                                         fully investigate the dependency between CRPD and other
slightly when simulation interval increases, average compu-                                        scheduling parameters in order to either adjust or optimize
tation time only fluctuates around 6 seconds. This shows                                           a RTES design to meet its timing constraints.
that the tool is scalable when simulation interval is high.
                                                                                                   7.     CONCLUSIONS
6.    RELATED WORKS                                                                                   In the article, we presented an approach to implement a
  In this section, we present several real-time scheduling                                         cache-aware scheduling simulator. The work was proceeded
analysis and WCET analysis tools.                                                                  in the context of the Cheddar real-time scheduling ana-
  MAST[12] is a modeling and analysis suite for real-time                                          lyzer, which is open-source, freely available to researchers
applications. The hardware component abstraction of MAST                                           and practitioners that want to investigate scheduling anal-
model is generic and it includes processing resources and                                          ysis of RTES with cache memory. Our solution consists of
shared resources. The shared resource component is not                                             three parts: modeling cache memory and cache access pro-
supposed to model a cache memory unit. However, MAST                                               file, implementing several cache analysis methods and per-
considers the overhead parameters of the components that
                                                                                                   1
may be used to model CRPD.                                                                             RealTime-at-Work, http://www.realtimeatwork.com/
                                                                                                   2
  STORM[26] and YARTISS [7] are scheduling simulation                                                  AbsInt Inc., http://www.absint.com/
forming scheduling simulation. We extended Cheddar to be             Modeling and analysis suite for real time applications.
able to deal with cache memory and illustrated the depen-            In Real-Time Systems, 13th Euromicro Conference on,
dency between cache and other scheduling parameters.                 2001., pages 125–134. IEEE, 2001.
  There are open problems we are aiming to address in           [13] R. Henia, A. Hamann, M. Jersak, R. Racu, K. Richter,
the future. The first one concerns the refinement of other           and R. Ernst. System level performance analysis–the
CRPD analysis methods. Several improvements have been                symta/s approach. IEE Proceedings-Computers and
proposed in [2] to reduce the upper-bound of the CRPD.               Digital Techniques, 152(2):148–166, 2005.
In addition, we plan to compare our approach with the ap-       [14] C.-G. Lee, H. Hahn, Y.-M. Seo, S. L. Min, R. Ha,
proach based on Stack Distance Profile in [8]. Second, we are        S. Hong, C. Y. Park, M. Lee, and C. S. Kim. Analysis
going to study the problem of feasibility interval of RTES           of cache-related preemption delay in fixed-priority
with cache memory.                                                   preemptive scheduling. IEEE Transactions on
                                                                     Computers, 47(6):700–713, 1998.
8.   REFERENCES                                                 [15] C. Li, C. Ding, and K. Shen. Quantifying the cost of
 [1] S. Altmeyer, R. I. Davis, and C. Maiza. Improved                context switch. In Proceedings of the 2007 workshop
     cache related pre-emption delay aware response time             on Experimental computer science. ACM, 2007.
     analysis for fixed priority pre-emptive systems.           [16] X. Li, Y. Liang, T. Mitra, and A. Roychoudhury.
     Real-Time Systems, 48(5):499–526, 2012.                         Chronos: A timing analyzer for embedded software.
 [2] S. Altmeyer and C. Maiza Burguière. Cache-related              Science of Computer Programming, 69(1):56–67, 2007.
     preemption delay via useful cache blocks: Survey and       [17] W. Lunniss, S. Altmeyer, and R. I. Davis. Optimising
     redefinition. Journal of Systems Architecture,                  task layout to increase schedulability via reduced
     57(7):707–719, 2011.                                            cache related pre-emption delays. In Proceedings of the
 [3] M. Bertogna, O. Xhani, M. Marinoni, F. Esposito,                20th International Conference on Real-Time and
     and G. Buttazzo. Optimal selection of preemption                Network Systems, pages 161–170. ACM, 2012.
     points to minimize preemption overhead. In Real-Time       [18] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L.
     Systems (ECRTS), 2011 23rd Euromicro Conference                 Traiger. Evaluation techniques for storage hierarchies.
     on, pages 217–227. IEEE, 2011.                                  IBM Systems journal, 9(2):78–117, 1970.
 [4] E. Bini and G. C. Buttazzo. Measuring the                  [19] N. Nethercote and J. Seward. Valgrind: a framework
     performance of schedulability tests. Real-Time                  for heavyweight dynamic binary instrumentation. In
     Systems, 30(1-2):129–154, 2005.                                 ACM Sigplan notices, volume 42, pages 89–100, 2007.
 [5] J. V. Busquets-Mataix, J. J. Serrano, R. Ors, P. Gil,      [20] A. Plantec and F. Singhoff. Refactoring of an ada 95
     and A. Wellings. Adding instruction cache effect to             library with a meta case tool. In ACM SIGAda Ada
     schedulability analysis of preemptive real-time                 Letters, volume 26, pages 61–70. ACM, 2006.
     systems. In Proceedings of the 2nd IEEE Real-Time          [21] F. Singhoff, J. Legrand, L. Nana, and L. Marcé.
     Technology and Applications Symposium (RTAS),                   Cheddar: a flexible real time scheduling framework.
     pages 204–212, 1996.                                            ACM SIGAda Ada Letters, 24(4):1–8, 2004.
 [6] G. C. Buttazzo. Rate monotonic vs. edf: judgment           [22] J. Staschulat, S. Schliecker, and R. Ernst. Scheduling
     day. Real-Time Systems, 29(1):5–26, 2005.                       analysis of real-time systems with precise modeling of
 [7] Y. Chandarli, F. Fauberteau, D. Masson, S. Midonnet,            cache related preemption delay. In Proceedings of the
     M. Qamhieh, et al. Yartiss: A tool to visualize, test,          17th Euromicro Conference on Real-Time Systems
     compare and evaluate real-time scheduling algorithms.           (ECRTS), pages 41–48, 2005.
     In Proceedings of the 3rd International Workshop on        [23] J. Staschulat, S. Schliecker, and R. Ernst. Scheduling
     Analysis Tools and Methodologies for Embedded and               analysis of real-time systems with precise modeling of
     Real-time Systems, pages 21–26, 2012.                           cache related preemption delay. In Euromicro
 [8] M. Chéramy, A.-M. Déplanche, P.-E. Hladik, et al.             Conference on Real-Time Systems (ECRTS), Palma
     Simulation of real-time multiprocessor scheduling with          de Mallorca, Spain, 2005.
     overheads. In International Conference on Simulation       [24] H. N. Tran, F. Singhoff, S. Rubini, and J. Boukhobza.
     and Modeling Methodologies, Technologies and                    Instruction cache in hard real-time systems: modeling
     Applications (SIMULTECH), 2013.                                 and integration in scheduling analysis tools with
 [9] A. Colin and I. Puaut. Worst-case timing analysis of            AADL. In Proceedings of the 12th IEEE International
     the rtems real-time operating system. Rapport NâŮe,           Conference on Embedded and Ubiquitous Computing,
     PI1277, IRISA, France, 1999.                                    Milan, Italy, 2014.
[10] L. Cucu and J. Goossens. Feasibility intervals for         [25] H. N. Tran, F. Singhoff, S. Rubini, and J. Boukhobza.
     fixed-priority real-time scheduling on uniform                  Addressing cache related preemption delay in fixed
     multiprocessors. In IEEE Conference on Emerging                 priority assignment. In Proceedings of the 20th IEEE
     Technologies and Factory Automation (ETFA), pages               International Conference on Emerging Technologies
     397–404, 2006.                                                  and Factory Automation (ETFA), Luxembourg, 2015.
[11] C. Fotsing, F. Singhoff, A. Plantec, V. Gaudel,            [26] R. Urunuela, A. Deplanche, and Y. Trinquet. Storm, a
     S. Rubini, S. Li, H. N. Tran, L. Lemarchand,                    simulation tool for real-time multiprocessor scheduling
     P. Dissaux, and J. Legrand. Cheddar architecture                evaluation. In IEEE Conference on Emerging
     description language. 2014.                                     Technologies and Factory Automation (ETFA), pages
[12] M. González Harbour, J. Gutiérrez Garcı́a,                    1–8, 2010.
     J. Palencia Gutiérrez, and J. Drake Moyano. Mast:

</pre>