<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Caching in a Mixed-Criticality 5G Radio Base Station</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Emad Jacob Maroun</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luca Pezzarossa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Schoeberl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technical University of Denmark, Department of Applied Mathematics and Computer Science</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Telecommunication is a critical driver of economic and social development. 5G technologies are the state of the art in telecommunication, setting strong and open-ended requirements for implementing systems. Current systems implementing baseband technologies in 5G depend on hardware separation to ensure high- and low-criticality tasks do not interfere in such a way as to violate guarantees. To increase performance and lower costs, this paper sets the research direction toward future mixed-criticality systems that can handle both the high- and low-criticality tasks of the baseband unit. We analyze the 5G requirements and the common systems that currently implement them. We propose using T-CREST as the research platform, with a specific architecture targeting mixed-criticality workloads. We present two cache proposals that reduce the interference of low-criticality tasks on high-criticality tasks while ensuring high cache utilization and efficiency. The first cache proposal uses timeouts to automatically free cache lines reserved for high-criticality tasks. The second proposal uses contention tracking to limit how much low-criticality tasks may influence high-criticality tasks. Lastly, we propose a third cache architecture to unify the method and stack caches unique to T-CREST into a single level-2 cache.</p>
      </abstract>
      <kwd-group>
        <kwd>5g</kwd>
        <kwd>t-crest</kwd>
        <kwd>real-time systems</kwd>
        <kwd>low latency</kwd>
        <kwd>caches</kwd>
        <kwd>radio baseband</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Mobile communications are a critical driver of economic
and social development [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The evolution of communication
technologies is therefore essential to societal development.
5G is state-of-the-art in mobile communication technologies,
promising unprecedented speeds, ultra-low latency, and
massive connectivity capabilities. With its lofty promises,
implementing 5G communication networks is a significant
industrial challenge. Continued investment in 5G
technologies is needed to reach beyond the minimal promises of the
technology. Improvements in technical implementations
will ensure better service characteristics for customers and
users at lower costs.
      </p>
      <p>One critical aspect of telecommunications technology is
the radio base station (RBS), which provides wireless
transmission to and from mobile devices. The 5G functionality is
implemented in these RBSs. Continued improvement of the
RBS is critical to staying at the forefront of the industry. As
such, research on how to best implement RBS for optimizing
performance and cost ensures long-term competitiveness
in the industry.</p>
      <p>
        The requirements of 5G introduce a hierarchy of
prioritized tasks that the RBS has to complete. The RBS, therefore,
becomes a mixed-criticality system [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], where minimum
guarantees are upheld to ensure critical tasks are completed
correctly and in a timely fashion. On the other hand,
non-critical tasks need to be performed as fast as possible;
however, they only need to provide good quality of service (QoS)
on average, so they may be de-prioritized to ensure that
critical tasks meet their deadlines. To ensure non-critical tasks
do not interfere with the critical ones, hardware systems
are divided into several layers with differing
responsibilities correlating to the open systems interconnection (OSI)
model [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This hardware division makes it easier to
control interference but decreases resource utilization, which
hurts performance and price.
      </p>
      <p>Presented at the 3rd Workshop on Resource AWareness of Systems and Society (RAW 2024), July 2–5, 2024, Maribor, Slovenia. Corresponding author: ejama@dtu.dk (E. J. Maroun); lpez@dtu.dk (L. Pezzarossa); masca@dtu.dk (M. Schoeberl). ORCID: 0000-0002-3675-3376 (E. J. Maroun); 0000-0002-0863-2526 (L. Pezzarossa); 0000-0003-2366-382X (M. Schoeberl). © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</p>
      <p>
        Therefore, we are interested
in investigating future system designs incorporating
mixed-criticality system research to merge the currently divided
systems into a single platform that can handle the varying
criticality of tasks. While the current heavy use of shared
scratchpads and the phased execution model [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] give high
predictability to systems managing the OSI layer 1, it is
wasteful and difficult to unify with the use of shared caches
in the systems managing the OSI layer 2. Therefore,
innovative techniques are needed to facilitate the unification of the
layer 1 and layer 2 systems into a unified hardware system.
      </p>
      <p>
        This paper addresses the challenge of sharing a level 2 (L2)
cache between different tasks executing on different
cores while still delivering low-latency execution of critical
tasks. We propose to use the T-CREST platform [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] to
explore different solutions to the challenges around memory
management for mixed-criticality systems by presenting
three distinct caching architectures for future exploration.
All solutions are centered around regulating access to
different cache lines for high- and low-criticality jobs. More
specifically, we propose two shared caches that use
timeouts and contention tracking to limit the interference of
low-criticality tasks on high-criticality ones, as well as an
L2 cache that unifies the split caches unique to T-CREST
since they exhibit unique access characteristics that can be
sped up predictably.
      </p>
      <p>The contributions of this paper are: (1) A description of
common 5G RBS technologies and implementations, (2) a
discussion of the challenges future systems face in the
pursuit of lower cost, higher efficiency, and improved
performance, and (3) three proposals for caching architectures that
we intend to explore to address the challenges described.</p>
      <p>The rest of this paper is structured into four sections. The
following section will provide some background on how
current systems implement 5G and their challenges. Section
3 introduces the T-CREST platform and how it can be used as
a basis for research into a mixed-criticality system. Section
4 discusses the three cache architecture proposals. Section
5 presents related work and Section 6 concludes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. 5G Radio Baseband</title>
      <p>
        During the initial phases of the 5G specification, three usage
scenarios were identified as being critical for the future of
mobile communications [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]:
      </p>
      <p>[Figure 1: Example BBU architecture: DSP clusters (each DSP with private instruction and data caches, sharing a scratchpad), accelerator clusters with shared scratchpads, cluster-shared scratchpads, a hardware scheduler, and off-chip DRAM main memory.]</p>
      <p>Enhanced Mobile Broadband (eMBB): Focuses on
providing significantly higher data rates and capacity compared
to previous telecommunication generations, enabling
applications such as high-definition video streaming, virtual
reality, and augmented reality. This scenario covers the
day-to-day activities of private users and data-heavy but
less critical industrial applications.</p>
      <p>Ultra-Reliable and Low Latency Communications
(URLLC): Emphasizes ultra-reliable and low-latency
communication, critical for applications that demand real-time
responsiveness and mission-critical reliability, including
autonomous vehicles, remote surgery, and industrial
automation.</p>
      <p>Massive Machine Type Communications (mMTC):
Targets the connectivity of a massive number of devices
using minimal energy, enabling the Internet of Things (IoT)
to scale to unprecedented levels, facilitating applications
such as smart cities, industrial IoT, and environmental
monitoring.</p>
      <p>
        These scenarios resulted in a requirement specification
that includes the following criteria [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]:
• Peak Data Rate: 20 Gbit/s download, 10 Gbit/s
upload. This is only in ideal conditions.
• Transmission Latency: 4 ms for eMBB, 1 ms for
URLLC. This is the latency added by the 5G
network to the overall communication latency between
endpoints.
• Device Mobility: up to 500 km/h for rural eMBB, less
for more dense areas.
• Density: up to 1,000,000 devices per square kilometer
in the mMTC scenario.
      </p>
      <p>Note how each requirement applies in specific scenarios and
is not necessary in others. For example, the peak data rate
is unnecessary for scenarios covered by URLLC or mMTC.
Meanwhile, the extreme latency requirement of 1 ms only
applies to URLLC.</p>
      <p>An RBS must manage these diverse requirements and,
therefore, becomes a mixed-criticality system. For example,
tasks within the URLLC scenario must be prioritized over
eMBB tasks to uphold the URLLC latency requirements. Not
only do we have a range of priorities, but these priorities
may also change as usage changes. Adapting to ongoing
changes in network usage is, therefore, a critical aspect of
implementing 5G.</p>
      <sec id="sec-2-1">
        <title>2.1. System Architecture</title>
        <p>
          Typical RBS systems are divided into three hardware units:
1. The Remote Radio Unit (RRU). It is immediately
connected to the antennas and handles the initial input
stream from the antennas. The antenna streams are
initially processed in this unit and grouped into user
streams (e.g., 8 antenna streams are compressed to
one group) to be sent to the next unit.
2. The Baseband Unit (BBU). It takes the input streams
from the RRU and further processes them. The RRU
and BBU units together constitute the physical layer
of the OSI model (layer 1), handling the physical
aspects of transmitting and receiving wireless 5G
signals [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
3. The Layer 2 unit handles the data link layer of the
OSI model (layer 2). This includes Medium Access
Control (MAC) and Radio Link Control (RLC) tasks.
        </p>
        <p>The varying characteristics of the workloads of the
different units result in different hardware designs. While both
the BBU and layer 2 must handle high- and low-criticality
tasks, they do so in different ways. This research aims to
explore a merged system to handle the BBU and layer 2 tasks
in one hardware system. The new system is to be centered
around the design of a BBU but explore technologies that
allow layer 2 tasks to run efficiently.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Baseband Unit</title>
        <p>The BBU system handles physical layer tasks centered
around signal processing of incoming and outgoing
transmissions. Its design ensures maximum predictability at the
expense of resource utilization efficiency. Figure 1 provides
an overview of the system. It is not meant to be
representative of any specific system but to give an idea of the
components often present and their interactions.</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. Hardware</title>
          <p>
            We focus on systems centered around a clustered and
heterogeneous design. Each cluster contains a set of processors or
accelerators (for illustration, we show four in Figure 1). First,
the general computing capability is provided by digital
signal processor (DSP) cores with high predictability [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]. Each
DSP has a private instruction and data cache and shares a
single scratchpad memory with the other processors in the
cluster.
          </p>
          <p>The other clusters contain acceleration cores for specific
and common workloads. The accelerators in each cluster
also share a scratchpad. The exact architecture of the
accelerators is out of the scope of this paper.</p>
          <p>The clusters may also share scratchpads; two are shown
as an example. These split scratchpads handle different
data with specific access characteristics. For example, some
configuration data might be mostly read and changed rarely,
while user-specific data may be updated continuously.</p>
          <p>Lastly, a hardware scheduler can be present to orchestrate
task execution on the relevant cores and movement of data.
We omit describing any other application-specific
devices or connections to peripherals.</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data Processing</title>
        <p>Data processing starts once every millisecond. While the
RRU is processing the antenna streams, the BBU starts with
a set of configuration tasks that prepare for the delivery
of data from the RRU. These configuration tasks must run
on the DSP cores to, e.g., configure the accelerators before
they start executing. This could result in configuration data
initially going to one of the cluster-shared scratchpads, from
where it is moved to the cluster scratchpads as needed. This
data starts in the shared scratchpad of the core running
the job and is off-loaded to the cluster-shared scratchpad
when the configuration job is done. In parallel with the
configuration tasks, the data from the RRU is being loaded
into the cluster-shared scratchpads. When that is ready,
proper processing tasks can begin executing on DSPs or
accelerators as needed.</p>
        <p>We consider only strict data access characteristics of the
tasks. All shared data is read-only. User-specific data is
segmented into the relevant tasks and updated only by the
task currently being worked on. At no point are two tasks
working on the same user data. These strict data access
characteristics mean that synchronization and coherence
are not issues we will consider.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Phased Execution</title>
          <p>The use of scratchpads in the BBU reduces the variability
in execution times. However, this requires methodical
orchestration to ensure each job has the needed data. As such,
every job is divided into three phases:
1. Read: Any data a task requires is moved onto its
cluster’s scratchpad from the cluster-shared scratchpads.
2. Execute: The task’s job is executed to completion
without needing to access memory other than the
cluster’s scratchpad.
3. Write: All the data previously fetched for the job,
which has been updated, is written back to the main
memory.</p>
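          <p>As an illustration, the three phases can be sketched in a few lines of Python. All names here (run_job, the job dictionary layout) are hypothetical, not taken from any real BBU codebase:</p>
          <preformat>
```python
# Minimal sketch of the phased (Read/Execute/Write) job model.
# All names are illustrative, not from a real BBU implementation.

def run_job(job, scratchpad, main_memory):
    # Read phase: copy every input the job needs into the
    # cluster scratchpad before execution starts.
    for key in job["inputs"]:
        scratchpad[key] = main_memory[key]

    # Execute phase: the job runs to completion touching only
    # the scratchpad, so its timing does not depend on DRAM.
    results = job["compute"](scratchpad)

    # Write phase: only data the job updated is written back.
    for key, value in results.items():
        main_memory[key] = value
    return results

main_memory = {"cfg": 7, "user0": 3}
job = {
    "inputs": ["cfg", "user0"],
    "compute": lambda spm: {"user0": spm["user0"] + spm["cfg"]},
}
out = run_job(job, {}, main_memory)
print(out)                   # {'user0': 10}
print(main_memory["user0"])  # 10
```
          </preformat>
          <p>Note that in the real system the Read and Write phases are DMA transfers overlapped with other jobs' Execute phases; the sequential sketch only shows the data-movement discipline.</p>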
          <p>
            This is a classic implementation of the phased execution
[
            <xref ref-type="bibr" rid="ref4 ref9">4, 9</xref>
            ], also called the simple-task model [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. The task
scheduler ensures that a task’s Execute is only scheduled on a
processor when its corresponding Read has terminated on
the same cluster. Data movement is performed using DMAs,
allowing processors to execute other jobs’ Execute phase
in parallel with data movements.
          </p>
          <p>A cluster’s scratchpad is partitioned so that each running
job has exclusive access to its memory portion. If two tasks
use the same data, the Read of each will load that data
into their respective partitions. This means data might be
duplicated in the cluster scratchpads. However, such shared
data is rarely written to, and synchronization is explicitly
handled at the application level and, therefore, is not an
issue.</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Layer 2 Design</title>
        <p>The common computing architectures for layer 2 are
more traditional, with, e.g., superscalar cores and standard
caching. The workload on the system requires less stringent
predictability than the BBU, allowing for a more traditional
design. The tasks also require higher performance,
provided by the more complex design at the cost of
predictability. To ensure high-criticality tasks meet their deadlines,
the hardware resources can be partitioned by clusters and
intentionally over-provisioned.</p>
        <p>Layer 2, therefore, can be wasteful where
high-criticality tasks are concerned. This unit’s more complex
design makes it challenging to ensure tasks meet their
deadlines. The only possibility of ensuring the deadlines are
met is to provide the tasks with such an overabundance of
resources that even when low-criticality tasks interfere, the
high-criticality tasks will not be adversely affected.
Therefore, the inefficient use of resources in layer 2 is a supporting
reason for merging the layer 2 subsystem with the BBU
subsystem.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Challenges</title>
        <p>We aim to research new methods for implementing 5G RBS
technologies to achieve better performance at lower cost.
Therefore, the current challenges of increased costs and
lower performance must be alleviated in any future system.</p>
        <p>Challenge 1: The primary challenge for the
above-mentioned RBS systems is a divided hardware architecture.
The physical division ensures that high-criticality tasks can
meet their deadlines, but it increases costs and
reduces overall performance. First, the separation
necessitates manufacturing two physical systems, which is costly.
Second, the separation means the two systems cannot share
resources, reducing the efficient use of available resources.</p>
        <p>Challenge 2: On the BBU system specifically, there is
also a challenge with the efficient use of resources. While
using scratchpads ensures execution-time predictability for
all tasks, it also forces data duplication. If two tasks use
the same data, that data is moved into both tasks’
scratchpad partitions. This is both a waste of scratchpad memory
and memory bandwidth. This is especially prevalent with
configuration data, which is often shared between many
tasks and does not change often. The data loaded into the
scratchpads is also loaded on a pessimistic basis. Some tasks
may only need some of the data, meaning some data might
be unnecessarily loaded into the scratchpads.</p>
        <p>Challenge 3: Memory bandwidth is wasted when
dependent tasks use the same data. The Write phase in the BBU
system always runs after the Execute phase. A subsequent
job using the same data must reload it in its Read phase.
This is sub-optimal in cases where the subsequent task can
run on the same cluster as the first task. In such a case,
omitting the Write phase of the first task and the Read
phase of the second task would be better.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. The T-CREST Platform</title>
      <p>We propose to use the T-CREST platform as a basis for
research into future platforms for 5G RBS. This section
describes the platform’s current capabilities and how they
relate to the challenges present in divided RBS systems.</p>
      <sec id="sec-3-1">
        <title>3.1. T-CREST and Patmos</title>
        <p>
          The Patmos processor [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] is designed to serve real-time
systems. Several Patmos cores are combined with a
network-on-chip, a memory arbitration tree, and a memory controller
to form the time-predictable multi-core platform T-CREST [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. As
such, T-CREST provides techniques that make task
execution time more predictable and reduce the worst-case
execution time (WCET). Around the Patmos cores, it builds a
platform with time-predictable components to reduce WCET
analysis complexity and increase accuracy. T-CREST uses
networks-on-chips [
          <xref ref-type="bibr" rid="ref12 ref13 ref14">12, 13, 14</xref>
          ] that ensure data is moved
between processing cores with a known maximum latency. For
accessing shared main memory, T-CREST uses the dedicated
arbitration tree-based network-on-chip [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Regardless of
how many cores are accessing the memory, each access will
be serviced within a bounded latency.
        </p>
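        <p>The reason such arbitration bounds latency can be illustrated with a simple time-division multiplexing (TDM) model, where each core owns one slot per arbitration round. This is a sketch of the general principle only, not a model of the actual T-CREST arbiter:</p>
        <preformat>
```python
# Sketch of why TDM-style arbitration bounds memory latency:
# with one slot per core per round, a request waits at most one
# full round before its slot comes up again. Illustrative only.

def worst_case_access_latency(n_cores, slot_cycles, service_cycles):
    # Worst case: the request arrives just after its own slot
    # passed, so it waits for all other cores' slots first.
    return (n_cores - 1) * slot_cycles + service_cycles

# Example: 8 cores, 4-cycle slots, 4 cycles to service one access.
print(worst_case_access_latency(8, 4, 4))  # 32
```
        </preformat>
        <p>The key property is that the bound depends only on the configuration, not on what the other cores are doing, which is what makes the memory access analyzable.</p>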
        <p>
          Patmos uses an in-order pipeline to ensure every
instruction has a known and constant execution time. To exploit
instruction-level parallelism predictably, Patmos is also a
very long instruction-word (VLIW) architecture with a
dual-issue pipeline. VLIW architectures are a predictable way
of increasing performance without increasing complexity
[
          <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
          ]. Patmos executes instructions in bundles of up to
two instructions. The compiler must designate instructions
as part of a bundle by setting a specific bit in the first
instruction. All Patmos instructions are predicated: Based on
one of eight predicate registers, each instruction is either
enabled or disabled. If the predicate register’s value is true,
the instruction is enabled, meaning it executes normally. If
the value is false, the instruction is disabled and does not
affect registers or memory. It effectively becomes a no-op.
However, the execution time of disabled instructions is the
same as when enabled. Predicated instructions allow the
compiler to minimize execution time variability or even
eliminate it entirely [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ].
        </p>
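        <p>The effect of predication on timing can be illustrated with a toy interpreter in Python. This is a sketch of the general idea under invented names; the real Patmos semantics are richer:</p>
        <preformat>
```python
# Toy model of predicated execution: every instruction carries a
# predicate; a false predicate turns the instruction into a no-op
# that still costs the same number of cycles. Illustrative only.

def execute(instructions, predicates, regs):
    cycles = 0
    for pred, dest, fn in instructions:
        cycles += 1                 # disabled instructions cost the same
        if predicates[pred]:
            regs[dest] = fn(regs)   # enabled: result takes effect
    return cycles

regs = {"r1": 5, "r2": 0}
program = [
    ("p0", "r2", lambda r: r["r1"] + 1),  # enabled
    ("p1", "r2", lambda r: r["r1"] - 1),  # disabled: no effect
]
cycles = execute(program, {"p0": True, "p1": False}, regs)
print(regs["r2"], cycles)  # 6 2
```
        </preformat>
        <p>Because both paths of a branch can be predicated and executed unconditionally, execution time becomes independent of the branch outcome, which is what lets the compiler reduce or eliminate timing variability.</p>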
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Predictable Caching</title>
        <p>
          While caching is usually associated with unpredictability
and difficulties for static analysis, T-CREST deploys two
predictable and easily analyzable caches. The first is a method
cache [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] that replaces a traditional instruction cache in
Patmos [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The method cache holds whole functions or parts of
functions (sub-functions) such that instruction fetching never
misses except at specific points. The compiler manages this
cache by splitting the code into blocks that fit in the method
cache and inserting cache-fill instructions where needed.
In the Patmos ISA, function call and return instructions
ensure that the callee or the caller is in the method cache.
To support sub-function caching, Patmos has cache-filling
variants of branch instructions. Using a method cache limits
the number of places cache misses can occur to the specific
cache-filling instructions. The method cache is simple to
model, allowing an analyzer to provide tight WCET bounds [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ].
        </p>
        <p>
          The second unique cache of T-CREST is the stack
cache [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]. It caches function-local data, often accessed
predictably, and can be loaded at function entry and exit points.
Accessing this data is also done without experiencing cache
misses. The compiler also manages the stack cache, setting
it up and tearing it down at function entries and exits and
using stack-targeting load and store instruction variants.
An analyzer can assume any stack-targeting instruction will
hit in the stack cache. Therefore, the cache size must only
be modeled to account for the stack setup and tear-down
time [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Data accesses that are not function-local may still
go through the conventional data cache or circumvent all
caching to target the main memory directly.
        </p>
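        <p>The reserve-and-spill behavior of the stack cache can be sketched with a toy model. Names, the word granularity, and the spill policy here are illustrative assumptions; the real stack cache is driven by compiler-inserted stack-control instructions:</p>
        <preformat>
```python
# Toy stack-cache model: space is reserved on function entry and
# freed on exit; on overflow, the excess (oldest frames) spills
# to main memory. Illustrative only.

class StackCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.occupied = 0    # words currently held in the cache
        self.spilled = 0     # words spilled to main memory

    def reserve(self, words):   # function entry
        self.occupied += words
        if self.occupied > self.capacity:
            excess = self.occupied - self.capacity
            self.spilled += excess          # spill oldest frames
            self.occupied = self.capacity

    def free(self, words):      # function exit
        self.occupied = max(self.occupied - words, 0)

sc = StackCache(capacity=8)
sc.reserve(6)   # fits entirely
sc.reserve(4)   # overflows: 2 words spill to main memory
print(sc.occupied, sc.spilled)  # 8 2
```
        </preformat>
        <p>Accesses within the reserved area always hit; an analyzer therefore only needs to account for spill and fill costs at the reserve/free points, as described above.</p>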
        <p>
          These two cache architectures are supported by the Platin
WCET-analyzer [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. Platin models instruction execution
and tracks which blocks of code are likely to be in the
method cache at a given point. It accounts for this at
control-flow points to know whether a method-cache miss is likely
and how many bytes would have to be loaded. For the stack
cache, it models the program stack’s size at any point and
tracks stack-cache-control instructions added by the
compiler. At points where the stack must grow, Platin knows
whether the cache has free space or needs to spill some of
the program stack to main memory.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Missing Capabilities</title>
        <p>The T-CREST platform is missing some features and
capabilities compared to the BBU system. We enumerate
these missing capabilities and discuss how we might
either simulate them using existing capabilities or
implement them in the platform as part of the research
project.</p>
        <sec id="sec-3-3-1">
          <title>3.3.1. Acceleration and Clustering</title>
          <p>The specific processing requirements of an RBS mean that
dedicated accelerators can be used for maximum efficiency. The
T-CREST platform does not include anything resembling
these accelerators. Likewise, the T-CREST platform does
not use any clustering, whose benefit is mainly driven by a
multi-layered intermediate memory, which we will discuss
in the next section.</p>
          <p>As this research mainly focuses on the efficient use of
resources, notably memory, we will not investigate or
implement any hardware acceleration. Instead, we will use
the Patmos cores as substitutes for specific accelerators. We
will implement clustering into the T-CREST platform so
that each cluster can be designated to be allowed to execute
specific tasks. This will allow us to treat one cluster as a
substitute for a BBU DSP cluster and others for different
types of acceleration clusters.</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.3.2. Hierarchical Memory</title>
          <p>The Patmos cores of T-CREST are each paired with private
caches, as described earlier. However, no further hierarchy
of intermediate memory exists. In contrast, the BBU system
contains three levels of intermediate storage: First, each
DSP (or accelerator) has its own caches. Second, each cluster has
a shared scratchpad. Lastly, cluster-shared scratchpads are
present for a last level of storing various types of data.</p>
          <p>A multi-layered memory hierarchy is necessary for the
experiments to be representative, especially given the unique
data access characteristics. Therefore, we will build a
second layer of intermediate memory, which is shared between
the Patmos cores of each cluster. We will omit a last
memory layer, as any methods we develop for managing the
second layer can be transferred to the remaining layers of a
real-world system.</p>
        </sec>
        <sec id="sec-3-3-3">
          <title>3.3.3. Hardware-Assisted Scheduling</title>
          <p>The BBU systems often use hardware to accelerate
scheduling. T-CREST does not implement any hardware that can
assist with scheduling. While using a hardware scheduler in
the BBU system ensures that the extreme number of tasks
gets scheduled in a reasonable time, the smaller scale of
this project’s prototypes can likely be handled by
software-managed scheduling.</p>
          <p>
            Therefore, the initial proposed system will not have any
scheduling hardware; instead, dedicated Patmos cores will
handle the scheduling. Software-defined
scheduling can be a flexible way to test our scheduling
strategies as the system matures. Moving to a hardware
scheduler should be easily doable at later stages of research,
where the scheduling has been studied and techniques
chosen. Patmos already supports adding custom devices and
accelerators [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ]. A hardware scheduler is a device that
interacts with the rest of the clusters, memories, and processors
and issues commands in the same manner a Patmos core
would.
          </p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Proposed System Architecture</title>
        <p>Figure 2 shows a diagram of our proposed system. It
comprises three clusters, each with a set of Patmos cores with
private split caches (Method, Stack, and Data) and a shared
cluster cache. The cores use the T-CREST memory tree to
access the shared cache, providing us with predictable and
low-latency access. The clusters use the T-CREST memory
tree to connect to the memory controller, which manages
access to the off-chip main memory. A shared bus (in gray
above the clusters) facilitates cross-cluster and cross-core
communication. This allows a Patmos core or a hardware
device scheduler to issue scheduling commands to the whole
system.</p>
        <p>This system architecture will allow research on efficiently
managing the cluster caches. The different clusters can
simulate the DSP or accelerator clusters on the BBU system,
while the cluster-shared scratchpads of that system do not
introduce new challenges. Therefore, limiting ourselves
to the two levels of cache (private and cluster caches) will
allow for fruitful experimentation during the research.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Cache Proposals</title>
      <p>To start addressing the challenge of merging layer 1 and
layer 2 systems, we focus on the challenge of using a shared
cache in each cluster. As described earlier, the BBU
architecture sacrifices the efficient use of resources to ensure low
variability in execution times. We aim to maximize resource
usage in the proposed system while maintaining low
variability. We propose exploring three caching solutions that
address the challenges of predictable caching: (1) a
criticality timeout cache, (2) a contention tracking cache, and (3) a
unified method/stack cache.</p>
      <sec id="sec-4-1">
        <title>4.1. Criticality Timeout Cache</title>
        <p>In cases where strict predictability is unnecessary but
flexibility and utilization eficiency are essential, we propose
a cache using a partitioning approach based on cache line
timeouts. For that cache, we need an n-way set
associative cache configuration. We can configure the cache at the
granularity of cache ways. Each cache way can be assigned
either a criticality or a task/core ID (we will use criticality
moving forward).</p>
        <p>In this proposal, each cache way can be assigned either
high or low criticality. Cache lines can be used by
high- or low-criticality tasks; however, high-criticality
tasks are naturally preferred. A low-criticality task cannot evict a
high-criticality cache line. Therefore, to avoid starvation of
low-criticality tasks, at least one way must not be assigned
to the high-criticality tasks.</p>
        <p>When an access of the high criticality arrives, a cache line
in one high-criticality way is tagged as being occupied by
that criticality, and an associated timeout begins. As long
as the timeout is not reached, accesses of low-criticality
tasks cannot evict the cache line. If there is no access to
the line before the timeout is reached, the line criticality
is downgraded, allowing low-criticality jobs to evict the
line. The cache can either be configured right before each
job starts executing, or the criticalities can be configured
ahead of time to match the tasks that will run on the cluster.
With timeouts, there is no need to explicitly release any
data, as the timeout mechanism will do so automatically.
Configuring the cache is done by setting the criticality of a
cache way. When a way is configured with a criticality, all
its cache lines will prefer accesses from that criticality, as
described above.</p>
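<p>As an illustration of the mechanism described above, the following Python sketch models a single cache way with per-line timeouts. The class names, the tick granularity, and the timeout value are our own illustrative assumptions, not a definitive implementation.</p>

```python
# Sketch of the criticality-timeout policy for one cache way (simplified;
# names and the TIMEOUT value are illustrative assumptions).
TIMEOUT = 100  # cycles a high-criticality line stays protected without reuse

class Line:
    def __init__(self):
        self.tag = None
        self.high = False       # currently tagged as high-criticality?
        self.last_access = 0

class TimeoutWay:
    """A cache way whose lines are protected per-access, not per-way."""

    def __init__(self, n_lines):
        self.lines = [Line() for _ in range(n_lines)]
        self.now = 0

    def tick(self, cycles=1):
        """Advance time and downgrade lines whose timeout has expired."""
        self.now += cycles
        for line in self.lines:
            if line.high and self.now - line.last_access >= TIMEOUT:
                line.high = False   # low-criticality tasks may now evict it

    def access(self, index, tag, high_criticality):
        line = self.lines[index]
        if line.tag != tag:                         # miss
            if line.high and not high_criticality:
                return "rejected"                   # protected line survives
            line.tag = tag                          # evict and refill
        if high_criticality:                        # (re)start the timeout
            line.high = True
            line.last_access = self.now
        return "ok"
```

<p>Note that no explicit release is ever issued: once a high-criticality job stops touching a line, the downgrade happens automatically on a later tick.</p>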
<p>A significant drawback of this approach is its
unpredictability. Because timeouts might cause a cache line to be
evicted even when it might be used in the future, it can be
difficult for a WCET analysis tool to track which cache lines
have reached their timeout and which have not. The effect
of the timeouts on WCET bounds can be challenging to
estimate and would require dedicated analysis. Such analysis can
also be omitted, as this cache architecture is better suited
for measurement-based WCET estimation. With detailed
testing and measurements, getting a sufficiently safe WCET
bound should be feasible.</p>
        <p>This cache architecture is designed for high utilization
and low scheduling complexity. Because it reserves each
cache line individually, only the necessary subset of a cache way is
reserved at a given time. Cache lines that either timed out or
were not used by the job are free to be used by low-criticality
tasks, increasing the utilization of the cache. In this proposal,
we also do not pre-load data into the cache. This means
only data that is used will be loaded. Therefore, we avoid
both bandwidth wastage and cache space wastage when
loading data that is not used. When a job stops executing,
its associated cache lines will eventually time out and release
their contents automatically. The scheduler, therefore, does
not need to manage the phased execution of jobs, reducing
the pressure on the scheduler.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Contention Tracking Cache</title>
        <p>In this proposal, a combination of contention tracking in
the cache and contention-aware task scheduling will allow
for maximal cache utilization through dynamic partitioning,
with high predictability through cache contention tracking
and mitigation.</p>
        <p>In a multicore system without shared caches, the
execution time of a job is affected by the cache behavior without
that behavior being affected by other jobs. Through cache
analysis, we can bound the execution time attributable to
the cache. This is done by estimating the number of cache
misses that will occur. When the cache is shared, this
analysis is no longer possible, as the interference of other jobs
will cause additional cache misses in a manner that cannot
be estimated. In this proposal, we want to let the task
scheduler limit the contention that a job is allowed to experience
such that it is guaranteed to meet its deadline.</p>
<p>We give two example types of contention: (1) A job J<sub>1</sub>
experiences a contention event if a cache line l<sub>1</sub> it
populated with data d<sub>1</sub> is evicted by an access by another job J<sub>2</sub>.
This is because J<sub>1</sub> will experience a cache miss on the next
access to d<sub>1</sub> that it would not have experienced if J<sub>2</sub> had
not interfered. (2) J<sub>1</sub> also experiences a contention event if a
cache miss on one of its accesses results in the eviction
of a cache line that J<sub>1</sub> itself populated in the same cache set
(with data d<sub>2</sub>). This event is a contention with any other
job that has at least one populated cache line in the same set.
Without the other jobs, J<sub>1</sub> would have populated an empty
cache line instead of evicting one of its other populated lines.
The evicted line will cause a cache miss in the future when
J<sub>1</sub> needs to access d<sub>2</sub> again.</p>
<p>We only consider contention between different jobs.
Self-contention also happens in private caches and is, therefore,
already managed in the cache analysis for the private cache.</p>
        <p>We limit the maximum allowed contention as defined
above to ensure that a job meets its deadline without
interference from other jobs. The scheduler will configure the
cache with a maximum allowed contention. The cache
controller will track contention by checking and counting the
above contention events for each job. When a job reaches
its contention limit, any cache access that would cause a
contention event will be blocked or mitigated. For example,
say J<sub>1</sub> is high criticality, and J<sub>2</sub> is not. As long as J<sub>1</sub> has not
reached its contention limit, the cache treats accesses from
both jobs equally. When the limit is reached, contention
events are mitigated between J<sub>1</sub> and J<sub>2</sub>. In the case of the
first event type, accesses from J<sub>2</sub> that would cause an
eviction of J<sub>1</sub>’s cache lines are rejected by the cache. The
access must then be rerouted directly to the main memory,
which the system must support. In the second event
type, if the default replacement policy would have J<sub>1</sub> evict
its own cache line in the set, it instead evicts a cache
line from J<sub>2</sub>.</p>
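<p>The tracking and mitigation rules above can be sketched for a single cache set as follows. This is a simplified Python model under assumed names; the FIFO victim choice is our own, and a mitigation victim’s own limit is ignored for brevity.</p>

```python
# Sketch of one set of the contention-tracking cache (illustrative names;
# FIFO victim choice assumed; a mitigation victim's own limit is ignored).
class TrackingSet:
    def __init__(self, n_ways):
        self.lines = [None] * n_ways   # entries: (owner_job, tag) or None
        self.contention = {}           # contention events counted per job
        self.limit = {}                # allowed contention per job (scheduler-set)

    def _at_limit(self, job):
        return self.contention.get(job, 0) >= self.limit.get(job, float("inf"))

    def _count(self, job):
        self.contention[job] = self.contention.get(job, 0) + 1

    def access(self, job, tag):
        if (job, tag) in self.lines:
            return "hit"
        if None in self.lines:                      # free line: no contention
            self.lines[self.lines.index(None)] = (job, tag)
            return "miss"
        victim = 0                                  # default victim (FIFO)
        if self.lines[victim][0] != job:
            if self._at_limit(self.lines[victim][0]):
                # Type-1 mitigation: a protected job's line may not be evicted.
                own = [i for i, (j, _) in enumerate(self.lines) if j == job]
                if not own:
                    return "miss-bypass"            # reroute to main memory
                victim = own[0]
            else:
                self._count(self.lines[victim][0])  # type-1 event
        if self.lines[victim][0] == job:
            others = [i for i, (j, _) in enumerate(self.lines) if j != job]
            if others:
                if self._at_limit(job):
                    victim = others[0]              # type-2 mitigation
                else:
                    self._count(job)                # type-2 event
        self.lines[victim] = (job, tag)
        return "miss"
```

<p>With a limit of zero, a high-criticality job behaves as if the set were partitioned in its favor; with a generous limit, the set is shared freely until the budget runs out.</p>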
        <p>Setting the contention limit is the responsibility of the
job scheduler. Through traditional static WCET analysis
with the assumption of private caches, jobs get their WCET
bound. Any excess time between the bound and the task
deadline is therefore open to contention. Before the
scheduler starts a job, it sets the contention limit, ensuring the
WCET of the job, with contention, still meets the deadline.
The contention limit can be static, calculated
as part of the schedulability analysis, or dynamic,
so the scheduler adjusts it to runtime conditions: if
a task starts early, the contention limit is increased to
match the available slack time; if it starts late, the
limit is reduced or set to zero to ensure that the
deadline is still met.</p>
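<p>The slack-based budgeting above amounts to a small calculation, sketched here with made-up numbers and an assumed fixed worst-case penalty per contention-induced miss.</p>

```python
# Back-of-the-envelope contention budgeting, as the scheduler might do it.
# All numbers and the fixed per-event miss penalty are assumptions.
def contention_limit(deadline, start_time, wcet_private, miss_penalty):
    """Max contention events so WCET-with-contention still meets the deadline.

    wcet_private: WCET bound assuming a private (interference-free) cache.
    miss_penalty: worst-case extra cycles one contention-induced miss costs.
    """
    slack = deadline - start_time - wcet_private
    return max(0, slack // miss_penalty)

# Started early: large slack allows more contention.
assert contention_limit(deadline=1000, start_time=0, wcet_private=600, miss_penalty=50) == 8
# Started late: no slack, limit zero, so the job runs as if partitioned.
assert contention_limit(deadline=1000, start_time=400, wcet_private=600, miss_penalty=50) == 0
```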
<p>This proposal’s major strength is that it decouples the
analyses of tasks with differing criticalities. Because of
the contention limit, high-criticality tasks will never be
adversely affected by low-criticality tasks. Therefore, we
only need to ensure that all high-criticality tasks meet their
deadlines with other methods, for example, partitioning
among the high-criticality tasks only. The proposal also does not
statically partition or lock the cache. At worst, when a contention limit is
reached, the cache is dynamically partitioned
automatically, simply by prioritizing the jobs that have reached the
limit. This maximizes cache utilization. It also allows
maximizing the performance of low-criticality tasks as long
as they do not adversely affect any high-criticality tasks.</p>
        <p>This proposal does increase the complexity of the cache
controller, which needs to track contention events and
mitigate them for jobs that have reached their contention limit.
Each cache line needs to be associated with a job (or core),
each job needs a contention counter, and logic needs to
ensure the correct mitigation at contention limits. The
proposal also increases scheduler complexity. This complexity
can be initially lowered by simply having statically
determined contention limits. However, further work should
explore dynamically determined limits, which would increase
the workload on the scheduler.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Unified Method/Stack Cache</title>
        <p>The Patmos processor on T-CREST uses the special method
and stack caches. While these caches have been researched
for their impact on predictability, and the Platin analyzer
has analysis implementations for them, additional work is
needed to integrate them into a shared L2 cache. Therefore,
we propose investigating a shared L2 cache that integrates
the features of both the method cache and the stack cache.
It is meant to complement either a traditional L2 data cache
or scratchpad, with extended research avenues for a fully
integrated L2 cache that supports the method, stack, and
data caches. This proposal can also complement either of
the previous proposals.</p>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <caption>
            <p>Feature comparison of the three cache proposals (* indicates a conditional property).</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th />
                <th>All Data</th>
                <th>Shared</th>
                <th>Mixed-Criticality</th>
                <th>Analyzable</th>
                <th>Needs Scheduling</th>
                <th>Guaranteed</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Criticality Timeout</td>
                <td>✓</td><td>✓</td><td>✓</td><td>✗</td><td>✗*</td><td>✗</td>
              </tr>
              <tr>
                <td>Contention Tracking</td>
                <td>✓</td><td>✓</td><td>✓</td><td>✓*</td><td>✓*</td><td>✓*</td>
              </tr>
              <tr>
                <td>Unified Method/Stack</td>
                <td>✗</td><td>✗</td><td>✗</td><td>✓</td><td>✗</td><td>✓</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The method and stack caches have particular access
patterns to their data. The method cache accesses a block of
code at a time, pre-loading a complete block at once. It also
uses a first-in, first-out (FIFO) replacement policy to account
for functions earlier in the call stack being less likely to be
called again soon. On the other hand, the stack cache is
not backed by main memory unless some data is spilled
when the cache is full. This allows the L2 cache to store
the spilled stack data first without sending it to the main
memory. Access to this stored data would have the same
characteristics as access to the stack cache. Additionally,
when space is tight in the L2 cache, the replacement policy
is the same as the stack cache: spill the data furthest up the
stack.</p>
        <p>An open question is how to partition the cache between
the method and stack data. Since both have a replacement
policy that depends on reaching the space limit, a policy
is needed for deciding how much of the cache should be
meant for the methods, and how much should be used for
the stack. We should also investigate if this division can be
dynamically configured such that if the stack is not expected
to use much space, then most of the L2 cache should be saved
for the methods and vice versa. A different approach could
be to say that the stack gets priority up to a point. When
the stack needs to store more data, methods are evicted to
make room up to a point (e.g., half the L2 cache size). Any
space not used by the stack cache can store methods. This
can also be done in reverse, where the method data gets
priority.</p>
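<p>The policy sketched above, where the stack gets priority up to a cap, can be written as a simple admission rule. The capacity numbers and the function name are illustrative assumptions.</p>

```python
# Sketch of the "stack gets priority up to a point" policy (illustrative;
# units are cache lines, and the cap of half the L2 is an assumption).
STACK_CAP = 512            # stack may claim at most half of a 1024-line L2

def admit_stack_spill(stack_used, method_used, total=1024):
    """Decide how an incoming stack-spill line gets room in the unified L2."""
    if stack_used + method_used < total:
        return "use-free-line"
    if stack_used < STACK_CAP:
        return "evict-method-line"    # stack has priority below its cap
    return "spill-to-main-memory"     # over the cap: behave like the stack cache
```

<p>The reverse policy, where method data gets priority, is the same rule with the roles of the two arguments swapped.</p>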
        <p>An open question that would need answering following
the above initial research, would be how to implement a
unified method/stack cache that is also shared between cores.
Since each core has a distinct stack, and is also likely to use
different functions, we need to explore ways for a single
cache to effectively manage multiple stacks and call trees.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Discussion</title>
<p>The three caching proposals—Criticality Timeout Cache,
Contention Tracking Cache, and Unified Method/Stack
Cache—each address the challenge of predictable caching
in different ways. Table 1 compares the various features
of our proposals. The first big difference is between the
Unified Method/Stack Cache and the two other caches. The
Criticality Timeout and Contention Tracking caches both
support all program data, whereas the Unified Method/Stack
Cache only supports instruction data (methods) and stack
data. Even more specifically, the stack cache does not
support all stack data, only data that does not need an
address, as the stack cache is not backed by main memory.
Any data whose address is taken in the program cannot be
put in the stack cache, going instead to the shadow stack,
which is backed by main memory. Another big difference
between the Unified Method/Stack Cache and the others is
that the proposal does not share the cache between multiple
cores, which also means it does not alleviate any challenges
for mixed-criticality systems.</p>
<p>Analyzability differs between the cache proposals.
The Criticality Timeout Cache does not support analyzability
very well, as it is difficult for analyzers to track when cache
lines have timed out. The Contention Tracking Cache is analyzable,
but only in the sense that it simplifies mixed-criticality
analysis by disallowing interference between tasks of different
criticalities. For tasks of the same criticality, the cache
provides no assistance, but neither does it complicate
the analysis. The Unified Method/Stack Cache is the most
analyzable. Analyzers can likely reuse the analysis done for
the separate method and stack caches for the unified one,
with different configurations and minor
customization.</p>
<p>The proposals also differ in how much support is needed
from the job scheduler at runtime. The Criticality Timeout
Cache can be implemented without scheduler support if the
way-based partitioning is configured ahead of time. If the
partitioning is done dynamically, it becomes the scheduler’s
responsibility. The Contention Tracking Cache needs
support from the scheduler to ensure the amount of allowed
contention stays within the correct limit. The scheduler needs
to account for when a high-criticality job is started so that
an appropriate contention limit is chosen. A static approach
can also be used, where the contention limit is chosen ahead
of time. However, that does not provide much benefit
compared to traditional partitioning. The Unified Method/Stack
Cache needs no scheduling support at all. The only thing
that might be configurable is how much of the cache
is prioritized for methods or the stack. However, this could
better be done by the program itself, e.g., through compiler
management of the cache.</p>
<p>Lastly, each cache provides different guarantees on its behavior.
The Criticality Timeout Cache provides priority guarantees for
only a specific time. If the timeouts are not managed so that they
do not run out, programs cannot be guaranteed that a specific
amount of the cache is reserved for them. While giving
no guarantees on partitioning, the Contention Tracking
Cache guarantees how much contention can affect a job.
However, this covers only contention from lower-criticality
jobs, so it makes no guarantees about
contention from same-criticality tasks. In contrast, the
Unified Method/Stack Cache is predictable and guarantees
behavior similar to that of the split caches.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Related Work</title>
      <p>
        Shared caches are a significant challenge for predictability
due to their inherent nature of allowing multiple cores to
access the same cache [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. This can lead to contention
and unpredictable performance. However, several solutions
have been proposed to address this issue, including cache
partitioning and locking [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ].
      </p>
      <p>
        Partitioning is a technique that divides the shared cache
into several partitions, each dedicated to a specific core [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ].
This approach can significantly improve predictability by
reducing contention [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ]. Way-based partitioning involves
dividing the cache ways among different cores. Each core is
assigned a specific number of ways in the cache, ensuring
exclusive access to those ways. This method can effectively
isolate the cache activities of different cores, improving
predictability. On the other hand, index-based partitioning
involves dividing the cache sets among different cores. Each
core is assigned specific sets in the cache, ensuring exclusive
access. This method is more flexible than way-based
partitioning because the number of sets is usually large, allowing
for finer-grained partitioning. However, a given set maps
to specific address ranges. Therefore, this method requires
more detailed memory management. Page coloring is often
used to partition the cache [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ]. The address space is divided
into colors associated with the cache sets. Assigning colors
to tasks/cores provides the partitioning, assuming an
assignment that provides the correct memory for each task/core
is found. The cache hardware can also support index-based
partitioning for various benefits [
        <xref ref-type="bibr" rid="ref31 ref32">31, 32</xref>
        ]. However, some
form of software management will always be needed.
      </p>
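<p>As a concrete sketch of the page-coloring arithmetic described above, the following Python computes which color a physical page maps to; the page, line, and set sizes are illustrative assumptions.</p>

```python
# Minimal page-coloring arithmetic: which cache color a physical page maps to.
# Parameters (4 KiB pages, 64 B lines, 2048 sets) are illustrative.
PAGE_BITS = 12     # 4 KiB pages
LINE_BITS = 6      # 64 B cache lines
SET_BITS = 11      # 2048 sets

def page_color(phys_addr):
    """Color = the set-index bits that lie above the page offset.

    Pages with different colors can never map to the same cache sets, so
    giving each core a disjoint set of colors partitions the cache by index.
    """
    set_index = (phys_addr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    return set_index >> (PAGE_BITS - LINE_BITS)

# Number of distinct colors available with these parameters (32 here).
n_colors = 1 << (SET_BITS - (PAGE_BITS - LINE_BITS))
```

<p>All addresses within one page share a color, which is why the partitioning must be enforced through page allocation in software.</p>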
      <p>
        Cache locking is another technique used to improve
predictability in shared caches [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. With locking, specific
cache lines can be locked to prevent them from being evicted,
ensuring they are always available for the necessary cores.
This can significantly reduce cache misses and improve
predictability. Locking can be costly. Lock management
involves tracking the locked cache lines, increasing hardware
complexity. Adding locking to a cache can reduce its
capacity or speed depending on how fine-grained the locking
is. Locking also reduces cache utilization, as any unused
locked content cannot be evicted to free up the cache lines
for needed data.
      </p>
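<p>A toy model of line locking, under assumed names, shows both the guarantee and the utilization cost: a locked line always hits for its contents, but other data mapping to it must bypass the cache.</p>

```python
# Toy lockable direct-mapped cache (illustrative sketch, not a real design):
# locked lines are never evicted, so conflicting accesses must bypass.
class LockingCache:
    def __init__(self, n_lines):
        self.tags = [None] * n_lines
        self.locked = [False] * n_lines

    def lock(self, index):
        self.locked[index] = True      # pin the line's current contents

    def access(self, index, tag):
        if self.tags[index] == tag:
            return "hit"
        if self.locked[index]:
            return "bypass"            # locked content cannot be evicted
        self.tags[index] = tag         # normal miss: evict and refill
        return "miss"
```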
      <p>
        T-CREST has enabled much research within various
aspects of real-time systems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Because all of T-CREST’s
components are predictable, it is possible to implement constant
execution-time code based on the single-path paradigm
[
        <xref ref-type="bibr" rid="ref18 ref34">34, 18</xref>
        ]. Single-path has an inherently high overhead,
necessitating optimizations to reduce executed code [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ], make
best use of Patmos’ dual-issue pipeline [
        <xref ref-type="bibr" rid="ref17 ref36">36, 17</xref>
        ], and use
custom register allocation techniques [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ]. The combination of
T-CREST and single-path code has been shown to be
competitive with off-the-shelf ARM processors for a real-time
application [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ]. Research is also ongoing to port the Lingua
Franca coordination language to T-CREST to enable the
creation of complete real-time systems within one framework
[
        <xref ref-type="bibr" rid="ref39 ref40">39, 40</xref>
        ].
      </p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>The increasing importance of 5G technologies necessitates
continuous research and development into the hardware
systems implementing the technology. The diverse
requirement specifications call for a
system with varying degrees of strictness and performance.
Existing systems were designed with the minimal 5G
guarantees in mind, ensuring the hard requirements, e.g., low
latency, were met before softer requirements like
throughput. This focus resulted in a divided physical system to
achieve the goals.</p>
      <p>To increase future systems’ performance while
maintaining the older system’s guarantees, this paper sets the
research direction into a mixed-criticality 5G RBS with merged
BBU and layer 2 systems. The system should be able to
execute high-criticality tasks, like those required by the URLLC
5G scenario, and low-criticality QoS tasks, like those for
eMBB, in one SoC. By analyzing the 5G requirement
specifications and the common system architecture, we propose
using the T-CREST platform as the research platform for
future mixed-criticality systems. We propose a specific system
architecture that best leverages the existing system
architecture’s strength and increases its performance through
shared caches. We propose three specific research directions
within shared L2 caches for clustered systems. The various
proposals have distinct strengths and weaknesses that will
be further explored in future work.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgment</title>
      <p>This work is partially supported by the CERCIRAS
(Connecting Education and Research Communities for an Innovative
Resource Aware Society) COST Action no. CA19135 funded
by COST (European Cooperation in Science and
Technology).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <collab>International Telecommunication Union - Radiocommunication Sector</collab>
          ,
          <article-title>IMT Vision - Framework and overall objectives of the future development of IMT for 2020 and beyond</article-title>
          ,
          <source>Technical Report M.2083-0</source>
          , International Telecommunication Union,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. I. Davis</surname>
          </string-name>
          ,
          <article-title>Mixed criticality systems - a review (February 2022)</article-title>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] ISO/IEC 7498-1:1994(E),
          <source>Information technology - Open Systems Interconnection - Basic Reference Model: The Basic Model, Technical Report 7498-1</source>
          :1994, International Organization for Standardization,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Pellizzoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Betti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bak</surname>
          </string-name>
          , G. Yao,
          <string-name>
            <given-names>J.</given-names>
            <surname>Criswell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Caccamo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kegley</surname>
          </string-name>
          ,
          <article-title>A predictable execution model for cots-based embedded systems</article-title>
          ,
          <source>in: 2011 17th IEEE Real-Time and Embedded Technology and Applications Symposium</source>
          , IEEE,
          <year>2011</year>
          , pp.
          <fpage>269</fpage>
          -
          <lpage>279</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Abbaspour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Akesson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Audsley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Capasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Garside</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Goossens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Goossens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hansen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Heckmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hepp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kasapaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Knoop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Prokesch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Pufitsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rocha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sparsø</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tocchi</surname>
          </string-name>
          ,
          <article-title>T-CREST: Time-predictable multi-core architecture for embedded systems</article-title>
          ,
          <source>Journal of Systems Architecture</source>
          <volume>61</volume>
          (
          <year>2015</year>
          )
          <fpage>449</fpage>
          -
          <lpage>471</lpage>
          . doi:10.1016/j.sysarc.2015.04.002.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <collab>International Telecommunication Union - Radiocommunication Sector</collab>
          ,
          <article-title>Minimum requirements related to technical performance for IMT-2020 radio interface(s)</article-title>
          ,
          <source>Technical Report M.2410-0</source>
          , International Telecommunication Union,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gong</surname>
          </string-name>
          ,
          <string-name>
            <surname>C.-Z. Xu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <string-name>
            <surname>Rao</surname>
          </string-name>
          ,
          <article-title>ebase: A baseband unit cluster testbed to improve energy efficiency for cloud radio access network</article-title>
          ,
          <source>in: 2013 IEEE International Conference on Communications (ICC)</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>4222</fpage>
          -
          <lpage>4227</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Tell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>A programmable dsp core for baseband processing</article-title>
          ,
          <source>in: The 3rd International IEEE-NEWCAS Conference</source>
          ,
          <year>2005</year>
          ., IEEE,
          <year>2005</year>
          , pp.
          <fpage>403</fpage>
          -
          <lpage>406</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Arora</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Maia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. A.</given-names>
            <surname>Rashid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Nelissen</surname>
          </string-name>
          , E. Tovar,
          <article-title>Schedulability analysis for 3-phase tasks with partitioned fixed-priority scheduling</article-title>
          ,
          <source>Journal of Systems Architecture</source>
          <volume>131</volume>
          (
          <year>2022</year>
          )
          <fpage>102706</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>Kopetz</surname>
          </string-name>
          ,
          <source>Real-Time Systems</source>
          , Kluwer Academic, Boston, MA, USA,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Pufitsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hepp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Prokesch</surname>
          </string-name>
          ,
          <article-title>Patmos: A time-predictable microprocessor</article-title>
          ,
          <source>Real-Time Systems</source>
          <volume>54</volume>
          <issue>2</issue>
          (
          <year>2018</year>
          )
          <fpage>389</fpage>
          -
          <lpage>423</lpage>
          . doi:10.1007/s11241-018-9300-4.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brandner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sparsø</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kasapaki</surname>
          </string-name>
          ,
          <article-title>A statically scheduled time-division-multiplexed network-on-chip for real-time systems</article-title>
          ,
          <source>in: Proceedings of the 6th International Symposium on Networks-on-Chip (NOCS)</source>
          , IEEE, Lyngby, Denmark,
          <year>2012</year>
          , pp.
          <fpage>152</fpage>
          -
          <lpage>160</lpage>
          . doi:10.1109/NOCS.2012.25.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E.</given-names>
            <surname>Kasapaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. B.</given-names>
            <surname>Sørensen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. T.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Goossens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sparsø</surname>
          </string-name>
          ,
          <article-title>Argo: A real-time network-on-chip architecture with an efficient GALS implementation</article-title>
          ,
          <source>IEEE Transactions on Very Large Scale Integration (VLSI) Systems</source>
          <volume>24</volume>
          (
          <year>2016</year>
          )
          <fpage>479</fpage>
          -
          <lpage>492</lpage>
          . doi:10.1109/TVLSI.2015.2405614.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>Exploration of network interface architectures for a real-time network-on-chip</article-title>
          ,
          <source>in: Proceedings of the 2024 IEEE 27th International Symposium on Real-Time Distributed Computing (ISORC)</source>
          , IEEE, United States,
          <year>2024</year>
          . doi:10.1109/ISORC61049.2024.10551364. Conference date: 22-05-2024 through 25-05-2024.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. V.</given-names>
            <surname>Chong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Puffitsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sparsø</surname>
          </string-name>
          ,
          <article-title>A time-predictable memory network-on-chip</article-title>
          ,
          <source>in: Proceedings of the 14th International Workshop on Worst-Case Execution Time Analysis (WCET 2014)</source>
          , Madrid, Spain,
          <year>2014</year>
          , pp.
          <fpage>53</fpage>
          -
          <lpage>62</lpage>
          . doi:10.4230/OASIcs.WCET.2014.53.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>J.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>A time-predictable VLIW processor and its compiler support</article-title>
          ,
          <source>Real-Time Systems</source>
          <volume>38</volume>
          (
          <year>2008</year>
          )
          <fpage>67</fpage>
          -
          <lpage>84</lpage>
          . doi:10.1007/s11241-007-9030-5.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <article-title>Predictable and optimized single-path code for predicated processors</article-title>
          ,
          <source>Journal of Systems Architecture</source>
          (
          <year>2024</year>
          )
          <fpage>103214</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <article-title>Compiler-directed constant execution time on flat memory systems</article-title>
          ,
          <source>in: 2023 IEEE 26th International Symposium on Real-Time Distributed Computing (ISORC)</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>64</fpage>
          -
          <lpage>75</lpage>
          . doi:10.1109/ISORC58943.2023.00019.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>A time predictable instruction cache for a Java processor</article-title>
          ,
          <source>in: On the Move to Meaningful Internet Systems 2004: Workshop on Java Technologies for Real-Time and Embedded Systems (JTRES 2004)</source>
          , volume
          <volume>3292</volume>
          of LNCS
          , Springer, Agia Napa, Cyprus,
          <year>2004</year>
          , pp.
          <fpage>371</fpage>
          -
          <lpage>382</lpage>
          . doi:10.1007/b102133.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>P.</given-names>
            <surname>Degasperi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hepp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Puffitsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>A method cache for Patmos</article-title>
          ,
          <source>in: Proceedings of the 17th IEEE Symposium on Object/Component/Service-oriented Real-time Distributed Computing (ISORC 2014)</source>
          , IEEE, Reno, Nevada, USA,
          <year>2014</year>
          , pp.
          <fpage>100</fpage>
          -
          <lpage>108</lpage>
          . doi:10.1109/ISORC.2014.47.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>B.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hepp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>Scope-based method cache analysis</article-title>
          ,
          <source>in: Proceedings of the 14th International Workshop on Worst-Case Execution Time Analysis (WCET 2014)</source>
          , Madrid, Spain,
          <year>2014</year>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>82</lpage>
          . doi:10.4230/OASIcs.WCET.2014.73.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>S.</given-names>
            <surname>Abbaspour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brandner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>A time-predictable stack cache</article-title>
          ,
          <source>in: Proceedings of the 9th Workshop on Software Technologies for Embedded and Ubiquitous Systems</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>A.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brandner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>Static analysis of worst-case stack cache behavior</article-title>
          ,
          <source>in: Proceedings of the 21st International Conference on Real-Time Networks and Systems (RTNS 2013)</source>
          , ACM, New York, NY, USA,
          <year>2013</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>64</lpage>
          . doi:10.1145/2516821.2516828.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Dengler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dietrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Hepp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Herzog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Huber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Knoop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wiltsche-Prokesch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raffeck</surname>
          </string-name>
          , et al.,
          <article-title>The platin multi-target worst-case analysis tool</article-title>
          ,
          <source>in: 22nd International Workshop on Worst-Case Execution Time Analysis (WCET 2024)</source>
          ,
          <source>Schloss Dagstuhl - Leibniz-Zentrum für Informatik</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pircher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baranyai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lehr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>Accelerator interface for Patmos</article-title>
          ,
          <source>in: 2021 IEEE Nordic Circuits and Systems Conference (NORCAS): NORCHIP and International Symposium of System-on-Chip (SoC)</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>B. C.</given-names>
            <surname>Ward</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Herman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Kenna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. H.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <article-title>Making shared caches more predictable on multicore platforms</article-title>
          ,
          <source>in: 2013 25th Euromicro Conference on Real-Time Systems</source>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>167</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>G.</given-names>
            <surname>Gracioli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Alhammad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mancuso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Fröhlich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pellizzoni</surname>
          </string-name>
          ,
          <article-title>A survey on cache management mechanisms for real-time embedded systems</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>48</volume>
          (
          <year>2015</year>
          )
          <fpage>1</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>A survey of techniques for cache partitioning in multicore processors</article-title>
          ,
          <source>ACM Computing Surveys (CSUR)</source>
          <volume>50</volume>
          (
          <year>2017</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>X.</given-names>
            <surname>Vera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Lisper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xue</surname>
          </string-name>
          ,
          <article-title>Data caches in multitasking hard real-time systems</article-title>
          ,
          <source>in: RTSS 2003. 24th IEEE Real-Time Systems Symposium</source>
          , IEEE,
          <year>2003</year>
          , pp.
          <fpage>154</fpage>
          -
          <lpage>165</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>T.</given-names>
            <surname>Lugo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lozano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Carretero</surname>
          </string-name>
          ,
          <article-title>A survey of techniques for reducing interference in real-time applications on multicore platforms</article-title>
          ,
          <source>IEEE Access</source>
          <volume>10</volume>
          (
          <year>2022</year>
          )
          <fpage>21853</fpage>
          -
          <lpage>21882</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chousein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. N.</given-names>
            <surname>Mahapatra</surname>
          </string-name>
          ,
          <article-title>Fully associative cache partitioning with don't care bits for real-time applications</article-title>
          ,
          <source>ACM SIGBED Review</source>
          <volume>2</volume>
          (
          <year>2005</year>
          )
          <fpage>35</fpage>
          -
          <lpage>38</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Time-sensitivity-aware shared cache architecture for multi-core embedded systems</article-title>
          ,
          <source>The Journal of Supercomputing</source>
          <volume>75</volume>
          (
          <year>2019</year>
          )
          <fpage>6746</fpage>
          -
          <lpage>6776</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <article-title>A survey of techniques for cache locking</article-title>
          ,
          <source>ACM Transactions on Design Automation of Electronic Systems (TODAES)</source>
          <volume>21</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Burns</surname>
          </string-name>
          ,
          <article-title>Writing temporally predictable code</article-title>
          ,
          <source>in: Proceedings of the Seventh IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (WORDS 2002)</source>
          , IEEE Computer Society, Washington, DC, USA,
          <year>2002</year>
          , pp.
          <fpage>85</fpage>
          -
          <lpage>94</lpage>
          . doi:10.1109/WORDS.2002.1000040.
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <article-title>Constant-Loop Dominators for Single-Path Code Optimization</article-title>
          , in: P. Wägemann (Ed.),
          <source>21st International Workshop on Worst-Case Execution Time Analysis (WCET 2023)</source>
          , volume
          <volume>114</volume>
          of Open Access Series in Informatics (OASIcs),
          <source>Schloss Dagstuhl - Leibniz-Zentrum für Informatik</source>
          , Dagstuhl, Germany,
          <year>2023</year>
          , pp.
          <fpage>7:1</fpage>
          -
          <lpage>7:13</lpage>
          . URL: https://drops.dagstuhl.de/opus/volltexte/2023/18436. doi:10.4230/OASIcs.WCET.2023.7.
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <article-title>Compiling for time-predictability with dual-issue single-path code</article-title>
          ,
          <source>Journal of Systems Architecture</source>
          <volume>118</volume>
          (
          <year>2021</year>
          )
          <fpage>1</fpage>
          -
          <lpage>11</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <given-names>E.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <article-title>Two-step register allocation for implementing single-path code</article-title>
          ,
          <source>in: Proceedings of the 2024 IEEE 27th International Symposium on Real-Time Distributed Computing (ISORC)</source>
          , IEEE, United States,
          <year>2024</year>
          . doi:10.1109/ISORC61049.2024.10551362. Conference date: 22-05-2024 through 25-05-2024.
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <given-names>M.</given-names>
            <surname>Platzer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Puschner</surname>
          </string-name>
          ,
          <article-title>A real-time application with fully predictable task timing</article-title>
          ,
          <source>in: 2020 IEEE 23rd International Symposium on Real-Time Distributed Computing (ISORC)</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>43</fpage>
          -
          <lpage>46</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>E.</given-names>
            <surname>Khodadad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pezzarossa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <article-title>Towards Lingua Franca on the Patmos processor</article-title>
          ,
          <source>in: Proceedings of the 2024 IEEE 27th International Symposium on Real-Time Distributed Computing (ISORC)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schoeberl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Khodadad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. J.</given-names>
            <surname>Maroun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pezzarossa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Lee</surname>
          </string-name>
          , Invited Paper:
          <article-title>Worst-Case Execution Time Analysis of Lingua Franca Applications</article-title>
          , in: T. Carle (Ed.),
          <source>22nd International Workshop on Worst-Case Execution Time Analysis (WCET 2024)</source>
          , volume
          <volume>121</volume>
          of Open Access Series in Informatics (OASIcs),
          <source>Schloss Dagstuhl - Leibniz-Zentrum für Informatik</source>
          , Dagstuhl, Germany,
          <year>2024</year>
          , pp.
          <fpage>4:1</fpage>
          -
          <lpage>4:13</lpage>
          . doi:10.4230/OASIcs.WCET.2024.4.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>