<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Leveraging the MLIR Infrastructure for the Computing Continuum</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiahong Bi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guilherme Korol</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jeronimo Castrillon</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität Dresden (TUD)</institution>
          ,
          <addr-line>Helmholtzstraße 18, 01069, Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
<p>With an ever-increasing number of connected devices (e.g., IoT), cloud computing faces efficiency challenges due to complex infrastructure, high communication costs, and privacy concerns. Fog and edge computing enable computing closer to data sources, offering alternatives to the limitations of relying exclusively on the cloud. When combined with high-performance cloud platforms, fog and edge devices form a computing continuum. However, the continuum challenges designers, who need to compile and deploy on distributed and heterogeneous devices and optimize for a diverse set of non-functional requirements. To ease the usage and ensure the full potential of the continuum, a Design and Programming Environment (DPE) that is interoperable, reusable, portable, and cross-layer is needed. In this context, the Multi-Level Intermediate Representation (MLIR) becomes vital since it provides an extensible and reusable compiler infrastructure. This paper discusses, as a work in progress, the development of a continuum-oriented DPE that leverages the MLIR infrastructure.</p>
      </abstract>
      <kwd-group>
        <kwd>Computing continuum</kwd>
        <kwd>Domain Specific Language</kwd>
        <kwd>Compiler Optimizations</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Cloud computing has emerged as a critical technology in the industry over the past years
due to its flexibility in managing information and resources across the Internet. It has also
relieved users from the burden of configuring their working environments, allowing them to
reduce infrastructural costs. However, in recent years, the rise of Artificial Intelligence (AI)
related technologies and the Internet-of-Things (IoT) has made relying solely on cloud-based
computing increasingly challenging. This is due to the significant energy consumption and
communication costs associated with real-time interactions between the cloud and devices.
New computing paradigms, such as fog computing and edge computing, have been introduced
as extensions. These approaches aim to address the limitations of cloud-based computing
by distributing computational tasks closer to the data source. Cloud, edge, and fog form a
computing continuum [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], posing new challenges, such as partitioning an application between
nodes, compiling applications to these distributed and heterogeneous devices, and seamlessly
and efficiently migrating workloads across the continuum.
      </p>
      <p>
        The MYRTUS [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] project aims to address these challenges. More specifically, MYRTUS aims to
provide the technology to enable cyber-physical systems to evolve towards a living dimension,
contributing to integrating edge, fog, and cloud computing platforms into a seamless execution
environment and providing languages and tools to orchestrate collaborative, distributed, and
decentralized components. One key component of the MYRTUS project is a so-called DPE, which
deals with multiple aspects of high-level application modeling, model-based design and synthesis,
and high-level compilation for adaptable execution on heterogeneous resources. This paper
describes initial research and plans for the high-level compilation framework of the DPE known
as Node-Level Optimization and Deployment (NLOP). Notably, the proposed NLOP addresses
the compilation in the following aspects:
• Interoperate with model-based frameworks for automatic code generation and deployment to ensure interoperability;
• Integrate with productivity-oriented programming frameworks and Python-based Domain-Specific Languages (DSLs) to enhance developer efficiency and ease of use;
• Support different architectures with a focus on accelerators for efficient processing, such as Coarse-Grained Reconfigurable Architectures (CGRAs) and Field-Programmable Gate Arrays (FPGAs);
• Provide automatic insertion of adaptivity knobs for runtime adaptation.</p>
      <p>The MLIR project will be leveraged to develop the features above in a unified compilation
flow. MLIR offers a framework that we can extend to the needs of the continuum, reusing
state-of-the-art compilation tools like the Low-Level Virtual Machine (LLVM) backends and
optimizers, supporting hardware heterogeneity, and integrating external tools, all of which will
facilitate the construction of the DPE’s NLOP.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        This section presents a brief background on the concepts and tools that will serve to develop the
NLOP. This includes a brief introduction to MLIR, the initial MLIR-based infrastructure developed
in a previous EU project, and the fundamentals of adaptable models of computation.
2.1. MLIR
MLIR is a promising framework for constructing reusable and extensible compiler infrastructure.
It aims to tackle software fragmentation, enhance compilation for diverse hardware systems,
considerably lower the expenses associated with developing domain-specific compilers, and
facilitate the integration of existing compilers. It extends the monolithic LLVM IR into multi-level
abstractions, each of which serves its own purpose and has specific functionalities, such as
arith for arithmetic operations and linalg for linear algebra. The infrastructure of MLIR
makes it possible to seamlessly transition from abstract, high-level representations to concrete
executable code. This makes MLIR an enabler for the eficient implementation of DSLs and other
programming constructs [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>In MLIR, roughly speaking, everything is an operation. These operations live
in different dialects, each serving its own abstraction level. MLIR allows users to define custom
dialects, along with custom types, interfaces, etc. One of the most important pieces of infrastructure
within MLIR is the Pass, with which one dialect can be lowered or converted to another by providing
a set of rewrite patterns.</p>
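      <p>The idea of lowering via rewrite patterns can be sketched with a toy analogy in plain Python. The dialect and operation names here (mydsl, and the tuple encoding of operations) are illustrative only; this is a conceptual sketch, not MLIR's actual C++ pattern API.</p>

```python
# Schematic analogy of MLIR-style pattern rewriting: each "operation" is
# a (dialect, name, operands) tuple, and a rewrite pattern either returns
# a rewritten operation or None when it does not match.
def lower_add(op):
    dialect, name, operands = op
    if (dialect, name) == ("mydsl", "add"):
        # lower the custom-dialect op into the arith dialect
        return ("arith", "addi", operands)
    return None

def run_pass(ops, patterns):
    """Apply patterns to every operation until a fixpoint is reached."""
    changed = True
    while changed:
        changed = False
        for i, op in enumerate(ops):
            for pattern in patterns:
                new_op = pattern(op)
                if new_op is not None and new_op != op:
                    ops[i] = new_op
                    changed = True
    return ops

program = [("mydsl", "add", ["%a", "%b"]), ("arith", "muli", ["%c", "%d"])]
lowered = run_pass(program, [lower_add])
print(lowered)
```

      <p>A real MLIR pass works on a structured IR rather than a flat list, but the driving loop is the same in spirit: patterns are applied greedily until no more rewrites fire.</p>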
      <sec id="sec-2-1">
        <title>2.2. System Development Kit of EVEREST</title>
        <p>[Figure 1: The dialects of the EVEREST SDK: external frontends (onnx, torch, tosa, quant), EVEREST frontends (cfdlang, base2, ekl, Condrust/ohua), entry dialects (jabbah, esn, teil, cyclic), upstream MLIR dialects (tensor, affine, linalg, buffer, ub), coordination/integration/backend dialects (dfg, evp, olympus), arithmetic support (bit), and external backends (gpu, hw, fsm).]</p>
        <p>A compelling example of how the MLIR multi-level abstractions can be leveraged is demonstrated in the System Development Kit (SDK) of the EVEREST project.</p>
        <p>
          EVEREST is an H2020 EU project that aims to simplify the development of complex big data applications for FPGA-based data centers [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. The EVEREST SDK is a framework designed to optimize selected kernels in the application workflow [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Built upon MLIR, the SDK brings different input languages into a unified system and hardware generation flow, connecting to different downstream High-Level Synthesis (HLS) tools. Several dialects, optimizations, and abstraction lowerings are implemented for this workflow.
        </p>
        <p>[Figure 2a: An HSDF node in dfg-mlir.]
dfg.operator @add
   inputs(%in0: i32, %in1: i32)
   outputs(%sum: i32)
{
   %0 = arith.addi %in0, %in1 : i32
   dfg.output %0 : i32
}</p>
        <p>
          The main dialects and their relations are shown in Figure 1. Machine Learning (ML)
applications from tvm can be translated into the jabbah dialect. The SDK also includes
a dialect for the kernel language frontend (ekl), the coordination dialect dfg-mlir, and the DSL
cfdlang. ekl and cfdlang can be converted to an MLIR implementation of the intermediate
tensor language teil [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ] and a dialect for Einstein notation esn. These abstractions are
employed to execute a series of transformations. The EVEREST MLIR stack demonstrates the
Multi-Level abstraction methodology to deploy applications within a cluster with FPGAs [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In
MYRTUS, we will build on top of these abstractions, extend them, and enable deployment on the
computing continuum.
        </p>
        <sec id="sec-2-1-1">
          <title>2.2.1. The dfg-mlir Dialect</title>
          <p>The dfg-mlir of the EVEREST SDK will be extended to cater for requirements in MYRTUS. In the
dfg-mlir dialect, a user can define a Homogeneous Synchronous Data-Flow (HSDF) node using
a custom operation dfg.operator (see Figure 2a). Users can define input and output ports and
perform any operations with them. The definition of ports is followed by an MLIR region with
only one block. Users can use any MLIR operation inside this region such as arith.addi from
the arith dialect. The returned result is an MLIR Value that can be used in other operations as
operands. An Output operation indicates which values should be output. The input/output
ports are connected to channels, which are implicitly pulled/pushed from/to at the beginning
and end of the region.</p>
          <p>For broader modelling of a Data-Flow Graph (DFG), dfg-mlir also provides a Process
operation. This operation has a similar syntax to an Operator but is capable of describing a
Kahn Process Network (KPN) node, which means that users can pull/push from/to the channels
multiple times. There is a Pass inside dfg-mlir, which can convert every Operator to the
equivalent Process operation, as shown in Figure 2b.</p>
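          <p>The difference between the two operations can be sketched with a small, hypothetical Python model (a sketch of the semantics, not the dialect's real implementation): an Operator body fires once per set of input tokens, while its Process equivalent performs the pulls and pushes explicitly inside a loop.</p>

```python
# Hypothetical model of the Operator-to-Process conversion: wrapping an
# HSDF operator body into a KPN-style process that pulls one token per
# input port and pushes one result per firing, inside a loop.
from collections import deque

def make_process(operator_body):
    def process(in_channels, out_channel, max_firings):
        for _ in range(max_firings):
            if not all(in_channels):
                break  # a real KPN process would block on an empty channel
            tokens = [ch.popleft() for ch in in_channels]  # pull once per port
            out_channel.append(operator_body(*tokens))     # push the result
    return process

add = make_process(lambda a, b: a + b)  # body of an addition operator
ins = [deque([1, 2, 3]), deque([10, 20, 30])]
out = deque()
add(ins, out, max_firings=10)
print(list(out))  # [11, 22, 33]
```

          <p>A genuine Process is more general than this wrapper: it may pull and push multiple times per iteration, which is exactly what the Operator form cannot express.</p>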
          <p>
            dfg-mlir supports different hardware platforms, enabling parallel execution of DFGs. The
CPU backend, for instance, lowers to OpenMP. For hardware generation, dfg-mlir can be
lowered to Olympus [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] which, with the help of the Bambu [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] HLS tool, can deploy the graph
onto a CPU-FPGA heterogeneous system. An extended FPGA backend was introduced in [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ],
which allows for a more generic execution on FPGAs using the CIRCT project as backend.
2.3. Adaptable Models of Computation
dfg-mlir demonstrates the usage of the dataflow Model of Computation (MoC), which depicts
systems as graphs of computational entities and communication channels. MoCs introduce
an alternative to traditional programming methods for fully leveraging highly heterogeneous
platforms such as the ones in the MYRTUS continuum. However, mapping DFGs is a widely
studied yet unsolved challenge [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. Traditionally, mappings can be determined at design time
using Design Space Exploration (DSE) or at runtime based on the current workload of the
hardware, managed by a Runtime Manager (RM). Currently, dfg-mlir relies on the system’s
RM. To leverage both mapping methods, the Hybrid Application Mapping (HAM) approach
was introduced to find near-optimal mappings at design time and adapt to workload changes
at runtime [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ]. Taking this further, Khasanov et al. recently enhanced HAM by leveraging
a genetic algorithm to find spatio-temporal mappings for the MoC [
            <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
            ]. This approach
considers expected workload changes and generates more efficient mappings.
          </p>
          <p>
            For DSE, tools like Mocasin can be utilized. Mocasin [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ] is an open-source research
environment to explore mapping algorithms and novel data structures representing the mapping
space. Mocasin features an abstract modular architecture encompassing commonly used DFG
MoCs and the related tool flows, enabling the composition of these flows. There is an integrated
high-level simulator that can generate a tracing file, with which users can check the execution of
each node in the DFG. Mocasin can run DSE to find the Pareto points in the design space based
on the DFG and platform. Objectives can be selected from execution time, energy consumption,
and resource utilization. Within Mocasin, users can freely design their platforms using the
provided infrastructure, such as the definition of clusters, Processing Elements (PEs), and
Network on Chip (NoC). Mocasin also allows users to define their own MoC input, for instance,
a custom format in YAML.
          </p>
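          <p>The Pareto-point selection at the heart of such a DSE can be illustrated with a generic sketch. This shows the general technique, not Mocasin's actual interface; the candidate mappings and objective values below are made up for illustration.</p>

```python
# Generic Pareto-front filter over candidate mappings, each scored by
# objectives to be minimized (e.g., execution time and energy).
def pareto_front(points):
    """Keep every point that no other point dominates in all objectives."""
    front = []
    for p in points:
        dominated = any(
            q != p and all(p[i] >= q[i] for i in range(len(p)))
            for q in points
        )
        if not dominated:
            front.append(p)
    return front

# (exec_time_ms, energy_mJ) for four candidate mappings of the same DFG
mappings = [(10.0, 5.0), (8.0, 7.0), (12.0, 6.0), (9.0, 4.0)]
print(pareto_front(mappings))  # [(8.0, 7.0), (9.0, 4.0)]
```

          <p>The two surviving points trade execution time against energy; the dominated mappings are worse in both objectives and can be discarded from the design space.</p>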
          <p>Dataflow lacks reactive behavior to inputs from the environment, which is key in the context of
Cyber-Physical Systems (CPSs). Recently, Lingua Franca [17] emerged as a coordination language
for CPSs, extending dataflow with a discrete-event model with explicit
semantics of time [18]. Lingua Franca adopts the reactor model [19] and supports various
runtimes capable of concurrent and distributed execution. The reactor model also supports
topological changes to the underlying dataflow graph for adaptable execution. In MYRTUS, we will
borrow ideas from Lingua Franca to enable reactive and adaptive execution in the computing
continuum.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Work-in-Progress</title>
      <p>This section gives an overview of the ongoing work and plans for the MYRTUS DPE’s NLOP.
First, a general overview of the NLOP, including its main components, inputs, and outputs, is
given. Next, we detail two work fronts currently underway for extending the dfg-mlir dialect
for the NLOP.</p>
      <sec id="sec-3-1">
        <title>3.1. General Overview</title>
        <p>Since PyTorch models can be translated into the torch-mlir dialect, all inputs (C/C++ or
PyTorch) are converted into the MLIR domain. Additionally, NLOP will support techniques such
as in [21] in MLIR to further optimize the computation of PyTorch models. These techniques are
applied at the PyTorch level to save execution time and energy.</p>
        <p>Subsequently, passes will be implemented to translate MLIR programs from the previous step
into custom dialects (see Figure 3), including dfg-mlir. This process includes automatic DFG
recognition and generation, converting them to the custom dialects while maintaining the same
semantics. Once the program in custom dialects is generated, several analyses and optimizations
will be performed to obtain the quasi-optimal DFG based on a cost function or similar technology.
To that end, Mocasin will be used to run simulations and DSE to identify the best mapping
and partitioning for deployment on heterogeneous nodes. Finally, the NLOP generates low-level
Intermediate Representation (IR) or code for the supported FPGA platforms with MDC [22] and
CGRAs with STRELA [23].
3.2. dfg-mlir with Mocasin
We assume that all the nodes in the generated dfg-mlir program will be Operators, which
means that from/to each port, we only pull/push one data item in each iteration. However, there is a
limitation with the Operator operation shown in Figure 2a: it lacks the ability to take values
from the previous iteration. With the syntax of Static Single Assignment (SSA), it is illegal to
directly use the result value as an operand. This limits our ability to translate a wider range of
applications into an Operator in dfg-mlir (e.g., a Multiply Accumulate (MAC) operation).
To address this issue, we introduced the iteration arguments syntax to Operator. As shown in Figure 4,
an iter_args list can be added after defining the input and output ports. If this list is present, an
initialize region must be appended to provide the initial values for each iteration argument. In the
body region, these iteration arguments can be used like any other values in any operation. To pass the
result of the current iteration to the next, a Yield is used to update the iteration arguments at the
end of the Operator.
[Figure 4: Iteration arguments support.]
dfg.operator @mac
   inputs(%in0: i32, %in1: i32)
   outputs(%out: i32)
   iter_args(%sum: i32)
initialize {
   %0 = arith.constant 0 : i32
   dfg.yield %0 : i32
} {
   %0 = arith.muli %in0, %in1 : i32
   %1 = arith.addi %0, %sum : i32
   dfg.output %1 : i32
   dfg.yield %1 : i32
}</p>
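        <p>The carried-state semantics of the MAC operator can be mimicked in plain Python as a sanity check of the intended behavior (a sketch of the semantics, not generated code): the value yielded by one firing becomes the %sum argument of the next, starting from the value produced by the initialize region.</p>

```python
# Python model of the @mac operator's iter_args semantics: the yielded
# sum of one firing becomes the %sum iteration argument of the next.
def run_mac(in0_stream, in1_stream):
    acc = 0  # initialize region: arith.constant 0, yielded as initial %sum
    outputs = []
    for a, b in zip(in0_stream, in1_stream):
        prod = a * b           # %0 = arith.muli %in0, %in1
        acc = prod + acc       # %1 = arith.addi %0, %sum
        outputs.append(acc)    # dfg.output %1
        # dfg.yield %1: the updated sum is carried to the next firing
    return outputs

print(run_mac([1, 2, 3], [4, 5, 6]))  # [4, 14, 32]
```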
        <p>To integrate with Mocasin, we implemented a YAML reader as well as a new CGRA platform. Within
LLVM, we developed a transformation pass that outputs the internal dataflow of an Operator
to a YAML file. This file contains the information on each node, their ports, and the channels
connecting them. If there are iteration arguments, an initial token will be added to the channel,
representing a backedge in the loop graphically.</p>
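        <p>A sketch of such an export is shown below. The field names are hypothetical (the actual YAML schema of the dfg-mlir/Mocasin integration may differ), but it illustrates how an iteration argument becomes a self-channel carrying one initial token, i.e., the backedge.</p>

```python
# Hypothetical sketch of exporting an Operator's dataflow for Mocasin.
# Field names are illustrative; the real exporter's schema may differ.
def export_dataflow(name, inputs, outputs, iter_args):
    channels = [{"to": (name, p)} for p in inputs]
    channels += [{"from": (name, p)} for p in outputs]
    for arg in iter_args:
        # The backedge: a channel from the node to itself, preloaded with
        # one initial token so the first firing can consume a value.
        channels.append(
            {"from": (name, arg), "to": (name, arg), "initial_tokens": 1}
        )
    return {
        "nodes": [{"name": name, "ports": inputs + outputs + iter_args}],
        "channels": channels,
    }

graph = export_dataflow("mac", ["in0", "in1"], ["out"], ["sum"])
backedges = [c for c in graph["channels"] if c.get("initial_tokens")]
print(backedges)  # one self-channel for the %sum iteration argument
```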
        <p>As mentioned in Section 2, we will expand the semantics of the underlying MoC to account
for runtime adaptivity and reactive behavior.
3.3. dfg-mlir atop CIRCT
Currently, CIRCT relies on Polygeist [20] to read C/C++ programs into MLIR. Each function
is then turned into CIRCT’s entry dialect, called handshake. Passes are available to lower
handshake into low-level dialects within CIRCT, ultimately generating SystemVerilog code.
However, handshake can only describe HSDFs, as it assumes that users will only pull/push
one data item from/to the ports by default. More expressive computational graphs coming from
high-level DSLs, e.g., using Synchronous Data-Flow (SDF) or KPN semantics, cannot use CIRCT
as a backend at the moment.</p>
        <p>As discussed in Section 2, dfg-mlir supports more expressive MoCs (ProcessOp for KPNs).
Within dfg-mlir, we have also implemented passes that can directly generate low-level dialects
in CIRCT before generating the SystemVerilog code, such as fsm for Finite State Machine (FSM)
generation and sv for SystemVerilog syntax. dfg-mlir also uses an elastic circuit for each port,
the same as handshake, meaning each port will be converted into three signals: valid, data, and
ready. For the multiple pulls/pushes behavior in a KPN, we will generate an FSM to correctly
handle the elastic circuit signals.</p>
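        <p>The role of the three signals can be illustrated with a toy cycle-level model (a conceptual sketch in Python, not the generated hardware): a token crosses the channel only in cycles where valid and ready are asserted together.</p>

```python
# Toy cycle-by-cycle model of an elastic (valid/data/ready) port: a
# transfer happens only when the producer's valid and the consumer's
# ready are both high in the same cycle.
def simulate(producer_tokens, consumer_ready_per_cycle):
    sent = 0
    received = []
    for ready in consumer_ready_per_cycle:
        valid = len(producer_tokens) > sent  # producer has a token to offer
        if valid and ready:                  # handshake completes this cycle
            received.append(producer_tokens[sent])
            sent += 1
    return received

# The consumer stalls (ready=0) in cycles 1 and 3; tokens still arrive
# in order, just later, thanks to the elastic handshake.
print(simulate([10, 20, 30], [1, 0, 1, 0, 1]))  # [10, 20, 30]
```

        <p>The FSM we generate for a KPN process plays the consumer/producer role in this exchange, asserting ready and valid across the multiple pulls and pushes of one process iteration.</p>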
        <p>Another consideration is that handshake only inserts a buffer operation, with a capacity
of two elements, between two ports for synchronizing different modules. In contrast, we implement
a more flexible channel inspired by Chisel [24]. Currently, the channel’s capacity is manually
controlled. By utilizing the work of Josipović et al. [25], we aim to improve the sizing of channels
automatically. With all these features, our approach allows users to have a more powerful way
to describe a wider range of DFGs. In the project, this will be extended to support reactive and
adaptive execution.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>In this paper, we introduced current efforts to implement the NLOP phase of the MYRTUS DPE.
Naturally, some challenges remain to be tackled throughout the NLOP development, including:
• System integration: This involves connecting application components, utilizing various abstractions, and navigating among transformations and trade-offs in heterogeneous and distributed computing environments, with code generation from custom MLIR dialects being the final step.
• CGRA mapping: The integration of dfg-mlir and Mocasin for CGRA mapping exploration will not be limited to supporting specific architectures such as STRELA in Figure 3. The approach will support different CGRAs by accepting architecture properties.
• CIRCT extension: CIRCT offers an alternative approach to HLS but has some limitations that must be addressed. For instance, the handshake dialect can adopt the dfg-mlir semantics. Another critical extension is the support for pipelining, which is typically available in HLS tools as pragmas. To avoid vendor lock-in, how to automatically apply different optimization pragmas will also be explored.
• Adaptive MoC execution: Applying HAM at the MLIR level and adopting Lingua Franca’s time semantics could also be explored.</p>
      <p>By addressing these points, the DPE’s NLOP can fully leverage the resources of the continuum.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was funded by the EU Horizon Europe Programme under grant agreement No
101135183 (MYRTUS). Views and opinions expressed are however those of the author(s) only
and do not necessarily reflect those of the European Union. Neither the European Union nor
the granting authority can be held responsible for them.</p>
      <p>[16] Mocasin—rapid prototyping of rapid prototyping tools: A framework for exploring new approaches in mapping software to heterogeneous multi-cores, in: Proceedings of the 2021 Drone Systems Engineering and Rapid Simulation and Performance Evaluation: Methods and Tools Proceedings, 2021, pp. 66–73.
[17] C. Menard, M. Lohstroh, S. Bateni, M. Chorlian, A. Deng, P. Donovan, C. Fournier, S. Lin, F. Suchert, T. Tanneberger, H. Kim, J. Castrillon, E. A. Lee, High-performance deterministic concurrency using Lingua Franca, ACM Transactions on Architecture and Code Optimization (TACO) 20 (2023) 1–29. doi:10.1145/3617687.
[18] M. Lohstroh, C. Menard, A. Schulz-Rosengarten, M. Weber, J. Castrillon, E. A. Lee, A language for deterministic coordination across multiple timelines, in: 2020 Forum for Specification and Design Languages (FDL), 2020, pp. 1–8. doi:10.1109/FDL50818.2020.9232939.
[19] M. Lohstroh, Í. Í. Romero, A. Goens, P. Derler, J. Castrillon, E. A. Lee, A. Sangiovanni-Vincentelli, Reactors: A deterministic model for composable reactive systems, in: R. Chamberlain, M. Edin Grimheden, W. Taha (Eds.), Cyber Physical Systems. Model-Based Design: Proceedings of the 9th Workshop on Design, Modeling and Evaluation of Cyber Physical Systems (CyPhy 2019) and the Workshop on Embedded and Cyber-Physical Systems Education (WESE 2019), Springer International Publishing, Cham, 2020, pp. 59–85. doi:10.1007/978-3-030-41131-2_4.
[20] W. S. Moses, L. Chelini, R. Zhao, O. Zinenko, Polygeist: Raising C to polyhedral MLIR, in: 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), IEEE, 2021, pp. 45–59.
[21] G. Korol, M. G. Jordan, M. B. Rutzig, J. Castrillon, A. C. S. Beck, Pruning and early-exit co-optimization for CNN acceleration on FPGAs, in: 2023 Design, Automation and Test in Europe Conference and Exhibition (DATE), 2023, pp. 1–6. doi:10.23919/DATE56975.2023.10137244.
[22] F. Manca, F. Ratto, F. Palumbo, ONNX-to-hardware design flow for adaptive neural-network inference on FPGAs, arXiv preprint arXiv:2406.09078 (2024).
[23] D. Vazquez, J. Miranda, A. Rodriguez, A. Otero, P. D. Schiavone, D. Atienza, STRELA: Streaming elastic CGRA accelerator for embedded systems, arXiv preprint arXiv:2404.12503 (2024).
[24] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avižienis, J. Wawrzynek, K. Asanović, Chisel: Constructing hardware in a Scala embedded language, in: Proceedings of the 49th Annual Design Automation Conference, 2012, pp. 1216–1225.
[25] L. Josipović, S. Sheikhha, A. Guerrieri, P. Ienne, J. Cortadella, Buffer placement and sizing for high-performance dataflow circuits, ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15 (2021) 1–32.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Kimovski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Mathá</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hammer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Mehran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Hellwagner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Prodan</surname>
          </string-name>
          , Cloud, fog, or edge: Where to compute?,
          <source>IEEE Internet Computing</source>
          <volume>25</volume>
          (
          <year>2021</year>
          )
          <fpage>30</fpage>
          -
          <lpage>36</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Palumbo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. K.</given-names>
            <surname>Zedda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fanni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bagnato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Castello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Ponte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Driessen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fadda</surname>
          </string-name>
          , et al.,
          <article-title>Myrtus: Multi-layer 360 dynamic orchestration and interoperable design environment for compute-continuum systems</article-title>
          ,
          <source>in: Proceedings of the 21st ACM International Conference on Computing Frontiers Workshops and Special Sessions</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>101</fpage>
          -
          <lpage>106</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lattner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Amini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>U.</given-names>
            <surname>Bondhugula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pienaar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Riddle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Shpeisman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Vasilache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Zinenko</surname>
          </string-name>
          , Mlir:
          <article-title>Scaling compiler infrastructure for domain specific computation</article-title>
          ,
          <source>in: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>2</fpage>
          -
          <lpage>14</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pilato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Beránek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brocheton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cevasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Curzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ferrandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. F.</given-names>
            <surname>Friebel</surname>
          </string-name>
          , et al.,
          <article-title>A system development kit for big data applications on fpga-based clusters: The everest approach</article-title>
          , in: 2024 Design,
          <article-title>Automation and Test in Europe Conference and Exhibition (DATE)</article-title>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pilato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bohm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Brocheton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cevasco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Cima</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Diamantopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ferrandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Palermo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paolino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Parodi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Pittaluga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Raho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Regazzoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Slaninova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hagleitner</surname>
          </string-name>
          ,
          <article-title>EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms</article-title>
          ,
          <source>in: Proceedings of the 2021 Design, Automation and Test in Europe Conference (DATE)</source>
          ,
          <source>DATE'21</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1320</fpage>
          -
          <lpage>1325</lpage>
          . URL: https://ieeexplore.ieee.org/document/9473940. doi:10.23919/DATE51398.2021.9473940.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Rink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <article-title>TeIL: a type-safe imperative Tensor Intermediate Language</article-title>
          ,
          <source>in: Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming (ARRAY)</source>
          ,
          <source>ARRAY</source>
          <year>2019</year>
          , ACM, New York, NY, USA,
          <year>2019</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>68</lpage>
          . URL: http://doi.acm.org/10.1145/3315454.3329959. doi:10.1145/3315454.3329959.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Susungi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Rink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Tadonki</surname>
          </string-name>
          ,
          <article-title>Meta-programming for cross-domain tensor optimizations</article-title>
          ,
          <source>in: Proceedings of 17th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences (GPCE'18)</source>
          ,
          <source>GPCE</source>
          <year>2018</year>
          , ACM, New York, NY, USA,
          <year>2018</year>
          , pp.
          <fpage>79</fpage>
          -
          <lpage>92</lpage>
          . URL: http://doi.acm.org/10.1145/3278122.3278131. doi:10.1145/3278122.3278131.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Soldavini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. F. A.</given-names>
            <surname>Friebel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tibaldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hempel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pilato</surname>
          </string-name>
          ,
          <article-title>Automatic creation of high-bandwidth memory architectures from domain-specific languages: The case of computational fluid dynamics</article-title>
          ,
          <source>ACM Transactions on Reconfigurable Technology and Systems (TRETS) 16</source>
          (
          <year>2023</year>
          ). URL: https://doi.org/10.1145/3563553. doi:10.1145/3563553.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Soldavini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pilato</surname>
          </string-name>
          ,
          <article-title>Platform-aware FPGA system architecture generation based on MLIR</article-title>
          ,
          <source>arXiv preprint arXiv:2309.12917</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Ferrandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. G.</given-names>
            <surname>Castellana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Curzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Fezzardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fiorito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lattuada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Minutoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pilato</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Tumeo</surname>
          </string-name>
          ,
          <article-title>Bambu: an open-source research framework for the high-level synthesis of complex applications</article-title>
          ,
          <source>in: 2021 58th ACM/IEEE Design Automation Conference (DAC)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1327</fpage>
          -
          <lpage>1330</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bi</surname>
          </string-name>
          ,
          <article-title>A lowering for high-level data flows to reconfigurable hardware</article-title>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Desnos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Menard</surname>
          </string-name>
          ,
          <source>Dataflow Models of Computation for Programming Heterogeneous Multicores</source>
          , Springer Nature Singapore, Singapore,
          <year>2023</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>40</lpage>
          . URL: https://doi.org/10.1007/978-981-15-6401-7_45-2. doi:10.1007/978-981-15-6401-7_45-2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>R.</given-names>
            <surname>Khasanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <article-title>Energy-efficient runtime resource management for adaptable multi-application mapping</article-title>
          ,
          <source>in: Proceedings of the 2020 Design, Automation and Test in Europe Conference (DATE)</source>
          ,
          <source>DATE '20</source>
          , IEEE,
          <year>2020</year>
          , pp.
          <fpage>909</fpage>
          -
          <lpage>914</lpage>
          . URL: https://ieeexplore.ieee.org/document/9116381. doi:10.23919/DATE48585.2020.9116381.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Khasanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dietrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <article-title>Flexible spatio-temporal energy-efficient runtime management</article-title>
          ,
          <source>in: 29th Asia and South Pacific Design Automation Conference (ASPDAC'24)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>777</fpage>
          -
          <lpage>784</lpage>
          . URL: https://ieeexplore.ieee.org/document/10473885. doi:10.1109/ASP-DAC58780.2024.10473885.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Smejkal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khasanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Härtig</surname>
          </string-name>
          ,
          <article-title>E-Mapper: Energy-efficient resource allocation for traditional operating systems on heterogeneous processors</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2406.18980. arXiv:2406.18980.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Menard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Goens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hempel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khasanov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Robledo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Teweleitt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castrillon</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>