<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>ML-Based Modeling and Virtualization of Reconfigurable Multi-Accelerator Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Encinas</string-name>
          <email>juan.encinas@upm.es</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Centro de Electrónica Industrial, Universidad Politécnica de Madrid</institution>
          ,
          <addr-line>Calle de José Gutiérrez Abascal 2, Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>This thesis focuses on providing reconfigurable multi-accelerator systems with the ability to self-adapt at run-time to the conditions and requirements of an IoT environment, in a way that is transparent to the user. To this end, we have carried out an offline characterisation of the power consumption and performance of this type of system, developing a monitoring infrastructure and producing predictive models based on machine learning techniques, with very promising results. Current work focuses on turning this characterisation into online modeling that, together with an already developed management infrastructure, allows this approach to be evaluated and validated in a realistic test environment. In the future, we will develop virtualization techniques for reconfigurable multi-accelerator systems that allow the hardware to be shared among multiple tenants and applications, managing resources optimally and transparently for the user while guaranteeing the performance, privacy and security of the system.</p>
      </abstract>
      <kwd-group>
        <kwd>Multi-Accelerator Systems</kwd>
        <kwd>Reconfigurable Computing</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>System Modeling</kwd>
        <kwd>Virtualization Techniques</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation and Objectives of the Thesis</title>
      <p>
        The goal of this thesis is to develop design methodologies, support tools and decision-making
algorithms that provide reconfigurable multi-accelerator systems with the ability to adapt
autonomously and at run-time to varying application conditions, environments and input data, in
a way that is transparent to the user. Thanks to this self-adaptation capability, FPGA-based systems
can be used as accelerators capable of handling computational requests from sensors or devices
at the edge of the Internet of Things (IoT). This mechanism, known as computation offloading,
allows processing to be brought closer to the points where the data is produced, offering higher
processing performance, greater privacy, lower latency and lower power consumption than
offloading to a remote cloud [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. This gives rise to a scenario in which FPGA systems offer their
processing power to the network, which is referred to as Acceleration-as-a-Service (AaaS) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>For FPGA-based systems to operate in this type of scheme, two challenges have been
identified: decision making to optimally manage the available reconfigurable resources,
and the virtualization of the FPGA’s logic resources, which isolates the application
developer from the low-level details of the reconfigurable device used.</p>
      <sec id="sec-1-1">
        <title>1.1. Real-Time Modeling and Management of Reconfigurable Multi-Accelerator Systems</title>
        <p>
          Including hardware accelerators in processing systems improves their
performance, both in terms of execution time and energy efficiency [
          <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
          ]. When multiple accelerators
coexist to run one or more applications, we speak of multi-accelerator systems.
        </p>
        <p>If all tasks to be accelerated in hardware are known at design time, they
can be simulated and modeled analytically as part of the Design Space Exploration (DSE) process.
However, in the computation offloading scenarios detailed above, neither the hardware accelerators
that will be required throughout the lifetime of the system nor the instants
at which they will be demanded by the various network edge elements can be known in advance, as these elements
vary their behavior based on the data received or on environmental conditions. For this reason,
reconfigurable multi-accelerator systems working in this type of scenario must be able to adapt
dynamically, ensuring that the system always works at its optimal point, both from the
perspective of energy consumption and of the throughput achieved.</p>
        <p>To optimally manage the available reconfigurable resources, each of the system’s accelerators
must be modeled beforehand. However, modeling the accelerators
with analytical techniques is extremely complex, since each running accelerator
interferes with the others through shared elements such as memories,
controllers and on-chip communication buses. Therefore, each combination of accelerators
must be characterized.</p>
        <p>This is why this thesis proposes developing models of multi-accelerator systems
using machine learning algorithms. These models should be updated with the data produced by
the execution of new combinations of accelerators, or even of new accelerator functionalities. In
addition, a monitoring infrastructure will be designed to allow run-time performance
and power consumption measurements in this type of system, using the acquired data to train the
aforementioned models.</p>
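        <p>The role these models play can be illustrated with a deliberately simplified sketch: below, a linear power model is fitted by ordinary least squares to synthetic trace samples indexed by accelerator combination. The data, the feature encoding and the linear form are illustrative assumptions only; the thesis models are ML-based and trained on measured traces.</p>

```python
# Minimal sketch: fitting a power model from (hypothetical) monitoring traces
# of accelerator combinations. Synthetic data, standard library only.

def fit_linear(X, y):
    """Solve the normal equations (X^T X) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(len(X))) for j in range(n)]
         for i in range(n)]
    b = [sum(X[k][i] * y[k] for k in range(len(X))) for i in range(n)]
    for col in range(n):                       # forward elimination, partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * n                              # back-substitution
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w

# Synthetic "traces": each sample is (bias, #instances of kernel A, #instances
# of kernel B) with a made-up average power in watts.
X = [[1, a, b] for a in range(4) for b in range(4)]
y = [0.5 + 0.30 * a + 0.20 * b for _, a, b in X]

w = fit_linear(X, y)
predicted = w[0] + w[1] * 2 + w[2] * 3   # predicted power for 2xA + 3xB
```

        <p>In practice, richer features (accelerator types, clock configuration, interference terms) and non-linear ML models would replace the linear fit; the workflow of training on traces and then predicting unseen combinations stays the same.</p>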
        <p>Based on the extracted models, machine learning based decision making algorithms will also
be proposed, which will be able to select the optimal working point of the system at any given
time for a given configuration. The metrics obtained will be fed back to the models, to provide
them with the ability to be incrementally updated, so that the reconfigurable multi-accelerator
systems can dynamically adapt to the changing conditions of the environment and possible
unforeseen configurations.</p>
      </sec>
      <sec id="sec-1-3">
        <title>1.2. Support for Reconfigurable Multi-Accelerator System Virtualization</title>
        <p>
          FPGAs have gained considerable importance in recent years in the world of cloud and edge computing
due to their flexibility, high performance and low power consumption [
          <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
          ]. Moreover, unlike
other computing platforms such as CPUs and GPUs, which feature a fixed architecture, FPGAs
can adapt their architecture to the requirements of any algorithm thanks to their flexible hardware.
FPGA hardware can be reconfigured to obtain both spatial and temporal parallelism on a large
scale. As a result, FPGAs are high-performance and energy-efficient computing platforms, which
is key in edge or cloud computing scenarios where the available resources are limited.
        </p>
        <p>Although FPGAs offer great benefits over CPUs and GPUs, these benefits come with certain
compromises in both design and usability. The application design flow for FPGAs requires the
use of Hardware Description Languages (HDLs) and low-level knowledge of the specific hardware,
which puts it out of reach for most software application developers. Although High-Level
Synthesis (HLS) tools are now available that allow FPGA applications to be developed using
code with C-like syntax, certain hardware details must still be known to develop an
optimized accelerator. In addition, the design process is specific to the target hardware
(a different binary is obtained depending on the FPGA model used), and the tools
that vendors provide do not allow system resources to be shared among
multiple applications or users, which is essential for cloud and edge computing.</p>
        <p>In this thesis we propose to design a virtualization infrastructure for FPGAs that allows both
sharing resources among multiple applications and users, and designing applications without
requiring specific knowledge of the FPGA hardware.</p>
        <p>Specifically, we intend to explore and evaluate different existing virtualization techniques,
both those used for software virtualization (the vast majority) and the approaches already
proposed for hardware virtualization, and extend them in order to obtain an infrastructure with
four key blocks:</p>
        <list list-type="order">
          <list-item><p>Create an abstraction layer over the hardware to hide hardware-specific details from
application designers and generate simple interfaces to access resources.</p></list-item>
          <list-item><p>Manage the allocation of resources, both spatially and temporally, to make optimal use of
them.</p></list-item>
          <list-item><p>Manage system resources in a way that is transparent to the user.</p></list-item>
          <list-item><p>Ensure isolation between users and applications in terms of both performance and privacy,
to guarantee data security and system resilience.</p></list-item>
        </list>
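        <p>A toy sketch of how these blocks could fit together, under loudly-stated assumptions: the class name, methods and slot/quota scheme below are hypothetical, chosen only to illustrate slot allocation and tenant isolation; they are not an existing API.</p>

```python
# Hypothetical sketch of a virtualization-layer resource manager: tenants
# request accelerator slots through a simple interface (abstraction layer),
# the manager allocates reconfigurable slots (spatial allocation), and
# per-tenant quotas plus ownership checks provide isolation.

class VirtualFpga:
    def __init__(self, num_slots, quota_per_tenant):
        self.free = list(range(num_slots))   # indices of free reconfigurable slots
        self.owner = {}                      # slot index -> owning tenant
        self.quota = quota_per_tenant

    def acquire(self, tenant, kernel):
        """Allocate one slot for `kernel`; None if quota exceeded or no slot free."""
        used = sum(1 for t in self.owner.values() if t == tenant)
        if used >= self.quota or not self.free:
            return None
        slot = self.free.pop(0)
        self.owner[slot] = tenant
        # here the real layer would trigger partial reconfiguration of `kernel`
        return slot

    def release(self, tenant, slot):
        """Only the owning tenant may free its slot (isolation)."""
        if self.owner.get(slot) != tenant:
            raise PermissionError("slot not owned by tenant")
        del self.owner[slot]
        self.free.append(slot)

fpga = VirtualFpga(num_slots=3, quota_per_tenant=2)
s0 = fpga.acquire("tenant_a", "fft")
s1 = fpga.acquire("tenant_a", "aes")
s2 = fpga.acquire("tenant_a", "fir")   # None: quota of 2 reached
s3 = fpga.acquire("tenant_b", "fir")
```

        <p>Temporal sharing, transparent management and performance isolation would require preemption, scheduling and bandwidth partitioning on top of this; the sketch only shows the interface shape.</p>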
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Thesis Progress</title>
      <p>This section briefly describes the work accomplished to date, as well as the tasks planned for
the remainder of the thesis.</p>
      <sec id="sec-2-1">
        <title>2.1. Accomplished Works</title>
        <p>At the present time, we have mainly focused on the real-time modeling and management of
reconfigurable multi-accelerator systems.</p>
        <p>We have designed a monitoring infrastructure for acquiring power/performance traces in
reconfigurable multi-accelerator systems. Those traces have been used to train offline
ML-based models that predict the power consumption and performance of such systems, with
promising results. Recently, we have developed a management infrastructure that will be
used to integrate and validate the rest of this part of the thesis.</p>
        <sec id="sec-2-1-1">
          <title>2.1.1. Monitoring Infrastructure</title>
          <p>A non-intrusive monitoring infrastructure has been designed to acquire synchronized power
consumption and performance traces in reconfigurable multi-accelerator systems (see Figure 1). To
this end, we have designed hardware components (the HDL block design of the infrastructure), software
components (a driver for managing the infrastructure’s hardware from a Linux-based OS and libraries for
controlling the infrastructure on Linux and bare-metal systems), as well as a script-based
tool for visualizing the generated traces.</p>
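          <p>As an illustration of why synchronized timestamps matter, the following sketch (synthetic data, standard library only; not the actual driver or library API) integrates a power trace over one accelerator's start/ready window to estimate the energy of a single execution.</p>

```python
# Illustrative use of synchronized traces: power samples and start/ready
# events share a common timestamp base, so the energy of one accelerator
# execution can be integrated over its [start, ready] window.

def energy_between(power_trace, t_start, t_ready):
    """Trapezoidal integration of (timestamp, watts) samples over a window."""
    samples = [(t, p) for t, p in power_trace if t_start <= t <= t_ready]
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy

# Synthetic traces: timestamps in ms, power in watts; one execution window
# delimited by a start event at t=10 and a ready event at t=30.
power_trace = [(0, 1.0), (10, 1.0), (20, 3.0), (30, 3.0), (40, 1.0)]
events = {"start": 10, "ready": 30}

e = energy_between(power_trace, events["start"], events["ready"])  # in mJ
```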
          <p>[Figure 1: Block diagram of the monitoring infrastructure: a CPU running the software application controls, through AXI interfaces, a performance trace acquisition module (trigger module, event detector, counter, timestamping, and event BRAMs fed by probes on the hardware accelerators) and a power trace acquisition module (SPI-connected ADC and measurement board, power trace BRAM), with traces moved to RAM by a DMA controller.]</p>
          <p>Figure 2 shows an example of the traces generated with the infrastructure, depicting an application
with two hardware accelerators of different functionality (signals 0/1 and
2/3 are the start/ready pairs of accelerators A and B, respectively, and the power consumption is
shown at the top).</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. ML-Based Modeling</title>
          <p>
            To properly model the power consumption and performance of reconfigurable
multi-accelerator systems, we have used the monitoring infrastructure to obtain power and
performance traces of multiple combinations of hardware-accelerated kernels from MachSuite [
            <xref ref-type="bibr" rid="ref6">6</xref>
            ],
a well-known benchmark suite for HLS-oriented accelerator evaluation. The
obtained traces have been used to train ML-based models to predict power consumption
and performance. Those models have subsequently been evaluated, with very good results.
As an example, Figure 3 and Figure 4 show the graphical evaluation of a particular model when
predicting power consumption and performance, respectively.
          </p>
          <p>
            In both figures, the predicted value of each observation is plotted against its actual value
(the dotted diagonal line represents the ideal case where the predicted value equals the measured
value). In both cases, most of the observations fall very close to the
dotted line, indicating that the model has good prediction performance. For a more in-depth
analysis, refer to our paper on the subject [
            <xref ref-type="bibr" rid="ref7">7</xref>
            ].
          </p>
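          <p>A predicted-versus-measured evaluation such as the one in Figures 3 and 4 can also be summarised numerically, for example with the coefficient of determination (R^2) and the mean absolute error. The values below are synthetic placeholders, not results from the thesis.</p>

```python
# Summarising a predicted-vs-actual scatter with two standard metrics,
# computed from scratch with the standard library only.

def r2_and_mae(actual, predicted):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
    return 1.0 - ss_res / ss_tot, mae

# Synthetic observations: points near the ideal diagonal give R^2 close to 1.
actual    = [1.0, 1.5, 2.0, 2.5, 3.0]
predicted = [1.1, 1.4, 2.0, 2.6, 2.9]
r2, mae = r2_and_mae(actual, predicted)
```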
        </sec>
        <sec id="sec-2-1-3">
          <title>2.1.3. Management Infrastructure</title>
          <p>A management infrastructure has been designed that is capable of attending to all incoming
acceleration requests of a particular workload and deciding when to execute them in the FPGA fabric
following a specific scheduling policy (see the diagram in Figure 5).</p>
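          <p>The three-stage flow can be sketched with a toy simulator: here a first-come-first-served policy stands in for the actual scheduling policies, and slots free up after a fixed per-task duration. All names and numbers are illustrative.</p>

```python
# Toy scheduler mirroring the arrival / scheduling / execution stages:
# tasks arrive in order, the policy (FCFS here) picks the next one, and the
# execution stage assigns it to whichever FPGA slot becomes free earliest.

import heapq

def schedule_fcfs(tasks, num_slots):
    """tasks: list of (name, duration). Returns (name, slot, start_time) tuples."""
    # each heap entry: (time the slot becomes free, slot index)
    slots = [(0, s) for s in range(num_slots)]
    heapq.heapify(slots)
    plan = []
    for name, duration in tasks:            # FCFS: queue order = arrival order
        free_at, slot = heapq.heappop(slots)
        plan.append((name, slot, free_at))  # task starts when the slot frees
        heapq.heappush(slots, (free_at + duration, slot))
    return plan

plan = schedule_fcfs([("t0", 5), ("t1", 2), ("t2", 4), ("t3", 1)], num_slots=2)
```

          <p>A smarter policy would reorder the queue (e.g., by predicted energy or deadline) before the pop; the slot bookkeeping stays the same.</p>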
          <p>For the hardware acceleration we have extended the ARTICo3 framework [8], an academic
framework for high-performance reconfigurable multi-accelerator system implementation. We
have also integrated the monitoring infrastructure described above, enabling the monitoring of
every part of the process.</p>
          <p>[Figure 5: Diagram of the management infrastructure: an arrival stage places incoming workload tasks in a waiting queue, a scheduling stage applies the scheduling policy to build a scheduling queue, and an execution stage dispatches the tasks to reconfigurable slots (Slot #0 to Slot #2) in the FPGA programmable logic.]</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Future Work</title>
        <p>We are currently working on online model training and updating, rather than offline
characterization, as a first step towards full run-time self-adaptation. We are also considering
integrating more complex scheduling policies into the management infrastructure,
such as reinforcement-learning-based decision making and other alternative approaches built on the
online data-driven models, to perform intelligent decision making for the task scheduling
and resource management of the system.</p>
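        <p>The step from offline to online modeling can be illustrated (with synthetic data and a deliberately simple linear model, not the actual thesis models) as incremental updates: each new measurement nudges the model weights with one stochastic-gradient step instead of retraining from scratch.</p>

```python
# Online model updating sketch: a linear model is refined sample by sample,
# so new accelerator combinations observed at run-time improve the model
# without an offline retraining pass. Synthetic data, standard library only.

def sgd_step(w, x, y, lr=0.05):
    """One incremental update of linear weights w on sample (x, y)."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    err = pred - y
    return [wi - lr * err * xi for wi, xi in zip(w, x)]

# Stream of synthetic measurements drawn from y = 1.0 + 0.5 * a,
# where a is e.g. the number of active accelerators (bias feature first).
w = [0.0, 0.0]
for _ in range(2000):
    for a in (0, 1, 2, 3):
        w = sgd_step(w, [1.0, float(a)], 1.0 + 0.5 * a)
```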
        <p>This would conclude the first pillar of the thesis and we would then focus on the FPGA
virtualization part described in Section 1.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Acknowledgements</title>
      <p>This thesis is receiving funding from the European Union’s Horizon 2020 research and innovation
programme under grant agreement No 872570.</p>
      <p>[8] A. Rodríguez, J. Valverde, J. Portilla, A. Otero, T. Riesgo, E. de la Torre, FPGA-Based
High-Performance Embedded Systems for Adaptive Edge Computing in Cyber-Physical Systems:
The ARTICo3 Framework, Sensors 18 (2018). doi:10.3390/s18061877.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , G. Luo, G. Sun,
          <string-name>
            <given-names>N.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>The case for fpga-based edge computing</article-title>
          ,
          <source>IEEE Transactions on Mobile Computing</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cerina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Notargiacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. G.</given-names>
            <surname>Paccanit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Santambrogio</surname>
          </string-name>
          ,
          <article-title>A fog-computing architecture for preventive healthcare and assisted living in smart ambients</article-title>
          ,
          <source>in: 2017 IEEE 3rd International Forum on Research and Technologies for Society and Industry (RTSI)</source>
          , IEEE,
          <year>2017</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Bobda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Mbongue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Chow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ewais</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tarafdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Vega</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eguro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Handagala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Leeser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Herbordt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Shahzad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Hofste</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ringlein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Szefer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sanaullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tessier</surname>
          </string-name>
          ,
          <article-title>The future of fpga acceleration in datacenters and the cloud</article-title>
          ,
          <source>ACM Trans. Reconfigurable Technol. Syst</source>
          .
          <volume>15</volume>
          (
          <year>2022</year>
          ). URL: https://doi.org/10.1145/3506713. doi:10.1145/3506713.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Qasaimeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Denolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vissers</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zambreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. H.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>Comparing energy efficiency of cpu, gpu and fpga implementations for vision kernels</article-title>
          ,
          <source>in: 2019 IEEE International Conference on Embedded Software and Systems (ICESS)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . doi:10.1109/ICESS.2019.8782524.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Asano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Maruyama</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          ,
          <article-title>Performance comparison of fpga, gpu and cpu in image processing</article-title>
          ,
          <source>in: 2009 International Conference on Field Programmable Logic and Applications</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>126</fpage>
          -
          <lpage>131</lpage>
          . doi:10.1109/FPL.2009.5272532.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Reagen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Adolf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Wei</surname>
          </string-name>
          , D. Brooks,
          <article-title>MachSuite: Benchmarks for accelerator design and customized architectures</article-title>
          ,
          <source>in: 2014 IEEE International Symposium on Workload Characterization (IISWC)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>110</fpage>
          -
          <lpage>119</lpage>
          . doi:10.1109/IISWC.2014.6983050.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Encinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Otero</surname>
          </string-name>
          , E. De La Torre,
          <article-title>Run-time monitoring and ml-based modeling in reconfigurable multi-accelerator systems</article-title>
          ,
          <source>in: 2021 XXXVI Conference on Design of Circuits and Integrated Systems (DCIS)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          . doi:10.1109/DCIS53048.2021.9666187.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>