<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Resource-Aware Application Execution Exploiting the BarbequeRTRM</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giuseppe Massari</string-name>
          <email>giuseppe.massari@polimi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Libutti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William Fornaciari</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Reghenzani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gianmario Pozzi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Milano, DEIB: Dipartimento di Elettronica, Informazione e Bioingegneria</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <abstract>
        <p>Energy efficiency and thermal management have become major concerns in both embedded and HPC systems. The progress of silicon technology and the consequent growth of the dark silicon phenomenon are negatively affecting the reliability of computing systems. As a result, in the near future we expect run-time variability to increase in terms of both performance and availability of computing resources. To address these issues, systems and applications must be able to adapt to such scenarios. This work provides a brief overview of the Barbeque Run-Time Resource Manager (BarbequeRTRM) and of the application execution model that it exploits to deal with run-time variability in performance and available resources.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The need for resource-aware and adaptive applications is driven by several issues
and requirements that are typical of modern computing systems. For instance,
embedded mobile devices must deal with the limited energy budget provided
by the battery, while HPC centers must bear huge costs due to the power
consumption and the cooling of the infrastructure. Furthermore, the dark silicon
phenomenon affecting modern processors is becoming prominent [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], since it
increases the amount of silicon area that must be turned off to stay within the
power envelope of the processor. For all these reasons, a continuous and full usage
of the whole set of system computing resources is often impossible to achieve.
      </p>
      <p>On the application side, we can gain efficiency by implementing suitable
adaptive behaviors, such as enabling or disabling the execution of a task, or scaling the
accuracy of the output depending on the availability of computing resources.
A run-time resource management framework can implement such an approach by
constraining the resource allocation according to system-level requirements or
run-time conditions, and by providing the applications with suitable interfaces to check
and negotiate the resource assignment.</p>
      <p>[Figure: BarbequeRTRM software stack. Layer labels: Applications; Run-time application library; Resource Manager daemon; Linux kernel-space. Component labels: C, C++, OpenCL, Recipes, RPC Channel, FIFOs/Binder, DBus, AEM API, Plain API, AS-RTM API, Application Proxy, Application Manager, Power Manager, Synchronization Manager, Synchronization Protocol, Synchronization Policy, Scheduler Manager, Scheduler Policy, Resource Accounter, Resource Manager, Platform Proxy, Platform Drivers, Control Groups, CPUfreq.]</p>
      <sec id="sec-1-4">
        <title>The BarbequeRTRM</title>
        <p>The BarbequeRTRM is a modular and portable run-time resource manager
targeting both embedded and High-Performance Computing (HPC) systems. From
the hardware resources perspective, the framework can manage homogeneous
and heterogeneous multi-core processors, as well as heterogeneous systems
that include devices with completely different ISAs (e.g., CPU and GPU).</p>
        <p>The modularity of the BarbequeRTRM comes from a software architecture
in which we can distinguish between core components and plugin modules.
Typically, the latter are platform-specific extensions and selectable resource
management policies.</p>
        <p>
          Portability, instead, is guaranteed by relying on underlying
Linux operating system frameworks, such as cpufreq and cgroups, which allow the
BarbequeRTRM to enforce the resource allocation decisions [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
        <p>
          The resource manager exposes its services to the applications through a run-time
library (RTLib). The library has a two-fold objective: 1) to provide a
communication channel between the resource manager and the applications; 2)
to expose an execution model that supports the implementation of
resource-aware adaptive applications [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ].
        </p>
        <p>Figure 2 shows the Abstract Execution Model (AEM), which run-time
manageable applications must implement. This execution model
is put in place by defining and implementing a suitable C++ class, derived from
the BbqueEXC class provided by the RTLib.</p>
        <p>At run-time, the BbqueEXC member functions are called by a control thread,
which is responsible for synchronizing the application execution with the
decision process of the resource manager. The rationale behind each member
function implementation is the following:</p>
        <list list-type="bullet">
          <list-item><p>onSetup(): set up the application (initialize variables and data structures, start threads, ...).</p></list-item>
          <list-item><p>onConfigure(): check the amount of assigned resources and configure the application accordingly.</p></list-item>
          <list-item><p>onRun(): execute a single cycle of computation (e.g., computing a single frame during a video encoding).</p></list-item>
          <list-item><p>onMonitor(): monitor performance and QoS.</p></list-item>
          <list-item><p>onRelease(): run cleanup and termination code.</p></list-item>
        </list>
        <p>Therefore, once the application completes the initialization step (onSetup), the
control thread waits for the resource allocation decision coming from the
BarbequeRTRM. As soon as the decision has been received, the onConfigure function is called.
In this function, the application can check the amount of assigned resources
and configure itself accordingly, before starting (or continuing) the execution, as
sketched here below.</p>
        <preformat>RTLIB_ExitCode_t BlackscholesEXC::onConfigure(int8_t awm_id) {
    // Get the number of CPU cores assigned
    GetAssignedResources(RTLIB_ResourceType::PROC_NR, nr_cpu);
    // Configure ...
    return RTLIB_OK;
}</preformat>
        <p>The functions onRun and onMonitor are then sequentially called and
executed in a loop, until the entire computation is over.</p>
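        <p>The call sequence described above can be illustrated with a minimal, self-contained sketch. It does not use the RTLib: MockEXC, ExitCode and FrameApp are hypothetical names introduced only for illustration, and the resource allocation decision is reduced to a single integer passed to Run. The actual BbqueEXC class and its control thread are more articulated than this.</p>
        <preformat><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the RTLib base class: NOT the real BbqueEXC API.
enum class ExitCode { Ok, Done };

class MockEXC {
public:
    virtual ~MockEXC() = default;

    // Drives the callbacks in the order described by the AEM.
    // `assigned_cores` plays the role of the resource manager's decision.
    void Run(int assigned_cores) {
        onSetup();
        bool reconfigure = true;  // the first allocation must be applied
        for (;;) {
            if (reconfigure) {
                // A real manager could raise this flag again at any cycle;
                // this mock receives a single decision.
                onConfigure(assigned_cores);
                reconfigure = false;
            }
            if (onRun() == ExitCode::Done) break;
            onMonitor();
        }
        onRelease();
    }

protected:
    virtual void onSetup() = 0;
    virtual void onConfigure(int assigned_cores) = 0;
    virtual ExitCode onRun() = 0;
    virtual void onMonitor() = 0;
    virtual void onRelease() = 0;
};

// Toy application: processes a fixed number of "frames", one per cycle.
class FrameApp : public MockEXC {
public:
    explicit FrameApp(int frames) : frames_left_(frames) {
        assert(frames >= 0);
    }
    const std::vector<std::string>& Trace() const { return trace_; }
    int Threads() const { return threads_; }

protected:
    void onSetup() override { trace_.push_back("setup"); }
    void onConfigure(int cores) override {
        threads_ = cores;  // e.g., one worker thread per assigned core
        trace_.push_back("configure");
    }
    ExitCode onRun() override {
        if (frames_left_ == 0) return ExitCode::Done;
        --frames_left_;
        trace_.push_back("run");
        return ExitCode::Ok;
    }
    void onMonitor() override { trace_.push_back("monitor"); }
    void onRelease() override { trace_.push_back("release"); }

private:
    int frames_left_;
    int threads_ = 0;
    std::vector<std::string> trace_;
};
```
]]></preformat>
        <p>In this sketch, calling Run(4) on a FrameApp built with two frames invokes onSetup and onConfigure once, then onRun and onMonitor twice, and finally onRelease.</p>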
        <p>The RTLib estimates the current performance of the application, in terms of
cycles-per-second (CPS), so that the application can check the gap between
the required performance level and the one currently achieved. The
application can then notify the resource manager about this gap.</p>
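        <p>As an illustration of how such an estimate could be obtained, the following sketch averages the duration of the most recent execution cycles and derives a relative performance gap. CpsEstimator is a hypothetical helper, not an RTLib class, and the sign convention of the gap (positive when the application is slower than its goal) is an assumption of this example.</p>
        <preformat><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical helper, not part of the RTLib: moving-average CPS estimator.
class CpsEstimator {
public:
    explicit CpsEstimator(std::size_t window) : window_(window) {
        assert(window > 0);
    }

    // Record the duration (in seconds) of one execution cycle.
    void AddCycle(double seconds) {
        assert(seconds > 0.0);
        durations_.push_back(seconds);
        sum_ += seconds;
        if (durations_.size() > window_) {
            // Drop the oldest sample to keep at most `window_` cycles.
            sum_ -= durations_.front();
            durations_.pop_front();
        }
    }

    // Average cycles-per-second over the recorded window (0 if empty).
    double Cps() const {
        if (durations_.empty()) return 0.0;
        return static_cast<double>(durations_.size()) / sum_;
    }

    // Relative gap with respect to the goal, in percent.
    // Assumed convention: positive means "slower than the goal".
    double GapPercent(double goal_cps) const {
        assert(goal_cps > 0.0);
        return 100.0 * (goal_cps - Cps()) / goal_cps;
    }

private:
    std::size_t window_;
    std::deque<double> durations_;
    double sum_ = 0.0;
};
```
]]></preformat>
        <p>For instance, with a window of two cycles lasting 0.5 s each, the estimate is 2 CPS; against a goal of 2.5 CPS the gap computed by this sketch is 20%.</p>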
        <p>Considering also that the performance goal can vary depending on input data
and external events, an effective approach is to exploit the SetCPSGoal function to
specify the performance goal and the notification rate, as shown in the following
example of onMonitor implementation:</p>
        <preformat>RTLIB_ExitCode_t BlackscholesEXC::onMonitor() {
    // Specific event condition triggering the
    // change of performance requirements
    if (...)
        SetCPSGoal(2.5, 10);
    // ...
    return RTLIB_OK;
}</preformat>
        <p>In the example, the application sets a performance goal of 2.5 CPS and a
notification rate of 10 cycles. The library keeps track of the application
performance by computing the average CPS value over a (configurable) number of
the most recent execution cycles. Whenever the performance gap exceeds a given
(configurable) threshold, the gap value is sent to the resource manager, so that
the amount of assigned resources can be adjusted accordingly. The
notification rate is then exploited to bound the application reconfiguration rate,
and hence the related overhead. In other words, the application asks the resource
manager not to send back a reconfiguration request until at least 10 execution
cycles have elapsed.</p>
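        <p>The interplay between the gap threshold and the notification rate can be sketched as follows. This is a simplified model under our own assumptions, not RTLib code: GapNotifier and CountNotifications are hypothetical names, the threshold is applied to the absolute value of the gap, and the rate bound is enforced on the application side by counting execution cycles.</p>
        <preformat><![CDATA[
```cpp
#include <cassert>
#include <cmath>

// Hypothetical sketch (not RTLib code): decides when a performance gap
// is worth forwarding to the resource manager.
class GapNotifier {
public:
    GapNotifier(double threshold_percent, int min_cycles)
        : threshold_(threshold_percent),
          min_cycles_(min_cycles),
          cycles_since_last_(min_cycles) {  // first notification is allowed
        assert(threshold_percent >= 0.0 && min_cycles >= 0);
    }

    // Call once per execution cycle; returns true if the gap should be sent.
    bool OnCycle(double gap_percent) {
        if (cycles_since_last_ < min_cycles_) ++cycles_since_last_;
        const bool over_threshold = std::fabs(gap_percent) > threshold_;
        const bool rate_ok = cycles_since_last_ >= min_cycles_;
        if (over_threshold && rate_ok) {
            cycles_since_last_ = 0;  // restart the rate-limiting window
            return true;
        }
        return false;
    }

private:
    double threshold_;       // minimum |gap| (percent) worth reporting
    int min_cycles_;         // minimum cycles between two notifications
    int cycles_since_last_;  // cycles elapsed since the last notification
};

// Feeds a constant gap for `cycles` cycles and counts the notifications.
int CountNotifications(GapNotifier& notifier, double gap_percent, int cycles) {
    int sent = 0;
    for (int i = 0; i < cycles; ++i) {
        if (notifier.OnCycle(gap_percent)) ++sent;
    }
    return sent;
}
```
]]></preformat>
        <p>With a threshold of 5% and a rate of 10 cycles, a persistent 20% gap observed over 21 cycles produces three notifications in this model (at cycles 1, 11 and 21), while a 2% gap produces none.</p>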
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Experimental Scenario</title>
      <p>
        In this section we show results of the resource-aware adaptive execution of
blackscholes from the PARSEC benchmark suite [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] on an embedded
development board that features an ARM Cortex-A9 dual-core CPU. The benchmark
has been modified to fit the Abstract Execution Model. The frequency
of the CPU has been set to its maximum value, which is 920 MHz. The full CPU
usage, which is shown in Figure 3a, causes the chip temperature to rise above
100 °C, thus triggering the thermal throttling response of the operating system.
      </p>
      <p>[Figure 3: "The BarbequeRTRM: Power data trace plot", showing load, temperature, and frequency.]</p>
      <p>Continuous frequency scaling is then performed in order to cool down the CPU, with
performance variability as a further consequence.</p>
      <p>In Figure 3b, the application sets a performance goal of CPS=1. The resource
manager takes this information into account by shrinking the amount of CPU time
assigned. The result is a lower but more stable performance level, along
with a reduced thermal stress.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>H.</given-names>
            <surname>Esmaeilzadeh</surname>
          </string-name>
          , E. Blem,
          <string-name>
            <given-names>R.</given-names>
            <surname>St. Amant</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sankaralingam</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Burger</surname>
          </string-name>
          , "
          <article-title>Dark silicon and the end of multicore scaling</article-title>
          ," in
          <source>Proceedings of the 38th Annual International Symposium on Computer Architecture</source>
          , ser.
          <source>ISCA '11</source>
          . New York, NY, USA: ACM,
          <year>2011</year>
          , pp.
          <fpage>365</fpage>
          –
          <lpage>376</lpage>
          . [Online]. Available: http://doi.acm.org/10.1145/2000064.2000108
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellasi</surname>
          </string-name>
          , G. Massari, and W. Fornaciari, "
          <article-title>Effective Runtime Resource Management Using Linux Control Groups with the BarbequeRTRM Framework</article-title>
          ,"
          <source>ACM Transactions on Embedded Computing Systems (TECS)</source>
          , vol.
          <volume>14</volume>
          , no.
          <issue>2</issue>
          , p.
          <fpage>39</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Massari</surname>
          </string-name>
          , E. Paone,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellasi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Palermo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Zaccaria</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Fornaciari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Silvano</surname>
          </string-name>
          , "
          <article-title>Combining application adaptivity and system-wide resource management on multi-core platforms</article-title>
          ," in
          <source>Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS XIV), 2014 International Conference on</source>
          . IEEE,
          <year>2014</year>
          , pp.
          <fpage>26</fpage>
          –
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bienia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. P.</given-names>
            <surname>Singh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          , "
          <article-title>The PARSEC benchmark suite: characterization and architectural implications</article-title>
          ," in
          <source>Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques</source>
          , ser.
          <source>PACT '08</source>
          . New York, NY, USA: ACM,
          <year>2008</year>
          , pp.
          <fpage>72</fpage>
          –
          <lpage>81</lpage>
          . [Online]. Available: http://doi.acm.org/10.1145/1454115.1454128
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>