<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modern methods of energy consumption optimization in FPGA-based heterogeneous HPC systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oleksandr V. Hryshchuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sergiy P. Zagorodnyuk</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Taras Shevchenko National University of Kyiv</institution>
          ,
          <addr-line>64/13 Volodymyrska Str., Kyiv, 01601</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <fpage>167</fpage>
      <lpage>176</lpage>
      <abstract>
<p>High-Performance Computing (HPC) systems play a pivotal role in addressing complex computational challenges across various domains, but their escalating energy consumption has raised concerns regarding sustainability and operational costs. This paper presents a comprehensive investigation into the parametrization and modeling of energy consumption in heterogeneous HPC systems, aiming to provide valuable insights for optimizing energy efficiency while preserving performance. We begin by characterizing the heterogeneity within modern HPC environments, which encompass diverse hardware components such as CPUs, GPUs, FPGAs, and accelerators. Our research delves into modeling techniques, leveraging heuristic methods and statistical approaches to construct accurate predictive models for energy consumption. Furthermore, we explore the integration of dynamic power management strategies, such as DVFS (Dynamic Voltage and Frequency Scaling) and task scheduling, to optimize energy usage without compromising performance. This paper provides a vital foundation for sustainable HPC practices, enabling researchers and practitioners to make informed decisions for achieving enhanced energy efficiency without sacrificing computational performance.</p>
      </abstract>
      <kwd-group>
        <kwd>high-performance computing (HPC)</kwd>
        <kwd>FPGA</kwd>
        <kwd>power modeling</kwd>
        <kwd>power analysis</kwd>
        <kwd>heterogeneous computing</kwd>
        <kwd>power saving</kwd>
        <kwd>task scheduling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Today’s large-scale computing systems, such as data centers and high-performance computing
(HPC) clusters, are severely limited by power and cooling costs for extremely large-scale (or
exascale) problems. The steady increase in electricity consumption is a growing concern for
several reasons, such as cost, reliability, scalability, and environmental impact. Nowadays data
centers use 200 TWh per year and contribute nearly 0.3% of global carbon emissions,
while the entire complex of ICT (information and communication technology) devices produces up to 2%
of them [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The best-case scenario model predicts that by 2030 ICT will account for 8% of global electricity
consumption [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], while the worst-case scenario anticipates 51% of global electricity usage.
This potential increase in power consumption and, consequently, in the cost of computing operations
leads researchers and engineers to investigate and develop new techniques and approaches to
optimize power management in HPC systems and in the ICT domain in general.
      </p>
      <p>
        At present there is a set of methods and approaches to resolve this energy optimization issue,
mainly for homogeneous CPU-based HPC systems. A general taxonomy of these techniques,
suggested in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and depicted in figure 1, can be divided into two main groups: SPM (static
power management) and DPM (dynamic power management). SPM methods, divided into two
separate groups (for hardware-level and software-level management), are usually defined at design
time and cannot be changed at runtime. Hardware SPM techniques can be further split
into three separate groups [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]:
      </p>
      <sec id="sec-1-1">
        <title>Hardware SPM levels</title>
        <p>1. Circuit level
2. Logic level
3. Architecture level</p>
        <p>
          DPM methods widely used in HPC [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] systems can be divided into two main groups: DCD
(Dynamic Component Deactivation), based on predictive and heuristic approaches, and DPS
(Dynamic Power Scaling), such as resource throttling and DVFS (Dynamic Voltage and Frequency
Scaling). These techniques can be a foundation for more complicated optimization methods, for
example, task scheduling based on DVFS [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] or DCD heuristics applications [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
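        <p>The effect of DPS techniques such as DVFS can be illustrated with the classic dynamic power model P = C·V²·f. Below is a minimal sketch; the effective capacitance and the voltage/frequency operating points are hypothetical illustration values:</p>
        <preformat><![CDATA[
```python
# Simplified dynamic CMOS power model under DVFS: P = C * V^2 * f.
# C_EFF and the (voltage, frequency) operating points below are
# hypothetical illustration values, not measurements.

def dynamic_power(c_eff, voltage, freq_hz):
    """Dynamic power in watts: P = C * V^2 * f."""
    return c_eff * voltage ** 2 * freq_hz

C_EFF = 1.0e-9  # effective switched capacitance in farads (assumed)

# Scaling voltage and frequency down together gives a roughly cubic drop.
p_high = dynamic_power(C_EFF, 1.2, 3.0e9)  # high-performance state
p_low = dynamic_power(C_EFF, 0.9, 1.5e9)   # power-saving state
print(p_high, p_low)
```
]]></preformat>
        <p>Because power scales with the square of voltage, and voltage can usually be lowered together with frequency, DVFS trades roughly cubic power savings for a linear slowdown.</p>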
        <p>
          The methods described above can be used on different hardware platforms, both homogeneous
(well studied nowadays) and heterogeneous (with GPUs, TPUs, FPGAs and CGRAs), which have become
popular in HPC according to a survey on Deep Learning hardware accelerators for heterogeneous
HPC platforms [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. At the same time, the number of scientific papers on energy-aware optimization
for HPC systems with FPGA controllers is extremely low (1-3 per year) compared to all
research on “FPGA heterogeneous computing” (see figure 2 with data obtained from
app.dimensions.ai), which indicates a limited number of solutions in this domain, so this work
focuses on heterogeneous applications of energy-aware optimizations in HPC systems.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Energy optimization theory</title>
      <sec id="sec-2-1">
        <title>2.1. Optimization problem definition for task scheduling</title>
        <p>
          As mentioned in the introduction, optimization techniques can be divided into
hardware and software types. The former are case-specific for different variations of hardware
such as CPUs, memory chips, NICs, etc., while software-defined approaches can be generalized
to provide a solution for disparate equipment with the same characteristics/types, for example,
homogeneous or heterogeneous GPU- and TPU-based HPC clusters [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Such software solutions
often lead to energy-efficient task-scheduling methods, for which the optimization problem
can be defined as described next.
        </p>
        <p>
          For a finite set of jobs (tasks) J and a finite set of resources R, time(j, r) is a function that
returns the execution time of job j ∈ J on resource r ∈ R [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Scheduling can then be
described as the task of finding a set of start times {s1, s2, . . . , s|J|} for the jobs, allocated to resources
{r1, r2, . . . , r|J|}, under the condition:
∀i ∄k ≠ i : si ≤ sk + time(jk, rk) ∧ sk ≤ si + time(ji, ri) ∧ ri = rk, ∀i : ji ∈ J
(1)
        </p>
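        <p>Condition (1) is a pairwise non-overlap check on jobs sharing a resource and can be verified directly. A minimal sketch follows; the job/resource names, the time(j, r) table and the example schedules are hypothetical illustration data:</p>
        <preformat><![CDATA[
```python
# Check constraint (1): no two jobs assigned to the same resource may
# overlap in time. A schedule maps job -> (start_time, resource).
# The time(j, r) table and the schedules are hypothetical examples.
from itertools import combinations

def time_fn(job, resource):
    # Hypothetical execution-time table time(j, r).
    durations = {("j1", "r1"): 4, ("j2", "r1"): 3, ("j3", "r2"): 5}
    return durations[(job, resource)]

def is_feasible(schedule):
    """schedule: dict mapping job -> (start, resource)."""
    for (ja, (sa, ra)), (jb, (sb, rb)) in combinations(schedule.items(), 2):
        if ra == rb:
            end_a = sa + time_fn(ja, ra)
            end_b = sb + time_fn(jb, rb)
            # Jobs overlap iff each starts before the other one ends.
            if sa < end_b and sb < end_a:
                return False
    return True

ok = is_feasible({"j1": (0, "r1"), "j2": (4, "r1"), "j3": (0, "r2")})
bad = is_feasible({"j1": (0, "r1"), "j2": (2, "r1"), "j3": (0, "r2")})
print(ok, bad)  # True False
```
]]></preformat>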
        <p>
          Additional optimization conditions (see equation (2)) can be applied to the provided scheduling,
where the optimization criterion can be a maximum or a minimum, depending on the formulation
of a function that involves simple metrics such as execution time, consumed energy, etc. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
min / max OptimizationCriteria ({s1, s2, . . . , s|J|}, r1, . . . , r|J|)
(2)
        </p>
        <p>
          This model is extremely simplified and is not suitable for real applications for several
reasons: it assumes that one resource can take only one task at a time, that the number of available
resources is always equal to or greater than the number of jobs to complete, and it does not include the impact
of communication between tasks on nodes or computing elements. To resolve these problems
and adapt the model to the real world, an upgraded model was suggested [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]: for two tasks ji and jk
from the set of job pairs P, Di is the set of devices that can be assigned to job ji ∈ J, and the time of
communication between jobs is obtained from a function comm(ji, jk, ri, rk); then the solution is a set
of assignments Di and start times {s1, s2, . . . , s|J|} for each job, as described in equations (3)-(6):
∀ji ∈ J : ri ∈ Di
(3)
∀i ∄k ≠ i : si ≤ sk + time(jk, rk) ∧ sk ≤ si + time(ji, ri) ∧ ri ∩ rk ≠ ∅
(4)
∀{ji, jk} ∈ P : si + time(ji, ri) + comm(ji, jk, ri, rk) ≤ sk
(5)
        </p>
        <sec id="sec-2-1-1">
          <title>With optimization condition:</title>
          <p>min / max OptimizationCriteria ({s1, s2, . . . , s|J|}, r1, . . . , r|J|, D)
(6)</p>
          <p>
            This method involves enumeration of all jobs over all available resources, which leads to the
conclusion that a solution cannot be found in polynomial time; indeed, it was proved that the problem of
energy-efficient active time [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ] scheduling is NP-complete [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], so to be able to use this model
there are two possible ways: use predefined constraints and precalculated configurations,
or use heuristic methods, for example genetic algorithms [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ], to find a solution at runtime.
          </p>
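          <p>As a minimal illustration of the heuristic route, the sketch below uses a simple greedy earliest-finish rule (a lightweight stand-in, not the genetic algorithms cited above); the jobs, resources and execution-time table are hypothetical:</p>
          <preformat><![CDATA[
```python
# Greedy heuristic sketch: assign each job to the resource on which it
# would finish earliest, given current resource availability. This is a
# simple illustrative stand-in for heavier runtime heuristics; the jobs,
# resources and time table are hypothetical example data.

def greedy_schedule(jobs, resources, time_fn):
    free_at = {r: 0 for r in resources}  # when each resource becomes idle
    plan = {}
    for job in jobs:
        # Pick the resource giving the earliest finish time for this job.
        best = min(resources, key=lambda r: free_at[r] + time_fn(job, r))
        start = free_at[best]
        plan[job] = (best, start)
        free_at[best] = start + time_fn(job, best)
    return plan, max(free_at.values())  # assignments and makespan

times = {("a", "cpu"): 4, ("a", "fpga"): 2,
         ("b", "cpu"): 3, ("b", "fpga"): 6,
         ("c", "cpu"): 5, ("c", "fpga"): 2}
plan, makespan = greedy_schedule(["a", "b", "c"], ["cpu", "fpga"],
                                 lambda j, r: times[(j, r)])
print(plan, makespan)
```
]]></preformat>
          <p>Such greedy rules run in polynomial time but give no optimality guarantee, which is exactly the trade-off accepted when the exact NP-complete formulation is replaced by a heuristic.</p>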
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Optimization criteria</title>
        <p>
          The general optimization problem was described in the previous section; to be used in real HPC
systems it requires properly defined optimization criteria. Existing solutions in this domain are
based on the energy consumption (EC) metric, or can take into consideration other properties,
for example, execution time, etc. [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. Power consumption can be described via energy itself (in
joules or watts), or can be represented with more complicated models such as instructions per joule
or performance per watt [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. This approach is used in the Green500 rating as the FLOPS per Watt metric [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
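        <p>The FLOPS per Watt criterion reduces to a simple ratio of sustained performance to average power draw; a minimal sketch with hypothetical benchmark measurements:</p>
        <preformat><![CDATA[
```python
# FLOPS-per-watt efficiency metric in the style of the Green500 ranking.
# The performance and power figures below are hypothetical examples.

def flops_per_watt(flops, watts):
    """Energy efficiency: sustained FLOPS divided by average power draw."""
    return flops / watts

# A node sustaining 50 TFLOPS at 2.5 kW average power:
eff = flops_per_watt(50e12, 2500.0)
print(eff / 1e9, "GFLOPS/W")  # 20.0 GFLOPS/W
```
]]></preformat>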
        <p>
          More sophisticated approaches can use a combination of metrics such as EC (energy
consumption), ExecT (execution time), utilization, average weighted time, wait time, power, Pareto front,
AST, AFT, clock frequency, work (jobs) per energy, reliability, electricity cost, temperature, EDP,
EDF, number of cores, probability of execution, branch transition rate, cache efficiency, and issue
width [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. For example, a new algorithm was proposed for a reformed scheduling method with an energy
consumption constraint (RSMECC), based on AST, AFT and energy consumption metrics [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
This algorithm makes it possible to solve a wide range of computing tasks more efficiently,
including in the fields of neural networks, complex 3D modeling and artificial intelligence.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Cluster architecture</title>
      <p>
        Nowadays HPC clusters are widespread around the world in different forms and variations, but
generally most of them are based on the homogeneous massively parallel processor (MPP) architecture,
which is inherited from the older NUMA (non-uniform memory access) architecture [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This
approach looks similar to shared-memory technology, but in this case each processor in the cluster
is connected to its own part of memory and forms a single independent node, which is
connected with other nodes via a network interface card and a common network (see figure 3).
The absence of shared memory between nodes (not counting a common NAS) simplifies the design and
reduces inefficient components, therefore improving the scalability and stability of the HPC system [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
At the same time, due to the lack of shared memory, a processor core in one group must employ a
different method to exchange data and coordinate with cores of other processor groups [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
This issue becomes more visible for heterogeneous systems based on CPUs from different series
or types, or even for GRID computing systems [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Another popular approach to building HPC systems is the use of symmetric multiprocessors
(SMP). This embodies a category of parallel architectures that harness the power of multiple
processor cores to enhance performance by leveraging parallel processing, all the while upholding
a unified memory structure that spans the entirety of the parallel computing system [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        An SMP is a self-contained and self-sustaining computer system equipped with all the
subsystems and components essential for fulfilling the demands and facilitating the execution
of various applications. It can operate independently to support user applications designed as
shared-memory multi-threaded programs, serve as one among several equivalent subsystems
in a scalable MPP system or commodity cluster, and work as a throughput computer for the
simultaneous execution of independent concurrent tasks [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. The general architecture of an SMP
system is depicted in figure 4.
      </p>
      <sec id="sec-3-1">
        <title>3.1. Heterogeneous cluster architecture comparison</title>
        <p>
          Heterogeneous computing in HPC refers to the utilization of diverse hardware accelerators, such as
general-purpose graphics processing units (GPGPU), field-programmable gate arrays (FPGA),
coarse-grained reconfigurable arrays (CGRA) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and specialized coprocessors, alongside traditional
CPUs. This approach harnesses the strengths of different computing components to optimize
performance and energy efficiency, making it particularly well suited for workloads that can
benefit from parallel processing. The most common heterogeneous clusters couple a
CPU and a GPGPU in a single node, so energy-efficient solutions already exist for this
kind of HPC system, as analyzed in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>FPGAs, at the same time, are a newer and less studied type of accelerator in HPC, as shown
in the introduction of this paper. Nevertheless, there are existing works on this topic; for
example, a technique of cooperative CPU, GPU and FPGA task execution based on the EngineCL
framework was suggested in [16]. Also, a new approach called Cooperative Heterogeneous
Acceleration with Reconfigurable Multi-devices (CHARM) was proposed for a multi-hybrid
accelerated cluster with GPU and FPGA coupling, which was implemented in the “Albireo” nodes of the
Cygnus cluster, based on an Intel Xeon Gold CPU, four NVIDIA Tesla V100 GPUs and a Nallatech
520N FPGA with Intel Stratix 10 [17]. The architecture of these nodes is shown in figure 5.</p>
        <p>A characteristic comparison of the Cygnus supercomputer node and the heterogeneous system from
the EngineCL test setup is shown in table 1. At the same time, for EngineCL it was shown that
a performance improvement from heterogeneity was obtained for all benchmark tasks ("Matrix
multiplication", "Mersenne Twister", "Watermarking", "Sobel Filter", "Nearest Neighbor", "AES
Decrypt"), but an energy consumption improvement was detected only for "Sobel Filter" [16],
which leaves a research gap in the search for energy-optimization methods for this kind of system.</p>
        <p>Consequently, these two works lack energy consumption optimization for the described
systems, despite the existing methods of power management and optimization described in the
survey of FPGA optimization methods for data center energy efficiency [18]. Finding a “general”
solution for FPGA-based systems is complicated by the necessity of reconfiguring the
hardware for each specific task (job); nevertheless, the energy optimization constraints with
proper criteria described in the “Energy optimization theory” section of this paper can be applied
to multi-hybrid hardware FPGA systems to optimize power consumption.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>This paper reviews modern theories and approaches for power consumption planning and
optimization in heterogeneous HPC systems, including the optimization model for MPP systems
described above. As this problem is NP-complete, heuristic approaches
for finding solutions were discussed. Results from the mentioned solutions can be implemented at the
hardware or software level via DPM technologies. At the same time, the mentioned solutions are well
suited only to CPU-GPU coupled systems, not to CPU-GPU-FPGA coupled systems. For
the latter there are existing power management techniques, such as DCD, which is easy to use on FPGAs, but
there is a lack of schedulers and general approaches for implementing solutions derived from the theoretically
optimal model. Therefore, future work involves a further search for ways to amplify
heuristic methods of power consumption planning in FPGA-coupled HPC systems.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jones</surname>
          </string-name>
          ,
          <article-title>How to stop data centres from gobbling up the world's electricity</article-title>
          ,
          <source>Nature</source>
          <volume>561</volume>
          (
          <year>2018</year>
          )
          <fpage>163</fpage>
          -
          <lpage>166</lpage>
          . doi:10.1038/d41586-018-06610-y.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. S. G.</given-names>
            <surname>Andrae</surname>
          </string-name>
          , T. Edler,
          <article-title>On Global Electricity Usage of Communication Technology: Trends to 2030</article-title>
          ,
          <source>Challenges</source>
          <volume>6</volume>
          (
          <year>2015</year>
          )
          <fpage>117</fpage>
          -
          <lpage>157</lpage>
          . doi:10.3390/challe6010117.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Haj-Yahya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mendelson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y. B.</given-names>
            <surname>Asher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chattopadhyay</surname>
          </string-name>
          ,
          <article-title>Energy Eficient High Performance Processors: Recent Approaches for Designing Green High Performance Computing</article-title>
          , Springer,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Kocot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Czarnul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Proficz</surname>
          </string-name>
          ,
          <article-title>Energy-Aware Scheduling for High-Performance Computing Systems:</article-title>
          A Survey,
          <source>Energies</source>
          <volume>16</volume>
          (
          <year>2023</year>
          ). doi:10.3390/en16020890.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Saha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Purohit</surname>
          </string-name>
          ,
          <article-title>NP-completeness of the Active Time Scheduling Problem</article-title>
          ,
          <year>2021</year>
          . URL: http://arxiv.org/abs/2112.03255.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>C.</given-names>
            <surname>Silvano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ielmini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ferrandi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fiorin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Curzel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Benini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Conti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garofalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zambelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Calore</surname>
          </string-name>
          , et. al.,
          <source>A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms</source>
          ,
          <year>2023</year>
          . doi:10.48550/arXiv.2306.15552.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>V.</given-names>
            <surname>Raca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Umboh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mehofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Scholz</surname>
          </string-name>
          ,
          <article-title>Runtime and energy constrained work scheduling for heterogeneous systems</article-title>
          ,
          <source>Journal of Supercomputing</source>
          <volume>78</volume>
          (
          <year>2022</year>
          )
          <fpage>17150</fpage>
          -
          <lpage>17177</lpage>
          . doi:10.1007/s11227-022-04556-7.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. N.</given-names>
            <surname>Gabow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khuller</surname>
          </string-name>
          ,
          <article-title>A Model for Minimizing Active Processor Time</article-title>
          , in: L. Epstein, P. Ferragina (Eds.),
          <source>Algorithms - ESA 2012, Lecture Notes in Computer Science</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>289</fpage>
          -
          <lpage>300</lpage>
          . doi:10.1007/978-3-642-33090-2_26.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cocaña-Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ranilla</surname>
          </string-name>
          , L. Sánchez,
          <article-title>Energy-eficient allocation of computing node slots in HPC clusters through parameter learning and hybrid genetic fuzzy system modeling</article-title>
          ,
          <source>Journal of Supercomputing</source>
          <volume>71</volume>
          (
          <year>2015</year>
          )
          <fpage>1163</fpage>
          -
          <lpage>1174</lpage>
          . doi:10.1007/s11227-014-1320-9.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Safari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khorsand</surname>
          </string-name>
          ,
          <article-title>Energy-aware scheduling algorithm for time-constrained workflow tasks in DVFS-enabled cloud environment</article-title>
          ,
          <source>Simulation Modelling Practice and Theory</source>
          <volume>87</volume>
          (
          <year>2018</year>
          )
          <fpage>311</fpage>
          -
          <lpage>326</lpage>
          . doi:10.1016/j.simpat.2018.07.006.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Scogland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Azose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Rohr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rivoire</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Bates</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hackenberg</surname>
          </string-name>
          ,
          <article-title>Node variability in large-scale power measurements: perspectives from the Green500, Top500 and EEHPCWG, in: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis</article-title>
          ,
          <source>SC '15</source>
          ,
          Association for Computing Machinery, New York, NY, USA,
          <year>2015</year>
          . doi:10.1145/2807591.2807653.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <article-title>A reformed task scheduling algorithm for heterogeneous distributed systems with energy consumption constraints</article-title>
          ,
          <source>Neural Computing and Applications</source>
          <volume>32</volume>
          (
          <year>2020</year>
          ). doi:10.1007/s00521-019-04415-2.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Sterling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brodowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Anderson</surname>
          </string-name>
          ,
          <source>High Performance Computing: Modern Systems and Practices</source>
          , Morgan Kaufmann,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramos</surname>
          </string-name>
          , T. Hoefler,
          <article-title>Modeling communication in cache-coherent SMP systems: a case study with Xeon Phi, in: Proceedings of the 22nd International Symposium on High-performance Parallel and Distributed Computing</article-title>
          ,
          <source>HPDC '13</source>
          , Association for Computing Machinery,
          <year>2013</year>
          , pp.
          <fpage>97</fpage>
          -
          <lpage>108</lpage>
          . doi:10.1145/2462902.2462916.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Käsgen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weinhardt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Hochberger</surname>
          </string-name>
          ,
          <article-title>A Coarse-Grained Reconfigurable Array for High-Performance Computing Applications</article-title>
          , in: 2018 International Conference on ReConFigurable Computing and FPGAs (ReConFig), 2018, pp. 1-4. doi:10.1109/RECONFIG.2018.8641720.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16] M. Dávila, R. Nozal, R. Gran Tejero, M. Villarroya, D. Suárez Gracia, J. Bosque,
          <article-title>Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL</article-title>
          ,
          <source>The Journal of Supercomputing</source>
          <volume>75</volume>
          (
          <year>2019</year>
          ). doi:10.1007/s11227-019-02768-y.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17] T. Boku, N. Fujita, R. Kobayashi, O. Tatebe,
          <article-title>Cygnus - World First Multihybrid Accelerated Cluster with GPU and FPGA Coupling</article-title>
          , in: Workshop Proceedings of the 51st International Conference on Parallel Processing, ICPP Workshops '22, Association for Computing Machinery,
          <year>2023</year>
          , pp. 1-8. doi:10.1145/3547276.3548629.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18] M. Tibaldi, C. Pilato,
          <article-title>A Survey of FPGA Optimization Methods for Data Center Energy Efficiency</article-title>
          ,
          <source>IEEE Transactions on Sustainable Computing</source>
          (
          <year>2023</year>
          ) 343-362. doi:10.1109/TSUSC.2023.3273852.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>