<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Modern methods of energy consumption optimization in FPGA-based heterogeneous HPC systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Oleksandr</forename><forename type="middle">V</forename><surname>Hryshchuk</surname></persName>
							<email>oleksandr_hryshchuk@knu.ua</email>
						</author>
						<author>
							<persName><forename type="first">Sergiy</forename><forename type="middle">P</forename><surname>Zagorodnyuk</surname></persName>
							<email>szagorodniuk@gmail.com</email>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="institution">Taras Shevchenko National University of Kyiv</orgName>
								<address>
									<addrLine>64/13 Volodymyrska Str</addrLine>
									<postCode>01601</postCode>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<address>
									<settlement>Kryvyi Rih</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Modern methods of energy consumption optimization in FPGA-based heterogeneous HPC systems</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">1AEEBCAA38012CEC5ABF75CD39580DBF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:25+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>high-performance computing (HPC)</term>
					<term>FPGA</term>
					<term>power modeling</term>
					<term>power analysis</term>
					<term>heterogeneous computing</term>
					<term>power saving</term>
					<term>task scheduling</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>High-Performance Computing (HPC) systems play a pivotal role in addressing complex computational challenges across various domains, but their escalating energy consumption has raised concerns regarding sustainability and operational costs. This paper presents a comprehensive investigation into the parametrization and modeling of energy consumption in heterogeneous HPC systems, aiming to provide valuable insights for optimizing energy efficiency while preserving performance. We begin by characterizing the heterogeneity within modern HPC environments, which encompass diverse hardware components, such as CPUs, GPUs, FPGAs, and accelerators. Our research delves into modeling techniques, leveraging heuristics methods and statistical approaches to construct accurate predictive models for energy consumption. Furthermore, we explore the integration of dynamic power management strategies, such as DVFS (Dynamic Voltage and Frequency Scaling) and task scheduling, to optimize energy usage without compromising performance. This paper provides a vital foundation for sustainable HPC practices, enabling researchers and practitioners to make informed decisions for achieving enhanced energy efficiency without sacrificing computational performance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Today's large-scale computing systems, such as data centers and high-performance computing (HPC) clusters, are severely limited by power and cooling costs for extremely large-scale (or exascale) problems. The steady increase in electricity consumption is a growing concern for several reasons, including cost, reliability, scalability, and environmental impact. Data centers currently use 200 TWh per year and contribute nearly 0.3% of global carbon emissions, while the entire complex of ICT (information and communications technology) devices produces up to 2% of them <ref type="bibr" target="#b0">[1]</ref>. A best-case scenario model predicts that by 2030 ICT will account for 8% of global electricity consumption <ref type="bibr" target="#b1">[2]</ref>, while the worst-case scenario anticipates 51%. This potential increase in power consumption and, consequently, in the cost of computing operations leads researchers and engineers to investigate and develop new techniques and approaches to optimize power management in HPC systems and in the ICT domain in general. At present there is a set of methods and approaches to address this energy optimization issue, though mainly for homogeneous CPU-based HPC systems. A general taxonomy of these techniques, suggested in <ref type="bibr" target="#b2">[3]</ref> and depicted in figure <ref type="figure" target="#fig_0">1</ref>, can be divided into two main groups: SPM (static power management) and DPM (dynamic power management). SPM methods, divided into two separate groups (for hardware-level and software-level management), are usually defined at design time and cannot be changed at runtime. Hardware SPM techniques can be further split into three separate groups <ref type="bibr" target="#b2">[3]</ref>:</p><p>1. Circuit level 2. Logic level 3. 
Architecture level.</p><p>DPM methods widely used in HPC systems <ref type="bibr" target="#b3">[4]</ref> can be divided into two main groups: DCD (dynamic component deactivation), based on predictive and heuristic approaches, and DPS (dynamic power scaling), such as resource throttling and DVFS (dynamic voltage and frequency scaling). These techniques can serve as a foundation for more complicated optimization methods, for example, task scheduling based on DVFS <ref type="bibr" target="#b4">[5]</ref> or DCD heuristic applications <ref type="bibr" target="#b3">[4]</ref>.</p><p>The methods described above can be used on different hardware platforms, both homogeneous (well studied nowadays) and heterogeneous (with GPUs, TPUs, FPGAs, and CGRAs), which have become popular in HPC according to a survey on deep learning hardware accelerators for heterogeneous HPC platforms <ref type="bibr" target="#b5">[6]</ref>. At the same time, the number of scientific papers on energy-aware optimization for HPC systems with FPGA controllers is extremely low (1-3 per year) compared to all research on "FPGA heterogeneous computing" (see figure <ref type="figure" target="#fig_1">2</ref>, with data obtained from app.dimensions.ai), which indicates a limited number of solutions in this domain; this work therefore focuses on heterogeneous applications of energy-aware optimizations in HPC systems.</p></div>
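As a minimal illustration of why DPS techniques such as DVFS save energy, the classic dynamic CMOS power relation P ≈ C·V²·f can be sketched as follows; the capacitance, voltage, and frequency values are illustrative assumptions, not measurements from any cited system.

```python
# Classic dynamic CMOS power model behind DVFS: P_dyn ~ C_eff * V^2 * f.
# All numeric values are illustrative assumptions, not measured data.

def dynamic_power(c_eff_farads: float, voltage_v: float, freq_hz: float) -> float:
    """Dynamic power (watts) for effective switched capacitance C_eff,
    supply voltage V, and clock frequency f."""
    return c_eff_farads * voltage_v ** 2 * freq_hz

# Lowering voltage and frequency together cuts power superlinearly:
nominal = dynamic_power(1e-9, 1.0, 2.0e9)  # 2.0 W at nominal V/f
scaled = dynamic_power(1e-9, 0.8, 1.5e9)   # 0.96 W after scaling down
print(f"DVFS cuts dynamic power by {100 * (1 - scaled / nominal):.0f}%")
```

Because voltage enters quadratically, even modest voltage reductions dominate the savings, which is what makes DVFS-based scheduling attractive despite the longer execution times.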
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Energy optimization theory</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Optimization problem definition for task scheduling</head><p>As mentioned in the introduction, optimization techniques can be divided into hardware and software types. The former are case-specific to particular hardware such as CPUs, memory chips, NICs, etc., while software-defined approaches can be generalized to provide a solution for disparate equipment with the same characteristics or types, for example, homogeneous or heterogeneous GPU- and TPU-based HPC clusters <ref type="bibr" target="#b6">[7]</ref>. Such software solutions often lead to energy-efficient task-scheduling methods, whose optimization problem can be defined as described next.</p><p>For a finite set of jobs (tasks) 𝐽 and a finite set of resources 𝑅, 𝑡𝑖𝑚𝑒(𝑗, 𝑟) is a function that returns the execution time of job 𝑗 ∈ 𝐽 on resource 𝑟 ∈ 𝑅 <ref type="bibr" target="#b3">[4]</ref>. Scheduling can then be described as the task of finding a set of start times {𝑠 1 , 𝑠 2 , . . . , 𝑠 |𝐽| } for jobs allocated to resources {𝑎 1 , 𝑎 2 , . . . , 𝑎 |𝐽| } under the conditions:</p><formula xml:id="formula_0">∀𝑠 𝑥 : ∄𝑠 𝑦 : 𝑠 𝑥 ≤ 𝑠 𝑦 + time (𝑦, 𝐴 𝑦 ) ∧ 𝑠 𝑦 ≤ 𝑠 𝑥 + time (𝑥, 𝐴 𝑥 ) ∧ 𝑎 𝑥 = 𝑎 𝑦 , ∀𝑎 𝑥 : 𝑥 ∈ 𝑅 (1)</formula><p>An additional optimization condition (see equation 2) can be applied to the resulting schedule, where the optimization criterion is a maximum or minimum, depending on the formulation of a function involving simple metrics such as execution time, consumed energy, etc. <ref type="bibr" target="#b3">[4]</ref>.  </p><p>This model is extremely simplified and not suitable for real applications for several reasons: it assumes that one resource can take only one task at a time, it assumes that the number of available resources is always equal to or higher than the number of jobs to complete, and it does not include the impact of communication between tasks on nodes or computing elements. 
To resolve these problems and adapt the model to the real world, an upgraded model was suggested <ref type="bibr" target="#b3">[4]</ref>: for two tasks 𝑥 and 𝑦 from the set of job pairs 𝐷, 𝑃 𝑗 is the set of devices that can be assigned to job 𝑗 ∈ 𝐽, and the communication time between jobs is obtained from the function 𝑐𝑜𝑚𝑚(𝑥, 𝑦, 𝑎 𝑥 , 𝑎 𝑦 ). The solution is then a set of assignments 𝐴 𝑗 and start times {𝑠 1 , 𝑠 2 , . . . , 𝑠 |𝐽| } for each job, as described in equations 3-6:</p><formula xml:id="formula_2">∀𝑥 ∈ 𝐴 𝑗 : 𝑥 ∈ 𝑃 𝑗 <label>(3)</label></formula><formula xml:id="formula_4">∀𝑠 𝑥 : ∄𝑠 𝑦 : 𝑠 𝑥 ≤ 𝑠 𝑦 + time (𝑦, 𝐴 𝑦 ) ∧ 𝑠 𝑦 ≤ 𝑠 𝑥 + time (𝑥, 𝐴 𝑥 ) ∧ 𝐴 𝑥 ∩ 𝐴 𝑦 = ∅ (4) ∀{𝑥, 𝑦} ∈ 𝐷 : 𝑠 𝑥 + time (𝑥, 𝐴 𝑥 ) + comm (𝑥, 𝑦, 𝐴 𝑥 , 𝐴 𝑦 ) ≤ 𝑠 𝑦<label>(5)</label></formula><p>With the optimization condition:</p><formula xml:id="formula_5">min / max (︀ OptimizationCriteria (︀{︀ 𝑠 1 , 𝑠 2 , . . . , 𝑠 |𝐽| }︀ , 𝐴 1 , . . . , 𝐴 |𝐽| , 𝐷 )︀)︀<label>(6)</label></formula><p>This method involves enumerating all jobs over all available resources, which suggests that a solution cannot be found in polynomial time; indeed, the energy-efficient active time scheduling problem <ref type="bibr" target="#b7">[8]</ref> has been proved NP-complete <ref type="bibr" target="#b4">[5]</ref>. To use this model in practice there are two possible ways: use predefined constraints and precalculated configurations, or use heuristic methods, for example genetic algorithms <ref type="bibr" target="#b8">[9]</ref>, to find a solution at runtime.</p></div>
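The constraints of the upgraded model can be sketched as a feasibility check; the function and parameter names below (feasible, time_fn, comm_fn) are ours, not from the cited work, and we read "jobs whose device sets intersect must not overlap in time" as the conflict condition of equation 4.

```python
# Hypothetical sketch of the upgraded model's feasibility constraints
# (equations 3-5). starts maps job -> start time, assigns maps job ->
# set of assigned devices, allowed maps job -> permitted devices P_j,
# deps lists dependent pairs (x, y).

def feasible(jobs, starts, assigns, allowed, deps, time_fn, comm_fn):
    # (3): every assigned device must be permitted for that job
    for j in jobs:
        if not assigns[j] <= allowed[j]:
            return False
    # (4): jobs sharing any device must not overlap in time
    for i, x in enumerate(jobs):
        for y in jobs[i + 1:]:
            if assigns[x] & assigns[y]:
                x_end = starts[x] + time_fn(x, assigns[x])
                y_end = starts[y] + time_fn(y, assigns[y])
                if starts[x] < y_end and starts[y] < x_end:
                    return False
    # (5): for a dependent pair (x, y), job y may start only after x
    # finishes plus the communication delay comm(x, y, A_x, A_y)
    for x, y in deps:
        finish_x = starts[x] + time_fn(x, assigns[x])
        if finish_x + comm_fn(x, y, assigns[x], assigns[y]) > starts[y]:
            return False
    return True
```

A heuristic scheduler, such as the genetic algorithms mentioned above, would then search the space of (starts, assigns) pairs for feasible schedules that minimize or maximize the chosen criterion.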
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Optimization criteria</head><p>The general optimization problem was described in the previous section; to be used in real HPC systems it requires properly defined optimization criteria. Existing solutions in this domain are based on an energy consumption (EC) metric, or can take other properties into consideration, for example, execution time <ref type="bibr" target="#b3">[4]</ref>. Power consumption can be described via energy itself (in joules or watts), or represented with more complex models such as instructions per joule or performance per watt <ref type="bibr" target="#b9">[10]</ref>. The latter approach is used in the Green500 rating as the FLOPS per Watt metric <ref type="bibr" target="#b10">[11]</ref>.</p><p>More sophisticated criteria can combine metrics such as EC (energy consumption), ExecT (execution time), utilization, average weighted time, wait time, power, Pareto front, AST, AFT, clock frequency, work (jobs) per energy, reliability, electricity cost, temperature, EDP, EDF, number of cores, probability of execution, branch transition rate, cache efficiency, and issue width <ref type="bibr" target="#b3">[4]</ref>. For example, a new algorithm was proposed for a reformed scheduling method with an energy consumption constraint (RSMECC), based on the AST, AFT, and energy consumption metrics <ref type="bibr" target="#b11">[12]</ref>. This algorithm makes it possible to solve a wide range of computing tasks more efficiently, including in the fields of neural networks, complex 3D modeling, and artificial intelligence.</p></div>
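The Green500-style performance-per-watt criterion is straightforward to compute; the node figures below are illustrative assumptions, not values from the Green500 list.

```python
# Green500-style efficiency metric: sustained performance per watt.
# The numeric values are illustrative, not measurements.

def flops_per_watt(rmax_gflops: float, avg_power_watts: float) -> float:
    """Energy efficiency in GFLOPS/W, the ranking metric of the Green500."""
    return rmax_gflops / avg_power_watts

# A node sustaining 7,000 GFLOPS while drawing 350 W on average:
print(flops_per_watt(7000.0, 350.0), "GFLOPS/W")
```

Richer criteria such as EDP (energy-delay product) follow the same pattern: they are scalar functions of the simple measurements above, so any of them can be plugged into the min/max optimization condition of the previous section.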
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Cluster architecture</head><p>Nowadays HPC clusters are widespread around the world in different forms and variations, but most of them are based on the homogeneous massively parallel processing (MPP) architecture, which is inherited from the older NUMA (non-uniform memory access) architecture <ref type="bibr" target="#b12">[13]</ref>. This approach looks similar to shared-memory technology, but in this case each processor in the cluster is connected to its own part of memory and forms a single independent node, which is connected to other nodes via a network interface card and a common network (see figure <ref type="figure" target="#fig_2">3</ref>). The absence of shared memory between nodes (not counting common NAS) simplifies the design and reduces inefficient components, therefore improving the scalability and stability of the HPC system <ref type="bibr" target="#b12">[13]</ref>. At the same time, due to the lack of shared memory, a processor core in one group must employ a different method to exchange data and coordinate with cores of other processor groups <ref type="bibr" target="#b13">[14]</ref>. This issue becomes more visible for heterogeneous systems based on CPUs from different series or types, or even for GRID computing systems <ref type="bibr" target="#b14">[15]</ref>. Another popular approach to building HPC systems is the use of symmetric multiprocessors (SMP). It embodies a category of parallel architectures that harness the power of multiple processor cores to enhance performance by leveraging parallel processing, all the while upholding a unified memory structure that spans the entirety of the parallel computing system <ref type="bibr" target="#b12">[13]</ref>.</p><p>An SMP defines a self-contained and self-sustaining computer system equipped with all the subsystems and components essential for fulfilling the demands and facilitating the execution of various applications. 
It can operate independently to support user applications designed as shared-memory multi-threaded programs, serve as one among several equivalent subsystems in a scalable MPP system or commodity cluster, and work as a throughput computer for the simultaneous execution of independent concurrent tasks <ref type="bibr" target="#b13">[14]</ref>. The general architecture of an SMP system is depicted in figure <ref type="figure" target="#fig_3">4</ref>. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Heterogeneous cluster architecture comparison</head><p>Heterogeneous computing in HPC refers to the utilization of diverse hardware accelerators, such as general-purpose graphics processing units (GPGPU), field-programmable gate arrays (FPGA), coarse-grained reconfigurable arrays (CGRA) <ref type="bibr" target="#b14">[15]</ref>, and specialized coprocessors, alongside traditional CPUs. This approach harnesses the strengths of different computing components to optimize performance and energy efficiency, making it particularly well suited for workloads that can benefit from parallel processing. The most common heterogeneous clusters couple a CPU and a GPGPU as a single node, so energy-efficient solutions already exist for this kind of HPC system, as analyzed in <ref type="bibr" target="#b3">[4]</ref>.</p><p>FPGAs, in contrast, are a newer type of accelerator in HPC and remain less studied, as shown in the introduction of this paper. Nevertheless, there are existing works on this topic: for example, a technique for cooperative CPU, GPU, and FPGA task execution based on the EngineCL framework was suggested in <ref type="bibr" target="#b15">[16]</ref>. Also, a new approach called Cooperative Heterogeneous Acceleration with Reconfigurable Multi-devices (CHARM) was proposed for a multi-hybrid accelerated cluster with GPU and FPGA coupling, implemented in the "Albireo" nodes of the Cygnus cluster, based on Intel Xeon Gold CPUs, four NVIDIA Tesla V100 GPUs, and Nallatech 520N FPGAs with Intel Stratix 10 <ref type="bibr" target="#b16">[17]</ref>. The architecture of these nodes is shown in figure <ref type="figure" target="#fig_4">5</ref>.</p><p>A characteristic comparison of a Cygnus supercomputer node and the heterogeneous system from the EngineCL test setup is shown in table <ref type="table" target="#tab_1">1</ref>. 
At the same time, for EngineCL it was shown that a performance improvement from heterogeneity was obtained for all benchmark tasks ("Matrix multiplication", "Mersenne Twister", "Watermarking", "Sobel Filter", "Nearest Neighbor", "AES Decrypt"), but an energy consumption improvement was detected only for "Sobel Filter" <ref type="bibr" target="#b15">[16]</ref>, which leaves a research gap in the search for energy-optimization methods for this kind of system. Consequently, these two works lack energy consumption optimization for the described systems, despite existing power management and optimization methods described in a survey of FPGA optimization methods for data center energy efficiency <ref type="bibr" target="#b17">[18]</ref>. Finding a "general" solution for FPGA-based systems is complicated by the necessity of reconfiguring the hardware for each specific task (job); nevertheless, energy optimization constraints with proper criteria, described in the "Energy optimization theory" section of this paper, can be applied to multi-hybrid hardware FPGA systems to optimize power consumption.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>This paper reviews modern theories and approaches for power consumption planning and optimization in heterogeneous HPC systems, including the optimization model described in the second section of this paper. As this problem is NP-complete, heuristic approaches for finding solutions were discussed. Results from the mentioned solutions can be implemented at the hardware or software level via DPM technologies. At the same time, the mentioned solutions are well suited only to CPU-GPU coupled systems, not to CPU-GPU-FPGA coupled systems. For the latter there are existing power management techniques, such as DCD, which is easy to apply to FPGAs, but there is a lack of schedulers and general approaches for implementing solutions derived from the theoretical optimal model. Therefore, future work involves further searching for ways to strengthen these methods, including heuristic solutions for power consumption planning in FPGA-coupled HPC systems.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: General classification of power management methods in computer systems.</figDesc><graphic coords="2,89.29,84.19,416.68,154.81" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Count of scientific publications per year on topic "FPGA heterogeneous computing" and "Energy-aware FPGA heterogeneous computing" from 2014 to 2023.</figDesc><graphic coords="3,130.96,367.42,333.36,275.70" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: MPP HPC cluster architecture.</figDesc><graphic coords="5,89.29,282.61,416.68,250.33" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Internal architecture of SMP HPC system.</figDesc><graphic coords="6,89.29,136.78,416.68,425.53" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Internal architecture of Albireo-node from Cygnus cluster.</figDesc><graphic coords="8,89.29,84.19,416.69,392.08" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head></head><label></label><figDesc>min / max (︀ OptimizationCriteria (︀{︀ 𝑠 1 , 𝑠 2 , . . . , 𝑠 |𝐽| }︀ , {︀ 𝑎 1 , 𝑎 2 , . . . , 𝑎 |𝐽| }︀)︀)︀ (2)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 1</head><label>1</label><figDesc>Comparison of Cygnus and EngineCL setup node specifications.</figDesc><table><row><cell>Characteristic</cell><cell>Cygnus</cell><cell>EngineCL test setup</cell></row><row><cell>CPU</cell><cell>Intel Xeon Gold x2</cell><cell>Intel Core i7-G700k</cell></row><row><cell>GPU</cell><cell>Nvidia Tesla V100 x4 (32 GB x4)</cell><cell>Nvidia GeForce GTX Titan X (12 GB)</cell></row><row><cell>FPGA</cell><cell>Intel Stratix 10 x2</cell><cell>Altera DE5NET Stratix V</cell></row><row><cell>RAM</cell><cell>192 GB</cell><cell>64 GB</cell></row><row><cell>Number of nodes</cell><cell>32 GPU+FPGA, 46 CPU-only</cell><cell>1</cell></row><row><cell>Energy-efficiency</cell><cell>N/A</cell><cell>1 of 6 benchmark tasks</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">How to stop data centres from gobbling up the world&apos;s electricity</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jones</surname></persName>
		</author>
		<idno type="DOI">10.1038/d41586-018-06610-y</idno>
	</analytic>
	<monogr>
		<title level="j">Nature</title>
		<imprint>
			<biblScope unit="volume">561</biblScope>
			<biblScope unit="page" from="163" to="166" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">S G</forename><surname>Andrae</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Edler</surname></persName>
		</author>
		<idno type="DOI">10.3390/challe6010117</idno>
	</analytic>
	<monogr>
		<title level="m">On Global Electricity Usage of Communication Technology: Trends to 2030</title>
				<imprint>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="117" to="157" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<title level="m" type="main">Energy Efficient High Performance Processors: Recent Approaches for Designing Green High Performance Computing</title>
		<author>
			<persName><forename type="first">J</forename><surname>Haj-Yahya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Mendelson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">B</forename><surname>Asher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Chattopadhyay</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Energy-Aware Scheduling for High-Performance Computing Systems: A Survey</title>
		<author>
			<persName><forename type="first">B</forename><surname>Kocot</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Czarnul</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Proficz</surname></persName>
		</author>
		<idno type="DOI">10.3390/en16020890</idno>
	</analytic>
	<monogr>
		<title level="j">Energies</title>
		<imprint>
			<biblScope unit="volume">16</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">NP-completeness of the Active Time Scheduling Problem</title>
		<author>
			<persName><forename type="first">S</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Purohit</surname></persName>
		</author>
		<ptr target="http://arxiv.org/abs/2112.03255" />
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">A Survey on Deep Learning Hardware Accelerators for Heterogeneous HPC Platforms</title>
		<author>
			<persName><forename type="first">C</forename><surname>Silvano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ielmini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Ferrandi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fiorin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Curzel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Benini</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Conti</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Garofalo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zambelli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Calore</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2306.15552</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Runtime and energy constrained work scheduling for heterogeneous systems</title>
		<author>
			<persName><forename type="first">V</forename><surname>Raca</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Umboh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Mehofer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Scholz</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11227-022-04556-7</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Supercomputing</title>
		<imprint>
			<biblScope unit="volume">78</biblScope>
			<biblScope unit="page" from="17150" to="17177" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A Model for Minimizing Active Processor Time</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">N</forename><surname>Gabow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Khuller</surname></persName>
		</author>
		<idno type="DOI">10.1007/978-3-642-33090-2_26</idno>
	</analytic>
	<monogr>
		<title level="m">Algorithms -ESA 2012</title>
		<title level="s">Lecture Notes in Computer Science</title>
		<editor>
			<persName><forename type="first">L</forename><surname>Epstein</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Ferragina</surname></persName>
		</editor>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2012">2012</date>
			<biblScope unit="page" from="289" to="300" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Energy-efficient allocation of computing node slots in HPC clusters through parameter learning and hybrid genetic fuzzy system modeling</title>
		<author>
			<persName><forename type="first">A</forename><surname>Cocaña-Fernández</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ranilla</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Sánchez</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11227-014-1320-9</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Supercomputing</title>
		<imprint>
			<biblScope unit="volume">71</biblScope>
			<biblScope unit="page" from="1163" to="1174" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Energy-aware scheduling algorithm for time-constrained workflow tasks in DVFS-enabled cloud environment</title>
		<author>
			<persName><forename type="first">M</forename><surname>Safari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Khorsand</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.simpat.2018.07.006</idno>
	</analytic>
	<monogr>
		<title level="j">Simulation Modelling Practice and Theory</title>
		<imprint>
			<biblScope unit="volume">87</biblScope>
			<biblScope unit="page" from="311" to="326" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Node variability in large-scale power measurements: perspectives from the Green500, Top500 and EEHPCWG</title>
		<author>
			<persName><forename type="first">T</forename><surname>Scogland</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Azose</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Rohr</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rivoire</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Hackenberg</surname></persName>
		</author>
		<idno type="DOI">10.1145/2807591.2807653</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC &apos;15</title>
				<meeting>the International Conference for High Performance Computing, Networking, Storage and Analysis, SC &apos;15<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">A reformed task scheduling algorithm for heterogeneous distributed systems with energy consumption constraints</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>He</surname></persName>
		</author>
		<idno type="DOI">10.1007/s00521-019-04415-2</idno>
	</analytic>
	<monogr>
		<title level="j">Neural Computing and Applications</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><surname>Sterling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Brodowicz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Anderson</surname></persName>
		</author>
		<title level="m">High Performance Computing: Modern Systems and Practices</title>
		<imprint>
			<publisher>Morgan Kaufmann</publisher>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Modeling communication in cache-coherent SMP systems: a case-study with Xeon Phi</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ramos</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Hoefler</surname></persName>
		</author>
		<idno type="DOI">10.1145/2462902.2462916</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, HPDC &apos;13</title>
		<meeting>the 22nd international symposium on High-performance parallel and distributed computing, HPDC &apos;13</meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="97" to="108" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">A Coarse-Grained Reconfigurable Array for High-Performance Computing Applications</title>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">S</forename><surname>Käsgen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Weinhardt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hochberger</surname></persName>
		</author>
		<idno type="DOI">10.1109/RECONFIG.2018.8641720</idno>
	</analytic>
	<monogr>
		<title level="m">2018 International Conference on Re-ConFigurable Computing and FPGAs (ReConFig)</title>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dávila</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Nozal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Gran Tejero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Villarroya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Suárez Gracia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">L</forename><surname>Bosque</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11227-019-02768-y</idno>
	</analytic>
	<monogr>
		<title level="j">The Journal of Supercomputing</title>
		<imprint>
			<biblScope unit="volume">75</biblScope>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Cygnus - World First Multihybrid Accelerated Cluster with GPU and FPGA Coupling</title>
		<author>
			<persName><forename type="first">T</forename><surname>Boku</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Fujita</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Kobayashi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Tatebe</surname></persName>
		</author>
		<idno type="DOI">10.1145/3547276.3548629</idno>
	</analytic>
	<monogr>
		<title level="m">Workshop Proceedings of the 51st International Conference on Parallel Processing, ICPP Workshops &apos;22</title>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="1" to="8" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A Survey of FPGA Optimization Methods for Data Center Energy Efficiency</title>
		<author>
			<persName><forename type="first">M</forename><surname>Tibaldi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Pilato</surname></persName>
		</author>
		<idno type="DOI">10.1109/TSUSC.2023.3273852</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Sustainable Computing</title>
		<imprint>
			<biblScope unit="page" from="343" to="362" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
