<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Optimal Implementation of Power Saving Techniques in CGR Systems</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Tiziana</forename><surname>Fanni</surname></persName>
							<email>tiziana.fanni@diee.unica.it</email>
							<affiliation key="aff0">
								<orgName type="department">DIEE</orgName>
								<orgName type="institution">Università degli Studi di Cagliari</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Optimal Implementation of Power Saving Techniques in CGR Systems</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">951D3F545185FD9CE6DE70A4D1F5DA70</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T21:58+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Reconfigurable Systems</term>
					<term>Power Saving</term>
					<term>Power Gating</term>
					<term>Clock Gating</term>
					<term>Power Modeling</term>
					<term>Datapath Merging</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Coarse-Grained Reconfigurable (CGR) architectures combine high performance with flexibility, allowing the execution of a large set of applications over the same substrate. However, they are also required to be energy efficient. This work focuses on a methodology to identify which parts of a CGR architecture may benefit from the application of power saving techniques, guiding the designers towards an optimal implementation of clock gated and power gated designs.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Motivation and Background</head><p>Small portable devices are now required to execute many complex functions efficiently. Reconfigurable systems <ref type="bibr" target="#b0">[1]</ref> are a suitable solution for these conflicting requirements. In particular, Coarse-Grained Reconfigurable (CGR) systems offer high performance with a certain degree of flexibility. In CGR systems, all the resources belonging to the possible configurations are instantiated in the substrate and are multiplexed in time. CGR systems thus offer fast reconfiguration, at the cost of the power consumed by the resources that are present in the substrate but not involved in the active configuration.</p><p>Power consumption in digital devices is computed as in Equation <ref type="formula">1</ref>, where: 1) P_{lkg} is the static power dissipation caused by leakage currents, consumed even when there is no circuit activity; 2) P_{int} is the dynamic power consumption, mainly due to the cell switching activity; 3) P_{net} is also a dynamic term, related to the interconnections. With technologies below 90 nm, designers are required to minimize both the static (P_{lkg}) and the dynamic (P_{int} + P_{net}) terms. [...] of several resources (i.e., sleep transistors to switch the power supply on and off, isolation cells to avoid the transmission of spurious signals, and state retention logic to maintain the internal state of the gated region), and the rules to manage them are quite complicated; thus, commercial tools do not provide it automatically.</p><p>In a CGR architecture, considering the minimum set of disjoint Logic Regions (LRs), each composed of processing elements that are always active or inactive together, it is possible to apply clock gating (CG) or power gating (PG) techniques to all of them. 
However, while CG requires only one AND gate per region to switch off the clock tree, PG has a much higher cost due to the additional logic, and its power overhead may easily exceed the power saved by switching off the unused resources. The research presented in this paper developed an automatic system-level analysis and implementation flow that estimates in advance the cost of applying CG and PG in a CGR system, leading to the identification of the LRs that can benefit from these power saving techniques.</p></div>
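<div xmlns="http://www.tei-c.org/ns/1.0"><p>Read together, the three terms listed for Equation 1 suggest the following decomposition. This is only a sketch, under the assumption that the total power is the plain sum of the listed terms; the symbol P_{tot} is a label introduced here for illustration.</p>
<p><![CDATA[
```latex
P_{tot} = \underbrace{P_{lkg}}_{\text{static}} + \underbrace{P_{int} + P_{net}}_{\text{dynamic}}
```
]]></p></div>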
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Methods and Algorithm</head><p>The proposed models (see Equations 2 and 3) estimate P_{lkg} and P_{int} of Equation 1 when PG and CG, respectively, are applied at the LR level. They are composed of two terms: 1) P_{lkg/int,ON}(LR_i), the consumption within the considered LR; 2) ExtOver_{lkg/int}(LR_i), the consumption due to the logic inserted outside the LR. </p><formula xml:id="formula_0">P_{lkg/int}(LR_i) = P_{lkg/int,ON}(LR_i) + ExtOver_{lkg/int}(LR_i) = \sum_{actors \in LR_i} [P_{lkg/int}(cmb) + P_{lkg/int}(reg) \cdot T_{i,ON}] + [P_{lkg/int}(CG_{ON}) \cdot T_{i,ON} + P_{lkg/int}(CG_{OFF}) \cdot T_{i,OFF}]<label>(3)</label></formula><p>In particular, the Equations present the following contributions:</p><p>- Combinational Logic [P_{lkg/int}(cmb) \cdot T_{i,ON} for PG and P_{lkg/int}(cmb) for CG]: sum of the contributions of the combinational cells, weighted by the activation time T_{i,ON} in the PG case. CG switches off only the clock tree; therefore, combinational logic is always active when CG is applied.</p><p>- Sequential Logic [P_{lkg/int}(reg) \cdot (\#reg - \#rtn)/\#reg \cdot T_{i,ON} for PG and P_{lkg/int}(reg) \cdot T_{i,ON} for CG]: in the PG case, this term considers only those of the #reg registers inside the LR that are not replaced by retention cells. CG has no effect on the static power; thus, when it is applied, only the internal contribution is multiplied by T_{i,ON} (P_{lkg}(reg) and P_{int}(reg) \cdot T_{i,ON}).</p><p>- Retention Cells [P_{lkg/int}(RC) \cdot \#rtn \cdot T_{i,ON}]: this term considers the retention cells inserted to preserve the state of some registers (#rtn) when PG is applied. P_{lkg/int}(RC) is the consumption of a single retention cell.</p><p>- Isolation Cells:</p><formula xml:id="formula_1">[P_{lkg/int}(ISO_{ON}) \cdot T_{i,ON} + P_{lkg/int}(ISO_{OFF}) \cdot T_{i,OFF}] \cdot \#iso</formula><p>In the PG case, isolation cells are inserted, and their dissipation is proportional to their overall number (#iso).</p><p>- Clock Gating Cells:</p><formula xml:id="formula_2">[P_{lkg/int}(CG_{ON}) \cdot T_{i,ON} + P_{lkg/int}(CG_{OFF}) \cdot T_{i,OFF}]</formula><p>Clock gating cells are used in both the PG and CG cases. 
In the PG case, they are required for the proper operation of the retention cells.</p><p>- Power Controller:</p><formula xml:id="formula_3">[P_{lkg/int}(Contr_{ON}) \cdot T_{i,ON} + P_{lkg/int}(Contr_{OFF}) \cdot T_{i,OFF}]</formula><p>The power controller is inserted to properly drive the enable signals of the power saving logic.</p><p>This estimation model has been embedded in the automatic design flow for CGR systems offered by the Multi-Dataflow Composer (MDC) tool <ref type="bibr" target="#b1">[2]</ref>. MDC handles the automatic composition and deployment of CGR systems, starting from the high-level specification of the kernels to be executed, represented as networks <ref type="bibr" target="#b2">[3]</ref>. MDC also offers other functionalities: 1) a structural profiler that performs the design space exploration of the implementable multi-functional systems to determine the optimal CGR substrate <ref type="bibr" target="#b3">[4]</ref>; 2) a power manager that partitions the multi-functional network, identifies the minimum set of disjoint LRs, and applies either CG <ref type="bibr" target="#b4">[5]</ref> or PG <ref type="bibr" target="#b5">[6]</ref> to all of them (generating an ad-hoc Common Power Format (CPF) file to specify the power intent early in the design <ref type="bibr" target="#b6">[7]</ref>); 3) a rapid prototyper to embed the CGR system into a Xilinx-compliant IP <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b8">9]</ref>. The MDC power manager has been extended to analyze the design, exploiting the estimation flow, in order to identify which regions may benefit most from the application of PG and CG <ref type="bibr" target="#b9">[10,</ref><ref type="bibr" target="#b10">11]</ref>. Fig. <ref type="figure" target="#fig_1">1</ref> shows the complete analysis and implementation flow. MDC derives the HDL code of the baseline CGR system and provides the scripts to perform the synthesis and all the simulations for the system back annotation. 
Commercial tools are used to determine the baseline system power reports, which are fed back to MDC. MDC identifies the LRs and estimates the number of isolation cells (#iso) at the dataflow level by analyzing the connections between the different LRs. These data, together with the information gathered from the reports of a single synthesis run of the baseline CGR system netlist, are used to estimate the power consumption of the identified LRs (the consumption of the additional power saving logic, i.e., isolation cells and retention cells, is estimated by characterizing the adopted technology libraries). The power estimation flow analyzes the LRs according to Algorithm 1:</p><p>- Area evaluation: the PG overhead may outweigh the power saved in the smaller LRs. Thus, LRs that are too small are evaluated only for CG application. The threshold value is set by the user.</p><p>- PG evaluation: the power variations due to PG and CG application are estimated. If PG is more convenient than CG, the LR becomes a candidate for PG implementation. Otherwise, CG is evaluated.</p><p>- CG evaluation: the power variation due to CG application is estimated. If CG is not able to save any power, the LR is discarded and its logic is included in the always-on domain. </p></div>
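<div xmlns="http://www.tei-c.org/ns/1.0"><p>The contributions listed above for Equations 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration and not the MDC implementation: the class and function names, the per-actor bookkeeping of #reg and #rtn, and the normalisation T_OFF = 1 - T_ON are all assumptions made here. Each function would be called once with leakage figures and once with internal power figures.</p>
<p><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class OnOff:
    """Power drawn by a support cell while the LR is ON and while it is OFF."""
    on: float
    off: float

    def weighted(self, t_on):
        # Assumption: times are normalised, so T_OFF = 1 - T_ON.
        return self.on * t_on + self.off * (1.0 - t_on)


@dataclass
class Actor:
    """Per-actor figures read from the baseline synthesis report (hypothetical layout)."""
    p_cmb: float    # combinational cells
    p_reg: float    # registers
    n_reg: int = 0  # number of registers (#reg)
    n_rtn: int = 0  # registers replaced by retention cells (#rtn)


def estimate_cg(actors, t_on, cg_cell):
    """Equation 3: combinational logic is always on, registers are weighted by T_ON."""
    internal = sum(a.p_cmb + a.p_reg * t_on for a in actors)
    return internal + cg_cell.weighted(t_on)


def estimate_pg(actors, t_on, p_rc, n_iso, iso, ctrl, cg_cell):
    """Equation 2: the gated logic is weighted by T_ON, plus the external overhead."""
    internal = 0.0
    for a in actors:
        # Fraction of registers that are NOT replaced by retention cells.
        kept = (a.n_reg - a.n_rtn) / a.n_reg if a.n_reg else 0.0
        internal += (a.p_cmb + p_rc * a.n_rtn + a.p_reg * kept) * t_on
    external = (iso.weighted(t_on) * n_iso
                + ctrl.weighted(t_on)
                + cg_cell.weighted(t_on))
    return internal + external


if __name__ == "__main__":
    # Made-up numbers, only to show how the two estimates are obtained.
    lr = [Actor(p_cmb=2.0, p_reg=1.0, n_reg=4, n_rtn=1)]
    cg = OnOff(on=0.2, off=0.1)
    print("CG estimate:", estimate_cg(lr, 0.5, cg))
    print("PG estimate:", estimate_pg(lr, 0.5, 0.4, 2,
                                      OnOff(0.1, 0.05), OnOff(0.3, 0.1), cg))
```
]]></p>
<p>Keeping the ON and OFF figures of each support cell in one small object mirrors the bracketed terms of the equations, so each line of the sketch can be checked against one contribution.</p></div>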
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Methodology Evaluation</head><p>To assess the proposed methodology, two applications are considered: an FFT application and a computing core accelerating a zoom application. The Fast Fourier Transform (FFT) <ref type="bibr" target="#b11">[12]</ref> is an optimised algorithm for the calculation of the Discrete Fourier Transform (DFT); a DFT of size 2 (radix-2) is called a butterfly. The adopted use case involves a CGR radix-2 FFT of size 8, where four configurations are available, using respectively: 1) 12 butterflies; 2) 4 butterflies; 3) 2 butterflies; 4) 1 butterfly. In this design, MDC identified 8 LRs. The Zoom coprocessor is composed of seven computational kernels: 1) absolute value calculation; 2) bilevel/grayscale block checking; 3) linear combination calculation; 4) cubic filter convolution; 5) median calculation; 6) maximum/minimum finding; 7) edge block checking. These kernels have been modelled as networks and implemented over a CGR hardware accelerator. In this CGR coprocessor, MDC identified 13 LRs. To validate the cross-application and cross-technology effectiveness of the proposed estimation model, this section presents the assessment of three implementations: FFT targeting a 90 nm CMOS technology; Zoom targeting a 90 nm CMOS technology; and Zoom targeting a 45 nm CMOS technology.</p><p>These designs have been synthesized using Cadence RTL Compiler; the generated post-synthesis simulation reports have then been fed back to the MDC tool to estimate the power consumption of the gated LRs by applying Equations 2 and 3. In order to calculate the accuracy of the proposed estimation models, one power gated and one clock gated design have been implemented for each identified LR. Tab. 1 reports, for each LR, the real (extracted from the post-synthesis reports) and estimated percentage power variation when the CG and PG strategies are applied. 
Comparing the real power variations with the estimated ones, the worst estimation is related to LR4 of FFT, whose percentage power variation is so low (-0.67%) that it is impossible to estimate it accurately. For this reason, the algorithm that evaluates the LRs also takes their area into account.</p><p>Tab. 1 also reports the percentage area of each LR with respect to the design area. As expected, the largest regions (LR1 in FFT and LR2 in Zoom) are the ones with the highest power saving, regardless of the considered strategy (PG or CG). The composition of the considered LR also has an impact on the effectiveness of the applied power saving technique. LR1 of FFT is 99% combinational, thus CG can save very little power in this region. However, simply considering the LR size may not be sufficient to identify the best candidate for power saving; indeed, LR3 of FFT accounts for 15.8% of the total area, but switching it off saves only about 6% of the total power.</p><p>The presented methodology speeds up the evaluation of power saving application by estimating in advance the effectiveness of PG or CG if applied to the LRs of a CGR system. For a CGR system composed of N input networks, only one synthesis run of the baseline design, without any power management, and N simulations (one for each kernel) to retrieve the real switching activity of the system are required. Otherwise, it would be necessary to implement a design for each identified LR for both CG and PG applications, plus the baseline one (to evaluate the cost and benefit of the chosen power saving technique).</p><p>As a follow-up to this research, the model needs to be improved by also considering in the power estimation the contribution of the interconnections, which is currently not addressed. 
Furthermore, the overhead of the power switches is not considered yet; these sleep transistors are inserted only during the place-and-route flow, and a way to estimate in advance how many switches will be inserted for each LR has not been explored yet. The number of switches affects the rush currents during power-up transitions and the power-down/up timing; these issues will also be addressed in future work.</p></div>
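<div xmlns="http://www.tei-c.org/ns/1.0"><p>The saving in design effort described above can be made concrete with a little arithmetic. The sketch below is illustrative only; the function names are hypothetical, and it simply counts one CG and one PG implementation per LR plus the baseline for the exhaustive approach, against one baseline synthesis and one simulation per kernel for the model-based flow.</p>
<p><![CDATA[
```python
def exhaustive_designs(num_lrs):
    """Designs to implement without the model: one CG and one PG per LR, plus the baseline."""
    return 2 * num_lrs + 1


def model_based_runs(num_kernels):
    """(synthesis runs, simulations) needed by the model-based flow."""
    return 1, num_kernels


if __name__ == "__main__":
    # LR and kernel counts as reported in the text for the two use cases.
    print("FFT, 8 LRs:", exhaustive_designs(8), "designs without the model")
    print("Zoom, 13 LRs:", exhaustive_designs(13), "designs without the model")
    print("Zoom, 7 kernels (synthesis, simulations):", model_based_runs(7))
```
]]></p></div>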
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Related Works</head><p>To the best of our knowledge, the literature does not address the problem of modeling PG and CG costs in CGR designs. Shafique et al. <ref type="bibr" target="#b12">[13]</ref> focus on low-power techniques and power modeling for FPGAs. In <ref type="bibr" target="#b13">[14]</ref>, only CG is taken into account: different power states are defined, and their consumption is characterized by low-level power analysis results. The work in <ref type="bibr" target="#b14">[15]</ref> focuses on estimating the leakage reduction for PG and reverse body bias.</p><p>Other approaches perform an estimation that considers different components. For instance, Stokke et al. <ref type="bibr" target="#b15">[16]</ref> propose a power modeling method for the Tegra K1 CPU that, taking into account measured rail voltages and fine-grained hardware activity predictors, exposes components such as rail and core leakage currents. The FALPEM framework <ref type="bibr" target="#b16">[17]</ref> provides power estimations at the pre-register transfer level (RTL) stage, specifically targeting the power consumed by the clock network and the interconnections, but PG and CG costs are not defined. Finally, Li et al. <ref type="bibr" target="#b17">[18]</ref> propose an architecture-level integrated power, area, and timing modeling framework for multi-core systems that evaluates system building blocks (e.g., CPUs and buses) 
for different technology nodes, also providing PG support.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Equation 2</head><label>(2)</label><figDesc>P_{lkg/int}(LR_i) = P_{lkg/int,ON}(LR_i) + ExtOver_{lkg/int}(LR_i) = \sum_{actors \in LR_i} [P_{lkg/int}(cmb) + P_{lkg/int}(RC) \cdot \#rtn + P_{lkg/int}(reg) \cdot (\#reg - \#rtn)/\#reg] \cdot T_{i,ON} + [P_{lkg/int}(ISO_{ON}) \cdot T_{i,ON} + P_{lkg/int}(ISO_{OFF}) \cdot T_{i,OFF}] \cdot \#iso + [P_{lkg/int}(Contr_{ON}) \cdot T_{i,ON} + P_{lkg/int}(Contr_{OFF}) \cdot T_{i,OFF}] + [P_{lkg/int}(CG_{ON}) \cdot T_{i,ON} + P_{lkg/int}(CG_{OFF}) \cdot T_{i,OFF}]</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig. 1.</head><label>1</label><figDesc>Fig. 1. Proposed analysis and implementation flow.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1.</head><label>1</label><figDesc>FFT and Zoom test-cases. Area rows report the percentage area of each LR with respect to the total area. The other rows report the percentage power variation (both estimated and real) of the design when CG or PG is applied to the considered LR. N.A. refers to purely combinational LRs, where CG is not applied.</figDesc><table><row><cell></cell><cell>LR1</cell><cell>LR2</cell><cell>LR3</cell><cell>LR4</cell><cell>LR5</cell><cell>LR6</cell><cell>LR7</cell><cell>LR8</cell><cell>LR9</cell><cell>LR10</cell><cell>LR11</cell><cell>LR12</cell><cell>LR13</cell></row><row><cell>CG real [%]</cell><cell>-5.50</cell><cell>-22.02</cell><cell>-6.26</cell><cell>-0.91</cell><cell>N.A.</cell><cell>-0.83</cell><cell>-0.93</cell><cell>-6.71</cell><cell>-2.34</cell><cell>-5.26</cell><cell>-3.20</cell><cell>-4.16</cell><cell>-0.62</cell></row><row><cell>CG est [%]</cell><cell>-5.50</cell><cell>-22.06</cell><cell>-6.27</cell><cell>-0.91</cell><cell>N.A.</cell><cell>-0.83</cell><cell>-0.93</cell><cell>-6.72</cell><cell>-2.35</cell><cell>-5.27</cell><cell>-3.25</cell><cell>-4.17</cell><cell>-0.62</cell></row><row><cell>PG real [%]</cell><cell>-5.69</cell><cell>-22.98</cell><cell>-6.56</cell><cell>-0.94</cell><cell>0.21</cell><cell>-0.74</cell><cell>-0.95</cell><cell>-7.46</cell><cell>-2.48</cell><cell>-5.47</cell><cell>-3.36</cell><cell>-4.33</cell><cell>-0.58</cell></row><row><cell>PG est [%]</cell><cell>-5.68</cell><cell>-22.99</cell><cell>-6.56</cell><cell>-0.94</cell><cell>0.19</cell><cell>-0.74</cell><cell>-0.95</cell><cell>-7.46</cell><cell>-2.47</cell><cell>-5.47</cell><cell>-3.42</cell><cell>-4.32</cell><cell>-0.58</cell></row></table><note>Algorithm 1: Power saving strategy selection for CGR systems.
PG set is empty;
CG set is empty;
foreach LR_i in set LRs do
    evaluate_area(LR_i, area_th)
end

function: evaluate_area(LR_i, area_th):
    calculate LR_i area;
    if area_LR &gt; area_th then
        evaluate_PG(LR_i);
    else
        evaluate_CG(LR_i);
    end

PG evaluation: evaluate_PG(LR_i):
    estimate PG total overhead;
    if PG total overhead &lt; 0 then
        estimate CG total overhead;
        if PG total overhead &lt; CG total overhead then
            add LR_i to PG set;
        else
            add LR_i to CG set;
        end
    else
        evaluate_CG(LR_i);
    end

CG evaluation: evaluate_CG(LR_i):
    estimate CG total overhead;
    if CG total overhead &lt; 0 then
        add LR_i to CG set;
    end</note></figure>
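<div xmlns="http://www.tei-c.org/ns/1.0"><p>The selection logic of Algorithm 1 can be sketched as a short Python function. This is a hedged illustration rather than the MDC code: the function name, the callable-based interface for the estimates, the explicit always-on list, and the numbers in the example are assumptions made here. A negative variation means that power is saved.</p>
<p><![CDATA[
```python
def select_strategy(regions, area_threshold, pg_variation, cg_variation):
    """Split LRs into PG, CG and always-on sets, following Algorithm 1.

    regions: mapping from LR name to its area (same unit as area_threshold).
    pg_variation, cg_variation: callables returning the estimated power
    variation of the design (negative = saving) when the LR is gated.
    """
    pg_set, cg_set, always_on = [], [], []

    def evaluate_cg(name):
        # CG evaluation: keep the LR only if CG actually saves power.
        if cg_variation(name) < 0:
            cg_set.append(name)
        else:
            always_on.append(name)

    for name, area in regions.items():
        if area > area_threshold:
            # PG evaluation: PG must save power and beat CG.
            pg = pg_variation(name)
            if pg < 0:
                if pg < cg_variation(name):
                    pg_set.append(name)
                else:
                    cg_set.append(name)
            else:
                evaluate_cg(name)
        else:
            # Area evaluation: regions that are too small are only tried with CG.
            evaluate_cg(name)
    return pg_set, cg_set, always_on


if __name__ == "__main__":
    # Hypothetical areas (percent of the design) and estimated variations (percent).
    areas = {"LR_a": 20.0, "LR_b": 1.0, "LR_c": 10.0, "LR_d": 8.0}
    pg_est = {"LR_a": -6.0, "LR_b": -1.0, "LR_c": 0.3, "LR_d": -2.0}
    cg_est = {"LR_a": -5.0, "LR_b": -0.5, "LR_c": 0.1, "LR_d": -2.5}
    print(select_strategy(areas, 5.0, pg_est.get, cg_est.get))
```
]]></p>
<p>Passing the two estimators as callables keeps the selection independent of how the power variations are obtained, so the same function works with model estimates or with figures extracted from reports.</p></div>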
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A Decade of Reconfigurable Computing: a Visionary Retrospective</title>
		<author>
			<persName><forename type="first">R</forename><surname>Hartenstein</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of Design, Automation and Test in Europe (DATE&apos;01)</title>
				<meeting>Design, Automation and Test in Europe (DATE&apos;01)</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Power-Awarness in Coarse-Grained Reconfigurable Multi-Functional Architectures: a Dataflow Based Strategy</title>
		<author>
			<persName><forename type="first">F</forename><surname>Palumbo</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11265-016-1106-9</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Signal Processing Systems</title>
		<imprint>
			<biblScope unit="volume">87</biblScope>
			<biblScope unit="page" from="81" to="106" />
			<date type="published" when="2017-04">Apr 2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automated design flow for multi-functional dataflow-based platforms</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Signal Processing Systems</title>
		<imprint>
			<biblScope unit="volume">85</biblScope>
			<biblScope unit="page" from="143" to="165" />
			<date type="published" when="2016-10">Oct 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Power-awarness in coarse-grained reconfigurable designs: A dataflow based strategy</title>
		<author>
			<persName><forename type="first">F</forename><surname>Palumbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Workshop on Signal Processing Systems</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Coarse-grained reconfiguration: dataflow-based power management</title>
		<author>
			<persName><forename type="first">F</forename><surname>Palumbo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IET Computers Digital Techniques</title>
		<imprint>
			<biblScope unit="volume">9</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="36" to="48" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Automated Power Gating Methodology for Dataflow-based Reconfigurable Systems</title>
		<author>
			<persName><forename type="first">T</forename><surname>Fanni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on Computing Frontiers</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<author>
			<persName><surname>Si2</surname></persName>
		</author>
		<title level="m">Si2 Common Power Format Specification - Version 2.1</title>
				<imprint>
			<date type="published" when="2014-12">Dec. 2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Automatic generation of dataflow-based reconfigurable co-processing units</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conf. on Design and Architectures for Signal and Image Processing</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Reconfigurable coprocessors synthesis in the MPEG-RVC domain</title>
		<author>
			<persName><forename type="first">C</forename><surname>Sau</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on ReConFigurable Computing and FPGAs</title>
				<imprint>
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems</title>
		<author>
			<persName><forename type="first">T</forename><surname>Fanni</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on Computing Frontiers</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Modelling and automated implementation of optimal power saving strategies in coarse-grained reconfigurable architectures</title>
		<author>
			<persName><forename type="first">F</forename><surname>Palumbo</surname></persName>
		</author>
		<idno type="DOI">10.1155/2016/4237350</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Electrical and Computer Engineering</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">An Algorithm for the Machine Computation of Complex Fourier Series</title>
		<author>
			<persName><forename type="first">J</forename><surname>Cooley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Tukey</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="s">Mathematics of Computation</title>
		<imprint>
			<biblScope unit="volume">19</biblScope>
			<date type="published" when="1965">1965</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Adaptive Energy Management for Dynamically Reconfigurable Processors</title>
		<author>
			<persName><forename type="first">M</forename><surname>Shafique</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. on CAD of Integrated Circuits and Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="50" to="63" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Power modeling for digital circuits with clock gating</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kim</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEICE Electronics Express</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">24</biblScope>
			<biblScope unit="page" from="20150817" to="20150817" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Run-time Active Leakage Reduction by Power Gating and Reverse Body Biasing: An Energy View</title>
		<author>
			<persName><forename type="first">H</forename><surname>Xu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Conference on Computer Design</title>
				<imprint>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">High-Precision Power Modelling of the Tegra K1 Variable SMP Processor Architecture</title>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">R</forename><surname>Stokke</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Symposium on Embedded Multicore/Many-core Systems-on-Chip</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">FALPEM: framework for architectural-level power estimation and optimization for large memory sub-systems</title>
		<author>
			<persName><forename type="first">A</forename><surname>Chhabra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Trans. on CAD of Integrated Circuits and Systems</title>
		<imprint>
			<biblScope unit="volume">34</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="1138" to="1142" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing</title>
		<author>
			<persName><forename type="first">S</forename><surname>Li</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">TACO</title>
		<imprint>
			<biblScope unit="volume">10</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
