=Paper=
{{Paper
|id=Vol-1566/paper2
|storemode=property
|title=Cross-Layer Approaches for an Aging-Aware Design Space Exploration for Microprocessors
|pdfUrl=https://ceur-ws.org/Vol-1566/Paper2.pdf
|volume=Vol-1566
|authors=Fabian Oboril,Mehdi Tahoori
|dblpUrl=https://dblp.org/rec/conf/date/OborilT16
}}
==Cross-Layer Approaches for an Aging-Aware Design Space Exploration for Microprocessors==
Cross-Layer Approaches for an Aging-Aware Design Space Exploration for Microprocessors
Fabian Oboril and Mehdi B. Tahoori
Chair of Dependable Nano Computing (CDNC), Karlsruhe Institute of Technology (KIT)
Email: {fabian.oboril, mehdi.tahoori}@kit.edu
Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Co-Located with DATE 2016 - Dresden, Germany
Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.
Abstract: With the continuous scaling of CMOS technologies, maintaining microprocessor reliability becomes a major design challenge. In particular, accelerated transistor aging is a serious reliability concern, as it considerably reduces the operational system lifetime. To address this issue, this work proposes cross-layer solutions for aging modeling, simulation and mitigation that co-optimize reliability together with the traditional design constraints such as power, performance, and cost. To this end, knowledge from several abstraction layers, ranging from circuit level to architecture level, is exploited for cost-effective aging-aware architecture and system design. The comprehensive simulations and experimental analysis performed in this work show the benefits of this approach over state-of-the-art single-layer solutions.
I. INTRODUCTION
Thanks to the aggressive scaling of transistor dimensions in the past decades, computing systems have revolutionized our life. However, in the shadow of the downscaling benefits such as increased microprocessor performance, more integrated features and improved energy/cost efficiency, the reliability of nanoscale devices has become a major threat to the future success of computing systems (see Fig. 1) [2]–[5]. As a result, with every new technology node, it becomes harder for chip manufacturers to ensure the reliable operation of their chips, and thus malfunctions during operation, which can lead to erroneous program outputs or even system crashes, become more likely.
[Figure 1 residue: plot of Failure Rate over Time with the regions Infant Mortality Period, Normal Execution Period, and Wearout Period; curves for the 22 nm, 45 nm, 90 nm, 130 nm, and 180 nm nodes; lifetime labels: ? years, <7 years, ~7 years, ~10 years]
Fig. 1. Increasing unreliability in nanoscale technology nodes due to accelerated transistor aging (1) and susceptibility to noise as well as soft errors (2) (based on [1])
Among various reliability challenges, accelerated transistor aging is of particular importance, as it degrades the transistor switching speed, and thus leads to slower circuits over time [3], [6]. Consequently, in synchronous digital systems, timing failures due to the increased circuit delay can occur and cause incorrect system states. Because of that, the microprocessor lifetime, and as a result also the overall system lifetime, is considerably reduced if no countermeasures are taken. This is especially critical for embedded systems that require long mission times, for instance in health care (e.g. implants), space missions (e.g. satellites) or electronic control units (e.g. in airplanes) [7]. Therefore, it is necessary to consider reliability, and in particular lifetime, as another design constraint besides the traditional performance, power and cost parameters. However, due to the strong interdependencies among these different criteria, the co-optimization is very challenging.
To avoid aging-induced failures, designers add conservative timing margins to their designs, which, however, is inefficient and costly [8]. In addition, a lot of effort is spent on improvements at the lowest hardware layers (i.e. at transistor/gate level), as these layers are very close to the physical origin of the problem (see Fig. 3). However, the influence of higher levels in the hardware-software design stack is neglected in most state-of-the-art solutions, although these layers have a considerable impact on the system lifetime, for example by influencing the thermal behavior of the microprocessor. Therefore, it is crucial to consider the effect of these higher layers to achieve cost-efficient resilient computer systems. In fact, due to the extent of transistor aging (see Fig. 2) [2], it will be necessary in the future that various layers contribute in a combined fashion (i.e. cross-layer¹) to co-optimize lifetime with the other design parameters more efficiently than state-of-the-art solutions, which are typically single-layer approaches [9], [10].
[Figure 2 residue: plot of aging Acceleration over the technology nodes 32 nm, 22 nm, 14 nm, 10 nm, and 7 nm]
Fig. 2. Aging acceleration in the next technology nodes (based on [2])
[Figure 3 residue: stack of layers, top to bottom: Application, OS / Firmware, (Micro)-Architecture, Circuit, Gate, Device]
Fig. 3. Abstraction layers in the hardware-software design stack
¹ Cross-layer means that the knowledge and parameters available at multiple abstraction layers are used in combination to optimize the design, whereas single-layer solutions exploit only the information of a single level.
In this work we push cross-layer solutions for aging modeling and simulation, as well as aging mitigation, forward. To this end, we address the major transistor aging phenomena that cause a gradual degradation of device parameters such as switching delay, namely Bias Temperature Instability (BTI) [11] and Hot Carrier Injection (HCI) [12]. In detail, our contributions are as follows:
1) A set of novel cross-layer aging modeling and analysis frameworks was developed to allow an effective design space exploration throughout the different microprocessor design phases, considering the interdependencies of the different design parameters including lifetime. Compared to existing state-of-the-art solutions, the advantage of the proposed frameworks is that a much wider design space can be explored, because the cross-layer approach combined with the architectural aging models allows more parameters to be evaluated. Consequently, using these platforms, the most critical processor components can be identified, and selective, cost-efficient cross-layer aging mitigation techniques can be designed.
2) Using the aforementioned cross-layer platforms, a set of efficient cross-layer aging mitigation techniques was designed that outperform the existing state-of-the-art solutions. The proposed techniques include several design time approaches to address aging of the most critical microprocessor components. In addition, a dynamic runtime adaptation scheme was developed to detect and avoid potentially critical system conditions while the system is running. This solution complements the design
time techniques, which are incapable of dealing with runtime variations (e.g. changing system conditions). Accordingly, the design time solutions consider lifetime as one optimization objective and tune the design based on a given set of representative workload scenarios, while the runtime technique deals with the constantly changing conditions and adapts the system to avoid critical states. Thus, the combination of both schemes enables effective and holistic aging mitigation solutions, which allow a very aggressive and cost-effective system design.
In the following, the different contributions are explained in detail.
II. FRAMEWORKS FOR WEAROUT MODELING AND EVALUATING
The first framework developed is an architectural platform called ExtraTime [14]. It is based on the performance simulator gem5 [15], which was extended with sophisticated models for power and temperature. In order to make these models as realistic as possible, they were optimized, and afterwards calibrated and validated using a real experimental platform based on recent Intel Core-i processors, which have on-die power and thermal sensors [16]. As a result, the model accuracy is very good (e.g. temperature inaccuracy is < 2 °C).
In addition, novel and realistic architectural aging models were developed and incorporated (see Fig. 4). The main advantage of these models is that they do not require detailed circuit-level information to estimate the degradation of a complete architectural component (e.g. ALU, Branch Predictor, etc.). For this purpose, these models were derived from transistor-level models for BTI and HCI [17]–[19] by introducing a representative transistor, which reflects the average usage behavior (switching activity, ON time, OFF time) of all transistors within this block. In addition, the temperature of this representative transistor is estimated by the block temperature. By that means, the degradation of the representative transistor can be obtained, and thus the degradation of the entire block can be estimated. In this regard it is important to note that the accuracy of the resulting aging models is very good given their level of abstraction. In fact, the inaccuracy compared to accurate gate-level models for an architectural block such as an ALU is less than 5 % without requiring detailed circuit-level knowledge.
[Figure 4 residue: block diagram; inputs: Specification, Application, Technology Parameters; components: gem5 Performance Simulator, Power Model, Temp. Model, Aging Model; exchanged data: Perf. Data, Power Data, Temp. Data, Delay Data, Technology Data]
Fig. 4. ExtraTime framework for aging modeling and evaluation
As a result, this framework considers the influence of parameters from the microarchitecture level up to the application level. Moreover, as ExtraTime does not require low-level details (e.g. the actual gate-level implementation), it can be employed in early design phases for a first-order aging analysis and design space exploration.
To also take low-level aspects into account, a second complementary platform was developed, which is based on standard EDA tools for design synthesis and simulation. It can analyze all internal gates, and it considers the interplay of real-world applications, aging, power and temperature [20]. Thus, it is very accurate, as aging can be analyzed at gate level using the models proposed in [17]–[19], but it is less flexible than ExtraTime. Consequently, it is intended for fine tuning in later design phases.
III. DESIGN TIME & RUNTIME AGING MITIGATION SOLUTIONS
Using these novel cross-layer platforms, the most critical microprocessor components were identified and several unique aging mitigation techniques were designed and evaluated, which are presented in the following subsections.
A. Aging-Aware Design of Instruction Pipelines
Traditionally, the delays of all instruction pipeline stages are balanced at design time. However, transistor aging causes a non-uniform delay degradation among the stages due to different usage patterns (see Fig. 5(a)). Hence, this design approach results in an imbalanced and non-optimal design after a short period of time. Consequently, a single stage becomes the bottleneck for the overall processor lifetime. In other words, while one pipeline stage already produces timing failures, the other stages still operate correctly. To alleviate this problem, a novel instruction pipeline design paradigm is proposed (MTTF-balanced pipeline), according to which all stage delays are balanced at the end of the desired lifetime (see Fig. 5(b)). The main idea of this approach is to increase the timing slack of aging-critical stages to improve their lifetime, while the timing slack of non-critical stages can be reduced to improve their energy efficiency by using slower yet more energy-efficient gates. As a result, the processor lifetime can be improved by more than 2.3×, and at the same time the power consumption can be reduced by 10 %. In addition, performance and cost are not affected [20]. This underlines that it is much more effective to address aging already in early design phases, rather than only adding guardbands to the final design to cope with the delay degradation.
[Figure 5 residue: two bar charts, (a) Delay-Balanced Design and (b) MTTF-Balanced Design; y-axis: Relative delay in [%] (95 to 115); x-axis: pipeline stages; series: 0 years, 3 years, 7 years, 10 years]
Fig. 5. Delay degradation of the delay-balanced design and MTTF-balanced design for the FabScalar microprocessor [13]
B. Aging-Aware Cross-Layer Instruction Scheduling
As shown in Fig. 5(a), the execution units belong to the most aging-critical processor components. Therefore, an aging-aware instruction scheduling technique was developed [21]. The novelty of this scheduling policy is that the timing-criticality of instructions (see Fig. 6) is considered during the scheduling process to reduce the load of units that execute critical instructions. Therefore, the
circuit-level delay of all instructions as well as their application-level occurrence rates were analyzed to classify the instructions into timing-critical (those that start to fail first) and non-timing-critical instructions. Then, dedicated functional units are used for the different instruction classes. Since less than 20 % of all executed instructions are timing-critical, the unit(s) taking care of these instructions are idle most of the time, which is exploited to considerably improve the overall lifetime of the functional units² by employing input vector control or aggressive power gating policies. In fact, our simulation results obtained with ExtraTime show that the overall lifetime can be improved by more than 1.6× compared to existing scheduling policies that ignore the detailed timing information (i.e. these are single-layer solutions) and instead balance the number of incoming instructions among all available units.
[Figure 6 residue: Occurrence rate (Application Level) plotted over Instruction Delay (Circuit Level); a Cycle Boundary separates Non-Timing-Critical from Timing-Critical instructions; arrows labeled Aging]
Fig. 6. Illustration of the timing criticality of instructions supported by a functional unit (e.g. ALU)
² Please note that the units taking care of the timing-critical instructions still limit the overall lifetime due to the way the instructions are classified.
C. Aging-Aware Instruction Set Encoding
Besides the execution units, the decoding stages of a microprocessor can also become critical and limit the microprocessor lifetime (see Fig. 5(a)). Hence, the decoding stages require an aging-aware design. Since the instruction set encoding, i.e. the mapping between instructions and opcodes, has a strong influence on the wearout of the decoding stages (see Fig. 7), we propose a novel aging-aware instruction set encoding methodology called ArISE to address the delay degradation in the decoding stages [22]. This methodology exploits simulated annealing and genetic algorithms to optimize the instruction set encoding with respect to lifetime as well as power consumption, since exhaustive optimization is infeasible due to the large number of possible encodings (typically more than 10^200). The result is an optimization that yields significant lifetime improvements (more than 2× compared to state-of-the-art) with negligible impact on other design parameters. This is due to the fact that power consumption and lifetime are co-optimized in our proposed approach, which iteratively improves the encoding. If only one of these two parameters is considered, there will be considerable disadvantages for the other one, as pointed out in [22].
[Figure 7 residue: bar chart; y-axis: Delay Change in % (0 to 10); x-axis: pipeline stages Fetch, Predecode, Decode, Rename, Dispatch, Issue, RegRead, Execute, Load-Store, WriteBack, Retire; series: Encoding 1, Encoding 2, Encoding 3]
Fig. 7. Aging after 3 years for the FabScalar microprocessor [13] for different encodings
D. Pro-Active Aging-Aware Dynamic Runtime Adaptation
To detect and avoid potentially critical conditions while the system is running, dynamic schemes employed at runtime have to complement solutions applied at design time [23]. However, the dynamic state-of-the-art techniques employ only reactive adaptation techniques. These are inefficient due to their “damage control” nature, i.e., they deal with already “aged” chips. In contrast, we propose a proactive and preventive runtime adaptation policy that tries to slow down aging in all phases of the chip lifetime, and hence can prolong the lifetime more efficiently than the existing techniques (see Fig. 8), i.e. with lower performance and power overheads [24]. To this end, a hierarchical expert system was developed (see Fig. 9) that takes input from a sensor network (or models running in software) to analyze the current system state as well as the trend of recent system states in a very fine-grained manner, i.e. every 1 ms to 10 ms, and adapts the system accordingly. Whenever a critical state or trend in terms of lifetime, temperature or power consumption is detected, the system is adapted by means of frequency and voltage tuning, that is, frequency and voltage are reduced by one level. If no parameter and no trend is critical, the current system performance is evaluated. If the performance can be maintained with a lower frequency level, frequency and supply voltage are lowered to improve lifetime, power consumption and temperature; otherwise the frequency is kept on the same level or is even increased if necessary. As a result, the lifetime of the entire microprocessor can be improved by more than 2×, and the energy consumption can be reduced by 14 %, while the performance penalty is almost negligible (2 % on average). This shows that with such a cross-layer, proactive runtime adaptation technique the different design parameters can be co-optimized very effectively, although the adaptation decisions are performed at system level.
[Figure 8 residue: Remaining Lifetime plotted over Runtime for Reactive Adaptation and Proactive Adaptation]
Fig. 8. Proactive vs. Reactive Adaptation Strategies
[Figure 9 residue: block diagram; a Global Expert (constraints, objectives, history, P-States) receives User/OS Input and the Current Configuration and outputs a New Configuration; Local Experts: Temperature Expert, Power Expert, Wearout Expert, Performance Expert; Sensor System: Temperature Sensors, Power Sensors, Wearout Sensors, Performance Sensors]
Fig. 9. Organization of the expert system
IV. SUMMARY
In this work, cross-layer solutions for aging modeling and simulation as well as mitigation were pushed forward. To this end, a set of unique frameworks and mitigation techniques was developed. In addition, it was demonstrated that cross-layer solutions allow a much more efficient co-optimization of all design parameters including lifetime compared to state-of-the-art single-layer solutions.
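The MTTF-balanced design rule of Section III-A (all stage delays are equal at the end of the desired lifetime, not at time zero) can be illustrated with a small numeric sketch. It is not the authors' tool flow. The function name, the stage subset, and the degradation values are hypothetical, and the sketch assumes that the relative delay degradation of a stage does not depend on the delay target chosen for it.

```python
def mttf_balanced_targets(clock_period_ns, degradation):
    """Return the time-zero delay target of each stage.

    degradation maps a stage name to its predicted relative delay increase
    over the target lifetime (0.10 means the stage becomes 10 % slower).
    Assumption: this increase is independent of the chosen delay target.
    """
    return {stage: clock_period_ns / (1.0 + d) for stage, d in degradation.items()}


# Hypothetical degradation values for four stages over the target lifetime.
degradation = {"Fetch": 0.04, "Decode": 0.10, "Execute": 0.12, "Retire": 0.03}
clock_period_ns = 1.0

targets = mttf_balanced_targets(clock_period_ns, degradation)
for stage, fresh in targets.items():
    aged = fresh * (1.0 + degradation[stage])  # delay at the end of the lifetime
    slack = clock_period_ns - fresh            # timing slack at time zero
    print(f"{stage:8s} fresh={fresh:.3f} ns  slack={slack:.3f} ns  aged={aged:.3f} ns")
```

With these inputs, the stage with the highest predicted degradation (Execute) receives the largest time-zero slack, the least affected stage (Retire) the smallest, and every stage reaches the clock period at the same point in time.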
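The decision rule of the proactive runtime adaptation in Section III-D can be written down as a minimal sketch. The `ExpertReport` structure, the `next_level` function, and its boolean performance inputs are hypothetical simplifications. The expert system in [24] is hierarchical and works on sensor data and trends, which are reduced here to precomputed verdicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertReport:
    """Verdict of one local expert for the current interval (hypothetical)."""
    state_critical: bool   # current value is critical
    trend_critical: bool   # recent trend is heading towards a critical value


def next_level(level, num_levels, reports, perf_ok_at_lower, needs_more_perf):
    """Return the frequency/voltage level index for the next interval.

    Level 0 is the lowest frequency/voltage pair, num_levels - 1 the highest.
    """
    # Any critical state or trend: step frequency and voltage down by one level.
    if any(r.state_critical or r.trend_critical for r in reports.values()):
        return max(level - 1, 0)
    # Nothing critical: lower the level if performance can still be maintained.
    if perf_ok_at_lower:
        return max(level - 1, 0)
    # Otherwise keep the level, or raise it if more performance is needed.
    if needs_more_perf:
        return min(level + 1, num_levels - 1)
    return level


ok = ExpertReport(state_critical=False, trend_critical=False)
heating_up = ExpertReport(state_critical=False, trend_critical=True)

# A critical temperature trend overrides the request for more performance.
reports = {"temperature": heating_up, "power": ok, "wearout": ok}
print(next_level(3, 5, reports, perf_ok_at_lower=False, needs_more_perf=True))  # prints 2
```

In a running system this function would be called once per monitoring interval (every 1 ms to 10 ms according to the text). Acting on a critical trend before the state itself is critical is what makes the policy proactive.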
V. ACKNOWLEDGEMENT
This work was partly supported by the German Research Foundation (DFG) as part of the national focal program “Dependable Embedded Systems” (SPP-1500, http://spp1500.ira.uka.de).
VI. REFERENCES
[1] T. Mak, “Is CMOS More Reliable with Scaling?” in Proceedings of the Online Testing Workshop, Jul. 2002.
[2] H. Nguyen, “Resiliency Challenges in Future Communications Infrastructure,” in Proceedings of the Communications Quality and Reliability Workshop, May 2014.
[3] International Technology Roadmap for Semiconductors, in ITRS 2013 Edition – Process Integration, Devices, and Structures, 2014.
[4] S. Borkar et al., “The Future of Microprocessors,” Communications of the ACM, pp. 67–77, May 2011.
[5] S. Mitra et al., “Robust System Design to Overcome CMOS Reliability Challenges,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, pp. 30–41, Mar. 2011.
[6] S. Borkar, “Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation,” IEEE Micro, pp. 10–16, Nov. 2005.
[7] V. Narayanan et al., “Reliability Concerns in Embedded System Designs,” Computer, pp. 118–120, Jan. 2006.
[8] M. Agarwal et al., “Circuit Failure Prediction and Its Application to Transistor Aging,” in Proceedings of the VLSI Test Symposium, May 2007, pp. 277–286.
[9] J. Henkel et al., “Design and architectures for dependable embedded systems,” in Proceedings of the Conference on Hardware/Software Codesign and System Synthesis, Oct. 2011, pp. 69–78.
[10] European Nanoelectronics Initiative Advisory Council, ENIAC Strategic Research Agenda - European Technology Platform Nanoelectronics, 2nd ed. ENIAC, 2007.
[11] Bias Temperature Instability in HKMG MOSFETs: Characterization, Process Dependence, DC/AC Modeling and Stochastic Effects, Mar. 2014, Tutorial.
[12] X. Li et al., “Compact Modeling of MOSFET Wearout Mechanisms for Circuit-Reliability Simulation,” IEEE Transactions on Device and Materials Reliability, pp. 98–121, Mar. 2008.
[13] N. Choudhary et al., “FabScalar: Automating Superscalar Core Design,” IEEE Micro, pp. 48–59, May 2012.
[14] F. Oboril et al., “ExtraTime: Modeling and Analysis of Wearout due to Transistor Aging at Microarchitecture-Level,” in Proceedings of the International Conference on Dependable Systems and Networks, Jun. 2012, pp. 1–12.
[15] N. L. Binkert et al., “The M5 Simulator: Modeling Networked Systems,” IEEE Micro, pp. 52–60, Jul. 2006.
[16] F. Oboril et al., “High-Resolution Online Power Monitoring for Modern Microprocessors,” in Proceedings of the Conference on Design, Automation and Test in Europe, Mar. 2014, pp. 1–4, to appear.
[17] W. Wang et al., “Compact Modeling and Simulation of Circuit Reliability for 65-nm CMOS Technology,” IEEE Transactions on Device and Materials Reliability, pp. 509–517, Dec. 2007.
[18] E. Takeda et al., “New hot-carrier injection and device degradation in submicron MOSFETs,” IEEE Proceedings I, Solid-State and Electron Devices, pp. 144–150, Jun. 1983.
[19] A. Bravaix et al., “Hot-Carrier Acceleration Factors for Low Power Management in DC-AC stressed 40nm NMOS node at High Temperature,” in Proceedings of the International Reliability Physics Symposium, Apr. 2009, pp. 531–548.
[20] F. Oboril et al., “Aging-Aware Design of Microprocessor Instruction Pipelines,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 704–716, May 2014.
[21] F. Oboril et al., “Negative Bias Temperature Instability-Aware Instruction Scheduling: A Cross-Layer Approach,” Journal of Low Power Electronics, pp. 389–402, Dec. 2013.
[22] F. Oboril et al., “Exploiting Instruction Set Encoding for Aging-Aware Microprocessor Design,” ACM Transactions on Design Automation of Electronic Systems, pp. 1–23, 2015, to appear.
[23] P. Gupta et al., “Underdesigned and Opportunistic Computing in Presence of Hardware Variability,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 8–23, Jan. 2013.
[24] F. Oboril et al., “Reducing Wearout in Embedded Processors using Proactive Fine-Grained Dynamic Runtime Adaptation,” in Proceedings of the European Test Symposium, May 2012, pp. 68–73.