Cross-Layer Approaches for an Aging-Aware Design Space Exploration for Microprocessors

Fabian Oboril and Mehdi B. Tahoori
Chair of Dependable Nano Computing (CDNC), Karlsruhe Institute of Technology (KIT)
Email: {fabian.oboril, mehdi.tahoori}@kit.edu

Abstract: With the continuous scaling of CMOS technologies, maintaining microprocessor reliability becomes a major design challenge. In particular, accelerated transistor aging is a serious reliability concern, as it considerably reduces the operational system lifetime. To address this issue, this work proposes cross-layer solutions for aging modeling, simulation and mitigation that co-optimize reliability together with the traditional design constraints such as power, performance, and cost. To this end, the knowledge from several abstraction layers, ranging from circuit- to architecture-level, is exploited for cost-effective aging-aware architecture and system design. The comprehensive simulations and experimental analysis performed in this work show the benefits of this approach over state-of-the-art single-layer solutions.

I. INTRODUCTION

Thanks to the aggressive scaling of transistor dimensions in the past decades, computing systems have revolutionized our life. However, in the shadow of the downscaling benefits such as increased microprocessor performance, more integrated features and improved energy/cost efficiency, the reliability of nanoscale devices has become a major threat to the future success of computing systems (see Fig. 1) [2]–[5]. As a result, with every new technology node, it becomes harder for chip manufacturers to ensure the reliable operation of their chips, and thus malfunctions during operation, which can lead to erroneous program outputs or even system crashes, become more likely.

Fig. 1. Increasing unreliability in nanoscale technology nodes due to accelerated transistor aging (1) and susceptibility to noise as well as soft errors (2) (based on [1]). [Figure: failure rate over time (infant mortality, normal execution period, wearout period) for the 180 nm, 130 nm, 90 nm, 45 nm and 22 nm nodes, with lifetime annotations of ~10 years, ~7 years, < 7 years and "? years".]

Among the various reliability challenges, accelerated transistor aging is of particular importance, as it degrades the transistor switching speed, and thus leads to slower circuits over time [3], [6]. Consequently, in synchronous digital systems, timing failures due to the increased circuit delay can occur and cause incorrect system states. Because of that, the microprocessor lifetime, and as a result also the overall system lifetime, is considerably reduced if no countermeasures are taken. This is especially critical for embedded systems that require long mission times, for instance in health care (e.g. implants), space missions (e.g. satellites) or electronic control units (e.g. in airplanes) [7]. Therefore, it is necessary to consider reliability, and in particular lifetime, as another design constraint besides the traditional performance, power and cost parameters. However, due to the strong interdependencies among these different criteria, the co-optimization is very challenging.

To avoid aging-induced failures, designers add conservative timing margins to their designs, which, however, is inefficient and costly [8]. In addition, a lot of effort is spent on improvements at the lowest hardware layers (i.e. at transistor/gate-level), as these layers are very close to the physical origin of the problem (see Fig. 3). However, the influence of higher levels in the hardware-software design stack is neglected in most state-of-the-art solutions, although these layers have a considerable impact on the system lifetime, for example by influencing the thermal behavior of the microprocessor. Therefore, it is crucial to consider the effect of these higher layers to achieve cost-efficient resilient computer systems. In fact, due to the extent of transistor aging (see Fig. 2) [2], it will be necessary in the future that various layers contribute in a combined fashion (i.e. cross-layer¹) to co-optimize lifetime with the other design parameters more efficiently than state-of-the-art solutions, which are typically single-layer approaches [9], [10].

Fig. 3. Abstraction layers in the hardware-software design stack. [Figure: Application, OS / Firmware, (Micro)-Architecture, Circuit, Gate, Device.]

In this work we push forward cross-layer solutions for aging modeling and simulation, as well as aging mitigation. To this end, we address the major transistor aging phenomena that cause a gradual degradation of device parameters such as switching delay, namely Bias Temperature Instability (BTI) [11] and Hot Carrier Injection (HCI) [12]. In detail, our contributions are as follows:

1) A set of novel cross-layer aging modeling and analysis frameworks was developed to allow an effective design space exploration throughout the different microprocessor design phases, considering the interdependencies of the different design parameters including lifetime. Compared to existing state-of-the-art solutions, the advantage of the proposed frameworks is that a much wider design space can be explored, because the cross-layer approach combined with the architectural aging models allows more parameters to be evaluated. Consequently, using these platforms, the most critical processor components can be identified, and selective, cost-efficient cross-layer aging mitigation techniques can be designed.

2) Using the aforementioned cross-layer platforms, a set of efficient cross-layer aging mitigation techniques was designed that outperform the existing state-of-the-art solutions. The proposed techniques include several design time approaches to address aging of the most critical microprocessor components.
Besides, a dynamic runtime adaptation scheme was also developed to detect and avoid potentially critical system conditions while the system is running. This solution complements the design time techniques, which are incapable of dealing with runtime variations (e.g. changing system conditions). Because of that, the design time solutions consider lifetime as one optimization objective and tune the design accordingly based on a given set of representative workload scenarios, while the runtime technique deals with the constantly changing conditions and adapts the system to avoid critical states. Thus, the combination of both schemes enables effective and holistic aging mitigation solutions, which allow a very aggressive and cost-effective system design.

In the following, the different contributions are explained in detail.

Fig. 2. Aging acceleration in the next technology nodes (based on [2]). [Figure: aging acceleration for the 32 nm, 22 nm, 14 nm, 10 nm and 7 nm nodes.]

¹ Cross-layer means that the knowledge and parameters available at multiple abstraction layers are used in combination to optimize the design, whereas single-layer solutions exploit only the information of a single level.

Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Co-Located with DATE 2016 - Dresden, Germany. Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

II. FRAMEWORKS FOR WEAROUT MODELING AND EVALUATION

The first framework developed is an architectural platform called ExtraTime [14]. It is based on the performance simulator gem5 [15], which was extended with sophisticated models for power and temperature. To make these models as realistic as possible, they were optimized, and afterwards calibrated and validated using a real experimental platform based on recent Intel Core-i processors, which have on-die power and thermal sensors [16]. As a result, the model accuracy is very good (e.g. the temperature inaccuracy is < 2 °C).

In addition, novel and realistic architectural aging models were developed and incorporated (see Fig. 4). The main advantage of these models is that they do not require detailed circuit-level information to estimate the degradation of a complete architectural component (e.g. ALU, Branch Predictor, etc.). For this purpose, these models were derived from transistor-level models for BTI and HCI [17]–[19] by introducing a representative transistor, which reflects the average usage behavior (switching activity, ON time, OFF time) of all transistors within this block. In addition, the temperature of this representative transistor is estimated by the block temperature. By that means, the degradation of the representative transistor can be obtained, and thus the degradation of the entire block can be estimated. In this regard it is important to note that the accuracy of the resulting aging models is very good given their level of abstraction. In fact, the inaccuracy compared to accurate gate-level models for an architectural block such as an ALU is less than 5 %, without requiring detailed circuit-level knowledge.

Fig. 4. ExtraTime framework for aging modeling and evaluation. [Figure: the gem5 performance simulator, driven by the specification and the application, feeds performance data to the power, temperature and aging models, which also use technology parameters and exchange power, temperature and delay data.]

As a result, this framework considers the influence of parameters from microarchitecture- up to application-level. Moreover, as ExtraTime does not require low-level details (e.g. the actual gate-level implementation), it can be employed in early design phases for a first-order aging analysis and design space exploration.

To also take low-level aspects into account, a second, complementary platform was developed, which is based on standard EDA tools for design synthesis and simulation. It can analyze all internal gates, and it considers the interplay of real-world applications, aging, power and temperature [20]. Thus, it is very accurate, as aging can be analyzed at gate-level using the models proposed in [17]–[19], but it is less flexible than ExtraTime. Consequently, it is intended for fine tuning in later design phases.

III. DESIGN TIME & RUNTIME AGING MITIGATION SOLUTIONS

Using these novel cross-layer platforms, the most critical microprocessor components were identified and several unique aging mitigation techniques were designed and evaluated, which are presented in the following subsections.

A. Aging-Aware Design of Instruction Pipelines

Traditionally, the delays of all instruction pipeline stages are balanced at design time. However, transistor aging causes a non-uniform delay degradation among the stages due to different usage patterns (see Fig. 5(a)). Hence, this design approach results in an imbalanced and non-optimal design after a short period of time. Consequently, a single stage becomes the bottleneck for the overall processor lifetime. In other words, while one pipeline stage already produces timing failures, the other stages still operate correctly. To alleviate this problem, a novel instruction pipeline design paradigm is proposed (MTTF-balanced pipeline), according to which all stage delays are balanced at the end of the desired lifetime (see Fig. 5(b)). The main idea of this approach is to increase the timing slack of aging-critical stages to improve their lifetime, while the timing slack of non-critical stages can be reduced to improve their energy efficiency by using slower yet more energy-efficient gates. As a result, the processor lifetime can be improved by more than 2.3×, and at the same time the power consumption can be reduced by 10 %. In addition, performance and cost are not affected [20]. This underlines that it is much more effective to address aging already in early design phases, rather than only adding guardbands to the final design to cope with the delay degradation.

Fig. 5. Delay degradation of the delay-balanced design and MTTF-balanced design for the FabScalar microprocessor [13]. [Figure: (a) Delay-Balanced Design, (b) MTTF-Balanced Design; relative delay in % after 0, 3, 7 and 10 years for the stages Fetch, Predecode, Decode, Rename, Dispatch, Issue, RegRead, Execute, Load-Store, WriteBack and Retire.]

B. Aging-Aware Cross-Layer Instruction Scheduling

As shown in Fig. 5(a), the execution units are among the most aging-critical processor components. Therefore, an aging-aware instruction scheduling technique was developed [21]. The novelty of this scheduling policy is that the timing criticality of instructions (see Fig. 6) is considered during the scheduling process to reduce the load of units that execute critical instructions. To this end, the circuit-level delay of all instructions as well as their application-level occurrence rates were analyzed to classify the instructions into timing-critical (those that start to fail first) and non-timing-critical instructions. Then, dedicated functional units are used for the different instruction classes. Since less than 20 % of all executed instructions are timing-critical, the unit(s) taking care of these instructions are idle most of the time, which is exploited to considerably improve the overall lifetime of the functional units² by employing input vector control or aggressive power gating policies. In fact, our simulation results obtained with ExtraTime show that the overall lifetime can be improved by more than 1.6× compared to existing scheduling policies that ignore the detailed timing information (i.e. these are single-layer solutions) and instead balance the number of incoming instructions among all available units.

Fig. 6. Illustration of the timing criticality of instructions supported by a functional unit (e.g. ALU). [Figure: occurrence rate (application level) over instruction delay (circuit level), with non-timing-critical and timing-critical instructions relative to the cycle boundary and the shift caused by aging.]

² Please note that the units taking care of the timing-critical instructions still limit the overall lifetime due to the way the instructions are classified.

C. Aging-Aware Instruction Set Encoding

Besides the execution units, the decoding stages of a microprocessor can also become critical and limit the microprocessor lifetime (see Fig. 5(a)). Hence, the decoding stages require an aging-aware design. Since the instruction set encoding, i.e. the mapping between instructions and opcodes, has a strong influence on the wearout of the decoding stages (see Fig. 7), we propose a novel aging-aware instruction set encoding methodology called ArISE to address the delay degradation in the decoding stages [22]. This methodology exploits simulated annealing and genetic algorithms to optimize the instruction set encoding with respect to lifetime as well as power consumption, since exhaustive optimization is infeasible due to the large number of possible encodings (i.e. typically more than 10^200). The result is an optimization that yields significant lifetime improvements (more than 2× compared to state-of-the-art) with negligible impact on other design parameters. This is due to the fact that power consumption and lifetime are co-optimized in our proposed approach, which iteratively improves the encoding. If only one of these two parameters is considered, there will be considerable disadvantages for the other one, as pointed out in [22].

Fig. 7. Aging after 3 years for the FabScalar microprocessor [13] for different encodings. [Figure: delay change in % per pipeline stage for Encoding 1, Encoding 2 and Encoding 3.]

D. Pro-Active Aging-Aware Dynamic Runtime Adaptation

To detect and avoid potentially critical conditions while the system is running, dynamic schemes employed at runtime have to complement solutions applied at design time [23]. However, the dynamic state-of-the-art techniques employ only reactive adaptation. These techniques are inefficient due to their “damage control” nature, i.e., they deal with already “aged” chips. In contrast, we propose a proactive and preventive runtime adaptation policy that tries to slow down aging in all phases of the chip lifetime, and hence can prolong the lifetime more efficiently than the existing techniques (see Fig. 8), i.e. with lower performance and power overheads [24]. To this end, a hierarchical expert system was developed (see Fig. 9) that takes input from a sensor network (or models running in software) to analyze the current system state as well as the trend of recent system states in a very fine-grained manner, i.e. every 1 ms to 10 ms, and adapts the system accordingly. Whenever a critical state or trend in terms of lifetime, temperature or power consumption is detected, the system is adapted by means of frequency and voltage tuning, that is, frequency and voltage are reduced by one level. If neither a parameter nor a trend is critical, the current system performance is evaluated. If the performance can be maintained with a lower frequency level, frequency and supply voltage are lowered to improve lifetime, power consumption and temperature; otherwise the frequency is kept at the same level or is even increased if necessary. As a result, the lifetime of the entire microprocessor can be improved by more than 2×, and the energy consumption can be reduced by 14 %, while the performance penalty is almost negligible (2 % on average). This shows that with such a cross-layer, proactive runtime adaptation technique the different design parameters can be co-optimized very effectively, although the adaptation decisions are performed at system-level.

Fig. 8. Proactive vs. Reactive Adaptation Strategies. [Figure: remaining lifetime over runtime for reactive and proactive adaptation.]

Fig. 9. Organization of the expert system. [Figure: sensors for temperature, power, wearout and performance feed the corresponding local experts; together with user/OS input, global constraints, objectives, history and P-states, the current configuration is turned into a new configuration.]

IV. SUMMARY

In this work, cross-layer solutions for aging modeling and simulation as well as mitigation were pushed forward. To this end, a set of unique frameworks and mitigation techniques was developed. In addition, it was demonstrated that cross-layer solutions allow a much more efficient co-optimization of all design parameters including lifetime compared to state-of-the-art single-layer solutions.

V. ACKNOWLEDGEMENT

This work was partly supported by the German Research Foundation (DFG) as part of the national focal program “Dependable Embedded Systems” (SPP-1500, http://spp1500.ira.uka.de).

VI. REFERENCES

[1] T. Mak, “Is CMOS More Reliable with Scaling?” in Proceedings of the Online Testing Workshop, Jul. 2002.
[2] H. Nguyen, “Resiliency Challenges in Future Communications Infrastructure,” in Proceedings of the Communications Quality and Reliability Workshop, May 2014.
[3] International Technology Roadmap for Semiconductors, in ITRS 2013 Edition – Process Integration, Devices, and Structures, 2014.
[4] S. Borkar et al., “The Future of Microprocessors,” Communications of the ACM, pp. 67–77, May 2011.
[5] S. Mitra et al., “Robust System Design to Overcome CMOS Reliability Challenges,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, pp. 30–41, Mar. 2011.
[6] S. Borkar, “Designing Reliable Systems from Unreliable Components: The Challenges of Transistor Variability and Degradation,” IEEE Micro, pp. 10–16, Nov. 2005.
[7] V. Narayanan et al., “Reliability Concerns in Embedded System Designs,” Computer, pp. 118–120, Jan. 2006.
[8] M. Agarwal et al., “Circuit Failure Prediction and Its Application to Transistor Aging,” in Proceedings of the VLSI Test Symposium, May 2007, pp. 277–286.
[9] J. Henkel et al., “Design and architectures for dependable embedded systems,” in Proceedings of the Conference on Hardware/Software Codesign and System Synthesis, Oct. 2011, pp. 69–78.
[10] European Nanoelectronics Initiative Advisory Council, ENIAC Strategic Research Agenda - European Technology Platform Nanoelectronics, 2nd ed. ENIAC, 2007.
[11] Bias Temperature Instability in HKMG MOSFETs: Characterization, Process Dependence, DC/AC Modeling and Stochastic Effects, Mar. 2014, Tutorial.
[12] X. Li et al., “Compact Modeling of MOSFET Wearout Mechanisms for Circuit-Reliability Simulation,” IEEE Transactions on Device and Materials Reliability, pp. 98–121, Mar. 2008.
[13] N. Choudhary et al., “FabScalar: Automating Superscalar Core Design,” IEEE Micro, pp. 48–59, May 2012.
[14] F. Oboril et al., “ExtraTime: Modeling and Analysis of Wearout due to Transistor Aging at Microarchitecture-Level,” in Proceedings of the International Conference on Dependable Systems and Networks, Jun. 2012, pp. 1–12.
[15] N. L. Binkert et al., “The M5 Simulator: Modeling Networked Systems,” IEEE Micro, pp. 52–60, Jul. 2006.
[16] F. Oboril et al., “High-Resolution Online Power Monitoring for Modern Microprocessors,” in Proceedings of the Conference on Design, Automation and Test in Europe, Mar. 2014, pp. 1–4, to appear.
[17] W. Wang et al., “Compact Modeling and Simulation of Circuit Reliability for 65-nm CMOS Technology,” IEEE Transactions on Device and Materials Reliability, pp. 509–517, Dec. 2007.
[18] E. Takeda et al., “New hot-carrier injection and device degradation in submicron MOSFETs,” IEEE Proceedings I, Solid-State and Electron Devices, pp. 144–150, Jun. 1983.
[19] A. Bravaix et al., “Hot-Carrier Acceleration Factors for Low Power Management in DC-AC stressed 40nm NMOS node at High Temperature,” in Proceedings of the International Reliability Physics Symposium, Apr. 2009, pp. 531–548.
[20] F. Oboril et al., “Aging-Aware Design of Microprocessor Instruction Pipelines,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 704–716, May 2014.
[21] F. Oboril et al., “Negative Bias Temperature Instability-Aware Instruction Scheduling: A Cross-Layer Approach,” Journal of Low Power Electronics, pp. 389–402, Dec. 2013.
[22] F. Oboril et al., “Exploiting Instruction Set Encoding for Aging-Aware Microprocessor Design,” ACM Transactions on Design Automation of Electronic Systems, pp. 1–23, 2015, to appear.
[23] P. Gupta et al., “Underdesigned and Opportunistic Computing in Presence of Hardware Variability,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, pp. 8–23, Jan. 2013.
[24] F. Oboril et al., “Reducing Wearout in Embedded Processors using Proactive Fine-Grained Dynamic Runtime Adaptation,” in Proceedings of the European Test Symposium, May 2012, pp. 68–73.
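
As an illustration of the decision rule described in Section III-D, the following minimal Python sketch shows how one adaptation step could be expressed. It is not the implementation from [24]. The names `ExpertVerdict`, `assess` and `next_pstate` are hypothetical, the trend check is assumed to be a simple two-sample linear extrapolation, and every monitored metric is assumed to be scaled so that higher values are worse (e.g. temperature, power, wearout rate).

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ExpertVerdict:
    """Result of one local expert (hypothetical structure)."""
    state_critical: bool  # the latest sample already violates the limit
    trend_critical: bool  # the recent trend is projected to violate it


def assess(samples: Sequence[float], limit: float, horizon: int) -> ExpertVerdict:
    """Judge the current state and the recent trend of one metric.

    Assumption: the trend is a linear extrapolation of the last two
    samples over `horizon` further sampling intervals.
    """
    current = samples[-1]
    slope = samples[-1] - samples[-2] if len(samples) >= 2 else 0.0
    projected = current + slope * horizon
    return ExpertVerdict(current >= limit, projected >= limit)


def next_pstate(level: int, max_level: int,
                verdicts: Dict[str, ExpertVerdict],
                perf_ok_at_lower: bool, perf_needs_more: bool) -> int:
    """Return the next frequency/voltage level (0 is the slowest level)."""
    # A critical state or trend forces a reduction by one level.
    if any(v.state_critical or v.trend_critical for v in verdicts.values()):
        return max(level - 1, 0)
    # Nothing is critical: lower the level if performance can be maintained.
    if perf_ok_at_lower:
        return max(level - 1, 0)
    # Otherwise keep the level, or raise it if performance requires it.
    if perf_needs_more:
        return min(level + 1, max_level)
    return level
```

In such a sketch, `assess` would be called once per sampling interval (every 1 ms to 10 ms in the text) for lifetime, temperature and power, and `next_pstate` would combine the verdicts with the performance evaluation. Separating the per-metric judgment from the global decision loosely mirrors the local-expert organization of Fig. 9, but the exact split is an assumption.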