LPVM: Low-Power Variation-Mitigant Adder Architecture Using Carry Expedition

Alireza Namazi, Computer Engineering Department, Tehran University, Tehran, Iran, a.namazi@ut.ac.ir
Meisam Abdollahi, Computer Engineering Department, Tehran University, Tehran, Iran, meisam.abdolahi@ut.ac.ir

Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany. Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

Abstract: Addition is one of the most crucial operations in microprocessors and must be performed within a predefined deadline (the critical path). Variation is a phenomenon that negatively affects the performance of this operation. This paper proposes a new Low-Power Variation-Mitigant (LPVM) adder design that exploits the intrinsic behavior of the addition operation. The LPVM approach drastically decreases the probability of a deadline violation in the addition circuit. The basic idea is to expedite carry propagation in adder circuits for vulnerable inputs. LPVM is an input-oriented approach that adds simple logic to the adder architecture and affects only the vulnerable inputs. The approach is applicable to all presented types of adders and improves all high-level approaches that aim to overcome the variation issue. Results show that this approach decreases the deadline violation percentage of RCA, CLA and CSA by about 70.3%, 59.7% and 67.6%, respectively. The LPVM approach not only reduces variation effects on the adder operation from the performance point of view, but also has a negligible impact on the adder power consumption. The average power consumption overhead of the LPVM approach is about 7.3%, 2.1% and 3.1% for RCA, CLA and CSA, respectively.

Keywords: Addition, Process Variation, Low Power

I. INTRODUCTION

The adder is one of the most useful and important arithmetic units [1] in microprocessors. Due to its critical role in almost all processing elements, there exist several architectures with the same functionality and different characteristics. Researchers have investigated adders from different views such as performance [2], power consumption [3][4] and reliability [5]. The earliest investigations focused on the performance [3] and power consumption improvement [1] of adders.

In recent years, due to the drastic decrease in feature sizes of digital system designs, process variation has become the major obstacle for system designers, and researchers have shown massive interest in addressing variation effects with techniques from the device level to the system level [6]. Process variation is the deviation of a manufactured component from the nominal designed component. Variation affects the performance of a system and causes disorders and violations in its operation. All synchronous systems have rigid timing constraints, and all units must operate within predefined delay constraints. Variation modifies the delay of operating units stochastically.

There have been many efforts in the literature to overcome the effects of variation in digital circuits. Previous works can be divided into two major categories. The first one includes high-level approaches that try to overcome the variation issue, such as Razor Logic (RL) [7], Telescopic Unit (TU), the PaceLine approach [8] and the DynaTune Approach (DA) [9]. RL and TU both add new logic at the end of each pipeline stage and use their own error detection and correction techniques to overcome the variation issue. For example, RL [7] uses multiple copies of the output logic and compares them to find out whether there is any error in the results. PaceLine uses a novel duplication technique based on the overclocking feature of processors. DynaTune proposes a circuit-level optimization technique that improves circuit behavior through probabilistic analysis of the critical gates of the circuit. These techniques solve the variation problem globally for the worst-case scenario, which may drastically degrade performance.

All of the above-mentioned techniques are general, whereas our proposed approach is specially designed for adder circuits and considers their behavior. All circuit-level techniques must handle the variation effects of their internal combinational segments; therefore, having more tolerant components leads to better performance in these techniques. LPVM imposes significantly less overhead on the system by using intrinsic characteristics of adder circuits. It can be used along with the above-mentioned high-level techniques and can also increase their efficiency.

The second category includes statistical approaches [10]. The work in [10] proposes a high-level approach for variation-aware binding and component selection to maximize the yield. It uses rebinding and Statistical Static Timing Analysis (SSTA) to evaluate and maximize the performance. These techniques are also general and do not consider the intrinsic behavior of the circuit in their calculations. To the best of our knowledge, the proposed LPVM adder is the first variation-mitigating approach that considers the behavior of the adder in order to remove the effects of variation on the operation of the adder under a predefined clock period.

In this paper, we propose a novel low-power architecture for adders to overcome the effects of process variation. This architecture can be used along with all high-level approaches proposed in the literature and can increase their efficiency because it drastically decreases the adder malfunction probability. This reduces their overhead on the system. The basic idea behind this approach is to expedite carry propagation for vulnerable inputs that may violate the working clock period.

The rest of this paper is organized as follows. Section II describes process variation. Section III describes the motivation of the paper. The proposed approach is presented in Section IV. Experimental results are presented in Section V, and finally Section VI concludes the paper.

II. PROCESS VARIATION

There exist many types and sources of variation in deep sub-micron digital circuits. Two major sources of variation are known: 1- manufacturing variations, 2- operation-induced variations [12]. This paper concentrates on the first category.

Nanoscale IC manufacturing imperfections lead to variation in design parameters such as length (L), width (W), oxide thickness (Tox) and threshold voltage (Vth) [13]. These fluctuations in the design of nano-scale (<<90nm) circuits result in many side effects on the operation of designs in comparison with the nominal design parameters. This paper focuses on threshold voltage fluctuations because the threshold voltage is the most influential parameter [14] and it changes the expected performance and power consumption of systems. However, the proposed approach is applicable to all sources of variation.

III. MOTIVATION

The addition performance depends completely on its inputs. A carry chain consists of multiple consecutive bit positions in an adder architecture whose carry-out depends on their input carry. Considering the simulation results depicted in Fig. 1, it can be concluded that, depending on its inputs, the adder can calculate the addition result with a delay shorter than its critical path. Results show that the inputs of the Ripple Carry Adder (RCA), Carry Look-ahead Adder (CLA) and Carry Select Adder (CSA) with 8, 16, 32, 64 and 128 bits directly affect the calculation delay of the adder. The longest delay belongs to the inputs with the longest carry chain. As adder circuits have different addition delays depending on their inputs, input pairs are categorized based on their calculation delay in comparison with the longest addition delay. Results depicted in Fig. 2 show that, for a 32-bit RCA, the delay of about 30% of input pairs is up to 20% less than the longest delay. Besides, about 4.1% of input pairs have about 10% of the longest delay.

Fig. 1. The percentage of difference between minimum and maximum delay in adders for different bit widths based on input pairs. (Bar chart; y-axis: MinMaxDiff percentage (%); x-axis: adder type RCA, CLA, CSA, each with bit widths 8, 16, 32, 64 and 128.)

Fig. 2. Input pair percentages of RCA with 64 and 128 bit widths, categorized based on their calculation time in comparison with the worst-case delay. (Bar chart; y-axis: delay percentage (%); x-axis: RCA bit width, groups labeled 32-bit and 64-bit, each with bins 0-5%, 5-10%, 10-15% and 15-20%.)

Process variation results in delay changes in adder circuits. According to the results depicted in Fig. 3, it can be concluded that variation effects depend completely on the architecture and the input pair characteristics. This figure shows the minimum and maximum deviation percentage from the nominal delay considering all possible input pairs in different adders with various bit widths. Results are extracted using Hspice simulations, and the variation of the threshold voltage (Vth) is selected as the main impacting parameter, with a maximum deviation range of 20%, through Monte Carlo simulations. The simulated results are twofold, as presented below:
- Input pairs have different calculation delays based on their carry propagation pattern.
- Variation changes the calculation delay of each input pair and may also change the worst-case delay of the adder. These changes may violate the deadline that is predefined for the adder.

IV. PROPOSED APPROACH

Considering the results gathered from input-based variation simulations on different types of adders, it is clear that although variation affects the delay characteristics of the adder, it does not influence the result for all inputs. This paper proposes a simple and low-overhead technique to overcome the effects of process variation on the delay characteristics of each adder. The Low-Power Variation-Mitigating (LPVM) approach tries to overcome process variation in adder circuits by simply taking care of inputs whose calculation time is near the critical path of the nominal adder. The schematic diagram of the LPVM design approach is presented in Fig. 4. Simulation results showed that inputs with longer carry propagation chains are more susceptible to process variation. The LPVM design approach inserts simple combinational logic into the adder architecture in order to break the long carry propagation chain. The inserted block decreases the calculation time of the adder only for selected inputs and is called the Carry Chain Breaker (CCB) block. The proposed approach has five steps, which are described as follows:

A. Carry Chain Determination (Step 1)

The first step is to determine the Carry Chains (CC) of each input pair, each of which is denoted CC(S, F). Parameters S and F show the start and finish bits of the CC, respectively. This input pair has three CCs. All carry chains of an input pair (A, B) reside in a set called the Chain Set (a.k.a. CS(A, B)).

B. Susceptibility Analysis (Step 2)

The second step divides input pairs into non-overlapping categories considering their calculation delay. Each input pair has a weight (W(A, B)) based on its carry propagation pattern. The weight of each input pair depends only on the weight of its longest CC and is calculated using (1).

$$ W(CS(A, B)) = \max_{cc \in CS(A, B)} \left( F_{cc} - S_{cc} + 1 \right) \qquad (1) $$
The weight of each input pair reflects the calculation delay of the input pair in comparison with the longest pairs. All input pairs with the same weight reside in the same category. The category G(i) contains all possible (existing) input pairs of the first step whose weight is equal to i. Step 1 and Step 2 are performed based on the LPVM categorization algorithm, which is depicted in Algorithm 1.

Algorithm 1: LPVM Categorization
Finds the best calculation threshold
Inputs: n (adder bit width)
For All Possible Input Pair
    A = Convert_binary(i);
    B = Convert_binary(j);
    CS = Carry_Chain_Extraction(A, B);         (Step 1)
    CCL = Longest_Chain(CS);
    Put pair (A, B) in G(i) where i = CCL      (Step 2)
End

Fig. 3. Minimum and maximum deviation percentage of calculation delay for input pairs in comparison with nominal delay. (Bar chart; y-axis: percentage (%); series: Min Variation and Max Variation; x-axis: adder type RCA, CLA, CSA, each with bit widths 8, 16, 32, 64 and 128.)

Fig. 4. LPVM approach. (Flowchart: Carry Chain Determination (Step 1); Susceptibility Analysis (Step 2); Step 3 starts with i = n-1, selects a random input pair from category G(i), tests the input under process variation using the variation parameters, and checks "Meet CF?"; if no, the G(i) elements are added to the selected input pairs, i = i-1, and the loop repeats; if yes, the flow continues with Intersection Categorizing (Step 4) and CCB Insertion (Step 5).)

C. Weight Selection (Step 3)

The third step is to find the vulnerable weight categories. It is an iterative approach, as depicted in Fig. 4. This step starts from the category with the highest weight and randomly selects an input pair from it. The selected input is examined on the adder architecture to find out its behavior in the presence of process variation. In this step, the input is evaluated under different variation conditions based on designer parameters. Although this paper selects threshold voltage as the main affecting parameter, other parameters can be used to evaluate the variation effect. Afterwards, the addition delays of all executed samples are gathered and checked against the deadline. At this point, a new parameter is provided by the designer, called the Certainty Factor (CF). The CF specifies the percentage of simulated samples that must meet the deadline. For example, if a designer selects 100% for the CF, G(i) is acceptable only if all simulated samples of the selected input pair meet the deadline. Otherwise, G(i) is not acceptable and is added to the vulnerable inputs. When the algorithm reaches a category that meets the defined CF, it does not evaluate the rest of the categories, because their calculation latency is strictly less than that of the evaluated category. At this point, the Step 3 iterations stop.

D. Intersection Categorizing (Step 4)

The fourth step is to divide the CCs of the selected input pairs into non-overlapping categories. All CCs that have any intersection with other CCs are put in the same category. The step then finds the most covering subsets among the CCs that reside in each category, that is, the most overlapping chain between different vulnerable input pairs. Therefore, the overhead of the CCB block insertion is reduced.

E. CCB Block Insertion (Step 5)

The last step is inserting the CCB units into the adder architecture. This simple block breaks all or a segment of the carry chains in susceptible input pairs to decrease or overcome the effects of variation on the adder circuit operation. The internal structure of the CCB for detecting a continuous propagation pattern is shown in Fig. 5. The depicted CCB block is designed for CC(i, j). This block consists of parallel 2-input XOR gates that are connected to the adder inputs to detect the propagation pattern. The block connects the carry of the adder at the ith bit to the carry at the jth position. When the propagation pattern is detected, a 2-to-1 multiplexer selects C_i and uses it in place of the carry at the end of the chain (position j). The CCB block has very low overhead because the XOR gates already reside in the basic architecture of the adders. The CCBs operate in parallel with the adder and do not increase its latency. The number of CCBs and their length depend completely on the selected input vectors. The CCB architecture overlaps with the full adders: the XOR section of the CCB is generated in all adder circuits, which reduces the overhead of the CCBs.

V. EXPERIMENTAL RESULTS

The experimental results consist of two different phases. The first phase uses the LEON3 processor (32-bit). The second phase relates to the variation investigation of the adder circuit. Simulations of this phase are performed with the Hspice simulator, and the technology size is considered to be 32nm. Some applications of the Mibench benchmark suite are executed on the LEON3 processor, and the input entries of the adder unit are gathered and evaluated. The weight distribution of input pairs is depicted in Fig. 6. Results show that in real application executions, not all possible input operands occur. Therefore, exhaustive exploration of input pairs for the LPVM approach is no longer necessary.

Applying the LPVM approach to the adder types presented in Section III shows that, under threshold voltage variation (in the range of 20%), different adder architectures exhibit different rates of operation violations. Fig. 7.a shows the percentage of operation violations in different adder architectures for various threshold variation ranges.

Fig. 5. General CCB architecture. (Schematic: adder inputs A_i, B_i, A_i+1, B_i+1, ..., A_j-1, B_j-1, A_j, B_j; carry signals labeled C_j-1, C_j+1 and C_i-1; a multiplexer with inputs labeled 0 and 1.)
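The behavior of the CCB of Section IV.E and Fig. 5 can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions: the adder is modeled as a plain ripple-carry chain, a CCB for CC(i, j) forwards the carry entering bit i as the carry leaving bit j when all bits i to j propagate (the exact carry indexing is an assumption), and `depth` is a unit-delay proxy for carry ripple length, not an Hspice timing result. The function name `add_with_ccb` is hypothetical.

```python
def add_with_ccb(a, b, n, chains=()):
    """Ripple-carry addition of two n-bit values with optional CCB bypasses.

    chains: iterable of (i, j) with i <= j, one entry per assumed CCB.
    Returns (sum, carry_out, depth); depth is a unit-delay proxy for the
    longest carry ripple (an assumption, not a circuit timing model).
    """
    p = (a ^ b) & ((1 << n) - 1)          # XOR gates: propagate pattern
    end_to_start = {j: i for i, j in chains}
    carry = [0] * (n + 1)                 # carry[k] is the carry into bit k
    arrival = [0] * (n + 1)               # unit-delay arrival time of carry[k]
    total = 0
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        total |= (ak ^ bk ^ carry[k]) << k
        if (p >> k) & 1:                  # propagate: wait for the incoming carry
            carry[k + 1] = carry[k]
            arrival[k + 1] = arrival[k] + 1
        else:                             # generate or kill: decided locally
            carry[k + 1] = ak & bk
            arrival[k + 1] = 1
        if k in end_to_start:
            i = end_to_start[k]
            span = ((1 << (k - i + 1)) - 1) << i
            if p & span == span:          # all bits i..k propagate
                # Multiplexer forwards the carry entering bit i.
                carry[k + 1] = carry[i]
                arrival[k + 1] = min(arrival[k + 1], arrival[i] + 1)
    return total, carry[n], max(arrival)


# 127 + 1 on 8 bits: bit 0 generates, bits 1..6 propagate.
plain = add_with_ccb(0b01111111, 0b00000001, 8)             # (128, 0, 7)
with_ccb = add_with_ccb(0b01111111, 0b00000001, 8, [(1, 3)])  # (128, 0, 5)
```

In this model the bypass never changes the sum, because the forwarded carry equals the rippled one whenever the whole span propagates; only the depth proxy shrinks, here from 7 to 5 for a CCB covering bits 1 to 3.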
Fig. 6. Weight distribution of input pairs in a 32-bit adder based on Mibench benchmarks. (Bar chart; y-axis: percentage (%); series: weights 0-8, 9-16, 17-24 and 25-32; x-axis: benchmarks tiff2rgba, qsort, Jpeg, Basicmath, susan, Lame, blowfish, bitcount, rijndael and tiffmedian, grouped as Automotive, Consumer and Security.)

Fig. 7. Deadline violation percentage in different adder architectures: a) basic architecture, b) LPVM architecture. (Bar charts; y-axis: percentage (%); series: variation ranges 0-5%, 5-10%, 10-15% and 15-20%; x-axis: adder type RCA, CLA, CSA.)

Results show that, for the selected benchmarks, variation impacts the performance of the adder and results in deadline violations in the addition operation. As the variation impact increases, the percentage of violated additions also increases. Results show that the CLA has the worst behavior under variation in comparison with the other adder types. Applying the LPVM approach drastically reduces the deadline violations. This happens because the CCB blocks inserted in the adder architecture break the carry chains of vulnerable input pairs.

LPVM reduces the variation effect of the adder on the system behavior by decreasing the deadline violation percentage of the adder. Our proposed approach reduces the variation violations in all adder architectures. The effect of the proposed approach differs with the adder architecture. According to the results presented in Fig. 7.b, LPVM decreases the violation percentage of RCA, CLA and CSA by about 70.3%, 59.7% and 67.6%, respectively. In the fourth step of the LPVM approach, the intersections of the carry chains of vulnerable input pairs are selected to decrease the overhead of the proposed approach. Therefore, in this step, a systematic trade-off appears between power consumption overhead and variation mitigation. Results show that the LPVM approach acceptably reduces the variation effects and the power consumption overhead.

Results show that the LPVM approach reduces the violation percentage of the adder architecture under variation. It also imposes very low power dissipation and area overhead on the system. The average power consumption overhead of LPVM adders is 7.3%, 2.1% and 3.1% for the RCA, CLA and CSA architectures, respectively.

VI. CONCLUSION

The LPVM approach is a new way to design variation-tolerant adder circuits based on their intrinsic behavior. It shortens the carry chains of vulnerable input pairs, which drastically reduces the effect of variation. The proposed approach can reduce the malfunction percentage of the adder by up to 70%. The other advantage of this approach is that it imposes a very low power consumption overhead on the adder (up to 7.3%).

VII. FUTURE WORK

The LPVM approach should be extended to design a variation-mitigating ALU that overcomes variation with low power consumption overhead.

VIII. REFERENCES

[1] R. Zlatanovici, S. Kao, and B. Nikolic, “Energy-Delay Optimization of 64-Bit Carry-Lookahead Adders With a 240 ps 90 nm CMOS Design Example,” IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 569–583, Feb. 2009.
[2] P. Varman and K. Mohanram, “High performance reliable variable latency carry select addition,” in 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 1257–1262.
[3] P. Saxena, “Design of low power and high speed Carry Select Adder using Brent Kung adder,” in 2015 International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA), 2015, pp. 1–6.
[4] V. Pudi and K. Sridharan, “Low Complexity Design of Ripple Carry and Brent–Kung Adders in QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 1, pp. 105–119, Jan. 2012.
[5] S. Wei, “Residue checker using optimal signed-digit adder tree for error detection of arithmetic circuits,” in TENCON 2014 - 2014 IEEE Region 10 Conference, 2014, pp. 1–6.
[6] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer, “Statistical Timing Analysis: From Basic Principles to State of the Art,” IEEE Trans. Comput. Des. Integr. Circuits Syst., vol. 27, no. 4, pp. 589–607, Apr. 2008.
[7] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, Nam Sung Kim, and K. Flautner, “Razor: circuit-level correction of timing errors for low-power operation,” IEEE Micro, vol. 24, no. 6, pp. 10–20, Nov. 2004.
[8] Brian Greskamp and Josep Torrellas, “Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking,” in Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, 2007, pp. 213–224.
[9] L. Wan and D. Chen, “DynaTune,” in Proceedings of the 2009 International Conference on Computer-Aided Design - ICCAD ’09, 2009, p. 172.
[10] G. Lucas, S. Cromar, and D. Chen, “FastYield: variation-aware, layout-driven simultaneous binding and module selection for performance yield optimization,” pp. 61–66, Jan. 2009.
[11] J. A. Kumar and S. Vasudevan, “Variation-Conscious Formal Timing Verification in RTL,” in 2011 24th International Conference on VLSI Design, 2011, pp. 58–63.
[12] M. Kamal, A. Afzali-Kusha, S. Safari, and M. Pedram, “Impact of Process Variations on Speedup and Maximum Achievable Frequency of Extensible Processors,” ACM J. Emerg. Technol. Comput. Syst., vol. 10, no. 3, pp. 1–25, Apr. 2014.
[13] N. Banerjee, S. Chandra, S. Ghosh, S. Dey, A. Raghunathan, and K. Roy, “Coping with variations through system-level design,” Proc. 22nd Int. Conf. VLSI Des. - Held Jointly with 7th Int. Conf. Embed. Syst., pp. 581–586, 2009.
[14] M. Kamal, A. Afzali-Kusha, S. Safari, and M. Pedram, “An architecture-level approach for mitigating the impact of process variations on extensible processors,” in DATE ’12 Proceedings of the Conference on Design, Automation and Test in Europe, 2012, pp. 467–472.