

LPVM: Low-Power Variation-Mitigant Adder Architecture Using Carry Expedition

Alireza Namazi
Computer Engineering department, Tehran university, Tehran, Iran
a.namazi@ut.ac.ir

Meisam Abdollahi
Computer Engineering department, Tehran university, Tehran, Iran
meisam.abdolahi@ut.ac.ir

Abstract: Addition is one of the most crucial operations in microprocessors and must be performed within a predefined deadline (critical path). Variation is a phenomenon that negatively affects the performance of this operation. This paper proposes a new Low-Power Variation-Mitigant (LPVM) adder design that exploits the intrinsic behavior of the addition operation. The LPVM approach drastically decreases the probability of deadline violation in the addition circuit. The basic idea of this paper is to expedite carry propagation in adder circuits for vulnerable inputs. LPVM is an input-oriented approach which adds simple logic to the adder architecture that only affects the vulnerable inputs. This approach is applicable to all presented types of adders and improves all high-level approaches that try to overcome the variation issue. Results show that this approach decreases the percentage of violations in RCA, CLA and CSA by about 70.3%, 59.7% and 67.6%, respectively. The LPVM approach not only reduces variation effects on the adder operation from the viewpoint of performance, but also has a negligible impact on the adder power consumption. The average power consumption overhead of the LPVM approach is about 7.3%, 2.1% and 3.1% for RCA, CLA and CSA, respectively.

Keywords: Addition, Process Variation, Low Power

I. INTRODUCTION

The adder is one of the most useful and important arithmetic units [1] in microprocessors. Due to its critical role in almost all processing elements, there exist several architectures with the same functionality and different characteristics. Researchers have investigated adders from different views such as performance [2], power consumption [3][4] and reliability [5]. The earliest investigations focused on the performance [3] and power consumption improvement [1] of adders.

In recent years, due to the drastic decrease in feature sizes of digital system designs, process variation has become the major obstacle for system designers, and researchers have shown massive interest in addressing variation effects with techniques ranging from the device level to the system level [6]. Process variation is the deviation of a manufactured component from the nominal designed component. Variation impacts the performance of a system and causes disorders and violations in its operation. All synchronous systems have rigid timing constraints, and all units must operate within predefined delay constraints. Variation modifies the delay of operating units stochastically.

There exist many efforts in the literature to overcome the effects of variation in digital circuits. Previous works can be divided into two major categories. The first one includes high-level approaches which try to overcome the variation issue, such as Razor Logic (RL) [7], Telescopic Unit (TU), the PaceLine Approach [8] and the DynaTune Approach (DA) [9]. RL and TU both add new logic at the end of each pipeline stage and use their own error detection and correction techniques to overcome the variation issue. For example, RL [7] uses multiple copies of the output logic and compares them to find out whether there is any error in the results. PaceLine uses a novel duplication technique based on the overclocking feature of processors. DynaTune proposes a circuit-level optimization technique to improve circuit behavior by probabilistic analysis of the critical gates of the circuit. These techniques solve the variation problem globally for the worst-case scenario, which may drastically degrade performance.

All the above-mentioned techniques are general, whereas our proposed approach is designed specifically for adder circuits and considers their behavior. All circuit-level techniques have to handle the variation effects of their internal combinational segments; therefore, having more tolerant components leads to better performance in these techniques. LPVM imposes significantly less overhead on the system by using intrinsic characteristics of the adder circuits. It can be used along with the above-mentioned high-level techniques and can also increase their efficiency.

The second category includes statistical approaches [10]. The work in [10] proposes a high-level approach for variation-aware binding and component selection to maximize the yield. It uses rebinding and Statistical Static Timing Analysis (SSTA) to evaluate and maximize the performance. These techniques are also general and do not consider the intrinsic behavior of the circuit in their calculations. To the best of our knowledge, the proposed LPVM adder is the first variation-mitigating approach which considers the behavior of the adder in order to remove the effects of variation on the operation of the adder under a predefined clock period.

In this paper, we propose a novel low-power architecture for adders to overcome the effects of process variation. This architecture can be used along with all high-level approaches proposed in the literature and can increase their efficiency because it drastically decreases the adder malfunction probability. This reduces their overhead on the system. The basic idea behind this approach is to expedite carry propagation for vulnerable inputs which may violate the working clock period.

The rest of this paper is organized as follows. Section II describes process variation. Section III describes the motivation of the paper. The proposed approach is presented in Section IV. Experimental results are presented in Section V, and finally Section VI concludes the paper.

II. PROCESS VARIATION

There exist many types and sources of variation in deep sub-micron digital circuits. Two major sources of variation are known: (1) manufacturing variations and (2) operation-induced variations [12]. This paper concentrates on the first category.
Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany

Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This
volume is published and copyrighted by its editors.

Nanoscale IC manufacturing imperfections lead to variation in design parameters such as length (L), width (W), oxide thickness (Tox) and threshold voltage (Vth) [13]. These fluctuations in the design of nano-scale (<<90nm) circuits result in many side effects on the operation of designs in comparison with nominal design parameters. This paper focuses on threshold voltage fluctuations because the threshold voltage is the most influential parameter [14] and it changes the expected performance and power consumption of the systems. However, the proposed approach is applicable to all sources of variation.

III. MOTIVATION

The addition performance completely depends on its inputs. A carry chain consists of multiple consecutive bit positions in an adder architecture whose carry-out depends on their input carry. Considering the simulation results depicted in Fig. 1, it can be concluded that, depending on its inputs, the adder can calculate the addition result with a delay less than its critical path. Results show that the inputs of the Ripple Carry Adder (RCA), Carry Look-ahead Adder (CLA) and Carry Select Adder (CSA) with 8, 16, 32, 64 and 128 bits directly affect the calculation delay of the adder. The longest delay relates to inputs with the longest carry chain. As adder circuits have different addition delays based on their inputs, input pairs are categorized based on their calculation delay in comparison with the longest addition delay. Results depicted in Fig. 2 show that for the 32-bit RCA, the delay of about 30% of input pairs is up to 20% less than the longest delay. Besides, about 4.1% of input pairs have about 10% of the longest delay.

[Figure: MinMaxDiff bars; y-axis: Percentage (%), 0 to 10000; x-axis: Adder Type (RCA, CLA, CSA), each at 8, 16, 32, 64 and 128 bits]
Fig. 1. The percentage of difference between Minimum and Maximum delay in adders for different bit widths based on input pairs

[Figure: Delay bars; y-axis: Percentage (%), 0.00 to 16.00; x-axis: RCA Bit Width (32-bit, 64-bit), each with bins 0-5%, 5-10%, 10-15%, 15-20%]
Fig. 2. Input pair percentages of RCA with 64 and 128 bit widths categorized based on their calculation time in comparison with the worst case delay

Process variation results in delay changes in adder circuits. According to the results depicted in Fig. 3, it can be concluded that variation effects completely depend on the architecture and the input pair characteristics. This figure shows the minimum and maximum deviation percentage from the nominal delay considering all possible input pairs in different adders with various bit widths. Results are extracted using Hspice simulations, and the variation of the threshold voltage (Vth) is selected as the main impacting parameter with a maximum deviation range of 20% through Monte Carlo simulations. The simulated results are twofold, as presented below:
 - Input pairs have different calculation delays based on their carry propagation pattern.
 - Variation changes the calculation delay of each input pair and may also change the worst case delay of the adder. These changes may violate the deadline which is predefined for the adder.

[Figure: Min Variation and MAX Variation bars; y-axis: Percentage (%), 0 to 80; x-axis: Adder Type (RCA, CLA, CSA), each at 8, 16, 32, 64 and 128 bits]
Fig. 3. Minimum and Maximum delay deviation percentage of calculation delay for input pairs in comparison with nominal delay

IV. PROPOSED APPROACH

Considering the results gathered from input-based variation simulations on different types of adders, it is clear that although variation impacts the delay characteristics of the adder, it does not influence the result of all inputs. This paper proposes a simple and low-overhead technique to overcome the effects of process variation on the delay characteristics of each adder. The Low Power Variation Mitigating (LPVM) approach tries to overcome process variation in adder circuits by simply taking care of inputs whose calculation time is near the critical path of the nominal adder. The schematic diagram of the LPVM design approach is presented in Fig. 4. Simulation results showed that inputs with longer carry propagation chains are more susceptible to process variation. The LPVM design approach inserts simple combinational logic into the adder architecture in order to break long carry propagation chains. The inserted block decreases the calculation time of the adder only for selected inputs and is called the Carry Chain Breaker (CCB) block. The proposed approach has five steps, which are described as follows:

A. Carry chain Determination (Step 1)

The first step is to determine the Carry Chains (CC) of each input pair; each chain is denoted CC(S, F). Parameters S and F show the start and finish bits of the CC, respectively. This input pair has three CCs. All carry chains of an input pair (A, B) reside in a set called the Chain Set (a.k.a. CS(A, B)).

B. Susceptibility Analysis (Step 2)

The second step divides input pairs into non-overlapping categories considering their calculation delay. Each input pair has a weight (W(A, B)) based on its carry propagation pattern. The weight of each input pair depends only on the weight of its longest CC and is calculated using (1).

$$W(CS(A, B)) = \max_{cc \in CS} \left(F_{cc} - S_{cc} + 1\right) \qquad (1)$$
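Steps 1 and 2 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: it assumes that a carry chain is a maximal run of consecutive bit positions whose carry-out depends on the carry-in (that is, positions where the two operand bits differ), that bit 0 is the least significant bit, and that an input pair without any chain has weight 0. The names `carry_chains` and `weight` are hypothetical.

```python
from typing import List, Tuple


def carry_chains(a: int, b: int, n: int) -> List[Tuple[int, int]]:
    """Return CS(A, B) as a list of (S, F) bit ranges for an n-bit adder.

    Assumption: a bit position belongs to a chain when its carry-out
    depends on its carry-in, i.e. when a_k XOR b_k == 1.
    """
    propagate = (a ^ b) & ((1 << n) - 1)
    chains: List[Tuple[int, int]] = []
    start = None  # start bit S of the chain currently being scanned
    for k in range(n):
        if (propagate >> k) & 1:
            if start is None:
                start = k
        elif start is not None:
            chains.append((start, k - 1))
            start = None
    if start is not None:  # chain that runs up to the most significant bit
        chains.append((start, n - 1))
    return chains


def weight(chains: List[Tuple[int, int]]) -> int:
    """Equation (1): the length F - S + 1 of the longest chain (0 if none)."""
    return max((f - s + 1 for s, f in chains), default=0)


if __name__ == "__main__":
    cs = carry_chains(0b11011010, 0b00000000, 8)
    print(cs, weight(cs))
```

For the operands in the usage example, the sketch finds three chains and reports the length of the longest one as the weight of the pair.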
[Figure: flowchart. Carry Chain determination (Step 1); Susceptibility Analysis (Step 2); i = n-1; Step 3: Select Random Input Pair From category G(i), Test Input Under Process Variation (variation parameters), Meet CF? If No: Add G(i) elements to Selected Input Pairs, i = i-1, and repeat Step 3. If Yes: Intersection Categorizing (Step 4); CCB Insertion (Step 5)]
Fig. 4. LPVM approach

The weight of each input pair shows the calculation delay of the input pair in comparison with the longest pairs. All input pairs with the same weight reside in the same category. The category G(i) contains all possible (existing) input pairs of the first step whose weight is equal to i. Step 1 and Step 2 are performed based on the LPVM categorization algorithm, which is depicted in Algorithm 1.

Algorithm 1: LPVM Categorization
Finds the best calculation threshold
Inputs: n (adder bit width)
For All Possible Input Pair
    A = Convert_binary(i);                      (Step 1)
    B = Convert_binary(j);                      (Step 1)
    CS = Carry_Chain_Extraction(A, B);          (Step 1)
    CCL = Longest_Chain(CS);                    (Step 2)
    Put pair (A, B) in G(i) where i = CCL       (Step 2)
End

C. Weight Selection (Step 3)

The third step is to find the vulnerable weighted categories. It is an iterative approach, as depicted in Fig. 4. This step starts from the category with the highest weight. It randomly selects an input pair. The selected input is examined on the adder architecture to find out its behavior under process variation. In this step, the input is evaluated under different variation conditions based on designer parameters. Although this paper selects the threshold voltage as the main affecting parameter, other affecting parameters can be used to evaluate the variation effect. Afterwards, the addition delays of all executed samples are gathered and checked against the deadline. At this point, a new parameter called the Certainty Factor (CF) is provided by the designer. The CF shows the acceptable percentage of violations in the calculated results. For example, if a designer selects 100% for the CF, G(i) is acceptable only if all simulated samples of the selected input pair meet the deadline. Otherwise, G(i) is not acceptable and should be added to the vulnerable inputs. When the algorithm reaches a category which meets the defined CF, it does not evaluate the rest of the categories, because their calculation latency is strictly less than that of the evaluated category. At this point, the Step 3 iterations stop.

D. Intersection Categorizing (Step 4)

The fourth step first divides the CCs of the selected input pairs into non-overlapping categories. All CCs which have any intersection with other CCs are put in the same category. It then finds the most covering subsets of the CCs that reside in each category. This step finds the most overlapping chain between different vulnerable input pairs. Therefore, the overhead of the CCB block insertion is reduced.

E. CCB Block Insertion (Step 5)

The last step is inserting the CCB units into the adder architecture. This simple block breaks all or a segment of the carry chains in susceptible input pairs to decrease or overcome the effects of variation on the adder circuit operation. The internal structure of the CCB for detecting a continuous propagation pattern is shown in Fig. 5. The depicted CCB block is designed for CC(i, j). This block consists of parallel 2-input XOR gates which are connected to the adder inputs to detect the propagation pattern. This block connects the carry of the adder in the ith bit to the carry in the jth position. When the propagation pattern is detected, a 2-to-1 multiplexer selects $C_i$ and substitutes it for the $C_{j+1}$ carry. The CCB block has very low overhead because the XOR gates already reside in the basic architecture of the adders. The CCBs operate in parallel with the adder and do not increase its latency. The number of CCBs and their length completely depend on the selected input vectors. The CCB architecture overlaps with the full adders. The XOR section of the CCB is generated in all adder circuits, which reduces the overhead of the CCBs.

[Figure: inputs A_i B_i, A_{i+1} B_{i+1}, ..., A_{j-1} B_{j-1}, A_j B_j; 2-to-1 multiplexer with inputs C_{j-1} (0) and C_{i-1} (1) and output C_{j+1}]
Fig. 5. General CCB Architecture

V. EXPERIMENTAL RESULTS

The experimental results consist of two different phases. The first phase uses the LEON3 processor (32-bit). The second phase relates to the variation investigation of the adder circuit. Simulations of this phase are performed with the Hspice simulator, and the technology size is considered to be 32nm. Some applications of the Mibench benchmark suite are executed on the LEON3 processor, and the input entries of the adder unit are gathered and evaluated. The weight distribution of input pairs is depicted in Fig. 6. Results show that in real application executions, we may not have all possible input operands. Therefore, exhaustive exploration of input pairs for the LPVM approach is no longer necessary.

Applying the LPVM approach to the adder types presented in Section III shows that, under threshold voltage variation (in the range of 20%), different adder architectures demonstrate different operation violations. Fig. 7.a shows the percentage of operation violations in different adder architectures for various threshold variation ranges.
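As a behavioural illustration of the CCB block from Step 5, the following Python sketch models a ripple carry chain with one CCB over bits i..j. It is a functional model only and makes several assumptions that are not taken from the paper: bit 0 is the least significant bit, `carries[k]` denotes the carry into bit k, and the multiplexer forwards the carry entering bit i as the carry leaving bit j whenever every XOR in i..j outputs 1. The function names are hypothetical, and the model says nothing about delay, which is the quantity the CCB actually improves in hardware.

```python
from typing import List


def ripple_carries(a: int, b: int, n: int, c0: int = 0) -> List[int]:
    """Carries of a plain n-bit ripple chain; carries[n] is the final carry-out."""
    carries = [c0]
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        carries.append((ak & bk) | ((ak ^ bk) & carries[k]))
    return carries


def ccb_detect(a: int, b: int, i: int, j: int) -> bool:
    """Parallel XOR gates over bits i..j: True when every position propagates."""
    mask = ((1 << (j - i + 1)) - 1) << i
    return ((a ^ b) & mask) == mask


def carries_with_ccb(a: int, b: int, n: int, i: int, j: int, c0: int = 0) -> List[int]:
    """Same carries, but with a CCB spanning bits i..j (assumes 0 <= i <= j < n)."""
    bypass = ccb_detect(a, b, i, j)
    carries = [c0]
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        rippled = (ak & bk) | ((ak ^ bk) & carries[k])
        # Multiplexer: on a detected propagation pattern, the carry entering
        # bit i is used directly as the carry leaving bit j.
        carries.append(carries[i] if (bypass and k == j) else rippled)
    return carries


if __name__ == "__main__":
    a, b = 0b0101, 0b1010  # every bit propagates
    print(ccb_detect(a, b, 0, 3))
    print(carries_with_ccb(a, b, 4, 0, 3, c0=1) == ripple_carries(a, b, 4, c0=1))
```

Under these assumptions the bypassed carry always equals the rippled one, so the CCB leaves the sum unchanged and only shortens the path the carry has to travel for the detected pattern.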

[Figure: weight ranges 0-8, 9-16, 17-24, 25-32; y-axis: Percentage (%), 0 to 60; x-axis: Benchmark. Automotive: Basicmath, bitcount, qsort, susan. Consumer: Jpeg, Lame, tiff2rgba, tiffmedian. Security: blowfish, rijndael]
Fig. 6. Weight distribution of input pairs in 32-bit adder based on Mibench benchmarks

[Figure, panel a): variation ranges 0-5%, 5-10%, 10-15%, 15-20%; y-axis: Percentage (%), 0 to 80; x-axis: Adder type (RCA, CLA, CSA). A further panel with the same variation ranges and a Percentage (%) axis from 0 to 30 follows]

Results show that for selected benchmarks, the variation impacts the performance of the adder and results in deadline violation in addition operation. As the variation impact
increase, the violated percentage of addition also increases.
                                                                                                                                                                   RCA            CLA                CSA
Results show that the CLA has the worst behavior in front of
                                                                                                                                                                               Adder Type
variation in comparison with other adder types. Applying the
LPVM approach drastically reduces the deadline violations.                                                                               b)
This happens because CCB blocks which are inserted in the                                                                              Fig. 7. Deadline violation percentage in different adder architecture
adder architecture break the carry chain of vulnerable input                                                                                      a) basic architecture, b) LPVM architecture
pairs.                                                                                                                                           Design Example,” IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 569–
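
To make the notion of a carry chain concrete, the following minimal sketch computes the longest carry propagation chain that one input pair triggers in an n-bit adder. It is an illustration under assumptions, not the paper's implementation: the function name `longest_carry_chain` and the chain definition used here (a bit that generates a carry, followed by the consecutive bits that propagate it) are hypothetical choices made for this example.

```python
def longest_carry_chain(a: int, b: int, width: int = 32) -> int:
    """Length, in bit positions, of the longest carry chain for a + b.

    Assumed definition: a chain starts at a bit that generates a carry
    (a_i AND b_i) and extends through each following bit that
    propagates it (a_i XOR b_i).
    """
    longest = 0
    current = 0  # length of the chain currently travelling upward
    for i in range(width):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            # Generate: carry-out no longer depends on carry-in,
            # so a new chain starts at this bit.
            current = 1
        elif ai ^ bi:
            # Propagate: an incoming carry (if any) travels one bit further.
            current = current + 1 if current else 0
        else:
            # Kill: any incoming carry stops here.
            current = 0
        longest = max(longest, current)
    return longest
```

Under this definition, `longest_carry_chain(1, 0xFFFFFFFF)` returns 32, since the carry generated at bit 0 ripples through the whole 32-bit adder, while `longest_carry_chain(1, 1)` returns 1. Input pairs with long chains are the ones a block that breaks the chain would be expected to help.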
   The LPVM reduces the effect of adder variation on system
behavior by decreasing the deadline violation percentage of the
adder. The proposed approach reduces the violations in all adder
architectures, although the size of the effect depends on the
architecture. According to the results presented in Fig. 7.b, the
LPVM decreases the violation percentage of the RCA, CLA and CSA
by about 70.3%, 59.7% and 67.6%, respectively. In the fourth
step of the LPVM approach, the intersection of the carry chains
of vulnerable input pairs is selected to decrease the overhead of
the proposed approach. Therefore, in this step, a systematic
trade-off appears between power consumption overhead and
variation mitigation. Results show that the LPVM approach
reduces the variation effects at an acceptable power consumption
overhead.
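
The intersection step can be pictured with a small sketch. This section does not detail how the intersection is actually selected, so the following is only one plausible reading: it assumes each vulnerable input pair's carry chain is a contiguous, inclusive range of bit positions and looks for the range shared by all of them. The names `Span` and `common_span` are hypothetical.

```python
from typing import Iterable, Optional, Tuple

# Assumed representation: inclusive (first_bit, last_bit) of one carry chain.
Span = Tuple[int, int]


def common_span(chains: Iterable[Span]) -> Optional[Span]:
    """Return the bit range shared by every chain, or None if none exists."""
    chains = list(chains)
    if not chains:
        return None
    start = max(first for first, _ in chains)  # latest starting bit
    end = min(last for _, last in chains)      # earliest ending bit
    return (start, end) if start <= end else None
```

For instance, `common_span([(0, 20), (4, 31), (8, 25)])` returns `(8, 20)`: under the stated assumption, a single block placed in that range would cut all three chains, which is the kind of saving in added logic that the trade-off between overhead and mitigation refers to. Chains with no common range, such as `[(0, 3), (10, 12)]`, yield `None` and would need separate treatment.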
   Results show that the LPVM approach reduces the violation
percentage of the adder architectures under variation. It also
imposes very low power dissipation and area overhead on the
system. The average power consumption overhead of the LPVM
adders is 7.3%, 2.1% and 3.1% for the RCA, CLA and CSA
architectures, respectively.

                                                      VI.       CONCLUSION

   The LPVM approach is a new way to design variation-tolerant
adder circuits based on their intrinsic behavior. It shortens the
carry chains of vulnerable input pairs, which drastically reduces
the effect of variation. The proposed approach can reduce the
malfunction percentage of the adder by up to 70%. Another
advantage of this approach is that it imposes a very low power
consumption overhead on the adder (up to 7.3%).

                                                 VII. FUTURE WORK

   The LPVM approach should be extended to design a
variation-mitigant ALU that overcomes variation with low power
consumption overhead.
Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany

Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This
volume is published and copyrighted by its editors.