=Paper=
{{Paper
|id=Vol-1566/paper11
|storemode=property
|title=LPVM: Low-Power Variation-Mitigant Adder Architecture Using Carry Expedition
|pdfUrl=https://ceur-ws.org/Vol-1566/Paper11.pdf
|volume=Vol-1566
|authors=Alireza Namazi,Meisam Abdollahi
|dblpUrl=https://dblp.org/rec/conf/date/NamaziA16
}}
==LPVM: Low-Power Variation-Mitigant Adder Architecture Using Carry Expedition==
Alireza Namazi, Meisam Abdollahi
Computer Engineering Department, Tehran University, Tehran, Iran
a.namazi@ut.ac.ir, meisam.abdolahi@ut.ac.ir
Abstract: Addition is one of the most crucial operations in microprocessors and must be performed within a predefined deadline (critical path). Variation is a phenomenon which negatively affects the performance of this operation. This paper proposes a new Low-Power Variation-Mitigant (LPVM) adder design using the intrinsic behavior of the addition operation. The LPVM approach drastically decreases the probability of deadline violation in the addition circuit. The basic idea of this paper is to expedite carry propagation in adder circuits for vulnerable inputs. The LPVM is an input-oriented approach which adds a simple logic to the adder architecture that only affects the vulnerable inputs. This approach is applicable to all presented types of adders and improves all high-level approaches which tend to overcome the variation issue. Results show that this approach decreases the percentage of violations in RCA, CLA and CSA by about 70.3%, 59.7% and 67.6%, respectively. The LPVM approach not only reduces variation effects on the adder operation from the viewpoint of performance, but also has a negligible impact on the adder power consumption. The average power consumption overhead of the LPVM approach is about 7.3%, 2.1% and 3.1% for RCA, CLA and CSA, respectively.
Keywords: Addition, Process Variation, Low Power
I. INTRODUCTION
Addition is one of the most useful and important arithmetic operations [1] in microprocessors. Due to its critical role in almost all processing elements, there exist several architectures with the same functionality and different characteristics. Researchers have investigated adders from different views such as performance [2], power consumption [3][4] and reliability [5]. The earliest investigations focused on the performance [3] and power consumption improvement [1] of the adders.
In recent years, due to the drastic decrease in feature sizes of digital system designs, process variation has become the major obstacle for system designers, and researchers have shown great interest in addressing the variation effects with techniques from the device level to the system level [6]. Process variation is the deviation of a manufactured component from its nominal design. Variation impacts the performance of a system and causes disorders and violations in its operation. All synchronous systems have rigid timing constraints and all units must operate within predefined delay constraints. Variation is an issue which modifies the delay of operating units stochastically.
There exist many efforts in the literature to overcome the effects of variation in digital circuits. Previous works can be divided into two major categories. The first one includes high-level approaches which try to overcome the variation issue, such as Razor Logic (RL) [7], Telescopic Unit (TU), PaceLine Approach [8] and DynaTune Approach (DA) [9]. The RL and TU both add new logic at the end of each pipeline stage and use their own error detection and correction techniques to overcome the variation issue. For example, RL [7] uses multiple copies of the output logic and compares them to find out whether there is any error in the results. PaceLine uses a novel duplication technique based on the overclocking feature of processors. DynaTune proposes a circuit-level optimization technique to improve circuit behavior by probabilistic analysis of the critical gates of the circuits. These techniques solve the variation problem globally for the worst-case scenario, which may drastically degrade performance.
All the above-mentioned techniques are general, whereas our proposed approach is specially designed for adder circuits considering their behavior. All circuit-level techniques should handle the variation effects of their internal combinational segments; therefore, having more tolerant components leads to better performance in these techniques. The LPVM imposes significantly less overhead on the system by using intrinsic characteristics of the adder circuits. It can be used along with the above-mentioned high-level techniques and can also increase their efficiency.
The second category includes statistical approaches [10]. Reference [10] proposes a high-level approach for variation-aware binding and component selection to maximize the yield. It uses rebinding and Statistical Static Timing Analysis (SSTA) to evaluate and maximize the performance. These techniques are also general and do not consider the intrinsic behavior of the circuit in their calculations. To the best of our knowledge, the proposed LPVM adder is the first variation-mitigating approach which considers the behavior of the adder in order to remove the effects of variation on the operation of the adder under a predefined clock period.
In this paper, we propose a novel low-power architecture for adders to overcome the effects of process variation. This architecture can be used along with all high-level approaches proposed in the literature and can increase their efficiency because it drastically decreases the adder malfunction probability. This reduces their overheads on the system. The basic idea behind this approach is to expedite carry propagation for vulnerable inputs which may violate the working clock period.
The rest of this paper is organized as follows. Section II describes process variation. Section III describes the motivation of the paper. The proposed approach is presented in Section IV. Experimental results are presented in Section V, and finally Section VI concludes the paper.
II. PROCESS VARIATION
There exist many types and sources of variation in deep sub-micron digital circuits. Two major sources of variation are known: (1) manufacturing variations and (2) operation-induced variations [12]. This paper concentrates on the first category.
Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany
Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This
volume is published and copyrighted by its editors.
Nanoscale IC manufacturing imperfections lead to variation in design parameters such as length (L), width (W), oxide thickness (Tox) and threshold voltage (Vth) [13]. These fluctuations in the design of nano-scale (<<90nm) circuits result in many side effects on the operation of designs in comparison with nominal design parameters. This paper focuses on threshold voltage fluctuations because the threshold voltage is the most influential parameter [14] and it changes the expected performance and power consumption of systems. However, the proposed approach is applicable to all sources of variation.
III. MOTIVATION
The addition performance completely depends on its inputs. A carry chain consists of multiple consecutive bit positions in an adder architecture whose carry-out depends on their input carry. The simulation results depicted in Fig. 1 show that, depending on its inputs, the adder can calculate the addition result with a delay less than its critical path. Results show that inputs in Ripple Carry Adder (RCA), Carry Look-ahead (CLA) and Carry Select Adder (CSA) with 8, 16, 32, 64 and 128 bits directly affect the calculation delay of the adder. The longest delay relates to inputs with the longest carry chain. As adder circuits have different addition delays based on their inputs, input pairs are categorized based on their calculation delay in comparison with the longest addition delay. Results depicted in Fig. 2 show that for the 32-bit RCA adder, the delay of about 30% of input pairs is up to 20% less than the longest delay. Besides, about 4.1% of input pairs have about 10% of the longest delay.
Fig. 1. The percentage of difference between minimum and maximum delay in adders for different bit widths based on input pairs. (Bar chart: MinMaxDiff percentage, 0 to 10000%, versus adder type RCA, CLA and CSA at bit widths 8, 16, 32, 64 and 128.)
Fig. 2. Input pair percentages of RCA with 64 and 128 bit widths, categorized based on their calculation time in comparison with the worst-case delay. (Bar chart: delay percentage, 0 to 16%, for the bins 0-5%, 5-10%, 10-15% and 15-20%, grouped by RCA bit width.)
Process variation results in delay changes in adder circuits. According to the results depicted in Fig. 3, variation effects completely depend on the architecture and input pair characteristics. This figure shows the minimum and maximum deviation percentage from nominal delay considering all possible input pairs in different adders with various bit widths. Results are extracted using Hspice simulations, and the variation of the threshold voltage (Vth) is selected as the main impacting parameter, with a maximum deviation range of 20% through Monte Carlo simulations. The simulation results are twofold:
- Input pairs have different calculation delays based on their carry propagation pattern.
- Variation changes the calculation delay of each input pair and may also change the worst-case delay of the adder. These changes may violate the deadline which is predefined for the adder.
Fig. 3. Minimum and maximum delay deviation percentage of calculation delay for input pairs in comparison with nominal delay. (Bar chart: Min Variation and Max Variation percentage, 0 to 80%, versus adder type RCA, CLA and CSA at bit widths 8, 16, 32, 64 and 128.)
IV. PROPOSED APPROACH
The results gathered from input-based variation simulations on different types of adders show that although variation impacts the delay characteristics of the adder, it does not influence the result of all inputs. This paper proposes a simple and low-overhead technique to overcome the effects of process variation on the delay characteristics of each adder. The Low Power Variation Mitigating (LPVM) approach tries to overcome the process variation in adder circuits by simply taking care of inputs whose calculation time is near the critical path of the nominal adder. The schematic diagram of the LPVM design approach is presented in Fig. 4. Simulation results showed that inputs with longer carry propagation chains are more susceptible to process variation. The LPVM design approach inserts simple combinational logic into the adder architecture in order to break the long carry propagation chain. The inserted block decreases the calculation time of the adder only for selected inputs and is called the Carry Chain Breaker (CCB) block. The proposed approach has five steps, which are described as follows:
A. Carry chain Determination (Step 1)
The first step is to determine the Carry Chains (CC) of each input pair, each of which is denoted CC(S, F). Parameters S and F show the start and finish bits of the CC, respectively. This input pair has three CCs. All carry chains of an input pair (A, B) reside in a set called the Chain Set, CS(A, B).
B. Susceptibility Analysis (Step 2)
The second step divides input pairs into non-overlapping categories considering their calculation delay. Each input pair has a weight W(A, B) based on its carry propagation pattern. The weight of each input pair depends only on the weight of its longest CC and is calculated using (1).
$$W(CS(A, B)) = \max_{cc \in CS(A, B)} \left( F_{cc} - S_{cc} + 1 \right) \qquad (1)$$
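As a rough illustration of Steps 1 and 2, the following Python sketch extracts carry chains, evaluates the weight of (1) and groups input pairs by weight. It assumes that a carry chain is a maximal run of bit positions where A and B differ (propagate positions), which is one reading of the definition given in Section III. The names `carry_chains`, `weight` and `categorize` are illustrative and do not come from the original implementation.

```python
from collections import defaultdict

def carry_chains(a, b, n):
    """Return the chain set as a list of (S, F) bit ranges.

    Assumption: a chain is a maximal run of positions k with a_k XOR b_k == 1,
    i.e. positions whose carry-out depends on their carry-in.
    """
    p = (a ^ b) & ((1 << n) - 1)
    chains, start = [], None
    for k in range(n):
        if (p >> k) & 1:
            if start is None:
                start = k              # a new chain starts at bit k
        elif start is not None:
            chains.append((start, k - 1))
            start = None
    if start is not None:              # chain reaching the most significant bit
        chains.append((start, n - 1))
    return chains

def weight(a, b, n):
    """Weight of an input pair: length F - S + 1 of its longest chain, as in (1)."""
    return max((f - s + 1 for s, f in carry_chains(a, b, n)), default=0)

def categorize(n):
    """Exhaustively group all n-bit input pairs by weight (feasible for small n only)."""
    groups = defaultdict(list)
    for a in range(1 << n):
        for b in range(1 << n):
            groups[weight(a, b, n)].append((a, b))
    return groups

if __name__ == "__main__":
    print(carry_chains(0b0110, 0b0011, 4))
    print({w: len(pairs) for w, pairs in sorted(categorize(4).items())})
```

The exhaustive loop in `categorize` grows as 4^n, so for realistic bit widths the pairs would have to be sampled or taken from application traces instead.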
The weight of each input pair shows the calculation delay of the input pair in comparison with the longest pairs. All input pairs with the same weight reside in the same category. The category G(i) contains all possible (existing) input pairs of the first step whose weight is equal to i. Step 1 and Step 2 are performed based on the LPVM categorization algorithm, which is depicted in Algorithm 1.
Algorithm 1: LPVM Categorization
Inputs: n (adder bit width)
For all possible input pairs
    Step 1:
    A = Convert_binary(i);
    B = Convert_binary(j);
    CS = Carry_Chain_Extraction(A, B);
    Step 2:
    CCL = Longest_Chain(CS);
    Put pair (A, B) in G(i) where i = CCL
End
C. Weight Selection (Step 3)
The third step is to find the vulnerable weight categories. It is an iterative approach, as depicted in Fig. 4. This step starts from the category with the highest weight. It randomly selects an input pair. The selected input is examined on the adder architecture to find out its behavior under process variation. In this step, the input is evaluated under different variation conditions based on designer parameters. Although this paper selects the threshold voltage as the main affecting parameter, other parameters can be used to evaluate the variation effect. Afterwards, the addition delays of all executed samples are gathered and checked against the deadline. At this point, a new parameter is supplied by the designer, which is called the Certainty Factor (CF). The CF is the percentage of simulated samples that must meet the deadline for a category to be accepted. For example, if a designer selects 100% for the CF, G(i) is acceptable only if all simulated samples of the selected input pair meet the deadline. Otherwise, G(i) is not acceptable and should be added to the vulnerable inputs. When the algorithm reaches a category which meets the defined CF, it does not evaluate the rest of the categories because their calculation latency is strictly less than that of the evaluated category. At this point, the Step 3 iterations stop.
Fig. 4. LPVM approach. (Flowchart: Carry Chain Determination (Step 1); Susceptibility Analysis (Step 2); Step 3, "finds the best calculation threshold": set i = n-1, select a random input pair from category G(i), test the input under process variation using the variation parameters, then check "Meet CF?". If no, add the G(i) elements to the selected input pairs, set i = i-1 and repeat. If yes, continue to Intersection Categorizing (Step 4) and CCB Insertion (Step 5).)
D. Intersection Categorizing (Step 4)
The fourth step first divides the CCs of the selected input pairs into non-overlapping categories: all CCs which intersect other CCs are put in the same category. It then finds the most covering subsets of the CCs residing in each category, that is, the most overlapping chain between different vulnerable input pairs. Therefore, the overhead of the CCB block insertion is reduced.
E. CCB Block Insertion (Step 5)
The last step inserts the CCB units into the adder architecture. This simple block breaks all or a segment of the carry chains in susceptible input pairs to decrease or overcome the effects of variation on the adder circuit operation. The internal structure of the CCB for detecting a continuous propagation pattern is shown in Fig. 5. The depicted CCB block is designed for CC(i, j). This block consists of parallel 2-input XOR gates which are connected to the adder inputs to detect the propagation pattern. This block connects the carry of the adder at the ith bit to the carry at the jth position. When the propagation pattern is detected, a 2-to-1 multiplexer selects Ci and forwards it in place of the carry at the end of the chain (see the carry labels in Fig. 5). The CCB block has very low overhead because the XOR gates already reside in the basic architecture of the adders. The CCBs operate in parallel with the adder and do not increase its latency. The number of CCBs and their lengths depend entirely on the selected input vectors. The CCB architecture overlaps with the full adders. The XOR section of the CCB is already generated in all adder circuits, which reduces the overhead of the CCBs.
Fig. 5. General CCB Architecture. (Schematic: adder inputs Ai, Bi, Ai+1, Bi+1, Aj-1, Bj-1, Aj, Bj; the carry labels "Cj-1" and "Ci-1" appear beside the 0 and 1 inputs of the multiplexer, together with the label "Cj+1".)
V. EXPERIMENTAL RESULTS
The experimental results consist of two phases. The first phase uses the LEON3 processor (32-bit). The second phase relates to the variation investigation of the adder circuit. Simulations of this phase are performed with the Hspice simulator, and the technology size is 32nm. Some applications of the MiBench benchmark suite are executed on the LEON3 processor, and the input entries of the adder unit are gathered and evaluated. The weight distribution of input pairs is depicted in Fig. 6. Results show that in real application executions, we may not have all possible input operands. Therefore, exhaustive exploration of input pairs for the LPVM approach is not necessary.
Applying the LPVM approach to the adder types presented in Section III shows that, under threshold voltage variation (in the range of 20%), different adder architectures exhibit different operation violations. Fig. 7.a shows the percentage of operation violations in different adder architectures for various threshold variation ranges.
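The CF test of Step 3, which also underlies the violation percentages reported in this section, could be sketched as follows. The delay sampler is a hypothetical stand-in for the Hspice Monte Carlo runs: `select_vulnerable`, `sample_delay` and `toy_delay` are illustrative names, and the uniform perturbation of up to 20% in the toy sampler is an assumption made here, not the authors' variation model.

```python
import random

def longest_chain(a, b, n):
    """Longest run of consecutive propagate bits (a_k XOR b_k == 1)."""
    best = run = 0
    for k in range(n):
        run = run + 1 if ((a ^ b) >> k) & 1 else 0
        best = max(best, run)
    return best

def select_vulnerable(categories, sample_delay, deadline, cf=1.0, samples=200, rng=None):
    """Walk the categories from the highest weight down until one meets the CF.

    categories:   dict mapping weight -> list of (a, b) pairs, i.e. the G(i) sets
    sample_delay: hypothetical callable (pair, rng) -> delay under one variation sample
    cf:           fraction of samples that must meet the deadline (1.0 means 100%)
    Returns the weights whose categories are judged vulnerable.
    """
    rng = rng or random.Random(0)
    vulnerable = []
    for w in sorted(categories, reverse=True):
        pair = rng.choice(categories[w])      # one random representative of G(w)
        met = sum(sample_delay(pair, rng) <= deadline for _ in range(samples))
        if met / samples >= cf:
            break                             # lighter categories are not evaluated
        vulnerable.append(w)
    return vulnerable

def toy_delay(pair, rng, n=8):
    """Assumed stand-in: one delay unit per chained bit, perturbed by up to 20%."""
    return longest_chain(pair[0], pair[1], n) * (1 + rng.uniform(-0.2, 0.2))

if __name__ == "__main__":
    # One hand-built pair per weight 1..8: a has w low bits set, b is zero.
    cats = {w: [((1 << w) - 1, 0)] for w in range(1, 9)}
    print(select_vulnerable(cats, toy_delay, deadline=6.0))
```

Lowering `cf` below 1.0 accepts a category even if some samples miss the deadline, which trades fewer inserted CCBs against a higher residual violation rate.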
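To make the role of the CCB inserted in Step 5 more concrete, here is a minimal unit-delay model of a ripple-carry adder with optional bypass blocks. It is a sketch under stated assumptions: each propagating full adder and each multiplexer costs one delay unit, and `ripple_add` with its `ccbs` argument is a hypothetical name, not the authors' circuit or tool flow.

```python
def ripple_add(a, b, n, ccbs=()):
    """Unit-delay model of an n-bit ripple-carry adder with optional CCBs.

    ccbs is an iterable of (i, j) bit ranges, one per inserted block.
    Returns (sum modulo 2**n, worst carry depth in delay units).
    """
    carry = [0] * (n + 1)      # carry[k] is the carry into bit k
    depth = [0] * (n + 1)      # delay units before carry[k] is valid
    bypass_start = {j: i for i, j in ccbs}
    total = 0
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        p, g = ak ^ bk, ak & bk                    # propagate and generate
        total |= (p ^ carry[k]) << k
        carry[k + 1] = g | (p & carry[k])
        depth[k + 1] = depth[k] + 1 if p else 0    # non-propagating bits decide locally
        i = bypass_start.get(k)
        if i is not None:
            mask = ((1 << (k - i + 1)) - 1) << i
            if (a ^ b) & mask == mask:             # every bit in i..k propagates
                carry[k + 1] = carry[i]            # same value over a shorter path
                depth[k + 1] = min(depth[k + 1], depth[i] + 1)
    return total, max(depth)

if __name__ == "__main__":
    print(ripple_add(0xFFFF, 1, 16))
    print(ripple_add(0xFFFF, 1, 16, ccbs=[(1, 8)]))
```

In this model, adding 0xFFFF and 1 on 16 bits ripples through 15 stages, while a bypass over bits 1 to 8 shortens the worst carry path to 8 stages without changing the sum. Sum bits inside the bypassed segment still see the rippled carry, so the gain comes from the stages after the segment.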
Fig. 6. Weight distribution of input pairs in 32-bit adder based on MiBench benchmarks. (Bar chart: percentage of input pairs in the weight ranges 0-8, 9-16, 17-24 and 25-32 for the benchmarks tiff2rgba, qsort, Jpeg, Basicmath, susan, Lame, blowfish, bitcount, rijndael and tiffmedian, grouped as Automotive, Consumer and Security.)
Results show that for the selected benchmarks, variation impacts the performance of the adder and results in deadline violations in the addition operation. As the variation impact increases, the percentage of violated additions also increases. Results show that the CLA has the worst behavior under variation in comparison with the other adder types. Applying the LPVM approach drastically reduces the deadline violations. This happens because the CCB blocks which are inserted in the adder architecture break the carry chains of vulnerable input pairs.
Fig. 7. Deadline violation percentage in different adder architectures: a) basic architecture, b) LPVM architecture. (Bar charts: violation percentage for RCA, CLA and CSA under the variation ranges 0-5%, 5-10%, 10-15% and 15-20%.)
The LPVM reduces the variation effect of the adder on the system behavior by decreasing the deadline violation percentage of the adder. Our proposed approach reduces the variation violations in all adder architectures. The effect of the proposed approach differs between adder architectures. According to the results presented in Fig. 7.b, the LPVM decreases the violation percentage of RCA, CLA and CSA by about 70.3%, 59.7% and 67.6%, respectively. In the fourth step of the LPVM approach, the intersection of the carry chains of vulnerable input pairs is selected to decrease the overhead of the proposed approach. Therefore, in this step, a systematic trade-off appears between power consumption overhead and variation mitigation. Results show that the LPVM approach acceptably reduces the variation effects and the power consumption overhead.
Results show that the LPVM approach reduces the violation percentage of the adder architecture under variation. It also imposes very low power dissipation and area overhead on the system. The average power consumption overhead of LPVM adders is 7.3%, 2.1% and 3.1% for the RCA, CLA and CSA architectures, respectively.
VI. CONCLUSION
The LPVM approach is a new way to design variation-tolerant adder circuits based on their intrinsic behavior. This approach shortens the carry chains of vulnerable input pairs, which drastically reduces the effect of variation. The proposed approach can reduce the malfunction percentage of the adder by up to 70%. The other advantage of this approach is that it imposes very low power consumption overhead on the adder (up to 7.3%).
VII. FUTURE WORK
The LPVM approach should be extended to design a variation-mitigating ALU to overcome variation with low power consumption overheads.
VIII. REFERENCES
[1] R. Zlatanovici, S. Kao, and B. Nikolic, “Energy-Delay Optimization of 64-Bit Carry-Lookahead Adders With a 240 ps 90 nm CMOS Design Example,” IEEE J. Solid-State Circuits, vol. 44, no. 2, pp. 569–583, Feb. 2009.
[2] P. Varman and K. Mohanram, “High performance reliable variable latency carry select addition,” in 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 1257–1262.
[3] P. Saxena, “Design of low power and high speed Carry Select Adder using Brent Kung adder,” in 2015 International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA), 2015, pp. 1–6.
[4] V. Pudi and K. Sridharan, “Low Complexity Design of Ripple Carry and Brent–Kung Adders in QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 1, pp. 105–119, Jan. 2012.
[5] S. Wei, “Residue checker using optimal signed-digit adder tree for error detection of arithmetic circuits,” in TENCON 2014 - 2014 IEEE Region 10 Conference, 2014, pp. 1–6.
[6] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer, “Statistical Timing Analysis: From Basic Principles to State of the Art,” IEEE Trans. Comput. Des. Integr. Circuits Syst., vol. 27, no. 4, pp. 589–607, Apr. 2008.
[7] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, Nam Sung Kim, and K. Flautner, “Razor: circuit-level correction of timing errors for low-power operation,” IEEE Micro, vol. 24, no. 6, pp. 10–20, Nov. 2004.
[8] Brian Greskamp and Josep Torrellas, “Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking,” in Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques, 2007, pp. 213–224.
[9] L. Wan and D. Chen, “DynaTune,” in Proceedings of the 2009 International Conference on Computer-Aided Design - ICCAD ’09, 2009, p. 172.
[10] G. Lucas, S. Cromar, and D. Chen, “FastYield: variation-aware, layout-driven simultaneous binding and module selection for performance yield optimization,” pp. 61–66, Jan. 2009.
[11] J. A. Kumar and S. Vasudevan, “Variation-Conscious Formal Timing Verification in RTL,” in 2011 24th International Conference on VLSI Design, 2011, pp. 58–63.
[12] M. Kamal, A. Afzali-Kusha, S. Safari, and M. Pedram, “Impact of Process Variations on Speedup and Maximum Achievable Frequency of Extensible Processors,” ACM J. Emerg. Technol. Comput. Syst., vol. 10, no. 3, pp. 1–25, Apr. 2014.
[13] N. Banerjee, S. Chandra, S. Ghosh, S. Dey, A. Raghunathan, and K. Roy, “Coping with variations through system-level design,” Proc. 22nd Int. Conf. VLSI Des. - Held Jointly with 7th Int. Conf. Embed. Syst., pp. 581–586, 2009.
[14] M. Kamal, A. Afzali-Kusha, S. Safari, and M. Pedram, “An architecture-level approach for mitigating the impact of process variations on extensible processors,” in DATE ’12 Proceedings of the Conference on Design, Automation and Test in Europe, 2012, pp. 467–472.