<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LPVM: Low-Power Variation-Mitigant Adder Architecture Using Carry Expedition</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alireza Namazi</string-name>
          <email>a.namazi@ut.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Meisam Abdollahi</string-name>
          <email>meisam.abdolahi@ut.ac.ir</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Engineering department, Tehran university</institution>
          ,
          <addr-line>Tehran</addr-line>
          ,
          <country country="IR">Iran</country>
        </aff>
      </contrib-group>
      <fpage>25</fpage>
      <lpage>32</lpage>
      <abstract>
        <p>Addition is one of the most crucial operations in microprocessors and must be performed within a predefined deadline (the critical path). Process variation negatively affects the performance of this operation. This paper proposes a new Low-Power Variation-Mitigant (LPVM) adder design that exploits the intrinsic behavior of the addition operation. The LPVM approach drastically decreases the probability of deadline violation in the addition circuit. The basic idea is to expedite carry propagation in adder circuits for vulnerable inputs. LPVM is an input-oriented approach that adds simple logic to the adder architecture and affects only the vulnerable inputs. The approach is applicable to all the adder types presented and complements the high-level approaches that aim to overcome the variation issue. Results show that it decreases the percentage of deadline violations in RCA, CLA and CSA by about 70.3%, 59.7% and 67.6%, respectively. LPVM reduces the effect of variation on adder performance with only a small impact on power consumption: the average power consumption overhead is about 7.3%, 2.1% and 3.1% for RCA, CLA and CSA, respectively.</p>
      </abstract>
      <kwd-group>
        <kwd>Addition</kwd>
        <kwd>Process Variation</kwd>
        <kwd>Low Power</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Addition is one of the most useful and important arithmetic
operations [1] in microprocessors. Because of its critical role in almost
all processing elements, several adder architectures exist with
the same functionality but different characteristics.
Researchers have investigated adders from different
viewpoints, such as performance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], power consumption [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
reliability [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The earliest investigations focused on improving the
performance [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and power consumption [1] of
adders.
      </p>
      <p>
        In recent years, as the feature sizes of digital designs have
shrunk drastically, process variation has become a major obstacle for
system designers, and researchers have shown massive interest in
addressing variation effects with techniques ranging from the device
level to the system level [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Process
variation is the deviation of a manufactured
component from its nominal design. Variation
affects the performance of a system and causes disorders
and violations in its operation. All synchronous systems
have rigid timing constraints, and all units must operate within
predefined delay constraints. Variation
modifies the delay of operating units stochastically.
      </p>
      <p>
        Many efforts in the literature try to overcome the
effects of variation in digital circuits. Previous works can
be divided into two major categories. The first includes
high-level approaches to the variation
issue, such as Razor Logic (RL) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Telescopic Unit (TU),
the PaceLine Approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and the DynaTune Approach (DA) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
RL and TU both add new logic at the end of each
pipeline stage and use their own error detection and correction
techniques to overcome the variation issue. For example, RL
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] uses multiple copies of the output logic and compares them
to find out whether the results contain any error. PaceLine
uses a novel duplication technique based on the overclocking
feature of processors. DynaTune proposes a circuit-level
optimization technique that improves circuit behavior through
probabilistic analysis of the critical gates of the circuit. These
techniques solve the variation problem globally for the
worst-case scenario, which may drastically degrade performance.
      </p>
      <p>All of the above-mentioned techniques are
general, whereas our proposed approach is designed specifically
for adder circuits and takes their behavior into account. All
circuit-level techniques must handle the variation effects of their
internal combinational segments, so more
variation-tolerant components lead to better performance in these
techniques. The LPVM approach imposes significantly less overhead on
the system by using intrinsic characteristics of adder circuits.
It can be used along with the above-mentioned high-level
techniques and can also increase their efficiency.</p>
      <p>
        The second category includes statistical approaches [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
The work in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] proposes a high-level approach for
variation-aware binding and component selection to maximize
the yield. It uses rebinding and Statistical Static Timing
Analysis (SSTA) to evaluate and maximize the performance.
These techniques are also general and do not consider the
intrinsic behavior of the circuit in their calculations. To the
best of our knowledge, the proposed LPVM adder is the first
variation-mitigating approach that considers the behavior of
the adder in order to remove the effects of variation on its
operation within a predefined clock period.
      </p>
      <p>In this paper, we propose a novel low-power architecture
for adders to overcome the effects of process variation. This
architecture can be used along with all high-level approaches
proposed in the literature and can increase their efficiency,
because it drastically decreases the probability of adder
malfunction and thereby reduces the overhead those approaches
impose on the system. The
basic idea behind this approach is to expedite carry
propagation for vulnerable inputs that may violate the
working clock period.</p>
      <p>The rest of this paper is organized as follows. Section II
describes process variation. Section III describes the
motivation of the paper. The proposed approach is presented
in Section IV. Experimental results are presented in Section
V, and finally Section VI concludes the paper.</p>
      <p>II.</p>
    </sec>
    <sec id="sec-2">
      <title>PROCESS VARIATION</title>
      <p>
        Variation in deep sub-micron digital circuits has many types
and sources. Two major sources are known:
(1) manufacturing variations and (2) operation-induced
variations [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This paper has concentrated on the first
category.
      </p>
      <p>
        Imperfections in nanoscale IC manufacturing lead to
variation in design parameters such as length (L), width (W),
oxide thickness (Tox) and threshold voltage (Vth) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In
nanoscale (&lt;&lt;90nm) circuits, these fluctuations cause
many side effects on the operation of a design
compared with its nominal design parameters. This paper
focuses on threshold-voltage fluctuations because the threshold voltage is the
most influential parameter [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and changes the expected
performance and power consumption of the system.
However, the proposed approach is applicable to all sources
of variation.
      </p>
      <p>III.</p>
      <p>MOTIVATION</p>
      <p>The delay of an addition depends strongly on its
inputs. A carry chain consists of multiple consecutive bit
positions in an adder whose carry-out
depends on their carry-in. The simulation results
depicted in Fig. 1 show that, depending on its inputs, an adder
can produce its result with a delay
shorter than its critical path. The results
show that the inputs of the Ripple Carry Adder (RCA), Carry
Lookahead Adder (CLA) and Carry Select Adder (CSA) with 8, 16, 32,
64 and 128 bits directly affect the calculation delay of the
adder. The longest delay belongs to the inputs with the longest carry
chain. Because adder circuits have different addition delays
for different inputs, input pairs are categorized by
their calculation delay relative to the longest
addition delay. The results depicted in Fig. 2 show that, for the 32-bit
RCA, the delay of about 30% of input pairs is up to 20%
less than the longest delay. In addition, about 4.1% of input pairs
have about 10% of the longest delay.</p>
      <p>Process variation changes the delay of adder circuits.
According to the results depicted in Fig. 3,
the effect of variation depends on the architecture
and on the characteristics of the input pair. The figure shows the minimum
and maximum percentage deviation from the nominal delay
over all possible input pairs, for different adders with
various bit-widths. The results are extracted using HSPICE
Monte Carlo simulations, with the threshold voltage (Vth)
selected as the main varying parameter and a maximum
deviation range of 20%.
The simulation results lead to two observations:
- Input pairs have different calculation delays depending on
their carry propagation pattern.
- Variation changes the calculation delay of each input
pair and may also change the worst-case delay of the
adder. These changes may violate the deadline that is
predefined for the adder.</p>
      <p>IV.</p>
    </sec>
    <sec id="sec-3">
      <title>PROPOSED APPROACH</title>
      <p>The input-based variation
simulations on different adder types show that,
although variation affects the delay characteristics of the
adder, it does not affect the result for all inputs.
This paper proposes a simple, low-overhead technique to
overcome the effects of process variation on the delay
characteristics of an adder. The Low-Power
Variation-Mitigant (LPVM) approach handles process
variation in adder circuits by taking care only of inputs whose
calculation time is near the critical path of the nominal adder. The
schematic diagram of the LPVM design approach is presented in
Fig. 4. Simulation results showed that inputs with longer carry
propagation chains are more susceptible to process variation.
The LPVM design approach therefore inserts simple combinational
logic into the adder architecture to break long carry
propagation chains. The inserted block, called the
Carry Chain Breaker (CCB), decreases the
calculation time of the adder only for the selected inputs. The proposed
approach has five steps, described as follows:</p>
      <sec id="sec-3-1">
        <title>A. Carry chain Determination (Step 1)</title>
        <p>The first step determines the Carry Chains (CC) of each
input pair. A chain is denoted CC(S, F), where the parameters S and F
are the start and finish bits of the CC, respectively. This input
pair has three CCs. All carry chains of an input pair (A, B)
reside in a set called the Chain Set, denoted CS(A, B).</p>
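        <p>Step 1 can be illustrated with a minimal Python sketch. It assumes that a carry chain is a maximal run of bit positions whose propagate signal (A XOR B) is 1, with bit 0 as the least significant bit; the function name carry_chains and this exact chain definition are illustrative assumptions, not taken from the original design.</p>
        <preformat><![CDATA[
```python
def carry_chains(a, b, n):
    """Return the chain set CS(A, B) as a list of (S, F) tuples.

    Assumption: a carry chain is a maximal run of bit positions whose
    propagate signal a_i XOR b_i is 1 (bit 0 is the least significant bit).
    """
    propagate = (a ^ b) & ((1 << n) - 1)
    chains = []
    start = None
    for i in range(n):
        if (propagate >> i) & 1:
            if start is None:
                start = i                  # a new chain begins at bit i
        elif start is not None:
            chains.append((start, i - 1))  # the chain ended at the previous bit
            start = None
    if start is not None:
        chains.append((start, n - 1))      # the chain runs up to the top bit
    return chains


# 8-bit example: A XOR B = 0b01110111, which contains two runs of ones.
print(carry_chains(0b01010101, 0b00100010, 8))
```
]]></preformat>
        <p>Each tuple gives the start and finish bit of one chain, so the length of a chain is F - S + 1.</p>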
      </sec>
      <sec id="sec-3-2">
        <title>B. Susceptibility Analysis (Step 2)</title>
        <p>The second step divides the input pairs into non-overlapping
categories according to their calculation delay. Each input pair
has a weight W(A, B) based on its carry propagation pattern.
The weight of an input pair depends only on the length of
its longest CC and is calculated using (1).</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math>W(CS(A, B)) = \max_{cc \in CS} \left( F_{cc} - S_{cc} + 1 \right)</tex-math>
          </disp-formula>
        </p>
        <sec id="sec-3-2-1">
          <title>[Figure: Min Variation and Max Variation; vertical axis: Percentage (%)]</title>
          <p>The weight of an input pair indicates its calculation delay
relative to the longest pairs. All
input pairs with the same weight reside in the same category.
The category G(i) contains all possible (existing) input pairs
from the first step whose weight is equal to i. Step 1 and
Step 2 are performed by the LPVM categorization
algorithm, which is depicted in Algorithm 1.</p>
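          <p>The categorization of Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration under the assumption that the weight of a pair is the length of its longest run of propagate bits (A XOR B); the names longest_chain and categorize are hypothetical. Exhaustive enumeration visits 4^n pairs, so the sketch is only practical for small bit-widths.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict


def longest_chain(a, b, n):
    """Weight of (A, B): length of the longest run of propagate bits.

    Assumption: a carry chain is a run of positions where a_i XOR b_i is 1.
    """
    propagate = (a ^ b) & ((1 << n) - 1)
    longest = run = 0
    for i in range(n):
        run = run + 1 if (propagate >> i) & 1 else 0
        longest = max(longest, run)
    return longest


def categorize(n):
    """Group every n-bit input pair (A, B) into G[w], where w is its weight."""
    groups = defaultdict(list)
    for a in range(1 << n):
        for b in range(1 << n):
            groups[longest_chain(a, b, n)].append((a, b))
    return groups


groups = categorize(4)
for weight in sorted(groups):
    print(weight, len(groups[weight]))  # weight and number of pairs in G(weight)
```
]]></preformat>
          <p>The dictionary keys play the role of the category index i in G(i); the highest key holds the pairs with the longest carry chains.</p>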
        </sec>
      </sec>
      <sec id="sec-3-3">
        <title>C. Weight Selection (Step 3)</title>
        <p>The third step finds the vulnerable weight categories.
It is an iterative procedure, as depicted in Fig. 4. The step starts
from the category with the highest weight and randomly selects an
input pair from it. The selected input is applied to the adder
architecture to observe its behavior under
process variation: it is evaluated under
different variation conditions based on designer-supplied parameters.
Although this paper selects the threshold voltage as the main
varying parameter, other parameters can be used to
evaluate the variation effect. Afterwards, the addition delays of all
executed samples are gathered and checked against
the deadline. At this point, the designer supplies a further
parameter called the Certainty
Factor (CF), which sets the percentage of simulated samples
that must meet the deadline. For example, if a designer
selects 100% for the CF, G(i) is acceptable
only if all simulated samples of the selected input pair meet
the deadline. Otherwise, G(i) is not acceptable and is
added to the vulnerable inputs. When the algorithm reaches
a category that meets the defined CF, it does not evaluate
the remaining categories, because their calculation latency is
lower than that of the evaluated category, and the
Step 3 iterations stop.</p>
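        <p>The iteration of Step 3 can be sketched as below. The delay model toy_delay is a purely hypothetical stand-in for the circuit-level Monte Carlo simulation (delay proportional to the weight, scaled by a uniform deviation of up to 20%); the function names, the number of runs and the deadline value are likewise assumptions chosen for illustration.</p>
        <preformat><![CDATA[
```python
import random


def toy_delay(weight, rng, unit=1.0, spread=0.2):
    """Hypothetical stand-in for one variation sample of a circuit simulation."""
    return weight * unit * (1.0 + rng.uniform(-spread, spread))


def select_vulnerable(groups, deadline, cf=1.0, runs=200, seed=0, delay=toy_delay):
    """Return the weights of the categories that fail the certainty factor.

    groups: dict mapping weight to a list of (a, b) input pairs.
    cf: fraction of samples that must meet the deadline (1.0 means 100%).
    """
    rng = random.Random(seed)
    vulnerable = []
    for weight in sorted(groups, reverse=True):  # start from the highest weight
        # A real flow would simulate this randomly selected pair;
        # the toy model only uses the weight of its category.
        pair = rng.choice(groups[weight])
        met = sum(delay(weight, rng) <= deadline for _ in range(runs))
        if met / runs >= cf:
            break                                # lighter categories are faster
        vulnerable.append(weight)
    return vulnerable


groups = {1: [(0, 1)], 2: [(0, 3)], 3: [(0, 7)], 4: [(0, 15)]}
print(select_vulnerable(groups, deadline=3.0))
```
]]></preformat>
        <p>With a CF of 100%, a category is marked vulnerable as soon as a single sample misses the deadline, and the loop stops at the first category in which every sample meets it.</p>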
      </sec>
      <sec id="sec-3-4">
        <title>D. Intersection Categorizing (Step 4)</title>
        <p>The fourth step first divides the CCs of the selected input pairs into
non-overlapping categories: all CCs that intersect
other CCs are placed in the same category.
It then finds the most-covering subsets of the CCs
in each category, that is, the most overlapping
chain among the different vulnerable input pairs. This
reduces the overhead of CCB block insertion.</p>
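        <p>One possible reading of Step 4 is sketched below: chains that share at least one bit are grouped transitively, and for each group the bit range covered by the largest number of chains is reported as a candidate CCB span. Interpreting the "most covering subset" as the most frequently covered bit range is an assumption of this sketch, and both function names are hypothetical.</p>
        <preformat><![CDATA[
```python
def group_overlapping(chains):
    """Partition (S, F) chains into groups of transitively overlapping chains."""
    groups = []
    for s, f in sorted(chains):
        if groups and s <= groups[-1]["end"]:
            groups[-1]["chains"].append((s, f))          # shares a bit with the group
            groups[-1]["end"] = max(groups[-1]["end"], f)
        else:
            groups.append({"end": f, "chains": [(s, f)]})
    return [g["chains"] for g in groups]


def most_covered_segment(group):
    """Return the first (S, F) bit range covered by the most chains in a group."""
    lo = min(s for s, _ in group)
    hi = max(f for _, f in group)
    cover = [sum(s <= bit <= f for s, f in group) for bit in range(lo, hi + 1)]
    best = max(cover)
    first = cover.index(best)
    last = first
    while last + 1 < len(cover) and cover[last + 1] == best:
        last += 1
    return lo + first, lo + last


chains = [(0, 5), (3, 9), (4, 7), (12, 15), (14, 20)]
for group in group_overlapping(chains):
    print(group, most_covered_segment(group))
```
]]></preformat>
        <p>A shorter shared span serves several vulnerable chains with one block, which is the overhead reduction that Step 4 aims for.</p>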
      </sec>
      <sec id="sec-3-5">
        <title>E. CCB Block Insertion (Step 5)</title>
        <p>The last step inserts the CCB units into the adder
architecture. This simple block breaks all or a segment of the
carry chains in susceptible input pairs to decrease or
overcome the effects of variation on the operation of the adder
circuit.</p>
        <sec id="sec-3-5-1">
          <title>Algorithm 1: LPVM Categorization</title>
          <p>Finds the best calculation threshold.</p>
          <preformat>Inputs: n (adder bit width)
For all possible input pairs
    A = Convert_binary(i);
    B = Convert_binary(j);
    CS = Carry_Chain_Extraction(A, B);          (Step 1)
    CCL = Longest_Chain(CS);                    (Step 2)
    Put pair (A, B) in G(i) where i = CCL
End</preformat>
        </sec>
      </sec>
      <sec id="sec-3-13">
        <p>The internal structure of the CCB for detecting a
continuous propagation pattern is shown in Fig. 4. The depicted
CCB block is designed for CC(i, j). The block consists
of 2-input XOR gates operating in parallel, which are connected
to the adder inputs to detect the propagation pattern. It
connects the carry of the adder at the ith bit to the carry at
the jth position. When the propagation pattern is detected, a
2-to-1 multiplexer selects Ci and replaces it with the Cj-1 carry. The
CCB block has very low overhead because the XOR gates
already reside in the basic architecture of the adders, so the
CCB overlaps with the full adders. The
CCBs operate in parallel with the adder and do not increase
its latency. The number of CCBs and their lengths
depend entirely on the selected input vectors.</p>
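        <p>The behavior of a single CCB can be illustrated with a bit-level Python model. The sketch assumes a ripple-carry adder and one bypass over bits s..f: when every bit in that range propagates (A XOR B is 1), the carry entering bit s is forwarded as the carry leaving bit f. The index convention and the function name are assumptions of this sketch; it demonstrates only that the bypass leaves the sum unchanged and does not model timing.</p>
        <preformat><![CDATA[
```python
def ripple_add_with_ccb(a, b, n, chain=None):
    """Add two n-bit integers bit by bit, with an optional CCB on bits s..f.

    Returns (result, bypass_used); the result includes the final carry-out.
    """
    s, f = chain if chain is not None else (None, None)
    carry = 0            # carry into bit 0
    carry_at_s = 0       # carry into bit s, captured for the multiplexer
    result = 0
    bypass_used = False
    for i in range(n):
        if i == s:
            carry_at_s = carry
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                        # propagate signal (XOR gate)
        result |= (p ^ carry) << i         # sum bit
        carry = (ai & bi) | (p & carry)    # rippled carry-out of bit i
        if i == f:
            mask = (1 << (f - s + 1)) - 1
            if ((a ^ b) >> s) & mask == mask:
                carry = carry_at_s         # 2-to-1 mux forwards the early carry
                bypass_used = True
    return result | (carry << n), bypass_used


print(ripple_add_with_ccb(0b011110, 0b000000, 6, chain=(1, 4)))
```
]]></preformat>
        <p>Because an all-propagate segment passes its incoming carry through unchanged, the forwarded carry always equals the rippled one; in hardware the benefit is that the later bits no longer wait for the ripple through the bypassed segment.</p>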
        <p>V.</p>
        <p>EXPERIMENTAL RESULTS</p>
        <p>The experimental evaluation consists of two phases.
The first phase uses the 32-bit LEON3 processor: several
applications from the MiBench benchmark suite are executed
on LEON3, and the input operands of the adder
unit are gathered and evaluated. The
second phase investigates the effect of variation on the adder
circuits; its simulations are performed with the HSPICE
simulator in a 32nm technology.
The weight distribution of the
input pairs is depicted in Fig. 6. The results show that real
application executions do not exercise all possible input
operands. Therefore, exhaustive exploration of input pairs is
not necessary for the LPVM approach.</p>
        <p>Applying the LPVM approach to the adder types presented in
Section III shows that, under threshold-voltage variation
(in the range of 20%), different adder
architectures exhibit different rates of operation violation. Fig.
7.a shows the percentage of operation violations in the different
adder architectures for various threshold-voltage variation ranges.</p>
        <p>[Figure: CCB block for CC(i, j); inputs Ai, Bi, Ai+1, Bi+1, ..., Aj-1, Bj-1, Aj, Bj; carries Ci-1 and Cj-1. Benchmark axis labels: Basicmath, Security.]</p>
        <sec id="sec-3-13-1">
          <title>[Axis: Benchmark (Automotive, Consumer)] Fig. 6. Weight distribution of input pairs in 32-bit adder based on</title>
          <p>The results show that, for the selected benchmarks, variation
affects the performance of the adder and causes deadline
violations in the addition operation. As the variation
increases, the percentage of violated additions also increases.
The results show that the CLA behaves worst under
variation compared with the other adder types. Applying the
LPVM approach drastically reduces the deadline violations,
because the CCB blocks inserted in the
adder architecture break the carry chains of vulnerable input
pairs.</p>
          <p>The LPVM approach reduces the effect of adder variation on
system behavior by decreasing the deadline violation
percentage of the adder. The proposed approach reduces
violations in all adder architectures, although the size of the
effect differs with the adder
architecture. According to the results presented in Fig. 7.b,
LPVM decreases the violation percentage of RCA, CLA and
CSA by about 70.3%, 59.7% and 67.6%, respectively. In the
fourth step of the LPVM approach, the intersections of the carry
chains of vulnerable input pairs are selected to decrease the
overhead of the proposed approach. In this step, therefore, a
systematic trade-off appears between power consumption
overhead and variation mitigation. The results show that the
LPVM approach achieves an acceptable reduction in variation effects
at an acceptable power consumption overhead.</p>
          <p>The results show that the LPVM approach reduces the violation
percentage of the adder architecture under variation. It
also imposes very low power dissipation and area overhead on
the system. The average power consumption overhead of the
LPVM adders is 7.3%, 2.1% and 3.1% for the
RCA, CLA and CSA architectures, respectively.</p>
          <p>VI.</p>
          <p>CONCLUSION</p>
          <p>The LPVM approach is a new way to design
variation-tolerant adder circuits based on their intrinsic
behavior. It shortens the carry chains of vulnerable
input pairs, which drastically reduces the effect of variation.
The proposed approach can reduce the malfunction
percentage of the adder by up to 70%. Its other advantage
is that it imposes very low power consumption
overhead on the adder (up to 7.3%).</p>
          <p>VII.</p>
          <p>FUTURE WORK</p>
          <p>The LPVM approach should be extended to design a
variation-mitigating ALU that overcomes variation with low
power consumption overhead.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>VIII. REFERENCES</title>
      <p>[Fig. 7 panels: vertical axis Percentage (%); horizontal axis Adder type (RCA, CLA); legend 0-5%, 5-10%, 10-15%, 15-20%]</p>
      <sec id="sec-4-1">
        <title>Fig. 7. Deadline violation percentage in different adder architectures</title>
        <p>a) basic architecture, b) LPVM architecture</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Zlatanovici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Nikolic</surname>
          </string-name>
          , “
          <article-title>Energy-Delay Optimization of 64-Bit Carry-Lookahead Adders With a 240 ps 90 nm CMOS Design Example</article-title>
          ,”
          <source>IEEE J. Solid-State Circuits</source>
          , vol.
          <volume>44</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>569</fpage>
          -
          <lpage>583</lpage>
          , Feb.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>P.</given-names>
            <surname>Varman</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Mohanram</surname>
          </string-name>
          , “
          <article-title>High performance reliable variable latency carry select addition</article-title>
          ,” in 2012 Design, Automation &amp; Test in Europe Conference &amp;
          <source>Exhibition (DATE)</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>1257</fpage>
          -
          <lpage>1262</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Saxena</surname>
          </string-name>
          , “
          <article-title>Design of low power and high speed Carry Select Adder using Brent Kung adder</article-title>
          ,” in
          <source>2015 International Conference on VLSI Systems, Architecture, Technology and Applications (VLSI-SATA)</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Pudi</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sridharan</surname>
          </string-name>
          , “
          <article-title>Low Complexity Design of Ripple Carry and Brent-Kung Adders in QCA,”</article-title>
          <source>IEEE Trans. Nanotechnol.</source>
          , vol.
          <volume>11</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>119</lpage>
          , Jan.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wei</surname>
          </string-name>
          , “
          <article-title>Residue checker using optimal signed-digit adder tree for error detection of arithmetic circuits,”</article-title>
          <source>in TENCON 2014 - 2014 IEEE Region 10 Conference</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D.</given-names>
            <surname>Blaauw</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Chopra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          , and L. Scheffer, “
          <article-title>Statistical Timing Analysis: From Basic Principles to State of the Art,”</article-title>
          <source>IEEE Trans. Comput. Des. Integr. Circuits Syst.</source>
          , vol.
          <volume>27</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>589</fpage>
          -
          <lpage>607</lpage>
          , Apr.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ernst</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Blaauw</surname>
          </string-name>
          , T. Austin, T. Mudge, Nam Sung Kim, and K. Flautner, “
          <article-title>Razor: circuit-level correction of timing errors for low-power operation</article-title>
          ,”
          <source>IEEE Micro</source>
          , vol.
          <volume>24</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>10</fpage>
          -
          <lpage>20</lpage>
          , Nov.
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Brian</given-names>
            <surname>Greskamp</surname>
          </string-name>
          and Josep Torrellas, “
          <article-title>Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking</article-title>
          ,” in
          <source>Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques</source>
          ,
          <year>2007</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>224</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>L.</given-names>
            <surname>Wan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          , “DynaTune,”
          <source>in Proceedings of the 2009 International Conference on Computer-Aided Design - ICCAD '09</source>
          ,
          <year>2009</year>
          , p.
          <fpage>172</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>G.</given-names>
            <surname>Lucas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Cromar</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          , “
          <article-title>FastYield: variation-aware, layout-driven simultaneous binding and module selection for performance yield optimization</article-title>
          ,” pp.
          <fpage>61</fpage>
          -
          <lpage>66</lpage>
          , Jan.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Vasudevan</surname>
          </string-name>
          , “
          <article-title>Variation-Conscious Formal Timing Verification in RTL</article-title>
          ,” in
          <source>2011 24th International Conference on VLSI Design</source>
          ,
          <year>2011</year>
          , pp.
          <fpage>58</fpage>
          -
          <lpage>63</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kamal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Afzali-Kusha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Safari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pedram</surname>
          </string-name>
          , “
          <article-title>Impact of Process Variations on Speedup and Maximum Achievable Frequency of Extensible Processors,”</article-title>
          <source>ACM J. Emerg. Technol. Comput. Syst.</source>
          , vol.
          <volume>10</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          , Apr.
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>N.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Raghunathan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Roy</surname>
          </string-name>
          , “
          <article-title>Coping with variations through system-level design</article-title>
          ,
          <source>” Proc. 22nd Int. Conf. VLSI Des. - Held Jointly with 7th Int. Conf. Embed. Syst.</source>
          , pp.
          <fpage>581</fpage>
          -
          <lpage>586</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kamal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Afzali-Kusha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Safari</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Pedram</surname>
          </string-name>
          , “
          <article-title>An architecture-level approach for mitigating the impact of process variations on extensible processors,”</article-title>
          <source>in DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>467</fpage>
          -
          <lpage>472</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>