=Paper=
{{Paper
|id=Vol-1566/paper7
|storemode=property
|title=Early failure prediction by using in-situ monitors: Implementation and application results
|pdfUrl=https://ceur-ws.org/Vol-1566/Paper6.pdf
|volume=Vol-1566
|authors=Ahmed Benhassain,Florian Cacho,Vincent Huard,Lorena Anghel
|dblpUrl=https://dblp.org/rec/conf/date/BenhassainCHA16
}}
==Early failure prediction by using in-situ monitors: Implementation and application results==
A. Benhassain (1,2), F. Cacho (1), V. Huard (1), L. Anghel (2)
(1) STMicroelectronics, Technology R&D, Crolles, France
Phone: +33(0)438922536, e-mail: sidi-ahmed.benhassain@st.com
(2) TIMA, 46 avenue Félix Viallet, 38031 Grenoble, France
Abstract – In-situ monitors are a promising strategy to measure timing slacks and to provide a pre-error warning prior to any timing violation. In this work, we demonstrate that the usage of in-situ monitors with a feedback loop of voltage regulation is suitable for process and temperature compensation.

Index Terms – in-situ timing monitors, CMOS reliability, timing margin.

I. INTRODUCTION

With CMOS technology scaling, it becomes more and more difficult to guarantee circuit functionality for all process, voltage, and temperature (PVT) corners. Moreover, circuit wearout degradation leads to additional temporal variation. The result is an increase of the design margin for reliable systems [1]. Adding pessimistic timing margin to guarantee all operating conditions under worst-case conditions is no longer acceptable due to the huge impact on design costs.

Ageing monitoring techniques fall into two categories. First, standalone sensors use various configurations of ring oscillators [2] and delay chains. Replica paths [3] are a solution to mimic the timing behavior of the original path in combinational logic. Second, in-situ delay monitors can directly measure the delay degradation of a specific path within the target circuit; this approach is very promising to provide reliable timing information [4]. Delay monitors such as “Razor I” [5] and “Razor II” [6] detect timing errors in actual paths. A local micro-rollback execution procedure ensures error correction. However, these methods need a large hardware architecture for error recovery. The Adaptive Voltage Scaling (AVS) approach in [7, 8] proposes error correction by using in-situ monitors able to detect timing errors, followed by a global system action after the error detection. Another approach consists in detecting timing pre-errors instead of timing errors by detecting critical transitions [8]. In this case, the in-situ delay monitors can be used as a reliability technique to provide an alert prior to a setup violation. This technique is also further combined with global system actions such as AVS or DVFS.

In this paper, an innovative insertion flow of monitors is presented. Two ISM solutions are discussed and compared. The first one is built with standard cells available in the technology design platform library, named here built-in flow ISM (referred to as flow-based ISM in the following). The second one uses a dedicated custom design, named cell-based ISM. In Section III, some benchmarks of the insertion strategy are reported and discussed. Finally, several applications of ISM usage for compensation are presented.

II. ISM INSERTION FLOW

The advantage of an ISM located inside a digital block is the capability to accurately capture all sources of local physical, environmental and temporal variations. The ISMs under investigation are presented in Fig. 1. The basic idea is to delay the data of a critical path arriving at D in the shadow FF, and to compare it with the regular FF. When the Flag signal rises, it means that a violation of the setup time has occurred in the shadow FF and that the remaining slack of the data path is close to the timing of the delay element, as defined in the schematic. In this work, 3 time windows (TW1=60ps, TW2=100ps, TW3=130ps) have been evaluated. The schematic can be carried out in two different ways: semi-custom (flow-based ISM) or full-custom design (cell-based ISM). In the first one, all schematic elements are issued from the standard cell design platform. Placement and connectivity are performed with scripting during the flow execution. The second one is a new cell dedicated to this usage. For that approach, all CAD views of the new cell (functional, physical, timing, etc.) need to be developed to be compliant with the standard digital flow.

Fig. 1. Schematic and layout of in-situ monitor under investigations. Data arriving at Q is delayed in shadow FF and compared to the regular one. Flow-based ISM is composed of standard cells available in the design platform. Cell-based ISM is a fully customized design.
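The pre-error principle of the ISM described in Section II can be illustrated with a small behavioral sketch. This is only an idealized model under stated assumptions, not the authors' circuit: it assumes that the Flag rises as soon as the remaining slack of the monitored path is smaller than the delay element (time window), and it ignores switching activity and the setup time of the shadow FF itself. The function names and the slack trace are hypothetical; only the three time-window values come from the text.

```python
# Idealized behavioral model of an in-situ monitor (ISM); not a circuit netlist.
# Assumption: the Flag rises when the remaining slack of the monitored path
# is smaller than the delay element (time window) in front of the shadow FF.

TIME_WINDOWS_PS = {"TW1": 60, "TW2": 100, "TW3": 130}  # values from Section II


def ism_flag(slack_ps, time_window_ps):
    """True when the delayed data would miss setup in the shadow FF."""
    return slack_ps < time_window_ps


def regular_ff_fails(slack_ps):
    """True when the regular FF itself violates setup (negative slack)."""
    return slack_ps < 0


def first_flag_slack(slack_trace_ps, time_window_ps):
    """Return the first slack value in the trace that raises the Flag, or None."""
    for slack_ps in slack_trace_ps:
        if ism_flag(slack_ps, time_window_ps):
            return slack_ps
    return None


# Hypothetical slack trace (ps) of one path as it degrades over time.
trace = [150, 120, 90, 50, 10, -20]
for name, window in TIME_WINDOWS_PS.items():
    print(name, "flags at slack", first_flag_slack(trace, window), "ps")
```

In this model a larger time window raises the Flag earlier, while the regular FF still captures correct data, which is the trade-off behind evaluating several time windows.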
Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany
Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This
volume is published and copyrighted by its editors.
In addition to the choice of the monitor, the insertion flow is of crucial importance. To obtain quantitative results, an industrial framework compliant with the STMicroelectronics digital design flow is used. The methodology is portable to any other standard or in-house digital flow. The generic approach is illustrated in Fig. 2. The classical Front-end steps are executed with synthesis and floorplanning. At the end, a gate netlist is provided as input to the place and route tool. After placement and pre clock tree synthesis (CTS), a timing analysis (TA) is performed. For the setup functional corner, a decision is made to insert monitors (FF cell sweep for cell-based ISM) and to regenerate connectivity on a sub-set of critical paths. This results in a new gate netlist and new timing and power figures, and the flow is then re-executed as usual: post-CTS (hold and setup optimization), route and optimization until the design is timing, power and reliability closed. A certain number of back-and-forth steps is required to fully satisfy the initial design specification, as shown in Fig. 2.

For illustration, some timing analyses are presented in Fig. 3. Based on an initial selection of the 5% worst slacks, ISMs are inserted in a sub-set of paths. At step 3 (Fig. 2), histograms of paths are reported for an implementation with and without ISM. In the following analysis, delayed paths are not reported.

Fig. 2. Flow insertion of in-situ built-in monitors. During Front-End flow, a preliminary Timing Analysis is performed after pre-CTS step. In-situ monitors are inserted in sub-set of critical paths and a new gate netlist is generated. Then the Back-end flow is normally executed with new gate netlist.

Particular attention is paid to ensure that, for flow-based ISM, the inserted cells are physically as close as possible to the monitored FF. To achieve this objective, timing constraints are adapted to minimize the skew between shadow and regular FF. Moreover, the delayed data arriving at the shadow FF is not considered as a real path when the place and route tool optimizes to fulfill the timing constraints. It means that there is theoretically no timing penalty after ISM insertion except the one induced by the slight additional routing resources.

It is important to notice that the monitored delay is of the same order of magnitude as the degradation measured on test chips during ageing experiments. It is therefore mandatory to have the highest timing accuracy level during monitor insertion. Although insertion would be possible at synthesis during Front-end, the physical synthesis (Back-end) is able to account for parasitic effects. Thus, timing analysis at this level (post-route) is suitable and relevant to discuss the efficiency of the insertion flow. Benchmarking of this methodology on different digital blocks is now reviewed to determine how the insertion flow can cover digital path ageing and to establish the performance penalty.

[Plots: histograms of number of paths versus slack (ns). Step 1: post-CTS paths with the 5% worst-slack selection. Step 3: 5% worst CP with ISM flow and without ISM, split into non-monitored, monitored and delayed paths; 40% of the initial selection is monitored.]

Fig. 3. Timing analysis of BCH results at different steps of the flow. A preliminary TA at post-CTS is calculated (step 1). Based on this ranking, the 5% worst slacks are selected, and ISMs are inserted. In the final TA (step 3), slacks of monitored and non-monitored paths are presented.

III. BENCHMARK RESULTS

The ISM insertion flow is now applied on different circuits, and performance versus ISM covering efficiency is reviewed. The designs are synthesized and placed-and-routed in 28FDSOI technology with Low-Vt devices. Several circuits are issued from the ITC99 benchmark, whose characteristics are typical of synthesized circuits. The b19, b15 and b14 circuits have respectively 17, 6 and 10 Kgate after physical implementation. They are respectively composed of 2, 0.8 and 0.4 kFF. In addition, an industrial customer-related digital block is investigated as well. This block is a Bose, Ray-Chaudhuri and Hocquenghem (BCH) error correcting code IP consisting of encoder and decoder modules. The IP contains an output signal, Autotest, indicating whether error correction is performed correctly. More details about this circuit can be found in [1].

Different implementation trials are performed for the same target performance with a large availability of area. The place and route tool optimizes performance as first priority, and then area/power. The worst negative slacks at the post-route step are discussed for all the circuits.
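The selection step of the insertion flow (a fixed percentage of the worst slacks after the preliminary TA) and the coverage ratio (monitored paths over initially targeted paths) can be sketched as follows. This is a minimal illustration under stated assumptions, not the scripting used in the industrial flow: the path names, slack values and function names are hypothetical, and a real flow would read slacks from a timing report.

```python
def select_worst_slack(slack_ps_by_path, percent):
    """Return the `percent` % of paths with the smallest slack (rounded up)."""
    n_paths = len(slack_ps_by_path)
    count = (n_paths * percent + 99) // 100  # integer ceiling of n * percent / 100
    ranked = sorted(slack_ps_by_path, key=slack_ps_by_path.get)
    return ranked[:count]


def coverage(targeted_paths, monitored_paths):
    """Fraction of the targeted paths that end up equipped with a monitor."""
    targeted = set(targeted_paths)
    if not targeted:
        return 0.0
    return len(targeted & set(monitored_paths)) / len(targeted)


# Hypothetical preliminary timing analysis: 40 paths, slack in ps.
slacks = {f"path{i}": 5 * i for i in range(40)}

targeted = select_worst_slack(slacks, percent=5)   # 5% selection
wider = select_worst_slack(slacks, percent=10)     # 10% selection

# Hypothetical outcome after place and route: only some targets keep a monitor.
monitored = ["path0", "path3"]
print(targeted, wider, coverage(wider, monitored))
```

Selecting by rank rather than by an absolute delay window keeps the number of monitors predictable, which matches the choice of the number of inserted ISMs as the metric under investigation.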
[Plot: slack penalty (ns), from 0 to -0.1, for the b15, b14, b19 and BCH circuits.]

Fig. 4. Relative worst slack penalty for reference implementation, aged library implementation (for different mission profile), 10% flow-based ISM and 10% cell-based ISM.

As depicted in Fig. 4, the performance impact after ISM insertion might depend on the circuit. Compared to the reference circuit (fresh library without monitors), the implementation with an aged library (which depends on the mission profile: consumer, networking or automotive) leads to a minor penalty. However, this guard band enables the circuit to fulfill the timing requirement at the end of life. Concerning the ISM insertion, we chose a sub-set of the 10% worst slacks (at step 1 of Fig. 2) to equip with monitors. The BCH result shows a 90ps slack degradation for cell-based ISM and less than 20ps for flow-based ISM. The explanation for the penalty of cell-based ISM is the area constraint of the large custom cell. For the sake of clarity, the delayed data arriving at the shadow FF for cell-based ISM is not reported in the TA of Fig. 4 for the ITC99 benchmark.

[Plot: slack penalty (ns, from 0 to -0.1) and coverage of the targeted set of CP (%, from 0 to 100) for the reference, aged lib, 5% ISM, 10% ISM and 30% ISM implementations.]

Fig. 5. Slack penalty for different implementations of BCH circuit. Increase of number of ISM leads to a slight timing degradation. The level of coverage (number of critical path monitored on initial critical path targeted to be monitored) remains close to 40%.

The number of ISMs inserted and its performance impact are discussed in Fig. 5. For a selection of the 5% most critical data paths, the performance penalty is only 15ps, and 40% of the initial selection is covered by monitors. A classic approach would consist in selecting the CPs according to an absolute delay window and not according to a number-of-CPs criterion. Thus, the distribution of the CP and sub-CP histogram is important to analyze when using this approach. The violation hazards of a path due to the induced ageing failures are a function of its remaining slack. However, for the ISM flow, we use the number of inserted ISMs as the metric under investigation.

IV. EXAMPLE OF APPLICATIONS

After discussing the strategy and the benchmark of the ISM insertion flow, some experimental results are now reviewed. Dedicated digital blocks are developed in which custom cell-based ISMs have been inserted on 10% of the critical paths. We have investigated designs in 28nm Low Power (LP) and Fully Depleted SOI (FDSOI) technologies developed at STMicroelectronics. The digital block studied is the BCH block mentioned in the previous section.

Fig. 6. Management of multicore architecture using ISM. At fixed 1GHz clock, when decreasing supply voltage, a warning Flag appears earlier before the IP failure. The 18 core safety margins are in a 100mV supply voltage range.

In the first application, ISMs are inserted in the architecture to manage the variability under an optimum power budget. A major challenge in multicore architectures is to cope with inter-core dispersion. Indeed, local process dispersion leads to variation of speed and thus of power consumption across cores. To tackle this dispersion, an additional margin in the voltage stack needs to be used. It is not a trivial task to establish this margin because it is deeply influenced by the process centering and the manufacturing dispersion. An alternative approach is to insert ISMs and to use their Flags as warnings that serve as inputs for margin management. As depicted in Fig. 6, 18 BCH cores are implemented in LP technology with ISM without any feedback loop. Under a constant 1GHz clock frequency, when the supply voltage is decreased, a first monitor Flag occurs at 0.99V, corresponding to a 1% voltage decrease. At that point, the operating functionality is still correct. While the supply voltage continues to decrease, more and more Flags occur on different cores, and a first failure (setup violation) is reported at 0.85V. Interestingly, the VMIN (minimal supply voltage sustaining functionality at a given PLL clock) distribution for all 18 cores depends on the application executed on each core and on their ageing experience. To optimize the choice of voltage stack in a multicore architecture, the strategy would be to monitor the first Flag of each core instead of using a conservative extra margin covering intra-core dispersion.

The second application focuses on monitoring the Flag number. The Flag number is the indicator for circuit speed and is used for local variations along with aging-aware voltage adaptation. An important measurement campaign is performed on
300 dies using 3 time windows (TW1, TW2, TW3) and 5 workloads (Workload A, B, C, D, E). These workloads are patterns containing 0, 1, 2, 3 and 4 errors respectively. Fig. 7 shows the Flag count when decreasing the voltage until the Autotest signal fails, using TW1 for different workloads. When the pattern is modified, the activity is modified, which results in a strong modification of the CP ranking. A direct consequence is that VMIN_AUTOTEST (the supply voltage before the Autotest signal fails) and VMIN_Flag (the supply voltage at which the Flag count starts) vary strongly with the workload (VMIN_AUTOTEST ~ 0.8V and VMIN_Flag ~ 0.84V for Workload A, VMIN_AUTOTEST ~ 0.825V and VMIN_Flag ~ 0.825V for Workload D).

Fig. 7: Evolution of Flag number with VDD using time window 1 (TW1) for different workloads: A, B, C, D (0, 1, 2, 3, 4 errors injected).

In order to demonstrate the robustness of ISM, it is important to test them under various conditions. For that purpose, various temperature changes have been exercised on the BCH IP. Fig. 8 shows the result of VMIN variation under 30°C and 125°C for both Autotest and Flag signals. As depicted, a degradation of VMIN_AUTOTEST by 250 mV is observed when 125°C is applied, confirming the ability of ISM to capture local variations induced by temperature change.

Fig. 8: Evolution of Flag number with VDD using TW1 and workload B under 30°C (magenta) and 125°C (blue) temperatures

V. CONCLUSION

A flow of in-situ monitor insertion is developed and applied to different circuits. Two types of monitors are compared and discussed: the cell-based and flow-based approaches. The performance penalty and area overhead of ISM are small. The additional margin provided by the Flag signal is more accurate than the additional voltage stack margin to account for ageing degradation. The coverage path statistic (number of critical paths monitored over the number of critical paths targeted to be monitored) is around 40%. This approach is suitable for dynamic management of ageing because, in the long term, the probability of activating one path from the critical path selection is high. Some applications of adaptive regulation are illustrated; this scheme is promising for process compensation and temperature change.

REFERENCES

[1] V. Huard, “Adaptative wear out management with in-situ management”, IRPS (2014)
[2] X. Wang, “Path-RO: a novel on-chip critical path delay measurement under process variation”, IEEE ACM (2008)
[3] S. Wang, “Representative Critical Reliability Paths for low-cost and accurate on-chip aging evaluation”, IEEE/ICCAD (2012)
[4] M. Saliva, “Digital circuits reliability with in-situ monitors in 28nm fully depleted SOI”, IEEE/DATE (2015)
[5] S. Das et al., “A Self-Tuning DVS Processor Using Delay-Error Detection and Correction”, IEEE J. Solid-State Circuits, Apr. (2006)
[6] D. Blaauw et al., “RazorII: In Situ Error Detection and Correction for PVT and SER Tolerance”, IEEE J. Solid-State Circuits, Jan. (2009)
[7] K. A. Bowman, “Energy-efficient and Metastability-Immune Resilient Circuits for Dynamic Variation Tolerance”, IEEE Journal of Solid-State Circuits
[8] M. Wirnshofer, “A Variation-Aware Adaptive Voltage Scaling Technique Based on In-situ Delay monitoring”, IEEE/DDECS (2012)
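As a supplementary note to Section IV, the supply voltages quoted there can be converted into warning margins, that is, the voltage distance between the first Flag and the functional failure. The sketch below only restates the arithmetic on the reported values, expressed in millivolts to keep the subtraction exact; the function name and the dictionary layout are hypothetical and are not part of the measurement setup.

```python
def warning_margin_mv(vmin_flag_mv, vmin_fail_mv):
    """Voltage distance (mV) between the first Flag and the functional failure."""
    return vmin_flag_mv - vmin_fail_mv


# (first-Flag voltage, failure voltage) in mV, as quoted in Section IV.
REPORTED_MV = {
    "18 BCH cores, 1GHz": (990, 850),  # first Flag at 0.99V, first failure at 0.85V
    "Workload A": (840, 800),          # VMIN_Flag ~ 0.84V, VMIN_AUTOTEST ~ 0.8V
    "Workload D": (825, 825),          # VMIN_Flag ~ 0.825V, VMIN_AUTOTEST ~ 0.825V
}

margins = {
    label: warning_margin_mv(flag_mv, fail_mv)
    for label, (flag_mv, fail_mv) in REPORTED_MV.items()
}

for label, margin_mv in margins.items():
    print(f"{label}: {margin_mv} mV of warning margin")
```

With these values the margin shrinks from 40 mV for Workload A to 0 mV for Workload D, which is one way to read the statement that both voltages vary strongly with the workload.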