=Paper= {{Paper |id=Vol-1566/paper7 |storemode=property |title=Early failure prediction by using in-situ monitors: Implementation and application results |pdfUrl=https://ceur-ws.org/Vol-1566/Paper6.pdf |volume=Vol-1566 |authors=Ahmed Benhassain,Florian Cacho,Vincent Huard,Lorena Anghel |dblpUrl=https://dblp.org/rec/conf/date/BenhassainCHA16 }} ==Early failure prediction by using in-situ monitors: Implementation and application results== https://ceur-ws.org/Vol-1566/Paper6.pdf



Early failure prediction by using in-situ monitors: Implementation and application results

A. Benhassain (1,2), F. Cacho (1), V. Huard (1), L. Anghel (2)
(1) STMicroelectronics, Technology R&D, Crolles, France
Phone: +33(0)438922536, e-mail: sidi-ahmed.benhassain@st.com
(2) TIMA, 46 avenue Félix Viallet, 38031 Grenoble, France


Abstract: In-situ monitoring is a promising strategy to measure timing slacks and to provide a pre-error warning prior to any timing violation. In this work, we demonstrate that the usage of in-situ monitors with a feedback loop of voltage regulation is suitable for process and temperature compensation.

Index Terms: in-situ timing monitors, CMOS reliability, timing margin.

I. INTRODUCTION

With CMOS technology scaling, it becomes more and more difficult to guarantee circuit functionality for all process, voltage, and temperature (PVT) corners. Moreover, circuit wearout degradation leads to additional temporal variation. This results in an increase of design margin for reliable systems [1]. Adding pessimistic timing margins to guarantee all operating conditions under worst-case conditions is no longer acceptable due to the huge impact on design costs.

Ageing monitoring techniques fall into two categories. First, standalone sensors use various configurations of ring oscillators [2] and delay chains. Replica paths [3] are a solution to mimic the timing behavior of the original path in combinational logic. Second, in-situ delay monitors can directly measure the delay degradation of a specific path within the target circuit; this approach is very promising to provide reliable timing information [4]. Delay monitors such as “Razor I” [5] and “Razor II” [6] detect timing errors in actual paths. A local micro-rollback execution procedure ensures error correction. However, these methods need a large hardware architecture for error recovery. The Adaptive Voltage Scaling (AVS) approach in [7, 8] proposes error correction by using in-situ monitors able to detect timing errors, with a global system action following the error detection.

Another approach consists in detecting timing pre-errors instead of timing errors by detecting critical transitions [8]. In this case, the in-situ delay monitors can be used as a reliability technique to provide an alert prior to a setup violation. This technique is also further combined with global system actions such as AVS or DVFS.

In this paper, an innovative insertion flow of in-situ monitors (ISM) is presented. Two ISM solutions are discussed and compared. The first one is built with standard cells available in the technology design platform library, named here built-in flow ISM (flow-based ISM in the following). The second one uses a dedicated custom design, named cell-based ISM. In Section III, some benchmarks of the insertion strategy are reported and discussed. Finally, several applications of ISM usage for compensation are presented.

II. ISM INSERTION FLOW

The advantage of an ISM located inside a digital block is the capability to accurately capture all sources of local physical, environmental and temporal variations. The ISMs under investigation are presented in Fig. 1. The basic idea is to delay the data of a critical path arriving at D in the shadow FF, and to compare it with the regular FF. When the Flag signal rises, it means that a violation of the setup time has occurred in the shadow FF and the remaining slack of the data path is close to the timing of the delay element, as defined in the schematic. In this work, three time windows (TW1 = 60 ps, TW2 = 100 ps, TW3 = 130 ps) have been evaluated. The schematic can be implemented in two different ways: semi-custom (flow-based ISM) or full-custom design (cell-based ISM). In the first one, all schematic elements come from the standard cell design platform. Placement and connectivity are performed by scripting during the flow execution. The second one is a new cell dedicated to this usage. For that approach, all CAD views of the new cell (functional, physical, timing, etc.) need to be developed to be compliant with the standard digital flow.

Fig. 1. Schematic and layout of the in-situ monitors under investigation. Data arriving at Q is delayed in the shadow FF and compared to the regular one. The flow-based ISM is composed of standard cells available in the design platform. The cell-based ISM is a fully customized design.

 Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany

 Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This
 volume is published and copyrighted by its editors.
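
The detection principle described in Section II can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the authors' circuit: it assumes that the monitored path toggles in the observed cycle, that the Flag rises exactly when the remaining slack is smaller than the time window, and it uses hypothetical names (`MonitoredPath`, `classify`, `flag_raised`) and hypothetical clock, setup and delay values. Only the three time windows come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class PathState(Enum):
    SAFE = "safe"          # slack is at least as large as the time window
    WARNING = "warning"    # Flag raised, regular FF still captures correctly
    FAILURE = "failure"    # the regular FF itself violates setup


# Time windows quoted in the text, in picoseconds.
TIME_WINDOWS_PS = {"TW1": 60.0, "TW2": 100.0, "TW3": 130.0}


@dataclass(frozen=True)
class MonitoredPath:
    """Hypothetical description of one monitored critical path."""
    clock_period_ps: float
    setup_time_ps: float
    data_delay_ps: float

    @property
    def slack_ps(self) -> float:
        # Remaining setup slack seen by the regular FF.
        return self.clock_period_ps - self.setup_time_ps - self.data_delay_ps


def flag_raised(path: MonitoredPath, time_window_ps: float) -> bool:
    # Assumption: the shadow FF sees the data delayed by the time window,
    # so it misses setup as soon as the slack is smaller than that window.
    return path.slack_ps < time_window_ps


def classify(path: MonitoredPath, time_window_ps: float) -> PathState:
    if path.slack_ps < 0:
        return PathState.FAILURE
    if flag_raised(path, time_window_ps):
        return PathState.WARNING
    return PathState.SAFE


if __name__ == "__main__":
    # Hypothetical path: 1000 ps period, 30 ps setup, delay growing with ageing.
    for delay_ps in (900.0, 920.0, 980.0):
        path = MonitoredPath(1000.0, 30.0, delay_ps)
        states = {name: classify(path, tw).value
                  for name, tw in TIME_WINDOWS_PS.items()}
        print(delay_ps, path.slack_ps, states)
```

In this model a larger time window raises the Flag earlier (at a larger remaining slack), which is the trade-off between early warning and false alarms that the choice among TW1, TW2 and TW3 controls.
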
In addition to the choice of the monitor, the insertion flow is of crucial importance. To obtain quantitative results, an industrial framework compliant with the STMicroelectronics digital design flow is used. The methodology is portable to any other standard or in-house digital flow. The generic approach is illustrated in Fig. 2. The classical Front-end steps are executed with synthesis and floorplanning. At the end, a gate netlist is provided as input to the place and route tool. After placement and before clock tree synthesis (pre-CTS), a timing analysis (TA) is performed. For the setup functional corner, a decision is made to insert monitors (FF cell sweep for cell-based ISM) and to regenerate connectivity on a sub-set of critical paths. This results in a new gate netlist and new timing and power figures, and the flow is then re-executed normally: post-CTS (hold and setup optimization), route and optimization until the design is timing, power and reliability closed. A certain number of back-and-forth steps are required to fully satisfy the initial design specification, as shown in Fig. 2.

For illustration, some timing analyses are presented in Fig. 3. Based on an initial selection of the 5% worst slacks, ISMs are inserted in a sub-set of paths. At step 3 (Fig. 2), histograms of paths are reported for an implementation with and without ISM. In the following analysis, delayed paths are not reported.

Fig. 2. Insertion flow of in-situ built-in monitors. During the Front-End flow, a preliminary Timing Analysis is performed after the pre-CTS step. In-situ monitors are inserted in a sub-set of critical paths and a new gate netlist is generated. Then the Back-end flow is executed normally with the new gate netlist.

Particular attention is paid to ensure that, for the flow-based ISM, the inserted cells are physically as close as possible to the monitored FF. To achieve this objective, timing constraints are adapted to minimize the skew between the shadow and regular FFs. Moreover, the delayed data arriving at the shadow FF is not considered as a real path when the place and route tool optimizes to fulfill the timing constraints. It means that there is theoretically no timing penalty after ISM insertion except the one induced by the slight additional routing resources.

It is important to notice that the monitored delay is of the same order of magnitude as the degradation measured on test chips during ageing experiments. It is therefore mandatory to have the highest level of timing accuracy during monitor insertion. Although insertion would be possible at synthesis during Front-end, the physical synthesis (Back-end) is able to account for parasitic effects. Thus, timing analysis at this level (post-route) is suitable and relevant to discuss the efficiency of the insertion flow. The methodology is now benchmarked on different digital blocks to determine how well the insertion flow can cover digital path ageing and to establish the performance penalty.

[Figure 3: three histograms of number of paths versus slack (ns). Panels: post-CTS distribution with the 5% selection of worst-slack critical paths (CP); with ISM flow versus without ISM; non-monitored, monitored and delayed paths versus without ISM, annotated "40% monitored path of the initial selection".]

Fig. 3. Timing analysis results of BCH at different steps of the flow. A preliminary TA at post-CTS is calculated (step 1). Based on this ranking, the 5% worst slacks are selected, and ISMs are inserted. In the final TA (step 3), slacks of monitored and non-monitored paths are presented.

III. BENCHMARK RESULTS

The ISM insertion flow is now applied to different circuits, and performance versus ISM coverage efficiency is reviewed. The designs are synthesized and placed and routed in 28FDSOI technology with Low-Vt devices. Several circuits come from the ITC99 benchmark, whose characteristics are typical of synthesized circuits. The b19, b15 and b14 circuits have respectively 17, 6 and 10 Kgate after physical implementation. They are respectively composed of 2, 0.8 and 0.4 kFF. In addition, an industrial customer-related digital block is investigated as well. This block is a Bose, Ray-Chaudhuri and Hocquenghem (BCH) error correcting code IP consisting of encoder and decoder modules. The IP contains an output signal, Autotest, indicating whether error correction is performed correctly. More details about this circuit can be found in [1].

Different implementation trials are performed for the same target performance with a large available area. The place and route tool optimizes performance as first priority, and then area/power. Worst negative slacks at the post-route step are discussed for all the circuits.
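
The three quantities used in this benchmark (the worst-slack selection of Section II, the coverage of that selection by monitors, and the worst-slack penalty of an implementation relative to a reference) can be written down compactly. The sketch below is illustrative only: the function names, path names and slack values are hypothetical, and the selection simply ranks paths by slack, which is an assumption about how the "5% worst slack" set is built.

```python
import math
from typing import Dict, Iterable, List


def select_worst_paths(slacks_ns: Dict[str, float], fraction: float) -> List[str]:
    """Return the names of the given fraction of paths with the smallest slack."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    count = max(1, math.ceil(fraction * len(slacks_ns)))
    ranked = sorted(slacks_ns, key=lambda name: slacks_ns[name])
    return ranked[:count]


def coverage(targeted: Iterable[str], monitored: Iterable[str]) -> float:
    """Fraction of the targeted critical paths that ended up monitored."""
    targeted_set = set(targeted)
    if not targeted_set:
        raise ValueError("the targeted set must not be empty")
    return len(targeted_set & set(monitored)) / len(targeted_set)


def slack_penalty_ns(reference_slacks_ns: Iterable[float],
                     implementation_slacks_ns: Iterable[float]) -> float:
    """Worst slack of the implementation minus worst slack of the reference.

    A negative value means the implementation is slower than the reference.
    """
    return min(implementation_slacks_ns) - min(reference_slacks_ns)


if __name__ == "__main__":
    # Hypothetical post-CTS slacks, in ns, for four paths.
    slacks = {"p0": 0.40, "p1": 0.02, "p2": 0.25, "p3": -0.01}
    targeted = select_worst_paths(slacks, fraction=0.5)
    print("targeted:", targeted)
    print("coverage:", coverage(targeted, monitored=["p1"]))
    print("penalty (ns):", slack_penalty_ns([0.05, 0.10], [0.035, 0.12]))
```

With this sign convention a penalty of -0.015 ns corresponds to a 15 ps loss of worst slack.
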


[Figure 4: slack penalty (ns), axis from 0 to -0.1, for the circuits b15, b14, b19 and BCH.]

Fig. 4. Relative worst slack penalty for reference implementation, aged library implementation (for different mission profiles), 10% flow-based ISM and 10% cell-based ISM.

As depicted in Fig. 4, the performance impact after ISM insertion may depend on the circuit. Compared to the reference circuit (fresh library without monitors), the implementation with an aged library (mission profile dependent: consumer, networking or automotive) leads to a minor penalty. However, this guard band enables the circuit to fulfill the timing requirement at the end of life. Concerning the ISM insertion, we chose a sub-set of the 10% worst slacks (at step 1 of Fig. 2) to equip with monitors. The BCH result shows a 90 ps slack degradation for cell-based ISM and less than 20 ps for flow-based ISM. The explanation for the cell-based penalty is the area constraint of the large custom cell. For the sake of clarity, the delayed data arriving at the shadow FF for cell-based ISM is not reported in the TA of Fig. 4 for the ITC99 benchmark.

[Figure 5: slack penalty (ns), axis from 0 to -0.1 (performance), and % coverage of the targeted set of CP, axis from 0 to 100, for the reference, aged lib, 5% ISM, 10% ISM and 30% ISM implementations.]

Fig. 5. Slack penalty for different implementations of the BCH circuit. Increasing the number of ISMs leads to a slight timing degradation. The level of coverage (number of critical paths monitored over the initial number of critical paths targeted for monitoring) remains close to 40%.

The number of ISMs inserted and its performance impact are discussed in Fig. 5. For a selection of the 5% most critical data paths, the performance penalty is only 15 ps, and 40% of the initial selection is covered by monitors. A classic approach would consist in selecting the CPs according to an absolute delay window and not according to a number-of-CP criterion. Thus, the distribution of the CP and sub-CP histogram is important to analyze when using this approach. The violation hazards of a path due to the induced ageing failures are a function of its remaining slack. However, for the ISM flow, we use the number of inserted ISMs as the metric under investigation.

IV. EXAMPLE OF APPLICATIONS

After discussing the strategy and the benchmark of the ISM insertion flow, some experimental results are now reviewed. Dedicated digital blocks are developed in which custom cell-based ISMs have been inserted on 10% of the critical paths. We have investigated designs in 28nm Low Power (LP) and Fully Depleted SOI (FDSOI) technologies developed at STMicroelectronics. The digital block studied is the BCH block mentioned in the previous section.

Fig. 6. Management of a multicore architecture using ISM. At a fixed 1 GHz clock, when decreasing the supply voltage, a warning Flag appears before the IP failure. The safety margins of the 18 cores are within a 100 mV supply voltage range.

In the first application, ISMs are inserted in the architecture to manage the variability under an optimum power budget. A major challenge in multicore architectures is to cope with inter-core dispersion. Indeed, local process dispersion leads to variation of speed and thus of power consumption across cores. To tackle this dispersion, an additional margin in the voltage stack needs to be used. It is not a trivial task to establish this margin because it is deeply influenced by the process centering and the manufacturing dispersion. An alternative approach is to insert ISMs and to use their Flags as warnings that serve as inputs for setting the margin. As depicted in Fig. 6, 18 BCH cores are implemented in LP technology with ISMs without any feedback loop. Under a constant 1 GHz clock frequency, when the supply voltage is decreased, a first monitor Flag occurs at 0.99V, corresponding to a 1% voltage decrease. At that point, the operating functionality is still correct. As the supply voltage continues to decrease, more and more Flags occur on different cores and a first failure (setup violation) is reported at 0.85V. Interestingly, the VMIN (minimal supply voltage required to maintain functionality at a given PLL clock) distribution of the 18 cores depends on the application executed on each core and on their ageing history. To optimize the choice of voltage stack in a multicore architecture, the strategy would be to monitor the first Flag of each core instead of using a conservative extra margin covering intra-core dispersion.

The second application focuses on monitoring the Flag number. The Flag number is an indicator of circuit speed and is used for local variations along with ageing-aware voltage adaptation. An extensive measurement campaign is performed on
300 dies using three time windows (TW1, TW2, TW3) and five workloads (Workload A, B, C, D, E). These workloads are patterns containing 0, 1, 2, 3 and 4 errors respectively. Fig. 7 shows the Flag count when decreasing the voltage until the Autotest signal fails, using TW1 for different workloads. When the pattern is modified, the activity is modified, which results in a strong modification of the CP ranking. A direct consequence is that VMIN_AUTOTEST (the supply voltage before the Autotest signal fails) and VMIN_Flag (the supply voltage at which the Flag count starts) vary strongly with the workload (VMIN_AUTOTEST ~ 0.8V and VMIN_Flag ~ 0.84V for Workload A, VMIN_AUTOTEST ~ 0.825V and VMIN_Flag ~ 0.825V for Workload D).

Fig. 7. Evolution of Flag number with VDD using time window 1 (TW1) for different workloads: A, B, C, D (0, 1, 2, 3, 4 errors injected).

In order to demonstrate the robustness of ISM, it is important to test them under various conditions. For that purpose, various temperature changes have been applied to the BCH IP. Fig. 8 shows the VMIN variation at 30°C and 125°C for both the Autotest and Flag signals. As depicted, a degradation of 250 mV of VMIN_AUTOTEST is observed when 125°C is applied, confirming the ability of ISM to capture local variations induced by temperature change.

Fig. 8. Evolution of Flag number with VDD using TW1 and workload B at 30°C (magenta) and 125°C (blue).

V. CONCLUSION

An in-situ monitor insertion flow is developed and applied to different circuits. Two types of monitors are compared and discussed: the cell-based and flow-based approaches. The performance penalty and area overhead of ISM are small. The additional margin provided by the Flag signal is more accurate than the additional voltage stack margin to account for ageing degradation. The path coverage statistic, i.e. the number of critical paths monitored over the number of critical paths targeted for monitoring, is around 40%. This approach is suitable for dynamic management of ageing because, in the long term, the probability of activating one path from the critical path selection is high. Some applications of adaptive regulation are illustrated; this scheme is promising for compensating process variation and temperature change.

REFERENCES

[1] V. Huard, “Adaptive wear out management with in-situ management,” IRPS, 2014.
[2] X. Wang, “Path-RO: a novel on-chip critical path delay measurement under process variation,” IEEE/ACM, 2008.
[3] S. Wang, “Representative Critical Reliability Paths for low-cost and accurate on-chip aging evaluation,” IEEE/ICCAD, 2012.
[4] M. Saliva, “Digital circuits reliability with in-situ monitors in 28nm fully depleted SOI,” IEEE/DATE, 2015.
[5] S. Das et al., “A Self-Tuning DVS Processor Using Delay-Error Detection and Correction,” IEEE J. Solid-State Circuits, Apr. 2006.
[6] D. Blaauw et al., “RazorII: In Situ Error Detection and Correction for PVT and SER Tolerance,” IEEE J. Solid-State Circuits, Jan. 2009.
[7] K. A. Bowman, “Energy-efficient and Metastability-Immune Resilient Circuits for Dynamic Variation Tolerance,” IEEE Journal of Solid-State Circuits.
[8] M. Wirnshofer, “A Variation-Aware Adaptive Voltage Scaling Technique Based on In-situ Delay Monitoring,” IEEE/DDECS, 2012.
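
The two voltages compared in Section IV, VMIN_Flag and VMIN_AUTOTEST, can be extracted from a downward supply sweep as sketched below. This is a hedged illustration, not the measurement procedure of the paper: the `SweepPoint` record, the function names and the sweep values are hypothetical, and the synthetic sweep was only chosen so that it lands near the approximate figures quoted for Workload A (about 0.84 V and 0.8 V). It is not measured data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SweepPoint:
    """One step of a hypothetical downward supply-voltage sweep."""
    vdd_v: float
    flag_count: int
    autotest_pass: bool


def vmin_flag(sweep: List[SweepPoint]) -> Optional[float]:
    """Highest supply voltage at which at least one Flag is counted."""
    flagged = [point.vdd_v for point in sweep if point.flag_count > 0]
    return max(flagged) if flagged else None


def vmin_autotest(sweep: List[SweepPoint]) -> Optional[float]:
    """Last supply voltage, sweeping downwards, before Autotest first fails."""
    last_pass = None
    for point in sorted(sweep, key=lambda p: p.vdd_v, reverse=True):
        if not point.autotest_pass:
            break
        last_pass = point.vdd_v
    return last_pass


def warning_margin_v(sweep: List[SweepPoint]) -> Optional[float]:
    """Voltage headroom between the first Flag and the functional limit."""
    first_flag = vmin_flag(sweep)
    functional_limit = vmin_autotest(sweep)
    if first_flag is None or functional_limit is None:
        return None
    return first_flag - functional_limit


if __name__ == "__main__":
    # Synthetic sweep (not measured data).
    sweep = [
        SweepPoint(0.90, 0, True),
        SweepPoint(0.88, 0, True),
        SweepPoint(0.86, 0, True),
        SweepPoint(0.84, 2, True),
        SweepPoint(0.82, 9, True),
        SweepPoint(0.80, 30, True),
        SweepPoint(0.78, 55, False),
    ]
    print("VMIN_Flag:", vmin_flag(sweep))
    print("VMIN_AUTOTEST:", vmin_autotest(sweep))
    print("warning margin (V):", warning_margin_v(sweep))
```

A positive margin means the Flag warns before functionality is lost; a margin close to zero, as reported for Workload D, means the warning arrives almost at the functional limit.
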



