13 Approximating Standard Cell Delay Distributions by Reformulating the Most Probable Failure Point Dimitrios Rodopoulos⇤,† , Philippe Roussel‡ , Francky Catthoor†,‡ , Yannakis Sazeides^ , and Dimitrios Soudris⇤ ⇤ MicroLab-ECE-NTUA, Greece, † ESAT–KU Leuven, Belgium, ‡ imec, Belgium, ^ ⌅ Group-UCY, Cyprus Contact Email: drodo@microlab.ntua.gr Abstract—The delay distribution of a digital circuit path is II. G ENERAL F ORMULATION & P RIOR A RT crucial for the early reliability evaluation of a digital design. As transistors are shrunk to unprecedented dimensions, accurate The manifestation of variability in a circuit can be encap- yet fast estimation of such distributions remains a valid goal. sulated in a vector x (e.g. threshold voltage shift per involved Such distributions may not be provided or are delivered in a transistor). A performance metric (e.g. delay of cell) y can heavily abstracted fashion to designers, which reduces the insight be evaluated at each x point as y(x). A failure occurs when into design dependability. In view of the above observations, we the performance metric is larger than a specified target (Y ), propose a technique that approximates the probability density function of a path of digital circuits by exending a well-known namely y > Y . For a specific Y , the MPFP methodology computational kernel, namely the Most Probable Failure Point aims to find the failure, i.e. Pfail (Y ) = P (y > Y ). In order to (MPFP) technique. The output of this concept is the failure prob- connect variability with the failure specification, it is important ability of standard cells or paths thereof for various target delays. to isolate all the values of x that satisfy the failure criterion. If We reformulate MPFP and establish a concise methodology for we use F to represent the set of x values that lead to a failurem delay distribution approximation. We present simulations for an inverter and outline projections for more complex gates. then the failure probability is P (x 2 F ). The challenge posed is both isolating F and calculating P (x 2 F ) in a systematic I. I NTRODUCTION way. We illustrate this situation with an inverter, which has been profiled using Synopsys NanoTime [11] according to The inherent time-zero and time-dependent variability of Figure 1a. Simulation results have been fitted with MATLAB semiconductor structures creates challenges for the effi- for simplicity, as shown in Figure 1b, where a level set is cient/accurate modeling of integrated circuit reliability [1]. annotated for a specific Y (roughly 17.5 ps). For all the Statistical Static Timing Analysis (SSTA) has been prevalent simulations presented herein, we have used a publicly available in handling delay distributions of standard cells. This is split high performance 16 nm modelcard [12]. mainly between depth-first (or path based) [3], [4] and breadth- According to MPFP, the functionality criterion (which in our first (or block based) [2], [5], [6] techniques. case is y > Y ) is combined with a product of probabilities to Regardless of SSTA techniques, what is really interesting is approximate space F [8], [9], [10] and its probability mass, the derivation of the primitive delay distributions of standard according to Equation 1 has been used to approximate F . If we cells. Accurate derivation requires a large number of simu- cast the product of probabilities to the delay space (Figure 1c), lations or measurements, since rare delay events need to be it is clear that even if the functionality criterion is correct, the accounted for. In many cases, complete distributions are not probability mass of set F is not totally covered. To address even available and only an approximated view of standard the above inaccuracy, we propose Pfail (Y ) calculation in two cell delay variability is provided, as in the case of the stage- separate steps, described in Subsections III-A and III-B. There based on-chip variation (OCV) [7]. In view of the above, it exactly lies the novelty of the current paper, in replacing the is important to provide an accurate and efficient technique for traditional MPFP formulation with the 2 distribution. the approximation of a standard cell’s delay distribution. The current paper delivers the distribution of an inverter (N 1 ) Y delay, based on the distributions of the involved threshold Pfail = max P (| Vth,i | xi ) such that y(x) > Y voltages (Vth ). The numerical kernel used is an extension of i=0 the Most Probable Failure Point (MPFP), which has been used (1) for memory cells [8], [9], [10]. We reformulate it using the 2 III. A DAPTING THE MPFP – I NVERTER F OCUS distribution and use it iteratively to get the probability mass for various inverter target delays. Project and extensions are A. Minimum Identification provided for more complex standard cells and paths. We isolate the point xA which leads to minimum delay Section II presents general formulation and aspects of prior ymin . To achieve this, we solve the optimization problem art. Simulation results are analytically presented for the case min {y(x)} for x and also get the distance of xA , which of a simple inverter in Section III. In Sections IV and V, we notate as rA . For the case of the inverter, it is reasonable we outline the extensions of our the scheme to more complex to expect a unique solution to this optimization problem. gates and paths. Conclusions are summarized in Section VI. This statement has been verified with a series of Synopsys Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. 14 formulation is guaranteed to contain all the failure points, log10 of Inverter Delay (s) 0.5 −9.5 (V) −10 provided that point xY is selected based on a greatest ascent, moving away from xA . This ensures that the “pass” region th,p 0 −10.5 only contains “pass” points, even though some are excluded ∆V −11 −0.5 −11.5 (pessimism). It is important to note that the above statement −0.5 −0.4 −0.3 −0.2 −0.1 0 ∆ Vth,n (V) 0.1 0.2 0.3 0.4 0.5 is correct regardless of the shape of y(x), as long as no other (a) Inverter delay as measured by NanoTime minima exist beyond the level set. The goal is to calculate the probability mass of random variable z2 , as defined in Equation 2, where N is the number log10 of Inverter Delay (s) 0.5 −9.5 −10 of involved transistors (2 in the case of the inverter) and i (V) is the standard deviation of the Vth per involved transistor. th,p 0 −10.5 ∆V −11 In the current paper, we assume that all transistors exhibit −0.5 −11.5 the same i . The non-centrality parameter of the utilized −0.5 −0.4 −0.3 −0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5 ∆V th,n (V) distribution is calculated according to Equation 2 [13]. Apart (b) Fitted NanoTime measurements and bounding F with level set of 20 ps from the involved i , this parameter considers the distance between point xA and the origin of the axis of the x space, which is encapsulated as the translated mean Vth shift per log10 of Inverter Delay (s) 0.5 −9.5 Local Minimum (xA) −10 transistor (i.e. µi ). For the rest of current paper, we assume (V) that these mean values are constant. In case transistor aging th,p 0 −10.5 ∆V Point in Level Set of ~ 17.5 ps −11 is assumed, actual mean Vth shifts become non-zero [14] and −0.5 −11.5 distance to xA can be easily recalculated. −0.5 −0.4 −0.3 −0.2 −0.1 0 0.1 0.2 0.3 0.4 0.5 ∆ Vth,n (V) N X1 N X1 N Y1 x2i µ2i (c) F approximation by P | Vth,i | xi as used in Equation 1 z2 = 2 and = 2 (2) i=0 i=0 i i=0 i At this point, we note that the approximation of the mutli- log10 of Inverter Delay (s) 0.5 −9.5 Local Minimum (xA) variate distribution for the x vector can be improved by −10 utilizing distribution transformations, as advised in prior art (V) th,p 0 −10.5 [15]. In the current paper and for the sake of simplicity, we ∆V Point in Level Set of ~ 17.5 ps (xY) −11 assume no correlations between involved xi ’s, hence we solely −0.5 −0.5 −0.4 −0.3 −0.2 −0.1 0 ∆ Vth,n (V) 0.1 0.2 0.3 0.4 0.5 −11.5 rely on the 2 distribution. With the above approach, we end up with the Pfail results of Figure 2a. As expected, a higher (d) F approximation using a hypersphere around the local minimum of inverter delay; additional probability mass is included and the 2 distribution applies for Vth leads to higher Pfail for the same target Y . Finally it is clear that, in case f (x) has a single maximum (instead Fig. 1. Navigating the x space for accurate approximation of the inverter delay distribution. The 2 distribution is useful in bounding the failure region (F ). of minimum), we can alternatively start from its maximum and directly bound set F , instead of its complement. Choice between minimum/maximum depends on the shape of f (x). NanoTime [11] simulations (Figure 1a) and remains valid when we use a fitted expression for y(x) as in Figure 1b. C. Getting Delay Distribution Points The failure probability for a target Y satisfies Equation 3, B. Moving away from the Minimum where PDFy and CDFy are the probability and cumulative The procedure followed in this step is outlined in Figure 1d. density functions for y (inverter delay in our case). First, we isolate a direction away from xA along which the increase of delay is maximum. Having selected a target delay Z Y Y , we move along the above direction, until we reach xY , Pfail = P (y > Y ) = 1 CDFy (Y ) = 1 PDFy dy (3) where y(x) = Y at a distance equal to rY from the point 1 xA . At this point, the generalized non-central Chi distribution It is clear that by repeating the process of Subsection III-B can be used ( 2 ). This provides the probability mass of a for different values of Y we can isolate the cumulative hypersphere in the threshold voltage shift space, which is probability for various delay specifications of the target circuit. centered at xA and has a radius equal to rY . By comparing Based on the Pfail vs. Y relation derived in Figure 2a, we easily Figures 1b and 1d, it is clear that the use of the 2 distribution produce the respective CDFy data, as illustrated in Figure 2b. is rather pessimistic, since x points that are faster than the A simple differentiation yields the corresponding PDFy . This target Y are included in space F . However, this is preferable effectively constitutes the delay distribution of the inverter. than the formulation of Equation 1, which is highly optimistic, It is solely based on a NanoTime-compatible description of since it ignores a huge portion of probability mass, as we the inverter and uses values of the standard deviation for Vth can verify from Figure 1c. On the contrary, the 2 -based shifts (multiple values inspected). This being a non-analytical Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. 15 0 10 0.1 Algorithm 1: Coordinate descent used in the current paper, 0.08 based on iteration limit and Vth step equal to s Pfail (p.u.) σ∆ V (V) −5 10 0.06 1 while (itNum 14, 000 NanoTime current technologies [14], our technique behaves with suffi- iterations). This is a crude estimation of the minimum value cient accuracy. The delay of the inverter is lower bounded and the value of the corresponding Vth shifts (i.e. estimation (Figure 1b). This means that delay distribution of the inverter of xA . Using Algorithm 1, we succeed in identifying the does not have a tail on the left. However, as increases, the minimum point in ⇠ 160 NanoTime iterations (Figure 3). right-hand tail extends uncontrollably. Given the multitude of transistors in the pull-up/-down branches of standard cells, multiple minima/maxima may exist IV. G ENERALIZING TO C OMPLEX G ATES for y(x) (i = 0, 1, ..., M 1). Given the symmetry of such cells, we may treat only one of these points (xAi ) with the Standard cells with more than one transistor in the pull-up/- technique of Subsection III-B. The cumulative probability down branches, require a systematic way of identifying the around xAi can be multiplied by M to provide the total delay minimum (of maximum). In the general case, one has probability mass of the “pass” event at delay Y . This is, no information about y(x), which is evaluated with NanoTime conceptually, the complement of the F set (“failure” region). for each iteration. We implement coordinate descent [16] By subtracting from one, we get Pfail , which is substituted according to Algorithm 1 for the case of a NAND gate. in Equation 3. Repeating this for different Y values (Subsec- Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. 16 tion III-C) yields the delay distribution of the standard cell. 0 10 0.1 Clearly, in case y(x) has a finite number of maxima (instead 0.08 PDFy4 (p.u.) σ∆ V (V) −5 of minima), a dual approach can be maintained. This leads to 10 0.06 th bounding set F , instead of the latter’s complement. The choice −10 0.04 10 between the two courses of action can be resolved with a high- 1 2 3 4 0.02 level view of y(x), e.g. by crudely sweeping the x space. 10 10 y (ps) 10 10 The techniques of minimum/maximum identification (i.e. (a) Subsection III-A and Algorithm 1) and probability mass 0 calculation (i.e. Subsection III-B) need to be generalized for 10 0.1 an arbitrary number of transistors (N ) in the standard cell. 0.08 CDFy4 (p.u.) σ∆ V (V) −5 10 The two step technique of Subsections III-A and III-B should 0.06 th 0.04 account for plateaus in y(x) and non-global minima/maxima. −10 10 0.02 All these enhancements constitute points for future work. 1 10 2 10 10 3 10 4 y (ps) V. G ENERALIZING TO S TANDARD C ELL PATHS (b) Given a (sufficiently) accurate PDF approximation for the 0 delay distribution of a set of standard cells, it is quite easy 10 0.1 to provide the delay distribution for a path of standard cells. 0.08 Pfail,y4 (p.u.) σ∆ V (V) −5 10 0.06 Given that we target the sum of the delays of the involved th 0.04 standard cells, the respective distribution is produced with −10 10 0.02 convolution of the delay distributions [17]. In Figure 4 we 1 10 2 10 10 3 10 4 present the results for a chain of four inverters, each one being y (ps) identical to the one used in Section III. The chain of operations (c) is exactly inverted: we convolute PDFy the appropriate amount Fig. 4. Iterative convolution of PDFy from Figure 2c provides the delay PDF, CDF and, eventually, the Pfail for a chain of four inverters. of times (four) and produce Figure 4a, namely the delay PDF for the path of inverters (PDFy4 ). A simple integration yields the CDFy4 (Figure 4b) and subtraction from one provides the [3] Jess, J.A.G. et al., “Statistical timing for parametric yield prediction of failure probability of the simple four-inverter path for various digital integrated circuits,” IEEE TCAD, vol. 25, no. 11, pp. 2376–2392, target delays (Y ), as illustrated in Figure 4c. We note that Nov 2006. [4] M. Orshansky and K. Keutzer, “A general probabilistic framework for the resulting failure probabilities span a wider Y range in worst case timing analysis,” in Design Automation Conference, 2002. comparison to the single-inverter equivalent. Also, there is a Proceedings. 39th, 2002, pp. 556–561. general transposition of the nominal delay in comparison to [5] J.-J. Liou, K.-T. Cheng, S. Kundu, and A. Krstic, “Fast statistical timing analysis by probabilistic event propagation,” in Design Automation Figure 2a, given the connection of inverters in series. Conference, 2001. Proceedings, 2001, pp. 661–666. [6] A. Agarwal, D. Blaauw, V. Zolotov, and S. Vrudhula, “Computation and VI. C ONCLUSIONS refinement of statistical bounds on circuit delay,” in Design Automation In the current paper we disclose an iterative technique Conference, 2003. Proceedings, June 2003, pp. 348–353. [7] A. Dunsmoor and J. ao Geada, “Applications and use of stage-basd ocv,” used to approximate the delay distribution of a standard cell. EE Times, May 2012. Complete reduction to practice has been achieved for the [8] Khalil, DiaaEldin et al., “Sram dynamic stability estimation using mpfp case of a simple inverter and extensions are discussed for and its applications,” Microelectron. J., vol. 40, no. 11, pp. 1523–1530, Nov. 2009. [Online]. Available: http://dx.doi.org/10.1016/j.mejo.2009. more complex gates and paths of standard cells. The proposed 01.015 technique starts from the Most Probable Failure Point (MPFP) [9] Ganapathy, S. et al., “Informer: An integrated framework for early-stage concept, which has been traditionally used in prior art for memory robustness analysis,” in DATE, March 2014, pp. 1–4. [10] Rodopoulos, D. et al., “Sensitivity of sram cell most probable snm failure reliability modeling of memory cells. In the current paper, we point to time-dependent variability,” in IEEE SELSE Workshop, Austin– extendMPFP to improve probability mass coverage, using the Texas, 2015. 2 distribution and coordinate descend. [11] Synopsys, Inc., “Nanotime – transistor-level static timing analy- sis solution for custom designs,” https://www.synopsys.com/Tools/ ACKNOWLEDGMENTS Implementation/SignOff/Documents/nanotime ds.pdf, Tech. Rep., 2010. [12] “Predictive technology model (ptm),” http://ptm.asu.edu/. Partial support by European Commission project FP7-612069- [13] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate HARPA. Prof. Julius Georgiou (UCY, CY) acknowledged for Distributions, Volume 2, May 1995. [14] Weckx, P. et al., “Implications of bti-induced time-dependent statistics EDA support. Dr. Pieter Weckx (KUL, BE) and Prof. Ramon on yield estimation of digital circuits,” IEEE TED, vol. 61, no. 3, pp. Canal (UPC, ES) acknowledged for inspiring discussions. 666–673, March 2014. [15] M. Miranda, P. Roussel, and L. Brusamarello, “Response characteriza- R EFERENCES tion of an electronic system under variability effects,” Aug. 24 2011, eP Patent App. EP20,100,189,434. [1] D. Rodopoulos, P. Weckx, M. Noltsis, F. Catthoor, and D. Soudris, [16] S. J. Wright, “Coordinate Descent Algorithms,” ArXiv e-prints, Feb. “Atomistic pseudo-transient bti simulation with inherent workload mem- 2015. ory,” IEEE TDMR, vol. 14, no. 2, pp. 704–714, June 2014. [17] C. M. Grinstead and J. L. Snell, Introduction to Probability, 2003. [2] Visweswariah, C. et al., “First-order incremental block-based statistical timing analysis,” IEEE TCAD, vol. 25, no. 10, pp. 2170–2180, Oct 2006. Workshop on Early Reliability Modeling for Aging and Variability in Silicon Systems – March 18th 2016 – Dresden, Germany Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.