Features of Hidden Fault Detection in Pipeline Digital Components of Safety-Related Systems

Alex Drozd, Miroslav Drozd, Viktor Antonyuk

Institute of Computer Systems, Odessa National Polytechnic University, ave Shevchenko 1, 65044 Odessa, Ukraine
Drozd@ukr.net, miroslav_dr@mail.ru, melmoth@te.net.ua

Abstract. The paper addresses the problem of hidden faults, which is characteristic of safety-related instrumentation and control systems that ensure the safety of high-risk objects. Such systems are designed to operate in two modes: normal and emergency. The problem is that hidden faults accumulate during normal mode and impair the functionality of the digital components, and of the system, in emergency mode. A model of the activated path is offered that determines the input data for simulating a pipelined digital component. Simulation is used to assess the observability of circuit points and to detect potentially hazardous points, which are carriers of the considered faults. A method is proposed for identifying potentially hazardous points in circuits with LUT-oriented architecture in FPGA projects of digital components.

Keywords. safety-related instrumentation and control system, pipeline digital component, hidden faults, controllability, observability

Key Terms. HighPerformanceComputing, ConcurrentComputation, Model, Method, Simulation

1 Introduction

High-risk objects in energy, transport, space and defense have become an essential part of the human environment. They include the power grid and power plants, aircraft and ground flight-support systems, and various kinds of weapons. Developing and operating these objects is impossible without wide use of information technologies, which counterbalance the quantitative and qualitative growth in the complexity, power and danger of critical applications [1].
High-risk objects are served by safety-related instrumentation and control systems (I&CS), which are a development of computer systems in which the operating mode is diversified by dividing it into normal and emergency modes. I&CS must meet high demands on a set of attributes regulated by international standards. Among the most important are the requirements for functional safety of I&CS, which is based on building fault-tolerant components [2].

Technologies for designing fault-tolerant digital devices are traditionally applied to the digital components (DC). They include correcting codes, majority structures, different types of element redundancy and system reconfiguration, and multi-version solutions that prevent common-cause faults [3]. However, fault tolerance of I&CS and its components cannot be ensured without solving the problem of hidden faults, which can accumulate in DC circuits during a long normal mode because of their low checkability [4].

In practice the problem of hidden faults is addressed by improving the checkability of the DC through periodic checking [5]. In testing, this is done by imitating the emergency mode with emergency protection shut down. In on-line testing, periodic checking uses manual regulation of the input data to approximate the conditions of the emergency mode while staying within the normal mode. This solution has often led to emergency consequences when the imitation of emergency mode was switched on without authorization, by a person or by a fault [6]. Manual regulation and shutdown of emergency protection preceded the Chernobyl catastrophe.

Thus, the problem of hidden faults is better known for the emergencies caused by actions aimed at solving it. Faults remaining hidden have not led to any accidents.
At the same time, the history of the fight against them shows mistrust of the fault tolerance of I&CS and its components.

A number of methods and technologies are applied to verify I&CS, its components, and the solutions used for their development and testing, including [7-9]:
• Expanded Functional Testing, to study the behavior of I&CS on the occurrence of rare events, for example, multiple failures;
• Event Tree Analysis (ETA) and Fault Tree Analysis (FTA), which consider the sequence of events and the development of faults in I&CS;
• Failure Modes, Effects and Criticality Analysis (FMECA) of components with respect to their criticality for the safety of I&CS (aimed at determining the need for special conditions of design and operation);
• Fault Insertion Testing (FIT), to evaluate the methods and means of testing and the consequences caused by a fault.

These and other measures stipulated by international standards for existing I&CS neither pose nor solve the problem of hidden faults directly. In practice, these measures do not fully solve the problem of functional safety of systems and of the facilities they control. Evidence of this is the numerous accidents of recent years in power networks and power plants, train wrecks and crashes, and failed spacecraft launches.

The concept of checkability was formed in testing as testability, for estimating the complexity of test generation and for further testable design of digital devices aimed at detecting faults in pauses of the operating mode. The assessment is carried out for points of the digital circuit by calculating controllability, observability, and checkability as their product [10]. In on-line testing, checkability coincides with observability, and controllability is their upper bound. The checkability of a DC becomes structural-functional, depending not only on the structure of the circuit but also on the characteristics of the input data [11].
On this basis, methods have been developed for improving checkability in the normal mode of I&CS by approaching the upper bound and by raising it [12].

In I&CS the checkability of a DC is diversified, becoming different for normal and emergency mode. A model of dual-mode structural-functional checkability is offered in [13]. On its basis, the controllability and observability of a circuit point are diversified, and a method is offered for detecting potentially hazardous points (PHP) in the DC circuit, that is, points in which a hidden fault can exist during normal mode and reduce the fault tolerance of the DC when the I&CS passes to emergency mode. PHP are identified by their observability, which is determined for normal and emergency mode by simulating the operation of a simultaneous DC over the ranges of input data of these modes. The method estimates, for a DC circuit, the probability that the fault tolerance of the DC is reduced upon transition to emergency mode, taking into account the percentage of possible hidden faults of a certain type [14].

This paper aims to develop models and methods for determining observability in order to identify PHP in pipeline DC used in I&CS. Section 2 discusses a model of the activated path that determines the DC input data required to assess the observability of a circuit point. Section 3 proposes a method of simulating a pipeline DC to assess the observability of circuit points on the input data processed in a given mode of I&CS and to detect PHP. Section 4 offers a method of detecting internal PHP in DC circuits with LUT-oriented architecture in FPGA projects.

2 The Model of the Activated Path of Pipeline DC Circuit

Typically, DC for I&CS are built as pipelines. The sections of the pipeline are simultaneous units that perform one or more arithmetic and logical operations on numbers represented in parallel codes.
In each clock cycle, input words arrive at the inputs of the pipeline DC. They are made up of the operands of the operations, including the numbers being processed and control bits.

For stuck-at faults, whether a circuit point belongs to the set of PHP is completely determined by the values of its observability in normal and emergency mode, using the formulas of [14] or as shown in Table 1.

Table 1. Conditions for a circuit point to belong to the set of PHP

Normal mode      Emergency mode
                 $O_E = 1$    $O_E = 2$    $O_E = 3$
$O_N = 2$        '1'          –            '1'
$O_N = 1$        –            '0'          '0'
$O_N = 0$        '1'          '0'          '1', '0'

The rows and columns of Table 1 contain the values of observability $O_N$ and $O_E$ for the normal and emergency modes of I&CS, respectively. Each intersection shows the types of stuck-at faults, '0' or '1', for which the circuit point belongs to the set of PHP. Observability $O_N$ or $O_E$ of a point takes the value 0, 1, 2 or 3 if the point is not observable, observable in the '1' value, observable in the '0' value, or observable in both values, respectively.

A point is observable if there is a path activated from this point to a check point of the circuit, and not observable otherwise. A path is activated if an erroneous value of the signal at the point propagates along this path on the input data of the considered I&CS mode.

Table 1 shows a necessary condition for a point to belong to the set of PHP. Within this condition, PHP are identified by the condition $O_N + O_E = 3$, from which the number $N_{PHP}$ of PHP in the DC circuit can be estimated. Both types of stuck-at faults can arise in a PHP when $(O_N = 0) \wedge (O_E = 3)$. The number $N_{DHF}$ of such points allows estimating the probability of a decrease in fault tolerance of the DC upon transition to emergency mode [14] by the formula

$$P_{FTR} = \frac{N_{PHP} + N_{DHF}}{2 N_{DCC}}, \qquad (1)$$

where $N_{DCC}$ is the total number of points of the DC circuit.

To identify the PHP of the circuit, their observability $O_N$ and $O_E$ must be assessed properly, taking into account the DC input data typical of the normal and emergency modes of I&CS.
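The conditions of Table 1 and formula (1) can be sketched in a few lines. The sketch assumes that observability is stored as a two-bit mask (1: observable in the '1' value, 2: observable in the '0' value), which matches the encoding 0 to 3 used above. Under that reading, each cell of Table 1 equals the bits that are set in the emergency-mode value and cleared in the normal-mode value. The function names and the four example points are hypothetical.

```python
from typing import Dict, Tuple


def hidden_fault_mask(o_n: int, o_e: int) -> int:
    """Stuck-at types of Table 1 as a mask: 1 for '1', 2 for '0', 3 for both, 0 for none."""
    # A fault type is hidden if it is observable in emergency mode but not in normal mode.
    return o_e & ~o_n & 0b11


def p_ftr(points: Dict[str, Tuple[int, int]]) -> float:
    """Formula (1) for a circuit given as {point name: (O_N, O_E)}."""
    masks = [hidden_fault_mask(o_n, o_e) for o_n, o_e in points.values()]
    n_php = sum(1 for m in masks if m != 0)  # points that are PHP for at least one type
    n_dhf = sum(1 for m in masks if m == 3)  # PHP with O_N = 0 and O_E = 3
    n_dcc = len(points)                      # total number of circuit points
    return (n_php + n_dhf) / (2 * n_dcc)


# Hypothetical four-point circuit, used only to exercise the functions.
example = {"p1": (0, 3), "p2": (2, 1), "p3": (3, 3), "p4": (1, 1)}
print(p_ftr(example))  # (2 + 1) / (2 * 4) = 0.375
```

Here p1 carries both fault types, p2 carries the '1' type only, and p3 and p4 are not PHP.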
We propose the following model of the activated path AP of the DC circuit: $AP(U_1, \ldots, U_I, \ldots, U_K)$, where $U_1, \ldots, U_K$ are descriptions of the data processing units on the pipeline sections that make up the path AP, and K is the number of units $U_I$, $I = 1, \ldots, K$.

Each unit $U_I$ is represented by a model $U_I(F_I, Z_{I1}, \ldots, Z_{IJ}, \ldots, Z_{IM_I})$, where $F_I$ is the operation performed by the unit $U_I$; $Z_{I1}, \ldots, Z_{IM_I}$ are descriptions of the inputs of the unit; and $M_I$ is the number of inputs.

Each input $Z_{IJ}$ is characterized by the distance $D_{IJ}$ between its own word and the current word, and by the values calculated on the input words of the DC circuit. The distance is measured in clock cycles:

$$D_{IJ} = | Y_I - Y_{IJ} |, \qquad (2)$$

where $Y_I$ is the number of the current word, which enters the unit $U_I$ through the input $Z_I$ that belongs to the path AP, and $Y_{IJ}$ is the number of the own word at the input $Z_{IJ}$. The input $Z_I$ is the one of the inputs $Z_{IJ}$ for which $D_{IJ} = 0$.

If $D_{IJ} = 0$ for all (I, J), the unit $U_I$ is simulated with every input $Z_{IJ}$ receiving the value calculated for the current word. The model of the path AP is then simplified: the evaluation of observability can be accumulated by simulating the path AP consecutively on the separate words of the I&CS mode.

If $D_{IJ} > 0$ for some (I, J), the unit $U_I$ is simulated with the inputs $Z_{IJ}$ taking the value calculated for the current word and for the word that entered the DC $D_{IJ}$ clock cycles earlier or later. In general this word can be any input word of the considered mode. Therefore, for each input word the unit $U_I$ should be simulated on all values that the input $Z_{IJ}$ takes over all words of the considered mode.

For inertial processes of change in the input data processed by the DC, the model of the path AP can be refined by taking into account the maximum possible step Δ by which the input words, or the operands composing them, change. Let Δ ≥ 1.
Then, for each input word with number G, the unit $U_I$ should be simulated on the values that the input $Z_{IJ}$ takes in the range of words with numbers $G \pm \Delta \cdot D_{IJ}$, where $G > \Delta \cdot D_{IJ}$. For $G \le \Delta \cdot D_{IJ}$ the initial values set for the input $Z_{IJ}$ are used. If Δ < 1, that is, the value of the input word changes no more than once per 1/Δ clock cycles, the range of numbers is rounded to $G \pm \lceil \Delta \cdot D_{IJ} \rceil$.

The model of the path AP is built by analyzing the structure of the pipeline, taking into account the number of sections that precede the data processing unit $U_I$ on its inputs $Z_{IJ}$. This number determines how many clock cycles are required to deliver data from the inputs of the DC circuit to the unit $U_I$. Let $H_{IJ}$ be the number of sections (clock cycles) preceding the input $Z_{IJ}$, and $H_{MAX} = \max(H_{IJ})$. Then the equality $H_{IJ} + Y_{IJ} = H_{MAX} + 1$ holds for the unit $U_I$ in clock cycle $H_{MAX}$. This determines the number of the own word

$$Y_{IJ} = H_{MAX} + 1 - H_{IJ}, \qquad (3)$$

including the current word $Y_I = H_{MAX} + 1 - H_I$ for the input $Z_I$, where $H_I$ is the number of sections preceding $Z_I$. Substituting (3) into (2) cancels the term $H_{MAX} + 1$ and gives the distance $D_{IJ} = | H_I - H_{IJ} |$ in the model of the path AP directly from the structure of the pipeline DC.

3 Simulation of the Circuit of a Pipeline DC

The observability of a circuit point is calculated in the course of simulating the pipeline DC on the input data determined by the model of the path AP. The DC is simulated according to the following method.

All input words of the considered mode are examined in turn and, for each word, all points of the circuit are examined in the direction of the pipeline. The value of the examined point is calculated for the given input word and is complemented by its inverse value. The values of all following points of the circuit are calculated for both values of the examined point until a check point is reached or a point where the results of the calculations coincide.
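A minimal sketch of this fault-injection simulation is given below for a single combinational section, where all distances $D_{IJ}$ are 0. The netlist format, the function names and the small example circuit are assumptions made for illustration. For simplicity the sketch recomputes the whole circuit for each injected value instead of stopping at the point where the results coincide, and it tests each point separately.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# A gate is (output point, function, input points); gates are listed in evaluation order.
Gate = Tuple[str, Callable[..., int], Tuple[str, ...]]


def evaluate(gates: Sequence[Gate], word: Dict[str, int],
             forced: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Compute all point values for one input word; 'forced' overrides a point's value."""
    forced = forced or {}
    values = {name: forced.get(name, bit) for name, bit in word.items()}
    for out, fn, args in gates:
        values[out] = forced.get(out, fn(*(values[a] for a in args)))
    return values


def observability(gates: Sequence[Gate], words: Sequence[Dict[str, int]],
                  check_point: str) -> Dict[str, int]:
    """Accumulate a mask 0..3 per point: 1 = observable in '1', 2 = in '0', 3 = both."""
    obs: Dict[str, int] = {}
    for word in words:
        good = evaluate(gates, word)
        for point, value in good.items():
            # Inject the inverse value at the examined point and compare the check point.
            faulty = evaluate(gates, word, {point: 1 - value})
            seen = faulty[check_point] != good[check_point]
            obs[point] = obs.get(point, 0) | ((1 if value else 2) if seen else 0)
    return obs


# Hypothetical circuit y = (a AND b) OR c with check point y.
gates = [("n1", lambda a, b: a & b, ("a", "b")),
         ("y", lambda n1, c: n1 | c, ("n1", "c"))]
normal = [{"a": a, "b": b, "c": 0} for a in (0, 1) for b in (0, 1)]     # c = 0 in normal mode
emergency = [{"a": a, "b": b, "c": 1} for a in (0, 1) for b in (0, 1)]  # c = 1 in emergency mode
o_n = observability(gates, normal, "y")
o_e = observability(gates, emergency, "y")
print(o_n, o_e)
```

In this toy circuit the point c is observable only in the '0' value in normal mode and only in the '1' value in emergency mode, which corresponds to the row for observability 2 in normal mode and the column for observability 1 in emergency mode of Table 1.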
If the results at the check point are inverted, all points belonging to the path are classed as 0-observable or 1-observable, depending on the values they take. If a point was identified on previous input words as observable with the opposite value, it is classed as observable in both values and is not considered on the following input words. The values of points with $D_{IJ} > 0$ are calculated on an extended set of input words, and the simulation takes all these values into account.

It should be noted that incomplete simulation of the DC circuit, when the input data typical of the considered mode are restricted, leads to an underestimation of the observability of points. Such underestimation for the normal mode leads to false detection of PHP, and for the emergency mode to missing them.

Controllability of circuit points is estimated directly from their values, so it is much simpler to obtain than observability, and controllability is the upper bound of observability. Taking this into account, the following method of PHP identification is offered:
• The sizes $|R_N|$ and $|R_E|$ of the ranges $R_N$ and $R_E$ of the input data of the normal and emergency modes are compared.
• If $|R_N| > |R_E|$, the DC is simulated on the input words of the emergency mode to determine the sets EO–0, EO–1, EO–2 and EO–3 of points with observability $O_E$ of 0, 1, 2 and 3. The points of the set EO–0 are excluded from consideration, as they cannot be PHP. For the other points, the DC is simulated on the input words of the normal mode to determine the sets NO–0, NO–1, NO–2 and NO–3 of points with observability $O_N$ of 0, 1, 2 and 3 and to identify PHP according to Table 1.
• If $|R_N| \le |R_E|$, the DC is simulated on the input words of the normal mode to determine the sets NO–0, NO–1, NO–2 and NO–3 of points with observability $O_N$ of 0, 1, 2 and 3. The points of the set NO–3 are excluded from consideration.
The DC is then simulated on the input words of the emergency mode to determine the sets EC–1 and EC–2 of points with controllability $C_E$ of 1 and 2. The points of the sets NO–1 ∩ EC–1 and NO–2 ∩ EC–2 are excluded from consideration. For the other points, the DC is simulated on the input words of the emergency mode to determine the sets EO–0, EO–1, EO–2 and EO–3 of points with observability $O_E$ of 0, 1, 2 and 3 and to identify PHP according to Table 1.

4 Identification of Internal PHP in Circuits with the LUT-Oriented Architecture

Modern I&CS are designed using pipeline DC built on FPGA with LUT-oriented architecture. A feature of such DC circuits is that logical functions are specified by a table in the memory of a LUT (Look-Up Table). The result of a function is read from the LUT memory through a multiplexer, at the address whose code is formed from the arguments of the function [15].

The bits of the LUT memory are considered as internal points of the DC circuit. A stuck-at fault of an internal point can be caused by a defect of the memory bit or a defect of the multiplexer.

An internal point of the circuit is controllable if the corresponding bit is selected on the input data of the considered mode, and uncontrollable otherwise. An internal point is observable if the LUT output point is observable when the corresponding memory bit is selected.

The set of all controllable internal points of the circuit can be calculated for each mode by simulating the DC on all input words of this mode, including the additional words for points with $D_{IJ} > 0$. All internal points addressed in the LUT memory are controllable.

Internal PHP of the circuit can be identified according to the following method. For each LUT, two sets NC and EC of internal points are determined, which are controllable in normal and emergency mode, respectively.
The set CEN = EC \ NC of internal points addressed in emergency mode and not used in normal mode is calculated. The internal points of the set CEN are checked for observability in emergency mode. Those that are observable are included in the set of PHP.

As an example, consider the task of identifying internal PHP in the circuit of a DC that computes the function F(X) = 1 for X mod 3 = 0 and F(X) = 0 for other values of X, where X = {x5, x4, x3, x2, x1}, X = 0 to 31. The circuit is evaluated for stuck-at faults for given ranges $R_N$ and $R_E$ of input data. The solution is considered for three DC operating in I&CS under different conditions:
a) $R_N$ = 0 to 23 and $R_E$ = 24 to 31;
b) $R_N$ = 0 to 15 and $R_E$ = 16 to 31;
c) $R_N$ = 8 to 23 and $R_E$ = 0 to 7, 24 to 31.

The function F and the ranges $R_N$ and $R_E$ are described in Table 2.

Table 2. Description of the function F(x5, x4, x3, x2, x1) and ranges $R_N$ and $R_E$

Variables       x5 = 0     x5 = 1     Case a        Case b        Case c
x4 x3 x2 x1     X    F     X    F     x5=0  x5=1    x5=0  x5=1    x5=0  x5=1
0  0  0  0      0    1     16   0     N     N       N     E       E     N
0  0  0  1      1    0     17   0     N     N       N     E       E     N
0  0  1  0      2    0     18   1     N     N       N     E       E     N
0  0  1  1      3    1     19   0     N     N       N     E       E     N
0  1  0  0      4    0     20   0     N     N       N     E       E     N
0  1  0  1      5    0     21   1     N     N       N     E       E     N
0  1  1  0      6    1     22   0     N     N       N     E       E     N
0  1  1  1      7    0     23   0     N     N       N     E       E     N
1  0  0  0      8    0     24   1     N     E       N     E       N     E
1  0  0  1      9    1     25   0     N     E       N     E       N     E
1  0  1  0      10   0     26   0     N     E       N     E       N     E
1  0  1  1      11   0     27   1     N     E       N     E       N     E
1  1  0  0      12   1     28   0     N     E       N     E       N     E
1  1  0  1      13   0     29   0     N     E       N     E       N     E
1  1  1  0      14   0     30   1     N     E       N     E       N     E
1  1  1  1      15   1     31   0     N     E       N     E       N     E

The numbers X of the input words and the function values F are shown in pairs of columns, separately for x5 = 0 and x5 = 1. For each case, N and E mark the words that belong to the ranges $R_N$ of normal mode and $R_E$ of emergency mode.

The circuit of the DC, designed for an Altera FPGA in Quartus II [16], is shown in Fig. 1.

Fig. 1. The circuit of the DC (inputs x5, x4, x3, x2, x1; LUT 1 and LUT 2 with outputs L1 and L2; LUT 3 with output F)

The circuit consists of three LUT: LUT 1, LUT 2 and LUT 3, which implement the functions L1(x3, x5, x1, x4), L2(x3, x5, x1, x4) and F(L2, x2, L1), described by the codes $6BBD_{16}$, $97E9_{16}$ and $0AA0_{16}$, respectively.
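The first two steps of the method, the sets NC and EC and their difference CEN, can be computed for LUT 1 and LUT 2 directly from the ranges of the three cases. The sketch assumes that the address of a memory bit is the decimal equivalent of (x3, x5, x1, x4) with x3 as the most significant bit, following the argument order of L1 and L2. The function names are hypothetical, and the observability check of the last step is left out because it depends on the LUT contents.

```python
from typing import Iterable, Set


def lut_address(x: int) -> int:
    """Memory bit selected in LUT 1 and LUT 2 by the input word X = x5 x4 x3 x2 x1."""
    x1, x3, x4, x5 = x & 1, (x >> 2) & 1, (x >> 3) & 1, (x >> 4) & 1
    return (x3 << 3) | (x5 << 2) | (x1 << 1) | x4  # argument order (x3, x5, x1, x4)


def controllable(words: Iterable[int]) -> Set[int]:
    """Internal points (memory bits) addressed by at least one word of the range."""
    return {lut_address(x) for x in words}


def cen(r_n: Iterable[int], r_e: Iterable[int]) -> Set[int]:
    """CEN = EC minus NC: points addressed in emergency mode and never in normal mode."""
    return controllable(r_e) - controllable(r_n)


cases = {
    "a": (range(0, 24), range(24, 32)),
    "b": (range(0, 16), range(16, 32)),
    "c": (range(8, 24), list(range(0, 8)) + list(range(24, 32))),
}
for name, (r_n, r_e) in cases.items():
    print(name, sorted(cen(r_n, r_e)))
# a [5, 7, 13, 15]
# b [4, 5, 6, 7, 12, 13, 14, 15]
# c [0, 2, 5, 7, 8, 10, 13, 15]
```

These are only candidate points; each of them still has to pass the observability check in emergency mode before it is counted as a PHP.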
The values of the variables x3, x5, x1, x4 that arrive at the inputs of LUT 1 and LUT 2 in emergency mode, together with the calculated values of the functions L1, L2, F and FL1, FL2, are shown in Table 3.

Table 3. Description of input variables and functions for LUT 1 and LUT 2

      Variables                 F for x2    FL1 for x2    FL2 for x2    X for x2
№     x3 x5 x1 x4    L1  L2     0    1      0    1        0    1        0    1
0     0  0  0  0     0   0      1    0      0    0        0    0        0    2
2     0  0  1  0     1   0      0    1      0    0        1    0        1    3
4     0  1  0  0     1   0      0    1      0    0        0    0        16   18
5     0  1  0  1     0   0      1    0      0    0        0    1        24   26
6     0  1  1  0     0   1      0    0      1    0        0    0        17   19
7     0  1  1  1     1   0      1    0      0    0        0    0        25   27
8     1  0  0  0     1   0      0    1      0    0        1    0        4    6
10    1  0  1  0     0   1      0    0      1    0        0    0        5    7
12    1  1  0  0     0   1      0    0      1    0        0    0        20   22
13    1  1  0  1     1   0      0    1      0    0        1    0        28   30
14    1  1  1  0     1   1      1    0      0    0        0    1        21   23
15    1  1  1  1     0   1      0    0      1    0        0    0        29   31

The functions FL1 and FL2 take the values of the function F for the inverse value of L1 and L2, respectively, which results from a distorted value of an internal point in the LUT memory. The first column contains the numbers of the sets of input variables, equal to their decimal equivalents. They are also the numbers of the internal points located in the LUT memory. The values of X are given in the last two columns for the two values of the variable, x2 = 0 and x2 = 1.

A value of FL1 or FL2 that is inverse to the value of F in the same row and for the same x2 defines the corresponding internal point of the LUT as observable. According to Table 3, the internal points that are observable in emergency mode for the considered ranges $R_N$ and $R_E$ compose the following sets:
a) 5, 7, 13, 15 for LUT 1 and 5, 7, 13 for LUT 2;
b) 4–7, 12–15 for LUT 1 and 4, 5, 7, 13, 14 for LUT 2;
c) 0, 2, 5, 7, 8, 10, 13, 15 for LUT 1 and 0, 2, 5, 7, 8, 13 for LUT 2.

The listed internal points are not used in normal mode, i.e. they are unobservable there. Therefore they belong to the set of PHP. In LUT 3 the same 6 internal points, 1 and 3–7, are used in all modes, which excludes them from the set of PHP, since CEN = ∅ and the output of LUT 3 is an observable point.
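As a check on the arithmetic, the sets above can be tallied per case and substituted into formula (1). The worked values below rest on two assumptions that are inferred rather than stated in the text: every internal PHP is counted for both stuck-at types, so the number of points with both fault types equals the number of PHP, and the total of 40 internal points is the sum of the 16 + 16 + 8 memory bits of the three LUT.

```latex
\begin{align*}
\text{a)}\quad N_{PHP} &= 4 + 3 = 7,  & P_{FTR} &= \frac{7 + 7}{2 \cdot 40} = 0.175 \\
\text{b)}\quad N_{PHP} &= 8 + 5 = 13, & P_{FTR} &= \frac{13 + 13}{2 \cdot 40} = 0.325 \\
\text{c)}\quad N_{PHP} &= 8 + 6 = 14, & P_{FTR} &= \frac{14 + 14}{2 \cdot 40} = 0.35
\end{align*}
```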
The number of PHP in the DC for the three considered cases is 7, 13 and 14, respectively, out of a total of $N_{DCC}$ = 40 internal points. The probability of a decrease in fault tolerance of the DC upon transition to emergency mode, calculated by formula (1), takes the values 17.5%, 32.5% and 35% for cases a, b and c, respectively.

5 Conclusions

The two operating modes characteristic of I&CS give rise to the problem of hidden faults, which can accumulate in normal mode and reduce the fault tolerance of the DC in the most critical, emergency mode. In single-mode systems this problem does not arise: a hidden fault never manifests itself, and a fault that has manifested itself is not hidden.

Success in identifying PHP, where the problem of hidden faults occurs, is determined by the ability to assess the observability of circuit points in each mode of I&CS. The correctness of this assessment depends on the completeness of the input data considered for each mode. An underestimated assessment leads to false detection of PHP or to missing them.

The offered model of the activated path specifies the set of input data for determining the observability of points in circuits of pipeline DC. The method of simulating pipeline DC and the method of PHP identification follow from this model.

The design of modern I&CS on FPGA poses the additional task of identifying internal PHP, which are typical of circuits with LUT-oriented architecture. The offered method of solving this task analyzes the sets of internal points used in normal and emergency mode.

It should be noted that these sets determine the effectiveness of manual regulation of the input data in normal mode, a procedure used in practice for solving the problem of hidden faults. The procedure becomes less effective when the sets differ and completely meaningless when the sets are disjoint.

References

1.
Bakhmach, E., Kharchenko, V., Siora, A., Sklyar, V., Tokarev, V.: Design and Qualification of I&C Systems on the Basis of FPGA Technologies. In: 7th International Topical Meeting on Nuclear Plant Instrumentation, Control, and Human-Machine Interface Technologies (NPIC&HMIT 2010), pp. 916–924. Las Vegas, Nevada (2010)
2. IEC 61508-1:2010. Functional Safety of Electrical / Electronic / Programmable Electronic Safety Related Systems – Part 1: General requirements. Geneva: International Electrotechnical Commission (2010)
3. Sklyar, V.V., Kharchenko, V.S.: Fault-Tolerant Computer-Aided Control Systems with Multiversion-Threshold Adaptation: Adaptation Methods, Reliability Estimation, and Choice of an Architecture. Automation and Remote Control 63(6), 991–1003 (2002)
4. Drozd, M., Drozd, A.: Safety-Related Instrumentation and Control Systems and a Problem of the Hidden Faults. In: 10th International Conference on Digital Technologies, pp. 137–140. Zhilina, Slovak Republic (2014)
5. Kharchenko, V.S., Sklyar, V.V. (eds): FPGA-based NPP I&C Systems: Development and Safety Assessment: RPC Radiy, National Aerospace University “KhAI”, SSTC on Nuclear and Radiation Safety, Kharkiv, Ukraine (2008)
6. Gillis, D.: The Apocalypses that Might Have Been, http://www.popmech.ru/go.php?url=http%3A%2F%2Fwww.damninteresting.com%2F%3Fp%3D913
7. Garcia, P.A., Schirru, R., Frutuoso P.F., Melo, E.: A Fuzzy Data Envelopment Analysis Approach for FMECA. Progress in Nuclear Energy 46, 359–373 (2005)
8. Andrashov, A., Kharchenko, V., Sklyar, V., Reva, L., Dovgopolyi, V., Golovir, V.: Verification of FPGA Electronic Designs for Nuclear Reactor Trip Systems: Test- and Invariant-Based Methods. In: IEEE East-West Design & Test Symposium (EWDTS’10), pp. 92–97. St. Petersburg, Russia (2010)
9. Kharchenko, V.S. (ed): Safety of Critical Infrastructures: Mathematical and Engineering Methods of Analysis and Ensuring: Ministry of Education and Science, National Aerospace University “KhAI”, Kharkiv, Ukraine (2013)
10. IEEE 1149-1:2001, IEEE Standard Test Access Port and Boundary-Scan Architecture. IEEE Computer Society (2001)
11. Drozd, A., Kharchenko, V., Antoshchuk, S., Sulima, J., Drozd, M.: Checkability of the Digital Components in Safety-Critical Systems: Problems and Solutions. In: IEEE East-West Design & Test Symposium, pp. 411–416. Sevastopol, Ukraine (2011)
12. Drozd, A., Kharchenko, V., Antoshchuk, S., Drozd, J., Lobachev, M., Sulima, J.: The Use of Natural Resources for Increasing a Checkability of the Digital Components in Safety-Critical Systems. In: IEEE East-West Design & Test Symposium, pp. 327–332. Kharkiv, Ukraine (2012)
13. Drozd, A., Kharchenko, V., Antoshchuk, S., Drozd, M.: Checkability of Safety-Critical I&C System Components in Normal and Emergency Modes. Journal of Information, Control and Management Systems 10(1), 33–40 (2012)
14. Drozd, A., Drozd, M.: A New Approach to Solving a Problem of the Hidden Faults in Safety-Related Systems. Journal of Information, Control and Management Systems 12(2), 125–132 (2014)
15. Cyclone FPGA Family Data Sheet. Altera Corporation (2003), http://www.altera.com
16. Netlist Optimizations and Physical Synthesis. Qii52007-2.0. Quartus II Handbook. Vol. 2. Altera Corporation (2004)