=Paper=
{{Paper
|id=None
|storemode=property
|title=Features of Hidden Fault Detection in Pipeline Digital Components of
Safety-Related Systems
|pdfUrl=https://ceur-ws.org/Vol-1356/paper_70.pdf
|volume=Vol-1356
|dblpUrl=https://dblp.org/rec/conf/icteri/DrozdDA15
}}
==Features of Hidden Fault Detection in Pipeline Digital Components of Safety-Related Systems==
Alex Drozd, Miroslav Drozd, Viktor Antonyuk
Institute of Computer Systems, Odessa National Polytechnic University,
ave Shevchenko 1, 65044 Odessa, Ukraine
Drozd@ukr.net, miroslav_dr@mail.ru, melmoth@te.net.ua
Abstract. This paper addresses the problem of hidden faults, which is characteristic
of safety-related instrumentation and control systems that ensure the safety of
high-risk objects. Such systems are designed to operate in two modes: normal and
emergency. The problem is that hidden faults accumulate during normal mode and
impair the functionality of the digital components, and hence of the system, in
emergency mode. A model of the activated path is proposed that determines the input
data for simulation of a pipelined digital component. Simulation is performed to
assess the observability of circuit points and to detect potentially hazardous
points, which are carriers of the considered faults. A method is proposed for
identifying potentially hazardous points in circuits with the LUT-oriented
architecture in FPGA projects of digital components.
Keywords. safety-related instrumentation and control system, pipeline digital
component, hidden faults, controllability, observability
Key Terms. HighPerformanceComputing, ConcurrentComputation, Model,
Method, Simulation
1 Introduction
High-risk objects in energy, transport, space and defense have become an essential
part of the human environment. They include power grids and power plants, aircraft
and ground flight-support systems, and various kinds of weapons. The development and
operation of these objects is impossible without the wide use of information
technologies, which counterbalance the quantitative and qualitative growth in
complexity, power and danger of critical applications [1].
High-risk objects are serviced by safety-related instrumentation and control systems
(I&CS). These are a development of computer systems in which the operating mode is
diversified, i.e. divided into normal and emergency modes. I&CS must satisfy
demanding requirements on a set of attributes regulated by international standards.
The most important of these are the requirements for functional safety of I&CS,
which is based on the construction of fault-tolerant components [2].
Digital components (DC) are traditionally designed with fault-tolerance techniques,
including error-correcting codes, majority structures, different types of element
redundancy and system reconfiguration, as well as multi-version solutions that
prevent common-cause faults [3].
However, fault tolerance of I&CS and its components cannot be ensured without
solving the problem of hidden faults, which can accumulate in DC circuits during a
long normal mode owing to the low checkability of these circuits [4].
In practice, the problem of hidden faults is addressed by improving the checkability
of DC through periodic checking [5]. In testing, this is done by imitating the
emergency mode with the emergency protection shut down. In on-line testing, periodic
checking relies on manual regulation of the input data, which brings them close to
the conditions of the emergency mode while remaining within the normal mode. This
solution has often led to emergency consequences when imitation of the emergency
mode was switched on without authorization, by a person or by a fault [6]. Manual
regulation and shutdown of emergency protection preceded the Chernobyl catastrophe.
Thus, the problem of hidden faults is better known for the emergencies caused by
actions aimed at solving it. Faults that remained hidden have not led to any
accidents. At the same time, the history of the fight against them shows mistrust of
the fault tolerance of I&CS and its components.
A number of methods and technologies are applied to verify I&CS, its components and
the solutions used for their development and testing, including [7-9]:
* Expanded Functional Testing, which studies the behavior of I&CS on the occurrence
of rare events, for example, multiple failures;
* Event Tree Analysis (ETA) and Fault Tree Analysis (FTA), which consider the
sequence of events and the development of faults in I&CS;
* Failure Modes, Effects and Criticality Analysis (FMECA), which ranks components by
their criticality for the safety of I&CS and is aimed at determining the need for
special conditions of design and operation;
* Fault Insertion Testing (FIT), which evaluates the methods and means of testing
and the consequences caused by a fault.
These and other measures stipulated by international standards for existing I&CS
neither pose nor solve the problem of hidden faults directly. In practice, these
measures do not fully solve the problem of functional safety of systems and
facility management. This is evidenced by numerous accidents in recent years in
power networks and power plants, train wrecks and crashes, and failed launches of
spacecraft.
The concept of checkability originated in testing as testability, used to estimate
the complexity of test generation and to support testable design of digital devices,
aimed at detecting faults in pauses of the operating mode. The assessment is carried
out for points of the digital circuit by calculating controllability, observability,
and checkability as their product [10]. In on-line testing, observability coincides
with checkability, and controllability is their upper bound. The checkability of DC
becomes structurally functional: it depends not only on the structure of the circuit
but also on the characteristics of the input data [11]. On this basis, methods have
been developed that improve checkability in the normal mode of I&CS by approaching
the upper bound and by raising it [12].
In I&CS the checkability of DC is diversified: it differs between normal and
emergency mode. A model of dual-mode structural-functional checkability is proposed
in [13]. On its basis, the controllability and observability of a circuit point are
diversified, and a method is proposed for detecting potentially hazardous points
(PHP) in the DC circuit. A PHP is a point in which a hidden fault can exist during
normal mode and reduce the fault tolerance of the DC upon transition of I&CS to
emergency mode. PHP are identified by their observability, which is determined for
normal and emergency mode by simulating the operation of a simultaneous DC over the
input data ranges of these modes. The method makes it possible to estimate, for the
DC circuit, the probability of reduced fault tolerance upon transition to emergency
mode, taking into account the percentage of possible hidden faults of a certain
type [14].
This paper develops models and methods for determining observability in order to
identify PHP in pipeline DC used in I&CS. Section 2 discusses a model of the
activated path, which determines the DC input data required to assess the
observability of a circuit point. Section 3 proposes a method of pipeline DC
simulation that assesses the observability of circuit points on the input data
processed in a mode of I&CS and detects PHP. Section 4 proposes a method of
detecting internal PHP of a DC circuit with the LUT-oriented architecture in FPGA
projects.
2 The Model of the Activated Path of Pipeline DC Circuit
Typically, DC for I&CS are built as pipelines. The sections of a pipeline are
simultaneous units that perform one or more arithmetic and logical operations on
numbers represented in parallel codes. In each clock cycle the inputs of a pipeline
DC receive an input word made up of the operands of the operations, including the
numbers to be processed and control bits.
For stuck-at faults, whether a circuit point belongs to the set of PHP is completely
determined by the values of its observability in normal and emergency mode, using
the formulas in [14] or as shown in Table 1.
Table 1. Conditions of circuit point belonging to a set of PHP

 Normal mode | Emergency mode
             | $O_E = 1$ | $O_E = 2$ | $O_E = 3$
 $O_N = 2$   | ‘1’       | –         | ‘1’
 $O_N = 1$   | –         | ‘0’       | ‘0’
 $O_N = 0$   | ‘1’       | ‘0’       | ‘1’, ‘0’
The rows and columns of Table 1 contain the values of observability $O_N$ and $O_E$
for the normal and emergency modes of I&CS, respectively. Each intersection shows
the types of stuck-at faults, ‘0’ or ‘1’, for which the circuit point is included in
the set of PHP. Observability $O_N$ or $O_E$ of a point takes the value 0, 1, 2 or 3
if the point is not observable, observable at value ‘1’, observable at value ‘0’, or
observable at both values ‘0’ and ‘1’, respectively.
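With this encoding, the entries of Table 1 can be read as a bitwise difference: a fault type is hidden when the corresponding value is observable in emergency mode but not in normal mode. The following is a minimal sketch of that reading. The function name `hidden_fault_types` and the bit-to-fault mapping `FAULT_BY_BIT` are illustrative names that do not come from the paper.

```python
# Observability codes as in the text: 0 = not observable,
# 1 = observable at value '1', 2 = observable at value '0', 3 = both.
FAULT_BY_BIT = {1: "1", 2: "0"}  # illustrative mapping: code bit -> fault type


def hidden_fault_types(o_n: int, o_e: int) -> set:
    """Stuck-at fault types that make the point a PHP (empty set: not a PHP)."""
    if not (0 <= o_n <= 3 and 0 <= o_e <= 3):
        raise ValueError("observability must be in 0..3")
    mask = o_e & ~o_n  # observable in emergency mode but not in normal mode
    return {fault for bit, fault in FAULT_BY_BIT.items() if mask & bit}


# The row O_N = 0 of Table 1:
for o_e in (1, 2, 3):
    print(o_e, sorted(hidden_fault_types(0, o_e)))
```

The sketch also returns an empty set for the combinations that Table 1 does not list, $O_N = 3$ and $O_E = 0$.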
A point is observable if there is an activated path from this point to a check point
of the circuit, and not observable otherwise. A path is activated if an erroneous
value of the signal at the point propagates along this path on the input data of the
considered I&CS mode.
Table 1 shows a necessary condition for a point to belong to the set of PHP. Within
this condition the PHP are identified by the condition $O_N + O_E = 3$, according to
which the quantity $N_{PHP}$ of PHP in the DC circuit can be estimated. Both types
of stuck-at faults can arise in a PHP when the condition
$(O_N = 0) \wedge (O_E = 3)$ holds. The quantity $N_{DHF}$ of such PHP makes it
possible to estimate the probability of a decrease in fault tolerance of the DC upon
transition to emergency mode [14] by the formula
$$P_{FTR} = \frac{N_{PHP} + N_{DHF}}{2 N_{DCC}}, \quad (1)$$
where $N_{DCC}$ is the total number of points of the DC circuit.
To identify the PHP of a circuit, their observability $O_N$ and $O_E$ must be
assessed correctly, taking into account the DC input data typical for the normal and
emergency modes of I&CS.
We propose the following model of the activated path $AP$ of a DC circuit:
$AP(U_1, \ldots, U_I, \ldots, U_K)$, where $U_1, \ldots, U_K$ are descriptions of
the data processing units on the pipeline sections that make up the path $AP$, and
$K$ is the number of units $U_I$, $I = 1, \ldots, K$. Each unit $U_I$ is represented
by a model $U_I(F_I, Z_{I1}, \ldots, Z_{IJ}, \ldots, Z_{IM_I})$, where $F_I$ is the
operation performed by the unit $U_I$; $Z_{I1}, \ldots, Z_{IM_I}$ are descriptions
of the inputs of the unit $U_I$; and $M_I$ is the number of inputs of the unit
$U_I$. Each input $Z_{IJ}$ is characterized by the distance $D_{IJ}$ between its own
word and the current word, and also by the values calculated on the input words of
the DC circuit. The distance is measured in clock cycles according to the formula
$$D_{IJ} = |Y_I - Y_{IJ}|, \quad (2)$$
where $Y_I$ is the number of the current word, which enters the unit $U_I$ through
the input $Z_I$ belonging to the path $AP$, and $Y_{IJ}$ is the number of the own
word at the input $Z_{IJ}$. The input $Z_I$ is one of the inputs $Z_{IJ}$ for which
$D_{IJ} = 0$.
If $D_{IJ} = 0$ for all $(I, J)$, the unit $U_I$ is simulated on inputs $Z_{IJ}$
each of which receives the value calculated for the current word. The model of the
activated path $AP$ is then simplified, so that the assessment of observability can
be accumulated by consecutive simulation of the path $AP$ on separate words of the
I&CS mode.
If $D_{IJ} > 0$ for some $(I, J)$, the unit $U_I$ is simulated on inputs $Z_{IJ}$
that take both the value calculated for the current word and the value calculated
for the word that arrives at the DC input $D_{IJ}$ clock cycles earlier or later. In
general, this word can be any input word of the considered mode. Therefore, the unit
$U_I$ should be simulated for each input word on all values that the input $Z_{IJ}$
takes over all words of the considered mode.
For inertial processes of change of the input data processed in the DC, the model of
the path $AP$ can be refined by taking into account the maximum possible step
$\Delta$ by which the input words, or the operands that make them up, can change.
Let $\Delta \ge 1$. Then the unit $U_I$ should be simulated for each input word with
number $G$ on the values of the input $Z_{IJ}$ that are taken in the range of words
with numbers $G \pm \Delta \cdot D_{IJ}$, where $G > \Delta \cdot D_{IJ}$. For
$G \le \Delta \cdot D_{IJ}$ the initial set values of the input $Z_{IJ}$ are used.
In the case $\Delta < 1$, when the value of the input word changes no more than once
per $1/\Delta$ clock cycles, the range of numbers is rounded to
$G \pm\, ]\Delta \cdot D_{IJ}[$.
The model of the path $AP$ is built by analyzing the structure of the pipeline,
taking into account the number of sections that precede the data processing unit
$U_I$ on its inputs $Z_{IJ}$. The number of these sections determines the number of
clock cycles required to deliver data from the inputs of the DC circuit to the unit
$U_I$. Let $H_{IJ}$ be the number of sections (clock cycles) preceding the input
$Z_{IJ}$, and $H_{MAX} = \max(H_{IJ})$. Then the equality
$H_{IJ} + Y_{IJ} = H_{MAX} + 1$ holds for the unit $U_I$ in clock cycle $H_{MAX}$.
This determines the number of the own word
$$Y_{IJ} = H_{MAX} + 1 - H_{IJ}, \quad (3)$$
including the current word $Y_I$ for the input $Z_I$.
Substituting (3) into (2) gives the distance $D_{IJ} = |H_I - H_{IJ}|$ in the model
of the path $AP$ directly from the structure of the pipeline DC.
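As a small worked illustration of formulas (2) and (3), the sketch below computes the own-word numbers and the distances for one unit from the number of pipeline sections in front of each input. The function name `input_distances`, the input labels `Z1`, `Z2`, `Z3` and the section counts are hypothetical, and the maximum is assumed to be taken over the inputs of the unit.

```python
def input_distances(sections_before, path_input):
    """Distances D_IJ for the inputs of one unit.

    sections_before: {input name: H_IJ}, sections preceding each input
    path_input: name of the input Z_I that lies on the activated path
    """
    h_max = max(sections_before.values())
    # Formula (3): number of the own word at each input.
    own_word = {z: h_max + 1 - h for z, h in sections_before.items()}
    y_current = own_word[path_input]
    # Formula (2): distance between the current word and the own word.
    return {z: abs(y_current - y) for z, y in own_word.items()}


# Hypothetical unit: Z1 and Z3 are fed through three sections, Z2 through one.
print(input_distances({"Z1": 3, "Z2": 1, "Z3": 3}, path_input="Z1"))
```

For these counts the distance of `Z2` equals $|3 - 1| = 2$, the same value that $|H_I - H_{IJ}|$ gives without computing the word numbers.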
3 Simulation of the Circuit of a Pipeline DC
The observability of a circuit point is calculated in the course of pipeline DC
simulation on the input data determined by the model of the path $AP$.
The DC is simulated according to the following method. All input words of the
considered mode are examined in turn, and for each word all points of the circuit
are examined along the pipeline. The value of the examined point is calculated for
the given input word and is complemented by its inverse value. The values of all
subsequent points of the circuit are calculated for both values of the examined
point until a check point is reached or until a point is reached where the results
of the two calculations coincide. If the results at a check point are inverse to
each other, all points belonging to the path are classed as 0-observable or
1-observable, depending on the values they take. If a point was identified on
previous input words as observable at the opposite value, it is classed as
observable at both values and is not considered on the following input words. The
values of points with $D_{IJ} > 0$ are calculated on an extended set of input words,
and the simulation takes all of these values into account.
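The simulation loop described above can be sketched for the simplest case, in which all distances are zero and the circuit is a single combinational section. This is an illustration under stated assumptions, not the authors' implementation: the circuit `y = (a AND b) OR c`, the two input ranges and all function names are hypothetical, and each point is examined on its own instead of marking every point on an activated path at once.

```python
from itertools import product


def evaluate(gates, word, forced=None):
    """Compute all point values for one input word; `forced` overrides points."""
    forced = forced or {}
    values = {name: forced.get(name, bit) for name, bit in word.items()}
    for name, func, args in gates:
        computed = func(*(values[a] for a in args))
        values[name] = forced.get(name, computed)
    return values


def observability(gates, words, check_point):
    """Return {point: code}: 0 none, 1 at value '1', 2 at value '0', 3 both."""
    words = list(words)
    points = list(words[0]) + [name for name, _, _ in gates]
    obs = dict.fromkeys(points, 0)
    for word in words:
        good = evaluate(gates, word)
        for p in points:
            if obs[p] == 3:  # already observable at both values
                continue
            bad = evaluate(gates, word, {p: 1 - good[p]})  # inverse value
            if bad[check_point] != good[check_point]:
                obs[p] |= 1 if good[p] == 1 else 2
    return obs


# Hypothetical circuit y = (a AND b) OR c with check point y.
GATES = [("g", lambda a, b: a & b, ("a", "b")),
         ("y", lambda g, c: g | c, ("g", "c"))]
NORMAL = [dict(a=a, b=b, c=1) for a, b in product((0, 1), repeat=2)]
EMERGENCY = [dict(a=a, b=b, c=0) for a, b in product((0, 1), repeat=2)]

o_n = observability(GATES, NORMAL, "y")
o_e = observability(GATES, EMERGENCY, "y")
print(o_n, o_e)
```

In this toy case the points `a`, `b` and `g` are masked by `c = 1` in the assumed normal range and become observable only in the assumed emergency range, which is the situation in which hidden faults can accumulate.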
It should be noted that incomplete simulation of the DC circuit, in which the input
data typical for the considered mode are restricted, leads to an underestimation of
the observability of points. Underestimation in normal mode leads to false detection
of PHP, and underestimation in emergency mode leads to PHP being missed.
The controllability of circuit points is estimated directly from their values, which
is much simpler than estimating observability, and controllability is the upper
bound of observability. Taking this into account, the following method of PHP
identification is proposed:
* The sizes $|R_N|$ and $|R_E|$ of the ranges $R_N$ and $R_E$ of the input data of
the normal and emergency modes are compared.
* If $|R_N| > |R_E|$, the DC is simulated on the input words of the emergency mode
to determine the sets $E_{O-0}$, $E_{O-1}$, $E_{O-2}$ and $E_{O-3}$ of points with
observability $O_E$ equal to 0, 1, 2 and 3. The points of the set $E_{O-0}$ are
excluded from consideration, as they cannot be PHP. For the other points, the DC is
simulated on the input words of the normal mode to determine the sets $N_{O-0}$,
$N_{O-1}$, $N_{O-2}$ and $N_{O-3}$ of points with observability $O_N$ equal to 0, 1,
2 and 3, and PHP are identified according to Table 1.
* If $|R_N| \le |R_E|$, the DC is simulated on the input words of the normal mode to
determine the sets $N_{O-0}$, $N_{O-1}$, $N_{O-2}$ and $N_{O-3}$ of points with
observability $O_N$ equal to 0, 1, 2 and 3. The points of the set $N_{O-3}$ are
excluded from consideration. The DC is then simulated on the input words of the
emergency mode to determine the sets $E_{C-1}$ and $E_{C-2}$ of points with
controllability $C_E$ equal to 1 and 2. The points of the sets
$N_{O-1} \cap E_{C-1}$ and $N_{O-2} \cap E_{C-2}$ are excluded from consideration.
For the other points, the DC is simulated on the input words of the emergency mode
to determine the sets $E_{O-0}$, $E_{O-1}$, $E_{O-2}$ and $E_{O-3}$ of points with
observability $O_E$ equal to 0, 1, 2 and 3, and PHP are identified according to
Table 1.
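The order of the steps above can be sketched as follows. The three simulators are passed in as callables, so the sketch only fixes which points are handed to which simulation; it does not say how observability or controllability is computed. All names are hypothetical. The controllability code is assumed to be 1 when a point takes only the value ‘1’, 2 when it takes only ‘0’ and 3 when it takes both, and Table 1 is applied as a bitwise difference of the observability codes.

```python
def identify_php(points, size_n, size_e, observ_n, observ_e, control_e):
    """Return {point: mask}; mask bit 1 = hidden stuck-at '1', bit 2 = '0'.

    observ_n, observ_e, control_e: callables that take a list of points and
    return {point: code} for the normal or emergency mode (assumed interface).
    """
    points = list(points)
    if size_n > size_e:
        o_e = observ_e(points)                      # cheaper mode first
        rest = [p for p in points if o_e[p] != 0]   # E_{O-0} cannot be PHP
        o_n = observ_n(rest)
    else:
        o_n = observ_n(points)
        rest = [p for p in points if o_n[p] != 3]   # N_{O-3} cannot be PHP
        c_e = control_e(rest)                       # upper bound of O_E
        rest = [p for p in rest if (o_n[p], c_e[p]) not in {(1, 1), (2, 2)}]
        o_e = observ_e(rest)
    masks = {p: o_e[p] & ~o_n[p] for p in rest}     # Table 1 as bit difference
    return {p: m for p, m in masks.items() if m}


# Hypothetical precomputed codes standing in for the three simulations.
O_N = {"p": 3, "q": 1, "r": 0}
O_E = {"p": 3, "q": 1, "r": 1}
C_E = {"p": 3, "q": 1, "r": 3}


def table(codes):
    return lambda pts: {p: codes[p] for p in pts}


print(identify_php(O_N, 4, 8, table(O_N), table(O_E), table(C_E)))
```

In the usage example the points `p` and `q` are discarded before the emergency-mode observability simulation, which is the saving the method aims at. Both branches are expected to give the same result, since the exclusions only remove points that Table 1 would reject anyway.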
4 Identification of Internal PHP in Circuits with the LUT-
Oriented Architecture
Modern I&CS are designed using pipeline DC built on FPGA with the LUT-oriented
architecture. A feature of the circuits of such DC is that logical functions are
specified by a table stored in the memory of a LUT (Look-Up Table). The result of a
function is read out of the LUT memory, by means of a multiplexer, at the address
whose code is formed from the arguments of the function [15].
The bits of the LUT memory are considered as internal points of the DC circuit. A
stuck-at fault of an internal point can be caused by a defect of the memory bit or a
defect of the multiplexer. An internal point of the circuit is controllable if the
corresponding bit is selected on the input data of the considered mode, and
uncontrollable otherwise. An internal point is observable if the LUT output point is
observable when the corresponding memory bit is selected.
The set of all controllable internal points of the circuit can be calculated for
each mode by simulating the DC on all input words of this mode, including additional
words for points with $D_{IJ} > 0$. All internal points addressed in the LUT memory
are controllable.
Internal PHP of the circuit can be identified according to the following method. For
each LUT, two sets $N_C$ and $E_C$ of internal points are determined, which are
controllable in normal and emergency mode, respectively. The set
$C_{EN} = E_C \setminus N_C$ of internal points addressed in emergency mode and not
used in normal mode is calculated. The internal points of the set $C_{EN}$ are
checked for observability in emergency mode. Those that are observable are included
in the set of PHP.
As an example, consider the task of identifying internal PHP in the circuit of a DC
that computes the function $F(X) = 1$ for $X \bmod 3 = 0$ and $F(X) = 0$ for other
values of $X$, where $X = \{x_5, x_4, x_3, x_2, x_1\}$, $X = 0 \ldots 31$. The
circuit is evaluated for stuck-at faults for given ranges $R_N$ and $R_E$ of input
data.
The solution is considered for three DC operating in I&CS under different
conditions:
a) $R_N = 0 \ldots 23$ and $R_E = 24 \ldots 31$;
b) $R_N = 0 \ldots 15$ and $R_E = 16 \ldots 31$;
c) $R_N = 8 \ldots 23$ and $R_E = 0 \ldots 7,\ 24 \ldots 31$.
The function $F$ and the ranges $R_N$ and $R_E$ are described in Table 2.
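Table 2. Description of the function F(x5, x4, x3, x2, x1) and ranges RN and RE

 X for x5 | Variables   | F for x5 | a, x5 | b, x5 | c, x5
  0    1  | x4 x3 x2 x1 |  0    1  | 0  1  | 0  1  | 0  1
  0   16  | 0  0  0  0  |  1    0  | N  N  | N  E  | E  N
  1   17  | 0  0  0  1  |  0    0  | N  N  | N  E  | E  N
  2   18  | 0  0  1  0  |  0    1  | N  N  | N  E  | E  N
  3   19  | 0  0  1  1  |  1    0  | N  N  | N  E  | E  N
  4   20  | 0  1  0  0  |  0    0  | N  N  | N  E  | E  N
  5   21  | 0  1  0  1  |  0    1  | N  N  | N  E  | E  N
  6   22  | 0  1  1  0  |  1    0  | N  N  | N  E  | E  N
  7   23  | 0  1  1  1  |  0    0  | N  N  | N  E  | E  N
  8   24  | 1  0  0  0  |  0    1  | N  E  | N  E  | N  E
  9   25  | 1  0  0  1  |  1    0  | N  E  | N  E  | N  E
 10   26  | 1  0  1  0  |  0    0  | N  E  | N  E  | N  E
 11   27  | 1  0  1  1  |  0    1  | N  E  | N  E  | N  E
 12   28  | 1  1  0  0  |  1    0  | N  E  | N  E  | N  E
 13   29  | 1  1  0  1  |  0    0  | N  E  | N  E  | N  E
 14   30  | 1  1  1  0  |  0    1  | N  E  | N  E  | N  E
 15   31  | 1  1  1  1  |  1    0  | N  E  | N  E  | N  E

The numbers of the input words and the function values are shown in pairs of
columns, separately for $x_5 = 0$ and $x_5 = 1$. For each of the cases a, b and c,
the letter N marks the words that belong to the range $R_N$ of normal mode and the
letter E marks the words that belong to the range $R_E$ of emergency mode.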
The circuit of the DC, designed for an Altera FPGA in Quartus II [16], is shown in Fig. 1.
Fig. 1. The circuit of DC (inputs x5, x4, x3, x2, x1; LUT 1 and LUT 2 produce L1 and L2; LUT 3 produces F)
The circuit consists of three LUT: LUT 1, LUT 2 and LUT 3, which implement the
functions $L_1(x_3, x_5, x_1, x_4)$, $L_2(x_3, x_5, x_1, x_4)$ and
$F(L_2, x_2, L_1)$, described by the hexadecimal codes $6BBD_{16}$, $97E9_{16}$ and
$0AA0_{16}$, respectively.
The values of the variables $x_3$, $x_5$, $x_1$, $x_4$ arriving at the inputs of
LUT 1 and LUT 2 in emergency mode, together with the calculated values of the
functions $L_1$, $L_2$, $F$, $F_{L1}$ and $F_{L2}$, are shown in Table 3.
Table 3. Description of input variables and functions for LUT 1 and LUT 2

  №  | Variables   | L1 L2 | F for x2 | FL1 for x2 | FL2 for x2 | X for x2
     | x3 x5 x1 x4 |       |  0    1  |  0    1    |  0    1    |  0    1
   0 | 0  0  0  0  | 0  0  |  1    0  |  0    0    |  0    0    |  0    2
   2 | 0  0  1  0  | 1  0  |  0    1  |  0    0    |  1    0    |  1    3
   4 | 0  1  0  0  | 1  0  |  0    1  |  0    0    |  0    0    | 16   18
   5 | 0  1  0  1  | 0  0  |  1    0  |  0    0    |  0    1    | 24   26
   6 | 0  1  1  0  | 0  1  |  0    0  |  1    0    |  0    0    | 17   19
   7 | 0  1  1  1  | 1  0  |  1    0  |  0    0    |  0    0    | 25   27
   8 | 1  0  0  0  | 1  0  |  0    1  |  0    0    |  1    0    |  4    6
  10 | 1  0  1  0  | 0  1  |  0    0  |  1    0    |  0    0    |  5    7
  12 | 1  1  0  0  | 0  1  |  0    0  |  1    0    |  0    0    | 20   22
  13 | 1  1  0  1  | 1  0  |  0    1  |  0    0    |  1    0    | 28   30
  14 | 1  1  1  0  | 1  1  |  1    0  |  0    0    |  0    1    | 21   23
  15 | 1  1  1  1  | 0  1  |  0    0  |  1    0    |  0    0    | 29   31
The functions $F_{L1}$ and $F_{L2}$ take the values of the function $F$ calculated
for the inverse value of $L_1$ and $L_2$, respectively, which results from a
distorted value of an internal point in the LUT memory. The first column contains
the numbers of the sets of input variables, equal to their decimal equivalents. They
are also the numbers of the internal points located in the LUT memory. The values of
$X$ are given in the last two columns for $x_2 = 0$ and $x_2 = 1$. A value of
$F_{L1}$ or $F_{L2}$ that is inverse to the corresponding value of $F$ identifies
the corresponding internal point of the LUT as observable.
According to Table 3, the internal points that are observable in emergency mode for
the considered ranges $R_N$ and $R_E$ make up the following sets:
a) 5, 7, 13, 15 for LUT 1 and 5, 7, 13 for LUT 2;
b) 4–7, 12–15 for LUT 1 and 4, 5, 7, 13, 14 for LUT 2;
c) 0, 2, 5, 7, 8, 10, 13, 15 for LUT 1 and 0, 2, 5, 7, 8, 13 for LUT 2.
The listed internal points are not used in normal mode, i.e. they are unobservable
in it. Therefore they belong to the set of PHP. In LUT 3 the same 6 internal points
(1, 3–7) are used in all modes, which excludes them from the set of PHP, since
$C_{EN} = \varnothing$ and the output of LUT 3 is an observable point. The number of
PHP in the DC for the three considered cases is 7, 13 and 14, respectively, out of a
total of $N_{DCC} = 40$ internal points. The probability of a decrease in fault
tolerance of the DC upon transition to emergency mode, calculated by formula (1), is
17.5%, 32.5% and 35% for cases a, b and c, respectively.
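The example can be retraced with a short sketch of the method for internal points. Several things in it are assumptions and are not taken from the paper. The contents of LUT 1 and LUT 2 are modelled as an encoding of the residue of $X$ with $x_2 = 0$ modulo 3, and LUT 3 as $L_1 \wedge (L_2 \oplus x_2)$, which is one choice that computes $F$ and uses six points of LUT 3. The address of an internal point is taken as the decimal value of $x_3 x_5 x_1 x_4$, as in the first column of Table 3. Finally, formula (1) is evaluated with $N_{DHF} = N_{PHP}$, which is an inference from the reported percentages.

```python
def bit(x, i):
    """Value of variable x_i of the word X = {x5, x4, x3, x2, x1} (x1 is LSB)."""
    return (x >> (i - 1)) & 1


def lut_address(x):
    """Address x3 x5 x1 x4 of an internal point of LUT 1 and LUT 2."""
    return (bit(x, 3) << 3) | (bit(x, 5) << 2) | (bit(x, 1) << 1) | bit(x, 4)


def l1_l2(x):
    """ASSUMED contents of LUT 1 and LUT 2: encode (X with x2 = 0) mod 3."""
    r = (x & ~0b10) % 3
    return int(r in (0, 1)), int(r in (0, 2))


def lut3(l2, x2, l1):
    """ASSUMED contents of LUT 3, chosen so that F(X) = 1 iff X mod 3 = 0."""
    return l1 & (l2 ^ x2)


def f_out(x, flip_l1=0, flip_l2=0):
    """Output F; flip_l1 / flip_l2 model a distorted bit read from LUT 1 / 2."""
    l1, l2 = l1_l2(x)
    return lut3(l2 ^ flip_l2, bit(x, 2), l1 ^ flip_l1)


def internal_php(r_n, r_e):
    """PHP of LUT 1 and LUT 2: points of C_EN = E_C minus N_C observable in R_E."""
    c_en = {lut_address(x) for x in r_e} - {lut_address(x) for x in r_n}
    php1 = {lut_address(x) for x in r_e
            if lut_address(x) in c_en and f_out(x, flip_l1=1) != f_out(x)}
    php2 = {lut_address(x) for x in r_e
            if lut_address(x) in c_en and f_out(x, flip_l2=1) != f_out(x)}
    return sorted(php1), sorted(php2)


CASES = {
    "a": (range(0, 24), range(24, 32)),
    "b": (range(0, 16), range(16, 32)),
    "c": (range(8, 24), [*range(0, 8), *range(24, 32)]),
}
N_DCC = 40  # total number of internal points, as given in the text

results = {}
for name, (r_n, r_e) in CASES.items():
    php1, php2 = internal_php(r_n, r_e)
    n_php = len(php1) + len(php2)
    p_ftr = (n_php + n_php) / (2 * N_DCC)  # formula (1), N_DHF = N_PHP assumed
    results[name] = (php1, php2, n_php, p_ftr)
    print(name, php1, php2, n_php, f"{p_ftr:.1%}")
```

Under these assumptions the sketch yields the same sets of internal points, the same counts of PHP and the same probabilities as listed above for cases a, b and c.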
5 Conclusions
The two modes of operation characteristic of I&CS give rise to the problem of hidden
faults, which can accumulate in normal mode and reduce the fault tolerance of the DC
in the most critical, emergency mode. Single-mode systems do not have this problem:
a hidden fault never manifests itself, and a fault that has manifested itself is not
hidden.
Success in identifying PHP, where the problem of hidden faults arises, depends on
the ability to assess the observability of circuit points in each mode of I&CS. The
correctness of this assessment is ensured by the completeness of the input data
considered for each mode of I&CS. An underestimated assessment leads to false
detection of PHP or to PHP being missed. The proposed model of the activated path
specifies the set of input data for determining the observability of points in
circuits of pipeline DC. The method of simulation of pipeline DC and the method of
PHP identification follow from this model.
The design of modern I&CS on FPGA poses an additional task of identifying internal
PHP, which are typical for circuits with the LUT-oriented architecture. The proposed
method of solving this task analyzes the sets of internal points used in normal and
emergency mode.
It should be noted that these sets determine the effectiveness of manual regulation
of the input data in normal mode. This procedure is used in practice to address the
problem of hidden faults. The procedure loses effectiveness when the sets differ and
becomes completely meaningless when the sets are disjoint.
References
1. Bakhmach, E., Kharchenko, V., Siora, A., Sklyar, V., Tokarev, V.: Design and Qualification
of I&C Systems on the Basis of FPGA Technologies. In: 7th International Topical Meeting
on Nuclear Plant Instrumentation, Control, and Human-Machine Interface Technologies
(NPIC&HMIT 2010), pp. 916–924. Las Vegas, Nevada (2010)
2. IEC 61508-1:2010. Functional Safety of Electrical / Electronic / Programmable Electronic
Safety Related Systems – Part 1: General requirements. Geneva: International
Electrotechnical Commission (2010)
3. Sklyar, V.V., Kharchenko, V.S.: Fault-Tolerant Computer-Aided Control Systems with
Multiversion-Threshold Adaptation: Adaptation Methods, Reliability Estimation, and
Choice of an Architecture. Automation and Remote Control 63(6), 991–1003 (2002)
4. Drozd, M., Drozd, A.: Safety-Related Instrumentation and Control Systems and a Problem
of the Hidden Faults. In: 10th International Conference on Digital Technologies,
pp. 137–140. Zhilina, Slovak Republic (2014)
5. Kharchenko, V.S., Sklyar, V.V. (eds): FPGA-based NPP I&C Systems: Development and
Safety Assessment: RPC Radiy, National Aerospace University “KhAI”, SSTC on Nuclear
and Radiation Safety, Kharkiv, Ukraine (2008)
6. Gillis, D.: The Apocalypses that Might Have Been,
http://www.popmech.ru/go.php?url=http%3A%2F%2Fwww.damninteresting.com%2F%3Fp%3D913
7. Garcia, P.A., Schirru, R., Frutuoso P.F., Melo, E.: A Fuzzy Data Envelopment Analysis
Approach for FMECA. Progress in Nuclear Energy 46, 359–373 (2005)
8. Andrashov, A., Kharchenko, V., Sklyar, V., Reva, L., Dovgopolyi, V., Golovir, V.:
Verification of FPGA Electronic Designs for Nuclear Reactor Trip Systems: Test- and
Invariant-Based Methods. In: IEEE East-West Design & Test Symposium (EWDTS’10),
pp. 92–97. St. Petersburg, Russia (2010)
9. Kharchenko, V.S. (ed): Safety of Critical Infrastructures: Mathematical and Engineering
Methods of Analysis and Ensuring: Ministry of Education and Science, National Aerospace
University “KhAI”, Kharkiv, Ukraine (2013)
10. IEEE 1149-1:2001, IEEE Standard Test Access Port and Boundary-Scan Architecture. IEEE
Computer Society (2001)
11. Drozd, A., Kharchenko, V., Antoshchuk, S., Sulima, J., Drozd, M.: Checkability of the
Digital Components in Safety-Critical Systems: Problems and Solutions. In: IEEE East-West
Design & Test Symposium, pp. 411–416. Sevastopol, Ukraine (2011)
12. Drozd, A., Kharchenko, V., Antoshchuk, S., Drozd, J., Lobachev, M., Sulima, J.: The Use
of Natural Resources for Increasing a Checkability of the Digital Components in
Safety-Critical Systems. In: IEEE East-West Design & Test Symposium, pp. 327–332.
Kharkiv, Ukraine (2012)
13. Drozd, A., Kharchenko, V., Antoshchuk, S., Drozd, M.: Checkability of Safety-Critical
I&C System Components in Normal and Emergency Modes. Journal of Information, Control and
Management Systems 10(1), 33–40 (2012)
14. Drozd, A., Drozd, M.: A New Approach to Solving a Problem of the Hidden Faults in
Safety-Related Systems. Journal of Information, Control and Management Systems 12(2),
125–132 (2014)
15. Cyclone FPGA Family Data Sheet. Altera Corporation (2003), http://www.altera.com
16. Netlist Optimizations and Physical Synthesis. Qii52007-2.0. Quartus II Handbook. Vol. 2.
Altera Corporation (2004)