=Paper=
{{Paper
|id=Vol-2289/paper18
|storemode=property
|title=Diagnosing Hybrid Cyber-Physical Systems using State-Space Models and Satisfiability Modulo Theory
|pdfUrl=https://ceur-ws.org/Vol-2289/paper18.pdf
|volume=Vol-2289
|authors=Alexander Diedrich,Oliver Niggemann
|dblpUrl=https://dblp.org/rec/conf/safeprocess/DiedrichN18
}}
==Diagnosing Hybrid Cyber-Physical Systems using State-Space Models and Satisfiability Modulo Theory==
<pdf width="1500px">https://ceur-ws.org/Vol-2289/paper18.pdf</pdf>
<pre>
     Diagnosing Hybrid Cyber-Physical Systems using State-Space Models and
                         Satisfiability Modulo Theory
                                  Alexander Diedrich1 and Oliver Niggemann2
                                     1
                                       Fraunhofer IOSB-INA, Lemgo, Germany
                                      2
                                        Institute Industrial IT, Lemgo, Germany
                                 e-mail: alexander.diedrich@iosb-ina.fraunhofer.de
                                              oliver.niggemann@hs-owl.de

                         Abstract                                 alarms can quickly overwhelm an operator and thus keep
                                                                  him from finding the faulty component that caused the fault
     Diagnosing faults in large cyber-physical produc-            [2].
     tion systems is hard and often done manually.                Identifying and isolating faults in large industrial plants can
     In this paper we present an approach to lever-               take precious time which can quickly lead to a significant
     age methods from the fault detection and isolation           deterioration of the produced material or even physical de-
     community as well as model-based diagnosis to                struction of involved components. For operators of these
     diagnose faults. Given a model of the production             plants this can lead to high costs. These high costs occur
     system we capture its dynamic behaviour with a               even in smaller scale enterprises, due to low quality output,
     state-space model. Then we translate the state-              costs to locate and repair the broken component, and costs
     space model into satisfiability theory modulo lin-           to ramp up production again after the fault.
     ear arithmetic over reals. This translation converts         In the presented approach we attempt to perform fault de-
     numerical information in symbolic logic. These               tection and isolation (FDI) through model-based diagnosis
     symbols can be used to diagnose faults with Re-              over satisfiability modulo theory (SMT). For this, a model of
     iter’s diagnosis algorithm.                                  the physical process is created manually. Industrial cyber-
     We use a four-tank model as a demonstration use              physical production systems are dynamic because their pa-
     case. Under the assumption that the use-case is              rameters change over time and contain hybrid signals which
     fully-observable (i.e. all components except the             can be either binary or continuous. Here we use state-space
     water tanks can be observed) our methodology de-             models to capture the dynamic behaviour of hybrid and
     tects all injected faults.                                   cyber-physical production systems. We limit our use case
                                                                  to a multiple tanks model. The behaviour of the tanks is
1 Introduction                                                    modelled using differential equations, while the connection
                                                                  between components is modelled using predicate logic. A
For operators of large industrial cyber-physical production
                                                                  set of piecewise functions translates the state space model
systems (CPPS) it is often a hard task to precisely detect,
                                                                  into satisfiability theory modulo linear arithmetic over re-
identify, and isolate technical faults [1]. This is especially
                                                                  als (LRA). Through the translations it is possible to lever-
the case in large production plants in the process industry or
                                                                  age standard model-based diagnosis algorithms, once devel-
in pharmacological processes which often extend over sig-
                                                                  oped to diagnose binary circuits, to diagnose hybrid cyber-
nificant physical distances, consist of highly interdependent
                                                                  physical production systems. The outcome of this trans-
components, and involve many parallel paths in the form of
                                                                  lation is a set of tuples which states for each component
pipe and valve networks. The correct behaviour of a biolog-
                                                                  whether or not it is currently faulty < comp, status >.
ical reactor, for example, depends on the exact amount of
                                                                  This tuple is converted into predicate logic and combined
different ingredients, their pressure, their temperature, their
                                                                  with the connection model of the plant. We employ Reiter’s
viscosity, the ambient temperature and pressure, or other im-
                                                                  diagnosis lattice [3] to find the minimum cardinality diagno-
portant process values. Further, subsequent processes heav-
                                                                  sis and thus isolate the fault that caused the production plant
ily depend on the correct amount, quality, and time of dis-
                                                                  to fail.
charge from the biological reactor. Since these systems are
                                                                  In this work we demonstrate how our described approach
interdependent it is hard for human operators to physically
                                                                  can be used with a multiple tanks model, how we translate
locate the root-cause of a fault. For example, a valve might
                                                                  the numerical state-space model into predicate logic, and
break and block the flow of some liquid into the reactor. Due
                                                                  how we can perform diagnosis using Reiter’s well-known
to this blocked flow, the pressure and temperature within the
                                                                  algorithm [3]. We show that in case of a fully-observable
reactor might change and degrade the material by changing
                                                                  system our approach is able to find all faults as part of
its viscosity. The degraded material might then go on into
                                                                  the minimum cardinality diagnosis. By fully-observable we
a stamping and forming process. Consequently, due to the
                                                                  mean that we can observe the behaviour of all components
changed viscosity, the presses will register changed pres-
                                                                  except the water level within the tank. The water level is
sures within their control systems. For a human operator,
                                                                  inferred through calculation in the state-space model. We
all these components will at some point sound an alarm in-
                                                                  also give ideas how the approach can be extended to semi-
dicating that parameters are outside their normal operating
                                                                  observable and non-observable systems.
conditions. In modern industrial plants, the amount of these
This paper makes the following contributions:                   Assumption 2 (Measurement errors). Measurements from
 1. We show how to capture timing behaviour for model-          the flow sensors and over- and underflow switches are al-
    based diagnosis on the basis of hybrid and dynamic          ways perfect without measurement error. If necessary, it is
    cyber-physical systems.                                     stated when this assumption is relaxed.
 2. We demonstrate how diagnosis methods (i.e. Reiter’s            Justification for assumption 1 is that it suffices to simu-
    algorithm) from the model-based community can be            late faults within the valves and tanks. Modelling the pipes
    successfully combined with approaches from the fault        with physical properties would dramatically increase the
    detection and isolation (FDI) community.                    model size and thus reduce the clarity. Assumption 2 is
                                                                taken to simplify the used equations. When necessary this
 3. We show how satisfiability modulo theory can help to        assumption is relaxed.
    perform diagnosis in hybrid and dynamic systems by          This demonstration use case can be imagined as a prepro-
    keeping the amount of computations low.                     cessing stage in a larger industrial plant within the process
                                                                industry. A reliable external water supply is provided by
2 Demonstration Use Case                                        the facilities of the industrial park. The water flows from
                                                                the supply line into a buffer tank t0 . From there it can go
For this work we will use the four tank system depicted in
                                                                into one or both or the two intermediate tanks or bypass
Figure 1 as a running example. The system consists of four
                                                                both tanks to go directly into tank t3 . The two intermediate
                                                                tanks can be thought of as a mixing stage (not modelled)
                                                                where ingredients are added to the water until it reaches the
                                                                holding tank t3 . From the holding tank the water flows to
                                                                subsequent process steps.
                  p0
                                                                   The water level in tanks can be described by well-known
                            t0                                  differential equations. Laubwald [4] provided a comprehen-
                                                                sive overview about modelling multiple water-tank systems.
                                                                A single tank can be described with the differential equation
                       p1             p2                                                               dh
                                                                                  Qi (t) − Qo (t) = A                     (1)
                                                                                                        dt
                                                                which describes the time derivative of the height h given
         p3                 t1             t2                   the tank area A. Qi is the inflow to the tank and Qo is the
                                                                outflow. However, in the real world tanks are subjected to
                                                                gravity and properties of their materials. Therefore the out-
                       p4             p5                        flow of a tank is calculated by
                                                                                                 √
                                                                                      Qo = Cd a 2gh                       (2)

                                 t3                             where Cd is the discharge coefficient taking into account all
                                                                fluid characteristics, losses, and irregularities and g is the
                                                      p6        gravitational constant. a is the cross sectional area of the
                                                                orifice within the tank. All tanks have a perfectly circular
                                                                bottom and a cylindrical shape. Combining equations 1 and
Figure 1: The demonstration use case showing a four-tanks       2 one can create
model                                                                                          √           dh
                                                                                 Qi (t) = Cd a 2gh + A                     (3)
water tanks t, seven electric valves p with integrated flow                                                dt
sensors, an unlimited water source and an unlimited water       and reformulating this to bring dh
                                                                                                dt on the left-hand side
sink (not shown). Valve p0 controls water from the unlim-
                                                                                 dh    1              √
ited water source, for example the public water mains, into                          = (Qi − Cd a 2gh)                     (4)
tank t0 . From there three pipes with an equal diameter lead                     dt    A
to valves p1 , p2 , and p3 . Valve p1 leads into tank t1 and    To calculate a new tank height h at time t, given the previous
valve p2 leads into tank t2 . Valve p3 bypasses both tanks      height h0 one can use
and is directly connected to tank t3 . Over valves p4 and p5
the two tanks can be drained into tank t3 . Finally, valve p6                         h(t) = h0 + ∆h                       (5)
drains tank t3 into the unlimited water sink, for example a     Through substitution into equation 4 this results in
river or a processing facility.
Each tank has two binary sensors which indicate overflow                                1              √
                                                                    h(t) = h(t − 1) + (Qi (t) − Cd a 2gh(t − 1)) (6)
and underflow, respectively. There are no provisions to di-                             A
rectly measure the water level. Each valve has a switch         which describes that, given the discharge coefficient, the
which indicates whether or not the valve is enabled. In ad-     gravitation, and the diameter of the orifice it is possible
dition, each valve has an associated flow sensor. For the       to calculate a new height from a given input with only the
present system the following assumptions are made:              knowledge of the previous water level.
Assumption 1 (Pipes). Pipes are invisible to the system,        These differential equations can be used to create a simu-
have ideal physical properties and cannot break                 lation of the four-tank system depicted in 1. The following
  Parameter                   Element     Value     Unit           and Mealy automata.
  Area                                                             Daigle et al. [7] have adapted a discrete event approach to
  A                           t0          4.0       m2             diagnose continuous systems. They claim that each fault
                              t1          2.0       m2             that occurs in a continuous system has a unique fault sig-
                              t2          2.0       m2             nature. A fault signature denotes a qualitative effect that a
                              t3          6.0       m2             fault occurs in an observation. They also claim that there ex-
  Height                                                           ists a measurement ordering that describes which sequence
  H                           t0          20.0      m              measurements deviate until a fault occurs. To capture fault
                              t1          10.0      m              ordering they manually construct a temporal causal graph.
                              t2          10.0      m              Under the assumption that all fault signatures and measure-
                              t3          20.0      m              ment orderings are known, they employ a diagnoser that
  Discharge Coefficient                                            traces the states through the temporal causal graph based on
  Cd                          t0          1.0       None           measurements. The output of the diagnoser is a fault trace.
                              t1          1.0       None           A second diagnosis algorithm takes this fault trace and de-
                              t2          1.0       None           termines which components must be faulty to explain this
                              t3          1.0       None           trace. This second diagnosis step is similar to the diagnosis
  Orifice                                                          lattice introduced by Reiter.
  a                           p0          0.3       m2             Grastien et al. [8] have developed an approach to extend
                              p1          0.1       m2             Reiter’s diagnosis algorithms which was described for bi-
                              p2          0.1       m2             nary circuits to include discrete event systems and hybrid
                              p3          0.1       m2             systems. Their approach is similar to Daigle et al, Struss,
                              p4          0.1       m2             and Provan in that they transform the continuous parts of a
                              p5          0.1       m2             model into qualitative states. Following this, their preferred-
                              p6          0.3       m2             first algorithm goes through Reiter’s diagnosis lattice and
  Gravitational Constant      g           9.81      m/s
                                                        2          computes valid hypothesis with the goal of finding a mini-
                                                                   mum cardinality diagnosis. An improvement over previous
Table 1: Parameters of the four-tank system for the demon-         work is that they implement their hypotheses tests with a
stration use case                                                  SAT solver.
                                                                   Roychoudhury et al. [9] [10] have shown how to use hybrid
                                                                   bond graphs (HBG) to diagnose hybrid systems. HBGs ab-
parameters were used to run the system simulation: The pa-
                                                                   stractly model the system by describing causal, continuous
rameters in table 1 have been chosen to represent a typical
                                                                   relationships between components. In Daigle et al. [7] they
industrial use case. At the same time, noise parameters such
                                                                   have shown how to employ the developed HBGs to diag-
as the discharge coefficient are kept neutral to further the ar-
                                                                   nose a spacecraft power distribution system. Prakash et al.
gument. The parameters area and height for each tank have          [11] have used an extended framework with HBGs to make
been chosen such that they are big enough to hold and store
                                                                   improvements in diagnosing two-tank systems.
some water while the experiments are running. This way
                                                                   Grastien [12] used SMT for the diagnosis of hybrid systems.
occurring errors will also have a longer time to propagate.
                                                                   He discretizes values in a hybrid system into a set of dis-
Further, the orifices for each tank are quite small. This pro-
                                                                   tinct states. Each observation < τ, A >is understood as a
longs the time it takes to drain the tanks in case the water
                                                                   behaviour A at time τ , where A is a partial assignment of
supply stop through the occurrence of a fault.
                                                                   the variables in a state. Measurement errors are included by
                                                                   including constraints which state that the observed voltage
3 Related Work                                                     must be between two tolerance thresholds. Each variable is
Struss [5] published a paper on the fundamentals of MBD of         augmented with an indicator stating at which time-step the
dynamic systems. In this he described how hybrid systems           variable expression is valid.
can be modelled without resorting to a complete simulation         Fraenzle et al. [13] have augmented SMT with stochas-
of the system under investigation. He proposed to capture          tic in order to analyse stochastic hybrid systems. By using
the temporal and dynamic behaviour of a hybrid system in a         bounded-model checking together with probabilistic hybrid
set of modes which model the system. Each mode has dis-            automata, piecewise deterministic Markov processes, and
tinct state and temporal constraints in addition to so called      stochastic differential equations they are able to create an
Continuity, Integration, and Derivatives (CID) constraints         analysis system without the need to formulate intermediate
that affect all modes. For one mode, all variables have a do-      finite-state abstractions as the methods mentioned above do.
main which captures the permissible states (i.e. values) for       In another work Khorasgani and Biswas [14] describe a
this variable. Diagnosis is performed by checking whether          hybrid system model through hybrid minimal structurally
the set of constraints together with the observations from         overdetermined sets (HMSOs). These are sets of differential
sensors is consistent. He demonstrates his approach on a           equations and (in-) equations which model the behaviour of
car’s anti-braking system and claims to find all usually oc-       a hybrid system. Their FDI algorithm works as follows: The
curring faults.                                                    algorithm detects the current system mode and generates an
When dealing with hybrid systems there always exists the           appropriate model. From this it generates a minimal set of
problem of discretization. Provan used a composite au-             HMSOs for this mode. The residuals are computed for each
tomaton for this. Struss divides his system into modes by          HMSO and can then be combined with fault signature to
discretizing the underlying sensor values. Lin [6] already         perform diagnosis.
showed in 1994 that online and offline diagnosis for discrete      In contrast to Struss, Provan, and Lin we do not use au-
event systems (DES) can be realised by using simple Moore
tomatons and mode estimation to partition the system into          8. To perform model-based diagnosis according to the prin-
different states. Instead, we only sample the system at some       ciples proposed by Reiter [3] it is necessary to separate the
suitable interval and use the obtained information directly        diagnostics part from the state propagation. Therefore, we
to model the states in the state-space representation. Un-         will not calculate classical residuals as in equation 8 and can
like to space-craft in the case of Daigle in industrial systems    further remove the gain factor and residuals Kr(t).
fault signatures and measurement orderings are unknown,            For this work the state-space model needs to be described
which requires us to pursue a more uninformed approach.            more abstractly. First the top-level information flow is de-
Our approach can be more seen as an alternative to hybrid          scribed. This shows how the state is propagated within the
bond graphs used by Roychoudhury et al., while they are            system. Here, the model is general enough to be extended
at the same time an extension to the work of Grastien and          and adapted to many use cases. After that the state-space
Khorasgani and Biswas. In comparison to Grastien we do             model is described on the demonstration use case introduced
not singly use satisfiability modulo theory, but instead cap-      in section 2. In a third step the diagnosis part is described
ture system behaviour in a state-space representation. We          which fusions the calculation of binary residuals with an ex-
expect this to reduce the required computational effort. We        pression in predicate and SMT logic. The state is propagated
also make use of (in-) equations and differential equations        through
as were used by Khorasgani and Biswas, but augment these
                                                                                   x(t + 1) = f (x(t), u(t))
with the diagnostic reasoning of traditional model-based di-                                                                  (9)
agnosis. Compared to Fraenzle, we do not make use of                                   y(t) = g(x(t), u(t), τ )
stochastic SMT at this point to keep the system more ex-           where x(t + 1) is a vector of the state in the next time step,
plainable for users.                                               x(t) is the current state vector, u(t) is the observable input
                                                                   vector, y(t) is the observable output vector, and τ is a vector
4 Modelling a Hybrid System                                        of threshold values.
                                                                   According to the demonstration use case the water level can-
This section first shows the requirements for developing and
                                                                   not be measured directly. Therefore, each tank’s water level
evaluating a FDI method for hybrid, cyber-physical systems.
                                                                   needs to be calculated through its inflow and outflow. The
Then it shows the concept to realise these requirements. The
                                                                   inflow and outflow can be measured through the associated
developed approach makes use of MBD by modelling the
                                                                   valves in each in- and outflow pipe. Since each tank has
hybrid system with a state-space representation. This model
                                                                   sensors to indicate under- and overflow, these are used for
is augmented with an observer, which determines boolean
                                                                   the target (output). For the state, input, and output vectors
residuals. These residuals indicate whether or not a compo-
                                                                   we thus have:
nent is faulty. The information about which components in-                                                                   
dicate faults are merged with sensor observations and make                                                     overf low0
up the diagnostic part of the approach. Diagnosis is done                                                      ..        
                                                                             h0            f low0                   .        
through Reiter’s diagnosis lattice.                                                      f low1                            
                                                                           h1                               overf    low   
The formal form of a state-space representation is:                   x=           u=  .. .      y = 
                                                                                                     
                                                                                                                            3
                                                                                                                              
                                                                             h2                             underf low0 
                                                                             h3                                     ..       
   x(t + 1) = (A + ∆A)x(t) + (B + ∆B)u(t)+                                                 f low6                    .       
           Bd dk (t) + Bafa (t) + Bc fc (t)                 (7)                                              underf low3
       y(t) = (C + ∆C)x(t) + Ds fs (k) + Dω ω(k)
                                                                   The function f (x, u) models the current state and its current
x(t) is the systems’s state, u(t) the control input, y(t)          input and from this computes the next state. Therefore we
the observed output, fa (t) the unexpected actuator fault,         can write:
fc (t) a component fault, fs (t) a sensor fault dk (t)
a process disturbance, and ω(t) measurement noises.                           f (x(t), u(t)) = A∆(x, u, t) + Bu(t)           (10)
A, B, C, Bd , Ba , Bc , D, s, Dω are known parameter matri-        with the connection matrices being
ces and ∆A, ∆B, ∆C modelling parameter errors.                                                             
For observer-based methods and to calculate residuals these                                  1 0 0        0
                                                                                           0 1 0         0
equations can reformulated to:                                                        A=
                                                                                             0 0 1        0
            x̂(t + 1) = Ax̂(t) + Bu(t) + Kr(t)                                               0 0 0        1
                 r(t) = y(t) − ŷ(t)                        (8)    and
                 ŷ(t) = C x̂(t)                                                                                       
                                                                              1       −1    −1    −1    0       0    0
                                                                             0       1     0     0     −1      0    0
                                                                           B=
                                                                                                                     0
Here, x̂(t) and ŷ(t) are estimates of the state and output val-
                                                                              0       0     1     0     0       −1
ues. r(t) is the calculated residual signal, and K is a gain
                                                                              0       0     0     1     1       1    −1
factor.
According to assumption 2 we can safely neglect the factors        A shows that each current state only influences the exact
the ∆ and ω terms in equation 7. Further, in this approach         next state. For the demonstration use case this means each
we will only model component faults. Therefore, we can             differential equation which models a tank will only affect
remove fa (t), fs (t), and through assuming that there are no      the state of this single tank. Matrix B shows the connec-
disturbances within the process we can also remove dk (t).         tions between the system’s components, which in this case
This leaves only the system’s observable input, observable         are the pipes between the tanks. The first row describes how
output, state, and component fault in equation 7. Through          the system’s primary input is connected as an input (indi-
these simplifications equation 7 becomes closer to equation        cated by the number 1 in the first column) to tank 1. The
three values of −1 in the first row show there are three pipes      components. In the presented demonstration use case two
that are used as the output of tank 1.                              fault models for tanks and valves exist, respectively. Tanks
To model the water level in each tank it is possible to use dif-    fail, when the water level within the tank reaches either
ferential equations. Each differential equation has the form        the upper limit (overflow) or the lower limit (underflow).
introduced in section 2:                                            Pumps fail when the measured flow deviates more than a
                       1                √                           certain amount from the expected flow.
   h(t) = h(t − 1) + (Qi (t) − Cd a 2gh(t − 1)) (11)                Classical MBD uses observations(OBS), a system
                       A
                                                                    description(SD), and a component description(COMPS)
Using the parameters from the state-space system this is            for describing a system. After having described the
written as                                                          actual behaviour of the hybrid system with state-space
                         1              √                           equations, it is now important to translate this into diag-
            x(t + 1) = (u(t) − Cd a 2gx(t))                (12)
                        A                                           nostic information. OBS are given by the input vector
A vector ∆(x, u, t) can be created with the right-hand side         u(t). The component behaviour COMPS is described
of these equations:                                                 by the differential equations in the case of tanks and by
                                                   √              assuming no further properties for the valves, resulting
                x0 (t + 1) = A10 (u0 (t) − Cd,0 a0 2gx0 (t))        in input(valvei ) = output(valvei ). SD is given in two
               x (t + 1) = 1 (u (t) − C a √2gx (t))               parts. For normal operation this is the incidence matrix B, a
                1                  1         d,1 1
                                                    √     1     
∆(x, u, t) =                  A1
                                                                   predicate logic description of the inputs and outputs of the
               x2 (t + 1) = A12 (u2 (t) − Cd,2 a2 2gx2 (t))
                                                    √               system, and a fault model. For the given demonstration use
                x3 (t + 1) = A13 (u3 (t) − Cd,3 a3 2gx3 (t))        case it suffices to specify a weak-fault model (WFM). A
With this model it is possible to propagate the state of            WFM to model the fault modes of the tanks can be specified
the system as it evolves through time. Differential equa-           as
tions calculate the water level in the tank for the next state,                     σT,i : HT,i → ¬o(i) ∧ ¬l(i)              (16)
given the current water level and the inflow obtained by            For valves the statement is specified as
reading the valve flow sensors. However, given this infor-
mation a control system cannot yet determine the full be-            σP,i : HP,i → (f lowil ≤ f lowi ) ∧ (f lowiu ≥ f lowi ) (17)
haviour of the system. For this, the output vector y(t) =           In this case the health variables H do not describe a proba-
g(x(t), u(t), τ ) needs to be calculated.                           bility for the component being faulty, but are instead binary.
                                                                  The terms σt,i and σp,i can be written as a vector
                                    o(h0 , τ0o )
                                         ..                                                                                    T
                                          .                               C = [σT,0       ...   σT,3       σP,0     ...   σP,6 ]
                                  o(h , τ o ) 
                                   3 3           
            g(x(t), u(t), τ ) = C                       (13)      If C is semantically interpreted through an SMT solver C ′ =
                                   l(h0 , τ0l ) 
                                        ..                        I(C), we obtain the diagnosis vector
                                         .       
                                                                                     C′ = [c0
                                                                                                                        T
                                     l(h3 , τ3l )                                                  c1    ...        c10 ]
and                                                               with ci ∈ {⊤, ⊥}. This vector shows for each component
                    1 0 0 0 0 0 0 0                                 whether it is faulty or not, given the current observations
                  0 1 0 0 0 0 0 0                                 from the sensors.
                  0 0 1 0 0 0 0 0
                                                                  The numerical information for the statements σT,i and σP,i
                  0 0 0 1 0 0 0 0
             C=                                                  is obtained from the state space model. Within the state-
                                                
                  0 0 0 0 1 0 0 0                                 ments the state vector x(t) represents the water level hi and
                  0 0 0 0 0 1 0 0                                 the input vector u(t) represents the flow values f lowi . By
                                               
                    0 0 0 0 0 0 1 0                                 interpreting the statements it is possible to translate the sub-
                    0 0 0 0 0 0 0 1                                 symbolic, numerical data within the state-space model into
    τ is a vector of threshold values which indicate at what        symbolic information used for diagnosis through the vector
height the tank is overfull or underfull. For notation we use       C′
τio to denote the threshold for the upper limit of tank i and       Equation 9 shows the propagations of the state vector
τil to denote the lower limit of tank i. The diagonal matrix        through time. For each new time step the statements 16
C maps the results of the functions o(h, τ ) and l(h, τ ) into      and 17 have to be reformulated. To capture this time-related
the output vector y. The function o(h, τ ) indicates when the       behaviour we adopt the notation of Grastien [12] and state
water level within the tank has approached the upper limit.         that varname@t stands for the variable varname at time
This is calculated by                                               t, where t ∈ N0 . From this, we can state the value for each
                                {                                   variable at each observed time step. When the observations
                                  0 if hi ≤ τio                     are only carried out while the system is in normal operation
                 o(hi , τio ) =                           (14)      it is possible to create a logical representation of all obser-
                                  1 else
                                                                    vations so far:
Likewise, the lower limit of the water level can be calculated                          ∧             ∧
                               {                                                           σT,i @t ∪     σP,i @t
                                 0 if hi ≥ τil
                l(hi , τil ) =                            (15)                          t                t
                                 1 else
                                                                    which describes the logical conjunction of all σT,i and σP,i
To diagnose faults within the described state-space system          over all time steps. Adding the statements in each time
it is necessary to obtain health information about single           step to the knowledge base as described by Grastien will
increase the required space linearly and still take exponen-      that we can measure the water flow through each valve
tial time to check the consistency. Especially in large in-       at each point in time. This is also a realistic assumption
dustrial plants where observations run for months with in-        for many smaller scale application and most demonstration
dividual observations being performed at second intervals,        plants which are built with observability in mind. Older and
a linearly growing knowledge base is infeasible. For exam-        more complex industrial plants, however, contain more of-
ple, when observing 200 signals with a sampling rate of one       ten component whose parameters can not be observed.
second a knowledge base would grow by 17,280,000 data
points per day. Therefore, in this work we will focus only        Semi-observable system
on the observations in the current time step. This keeps the      In the case that the system is semi-observable, not all com-
knowledge base size constant and adds no additional com-          ponents’ behaviour can be observed with sensors. This is the
putational complexity. More observations can bring a higher       case in most real-world industrial plants where its either too
precision in locating a fault. In this case, the number of ob-    expensive to add sensors for every machine parameter or it
servations can be increased by some constant factor, taking       is infeasible due to physical constraints or historical reasons.
for example always the last three observations into account.      A diagnostic system which cannot observe every parameter
A hybrid system can be represented through a directed-            has to work with partial information. If necessary the miss-
acyclic graph (DAG) showing the connections and causal            ing values need to be estimated. In the case boolean circuits
relationships between components. Depending on the loca-          this can be done straightforward. For every component in a
tion of the component within the graph a fault in one com-        boolean circuit the behaviour model is known. Further, the
ponent may cause several other components to fail as well.        system description SD is known. If only parts of the com-
In the demonstration use case, for example, if valve p0 fails,    ponents can be observed a diagnostic reasoning system can
all the other valves and the tanks will also exhibit anoma-       infer the missing values.
lous behaviour. The goal in diagnosis is therefore to find        In hybrid, physical systems inferring values is more dif-
the smallest amount of components which would explain a           ficult. Some real-world components may behave non-
fault. This search for minimum cardinality diagnoses can          linearly, stochastically, or very unpredictable. Further, sig-
be done with Reiter’s diagnosis lattice. First, a power set       nal propagation may not be instantaneous as in boolean cir-
P = P(COMPS) is constructed. This contains all sets of            cuits, but a change in one parameter may only be notice-
sets of components. From this the diagnosis lattice can be        able some time later. This is the case, for example, in bio-
created. On the bottom is the empty set which denotes no          reactors. If the temperature in a reactor changes, the sub-
faulty components. In the row above are all sets that con-        stance may only exhibit a change in an observable property
tain exactly one component. In the row above that are sets        some time later.
that contain exactly two components and so forth. Each            In the demonstration use case the tanks are modelled
observation of the sensors within the system leads to a re-       through differential equations and the valves have the
computation of the set of possible faulty components C ′ . By     throughput that is maximally allowed by the outflow of a
computing the hitting sets of all these observations it is pos-   tank. Thus, even if not all sensors can be observed it is still
sible to close in on the faulty components. In the diagnosis      possible to infer missing values.
lattice this is done by searching the lattice bottom-up and       Non-observable system
refuting all branches which include a component that can be
                                                                  In non-observable systems only the primary inputs and out-
proven to be healthy through observations. Once the lattice
                                                                  puts are observable. A diagnostic system needs to measure
has been searched the solutions with the minimum number
                                                                  the primary input signals, the primary output signals and
of components are the minimum cardinality diagnoses ω ′ .
                                                                  combine these with the model SD and COMPS of the hy-
Once the diagnosis framework has been set up three possible
                                                                  brid system. To perform diagnosis, every intermediate value
usages can be identified: fully-observable, semi-observable,
                                                                  must be assumed by propagating the primary input values
and non-observable. In the first type of usage the diagnoser
                                                                  through the system. This approach is the most computing
can observe every property of the physical system, except
                                                                  intensive, since assumable values need to be computed in se-
the water level described by the state vector x(t). In the sec-
                                                                  quence. Diagnosis is performed by comparing the expected
ond type of usage, only a subset of sensors are accessible to
                                                                  primary output values with the observed primary output val-
the diagnoser. Thus, some values need to be approximated.
                                                                  ues and then going through the circuit back-to-front to find
In the third type of usage only the primary inputs and pri-
                                                                  diagnosis candidates.
mary outputs can be observed, while all other values need to
be inferred. The following three sub-sections describe these
types of usage in detail.                                         6   Experiments
                                                                  To show that the developed diagnostic methodology works
5 Observability                                                   as intended 16 experiments with the simulation of the
In this paper we will focus on diagnosis of fully-observable      demonstration use case were carried out. These are divided
systems. This simplifying use case makes it easier to de-         into two sets. In the first set the primary input to the demon-
scribe the methods, while the extension to semi-observable        stration use case was a constant stream of water. We expect
systems, and non-observable systems is reserved for future        in this case that during normal operation the water level in
work. However, this section gives some ideas on how to ex-        the tanks will reach a constant height and remain there until
tend the developed methodology to include semi- and non-          a fault occurs. In the second set of experiments the primary
observable systems.                                               input was changed to a sinusoidal water stream. For this we
                                                                  used the function
Fully-observable system                                                                {
In full-observable systems we assume that each component                                  O + α sin(2t) if t ≤ T π
                                                                              in(t) =                                         (18)
can be observed. For the demonstration use case this means                                0                 else
Equation 18 shows the form of the sinusoidal wave with pe-                     Constant                 Sinusoidal
riod T . We use an offset C to ensure a constant basic input          Index    # Faults    Detected     # Faults Detected
stream into tank 1. The gain factor A adjusts the period such         0        p0          x*           p0         x*
that we achieve variability within the tank water levels, but         1        p3          x*           p3         x*
without triggering and under- or overflow. Further, we use            2        p5          x*           p5         x*
a piecewise function to cut off the negative half-wave of the         3        p6          x*           p6         x*
sinusoidal input wave for convenience and ease of interpre-           4        p1 , p3     x*           p1 , p3    x*
tation.                                                               5        p4 , p5     x*           p4 , p5    x*
For both sets of experiments the normal operating condition,
five cases of single-faults, and two cases of double-faults       Table 2: Recognized faults for experiments with constant
were simulated. For each experiment 300 time steps were           input stream or sinosoidal input stream at time-step 100
carried out, with the respective faults being injected at time
step 100 and being removed at time step 200.                                   Constant                 Sinusoidal
We split the developed method into two parts. Part one is             Index    # Faults    Detected     # Faults Detected
the quantitative simulation of the demonstration use case             0        p0          x (11)       p0         x (11)
described in section 2. Part two is the diagnosis algorithm           1        p3          x (3)        p3         x (3)
consisting of the state-space model, the SMT logic, and the           2        p5          x (3)        p5         x (3)
diagnosis lattice. Both parts have been written in Python             3        p6          x (3)        p6         x (3)
3.4.5. The quantitative simulation provides the user with             4        p1 , p3     x (5)        p1 , p3    x (7)
functionalities to inject faults and generate normal process          5        p4 , p5     x (6)        p4 , p5    x (6)
data. The location and number faults can be specified as
well as the type of input (for example, if the water inflow       Table 3: Recognized faults for experiments with constant
is constant or sinusoidal). The output of the simulation is a     input stream or sinosoidal input stream at time-step 199
.csv file which contains all process data as well as the diag-
nostic information. This method was chosen to be close to
real industrial use cases.                                        if valve 5 stops working its flow would almost immediately
SD is modelled through predicate logic. In the fully-             go to 0. The sampling frequency is high enough to detect
observable case for the demonstration use case it suffices to     this decrease in the flow rate early enough that the water
explicitly model the connections between components, in-          level in the tanks is not yet significantly affected. However,
puts, and outputs. Therefore, only three functions are used       in large industrial plants sampling rates are often far lower.
for the predicate logic: The function component(c) with           A faulty component might then only be recognised once its
arity 1, and the function input(i, c) and output(o, c) with       effects have propagated into other observations from other
arity 2. These model the names of components in the sys-          components. Further, in the semi- and non-observable cases
tem and the number and names of their inputs and outputs.         not every status of every component is known. In this case,
In addition the relation connects(ci , cj ) specifies which in-   too, the set of possible faulty components will grow in size.
put is connected to which output. For the present use case        As the criterion in table 2 we evaluated the output of the di-
the constants source and sink are used to denote primary          agnosis algorithm in the time step 101 which was directly
inputs and primary outputs, respectively. With this logic it      after the fault had been injected. It is evident that due to
is possible to describe the connections between components        the SMT logic statement in equation 17 every unexpected
in the form:                                                      change in the throughput of a valve would be immediately
                  component(t0)∧                                  detected.
                                                                  However, table 3 shows the results when the output of the
                   ...                                            diagnosis algorithm was evaluated directly before removing
                  component(p6)∧                                  a fault at time-step 199. The number in brackets denotes the
                  input(t0.i0, t0)                                size of the minimum-cardinality set, while x still denotes
                                                                  whether or not the fault was within the result set. As was
                   ...
                                                                  the case in table 2, all faults were detected, though the re-
                  output(p6.o0, p6)∧                              sults set also contained components which were not faulty.
                  connects(source, p0.i0)                            Figure 2 shows the development of the water level in tank
                   ...                                            3 for the experiment with a constant water input and the
                  connects(t3.o0, p6.i0)                          fault being injected at valve p5 . In this case, the flow from
                                                                  tank 2 into tank 3 is blocked. In the figure the grey line
                  connects(p6.o0, sink)                           represents normal working behaviour and the black line ab-
                                                                  normal working behaviour. The shaded area indicates the
7 Results                                                         time during which p5 is simulated to be faulty. From time-
Table 2 shows the experiments for constant and sinusoidal         step 0 until 100 it is evident that the water level in tank 3 is
input streams, the injected fault and whether or not the fault    the same in both conditions. The tank was initialised with
was detected. An x in the column detected denotes that the        a water level of 7m, which first drains as not enough water
injected fault was among the result set of the diagnosis algo-    is flowing into the tank. Then, once tanks 0, 1, and 2 have
rithm. This means the algorithm is complete. An x∗ denotes        reached their normal operating conditions tank 3 stabilises.
that only the injected fault was detected, which corresponds      Once the fault is injected the water level in tank 3 becomes
to soundness of the algorithm. It must be noted here, how-        unsteady. This results from the implementation of the model
ever, that finding only the injected faults depends heavily on    which only models tank levels until they reach their upper
the granularity of the underlying data source. For example,       limit. Thus, with p5 being disabled tank 2 begins to over-
flow. This in turn redistributes the water pressures in all the   [2] Marta Fullen, Peter Schüller, and Oliver Niggemann.
other tanks. Once the fault is removed, however, all flows             Defining and validating similarity measures for indus-
reach their maximum level and the water level in tank 3 in-            trial alarm flood analysis. In Industrial Informatics
creases.                                                               (INDIN), 2017 IEEE 15th International Conference
                                                                       on, pages 781–786. IEEE, 2017.
                                                                  [3] Raymond Reiter. A theory of diagnosis from first prin-
                                                                       ciples. Artificial intelligence, 32(1):57–95, 1987.
                                                                  [4] Elke Laubwald. Coupled tanks systems 1. control-
                                                                       systems-principles. co. uk, 2015.
                                                                  [5] Peter Struss. Fundamentals of model-based diagno-
                                                                       sis of dynamic systems. In IJCAI (1), pages 480–485,
                                                                       1997.
                                                                  [6] Feng Lin. Diagnosability of discrete event systems
                                                                       and its applications. Discrete Event Dynamic Systems,
                                                                       4(2):197–212, 1994.
                                                                  [7] Matthew J Daigle, Indranil Roychoudhury, Gautam
                                                                       Biswas, Xenofon D Koutsoukos, Ann Patterson-Hine,
                                                                       and Scott Poll. A comprehensive diagnosis method-
                                                                       ology for complex hybrid systems: A case study on
                                                                       spacecraft power distribution systems. IEEE Transac-
Figure 2: Development of the water level in tank 3 over                tions on Systems, Man, and Cybernetics-Part A: Sys-
time in normal conditions (grey) and with a fault in valve 5           tems and Humans, 40(5):917–931, 2010.
(black)                                                           [8] Alban Grastien, Patrik Haslum, Sylvie Thiébaux, et al.
                                                                       Conflict-based diagnosis of discrete event systems:
                                                                       Theory and practice. In KR, 2012.
8 Conclusion                                                      [9] Indranil Roychoudhury, Matthew J Daigle, Gautam
                                                                       Biswas, and Xenofon Koutsoukos. Efficient simula-
This paper shows for the limited domain of a fully-
                                                                       tion of hybrid systems: A hybrid bond graph approach.
observable, hybrid, dynamic industrial production system
                                                                       Simulation, 87(6):467–498, 2011.
that we can model its behaviour with state-space equations
and then translate it into satisfiability modulo theory and       [10] Sriram Narasimhan and Gautam Biswas. Model-based
perform diagnosis. So far, operators in the process industry           diagnosis of hybrid systems. IEEE Transactions on
rely only on fault identification techniques such as support-          systems, man, and cybernetics-Part A: Systems and hu-
vector machines, artificial neural networks, Bayesian ap-              mans, 37(3):348–361, 2007.
proaches etc. In this work we present an approach which can       [11] Om Prakash and AK Samantaray. Model-based diag-
be used to also bring fault isolation into cyber-physical pro-         nosis and prognosis of hybrid dynamical systems with
duction systems. Given a suitable model of the production              dynamically updated parameters. In Bond Graphs for
system the presented method is able to capture behaviour               Modelling, Control and Fault Diagnosis of Engineer-
over time while preserving the ability to directly react to            ing Systems, pages 195–232. Springer, 2017.
faults. With a suitably chosen data sampling frequency the
                                                                  [12] Alban Grastien. Diagnosis of hybrid systems by con-
potentially huge knowledge-base proposed by Grastien [8]
can be avoided.                                                        sistency testing. In 24th International Workshop on
For future work we will show how to extend this approach to            Principles of Diagnosis (DX-13), pages 9–14. Cite-
deal with semi- and non-observable systems. For these we               seer, 2013.
need better models of single components by, for example,          [13] Martin Fränzle, Holger Hermanns, and Tino Teige.
creating assumables with the help of differential equations            Stochastic satisfiability modulo theory: A novel tech-
specified in SMT logic.                                                nique for the analysis of probabilistic hybrid systems.
Another direction for further research is the automatic gen-           In International Workshop on Hybrid Systems: Com-
eration and learning of system models. Nowadays, models                putation and Control, pages 172–186. Springer, 2008.
of systems need to be created manually which is not gener-        [14] Hamed Khorasgani and Gautam Biswas. Structural
alizable and time consuming. The state-space equations and             fault detection and isolation in hybrid systems. IEEE
their translations into logic are simple and can be specified          Transactions on Automation Science and Engineering,
algorithmically. Therefore, an attempt should be made to               2017.
at least semi-automatically learn parts of these models from
descriptions of meta-data.

References
[1] Rolf Isermann and Peter Balle. Trends in the appli-
    cation of model-based fault detection and diagnosis
    of technical processes. Control engineering practice,
    5(5):709–719, 1997.

</pre>