Proceedings of the 26th International Workshop on Principles of Diagnosis


         Using Incremental SAT for Testing Diagnosability of Distributed DES

                    Hassan IBRAHIM1 and Philippe DAGUE1 and Laurent SIMON2
                              1
                                LRI, Univ. Paris-Sud and CNRS, Orsay, France
                                 hassan.ibrahim@lri.fr, philippe.dague@lri.fr
                          2
                            LaBRI, Univ. Bordeaux and CNRS, Bordeaux, France
                                               lsimon@labri.fr


                         Abstract                                      positioning the sensors to manage the observation require-
                                                                       ments. The main difficulty in diagnosability algorithms is
     We extend in this work the existing approach to                   related to the states number explosion. Another difficulty
     analyse diagnosability in discrete event systems                  appears when checking diagnosability of a system which
     (DES) using satisfiability algorithms (SAT), in or-               is actually diagnosable, i.e. the inexistence of a counter-
     der to analyse the diagnosability in distributed                  example witnessing non diagnosability. Thus all possibili-
     DES (DDES) and we test this extension. For this,                  ties need to be tested as for proving the non existence of a
     we handle observable and non observable com-                      plan in a planning problem, and usually in this case some
     munication events at the same time. We also pro-                  approximations are used to avoid exploring all the search
     pose an adaptation to use incremental SAT over                    space.
     the existing and the extended approaches to over-                 The paper is structured as follows. Section 2 will introduce
     come some of the limitations, especially concern-                 the system transition models for centralized DES and recall
     ing the length and the distance of the cycles that                the traditional definition of the diagnosability in those mod-
     witness the non diagnosability of the fault, and                  els and the state of the art of encoding this definition as a
     improve the process of dealing with the reacha-                   satisfiability problem in propositional logic. Section 3 will
     bility limit when scaling up to large systems.                    present our first contribution, an extension of this state of the
                                                                       art to DDES with observable and non observable communi-
1 Introduction                                                         cation events in the same model, and will give experimental
                                                                       results of this extension. Section 4 is devoted to our sec-
Diagnosis task is mainly using the available observations to
                                                                       ond contribution, using incremental SAT calls to overcome
explain the difference between the expected behavior of a
                                                                       the limitation when the number of steps required to check
system and its real behavior which may contain some faults.
                                                                       diagnosability, i.e., the length of possible paths with cycles
Many works have been done to study the automatic ap-
                                                                       witnessing non diagnosability, is large, and will present ex-
proaches to system fault diagnosis. They all try to deal with
                                                                       perimental results showing how the method scales up. Sec-
the main problem, i.e. the compromise between the number
                                                                       tion 5 will present related works and section 6 will conclude
of possible diagnoses to the considered faults and the num-
                                                                       and give our perspectives for future work.
ber of observations which must be given to make the deci-
sion. Diagnosis problem is NP-hard and one always needs
to cope with an explosion in the number of system model                2 Using SAT in Diagnosability Analysis of
states. Moreover, the diagnosis decision is not always cer-              Centralized Systems
tain, and thus running a diagnosis algorithm may not be ac-
curate. For example, two sets of observations provided by              We recall first the definitions of DES models we use and of
different sets of sensors or at different times may lead to            diagnosability for these models.
different diagnoses. This uncertainty raises the problem of
diagnosability which is essential while designing the system           2.1 Preliminaries
model. After that, the model based diagnosis will be used              We will use finite state machines (FSM) to model systems.
in applications to explain any anomaly, with a guarantee of            We define labeled transition systems following [1].
correctness and precision at least for anticipated faults.
Diagnosability of the considered systems is a property de-             Definition 1. A Labeled Transition System (LTS) is
fined to answer the question about the possibility to distin-          a tuple T = hX, Σo , Σu , Σf , δ, s0 i where:
guish any possible faulty behavior in the system from any
other behavior without this fault (i.e., correct or with a dif-          • X is a finite set of states,
ferent fault) within a finite time after the occurrence of the           • Σo is a finite set of observable correct events,
fault. A fault is diagnosable if it can be surely identified
from the partial observation available in a finite delay af-             • Σu is a finite set of unobservable correct events,
ter its occurrence. A system is diagnosable if every possible            • Σf is a finite set of unobservable faulty events,
fault in it is diagnosable. This property provides information
                                                                         • δ ⊆ X × (Σo ∪ Σu ∪ Σf ) × X is the transition relation,
before getting into finding the explanations of the fault. It
also helps in designing a robust system against faults and in            • s0 is the initial state.


                                                                  51
                         Proceedings of the 26th International Workshop on Principles of Diagnosis


   In [2] the authors used an equivalent but more compact               that L(T ) is live (i.e., for any state, there is at least one tran-
representation than LTS for modeling systems in order to                sition issued from this state) and convergent (i.e., there is no
analyze their diagnosability: succinct transition systems,              cycle made up only of unobservable events).
that exploit the regularity in the systems structures and                  A system T is said to be diagnosable iff any fault f ∈ Σf
are expressed in terms of propositional variables, which                is diagnosable in T . In order to avoid exponential complex-
allowed them to translate more easily to a SAT problem the              ity in the number of faults during diagnosability analysis,
twin plant method proposed by [3] for checking diagnos-                 only one fault at a time is checked for diagnosability. It will
ability.                                                                thus be assumed in the following that there exists only one
As we aim at studying the diagnosability of DDES using                  fault event f (Σf = {f }), without restriction on the num-
SAT solvers, we will follow the model of [2] who stud-                  ber of its occurrences. Diagnosability checking has been
ied the same problem in centralized DES. It represents                  proved in [3] to be polynomial in the number |X| of states
the system states by the valuations of a finite set A of                for LTS, so exponential in the number |A| of state variables
Boolean state variables where valuation changes reflect                 for SLTS (actually the problem is NLOGSPACE-complete
the transitions between states according to the events. The             for LTS and PSPACE-complete for SLTS [4]).
set of all literals issued from A is L = A ∪ {¬a|a ∈ A}
and L is the language over A that consists of all formulas              2.2 SLTS Diagnosability as Satisfiability
that can be formed from A and the connectives ∨ and                     An immediate rephrasing of the definition 3 shows that T is
¬. We use the standard definitions of further connec-                   non diagnosable iff it exists a pair of trajectories correspond-
tives Φ ∧ Ψ ≡ ¬(¬Φ ∨ ¬Ψ), Φ → Ψ ≡ ¬Φ ∨ Ψ and                            ing to cycles (and thus to infinite paths), a faulty one and
Φ ↔ Ψ ≡ (Φ → Ψ) ∧ (Ψ → Φ). The transition relation                      a correct one, sharing the same observable events. Which
is defined to allow two or more events to take place                    is equivalent to the existence of an ambiguous (i.e. made
simultaneously. Thus each event is described by a set of                up of pairs of states respectively reachable by a faulty path
pairs hφ, ci which represent its possible ways of occurrence            and a correct path) cycle in the product of T by itself, syn-
by indicating that the event can be associated with changes             chronized on observable events, which is at the origin of the
c ∈ 2L in states that satisfy the condition φ ∈ L.                      so called twin plant structure introduced in [3]. This non
                                                                        diagnosability test was formulated in [2] as a satisfiability
Definition 2. A Succinct Transition System (SLTS)                       problem in propositional logic. We recall below this encod-
is described by a tuple T = hA, Σo , Σu , Σf , δ, s0 i where:           ing with the variables and the formulas used, where super-
   • A is a finite set of state variables,                              scripts t refer to time points and (eto ) and (êto ) refer respec-
                                                                        tively to the faulty and correct events occurrences sequences
   • Σo is a finite set of observable correct events,
                                                                        (corresponding states being described by valuations of (at )
   • Σu is a finite set of unobservable correct events,                 and (ât )) of a pair of trajectories witnessing non diagnos-
   • Σf is a finite set of unobservable faulty events,                  ability (so sharing the same observable events represented
                                       L                                by (et ) and forming a cycle). The increasing of the time
   • δ : Σ = Σo ∪ Σu ∪ Σf → 2L×2 assigns to each event                  step corresponds to the triggering of at least one transition
      a set of pairs hφ, ci,                                            and the extension by an event of at least one of the two tra-
   • s0 is the initial state (a valuation of A).                        jectories. T = hA, Σu , Σo , Σf , δ, s0 i being an SLTS, the
It is straightforward to show that any LTS can be repre-                propositional variables are thus:
sented as an SLTS (one takes dlog(|X|)e Boolean variables                 • at and ât for all a ∈ A and t ∈ {0, . . . , n},
and represents states by different valuations of these vari-              • eto for all e ∈ Σo ∪ Σu ∪ Σf , o ∈ δ(e) and t ∈ {0, . . . ,
ables; one assigns to each occurence of an event e labeling                 n − 1},
a transition (x, e, y) a pair hφ, ci, with φ expressing the
valuation of x and c the valuation changes between x and                  • êto for all e ∈ Σo ∪ Σu , o ∈ δ(e) and t ∈ {0, . . . ,
y). And reciprocally any SLTS can be mapped to an LTS                       n − 1},
(see Definition 2.4 in [2]).                                              • et for all e ∈ Σo and t ∈ {0, . . . , n − 1}.
The formal definition of diagnosability of a fault f in a
                                                                        The following formulas express the constraints that must be
centralized system modeled by (an LTS or SLTS) T was
                                                                        applied at each time step t or between t and t + 1.
proposed by [1] as follows:
                                                                          1. The event occurrence eto must be possible in the current
Definition 3. Diagnosability.         A fault f is diagnos-                  state:
able in a system T iff
                                                                                           eto → φt     for o = hφ, ci ∈ δ(e)         (2.1)
      ∃k ∈ N, ∀sf ∈ L(T ), ∀t ∈ L(T )/sf , |t| ≥ k ⇒
                                                                             and its effects must hold at the next time step:
         ∀p ∈ L(T ), (P (p) = P (sf .t) ⇒ f ∈ p).
                                                                                          ^
In this formula, L(T ) denotes the prefix-closed language of                        eto →     lt+1 for o = hφ, ci ∈ δ(e)              (2.2)
T whose words are called trajectories, sf any trajectory end-                              l∈c
ing by the fault f , L(T )/s the post-language of L(T ) after
s, i.e., {t ∈ Σ∗ |s.t ∈ L(T )} and P the projection of tra-                  We have the same formulas with êto .
jectories on observable events. The above definition states               2. The present value (T rue or F alse) of a state variable
that for each trajectory sf ending with fault f in T , for each              changes to a new value (F alse or T rue, respectively)
t that is an extension of sf in T with enough events, every                  only if there is a reason for this change, i.e., because of
trajectory p in T that is equivalent to sf .t in terms of obser-             an event that has the new value in its effects (so, change
vation should contain in it f . As usual, it will be assumed                 without reason is prohibited). Here is the change from


                                                                   52
                              Proceedings of the 26th International Workshop on Principles of Diagnosis


       T rue to F alse (the change from F alse to T rue is de-                    3.1 DDES Modeling
       fined similarly by interchanging a and ¬a):                                In order to model DDES with SLTS, we need to extend
              (at ∧ ¬at+1 ) → (eti1 o      ∨ · · · ∨ etik o )        (2.3)        these ones by adding communication events to each com-
                                      j1                 jk
                                                                                  ponent. So we use the following definition for a distributed
       where the ojl = hφjl , cjl i ∈ δ(eil ) are all the occur-                  SLTS with k different components (sites):
       rences of events eil with ¬a ∈ cji .
       We have the same formulas with ât and êtil o .                           Definition 4.      A Distributed Succinct Transition
                                                                                  System (DSLTS) with k components is described by a tuple
                                                              jl

  3. At most one occurrence of a given event can occur at                         T = hA, Σo , Σu , Σf , Σc , δ, s0 i where (subscripts i refer to
     a time and the occurrences of two different events can-                      component i):
     not be simultaneous if they interfere (i.e., if they have
     two contradicting effects or if the precondition of one                        • A is a union of disjoint finite sets (Ai )1≤i≤k of com-
     contradicts the effect of the other):                                            ponent own state variables, A = ∪ki=1 Ai ,
         ¬(eto ∧ eto0 ) ∀e ∈ Σ, ∀{o, o0} ⊆ δ(e), o 6= o0 (2.4)                      • Σo is a union of disjoint finite sets of component own
                                                                                      observable correct events, Σo = ∪ki=1 Σoi ,
        ¬(eto ∧ e0t
                 o0 ) ∀{e, e0} ⊆ Σ, e 6= e0, ∀o ∈ δ(e),
                                                                                    • Σu is a union of disjoint finite sets of component own
              ∀o0 ∈ δ(e0) such that o and o0 interfere               (2.5)            unobservable correct events, Σu = ∪ki=1 Σui ,
     We have the same formulas with êto .                                          • Σf is a union of disjoint finite sets of component own
  4. The formulas that connect the two events sequences                               unobservable faulty events, Σf = ∪ki=1 Σf i ,
     require that observable events take place in both se-
                                                                                    • Σc is a union of finite sets of (observable or unobserv-
     quences whenever they take place (use of et ):
         _                  _                                                         able) correct communication events, Σc = ∪ki=1 Σci ,
              eto ↔ et and      êto ↔ et ∀e ∈ Σo (2.6)                               which are the only events shared by at least two differ-
          o∈δ(e)                 o∈δ(e)                                               ent components (i.e., ∀i, ∀c ∈ Σci , ∃j 6= i, c ∈ Σcj ),
                                                                                    • δ = (δi ), where δi : Σi = Σoi ∪ Σui ∪ Σf i ∪ Σci →
  The conjunction of all the above formulas for a given t is                                Li
denoted by T (t, t + 1).                                                              2Li ×2 , assigns to each event a set of pairs hφ, ci in
A formula for the initial state s0 is:                                                the propositional language of the component where it
           ^                         ^                                                occurs (so, for communication events, in each compo-
 I0 =              (a0 ∧â0 ) ∧         (¬a0 ∧¬â0 ) (2.7)                            nent separately where they occur),
         a∈A,s0 (a)=1               a∈A,s0 (a)=0
                                                                                    • s0 = (s0i ) is the initial state (a valuation of each Ai ).
   At last, the following formula can be defined to encode
                                                                                  In this distributed framework, synchronous communication
the fact that a pair of executions is found with the same ob-
                                                                                  is assumed, i.e., communication events are synchronized
servable events and no fault in one execution (first line), but
                                                                                  such that they all occur simultaneously in all components
one fault in the other (second line), which are infinite (in
                                                                                  where they appear. More precisely, a transition by a com-
the form of a non trivial cycle, so containing at least one
                                                                                  munication event c may occur in a component iff a simul-
observable event, 1 at step n; third line), witnessing non di-
                                                                                  taneous transition by c occurs in all the other components
agnosability:
                                                                                  where c appears (has at least one occurrence). In particular,
        ΦTn =       I0 ∧ T (0, 1) ∧ · · · ∧ T (n − 1, n) ∧                        all events before c in trajectories in all these components
                        n−1
                        _     _     _                                             necessarily occur before all events after c in these trajecto-
                                           eto   ∧                                ries. The global model of the system is thus nothing else that
                        t=0 e∈Σf o∈δ(e)
                                                                                  the product of the models of the components, synchronized
                                                                                  on communication events. Notice that we allow in whole
    n−1
    _         ^                                      n−1
                                                     _        _                   generality communication events to be, partially or totally,
          (       ((an ↔ am ) ∧ (ân ↔ âm )) ∧                    et )           unobservable, so one has in general to wait further obser-
    m=0 a∈A                                          t=m e∈Σo                     vations to know that some communication event occurred
   From this encoding in propositional logic, follows the re-                     between two or more components. On the other side, as-
sult (theorem 3.2 of [2]) that an SLTS T is not diagnosable                       suming these communications to be faultless is not actually
if and only if ∃n ≥ 1, ΦTn is satisfiable. It is also equivalent                  a limitation. If a communication process or protocol may be
to ΦT22|A| being satisfiable, as the twin plant states number is                  faulty, it has just to be modeled as a proper component with
an obvious upper bound for n, but often impractically high                        its own correct and faulty behaviors (the same that, e.g., for
(see in [2] some ways to deal with this problem).                                 a wire in an electrical circuit). In this sense, communica-
                                                                                  tions between components are just a modeling concept, not
3 Using SAT in Diagnosability Analysis of                                         subject to diagnosis. It will be also assumed that the observ-
  Distributed Systems                                                             able information is global, i.e. centralized (when observable
                                                                                  information is only local to each component, distributed di-
We extend from centralized systems to distributed systems                         agnosability checking becomes undecidable [5]), allowing
the satisfiability framework of subsection 2.2 for testing di-                    to keep definition 3 for diagnosability.
agnosability and we provide some experimental results.
   1                                                                              3.2 DSLTS Diagnosability as Satisfiability
     This verification that the cycle found is not trivial was not done
in [2]; it is why the authors had to add for each time point a for-               Let T be a DSLTS made up of k components denoted by
mula, not needed here, guaranteeing that at least one event took                  indexes i, 1 ≤ i ≤ k. In order to express the diagnosability
place, to avoid silent loops with no state change.                                analysis of T as a satisfiability problem, we have to extend


                                                                             53
                               Proceedings of the 26th International Workshop on Principles of Diagnosis


the formulas of subsection 2.2 to deal with communication                        We have tested our tool on small examples with sev-
events between components. Let Σc = Σco ∪ Σcu be the                          eral communication events with multiple occurrences (three
communication events, with Σco = ∪ki=1 Σco i the observ-                      communicating components) with global communication
able ones and Σcu = ∪ki=1 Σcu i the unobservable ones.                        (all components share the same event) or partial commu-
    The idea is to treat each communication event as any                      nication (only some components share the same event), as
other event in each of its owners and, as it has been done                    in Figure 1, which was the running example in [7].
with events et for e ∈ Σo for synchronizing observable
events occurrences in the two executions, to introduce in the
same way a global reference variable for each communica-
tion event at each time step, in charge of synchronizing any
communication event occurrence in any of its owner with
occurrences of it in all its other owners. We use one such
reference variable for each trajectory, et and êt , for unob-
servable events e ∈ Σcu , and only one for both trajectories,
et , for observable events e ∈ Σco as it will also in addition
play the role of synchronizing observable events between
trajectories exactly as the et for e ∈ Σo . So, we add to the
previous propositional variables the new following ones:
   • eto , êto for all e ∈ Σc , o ∈ δ(e) = ∪i δi (e) and
     t ∈ {0, . . . , n − 1},
   • et for all e ∈ Σc ,              êt     for all e ∈ Σcu     and
     t ∈ {0, . . . , n − 1}.                                                  Figure 1: A DDES made up of 3 components C1, C2 and
Formulas in T (t, t + 1) are extended as follows.                             C3 from left to right. ci ,1≤i≤2 are unobservable communi-
                                                                              cation events, oi ,1≤i≤5 are observable events and fi ,1≤i≤2
  1. Formulas (2.1), (2.2), (2.3) and (2.5) extend unchanged                  are faulty events.
     to eto and êto ∀e ∈ Σc , expressing that a communication
     event must be possible and has effects in each of its                       The total number of propositional variables V arsN um
     owner components and that two such different events                      in the generated formula ΦTn after n steps is:
     cannot be simultaneous if they interfere.                                                                            PObs
                                                                              V arsN um        =    n × (2|A| + 3 i=1 ObOcci +
                                                                              PF aults                    PU nobs
  2. Formulas (2.4) extend to prevent two simultaneous oc-                       i=1     F aultOcci + 2 i=1 U nobOcci ), where:
     currences of a given communication event in the same                     |A| is the total number of state variables,
     owner component, i.e. apply ∀e ∈ Σc , ∀i, ∀{oi , oi 0} ⊆                 Obs the total number of observable events,
     δi (e), oi 6= oi 0 and the same with ê (obviously they do               ObOcci the total number of occurrences of the observable
     not apply to different owner components, by the very                     event ei ,
     definition of communication events).                                     F aults the total number of faults,
  3. Finally, the new following formulas express the com-                     F aultOcci the total number of occurrences of the faulty
     munication process itself, i.e. the synchronization of                   event ei ,
     the occurrences of any communication event e in all its                  U nobs the total number of unobservable correct events,
     owners components (S(e) being the set of indexes of                      U nobOcci the total number of occurrences of the unob-
     the owners components of e) and extend also formulas                     servable correct event ei .
     (2.6) to observable communication events:                                The results are in Table 1, where the columns show the
                                                                              system and the fault considered (3 cases), the steps number
   _                            _                                             n, the numbers of variables and clauses and the runtime.
             etoi ↔ et and                êtoi ↔ êt   ∀e ∈ Σcu ∀i ∈ S(e)
oi ∈δi (e)                   oi ∈δi (e)                                        System       Fault   |Steps|   SAT?   |Variables|   |Clauses|   runtime(ms)
   _                            _                                              C2           f2      4         No     106           628         27
             etoi ↔ et and                êtoi ↔ et    ∀e ∈ Σco ∀i ∈ S(e)     C2           f2      5         Yes    131           783         15
                                                                               C2, C3       f2      5         No     225           1157        28
oi ∈δi (e)                   oi ∈δi (e)                                        C2, C3       f2      32        No     1386          7340        641
                                                                               C2, C3       f2      64        No     2762          14668       1422
The formula ΦTn is unchanged except that, in the verification                  C2, C3       f2      128       No     5514          29324       5061
                                                                               C2, C3       f2      256       No     11018         58636       18970
that the found cycle (third line) is not trivial, any observable               C2, C3       f2      512       No     22026         117260      130164
event can be used, so the final disjunct of events et is ex-                   C2, C3       f2      1024      No     44042         234508      548644
tended to all e ∈ Σo ∪ Σco . We have thus the result that a                    C1, C2, C3   f1      8         No     576           3546        91
                                                                               C1, C2, C3   f1      9         Yes    646           3987        110
DSLTS T is not diagnosable if and only if ∃n ≥ 1, ΦTn is
satisfiable.                                                                          Table 1: Results on the example of Figure 1.
                                                                                 Which means that f 2 is not diagnosable in C2 alone
3.3 Implementation and Experimental Testing                                   while it becomes diagnosable when synchronizing C2 and
We have implemented the above extension in Java. We used                      C3. For this last result, we have increased the steps number
the well designed API of the SAT solver Sat4j [6]. If more                    until reaching 22|A| , which is the theoretical upper bound of
efficient solvers could have been chosen, it fitted well our                  the twin plant states represented in the logical formula. As
clause generator written in Java and only a limited speed                     in general it is not always possible to reach this bound in
up can be awaited from C++ solvers (a speed up of 4, i.e.                     practice, we propose in section 4 using incremental SAT to
reduction of 75% of the runtime is often observed).                           improve the management of increasing steps number. While


                                                                         54
                         Proceedings of the 26th International Workshop on Principles of Diagnosis


f 1 is not diagnosable even after synchronizing all three                of the behavior of both trajectories (represented by the con-
components together. Numbers of variables and clauses are                junction of formulas T (t, t+1), 0 ≤ t ≤ n−1, representing
small in comparison to what SAT solvers can handle (up to                the (t + 1)th step). The second part Dn describes the diag-
hundred thousands propositional variables and millions of                nosability property at step n, i.e., the occurrence of a fault
clauses). These tests are mentioned as a proof of concept.               in the n previous steps of the faulty trajectory (given by the
However, to test the tool on larger systems and because of               formula Fn ) and the detection of a cycle at step n (given by
the absence of benchmark in the literature, we have created              the formula Cn ). So we obtain, for n ≥ 1:
in subsection 4.2. an example that can be scaled up.
                                                                         ΦTn = Tn ∧ Dn
4 Adaptation to Incremental SAT                                                           n−1
                                                                                          ^
                                                                         Tn = I0 ∧              T (t, t + 1)           Dn = Fn ∧ Cn
  Diagnosability Checking                                                                 t=0
We adapt satisfiability algorithms for checking diagnosabil-                    n−1
                                                                                _         _       _
ity of both centralized (subsection 2.2) and distributed (sub-           Fn =                            eto
section 3.2) DES in order to incrementally process the max-                     t=0 e∈Σf o∈δ(e)
imum length of paths with cycles searched for witnessing                        n−1                                                 n−1
non diagnosability and we provide experimental results.                         _         ^                                         _     _
                                                                         Cn =         (         ((an ↔ am ) ∧ (ân ↔ âm )) ∧                  et )
4.1 Diagnosability as Incremental Satisfiability                                m=0 a∈A                                             t=m e∈Σo

Two cases have to be distinguished while testing diagnos-                Add now at each step j a control variable hj allowing to
ability using SAT solvers to verify the satisfiability of the            disable (when its truth value is F alse) or activate (when its
logical formula ΦTn for a given n [2]. The first case is when            truth value is T rue) the formulas Fj and Cj and keep at step
we find a model for ΦTn , which definitely indicates the non             n all these controlled formulas for 1 ≤ j ≤ n. We obtain
diagnosability of the studied fault. The second case is when             the following ΨTn formula, for n ≥ 1:
we do not find such a model: this result indicates just that the                                n
                                                                                                ^
studied fault has not been found non diagnosable according                  ΨTn = Tn ∧                Dj 0     Dj 0 = Fj 0 ∧ Cj 0   1≤j≤n
to the value of n. In other words, after testing all the possible                               j=1
first n steps, we did not find a pair of executions of length
at most n containing cycles such that one of them contains                  Fj 0 = ¬hj ∨ Fj                    Cj 0 = ¬hj ∨ Cj      1≤j≤n
the fault and not the other and such that the two executions             We have thus the equivalence, for all n ≥ 1:
are equivalent in terms of observation. However, as the the-
                                                                                                                        n−1
                                                                                                                        ^
oretical upper bound n = 22|A| which would guarantee that
                                                                                                ΦTn ≡ ΨTn ∧ hn ∧              ¬hj
the fault is actually diagnosable is often in practice unreach-
                                                                                                                        j=1
able, such a pair may exist for a greater value of n. Testing
it means increasing n and rebuilding the logical formula ΦTn             This allows one, for all n ≥ 1, to replace the SAT call on
then recalling the SAT solver.                                           ΦTn by a SAT call on ΨTn under the control variables set-
   Instead, we propose to adapt the formula ΦTn in order to              ting given by Hn = {¬h1 , . . . , ¬hn−1 , hn } (indicated in a
be tested in an incremental SAT mode by multiple calls to                second argument of the call):
a Conflict Driven Clause Learning (CDCL) solver. Using
CDCL solvers in a specialized, incremental, mode is rela-                                     SAT (ΦTn ) = SAT (ΨTn , Hn )
tively new but already widely used [8] in many applications.             The idea is now to consider the control variables hj as as-
In this operation mode, the solver can be called many times              sumptions and use incremental SAT calls IncSATj under
with different formulas. However, solvers are designed to                varying assumptions, for 1 ≤ j ≤ n. For this, we use
work with similar formulas, where clauses are removed and                the following recurrence relationships for both formulas ΨTj
added from calls to calls. Learnt clauses can be kept as soon            and assumptions Hj :
as the solver can ensure that clauses used to derive them are
not removed. This is generally done by adding specialized                 ΨT0 = I0         ΨTj+1 = ΨTj ∧ T (j, j + 1) ∧ Dj+1 0            j≥0
variables, called assumptions, to each clause that can be re-             H1 = {h1 } Hj+1 = Hj [{¬hj , hj+1 }] j ≥ 1
moved. By assuming the variable to be F alse, the clause
is activated and by assuming the variable to be T rue, the               where the notation Hj [{assi }] means updating in Hj
clause is trivially satisfied and no longer used by the solver.          assumptions hi by their new settings assi , i.e., in the
What is interesting for our purpose is that the CDCL solver              formula above, replacing the truth value of hj , which was
can save clauses learnt during the previous calls and test               T rue, by F alse, and adding the new assumption hj+1
multiple assumptions in each new call. This means that af-               with truth value T rue. From these relationships, the unique
ter n steps we hope that the solver will have learnt some                call to SAT under given assumptions SAT (ΨTn , Hn ) can
constraints about the behavior of the system. Although we                be replaced, starting with the set of clauses I0 , by multiple
are interested in testing the diagnosability property on a de-           calls, 0 ≤ j ≤ n − 1, to an incremental SAT under varying
fined system, this property is independent from the system               assumptions:
behavior which can be learnt by the solver from the previous               IncSATj+1 (N ewClausesj+1 , N ewAssumptionsj+1 )
calls.
   In order to extend the clauses representation given in sub-             = IncSATj+1 (T (j, j + 1) ∧ Dj+1 0 , {¬hj , hj+1 }) (4.1)
sections 2.2 and 3.2 to this mode of operation, we propose               If IncSATj answers SAT, the search is stopped as non diag-
to divide the formula ΦTn in two parts. The first part Tn de-            nosability is proved, if it answers UNSAT, then IncSATj+1
scribes the first n steps, synchronized on the observations,             is called.


                                                                    55
                        Proceedings of the 26th International Workshop on Principles of Diagnosis


   Notice that we used a unique assumption hj for control-             model (for k = 3, 13, 23, 33, 43 and 63). The length of a pair
ling both Fj and Cj as non diagnosability checking requires            of executions with cycles witnessing the non diagnosability
the presence of both a fault occurrence in the faulty trajec-          of f in each example is k + 2 and we consider the satisfia-
tory and of a cycle. But the same framework allows the                 bility of the formula ΦTk+2 , so the number of steps required
independent control of formulas by separate assumptions.               for SAT to provide the answer Yes is: |Steps] = k + 2. In
For sake of simplicity, we also assumed we called IncSAT               order to obtain a fair comparison between IncSAT , which
at each step, but this is not mandatory and indexes j for the          manages internally by handling assumptions the successive
successive calls can be decoupled from indexes t for steps.            satisfiability checks of increasing formulas for j = 1, . . . ,
We should also say that, even if IncSAT allows us to re-               k + 2, and SAT, for which k + 2 successive calls are made to
activate an already disabled clause, we are sure in our case           the solver with respective formulas ΦTn for n = 1, . . . , k+2,
to never use this function (when hk has been set to F alse,            the sum of the k + 2 runtimes of the SAT solver calls are
it always remains so) and we can thus force the solver to              considered in this case (last column in the tables).
do a hard simplification process that removes the forgotten
clauses permanently. As a result of our adaptation we will                    |Steps|   |Clauses|     Inc. SAT(s)     SAT(s)
be able to scale up the size of the tested system and the dis-
tance and length of a cycle witnessing non diagnosability.                    20        42,614        1.5             1.3
                                                                              30        131,714       10.3            13.1
4.2 Experimental Results                                                      40        303,736       49.3            77.8
                                                                              50        576,466       106             223
We show in this subsection a comparison between our                           60        970,156       320             699
adapted version of subsection 4.1, that uses incremental                      100       4,334,018     9410            13040
SAT, and the previous versions, for centralized model (sub-
section 2.2 following [2]) and for distributed model (subsec-
tion 3.2). We have created the example in Figure 2 which                  Table 2: Results on the faulty component of Figure 2.
contains 2k + 1 components: one faulty component and
two sets of k neighboring components. The faulty compo-
nent has two separated paths, each one containing k differ-             |Steps|   |Comps|      |Clauses|    Inc. SAT(s)     SAT(s)
ent successive unobservable events ci and ending with the
                                                                        5         7            1,962        0.04            0.06
same observable cycle of length 1, but only one of them
                                                                        15        27           30,313       0.8             0.5
contains the fault. The centralized model will be limited to
                                                                        25        47           113,906      6.5             4.8
this faulty component alone and thus in this case the events
                                                                        35        67           277,873      33.8            33.7
ci , 1 ≤ i ≤ 2k, are just unobservable events as is u. In
                                                                        45        87           542,033      111             132
the distributed model, these events ci are communication
                                                                        65        127          1,490,590    967             1090
events and the faulty component is considered with the other
two sets of components, where each component in both sets                   Table 3: Results on the whole system of Figure 2.
shares one event ci with the faulty component to ensure a
number 2k of communications before arriving to the cycles
that will witness the non diagnosability of the fault. Each               Although these examples remain relatively simple and do
set of components will be synchronized with only one path,             not reflect any potential constraint that could be resumed by
either the faulty path or the correct one. This allows us to           some learnt clauses (e.g. no interfering events), we can al-
study the effect of the cycle distance in both models.                 ready notice the difference in runtime in favor of our incre-
                                                                       mental version in the centralized case and for the two largest
                                                                       values of k in the distributed case. This difference could be
                                                                       explained by the fact that generating all variables from the
                                                                       beginning for all time steps and for all events imply many
                                                                       meaningless clauses that would add a load on the solver in
                                                                       the version in [2], this load being avoided in our incremen-
                                                                       tal version because of the clauses learnt by the CDCL SAT
                                                                       solver. From another side, we should say that generating in
                                                                       both versions all variables from the beginning has two main
                                                                       advantages: firstly, it allows the system description without
                                                                       unfolding it (even if this description is verbose); secondly,
                                                                       it allows the ordering of these variables by their time step
                                                                       in order to generate the constraints for only one time step
                                                                       and then get next steps constraints by just shifting the num-
                                                                       bers (as we are representing the clauses in DIMACS for-
                                                                       mat). One last point could help to a more efficient descrip-
                                                                       tion of the system: in the succinct systems we represent all
                                                                       the occurrences of an event together, but in its SAT encod-
Figure 2: One faulty component that communicates with                  ing we “unfold” this succinctness by generating for each
two sets of k components. Each set communicates with one               occurrence n variables (for n time steps), even though log-
path (resp. faulty and correct) in the faulty component.               ically only one of them will be assigned to True. We could
                                                                       thus mark this relation among these n copies by introducing
  The results are in Table 2 for the centralized model (for k          a global cardinality constraint to express that these copies
= 18, 28, 38, 48, 58 and 98) and in Table 3 for the distributed        belong to only one occurrence of an event.


                                                                  56
                         Proceedings of the 26th International Workshop on Principles of Diagnosis


5 Selection of Related Works                                            faults.
                                                                           The work by [11] has optimized the construction of lo-
The first introduction to the notion of diagnosability was by           cal twin plants, by exploiting the fact that one distinguishes
[1]. The authors studied diagnosability of FSM, as defined
                                                                        two behaviors (faulty and correct) and one synchronizes at
in definition 1. Their formal definition of diagnosability is           two levels (observations first and communications later). It
the one we mentioned in definition 3. They introduced an                improved the construction of the twin plants proposed by
approach to test this property by constructing a deterministic          [7] by exploiting the different identifiers given to the com-
diagnoser. However, in the general case, this approach is               munication events at the observation synchronization level
exponential in the number of states of the system, which                (depending on which instance, left or right, they belong to)
makes it impractical.                                                   to assign them directly to the two behaviors studied (left
   In order to overcome this limitation [3] introduced the              copy assigned to the faulty behavior, right copy to the cor-
twin plant approach, which is a special structure built by              rect one). This helped in deleting the redundant informa-
synchronizing on their observable events two identical in-              tion, then in abstracting the amount of information to be
stances of a nondeterministic fault diagnoser, and then                 transferred later to next steps if the diagnosability was not
searched for a path in this structure with an observed cy-              answered. The generalization to fault patterns in DDES was
cle made up of ambiguous states, i.e. states that are pairs             introduced by [12].
of original states, one reached by going through a fault and               After the reduction of diagnosability problem to a path
the other not. Thus faults diagnosability is equivalent to the          finding problem by [3], it became transferable to a satis-
absence of such a path, called a critical path. This approach           fiability problem like it is the case for planning problems
turns the diagnosability problem in a search for a path with            [13]. This was done by [2] which formulated the diagnos-
a cycle in a finite automaton, and this reduces its complexity          ability problem (in its twin plant version) into a SAT prob-
to be polynomial of degree 4 in the number of states (and ex-           lem, assuming a centralized DES with simple fault events.
ponential in the number of faults, but processing each fault            The authors represented the studied transition system by a
separately makes its linear in the number of faults).                   succinct representation (cf. definition 2). This allows both
   Let us mention here that the two previous works were in-             a compact representation of the system states and a max-
terested in centralized systems with simple faults modeled              imum amount of non interfering events to be fired simul-
as distinguished events. The first studies about fault pat-             taneously. Thus, they represented the system states by the
terns were introduced in [9] and [10] which generalize the              valuation of a set of Boolean state variables (dlog(q)e state
simple fault event in a centralized DES to handle a sequence            variables for q states) and the interference relation between
of events considered together as a fault, or handle multiple            two events according to the consistency among their effects
occurrences of the same fault or of different faults. More              and preconditions, one versus the other. They distinguished
generally, a fault pattern is given as a suffix-closed rational         between an occurrence of an event in the faulty sequence or
events language (so by a complete deterministic automaton               in the correct sequence by introducing two versions of it and
with a stable subset of final states).                                  constructed the logical formula expressing states transitions
   The first work that addressed diagnosability analysis in             for each possible step in the system. Each step may con-
DDES was [7]. A DDES is modeled as a set of communicat-                 tain simultaneous events that belong to faulty and correct
ing FSM. Each FSM has its own events set, communication                 sequences but must synchronize the occurrence of observ-
events being the only ones shared by at least two different             able events whenever they take place. For a given bound n
FSM. In [7] was introduced an incremental diagnosability                of paths length, they made the conjunct of these formulas
test which avoids to build the twin plant for the whole dis-            for n steps and added the logical formula that represents the
tributed system if not needed. Thus one starts by building              occurrence of the fault in the faulty sequence and the oc-
a local twin plant for the faulty component to test the exis-           currence of a cycle in both sequences. The satisfiability of
tence of a local critical path. If such a path exists one builds        the obtained formula is equivalent to finding a critical path,
the local twin checkers of the neighboring components. Lo-              i.e. to the non diagnosability of the fault (see subsection 2.2
cal twin checker is a structure similar to local twin plant,            for a summary of this approach). Although this approach
i.e., where each path in it represents a pair of behaviors with         allows one to test diagnosability in large systems, it has a
the same observations, except that there is no fault infor-             limitation which is that we cannot dynamically increase n
mation in it since it is constructed from non-faulty compo-             to ensure reaching more states while scaling up the size of
nent. After constructing local twin checkers, one tries to              the system where the cycles that witness non diagnosabil-
solve the ambiguity resulting from the existence of a critical          ity can be very long. However the authors notice that we
path in the local twin plant. This is done by synchronizing             are not always forced to test all reachable states in many
on their communication events this local twin plant with the            cases where an approximation for the reachable states can
local twin checker of one neighboring component. In other               be applied, but without explaining explicitly how such an
words, one tries to distinguish the faulty path from the cor-           approximation can be found.
rect one by exploiting the observable events in the neigh-
boring components, because theses events occurrences that
are consistent with the occurrences of the communication
                                                                        6 Conclusion and Future Works
events could solve the ambiguity. The process is repeated               By extending the state of the art works for centralized DES,
until the diagnosability is answered, so only in the worst              we have expressed diagnosability analysis of DDES as a
case has the whole system to be visited. Another impor-                 satisfiability problem by building a propositional formula
tant contribution in this work was to delete the unambigu-              whose satisfiability, witnessing non diagnosability, can be
ous parts after each synchronization on the communication               checked by SAT solvers. We allow both observable and
events, reducing thus the amount of information transferred             non observable communication events in our model. Our
to next check (if needed). The approach assumed simple                  expression of these communication events, which avoids


                                                                   57
                        Proceedings of the 26th International Workshop on Principles of Diagnosis


merging all their owner components, helps in reducing the              [2] J. Rintanen and A. Grastien. Diagnosability testing
number of clauses used to represent them and this reduction                 with satisfiability algorithms. In Proceedings of the
is proportional to the number of their occurrences. We have                 20th International Joint Conference on Artificial Intel-
also proposed an adaptation of the logical formula in order                 ligence (IJCAI’07), pages 532–537, 2007.
to use incremental SAT calls helping managing the scaling              [3] S. Jiang, Z. Huang, V. Chandra, and R. Kumar. A poly-
up of the distance and the length of the intended cycles                    nomial algorithm for testing diagnosability of discrete-
witnessing non diagnosability and thus the size of the tested               event systems. IEEE Transactions on Automatic Con-
system. Thus we exploited the clauses learnt about the                      trol, 46(8):1318–1321, 2001.
system behavior in the previous calls. This approach is
more practical and more efficient for complex systems than             [4] J. Rintanen. Diagnosers and diagnosability of succinct
existing ones, as it avoids starting from scratch at each call.             transition systems. In Proceedings of the 20th Interna-
                                                                            tional Joint Conference on Artificial Intelligence (IJ-
   We are now considering the extension of this work to                     CAI’07), pages 538–544, 2007.
fault patterns diagnosability [12]. We will use the same ap-           [5] L. Ye and P. Dague. Undecidable case and decidable
proach to express predictability analysis [14] as a satisfia-               case of joint diagnosability in distributed discrete event
bility problem, for DES and DDES [15] and both for simple                   systems. International Journal On Advances in Sys-
fault events and fault patterns [16] . Although our represen-               tems and Measurements, 6(3 and 4):287–299, 2013.
tation can be easily extended to deal with local observations          [6] D. Le Berre and A. Parrain. The sat4j library, release
(i.e., observable events in one component are observed only                 2.2. Journal on Satisfiability, Boolean Modeling and
by this component), we know that in general diagnosability                  Computation, 7:59–64, 2010.
checking becomes then undecidable, e.g. when communica-
tion events are unobservable (obviously it remains decidable           [7] Y. Pencolé. Diagnosability analysis of distributed dis-
when these events are observable in all their owners) [5]. A                crete event systems. In Proceedings of the 16th Euro-
future work will be to study decidable cases of diagnosabil-                pean Conference on Artificial Intelligence (ECAI’04),
ity checking in DDES with local observations, e.g. assum-                   2004.
ing some well chosen communication events being observ-                [8] A. Nadel and V. Ryvchin. Efficient SAT solving under
able. Another natural question is to study if the methods                   assumptions. In Proceedings of the 15th International
used in [7] and refined in [11] to check diagnosability in                  Conference on Theory and Applications of Satisfiabil-
DDES in an incremental way in terms of the system com-                      ity Testing (SAT’12), 2012.
ponents could be transposed as guiding strategies for some             [9] T. Jéron, H. Marchand, S. Pinchinat, and M.-O.
component incremental SAT based approach for testing di-                    Cordier. Supervision patterns in discrete event sys-
agnosability in DDES. Transposing in SAT these methods,                     tems diagnosis. In Proceedings of the 8th International
based on building a local twin plant and local twin check-                  Workshop on Discrete Event Systems, 2006.
ers for gaining efficiency with regards to a global checking,
seems difficult. Basically, at any step k, corresponding to            [10] S. Genc and S. Lafortune. Diagnosis of patterns in
considering a subsystem made up of k components, these                      partially-observed discrete-event systems. In Proceed-
methods build all critical paths witnessing non diagnosabil-                ings of the 45th IEEE Conference on Decision and
ity at the level of this subsystem and the incremental step,                Control, pages 422–427. IEEE, 2006.
when adding a (k + 1)th neighboring component, consists                [11] L. Ye and P. Dague. An optimized algorithm for diag-
in checking the consistency of these pairs with the observa-                nosability of component-based systems. In Proceed-
tions in the new component: only those pairs which can be                   ings of the 10th International Workshop on Discrete
consistently extended are kept, if any. In addition, in [11],               Event Systems (WODES’10), 2010.
only useful and abstracted information is kept from one step           [12] L. Ye, Y. Yan, and P. Dague. Diagnosability for pat-
to the next one. With SAT, only one critical pair witness-
                                                                            terns in distributed discrete event systems. In Proceed-
ing non diagnosability of the subsystem (i.e., a model for
                                                                            ings of the 21st International Workshop on Principles
the formula) will be built. If it is not consistent, and thus
                                                                            of Diagnosis (DX’10), 2010.
disappears, when adding the (k + 1)th component, diagnos-
ability is not proven for all that: other critical pairs in the        [13] H. Kautz and B. Selman. Planning as satisfiability. In
subsystem, not completely computed at step k, may exist                     Proceedings of the 10th European Conference on Ar-
and be extendible to step (k + 1). So, they have to be com-                 tificial Intelligence (ECAI’92), pages 359–363, 1992.
puted now, which limits the incremental characteristic of the          [14] S. Genc and S. Lafortune. Predictability of Event
approach. In the same way, abstracting some information                     Occurrences in Partially-observed Discrete-event Sys-
is difficult to achieve with SAT. So, there is no evidence                  tems. Automatica, 45(2):301–311, 2009.
a priori that efficiency gain could be obtained by trying to
                                                                       [15] L. Ye, P. Dague, and F. Nouioua. Predictability Analy-
develop a component incremental SAT based approach for
testing DDES diagnosability.                                                sis of Distributed Discrete Event Systems. In Proceed-
                                                                            ings of the 52nd IEEE Conference on Decision and
                                                                            Control (CDC-13), pages 5009–5015. IEEE., 2013.
References                                                             [16] T. Jéron, H. Marchand, S. Genc, and S. Lafortune.
                                                                            Predictability of Sequence Patterns in Discrete Event
[1] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamo-                      Systems. In Proceedings of the 17th World Congress,
    hideen, and D. Teneketzis. Diagnosability of discrete-                  pages 537–453. IFAC., 2008.
    event systems. IEEE Transactions on Automatic Con-
    trol, 40(9):1555–1575, 1995.


                                                                  58