=Paper=
{{Paper
|id=Vol-1507/dx15paper7
|storemode=property
|title=Using Incremental SAT for Testing Diagnosability of Distributed DES
|pdfUrl=https://ceur-ws.org/Vol-1507/dx15paper7.pdf
|volume=Vol-1507
|dblpUrl=https://dblp.org/rec/conf/safeprocess/IbrahimDS15
}}
==Using Incremental SAT for Testing Diagnosability of Distributed DES==
Proceedings of the 26th International Workshop on Principles of Diagnosis
Using Incremental SAT for Testing Diagnosability of Distributed DES
Hassan IBRAHIM1 and Philippe DAGUE1 and Laurent SIMON2
1
LRI, Univ. Paris-Sud and CNRS, Orsay, France
hassan.ibrahim@lri.fr, philippe.dague@lri.fr
2
LaBRI, Univ. Bordeaux and CNRS, Bordeaux, France
lsimon@labri.fr
Abstract positioning the sensors to manage the observation require-
ments. The main difficulty in diagnosability algorithms is
We extend in this work the existing approach to related to the states number explosion. Another difficulty
analyse diagnosability in discrete event systems appears when checking diagnosability of a system which
(DES) using satisfiability algorithms (SAT), in or- is actually diagnosable, i.e. the inexistence of a counter-
der to analyse the diagnosability in distributed example witnessing non diagnosability. Thus all possibili-
DES (DDES) and we test this extension. For this, ties need to be tested as for proving the non existence of a
we handle observable and non observable com- plan in a planning problem, and usually in this case some
munication events at the same time. We also pro- approximations are used to avoid exploring all the search
pose an adaptation to use incremental SAT over space.
the existing and the extended approaches to over- The paper is structured as follows. Section 2 will introduce
come some of the limitations, especially concern- the system transition models for centralized DES and recall
ing the length and the distance of the cycles that the traditional definition of the diagnosability in those mod-
witness the non diagnosability of the fault, and els and the state of the art of encoding this definition as a
improve the process of dealing with the reacha- satisfiability problem in propositional logic. Section 3 will
bility limit when scaling up to large systems. present our first contribution, an extension of this state of the
art to DDES with observable and non observable communi-
1 Introduction cation events in the same model, and will give experimental
results of this extension. Section 4 is devoted to our sec-
Diagnosis task is mainly using the available observations to
ond contribution, using incremental SAT calls to overcome
explain the difference between the expected behavior of a
the limitation when the number of steps required to check
system and its real behavior which may contain some faults.
diagnosability, i.e., the length of possible paths with cycles
Many works have been done to study the automatic ap-
witnessing non diagnosability, is large, and will present ex-
proaches to system fault diagnosis. They all try to deal with
perimental results showing how the method scales up. Sec-
the main problem, i.e. the compromise between the number
tion 5 will present related works and section 6 will conclude
of possible diagnoses to the considered faults and the num-
and give our perspectives for future work.
ber of observations which must be given to make the deci-
sion. Diagnosis problem is NP-hard and one always needs
to cope with an explosion in the number of system model 2 Using SAT in Diagnosability Analysis of
states. Moreover, the diagnosis decision is not always cer- Centralized Systems
tain, and thus running a diagnosis algorithm may not be ac-
curate. For example, two sets of observations provided by We recall first the definitions of DES models we use and of
different sets of sensors or at different times may lead to diagnosability for these models.
different diagnoses. This uncertainty raises the problem of
diagnosability which is essential while designing the system 2.1 Preliminaries
model. After that, the model based diagnosis will be used We will use finite state machines (FSM) to model systems.
in applications to explain any anomaly, with a guarantee of We define labeled transition systems following [1].
correctness and precision at least for anticipated faults.
Diagnosability of the considered systems is a property de- Definition 1. A Labeled Transition System (LTS) is
fined to answer the question about the possibility to distin- a tuple T = hX, Σo , Σu , Σf , δ, s0 i where:
guish any possible faulty behavior in the system from any
other behavior without this fault (i.e., correct or with a dif- • X is a finite set of states,
ferent fault) within a finite time after the occurrence of the • Σo is a finite set of observable correct events,
fault. A fault is diagnosable if it can be surely identified
from the partial observation available in a finite delay af- • Σu is a finite set of unobservable correct events,
ter its occurrence. A system is diagnosable if every possible • Σf is a finite set of unobservable faulty events,
fault in it is diagnosable. This property provides information
• δ ⊆ X × (Σo ∪ Σu ∪ Σf ) × X is the transition relation,
before getting into finding the explanations of the fault. It
also helps in designing a robust system against faults and in • s0 is the initial state.
51
Proceedings of the 26th International Workshop on Principles of Diagnosis
In [2] the authors used an equivalent but more compact that L(T ) is live (i.e., for any state, there is at least one tran-
representation than LTS for modeling systems in order to sition issued from this state) and convergent (i.e., there is no
analyze their diagnosability: succinct transition systems, cycle made up only of unobservable events).
that exploit the regularity in the systems structures and A system T is said to be diagnosable iff any fault f ∈ Σf
are expressed in terms of propositional variables, which is diagnosable in T . In order to avoid exponential complex-
allowed them to translate more easily to a SAT problem the ity in the number of faults during diagnosability analysis,
twin plant method proposed by [3] for checking diagnos- only one fault at a time is checked for diagnosability. It will
ability. thus be assumed in the following that there exists only one
As we aim at studying the diagnosability of DDES using fault event f (Σf = {f }), without restriction on the num-
SAT solvers, we will follow the model of [2] who stud- ber of its occurrences. Diagnosability checking has been
ied the same problem in centralized DES. It represents proved in [3] to be polynomial in the number |X| of states
the system states by the valuations of a finite set A of for LTS, so exponential in the number |A| of state variables
Boolean state variables where valuation changes reflect for SLTS (actually the problem is NLOGSPACE-complete
the transitions between states according to the events. The for LTS and PSPACE-complete for SLTS [4]).
set of all literals issued from A is L = A ∪ {¬a|a ∈ A}
and L is the language over A that consists of all formulas 2.2 SLTS Diagnosability as Satisfiability
that can be formed from A and the connectives ∨ and An immediate rephrasing of the definition 3 shows that T is
¬. We use the standard definitions of further connec- non diagnosable iff it exists a pair of trajectories correspond-
tives Φ ∧ Ψ ≡ ¬(¬Φ ∨ ¬Ψ), Φ → Ψ ≡ ¬Φ ∨ Ψ and ing to cycles (and thus to infinite paths), a faulty one and
Φ ↔ Ψ ≡ (Φ → Ψ) ∧ (Ψ → Φ). The transition relation a correct one, sharing the same observable events. Which
is defined to allow two or more events to take place is equivalent to the existence of an ambiguous (i.e. made
simultaneously. Thus each event is described by a set of up of pairs of states respectively reachable by a faulty path
pairs hφ, ci which represent its possible ways of occurrence and a correct path) cycle in the product of T by itself, syn-
by indicating that the event can be associated with changes chronized on observable events, which is at the origin of the
c ∈ 2L in states that satisfy the condition φ ∈ L. so called twin plant structure introduced in [3]. This non
diagnosability test was formulated in [2] as a satisfiability
Definition 2. A Succinct Transition System (SLTS) problem in propositional logic. We recall below this encod-
is described by a tuple T = hA, Σo , Σu , Σf , δ, s0 i where: ing with the variables and the formulas used, where super-
• A is a finite set of state variables, scripts t refer to time points and (eto ) and (êto ) refer respec-
tively to the faulty and correct events occurrences sequences
• Σo is a finite set of observable correct events,
(corresponding states being described by valuations of (at )
• Σu is a finite set of unobservable correct events, and (ât )) of a pair of trajectories witnessing non diagnos-
• Σf is a finite set of unobservable faulty events, ability (so sharing the same observable events represented
L by (et ) and forming a cycle). The increasing of the time
• δ : Σ = Σo ∪ Σu ∪ Σf → 2L×2 assigns to each event step corresponds to the triggering of at least one transition
a set of pairs hφ, ci, and the extension by an event of at least one of the two tra-
• s0 is the initial state (a valuation of A). jectories. T = hA, Σu , Σo , Σf , δ, s0 i being an SLTS, the
It is straightforward to show that any LTS can be repre- propositional variables are thus:
sented as an SLTS (one takes dlog(|X|)e Boolean variables • at and ât for all a ∈ A and t ∈ {0, . . . , n},
and represents states by different valuations of these vari- • eto for all e ∈ Σo ∪ Σu ∪ Σf , o ∈ δ(e) and t ∈ {0, . . . ,
ables; one assigns to each occurence of an event e labeling n − 1},
a transition (x, e, y) a pair hφ, ci, with φ expressing the
valuation of x and c the valuation changes between x and • êto for all e ∈ Σo ∪ Σu , o ∈ δ(e) and t ∈ {0, . . . ,
y). And reciprocally any SLTS can be mapped to an LTS n − 1},
(see Definition 2.4 in [2]). • et for all e ∈ Σo and t ∈ {0, . . . , n − 1}.
The formal definition of diagnosability of a fault f in a
The following formulas express the constraints that must be
centralized system modeled by (an LTS or SLTS) T was
applied at each time step t or between t and t + 1.
proposed by [1] as follows:
1. The event occurrence eto must be possible in the current
Definition 3. Diagnosability. A fault f is diagnos- state:
able in a system T iff
eto → φt for o = hφ, ci ∈ δ(e) (2.1)
∃k ∈ N, ∀sf ∈ L(T ), ∀t ∈ L(T )/sf , |t| ≥ k ⇒
and its effects must hold at the next time step:
∀p ∈ L(T ), (P (p) = P (sf .t) ⇒ f ∈ p).
^
In this formula, L(T ) denotes the prefix-closed language of eto → lt+1 for o = hφ, ci ∈ δ(e) (2.2)
T whose words are called trajectories, sf any trajectory end- l∈c
ing by the fault f , L(T )/s the post-language of L(T ) after
s, i.e., {t ∈ Σ∗ |s.t ∈ L(T )} and P the projection of tra- We have the same formulas with êto .
jectories on observable events. The above definition states 2. The present value (T rue or F alse) of a state variable
that for each trajectory sf ending with fault f in T , for each changes to a new value (F alse or T rue, respectively)
t that is an extension of sf in T with enough events, every only if there is a reason for this change, i.e., because of
trajectory p in T that is equivalent to sf .t in terms of obser- an event that has the new value in its effects (so, change
vation should contain in it f . As usual, it will be assumed without reason is prohibited). Here is the change from
52
Proceedings of the 26th International Workshop on Principles of Diagnosis
T rue to F alse (the change from F alse to T rue is de- 3.1 DDES Modeling
fined similarly by interchanging a and ¬a): In order to model DDES with SLTS, we need to extend
(at ∧ ¬at+1 ) → (eti1 o ∨ · · · ∨ etik o ) (2.3) these ones by adding communication events to each com-
j1 jk
ponent. So we use the following definition for a distributed
where the ojl = hφjl , cjl i ∈ δ(eil ) are all the occur- SLTS with k different components (sites):
rences of events eil with ¬a ∈ cji .
We have the same formulas with ât and êtil o . Definition 4. A Distributed Succinct Transition
System (DSLTS) with k components is described by a tuple
jl
3. At most one occurrence of a given event can occur at T = hA, Σo , Σu , Σf , Σc , δ, s0 i where (subscripts i refer to
a time and the occurrences of two different events can- component i):
not be simultaneous if they interfere (i.e., if they have
two contradicting effects or if the precondition of one • A is a union of disjoint finite sets (Ai )1≤i≤k of com-
contradicts the effect of the other): ponent own state variables, A = ∪ki=1 Ai ,
¬(eto ∧ eto0 ) ∀e ∈ Σ, ∀{o, o0} ⊆ δ(e), o 6= o0 (2.4) • Σo is a union of disjoint finite sets of component own
observable correct events, Σo = ∪ki=1 Σoi ,
¬(eto ∧ e0t
o0 ) ∀{e, e0} ⊆ Σ, e 6= e0, ∀o ∈ δ(e),
• Σu is a union of disjoint finite sets of component own
∀o0 ∈ δ(e0) such that o and o0 interfere (2.5) unobservable correct events, Σu = ∪ki=1 Σui ,
We have the same formulas with êto . • Σf is a union of disjoint finite sets of component own
4. The formulas that connect the two events sequences unobservable faulty events, Σf = ∪ki=1 Σf i ,
require that observable events take place in both se-
• Σc is a union of finite sets of (observable or unobserv-
quences whenever they take place (use of et ):
_ _ able) correct communication events, Σc = ∪ki=1 Σci ,
eto ↔ et and êto ↔ et ∀e ∈ Σo (2.6) which are the only events shared by at least two differ-
o∈δ(e) o∈δ(e) ent components (i.e., ∀i, ∀c ∈ Σci , ∃j 6= i, c ∈ Σcj ),
• δ = (δi ), where δi : Σi = Σoi ∪ Σui ∪ Σf i ∪ Σci →
The conjunction of all the above formulas for a given t is Li
denoted by T (t, t + 1). 2Li ×2 , assigns to each event a set of pairs hφ, ci in
A formula for the initial state s0 is: the propositional language of the component where it
^ ^ occurs (so, for communication events, in each compo-
I0 = (a0 ∧â0 ) ∧ (¬a0 ∧¬â0 ) (2.7) nent separately where they occur),
a∈A,s0 (a)=1 a∈A,s0 (a)=0
• s0 = (s0i ) is the initial state (a valuation of each Ai ).
At last, the following formula can be defined to encode
In this distributed framework, synchronous communication
the fact that a pair of executions is found with the same ob-
is assumed, i.e., communication events are synchronized
servable events and no fault in one execution (first line), but
such that they all occur simultaneously in all components
one fault in the other (second line), which are infinite (in
where they appear. More precisely, a transition by a com-
the form of a non trivial cycle, so containing at least one
munication event c may occur in a component iff a simul-
observable event, 1 at step n; third line), witnessing non di-
taneous transition by c occurs in all the other components
agnosability:
where c appears (has at least one occurrence). In particular,
ΦTn = I0 ∧ T (0, 1) ∧ · · · ∧ T (n − 1, n) ∧ all events before c in trajectories in all these components
n−1
_ _ _ necessarily occur before all events after c in these trajecto-
eto ∧ ries. The global model of the system is thus nothing else that
t=0 e∈Σf o∈δ(e)
the product of the models of the components, synchronized
on communication events. Notice that we allow in whole
n−1
_ ^ n−1
_ _ generality communication events to be, partially or totally,
( ((an ↔ am ) ∧ (ân ↔ âm )) ∧ et ) unobservable, so one has in general to wait further obser-
m=0 a∈A t=m e∈Σo vations to know that some communication event occurred
From this encoding in propositional logic, follows the re- between two or more components. On the other side, as-
sult (theorem 3.2 of [2]) that an SLTS T is not diagnosable suming these communications to be faultless is not actually
if and only if ∃n ≥ 1, ΦTn is satisfiable. It is also equivalent a limitation. If a communication process or protocol may be
to ΦT22|A| being satisfiable, as the twin plant states number is faulty, it has just to be modeled as a proper component with
an obvious upper bound for n, but often impractically high its own correct and faulty behaviors (the same that, e.g., for
(see in [2] some ways to deal with this problem). a wire in an electrical circuit). In this sense, communica-
tions between components are just a modeling concept, not
3 Using SAT in Diagnosability Analysis of subject to diagnosis. It will be also assumed that the observ-
Distributed Systems able information is global, i.e. centralized (when observable
information is only local to each component, distributed di-
We extend from centralized systems to distributed systems agnosability checking becomes undecidable [5]), allowing
the satisfiability framework of subsection 2.2 for testing di- to keep definition 3 for diagnosability.
agnosability and we provide some experimental results.
1 3.2 DSLTS Diagnosability as Satisfiability
This verification that the cycle found is not trivial was not done
in [2]; it is why the authors had to add for each time point a for- Let T be a DSLTS made up of k components denoted by
mula, not needed here, guaranteeing that at least one event took indexes i, 1 ≤ i ≤ k. In order to express the diagnosability
place, to avoid silent loops with no state change. analysis of T as a satisfiability problem, we have to extend
53
Proceedings of the 26th International Workshop on Principles of Diagnosis
the formulas of subsection 2.2 to deal with communication We have tested our tool on small examples with sev-
events between components. Let Σc = Σco ∪ Σcu be the eral communication events with multiple occurrences (three
communication events, with Σco = ∪ki=1 Σco i the observ- communicating components) with global communication
able ones and Σcu = ∪ki=1 Σcu i the unobservable ones. (all components share the same event) or partial commu-
The idea is to treat each communication event as any nication (only some components share the same event), as
other event in each of its owners and, as it has been done in Figure 1, which was the running example in [7].
with events et for e ∈ Σo for synchronizing observable
events occurrences in the two executions, to introduce in the
same way a global reference variable for each communica-
tion event at each time step, in charge of synchronizing any
communication event occurrence in any of its owner with
occurrences of it in all its other owners. We use one such
reference variable for each trajectory, et and êt , for unob-
servable events e ∈ Σcu , and only one for both trajectories,
et , for observable events e ∈ Σco as it will also in addition
play the role of synchronizing observable events between
trajectories exactly as the et for e ∈ Σo . So, we add to the
previous propositional variables the new following ones:
• eto , êto for all e ∈ Σc , o ∈ δ(e) = ∪i δi (e) and
t ∈ {0, . . . , n − 1},
• et for all e ∈ Σc , êt for all e ∈ Σcu and
t ∈ {0, . . . , n − 1}. Figure 1: A DDES made up of 3 components C1, C2 and
Formulas in T (t, t + 1) are extended as follows. C3 from left to right. ci ,1≤i≤2 are unobservable communi-
cation events, oi ,1≤i≤5 are observable events and fi ,1≤i≤2
1. Formulas (2.1), (2.2), (2.3) and (2.5) extend unchanged are faulty events.
to eto and êto ∀e ∈ Σc , expressing that a communication
event must be possible and has effects in each of its The total number of propositional variables V arsN um
owner components and that two such different events in the generated formula ΦTn after n steps is:
cannot be simultaneous if they interfere. PObs
V arsN um = n × (2|A| + 3 i=1 ObOcci +
PF aults PU nobs
2. Formulas (2.4) extend to prevent two simultaneous oc- i=1 F aultOcci + 2 i=1 U nobOcci ), where:
currences of a given communication event in the same |A| is the total number of state variables,
owner component, i.e. apply ∀e ∈ Σc , ∀i, ∀{oi , oi 0} ⊆ Obs the total number of observable events,
δi (e), oi 6= oi 0 and the same with ê (obviously they do ObOcci the total number of occurrences of the observable
not apply to different owner components, by the very event ei ,
definition of communication events). F aults the total number of faults,
3. Finally, the new following formulas express the com- F aultOcci the total number of occurrences of the faulty
munication process itself, i.e. the synchronization of event ei ,
the occurrences of any communication event e in all its U nobs the total number of unobservable correct events,
owners components (S(e) being the set of indexes of U nobOcci the total number of occurrences of the unob-
the owners components of e) and extend also formulas servable correct event ei .
(2.6) to observable communication events: The results are in Table 1, where the columns show the
system and the fault considered (3 cases), the steps number
_ _ n, the numbers of variables and clauses and the runtime.
etoi ↔ et and êtoi ↔ êt ∀e ∈ Σcu ∀i ∈ S(e)
oi ∈δi (e) oi ∈δi (e) System Fault |Steps| SAT? |Variables| |Clauses| runtime(ms)
_ _ C2 f2 4 No 106 628 27
etoi ↔ et and êtoi ↔ et ∀e ∈ Σco ∀i ∈ S(e) C2 f2 5 Yes 131 783 15
C2, C3 f2 5 No 225 1157 28
oi ∈δi (e) oi ∈δi (e) C2, C3 f2 32 No 1386 7340 641
C2, C3 f2 64 No 2762 14668 1422
The formula ΦTn is unchanged except that, in the verification C2, C3 f2 128 No 5514 29324 5061
C2, C3 f2 256 No 11018 58636 18970
that the found cycle (third line) is not trivial, any observable C2, C3 f2 512 No 22026 117260 130164
event can be used, so the final disjunct of events et is ex- C2, C3 f2 1024 No 44042 234508 548644
tended to all e ∈ Σo ∪ Σco . We have thus the result that a C1, C2, C3 f1 8 No 576 3546 91
C1, C2, C3 f1 9 Yes 646 3987 110
DSLTS T is not diagnosable if and only if ∃n ≥ 1, ΦTn is
satisfiable. Table 1: Results on the example of Figure 1.
Which means that f 2 is not diagnosable in C2 alone
3.3 Implementation and Experimental Testing while it becomes diagnosable when synchronizing C2 and
We have implemented the above extension in Java. We used C3. For this last result, we have increased the steps number
the well designed API of the SAT solver Sat4j [6]. If more until reaching 22|A| , which is the theoretical upper bound of
efficient solvers could have been chosen, it fitted well our the twin plant states represented in the logical formula. As
clause generator written in Java and only a limited speed in general it is not always possible to reach this bound in
up can be awaited from C++ solvers (a speed up of 4, i.e. practice, we propose in section 4 using incremental SAT to
reduction of 75% of the runtime is often observed). improve the management of increasing steps number. While
54
Proceedings of the 26th International Workshop on Principles of Diagnosis
f 1 is not diagnosable even after synchronizing all three of the behavior of both trajectories (represented by the con-
components together. Numbers of variables and clauses are junction of formulas T (t, t+1), 0 ≤ t ≤ n−1, representing
small in comparison to what SAT solvers can handle (up to the (t + 1)th step). The second part Dn describes the diag-
hundred thousands propositional variables and millions of nosability property at step n, i.e., the occurrence of a fault
clauses). These tests are mentioned as a proof of concept. in the n previous steps of the faulty trajectory (given by the
However, to test the tool on larger systems and because of formula Fn ) and the detection of a cycle at step n (given by
the absence of benchmark in the literature, we have created the formula Cn ). So we obtain, for n ≥ 1:
in subsection 4.2. an example that can be scaled up.
ΦTn = Tn ∧ Dn
4 Adaptation to Incremental SAT n−1
^
Tn = I0 ∧ T (t, t + 1) Dn = Fn ∧ Cn
Diagnosability Checking t=0
We adapt satisfiability algorithms for checking diagnosabil- n−1
_ _ _
ity of both centralized (subsection 2.2) and distributed (sub- Fn = eto
section 3.2) DES in order to incrementally process the max- t=0 e∈Σf o∈δ(e)
imum length of paths with cycles searched for witnessing n−1 n−1
non diagnosability and we provide experimental results. _ ^ _ _
Cn = ( ((an ↔ am ) ∧ (ân ↔ âm )) ∧ et )
4.1 Diagnosability as Incremental Satisfiability m=0 a∈A t=m e∈Σo
Two cases have to be distinguished while testing diagnos- Add now at each step j a control variable hj allowing to
ability using SAT solvers to verify the satisfiability of the disable (when its truth value is F alse) or activate (when its
logical formula ΦTn for a given n [2]. The first case is when truth value is T rue) the formulas Fj and Cj and keep at step
we find a model for ΦTn , which definitely indicates the non n all these controlled formulas for 1 ≤ j ≤ n. We obtain
diagnosability of the studied fault. The second case is when the following ΨTn formula, for n ≥ 1:
we do not find such a model: this result indicates just that the n
^
studied fault has not been found non diagnosable according ΨTn = Tn ∧ Dj 0 Dj 0 = Fj 0 ∧ Cj 0 1≤j≤n
to the value of n. In other words, after testing all the possible j=1
first n steps, we did not find a pair of executions of length
at most n containing cycles such that one of them contains Fj 0 = ¬hj ∨ Fj Cj 0 = ¬hj ∨ Cj 1≤j≤n
the fault and not the other and such that the two executions We have thus the equivalence, for all n ≥ 1:
are equivalent in terms of observation. However, as the the-
n−1
^
oretical upper bound n = 22|A| which would guarantee that
ΦTn ≡ ΨTn ∧ hn ∧ ¬hj
the fault is actually diagnosable is often in practice unreach-
j=1
able, such a pair may exist for a greater value of n. Testing
it means increasing n and rebuilding the logical formula ΦTn This allows one, for all n ≥ 1, to replace the SAT call on
then recalling the SAT solver. ΦTn by a SAT call on ΨTn under the control variables set-
Instead, we propose to adapt the formula ΦTn in order to ting given by Hn = {¬h1 , . . . , ¬hn−1 , hn } (indicated in a
be tested in an incremental SAT mode by multiple calls to second argument of the call):
a Conflict Driven Clause Learning (CDCL) solver. Using
CDCL solvers in a specialized, incremental, mode is rela- SAT (ΦTn ) = SAT (ΨTn , Hn )
tively new but already widely used [8] in many applications. The idea is now to consider the control variables hj as as-
In this operation mode, the solver can be called many times sumptions and use incremental SAT calls IncSATj under
with different formulas. However, solvers are designed to varying assumptions, for 1 ≤ j ≤ n. For this, we use
work with similar formulas, where clauses are removed and the following recurrence relationships for both formulas ΨTj
added from calls to calls. Learnt clauses can be kept as soon and assumptions Hj :
as the solver can ensure that clauses used to derive them are
not removed. This is generally done by adding specialized ΨT0 = I0 ΨTj+1 = ΨTj ∧ T (j, j + 1) ∧ Dj+1 0 j≥0
variables, called assumptions, to each clause that can be re- H1 = {h1 } Hj+1 = Hj [{¬hj , hj+1 }] j ≥ 1
moved. By assuming the variable to be F alse, the clause
is activated and by assuming the variable to be T rue, the where the notation Hj [{assi }] means updating in Hj
clause is trivially satisfied and no longer used by the solver. assumptions hi by their new settings assi , i.e., in the
What is interesting for our purpose is that the CDCL solver formula above, replacing the truth value of hj , which was
can save clauses learnt during the previous calls and test T rue, by F alse, and adding the new assumption hj+1
multiple assumptions in each new call. This means that af- with truth value T rue. From these relationships, the unique
ter n steps we hope that the solver will have learnt some call to SAT under given assumptions SAT (ΨTn , Hn ) can
constraints about the behavior of the system. Although we be replaced, starting with the set of clauses I0 , by multiple
are interested in testing the diagnosability property on a de- calls, 0 ≤ j ≤ n − 1, to an incremental SAT under varying
fined system, this property is independent from the system assumptions:
behavior which can be learnt by the solver from the previous IncSATj+1 (N ewClausesj+1 , N ewAssumptionsj+1 )
calls.
In order to extend the clauses representation given in sub- = IncSATj+1 (T (j, j + 1) ∧ Dj+1 0 , {¬hj , hj+1 }) (4.1)
sections 2.2 and 3.2 to this mode of operation, we propose If IncSATj answers SAT, the search is stopped as non diag-
to divide the formula ΦTn in two parts. The first part Tn de- nosability is proved, if it answers UNSAT, then IncSATj+1
scribes the first n steps, synchronized on the observations, is called.
55
Proceedings of the 26th International Workshop on Principles of Diagnosis
Notice that we used a unique assumption hj for control- model (for k = 3, 13, 23, 33, 43 and 63). The length of a pair
ling both Fj and Cj as non diagnosability checking requires of executions with cycles witnessing the non diagnosability
the presence of both a fault occurrence in the faulty trajec- of f in each example is k + 2 and we consider the satisfia-
tory and of a cycle. But the same framework allows the bility of the formula ΦTk+2 , so the number of steps required
independent control of formulas by separate assumptions. for SAT to provide the answer Yes is: |Steps] = k + 2. In
For sake of simplicity, we also assumed we called IncSAT order to obtain a fair comparison between IncSAT , which
at each step, but this is not mandatory and indexes j for the manages internally by handling assumptions the successive
successive calls can be decoupled from indexes t for steps. satisfiability checks of increasing formulas for j = 1, . . . ,
We should also say that, even if IncSAT allows us to re- k + 2, and SAT, for which k + 2 successive calls are made to
activate an already disabled clause, we are sure in our case the solver with respective formulas ΦTn for n = 1, . . . , k+2,
to never use this function (when hk has been set to F alse, the sum of the k + 2 runtimes of the SAT solver calls are
it always remains so) and we can thus force the solver to considered in this case (last column in the tables).
do a hard simplification process that removes the forgotten
clauses permanently. As a result of our adaptation we will |Steps| |Clauses| Inc. SAT(s) SAT(s)
be able to scale up the size of the tested system and the dis-
tance and length of a cycle witnessing non diagnosability. 20 42,614 1.5 1.3
30 131,714 10.3 13.1
4.2 Experimental Results 40 303,736 49.3 77.8
50 576,466 106 223
We show in this subsection a comparison between our 60 970,156 320 699
adapted version of subsection 4.1, that uses incremental 100 4,334,018 9410 13040
SAT, and the previous versions, for centralized model (sub-
section 2.2 following [2]) and for distributed model (subsec-
tion 3.2). We have created the example in Figure 2 which Table 2: Results on the faulty component of Figure 2.
contains 2k + 1 components: one faulty component and
two sets of k neighboring components. The faulty compo-
nent has two separated paths, each one containing k differ- |Steps| |Comps| |Clauses| Inc. SAT(s) SAT(s)
ent successive unobservable events ci and ending with the
5 7 1,962 0.04 0.06
same observable cycle of length 1, but only one of them
15 27 30,313 0.8 0.5
contains the fault. The centralized model will be limited to
25 47 113,906 6.5 4.8
this faulty component alone and thus in this case the events
35 67 277,873 33.8 33.7
ci , 1 ≤ i ≤ 2k, are just unobservable events as is u. In
45 87 542,033 111 132
the distributed model, these events ci are communication
65 127 1,490,590 967 1090
events and the faulty component is considered with the other
two sets of components, where each component in both sets Table 3: Results on the whole system of Figure 2.
shares one event ci with the faulty component to ensure a
number 2k of communications before arriving to the cycles
that will witness the non diagnosability of the fault. Each Although these examples remain relatively simple and do
set of components will be synchronized with only one path, not reflect any potential constraint that could be resumed by
either the faulty path or the correct one. This allows us to some learnt clauses (e.g. no interfering events), we can al-
study the effect of the cycle distance in both models. ready notice the difference in runtime in favor of our incre-
mental version in the centralized case and for the two largest
values of k in the distributed case. This difference could be
explained by the fact that generating all variables from the
beginning for all time steps and for all events imply many
meaningless clauses that would add a load on the solver in
the version in [2], this load being avoided in our incremen-
tal version because of the clauses learnt by the CDCL SAT
solver. From another side, we should say that generating in
both versions all variables from the beginning has two main
advantages: firstly, it allows the system description without
unfolding it (even if this description is verbose); secondly,
it allows the ordering of these variables by their time step
in order to generate the constraints for only one time step
and then get next steps constraints by just shifting the num-
bers (as we are representing the clauses in DIMACS for-
mat). One last point could help to a more efficient descrip-
tion of the system: in the succinct systems we represent all
the occurrences of an event together, but in its SAT encod-
Figure 2: One faulty component that communicates with ing we “unfold” this succinctness by generating for each
two sets of k components. Each set communicates with one occurrence n variables (for n time steps), even though log-
path (resp. faulty and correct) in the faulty component. ically only one of them will be assigned to True. We could
thus mark this relation among these n copies by introducing
The results are in Table 2 for the centralized model (for k a global cardinality constraint to express that these copies
= 18, 28, 38, 48, 58 and 98) and in Table 3 for the distributed belong to only one occurrence of an event.
56
Proceedings of the 26th International Workshop on Principles of Diagnosis
5 Selection of Related Works faults.
The work by [11] has optimized the construction of lo-
The first introduction to the notion of diagnosability was by cal twin plants, by exploiting the fact that one distinguishes
[1]. The authors studied diagnosability of FSM, as defined
two behaviors (faulty and correct) and one synchronizes at
in definition 1. Their formal definition of diagnosability is two levels (observations first and communications later). It
the one we mentioned in definition 3. They introduced an improved the construction of the twin plants proposed by
approach to test this property by constructing a deterministic [7] by exploiting the different identifiers given to the com-
diagnoser. However, in the general case, this approach is munication events at the observation synchronization level
exponential in the number of states of the system, which (depending on which instance, left or right, they belong to)
makes it impractical. to assign them directly to the two behaviors studied (left
In order to overcome this limitation [3] introduced the copy assigned to the faulty behavior, right copy to the cor-
twin plant approach, which is a special structure built by rect one). This helped in deleting the redundant informa-
synchronizing on their observable events two identical in- tion, then in abstracting the amount of information to be
stances of a nondeterministic fault diagnoser, and then transferred later to next steps if the diagnosability was not
searched for a path in this structure with an observed cy- answered. The generalization to fault patterns in DDES was
cle made up of ambiguous states, i.e. states that are pairs introduced by [12].
of original states, one reached by going through a fault and After the reduction of diagnosability problem to a path
the other not. Thus faults diagnosability is equivalent to the finding problem by [3], it became transferable to a satis-
absence of such a path, called a critical path. This approach fiability problem like it is the case for planning problems
turns the diagnosability problem in a search for a path with [13]. This was done by [2] which formulated the diagnos-
a cycle in a finite automaton, and this reduces its complexity ability problem (in its twin plant version) into a SAT prob-
to be polynomial of degree 4 in the number of states (and ex- lem, assuming a centralized DES with simple fault events.
ponential in the number of faults, but processing each fault The authors represented the studied transition system by a
separately makes its linear in the number of faults). succinct representation (cf. definition 2). This allows both
Let us mention here that the two previous works were in- a compact representation of the system states and a max-
terested in centralized systems with simple faults modeled imum amount of non interfering events to be fired simul-
as distinguished events. The first studies about fault pat- taneously. Thus, they represented the system states by the
terns were introduced in [9] and [10] which generalize the valuation of a set of Boolean state variables (dlog(q)e state
simple fault event in a centralized DES to handle a sequence variables for q states) and the interference relation between
of events considered together as a fault, or handle multiple two events according to the consistency among their effects
occurrences of the same fault or of different faults. More and preconditions, one versus the other. They distinguished
generally, a fault pattern is given as a suffix-closed rational between an occurrence of an event in the faulty sequence or
events language (so by a complete deterministic automaton in the correct sequence by introducing two versions of it and
with a stable subset of final states). constructed the logical formula expressing states transitions
The first work that addressed diagnosability analysis in for each possible step in the system. Each step may con-
DDES was [7]. A DDES is modeled as a set of communicat- tain simultaneous events that belong to faulty and correct
ing FSM. Each FSM has its own events set, communication sequences but must synchronize the occurrence of observ-
events being the only ones shared by at least two different able events whenever they take place. For a given bound n
FSM. In [7] was introduced an incremental diagnosability of paths length, they made the conjunct of these formulas
test which avoids to build the twin plant for the whole dis- for n steps and added the logical formula that represents the
tributed system if not needed. Thus one starts by building occurrence of the fault in the faulty sequence and the oc-
a local twin plant for the faulty component to test the exis- currence of a cycle in both sequences. The satisfiability of
tence of a local critical path. If such a path exists one builds the obtained formula is equivalent to finding a critical path,
the local twin checkers of the neighboring components. Lo- i.e. to the non diagnosability of the fault (see subsection 2.2
cal twin checker is a structure similar to local twin plant, for a summary of this approach). Although this approach
i.e., where each path in it represents a pair of behaviors with allows one to test diagnosability in large systems, it has a
the same observations, except that there is no fault infor- limitation which is that we cannot dynamically increase n
mation in it since it is constructed from non-faulty compo- to ensure reaching more states while scaling up the size of
nent. After constructing local twin checkers, one tries to the system where the cycles that witness non diagnosabil-
solve the ambiguity resulting from the existence of a critical ity can be very long. However the authors notice that we
path in the local twin plant. This is done by synchronizing are not always forced to test all reachable states in many
on their communication events this local twin plant with the cases where an approximation for the reachable states can
local twin checker of one neighboring component. In other be applied, but without explaining explicitly how such an
words, one tries to distinguish the faulty path from the cor- approximation can be found.
rect one by exploiting the observable events in the neigh-
boring components, because theses events occurrences that
are consistent with the occurrences of the communication
6 Conclusion and Future Works
events could solve the ambiguity. The process is repeated By extending the state of the art works for centralized DES,
until the diagnosability is answered, so only in the worst we have expressed diagnosability analysis of DDES as a
case has the whole system to be visited. Another impor- satisfiability problem by building a propositional formula
tant contribution in this work was to delete the unambigu- whose satisfiability, witnessing non diagnosability, can be
ous parts after each synchronization on the communication checked by SAT solvers. We allow both observable and
events, reducing thus the amount of information transferred non observable communication events in our model. Our
to next check (if needed). The approach assumed simple expression of these communication events, which avoids
57
Proceedings of the 26th International Workshop on Principles of Diagnosis
merging all their owner components, helps in reducing the [2] J. Rintanen and A. Grastien. Diagnosability testing
number of clauses used to represent them and this reduction with satisfiability algorithms. In Proceedings of the
is proportional to the number of their occurrences. We have 20th International Joint Conference on Artificial Intel-
also proposed an adaptation of the logical formula in order ligence (IJCAI’07), pages 532–537, 2007.
to use incremental SAT calls helping managing the scaling [3] S. Jiang, Z. Huang, V. Chandra, and R. Kumar. A poly-
up of the distance and the length of the intended cycles nomial algorithm for testing diagnosability of discrete-
witnessing non diagnosability and thus the size of the tested event systems. IEEE Transactions on Automatic Con-
system. Thus we exploited the clauses learnt about the trol, 46(8):1318–1321, 2001.
system behavior in the previous calls. This approach is
more practical and more efficient for complex systems than [4] J. Rintanen. Diagnosers and diagnosability of succinct
existing ones, as it avoids starting from scratch at each call. transition systems. In Proceedings of the 20th Interna-
tional Joint Conference on Artificial Intelligence (IJ-
We are now considering the extension of this work to CAI’07), pages 538–544, 2007.
fault patterns diagnosability [12]. We will use the same ap- [5] L. Ye and P. Dague. Undecidable case and decidable
proach to express predictability analysis [14] as a satisfia- case of joint diagnosability in distributed discrete event
bility problem, for DES and DDES [15] and both for simple systems. International Journal On Advances in Sys-
fault events and fault patterns [16] . Although our represen- tems and Measurements, 6(3 and 4):287–299, 2013.
tation can be easily extended to deal with local observations [6] D. Le Berre and A. Parrain. The sat4j library, release
(i.e., observable events in one component are observed only 2.2. Journal on Satisfiability, Boolean Modeling and
by this component), we know that in general diagnosability Computation, 7:59–64, 2010.
checking becomes then undecidable, e.g. when communica-
tion events are unobservable (obviously it remains decidable [7] Y. Pencolé. Diagnosability analysis of distributed dis-
when these events are observable in all their owners) [5]. A crete event systems. In Proceedings of the 16th Euro-
future work will be to study decidable cases of diagnosabil- pean Conference on Artificial Intelligence (ECAI’04),
ity checking in DDES with local observations, e.g. assum- 2004.
ing some well chosen communication events being observ- [8] A. Nadel and V. Ryvchin. Efficient SAT solving under
able. Another natural question is to study if the methods assumptions. In Proceedings of the 15th International
used in [7] and refined in [11] to check diagnosability in Conference on Theory and Applications of Satisfiabil-
DDES in an incremental way in terms of the system com- ity Testing (SAT’12), 2012.
ponents could be transposed as guiding strategies for some [9] T. Jéron, H. Marchand, S. Pinchinat, and M.-O.
component incremental SAT based approach for testing di- Cordier. Supervision patterns in discrete event sys-
agnosability in DDES. Transposing in SAT these methods, tems diagnosis. In Proceedings of the 8th International
based on building a local twin plant and local twin check- Workshop on Discrete Event Systems, 2006.
ers for gaining efficiency with regards to a global checking,
seems difficult. Basically, at any step k, corresponding to [10] S. Genc and S. Lafortune. Diagnosis of patterns in
considering a subsystem made up of k components, these partially-observed discrete-event systems. In Proceed-
methods build all critical paths witnessing non diagnosabil- ings of the 45th IEEE Conference on Decision and
ity at the level of this subsystem and the incremental step, Control, pages 422–427. IEEE, 2006.
when adding a (k + 1)th neighboring component, consists [11] L. Ye and P. Dague. An optimized algorithm for diag-
in checking the consistency of these pairs with the observa- nosability of component-based systems. In Proceed-
tions in the new component: only those pairs which can be ings of the 10th International Workshop on Discrete
consistently extended are kept, if any. In addition, in [11], Event Systems (WODES’10), 2010.
only useful and abstracted information is kept from one step [12] L. Ye, Y. Yan, and P. Dague. Diagnosability for pat-
to the next one. With SAT, only one critical pair witness-
terns in distributed discrete event systems. In Proceed-
ing non diagnosability of the subsystem (i.e., a model for
ings of the 21st International Workshop on Principles
the formula) will be built. If it is not consistent, and thus
of Diagnosis (DX’10), 2010.
disappears, when adding the (k + 1)th component, diagnos-
ability is not proven for all that: other critical pairs in the [13] H. Kautz and B. Selman. Planning as satisfiability. In
subsystem, not completely computed at step k, may exist Proceedings of the 10th European Conference on Ar-
and be extendible to step (k + 1). So, they have to be com- tificial Intelligence (ECAI’92), pages 359–363, 1992.
puted now, which limits the incremental characteristic of the [14] S. Genc and S. Lafortune. Predictability of Event
approach. In the same way, abstracting some information Occurrences in Partially-observed Discrete-event Sys-
is difficult to achieve with SAT. So, there is no evidence tems. Automatica, 45(2):301–311, 2009.
a priori that efficiency gain could be obtained by trying to
[15] L. Ye, P. Dague, and F. Nouioua. Predictability Analy-
develop a component incremental SAT based approach for
testing DDES diagnosability. sis of Distributed Discrete Event Systems. In Proceed-
ings of the 52nd IEEE Conference on Decision and
Control (CDC-13), pages 5009–5015. IEEE., 2013.
References [16] T. Jéron, H. Marchand, S. Genc, and S. Lafortune.
Predictability of Sequence Patterns in Discrete Event
[1] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamo- Systems. In Proceedings of the 17th World Congress,
hideen, and D. Teneketzis. Diagnosability of discrete- pages 537–453. IFAC., 2008.
event systems. IEEE Transactions on Automatic Con-
trol, 40(9):1555–1575, 1995.
58