Proceedings of the 26th International Workshop on Principles of Diagnosis Using Incremental SAT for Testing Diagnosability of Distributed DES Hassan IBRAHIM1 and Philippe DAGUE1 and Laurent SIMON2 1 LRI, Univ. Paris-Sud and CNRS, Orsay, France hassan.ibrahim@lri.fr, philippe.dague@lri.fr 2 LaBRI, Univ. Bordeaux and CNRS, Bordeaux, France lsimon@labri.fr Abstract positioning the sensors to manage the observation require- ments. The main difficulty in diagnosability algorithms is We extend in this work the existing approach to related to the states number explosion. Another difficulty analyse diagnosability in discrete event systems appears when checking diagnosability of a system which (DES) using satisfiability algorithms (SAT), in or- is actually diagnosable, i.e. the inexistence of a counter- der to analyse the diagnosability in distributed example witnessing non diagnosability. Thus all possibili- DES (DDES) and we test this extension. For this, ties need to be tested as for proving the non existence of a we handle observable and non observable com- plan in a planning problem, and usually in this case some munication events at the same time. We also pro- approximations are used to avoid exploring all the search pose an adaptation to use incremental SAT over space. the existing and the extended approaches to over- The paper is structured as follows. Section 2 will introduce come some of the limitations, especially concern- the system transition models for centralized DES and recall ing the length and the distance of the cycles that the traditional definition of the diagnosability in those mod- witness the non diagnosability of the fault, and els and the state of the art of encoding this definition as a improve the process of dealing with the reacha- satisfiability problem in propositional logic. Section 3 will bility limit when scaling up to large systems. present our first contribution, an extension of this state of the art to DDES with observable and non observable communi- 1 Introduction cation events in the same model, and will give experimental results of this extension. Section 4 is devoted to our sec- Diagnosis task is mainly using the available observations to ond contribution, using incremental SAT calls to overcome explain the difference between the expected behavior of a the limitation when the number of steps required to check system and its real behavior which may contain some faults. diagnosability, i.e., the length of possible paths with cycles Many works have been done to study the automatic ap- witnessing non diagnosability, is large, and will present ex- proaches to system fault diagnosis. They all try to deal with perimental results showing how the method scales up. Sec- the main problem, i.e. the compromise between the number tion 5 will present related works and section 6 will conclude of possible diagnoses to the considered faults and the num- and give our perspectives for future work. ber of observations which must be given to make the deci- sion. Diagnosis problem is NP-hard and one always needs to cope with an explosion in the number of system model 2 Using SAT in Diagnosability Analysis of states. Moreover, the diagnosis decision is not always cer- Centralized Systems tain, and thus running a diagnosis algorithm may not be ac- curate. For example, two sets of observations provided by We recall first the definitions of DES models we use and of different sets of sensors or at different times may lead to diagnosability for these models. different diagnoses. This uncertainty raises the problem of diagnosability which is essential while designing the system 2.1 Preliminaries model. After that, the model based diagnosis will be used We will use finite state machines (FSM) to model systems. in applications to explain any anomaly, with a guarantee of We define labeled transition systems following [1]. correctness and precision at least for anticipated faults. Diagnosability of the considered systems is a property de- Definition 1. A Labeled Transition System (LTS) is fined to answer the question about the possibility to distin- a tuple T = hX, Σo , Σu , Σf , δ, s0 i where: guish any possible faulty behavior in the system from any other behavior without this fault (i.e., correct or with a dif- • X is a finite set of states, ferent fault) within a finite time after the occurrence of the • Σo is a finite set of observable correct events, fault. A fault is diagnosable if it can be surely identified from the partial observation available in a finite delay af- • Σu is a finite set of unobservable correct events, ter its occurrence. A system is diagnosable if every possible • Σf is a finite set of unobservable faulty events, fault in it is diagnosable. This property provides information • δ ⊆ X × (Σo ∪ Σu ∪ Σf ) × X is the transition relation, before getting into finding the explanations of the fault. It also helps in designing a robust system against faults and in • s0 is the initial state. 51 Proceedings of the 26th International Workshop on Principles of Diagnosis In [2] the authors used an equivalent but more compact that L(T ) is live (i.e., for any state, there is at least one tran- representation than LTS for modeling systems in order to sition issued from this state) and convergent (i.e., there is no analyze their diagnosability: succinct transition systems, cycle made up only of unobservable events). that exploit the regularity in the systems structures and A system T is said to be diagnosable iff any fault f ∈ Σf are expressed in terms of propositional variables, which is diagnosable in T . In order to avoid exponential complex- allowed them to translate more easily to a SAT problem the ity in the number of faults during diagnosability analysis, twin plant method proposed by [3] for checking diagnos- only one fault at a time is checked for diagnosability. It will ability. thus be assumed in the following that there exists only one As we aim at studying the diagnosability of DDES using fault event f (Σf = {f }), without restriction on the num- SAT solvers, we will follow the model of [2] who stud- ber of its occurrences. Diagnosability checking has been ied the same problem in centralized DES. It represents proved in [3] to be polynomial in the number |X| of states the system states by the valuations of a finite set A of for LTS, so exponential in the number |A| of state variables Boolean state variables where valuation changes reflect for SLTS (actually the problem is NLOGSPACE-complete the transitions between states according to the events. The for LTS and PSPACE-complete for SLTS [4]). set of all literals issued from A is L = A ∪ {¬a|a ∈ A} and L is the language over A that consists of all formulas 2.2 SLTS Diagnosability as Satisfiability that can be formed from A and the connectives ∨ and An immediate rephrasing of the definition 3 shows that T is ¬. We use the standard definitions of further connec- non diagnosable iff it exists a pair of trajectories correspond- tives Φ ∧ Ψ ≡ ¬(¬Φ ∨ ¬Ψ), Φ → Ψ ≡ ¬Φ ∨ Ψ and ing to cycles (and thus to infinite paths), a faulty one and Φ ↔ Ψ ≡ (Φ → Ψ) ∧ (Ψ → Φ). The transition relation a correct one, sharing the same observable events. Which is defined to allow two or more events to take place is equivalent to the existence of an ambiguous (i.e. made simultaneously. Thus each event is described by a set of up of pairs of states respectively reachable by a faulty path pairs hφ, ci which represent its possible ways of occurrence and a correct path) cycle in the product of T by itself, syn- by indicating that the event can be associated with changes chronized on observable events, which is at the origin of the c ∈ 2L in states that satisfy the condition φ ∈ L. so called twin plant structure introduced in [3]. This non diagnosability test was formulated in [2] as a satisfiability Definition 2. A Succinct Transition System (SLTS) problem in propositional logic. We recall below this encod- is described by a tuple T = hA, Σo , Σu , Σf , δ, s0 i where: ing with the variables and the formulas used, where super- • A is a finite set of state variables, scripts t refer to time points and (eto ) and (êto ) refer respec- tively to the faulty and correct events occurrences sequences • Σo is a finite set of observable correct events, (corresponding states being described by valuations of (at ) • Σu is a finite set of unobservable correct events, and (ât )) of a pair of trajectories witnessing non diagnos- • Σf is a finite set of unobservable faulty events, ability (so sharing the same observable events represented L by (et ) and forming a cycle). The increasing of the time • δ : Σ = Σo ∪ Σu ∪ Σf → 2L×2 assigns to each event step corresponds to the triggering of at least one transition a set of pairs hφ, ci, and the extension by an event of at least one of the two tra- • s0 is the initial state (a valuation of A). jectories. T = hA, Σu , Σo , Σf , δ, s0 i being an SLTS, the It is straightforward to show that any LTS can be repre- propositional variables are thus: sented as an SLTS (one takes dlog(|X|)e Boolean variables • at and ât for all a ∈ A and t ∈ {0, . . . , n}, and represents states by different valuations of these vari- • eto for all e ∈ Σo ∪ Σu ∪ Σf , o ∈ δ(e) and t ∈ {0, . . . , ables; one assigns to each occurence of an event e labeling n − 1}, a transition (x, e, y) a pair hφ, ci, with φ expressing the valuation of x and c the valuation changes between x and • êto for all e ∈ Σo ∪ Σu , o ∈ δ(e) and t ∈ {0, . . . , y). And reciprocally any SLTS can be mapped to an LTS n − 1}, (see Definition 2.4 in [2]). • et for all e ∈ Σo and t ∈ {0, . . . , n − 1}. The formal definition of diagnosability of a fault f in a The following formulas express the constraints that must be centralized system modeled by (an LTS or SLTS) T was applied at each time step t or between t and t + 1. proposed by [1] as follows: 1. The event occurrence eto must be possible in the current Definition 3. Diagnosability. A fault f is diagnos- state: able in a system T iff eto → φt for o = hφ, ci ∈ δ(e) (2.1) ∃k ∈ N, ∀sf ∈ L(T ), ∀t ∈ L(T )/sf , |t| ≥ k ⇒ and its effects must hold at the next time step: ∀p ∈ L(T ), (P (p) = P (sf .t) ⇒ f ∈ p). ^ In this formula, L(T ) denotes the prefix-closed language of eto → lt+1 for o = hφ, ci ∈ δ(e) (2.2) T whose words are called trajectories, sf any trajectory end- l∈c ing by the fault f , L(T )/s the post-language of L(T ) after s, i.e., {t ∈ Σ∗ |s.t ∈ L(T )} and P the projection of tra- We have the same formulas with êto . jectories on observable events. The above definition states 2. The present value (T rue or F alse) of a state variable that for each trajectory sf ending with fault f in T , for each changes to a new value (F alse or T rue, respectively) t that is an extension of sf in T with enough events, every only if there is a reason for this change, i.e., because of trajectory p in T that is equivalent to sf .t in terms of obser- an event that has the new value in its effects (so, change vation should contain in it f . As usual, it will be assumed without reason is prohibited). Here is the change from 52 Proceedings of the 26th International Workshop on Principles of Diagnosis T rue to F alse (the change from F alse to T rue is de- 3.1 DDES Modeling fined similarly by interchanging a and ¬a): In order to model DDES with SLTS, we need to extend (at ∧ ¬at+1 ) → (eti1 o ∨ · · · ∨ etik o ) (2.3) these ones by adding communication events to each com- j1 jk ponent. So we use the following definition for a distributed where the ojl = hφjl , cjl i ∈ δ(eil ) are all the occur- SLTS with k different components (sites): rences of events eil with ¬a ∈ cji . We have the same formulas with ât and êtil o . Definition 4. A Distributed Succinct Transition System (DSLTS) with k components is described by a tuple jl 3. At most one occurrence of a given event can occur at T = hA, Σo , Σu , Σf , Σc , δ, s0 i where (subscripts i refer to a time and the occurrences of two different events can- component i): not be simultaneous if they interfere (i.e., if they have two contradicting effects or if the precondition of one • A is a union of disjoint finite sets (Ai )1≤i≤k of com- contradicts the effect of the other): ponent own state variables, A = ∪ki=1 Ai , ¬(eto ∧ eto0 ) ∀e ∈ Σ, ∀{o, o0} ⊆ δ(e), o 6= o0 (2.4) • Σo is a union of disjoint finite sets of component own observable correct events, Σo = ∪ki=1 Σoi , ¬(eto ∧ e0t o0 ) ∀{e, e0} ⊆ Σ, e 6= e0, ∀o ∈ δ(e), • Σu is a union of disjoint finite sets of component own ∀o0 ∈ δ(e0) such that o and o0 interfere (2.5) unobservable correct events, Σu = ∪ki=1 Σui , We have the same formulas with êto . • Σf is a union of disjoint finite sets of component own 4. The formulas that connect the two events sequences unobservable faulty events, Σf = ∪ki=1 Σf i , require that observable events take place in both se- • Σc is a union of finite sets of (observable or unobserv- quences whenever they take place (use of et ): _ _ able) correct communication events, Σc = ∪ki=1 Σci , eto ↔ et and êto ↔ et ∀e ∈ Σo (2.6) which are the only events shared by at least two differ- o∈δ(e) o∈δ(e) ent components (i.e., ∀i, ∀c ∈ Σci , ∃j 6= i, c ∈ Σcj ), • δ = (δi ), where δi : Σi = Σoi ∪ Σui ∪ Σf i ∪ Σci → The conjunction of all the above formulas for a given t is Li denoted by T (t, t + 1). 2Li ×2 , assigns to each event a set of pairs hφ, ci in A formula for the initial state s0 is: the propositional language of the component where it ^ ^ occurs (so, for communication events, in each compo- I0 = (a0 ∧â0 ) ∧ (¬a0 ∧¬â0 ) (2.7) nent separately where they occur), a∈A,s0 (a)=1 a∈A,s0 (a)=0 • s0 = (s0i ) is the initial state (a valuation of each Ai ). At last, the following formula can be defined to encode In this distributed framework, synchronous communication the fact that a pair of executions is found with the same ob- is assumed, i.e., communication events are synchronized servable events and no fault in one execution (first line), but such that they all occur simultaneously in all components one fault in the other (second line), which are infinite (in where they appear. More precisely, a transition by a com- the form of a non trivial cycle, so containing at least one munication event c may occur in a component iff a simul- observable event, 1 at step n; third line), witnessing non di- taneous transition by c occurs in all the other components agnosability: where c appears (has at least one occurrence). In particular, ΦTn = I0 ∧ T (0, 1) ∧ · · · ∧ T (n − 1, n) ∧ all events before c in trajectories in all these components n−1 _ _ _ necessarily occur before all events after c in these trajecto- eto ∧ ries. The global model of the system is thus nothing else that t=0 e∈Σf o∈δ(e) the product of the models of the components, synchronized on communication events. Notice that we allow in whole n−1 _ ^ n−1 _ _ generality communication events to be, partially or totally, ( ((an ↔ am ) ∧ (ân ↔ âm )) ∧ et ) unobservable, so one has in general to wait further obser- m=0 a∈A t=m e∈Σo vations to know that some communication event occurred From this encoding in propositional logic, follows the re- between two or more components. On the other side, as- sult (theorem 3.2 of [2]) that an SLTS T is not diagnosable suming these communications to be faultless is not actually if and only if ∃n ≥ 1, ΦTn is satisfiable. It is also equivalent a limitation. If a communication process or protocol may be to ΦT22|A| being satisfiable, as the twin plant states number is faulty, it has just to be modeled as a proper component with an obvious upper bound for n, but often impractically high its own correct and faulty behaviors (the same that, e.g., for (see in [2] some ways to deal with this problem). a wire in an electrical circuit). In this sense, communica- tions between components are just a modeling concept, not 3 Using SAT in Diagnosability Analysis of subject to diagnosis. It will be also assumed that the observ- Distributed Systems able information is global, i.e. centralized (when observable information is only local to each component, distributed di- We extend from centralized systems to distributed systems agnosability checking becomes undecidable [5]), allowing the satisfiability framework of subsection 2.2 for testing di- to keep definition 3 for diagnosability. agnosability and we provide some experimental results. 1 3.2 DSLTS Diagnosability as Satisfiability This verification that the cycle found is not trivial was not done in [2]; it is why the authors had to add for each time point a for- Let T be a DSLTS made up of k components denoted by mula, not needed here, guaranteeing that at least one event took indexes i, 1 ≤ i ≤ k. In order to express the diagnosability place, to avoid silent loops with no state change. analysis of T as a satisfiability problem, we have to extend 53 Proceedings of the 26th International Workshop on Principles of Diagnosis the formulas of subsection 2.2 to deal with communication We have tested our tool on small examples with sev- events between components. Let Σc = Σco ∪ Σcu be the eral communication events with multiple occurrences (three communication events, with Σco = ∪ki=1 Σco i the observ- communicating components) with global communication able ones and Σcu = ∪ki=1 Σcu i the unobservable ones. (all components share the same event) or partial commu- The idea is to treat each communication event as any nication (only some components share the same event), as other event in each of its owners and, as it has been done in Figure 1, which was the running example in [7]. with events et for e ∈ Σo for synchronizing observable events occurrences in the two executions, to introduce in the same way a global reference variable for each communica- tion event at each time step, in charge of synchronizing any communication event occurrence in any of its owner with occurrences of it in all its other owners. We use one such reference variable for each trajectory, et and êt , for unob- servable events e ∈ Σcu , and only one for both trajectories, et , for observable events e ∈ Σco as it will also in addition play the role of synchronizing observable events between trajectories exactly as the et for e ∈ Σo . So, we add to the previous propositional variables the new following ones: • eto , êto for all e ∈ Σc , o ∈ δ(e) = ∪i δi (e) and t ∈ {0, . . . , n − 1}, • et for all e ∈ Σc , êt for all e ∈ Σcu and t ∈ {0, . . . , n − 1}. Figure 1: A DDES made up of 3 components C1, C2 and Formulas in T (t, t + 1) are extended as follows. C3 from left to right. ci ,1≤i≤2 are unobservable communi- cation events, oi ,1≤i≤5 are observable events and fi ,1≤i≤2 1. Formulas (2.1), (2.2), (2.3) and (2.5) extend unchanged are faulty events. to eto and êto ∀e ∈ Σc , expressing that a communication event must be possible and has effects in each of its The total number of propositional variables V arsN um owner components and that two such different events in the generated formula ΦTn after n steps is: cannot be simultaneous if they interfere. PObs V arsN um = n × (2|A| + 3 i=1 ObOcci + PF aults PU nobs 2. Formulas (2.4) extend to prevent two simultaneous oc- i=1 F aultOcci + 2 i=1 U nobOcci ), where: currences of a given communication event in the same |A| is the total number of state variables, owner component, i.e. apply ∀e ∈ Σc , ∀i, ∀{oi , oi 0} ⊆ Obs the total number of observable events, δi (e), oi 6= oi 0 and the same with ê (obviously they do ObOcci the total number of occurrences of the observable not apply to different owner components, by the very event ei , definition of communication events). F aults the total number of faults, 3. Finally, the new following formulas express the com- F aultOcci the total number of occurrences of the faulty munication process itself, i.e. the synchronization of event ei , the occurrences of any communication event e in all its U nobs the total number of unobservable correct events, owners components (S(e) being the set of indexes of U nobOcci the total number of occurrences of the unob- the owners components of e) and extend also formulas servable correct event ei . (2.6) to observable communication events: The results are in Table 1, where the columns show the system and the fault considered (3 cases), the steps number _ _ n, the numbers of variables and clauses and the runtime. etoi ↔ et and êtoi ↔ êt ∀e ∈ Σcu ∀i ∈ S(e) oi ∈δi (e) oi ∈δi (e) System Fault |Steps| SAT? |Variables| |Clauses| runtime(ms) _ _ C2 f2 4 No 106 628 27 etoi ↔ et and êtoi ↔ et ∀e ∈ Σco ∀i ∈ S(e) C2 f2 5 Yes 131 783 15 C2, C3 f2 5 No 225 1157 28 oi ∈δi (e) oi ∈δi (e) C2, C3 f2 32 No 1386 7340 641 C2, C3 f2 64 No 2762 14668 1422 The formula ΦTn is unchanged except that, in the verification C2, C3 f2 128 No 5514 29324 5061 C2, C3 f2 256 No 11018 58636 18970 that the found cycle (third line) is not trivial, any observable C2, C3 f2 512 No 22026 117260 130164 event can be used, so the final disjunct of events et is ex- C2, C3 f2 1024 No 44042 234508 548644 tended to all e ∈ Σo ∪ Σco . We have thus the result that a C1, C2, C3 f1 8 No 576 3546 91 C1, C2, C3 f1 9 Yes 646 3987 110 DSLTS T is not diagnosable if and only if ∃n ≥ 1, ΦTn is satisfiable. Table 1: Results on the example of Figure 1. Which means that f 2 is not diagnosable in C2 alone 3.3 Implementation and Experimental Testing while it becomes diagnosable when synchronizing C2 and We have implemented the above extension in Java. We used C3. For this last result, we have increased the steps number the well designed API of the SAT solver Sat4j [6]. If more until reaching 22|A| , which is the theoretical upper bound of efficient solvers could have been chosen, it fitted well our the twin plant states represented in the logical formula. As clause generator written in Java and only a limited speed in general it is not always possible to reach this bound in up can be awaited from C++ solvers (a speed up of 4, i.e. practice, we propose in section 4 using incremental SAT to reduction of 75% of the runtime is often observed). improve the management of increasing steps number. While 54 Proceedings of the 26th International Workshop on Principles of Diagnosis f 1 is not diagnosable even after synchronizing all three of the behavior of both trajectories (represented by the con- components together. Numbers of variables and clauses are junction of formulas T (t, t+1), 0 ≤ t ≤ n−1, representing small in comparison to what SAT solvers can handle (up to the (t + 1)th step). The second part Dn describes the diag- hundred thousands propositional variables and millions of nosability property at step n, i.e., the occurrence of a fault clauses). These tests are mentioned as a proof of concept. in the n previous steps of the faulty trajectory (given by the However, to test the tool on larger systems and because of formula Fn ) and the detection of a cycle at step n (given by the absence of benchmark in the literature, we have created the formula Cn ). So we obtain, for n ≥ 1: in subsection 4.2. an example that can be scaled up. ΦTn = Tn ∧ Dn 4 Adaptation to Incremental SAT n−1 ^ Tn = I0 ∧ T (t, t + 1) Dn = Fn ∧ Cn Diagnosability Checking t=0 We adapt satisfiability algorithms for checking diagnosabil- n−1 _ _ _ ity of both centralized (subsection 2.2) and distributed (sub- Fn = eto section 3.2) DES in order to incrementally process the max- t=0 e∈Σf o∈δ(e) imum length of paths with cycles searched for witnessing n−1 n−1 non diagnosability and we provide experimental results. _ ^ _ _ Cn = ( ((an ↔ am ) ∧ (ân ↔ âm )) ∧ et ) 4.1 Diagnosability as Incremental Satisfiability m=0 a∈A t=m e∈Σo Two cases have to be distinguished while testing diagnos- Add now at each step j a control variable hj allowing to ability using SAT solvers to verify the satisfiability of the disable (when its truth value is F alse) or activate (when its logical formula ΦTn for a given n [2]. The first case is when truth value is T rue) the formulas Fj and Cj and keep at step we find a model for ΦTn , which definitely indicates the non n all these controlled formulas for 1 ≤ j ≤ n. We obtain diagnosability of the studied fault. The second case is when the following ΨTn formula, for n ≥ 1: we do not find such a model: this result indicates just that the n ^ studied fault has not been found non diagnosable according ΨTn = Tn ∧ Dj 0 Dj 0 = Fj 0 ∧ Cj 0 1≤j≤n to the value of n. In other words, after testing all the possible j=1 first n steps, we did not find a pair of executions of length at most n containing cycles such that one of them contains Fj 0 = ¬hj ∨ Fj Cj 0 = ¬hj ∨ Cj 1≤j≤n the fault and not the other and such that the two executions We have thus the equivalence, for all n ≥ 1: are equivalent in terms of observation. However, as the the- n−1 ^ oretical upper bound n = 22|A| which would guarantee that ΦTn ≡ ΨTn ∧ hn ∧ ¬hj the fault is actually diagnosable is often in practice unreach- j=1 able, such a pair may exist for a greater value of n. Testing it means increasing n and rebuilding the logical formula ΦTn This allows one, for all n ≥ 1, to replace the SAT call on then recalling the SAT solver. ΦTn by a SAT call on ΨTn under the control variables set- Instead, we propose to adapt the formula ΦTn in order to ting given by Hn = {¬h1 , . . . , ¬hn−1 , hn } (indicated in a be tested in an incremental SAT mode by multiple calls to second argument of the call): a Conflict Driven Clause Learning (CDCL) solver. Using CDCL solvers in a specialized, incremental, mode is rela- SAT (ΦTn ) = SAT (ΨTn , Hn ) tively new but already widely used [8] in many applications. The idea is now to consider the control variables hj as as- In this operation mode, the solver can be called many times sumptions and use incremental SAT calls IncSATj under with different formulas. However, solvers are designed to varying assumptions, for 1 ≤ j ≤ n. For this, we use work with similar formulas, where clauses are removed and the following recurrence relationships for both formulas ΨTj added from calls to calls. Learnt clauses can be kept as soon and assumptions Hj : as the solver can ensure that clauses used to derive them are not removed. This is generally done by adding specialized ΨT0 = I0 ΨTj+1 = ΨTj ∧ T (j, j + 1) ∧ Dj+1 0 j≥0 variables, called assumptions, to each clause that can be re- H1 = {h1 } Hj+1 = Hj [{¬hj , hj+1 }] j ≥ 1 moved. By assuming the variable to be F alse, the clause is activated and by assuming the variable to be T rue, the where the notation Hj [{assi }] means updating in Hj clause is trivially satisfied and no longer used by the solver. assumptions hi by their new settings assi , i.e., in the What is interesting for our purpose is that the CDCL solver formula above, replacing the truth value of hj , which was can save clauses learnt during the previous calls and test T rue, by F alse, and adding the new assumption hj+1 multiple assumptions in each new call. This means that af- with truth value T rue. From these relationships, the unique ter n steps we hope that the solver will have learnt some call to SAT under given assumptions SAT (ΨTn , Hn ) can constraints about the behavior of the system. Although we be replaced, starting with the set of clauses I0 , by multiple are interested in testing the diagnosability property on a de- calls, 0 ≤ j ≤ n − 1, to an incremental SAT under varying fined system, this property is independent from the system assumptions: behavior which can be learnt by the solver from the previous IncSATj+1 (N ewClausesj+1 , N ewAssumptionsj+1 ) calls. In order to extend the clauses representation given in sub- = IncSATj+1 (T (j, j + 1) ∧ Dj+1 0 , {¬hj , hj+1 }) (4.1) sections 2.2 and 3.2 to this mode of operation, we propose If IncSATj answers SAT, the search is stopped as non diag- to divide the formula ΦTn in two parts. The first part Tn de- nosability is proved, if it answers UNSAT, then IncSATj+1 scribes the first n steps, synchronized on the observations, is called. 55 Proceedings of the 26th International Workshop on Principles of Diagnosis Notice that we used a unique assumption hj for control- model (for k = 3, 13, 23, 33, 43 and 63). The length of a pair ling both Fj and Cj as non diagnosability checking requires of executions with cycles witnessing the non diagnosability the presence of both a fault occurrence in the faulty trajec- of f in each example is k + 2 and we consider the satisfia- tory and of a cycle. But the same framework allows the bility of the formula ΦTk+2 , so the number of steps required independent control of formulas by separate assumptions. for SAT to provide the answer Yes is: |Steps] = k + 2. In For sake of simplicity, we also assumed we called IncSAT order to obtain a fair comparison between IncSAT , which at each step, but this is not mandatory and indexes j for the manages internally by handling assumptions the successive successive calls can be decoupled from indexes t for steps. satisfiability checks of increasing formulas for j = 1, . . . , We should also say that, even if IncSAT allows us to re- k + 2, and SAT, for which k + 2 successive calls are made to activate an already disabled clause, we are sure in our case the solver with respective formulas ΦTn for n = 1, . . . , k+2, to never use this function (when hk has been set to F alse, the sum of the k + 2 runtimes of the SAT solver calls are it always remains so) and we can thus force the solver to considered in this case (last column in the tables). do a hard simplification process that removes the forgotten clauses permanently. As a result of our adaptation we will |Steps| |Clauses| Inc. SAT(s) SAT(s) be able to scale up the size of the tested system and the dis- tance and length of a cycle witnessing non diagnosability. 20 42,614 1.5 1.3 30 131,714 10.3 13.1 4.2 Experimental Results 40 303,736 49.3 77.8 50 576,466 106 223 We show in this subsection a comparison between our 60 970,156 320 699 adapted version of subsection 4.1, that uses incremental 100 4,334,018 9410 13040 SAT, and the previous versions, for centralized model (sub- section 2.2 following [2]) and for distributed model (subsec- tion 3.2). We have created the example in Figure 2 which Table 2: Results on the faulty component of Figure 2. contains 2k + 1 components: one faulty component and two sets of k neighboring components. The faulty compo- nent has two separated paths, each one containing k differ- |Steps| |Comps| |Clauses| Inc. SAT(s) SAT(s) ent successive unobservable events ci and ending with the 5 7 1,962 0.04 0.06 same observable cycle of length 1, but only one of them 15 27 30,313 0.8 0.5 contains the fault. The centralized model will be limited to 25 47 113,906 6.5 4.8 this faulty component alone and thus in this case the events 35 67 277,873 33.8 33.7 ci , 1 ≤ i ≤ 2k, are just unobservable events as is u. In 45 87 542,033 111 132 the distributed model, these events ci are communication 65 127 1,490,590 967 1090 events and the faulty component is considered with the other two sets of components, where each component in both sets Table 3: Results on the whole system of Figure 2. shares one event ci with the faulty component to ensure a number 2k of communications before arriving to the cycles that will witness the non diagnosability of the fault. Each Although these examples remain relatively simple and do set of components will be synchronized with only one path, not reflect any potential constraint that could be resumed by either the faulty path or the correct one. This allows us to some learnt clauses (e.g. no interfering events), we can al- study the effect of the cycle distance in both models. ready notice the difference in runtime in favor of our incre- mental version in the centralized case and for the two largest values of k in the distributed case. This difference could be explained by the fact that generating all variables from the beginning for all time steps and for all events imply many meaningless clauses that would add a load on the solver in the version in [2], this load being avoided in our incremen- tal version because of the clauses learnt by the CDCL SAT solver. From another side, we should say that generating in both versions all variables from the beginning has two main advantages: firstly, it allows the system description without unfolding it (even if this description is verbose); secondly, it allows the ordering of these variables by their time step in order to generate the constraints for only one time step and then get next steps constraints by just shifting the num- bers (as we are representing the clauses in DIMACS for- mat). One last point could help to a more efficient descrip- tion of the system: in the succinct systems we represent all the occurrences of an event together, but in its SAT encod- Figure 2: One faulty component that communicates with ing we “unfold” this succinctness by generating for each two sets of k components. Each set communicates with one occurrence n variables (for n time steps), even though log- path (resp. faulty and correct) in the faulty component. ically only one of them will be assigned to True. We could thus mark this relation among these n copies by introducing The results are in Table 2 for the centralized model (for k a global cardinality constraint to express that these copies = 18, 28, 38, 48, 58 and 98) and in Table 3 for the distributed belong to only one occurrence of an event. 56 Proceedings of the 26th International Workshop on Principles of Diagnosis 5 Selection of Related Works faults. The work by [11] has optimized the construction of lo- The first introduction to the notion of diagnosability was by cal twin plants, by exploiting the fact that one distinguishes [1]. The authors studied diagnosability of FSM, as defined two behaviors (faulty and correct) and one synchronizes at in definition 1. Their formal definition of diagnosability is two levels (observations first and communications later). It the one we mentioned in definition 3. They introduced an improved the construction of the twin plants proposed by approach to test this property by constructing a deterministic [7] by exploiting the different identifiers given to the com- diagnoser. However, in the general case, this approach is munication events at the observation synchronization level exponential in the number of states of the system, which (depending on which instance, left or right, they belong to) makes it impractical. to assign them directly to the two behaviors studied (left In order to overcome this limitation [3] introduced the copy assigned to the faulty behavior, right copy to the cor- twin plant approach, which is a special structure built by rect one). This helped in deleting the redundant informa- synchronizing on their observable events two identical in- tion, then in abstracting the amount of information to be stances of a nondeterministic fault diagnoser, and then transferred later to next steps if the diagnosability was not searched for a path in this structure with an observed cy- answered. The generalization to fault patterns in DDES was cle made up of ambiguous states, i.e. states that are pairs introduced by [12]. of original states, one reached by going through a fault and After the reduction of diagnosability problem to a path the other not. Thus faults diagnosability is equivalent to the finding problem by [3], it became transferable to a satis- absence of such a path, called a critical path. This approach fiability problem like it is the case for planning problems turns the diagnosability problem in a search for a path with [13]. This was done by [2] which formulated the diagnos- a cycle in a finite automaton, and this reduces its complexity ability problem (in its twin plant version) into a SAT prob- to be polynomial of degree 4 in the number of states (and ex- lem, assuming a centralized DES with simple fault events. ponential in the number of faults, but processing each fault The authors represented the studied transition system by a separately makes its linear in the number of faults). succinct representation (cf. definition 2). This allows both Let us mention here that the two previous works were in- a compact representation of the system states and a max- terested in centralized systems with simple faults modeled imum amount of non interfering events to be fired simul- as distinguished events. The first studies about fault pat- taneously. Thus, they represented the system states by the terns were introduced in [9] and [10] which generalize the valuation of a set of Boolean state variables (dlog(q)e state simple fault event in a centralized DES to handle a sequence variables for q states) and the interference relation between of events considered together as a fault, or handle multiple two events according to the consistency among their effects occurrences of the same fault or of different faults. More and preconditions, one versus the other. They distinguished generally, a fault pattern is given as a suffix-closed rational between an occurrence of an event in the faulty sequence or events language (so by a complete deterministic automaton in the correct sequence by introducing two versions of it and with a stable subset of final states). constructed the logical formula expressing states transitions The first work that addressed diagnosability analysis in for each possible step in the system. Each step may con- DDES was [7]. A DDES is modeled as a set of communicat- tain simultaneous events that belong to faulty and correct ing FSM. Each FSM has its own events set, communication sequences but must synchronize the occurrence of observ- events being the only ones shared by at least two different able events whenever they take place. For a given bound n FSM. In [7] was introduced an incremental diagnosability of paths length, they made the conjunct of these formulas test which avoids to build the twin plant for the whole dis- for n steps and added the logical formula that represents the tributed system if not needed. Thus one starts by building occurrence of the fault in the faulty sequence and the oc- a local twin plant for the faulty component to test the exis- currence of a cycle in both sequences. The satisfiability of tence of a local critical path. If such a path exists one builds the obtained formula is equivalent to finding a critical path, the local twin checkers of the neighboring components. Lo- i.e. to the non diagnosability of the fault (see subsection 2.2 cal twin checker is a structure similar to local twin plant, for a summary of this approach). Although this approach i.e., where each path in it represents a pair of behaviors with allows one to test diagnosability in large systems, it has a the same observations, except that there is no fault infor- limitation which is that we cannot dynamically increase n mation in it since it is constructed from non-faulty compo- to ensure reaching more states while scaling up the size of nent. After constructing local twin checkers, one tries to the system where the cycles that witness non diagnosabil- solve the ambiguity resulting from the existence of a critical ity can be very long. However the authors notice that we path in the local twin plant. This is done by synchronizing are not always forced to test all reachable states in many on their communication events this local twin plant with the cases where an approximation for the reachable states can local twin checker of one neighboring component. In other be applied, but without explaining explicitly how such an words, one tries to distinguish the faulty path from the cor- approximation can be found. rect one by exploiting the observable events in the neigh- boring components, because theses events occurrences that are consistent with the occurrences of the communication 6 Conclusion and Future Works events could solve the ambiguity. The process is repeated By extending the state of the art works for centralized DES, until the diagnosability is answered, so only in the worst we have expressed diagnosability analysis of DDES as a case has the whole system to be visited. Another impor- satisfiability problem by building a propositional formula tant contribution in this work was to delete the unambigu- whose satisfiability, witnessing non diagnosability, can be ous parts after each synchronization on the communication checked by SAT solvers. We allow both observable and events, reducing thus the amount of information transferred non observable communication events in our model. Our to next check (if needed). The approach assumed simple expression of these communication events, which avoids 57 Proceedings of the 26th International Workshop on Principles of Diagnosis merging all their owner components, helps in reducing the [2] J. Rintanen and A. Grastien. Diagnosability testing number of clauses used to represent them and this reduction with satisfiability algorithms. In Proceedings of the is proportional to the number of their occurrences. We have 20th International Joint Conference on Artificial Intel- also proposed an adaptation of the logical formula in order ligence (IJCAI’07), pages 532–537, 2007. to use incremental SAT calls helping managing the scaling [3] S. Jiang, Z. Huang, V. Chandra, and R. Kumar. A poly- up of the distance and the length of the intended cycles nomial algorithm for testing diagnosability of discrete- witnessing non diagnosability and thus the size of the tested event systems. IEEE Transactions on Automatic Con- system. Thus we exploited the clauses learnt about the trol, 46(8):1318–1321, 2001. system behavior in the previous calls. This approach is more practical and more efficient for complex systems than [4] J. Rintanen. Diagnosers and diagnosability of succinct existing ones, as it avoids starting from scratch at each call. transition systems. In Proceedings of the 20th Interna- tional Joint Conference on Artificial Intelligence (IJ- We are now considering the extension of this work to CAI’07), pages 538–544, 2007. fault patterns diagnosability [12]. We will use the same ap- [5] L. Ye and P. Dague. Undecidable case and decidable proach to express predictability analysis [14] as a satisfia- case of joint diagnosability in distributed discrete event bility problem, for DES and DDES [15] and both for simple systems. International Journal On Advances in Sys- fault events and fault patterns [16] . Although our represen- tems and Measurements, 6(3 and 4):287–299, 2013. tation can be easily extended to deal with local observations [6] D. Le Berre and A. Parrain. The sat4j library, release (i.e., observable events in one component are observed only 2.2. Journal on Satisfiability, Boolean Modeling and by this component), we know that in general diagnosability Computation, 7:59–64, 2010. checking becomes then undecidable, e.g. when communica- tion events are unobservable (obviously it remains decidable [7] Y. Pencolé. Diagnosability analysis of distributed dis- when these events are observable in all their owners) [5]. A crete event systems. In Proceedings of the 16th Euro- future work will be to study decidable cases of diagnosabil- pean Conference on Artificial Intelligence (ECAI’04), ity checking in DDES with local observations, e.g. assum- 2004. ing some well chosen communication events being observ- [8] A. Nadel and V. Ryvchin. Efficient SAT solving under able. Another natural question is to study if the methods assumptions. In Proceedings of the 15th International used in [7] and refined in [11] to check diagnosability in Conference on Theory and Applications of Satisfiabil- DDES in an incremental way in terms of the system com- ity Testing (SAT’12), 2012. ponents could be transposed as guiding strategies for some [9] T. Jéron, H. Marchand, S. Pinchinat, and M.-O. component incremental SAT based approach for testing di- Cordier. Supervision patterns in discrete event sys- agnosability in DDES. Transposing in SAT these methods, tems diagnosis. In Proceedings of the 8th International based on building a local twin plant and local twin check- Workshop on Discrete Event Systems, 2006. ers for gaining efficiency with regards to a global checking, seems difficult. Basically, at any step k, corresponding to [10] S. Genc and S. Lafortune. Diagnosis of patterns in considering a subsystem made up of k components, these partially-observed discrete-event systems. In Proceed- methods build all critical paths witnessing non diagnosabil- ings of the 45th IEEE Conference on Decision and ity at the level of this subsystem and the incremental step, Control, pages 422–427. IEEE, 2006. when adding a (k + 1)th neighboring component, consists [11] L. Ye and P. Dague. An optimized algorithm for diag- in checking the consistency of these pairs with the observa- nosability of component-based systems. In Proceed- tions in the new component: only those pairs which can be ings of the 10th International Workshop on Discrete consistently extended are kept, if any. In addition, in [11], Event Systems (WODES’10), 2010. only useful and abstracted information is kept from one step [12] L. Ye, Y. Yan, and P. Dague. Diagnosability for pat- to the next one. With SAT, only one critical pair witness- terns in distributed discrete event systems. In Proceed- ing non diagnosability of the subsystem (i.e., a model for ings of the 21st International Workshop on Principles the formula) will be built. If it is not consistent, and thus of Diagnosis (DX’10), 2010. disappears, when adding the (k + 1)th component, diagnos- ability is not proven for all that: other critical pairs in the [13] H. Kautz and B. Selman. Planning as satisfiability. In subsystem, not completely computed at step k, may exist Proceedings of the 10th European Conference on Ar- and be extendible to step (k + 1). So, they have to be com- tificial Intelligence (ECAI’92), pages 359–363, 1992. puted now, which limits the incremental characteristic of the [14] S. Genc and S. Lafortune. Predictability of Event approach. In the same way, abstracting some information Occurrences in Partially-observed Discrete-event Sys- is difficult to achieve with SAT. So, there is no evidence tems. Automatica, 45(2):301–311, 2009. a priori that efficiency gain could be obtained by trying to [15] L. Ye, P. Dague, and F. Nouioua. Predictability Analy- develop a component incremental SAT based approach for testing DDES diagnosability. sis of Distributed Discrete Event Systems. In Proceed- ings of the 52nd IEEE Conference on Decision and Control (CDC-13), pages 5009–5015. IEEE., 2013. References [16] T. Jéron, H. Marchand, S. Genc, and S. Lafortune. Predictability of Sequence Patterns in Discrete Event [1] M. Sampath, R. Sengupta, S. Lafortune, K. Sinnamo- Systems. In Proceedings of the 17th World Congress, hideen, and D. Teneketzis. Diagnosability of discrete- pages 537–453. IFAC., 2008. event systems. IEEE Transactions on Automatic Con- trol, 40(9):1555–1575, 1995. 58