Localization of (in)consistencies by monotone reducing automata⋆ Martin Procházka and Martin Plátek Charles University, MFF UK, Department of Computer Science Malostranské náměstí 25, 118 00 Praha 1, Czech Republic prma@centrum.cz, martin.platek@mff.cuni.cz Abstract. A reducing automaton (red-automaton) is prefix-inconsistency (prefix error) in w. The prefix in- a deterministic automaton proposed for checking word and consistency is the minimal incorrect prefix of w. Let sub-word correctness by the use of analysis by reduction. Its x ∈ Σ be the leftmost symbol in the word w such that monotone version characterizes the class of deterministicw = uxv, u, v ∈ Σ ∗ , there exists a word v ′ ∈ Σ ∗ for context-free languages (DCFL). We propose a method for which it holds uv ′ ∈ L and there is no word v ′′ such a construction of a deterministic monotone enhancement that uxv ′′ ∈ L. The u is the maximal correct prefix of any monotone reducing automaton which is able with the of w, and ux the prefix-inconsistency of w. help of special auxiliary symbols to localize its prefix and post-prefix (in)consistencies, and certain types of reducing In a similar way we can consider (in)correct infixes for a given language L and a word w 6∈ L. We can eas- conflicts. In other words this method ensures a robust anal- ily see that a prefix-inconsistence can occur in a word ysis by reduction without spurious error messages. We for- mulate natural conditions for which this method ensures at most once. Our effort is to study properties of re- ducing automata which will ensure the detection of the localization of all prefix and post-prefix inconsistencies in any (incorrect) word with respect to a DCFL. (in)correct prefixes, and/or certain types of (in)correct infixes. The types of the (in)correct infixes studied here are studied by different techniques already in [1]. 1 Introduction We call them here post-prefix (in)consistencies. In this paper we use the advantage of the fact, that A reducing automaton (red-automaton for short) is to any deterministic context-free language L there is a device that models the so called analysis by reduc- a monotone reducing automaton recognizing L which tion. Analysis by reduction consists in a stepwise sim- is also able to detect the prefix inconsistency (error). plification of an extended sentence (word) until a sim- Such type of automaton characterizes the class of ple sentence (word) is obtained or an error is found. It DCFL. This fact is shown in [6]. is based on another automata model – restarting au- The paper is structured as follows. First, in Sec- tomaton (R-automaton) introduced in [2]. Similarly to tion 2 we introduce red-automata and their basic prop- R-automaton, red-automaton can only delete symbols. erties. The Section 3 creates the core of this paper. At some place it decides to delete some of the last k Conclusion contains some remarks about connections visited symbols, where k is limited by a fixed constant of the presented method with the methods based on and then restarts its computations, i.e. it enters its the so called head-symbols. initial state and its head is placed on the left end of the remaining word. Reducing automaton is formalized as an extension 2 Definitions and basic properties of deterministic finite automaton. This kind of for- malization serves here as a basic tool for the method The reducing automaton has a finite control unit and of algorithmic localization of syntactic inconsistencies a working head attached to a list with sentinels on (errors) for the languages from the class of DCFL. both ends. It works in certain cycles called stages. At The notion of red-automata was introduced in order to the beginning of each stage, the head points at the present naturaly the techniques of minimization. In [7] leftmost item behind the left sentinel, and the control we construct to any red-automaton M an unambigu- unit is in a special initial state. In the process of the ously determined minimal red-automaton Mm which stage the automaton moves the head from the item it preserves the recognized language, and the set of all currently points to the next item on the right. Dur- reductions defined by M . ing such a transition it changes the state of its control For a given language L and a word w 6∈ L, it unit according to the current state and the currently is natural to define the maximal correct prefix and scanned symbol. The stage ends as the control unit gets to any of special states called operations. There ⋆ This work was supported by the grant projects of the are three kinds of operations: ACC, ERR, and RED. Both Grant Agency of the Czech Republic No. P202/10/1333 ACC and ERR-operation halts the whole computation, and P103/10/0783. ACC accepts and ERR rejects the word in the list. The 56 Martin Procházka, Martin Plátek RED-operation RED(n) determines how the list should We will often use the following conventions: ∗ ∗ be shortened. Its parameter n – a binary word of a lim- δM (sM , w) = δM («w), ∆∗M (sM , w) = ∆∗M («w). ited size – specifies which item on the left of the head We define for the both function a further important are to be removed from the list. Bit 1 means “remove enhancement, namely for the final subsets S of the set ∗ the item from the list”, bit 0 means “leave the item SM ∪ FM ∪ {RED} resp. SM ∪ FM : ∗ ∗ in the list”. After all items designated for deletion are δM (S, u) = {δM (s, u) | s ∈ S} removed, the automaton resets its control unit to the ∆∗M (S, u) = {∆∗M (s, u) | s ∈ S} initial state and places the head at the leftmost item Let us note that the tuple behind the left sentinel. The string n ∈ (10∗ )+ deter- (ΣM ∪ {«, »}, SM ∪ FM ∪ {RED}, sM , FM , δM ) is mines, which items will be deleted from the list. If the a finite automaton which accepts exactly all prefixes i-th symbol of n from the right is equal to 1, then the (words) which lead M to some reduction, or to an ac- automaton deletes the i-th item to the left from the ceptation, or to a rejection. position of the head. The item scanned by the head is We will consider in the following only the reducing considered as the first one. automata which fulfills the following natural condi- All final states of a reducing automaton M create tion: ∗ a finite subset FM of the (unbounded) set { ACC, ERR }∪ δM (sM , u) = RED(n) =⇒ |u| ≥ |n|. { RED(n) | n ∈ (1·0∗ )+ }. Now we are able to introduce Let us take: L0 (M ) = {w ∈ ΣM | ∆∗M (« w ») = ACC}. ∗ reducing automata in a formal way. Constant. Let kM = max |n| RED(n) ∈ FM . We A reducing automaton (red-automaton) is a 7-tuple call kM the characteristic constant of M . M = (ΣM , «, », SM , sM , FM , fM ), where ΣM is a fi- The operation of reduction. We will exactly de- nite input alphabet, «, » 6∈ ΣM are the (left and right) scribe a reduction of a word by a binary sequence with sentinels, SM is the finite set of internal states, the help of the following operation /: sM ∈ SM is the (re)starting state, FM is the finite set a/0 = a, a/1 = λ, λ/n = λ, u/λ = u, of final states (operations), fM : SM × (ΣM ∪ {»}) −→ (u · a)/(n · i) = (u/n) · (a/i), (SM ∪ FM ) is the transition function of M , which ful- where u ∈ Σ ∗ , a ∈ Σ, n ∈ (10∗ )+ and i ∈ {0, 1}. The fills the following condition: size of the strings u, n is here unbounded, moreover u ∀s ∈ SM : fM (s, ») ∈ FM . can be longer then n, and vice versa. The reduction of We will describe the behavior of M in more details by the word (a) by the sequence 1 · 0 · 1 is given in the two functions enhancing the transition function fM : following way: (a)/101 = a. δM : (SM ∪ FM ∪ {RED}) × (ΣM ∪ {«, »}) −→ (SM ∪ Using just defined operation we can describe the way FM ∪ {RED}) how the red-automaton M reduces a word w ∈ ΣM ∗ . ∗ ∗ ∆M : (SM ∪ FM ) × (ΣM ∪ {«, »}) −→ (SM ∪ FM ) The relation of reduction denoted by ⇒M is intro- RED is a new (helping) state which is different from duced in the following way: «w» ⇒M «w′ », ∗ all states from SM , and the set FM is defined in the if ∆∗M («w») = RED(n), and «w»/n = «w′ ». following way: If «w» ⇒M «w′ » holds, we say, that the automaton M ∗ FM = FM ∪ {RED(n · 0k ) | RED(n) ∈ FM a k ≥ 1} reduces the word w into the word w′ . We can see that Both functions δM , ∆M for all pairs created by a state |w| > |w′ |. s ∈ SM and by a symbol a ∈ (ΣM ∪ {»} are equal to The relation ⇒+ is the transitive closure of ⇒; ⇒∗ is the function fM . We define the new functions for the the reflexive and transitive closure of ⇒. remaining relevant pairs in the following way: Analysis by reduction by M is any sequence of re- ductions «w1 » ⇒ «w2 » ⇒ . . . ⇒ «wn », which cannot δM (s, «) = sM ∆M (s, «) = sM be further prolonged. If wn ∈ L0 (M ), we speak about accepting analysis by reduction, in the other case we and for all a ∈ (ΣM ∪ {»}), speak about rejecting analysis by reduction. Often we will speak about analysis instead of analysis by reduc- δM (ACC, a) = ACC ∆M (ACC, a) = ACC tion. δM (ERR, a) = ERR ∆M (ERR, a) = ERR Stages. Let us recall that each computation of a red- automaton is divided in stages. At the beginning of δM (RED(n), a) = RED ∆M (RED(n), a) = RED(n·0) each stage the head points at the leftmost item be- δM (RED, a) = RED hind the left sentinel, and the control unit is in the (re)starting state. The stage ends as the control unit The first enhancement of δM : gets to any final state (operation) from FM . There are ∗ ∗ ∗ δM (s, λ) = s, δM (s, ua) = δM (δM (s, u), a) three kinds of operations: ACC, ERR, and RED. Accord- The first enhancement of ∆M : ingly, we have accepting (ACC-), rejecting (ERR-), and ∆∗M (s, λ) = s, ∆∗M (s, ua) = ∆M (∆∗M (s, u), a) reducing (RED-)stages. Localization of (in)consistencies . . . 57 Recognized language. The language recognized by in stages. Therefore we can define the monotony for M is defined in the following way: robust analyzers in the same way as for reducing au- L(M ) = { w | «w» ⇒∗M «w′ », and w′ ∈ L0 (M ) }. tomata. Equivalences of red-automata. Two red-automata We divide auxiliary symbols into two types. Each M1 and M2 are equivalent, if L(M1 ) = L(M2 ). type serves to a different purpose: We will often consider a stronger equivalence. Let 1) for a transfer of local informations between different us suppose that for any w ∈ (ΣM ∗ ∗ ∪ ΣM )·{λ, »} hold stages, 1 2 at the same time 2) for a marking of correct and incorrect sub-words ∗ δM («w) = ACC ⇐⇒ δM ∗ («w) = ACC, of the analyzed word, and for marking of the place of 1 2 ∗ ∗ δM1 («w) = RED(n) ⇐⇒ δM2 («w) = RED(n). certain types of a reducing conflict. We say that M1 and M2 are strongly equivalent. The auxiliary symbols of the type 2 are called signs. We can see that if M1 and M2 are strongly equivalent Here we use two signs: then they are equivalent, as well. ! – for marking of incorrectnesses, ? – for marking of reducing ambiguities. Error and correctness preserving property. We understand under the reducing ambiguity a sub- We can see the following usefull property: word for which is obtained by the robust analysis an Lemma 1. If «w1 » ⇒M «w2 », then w1 ∈ L(M ), ex- ambiguous information about its current reduction. actly if w2 ∈ L(M ). There is a technical difference between reducing automata, and robust parsers. The behavior of the ro- Monotony. Monotony is an important property that bust parser is described by the following three func- enables to characterize the class of DCFL in terms tions: of monotonic reducing automata. This property was Transition function tA : SA × (XA ∪ YA ) −→ SA . introduced for restarting automata in [2], first. Infor- ta determines the state, into which will be transfered mally a red-automaton M is monotonic if the size of the finite control from its current state after scanning sequences of non-visited items (symbols) in individ- the symbol from the item visited by the working head. ual stages of any analysis by reduction by M is non- Inserting function iA : SA × (XA ∪ YA ) −→ IA . increasing. A monotonic reducing automaton will be iA assigns to a state, and to a scanned symbol an in- called a mon-red-automaton for short. As an example serting sequence from a (final) set IA ⊆ (YA · 0∗ )∗ . see Table 1. Inserting sequences serves in a similar way as re- ducing sequences. They describe the inserted auxiliary 3 Robust analyzer symbols, and their inserting positions. If the value of the inserting function is λ, it will be nothing inserted. We introduce for the robust analysis by reduction If the value is !, A inserts new item with the marking ! a new type of automaton – robust analyzer. Robust an- immediately before (to the left) the scanned item. The alyzer enhances the reducing automaton by the ability value ?000 says, that the marking ? should be inserted of inserting special auxiliary symbols, and by the abil- before the third item to the left before the scanned ity to read any input word about its input alphabet to item. the end. Robust analyzer A consists of finite control Reducing function rA : SA × (XA ∪ YA ) −→ RA . unit with a finite set of states SA , and from a working rA assigns to a state, and to a scanned symbol a re- head connecting the finite control unit with a linear ducing sequence from a (final) set RA ⊆ (1 · 0∗ )∗ . list of items. The list of items is bounded by a left Reducing sequence is here interpreted in the same and right sentinels « and ». All other items contain way as for reducing automata. a symbol from a finite input alphabet XA , or of a finite A step of the robust analyzer A consists from the auxiliary alphabet YA . These alphabets are mutually sequence of the following actions: disjunct. Shift to the right. Analyzer A starts each step by Robust analyzer is able to delete some items from a shift to the right of its working head to the next the list (operation of reduction). Deleted can be the item of the working list. item visited by the working head and some items po- Application of the inserting function. iA by the sitioned not far to the left from the working head. current situation (state, symbol) determines the Operations of reductions are controlled by reducing inserting sequence Is. A controlled by Is inserts sequences. Each operation of reduction is followed by new auxiliary symbols. a restart, i.e., transfer of the control unit into the Application of the reduction function. rA deter- (re)starting state sA , and a placement of the work- mines by the current situation the current reduc- ing head on the left sentinel «. It means, that the ro- ing sequence Rs. If RS is non-empty, A controlled bust analyzer, similarly as reducing automaton, works by Rs reduces (deletes) determined items from the 58 Martin Procházka, Martin Plátek working list. After such a reduction A finishes the of any word w ∈ Σ ∗ is contained in some correct core step by a restart, i.e., moves the working head on of this word. the left sentinel «, and transfers the control unit Prefix consistence is the longest correct prefix v of into the (re)starting state sA . Then a new stage the analyzed word w. Prefix inconsistence is the short- will be started on the reduced list. est incorrect prefix of the analyzed word w, i.e., it is If Rs is empty then A does not perform any reduc- the prefix va of w, where a ∈ Σ. Post-prefix consis- tion, neither the restart, and it finishes the step by tence is a suffix x of a correct core behind (to the right the following action. of) the prefix consistence, or behind some of the pre- Application of the transition function. tA deter- vious post-prefix consistencies. We assign to the post- mines by the current situation the new state q. prefix consistency x the incorrect sub-word xa of w. Then A continues from the state q by a further We say that xa is a post-prefix inconsistence of w (with step of the current stage. respect to L). Our effort in the following is to deterministically, The last difference of A from reducing automata in a monotonic way, and exactly to localize the prefix consists in the fact that A contains only one halt- and post-prefix (in)consistencies in the analyzed words ing state END ∈ SA . We will see that the signs of A from DCFL. will refine the ability of accepting and rejecting by the states ACC and ERR of reducing automata. For this purpose we observe the signs ! and ? inserted in the 3.2 Post-prefix robust analyzer A different stages of the computation into the gradually Prefix consistence. A red-automaton M is prefix- reduced working list. We will project the signs into consistent when for each word u and each symbol a the original input list in such a way that we will in- (including the right sentinel) it holds the following: if sert the signs into the same positions (i.e. before the ∆M (sM , u) ∈ SM and ∆M (sM , ua) 6= ERR, then ua is same items) into which they were inserted during the a prefix of some word from L(M ) · {»}. individual stages of the computation (robust analysis) The following proposition is derived from the main by A. result from [2]. The detailed proof is in [6]. There is We denote by pA (w) a word w enriched by the also connected with some other propositions. signs (in the way mentioned above) inserted into the list during the analysis by the robust analyzer A. We Proposition 1. Monotone, prefix consistent, red- will later formulate our results using this denotation. automata characterize the class of DCFL. As an example see Fig.1. We can consider pA (w) as the output word of A. It is shown in [7] that the notion of red-automata is useful for the techniques of minimization. There is to any red-automaton M constructed an unambiguously 3.1 Prefix and post-prefix (in)consistencies determined state-minimal red-automaton Mm which Assumption. We assume in the following that is strongly equivalent with M . L ⊆ Σ ∗ , and any symbol of Σ is a symbol of some We will show informally in the next part a method word from L. how construct for a given monotone, prefix-correct, We call a word v inconsistent (incorrect) with re- state-minimal reducing automaton M a robust ana- spect to the language L ⊆ Σ ∗ , if for any u, w ∈ Σ ∗ is lyzer A which determines in any word w ∈ «ΣM ∗ » the uvw 6∈ L. prefix-(in)consistence, and (not obligatory all) post- We can see that incorrect words can obtain proper prefix (in)consistencies. We suppose for the construc- incorrect sub-words. This fact lead us to the following tion that L(M ) 6= ∅. notion. At first A will use the prefix-consistency of M for We say that a word v is an incorrect core of the word w the finding of the prefix-(in)consistency of the ana- with respect to the language L, if it is a subword of w, lyzed word w. if it is incorrect with respect to the language L, and if Such a situation can occur after one, or after more it is minimal by the ordering “to be a sub-word”. stages if M will be transfered into the final rejecting On the other hand, a word v is a correct sub-word state ERR. The computation (analysis) of M on the of a word w with respect to the language L, if w = xvy, word w until this moment we describe in the following and for some x′ , y ′ is x′ vy ′ ∈ L. We say that v is way: a correct core of a word w with respect to the language 1) At first M (possibly) gradually reduces the word w L, if it is a correct sub-word of w with respect to L, into the word w′ , i.e., w ⇒∗M w′ . and it is maximal by the ordering “o be a sub-word”. 2) Then in the next stage M transfers over some pre- The assumption that each symbol of Σ is a symbol of fix x of the word w′ into some non-final state s ∈ SM , ∗ some word of the language L ensures that each symbol i.e., δM (sM , x) = s ∈ SM , Localization of (in)consistencies . . . 59 3) Finally from the state s transfers over the next sym- its working head. The automaton A will look for a new bol a into the final state ERR, i.e., δM (s, a) = ERR. post-prefix (in)consistency behind (to the right from) We can see that A has founded by the previous sim- the currently inserted sign !. A will take instead of ulation of M the prefix inconsistency of w. For mark- the set {ERR} as the current value of the set S the ing of the prefix inconsistency A inserts the sign ! set δM (SM , a), where a is the symbol scanned by the between the correct prefix «x and the symbol a. working head. A will continue in the robust analysis The prefix-consistency of M ensures that M has of the remaining suffix by the schema of the cycle C1 . visited in the last step described above the symbol a An ambiguous reduction. S does not contain at the first time. Therefore if w′ = «xay for some y ACC, and either does contain two different reducing then ay is a suffix of the original input word w. states of M , or does contain at least one non-final Let us now informaly describe how A continues in state, and at least one reducing state; i.e., the robust analysis over the mentioned suffix ay of the ACC 6∈ S, and ∃n : RED(n) ∈ S 6⊆ {RED(n), ERR}. word w. We say that S fulfilling the condition above is an am- We will use the function δM for this aim. This func- biguous set. tion was introduced as an enhancement of the transi- The task for A is to work without false inconsistency tion function fM . It describes not only the transfers messages. From that reason A separates the ambigu- between the individual states, but also the tranfers be- ous part from the remaining suffix. It inserts the sign tween the indiviual subsets of the set SM ∪FM ∪{RED}, for the ambiguity ? in the place of the current ambi- i.e., of the set of all final and non-final states, and of guity, i.e., immediately to the left from the position a special state RED. We will use it in the following in of the working head (if the sign is not already placed order to describe the all possible (partial) computation there in some of the previous stages). At this moment of M over the suffix ay at the same time. A takes for the set S the complete set SM , and con- We let A to compute the function δM over the suffix tinues in the robust analysis behind the sign ? by the ay = a0 a1 . . . a|y| starting from the set SM of all non- scheme of the cycle C1 . final states of M . A will control the computation in An unambiguous reduction. The set S con- the following way. Let us initially take the set SM as tains exactly one reducing operation, and possibly be- a set further denoted as SI . side it the final state ERR; i.e., ∃n : RED(n) ∈ S ⊆ Let us denote the following part of the computation {RED(n), ERR}. of A as a cycle C1 . The cycle C1 is performed until for Let us denote as u the sub-word which is created ∗ by the input symbols positioned between the last sign the set S = δM (SI , a0 . . . ai ), where 0 ≤ i ≤ |y|, holds that ∅ ⊂ S ⊆ SM ∪ {ERR}, and S contains some non- ! or ?, and the position of the working head includ- final state. Then A performs the following action: the ing the scanned symbol. The sub-word u is because of head of A will be placed to the next item to the right, the prefix-consistency, and because of the state min- and as (the current value of) the set S will be taken imality of M a sub-word of some word from L(M ). the set δM (S, ai+1 ). Here ends the description of C1 . Moreover, the u is reduced in any word w ∈ L(M ) of The core of the post-prefix analysis by A are the the form w = vux by the reducing seguence n, i.e, the following four cases where is not fulfilled the condition reducing sequence and the position of the reduction for the continuation of the computation by cycle C1 . are determined unambiguously. A will reduce also by Correct suffix. The set S contains the accepting n, but only the symbols from u if we consider the case state ACC; i.e., ACC ∈ S. that n can be longer then u. If ACC ∈ S, then the current suffix of the analyzed Observation. The reducing sequence n deletes at word w by A is a suffix of some word from L(M ). least one symbol from u. This observation follows from Therefore, the current suffix cannot contain any fur- the unambiguity of the reduction of u. Let us sup- ther inconsistency. The work of A on w is finished at pose the opposite. Then for some z, and n′ is n = this moment. n′ · 0|u| , and δM ∗ (sM , zu) = RED(n). ′ An unambiguous inconsistency. The set S con- Let z be the shortest z with the properties described tains a single state – the rejecting state ERR; i.e., above. Since M is prefix-correct «z ′ u is a prefix of some S = {ERR}. word from «L(M )». In the next stage occurs one of the following variant: All the possible computations of M over the ∗ word w behind the previous inconsistency has ended δM (sM , (z ′ /n′ ) · u) = RED(n′′ ) for some n′′ at the same time in the state ERR. We have found ∗ δM (sM , (z ′ /n′ ) · u) = RED a suffix of a correct core of the analyzed word, i.e., one ∗ δM (sM , (z ′ /n′ ) · u) = ACC of its post-prefix (in)consistency. At this moment A ∗ inserts the sign ! immediately before the position of δM (sM , (z ′ /n′ ) · u) ∈ SM 60 Martin Procházka, Martin Plátek Each of the presented variant leads to a contradiction pair of inserted set U is bounded by the characteristic with the unambiguity of the reduction of the word u. constant kM which ensures that U is finite. If occurs the first one, then n′′ reduces some symbol of Let us note that A stores in its finite control a suit- the sub-word u (since z ′ cannot be shorter), therefore able suffix of its working list before the position of its n′′ 6= n and RED(n′′ ) ∈ S 6⊆ {RED(n), ERR}. By the working head. The length of this suffix need not be remaining variants is the contradiction obvious. longer then 2 · kM . It contains the input items, and Apart from the reduction of u by n, A will insert the inserted values of the set S in the last 2 · kM steps. into the list a new item with an auxiliary symbol – the set U of pairs of an internal state and a word over an Recall, that we suppose that the automaton M is input alphabet of the length kM at most. This auxil- minimal and prefix-consistent. The minimality of M iary symbol will be used in the next stage to adjust the ensures that each state of M is reachable. The state- set of states computed by the function δM . Our goal is reachability, and the prefix-consistence of the automa- to avoid situation when δM ∗ ∗ (s, u) = ERR 6= δM (s, u/n) ton M ensure for any word u, that u is a sub-word of ∗ for some s ∈ SM . Such internal states s must be elim- some word of L(M ), if δM (SM , u) 6⊆ {ERR, RED}. inated. The set U is defined by the following way: If A inserts the sign ! or ? immediately before the right sentinel » it finishes its computation. Since we – If |u| ≥ |n|, then A reduces the working list of suppose that L(M ) is non-empty, is » a suffix of some items by n and A puts a new item with the auxil- word of the language «L(M )» (i.e.,L(M ) with sen- iary symbol U just in front of the leftmost deleted tinels). item. U = {(s, λ) | ∃s′ ∈ SM : δM ∗ (s′ , u1 ) = ∗ s and δM (s, u2 ) = RED(n)} where u = u1 u2 and Now we have outlined the behavior of A in the first |u2 | = |n|. stage after the localization of the prefix-inconsistence. – If |u| < |n|, then A cannot reduce by the whole n as In the next stages we need also to consider the signs such a reduction would impact a part of the work- and the other auxiliary symbols inserted in the previ- ing list in front of u; this part would be reduced by ous stages. We will not describe here these details. n1 such that n = n1 n2 and |n2 | = |u|. But a part of the working list in front of the rightmost marker We illustrate the outlined method by the following ! or ? can be reduced in some word of L(M ) in example. other way or even not at all. So, A will reduce Example 1. We explain the presented method by two items behind the rightmost marker ! or ? by the prefix-consistent, state-minimal mon-red-automata. reducing sequence n2 and it will insert a new item We will present two different robust analyses of the containing an auxiliary symbol U just to the right following inconsistent word «a + +(a)+)(a». ∗ of the rightmost marker. U = {(s, x) | ∃v ∈ ΣM : The transition functions of the automata M1 ∗ x = v/n1 a |v| = |n1 | a δM (s, vu) = RED(n)}. and M from [6] which we use in this example are 3 Insertion of the set U into the working list is impor- defined by the tables in the figure 1. We do not present tant as it ensures the continuity of subsequent stages here the automaton M2 from [6]. of computation. In next stage, A will use this set to The robust reduction analyses of the considered adjust the set S of internal states computed by func- word are on figures 1a, and 1b. Both figures contain tion δM . As soon as A reach the item with U , it substi- also the input word enriched by the signs ! and ? on tute S by S ′ = {δM ∗ (s, v) | (s, v) ∈ U }. It guarantees the corresponding places. that A enter a part of the list impacted by the last Let us note that in [6] is a detailed description reduction in such states only that led to the last re- of a construction which constructs to a given prefix duction of u by n resp. n2 . consistent, state minimal mon-red-automaton M its A uses just defined set U in such a case only when robust analyzer A. This construction implements the the new item with U is inserted just behind an item outlined method. The robust analyzer A is by this con- containing a symbol of the input alphabet or a marker. struction given unambiguously for a given M . Otherwise, when this item contains an auxiliary sym- bol U ′ different from both markers, then (instead of insertion of U ) A replaces U ′ with U computed in the 3.3 Guarantees of the presented method following way: We formulate the guarantees of the presented method U = {(s, x) | ∃y, (s′ , x′ ) ∈ U ′ : x = yx′ /n1 , and as theorems. The detailed proofs can be found in [6]. ∗ δM (s, y) = s′ , and δM∗ (s′ , x′ u) = RED(n) , andTheorem 1. Let M be a prefix consistent, state- ′ |y| = max{0, |n1 | − |x |}}, minimal mon-red-automaton and A its post-prefix ro- where n1 is a prefix of n of the length |n| − |u|. In bust analyzer. For any w ∈ «Σ ∗ » the following propo- all cases, the length of the word x contained in any sitions hold: Localization of (in)consistencies . . . 61 1. The analyzer A reads (in one or more stages) the a + ( ) » complete word «w» and finishes its computation ⇒ s0 s1 ERR s2 ERR ERR in a special state END. Any computation of A is s1 ERR RED(11) ERR ERR ACC monotone. s2 s3 ERR s2 ERR ERR 2. If pA («w») does not contain any sign !, then s3 ERR s4 ERR RED(101) ERR s4 s5 ERR s2 ERR ERR pA («w») = «w» and w is from L(M ). s5 ERR s4 ERR RED(110) ERR 3. If «w» ∈ «L(M )» then pA («w») = «w». 4. If «u! is a prefix of pA («w») and u does not con- (a) automaton M1 reducing the word tain the sign ! then u does not contain the sign ? a+ without brackets around it “from as well, and «u is the longest correct-prefix of the the left” and the +a with brackets around “from the right” word «w» with respect to the language «L(M )». 5. If !u! or ?u! is a sub-word of the word pA («w») a + ( ) » and u does not contain any sign ! or ? then u is ⇒ s0 s1 ERR s2 ERR ERR a suffix of some corect core of the word «w» with s1 ERR RED(11) ERR ERR ACC respect to the language «L(M )». s2 s3 ERR s2 ERR ERR 6. If !u» or ?u» is a suffix of the word pA («w»), s3 ERR RED(11) ERR RED(101) ERR and u does not contain any sign ! or ? then u» is a sufix of some word from «L(M )». (b) (strongly) monotone automaton 7. If !u? or ?u? is a sub-word of pA («w») and u does M3 reducing a+ only “from the left” not contain any sign ! or ? then u is a sub-word Table 1: The transition functions for M1 and M3 . of some word from «L(M )». We will discuss the meaning of the sign ?, and we will formulate properties of the mon-red-automaton M « a + + ( a ) + ) ( a » s0 s1RED(11) which ensure that its post-prefix robust analyzer A does not use the sign ? at all. This sign serves as the « ! + ? ( a ) + ) ( a » right sentinel for the correct sub-words of L(M ) which s0 s4 s2 sRED(101) 3 RED(11) lead A to some of two following types of a reduction « ! + ? a + ? ) ? ( a ! » conflict. Let u be such a sub-word which is followed by s0 SM s SM s1 s4 s2 s3 4 RED(11) s3 s5 RED(11) RED(11) RED(101) ACC ?. The first type of the conflict means that there are 0 1 2 3 4 5 6 7 8 9 10 11 two different words of L(M ) containing u which lead « a + ! + ? ( a ) + ? ) ? ( a ! » M by reading the complete u and its prefixes to two (a) The robust analyzer of M1 different reductions. The second type of the conflict means that there is a transfer trough u by M which leads to a reduction, and at the same time there is an « a + + ( a ) + ) ( a » another transfer leading to the shift to the right of M s0 s1RED(11) from u to the next symbol. Now we gradually introduce the notions of unam- « ! S′ + ( a ) + ) ( a » biguously reducible sub-word and unambiguously re- s0 RED(11) ducing red-automaton, and we will show that a post- « ! S′ ( a ) + ) ( a » prefix robust analyzer A of M which is unambiguously s0 SM S ′ s2 s3RED(101) reducing, need not to use the sign ? at all. We say that a sub-word w is reducible by M if for « ! S′ a + ) ( a » some n holds that RED(n) ∈ δM ∗ (SM , w). s0 SM S ′ s1 s3 RED(11) We say that a sub-word w is unambiguously re- « ! S′ a ! S ′′ ) ( a » ducible by M if for some n holds that RED(n) ∈ ∗ s0 SM S ′ s1 s3 RED(101) δM (SM , w) ⊆ {RED(n), ERR}. We say that a sub-word which is reducible but it « ! S′ a ! S ′′ ! ( a ! » is not unambiguously reducible is an ambiguously re- s0 SM S ′ s1 SM S ′′ s2 s3 ACC s3 ducible sub-word. 0 1 2 3 4 5 6 7 8 9 10 11 We say that M is unambiguously reducing if any of « a + ! + ( a ) + ! ) ! ( a ! » its reducible sub-words is unambiguously reducible. (b) The robust analyzer of M3 Example 2. The automaton M1 from the example 1 Fig. 1: The robust analysis of «a++(a)+)(a». is not unambiguously reducing. All its ambiguously reducible sub-words are presented in Table 2a. Let us 62 Martin Procházka, Martin Plátek note the length of this words is not limited by any create a very special type of (in)consistencies consid- constant, since for any i ≥ 0 holds that ered by the method presented in this paper. In the ∗ δM 1 (SM1 , (+a)i +) = {ERR, RED(110), s4 }. close future we will show that the set of languages rec- The minimal unambiguously reducible sub-words ognized by unambiguously reducing, prefix consistent, of M1 are in Table 2b. state-minimal mon-red-automata creates a proper subclass of DCFL. sub-word w ) a) + a+ +a+ . . . ∗ δM 1 (S M 1 , w) RED(110) RED(110) RED(11) RED(11) RED . . . References RED(101) RED(101) s4 s4 s4 (a) ambiguously reducible sub-words 1. G. V. Cormack: An LR substring parser for noncor- recting syntax error recovery. In: Proc. of PLDI ’89, sub-word w +a) (a) «a+ 1989, 161–169. ∗ δM (S , w) RED(110) RED(101) RED(11) 2. P. Jančar, F. Mráz, M. Plátek, J. Vogel: Restarting au- 1 M 1 tomata. In Proc. FCT’95, Dresden, Germany, August (b) minimal unambiguously reducible sub- 1995, LNCS 965, Springer Verlag 1995, 283–292. words 3. F. Otto: Restarting automata. In: Z. Ésik, C. Martin- Table 2: Reducible sub-words of M1 . Vide, and V. Mitrana (Eds.), Recent Advances in For- mal Languages and Applications, Studies in Compu- tational Intelligence, Vol. 25, Springer, Berlin, 2006, 269–303. We can see that in this example the robust ana- 4. Gh. Pǎun: Marcus Contextual Grammars, Kluwer, lyzer of M3 has founded all inconsistencies in the word Dordrecht, Boston, London, 1997. «a + +(a)+)(a»., i.e., it has not used the sign ? at all. 5. M. Plátek: Construction of a robust parser from a de- This example illustrates the fact that M3 is an unam- terministic reduced parser. Kybernetika 33(3), 1997, biguously reducing red-automaton. We can see that 311–332. 6. M. Procházka: Redukční automaty a syntaktické chyby. directly from Table 1b. (in Czech) Text for PhD Dissertation, it will be to The meaning of the reducing unambiguity for the achieve soon on the web. 7. M. Procházka: Concepts of syntax error recovery for localization of the post-pefix inconsistencies summa- monotonic reducing automata. MIS 2004, 94–103, rizes the following theorem. http://ulita.ms.mff.cuni.cz/pub/MIS/MIS2004.pdf Theorem 2. Let M be a prefix consistent, state-mini- 8. M. Procházka, M. Plátek: Redukční automaty – mono- tonie a redukovanost. ITAT 2002, 23–32. mal mon-red-automaton and A its post-prefix robust 9. M. Procházka, M. Plátek: Redukční automaty a syn- analyzer. If M is at the same time unambiguously re- taktické chyby. Proceedings of ITAT 2011 (in Czech), ducing then its post-prefix robust analyzer A does not 2011, 23–30. use the sign ? at all. That means, that A in any word ∗ from «ΣM » determines the prefix inconsistency and all its post-prefix inconsistencies with respect to the language L(M ). Corollary 1. Let M be a mon-red-automaton which is at the same time prefix-consistent and state- minimal, and A be its robust analyzer. Then there is a deterministic push-down transducer which translates ∗ any word w from ΣM on the word pA (w). 4 Conclusion The presented method can be considered as a direct generalization of the method presented in [9], and as an essential refinement and a generalization of the method from [5]. The method in [9] is based on mono- tone reducing automata, the method in [5] is based on (monotone) list automata with auxiliary symbols. Both the methods are based on the so called head- symbols. The head-symbol (in)consistencies from [9]