-

Localization of (in)consistencies by monotone reducing automata

Martin Procházka

Martin Plátek

0 0 Charles University, MFF UK, Department of Computer Science Malostranské náměstí 25 , 118 00 Praha 1 , Czech Republic

55 62

A reducing automaton (red-automaton) is prefix-inconsistency (prefix error) in w. The prefix ina deterministic automaton proposed for checking word and consistency is the minimal incorrect prefix of w. Let sub-word correctness by the use of analysis by reduction. Its x ∈ Σ be the leftmost symbol in the word w such that monotone version characterizes the class of deterministic w = uxv, u, v ∈ Σ∗, there exists a word v′ ∈ Σ∗ for context-free languages (DCFL). We propose a method for which it holds uv′ ∈ L and there is no word v′ such a construction of a deterministic monotone enhancement that uxv′ ∈ L. The u is the maximal correct prefix ohfelapnyofmsopnecoitaolnaeurxeidluiacriyngsyamutboomlsattoonlowchailcizheisitasbplerewfiixthatnhde of w, and ux the prefix-inconsistency of w. post-prefix (in)consistencies, and certain types of reducing In a similar way we can consider (in)correct infixes conflicts. In other words this method ensures a robust anal- for a given language L and a word w 6∈ L. We can easysis by reduction without spurious error messages. We for- ily see that a prefix-inconsistence can occur in a word mulate natural conditions for which this method ensures at most once. Our effort is to study properties of rethe localization of all prefix and post-prefix inconsistencies ducing automata which will ensure the detection of in any (incorrect) word with respect to a DCFL. (in)correct prefixes, and/or certain types of (in)correct infixes. The types of the (in)correct infixes studied here are studied by different techniques already in [1]. 1 Introduction We call them here post-prefix (in)consistencies. In this paper we use the advantage of the fact, that A reducing automaton (red-automaton for short) is to any deterministic context-free language L there is a device that models the so called analysis by reduc- a monotone reducing automaton recognizing L which tion. Analysis by reduction consists in a stepwise sim- is also able to detect the prefix inconsistency (error). plification of an extended sentence (word) until a sim- Such type of automaton characterizes the class of ple sentence (word) is obtained or an error is found. It DCFL. This fact is shown in [6]. is based on another automata model - restarting au- The paper is structured as follows. First, in Sectomaton (R-automaton) introduced in [2]. Similarly to tion 2 we introduce red-automata and their basic propR-automaton, red-automaton can only delete symbols. erties. The Section 3 creates the core of this paper. At some place it decides to delete some of the last k Conclusion contains some remarks about connections visited symbols, where k is limited by a fixed constant of the presented method with the methods based on and then restarts its computations, i.e. it enters its the so called head-symbols. initial state and its head is placed on the left end of the remaining word. 2 Definitions and basic properties Reducing automaton is formalized as an extension of deterministic finite automaton. This kind of formalization serves here as a basic tool for the method The reducing automaton has a finite control unit and of algorithmic localization of syntactic inconsistencies a working head attached to a list with sentinels on (errors) for the languages from the class of DCFL. both ends. It works in certain cycles called stages. At The notion of red-automata was introduced in order to the beginning of each stage, the head points at the present naturaly the techniques of minimization. In [7] leftmost item behind the left sentinel, and the control we construct to any red-automaton M an unambigu- unit is in a special initial state. In the process of the ously determined minimal red-automaton Mm which stage the automaton moves the head from the item it preserves the recognized language, and the set of all currently points to the next item on the right. Durreductions defined by M . ing such a transition it changes the state of its control For a given language L and a word w 6∈ L, it unit according to the current state and the currently is natural to define the maximal correct prefix and scanned symbol. The stage ends as the control unit gets to any of special states called operations. There ⋆ This work was supported by the grant projects of the are three kinds of operations: ACC, ERR, and RED. Both Grant Agency of the Czech Republic No. P202/10/1333 ACC and ERR-operation halts the whole computation, and P103/10/0783. ACC accepts and ERR rejects the word in the list. The

RED-operation RED(n) determines how the list should We will often use the following conventions: be shortened. Its parameter n – a binary word of a lim- δM∗ (sM , w) = δM∗ («w), Δ∗M (sM , w) = Δ∗M («w). ited size – specifies which item on the left of the head We define for the both function a further important are to be removed from the list. Bit 1 means “remove enhancement, namely for the final subsets S of the set the item from the list”, bit 0 means “leave the item SM ∪ FM ∪ {RED} resp. SM ∪ F M∗ : in the list”. After all items designated for deletion are δM∗ (S, u) = {δM∗ (s, u) | s ∈ S} removed, the automaton resets its control unit to the Δ∗M (S, u) = {Δ∗M (s, u) | s ∈ S} initial state and places the head at the leftmost item Let us note that the tuple behind the left sentinel. The string n ∈ (10∗)+ deter- (ΣM ∪ {«, »}, SM ∪ FM ∪ {RED}, sM , FM , δM ) is mines, which items will be deleted from the list. If the a finite automaton which accepts exactly all prefixes i-th symbol of n from the right is equal to 1, then the (words) which lead M to some reduction, or to an acautomaton deletes the i-th item to the left from the ceptation, or to a rejection. position of the head. The item scanned by the head is We will consider in the following only the reducing considered as the first one. automata which fulfills the following natural condi

All final states of a reducing automaton M create tion: a finite subset FM of the (unbounded) set { ACC, ERR }∪ δM∗ (sM , u) = RED(n) =⇒ |u| ≥ |n|. { RED(n) | n ∈ (1·0∗)+ }. Now we are able to introduce Let us take: L0(M ) = {w ∈ ΣM∗ | Δ∗M (« w ») = ACC}. reducing automata in a formal way. Constant. Let kM = max |n| RED(n) ∈ FM . We

A reducing automaton (red-automaton) is a 7-tuple call kM the characteristic constant of M . M = (ΣM , «, », SM , sM , FM , fM ), where ΣM is a fi- The operation of reduction. We will exactly denite input alphabet, «, » 6∈ ΣM are the (left and right) scribe a reduction of a word by a binary sequence with sentinels, SM is the finite set of internal states, the help of the following operation /: sM ∈ SM is the (re)starting state, FM is the finite set a/0 = a, a/1 = λ, λ/n = λ, u/λ = u, of final states (operations), fM : SM × (ΣM ∪ {»}) −→ (u · a)/(n · i) = (u/n) · (a/i), (SM ∪ FM ) is the transition function of M , which ful- where u ∈ Σ∗, a ∈ Σ, n ∈ (10∗)+ and i ∈ {0, 1}. The fills the following condition: size of the strings u, n is here unbounded, moreover u ∀s ∈ SM : fM (s, ») ∈ FM . can be longer then n, and vice versa. The reduction of We will describe the behavior of M in more details by the word (a) by the sequence 1 · 0 · 1 is given in the two functions enhancing the transition function fM : following way: (a)/101 = a. δM : (SM ∪ FM ∪ {RED}) × (ΣM ∪ {«, »}) −→ (SM ∪ Using just defined operation we can describe the way FM ∪ {RED}) how the red-automaton M reduces a word w ∈ ΣM∗ . ΔM : (SM ∪ F M∗ ) × (ΣM ∪ {«, »}) −→ (SM ∪ F M∗ ) The relation of reduction denoted by ⇒M is introRED is a new (helping) state which is different from duced in the following way: «w» ⇒M «w′ », all states from SM , and the set F M∗ is defined in the if Δ∗M («w») = RED(n), and «w»/n = «w′ ». following way: If «w» ⇒M «w′ » holds, we say, that the automaton M

F M∗ = FM ∪ {RED(n · 0k) | RED(n) ∈ FM a k ≥ 1} reduces the word w into the word w′ . We can see that Both functions δM , ΔM for all pairs created by a state |w| > |w′ |. s ∈ SM and by a symbol a ∈ (ΣM ∪ {»} are equal to The relation ⇒+ is the transitive closure of ⇒; ⇒∗ is the function fM . We define the new functions for the the reflexive and transitive closure of ⇒. remaining relevant pairs in the following way: Analysis by reduction by M is any sequence of reductions «w1» ⇒ «w2» ⇒ . . . ⇒ «wn», which cannot δM (s, «) = sM ΔM (s, «) = sM be further prolonged. If wn ∈ L0(M ), we speak about accepting analysis by reduction, in the other case we and for all a ∈ (ΣM ∪ {»}), speak about rejecting analysis by reduction. Often we will speak about analysis instead of analysis by reducδM (ACC, a) = ACC ΔM (ACC, a) = ACC tion. δM (ERR, a) = ERR ΔM (ERR, a) = ERR Stages. Let us recall that each computation of a redautomaton is divided in stages. At the beginning of δM (RED(n), a) = RED ΔM (RED(n), a) = RED(n·0) each stage the head points at the leftmost item beδM (RED, a) = RED hind the left sentinel, and the control unit is in the (re)starting state. The stage ends as the control unit The first enhancement of δM : gets to any final state (operation) from FM . There are δM∗ (s, λ) = s, δM∗ (s, ua) = δM (δM∗ (s, u), a) three kinds of operations: ACC, ERR, and RED. AccordThe first enhancement of ΔM : ingly, we have accepting (ACC-), rejecting (ERR-), and Δ∗M (s, λ) = s, Δ∗M (s, ua) = ΔM (Δ∗M (s, u), a) reducing (RED-)stages. Recognized language. The language recognized by in stages. Therefore we can define the monotony for M is defined in the following way: robust analyzers in the same way as for reducing auL(M ) = { w | «w» ⇒∗M «w′ », and w′ ∈ L0(M ) }. tomata.

Equivalences of red-automata. Two red-automata We divide auxiliary symbols into two types. Each M1 and M2 are equivalent, if L(M1) = L(M2). type serves to a different purpose:

We will often consider a stronger equivalence. Let 1) for a transfer of local informations between different us suppose that for any w ∈ (ΣM∗1 ∪ ΣM∗2 )·{λ, »} hold stages, at the same time 2) for a marking of correct and incorrect sub-words δM∗1(«w) = ACC ⇐⇒ δ∗M∗2(«w) = ACC, of the analyzed word, and for marking of the place of δM∗1 («w) = RED(n) ⇐⇒ δM2 («w) = RED(n). certain types of a reducing conflict.

We say that M1 and M2 are strongly equivalent. The auxiliary symbols of the type 2 are called signs. We can see that if M1 and M2 are strongly equivalent Here we use two signs: then they are equivalent, as well. ! – for marking of incorrectnesses, ? – for marking of reducing ambiguities.

Error and correctness preserving property. We understand under the reducing ambiguity a subWe can see the following usefull property: word for which is obtained by the robust analysis an Lemma 1. If «w1» ⇒M «w2», then w1 ∈ L(M ), ex- ambiguous information about its current reduction. actly if w2 ∈ L(M ). There is a technical difference between reducing automata, and robust parsers. The behavior of the roMonotony. Monotony is an important property that bust parser is described by the following three funcenables to characterize the class of DCFL in terms tions: of monotonic reducing automata. This property was Transition function tA : SA × (XA ∪ YA) −→ SA. introduced for restarting automata in [ 2 ], first. Infor- ta determines the state, into which will be transfered mally a red-automaton M is monotonic if the size of the finite control from its current state after scanning sequences of non-visited items (symbols) in individ- the symbol from the item visited by the working head. ual stages of any analysis by reduction by M is non- Inserting function iA : SA × (XA ∪ YA) −→ IA. increasing. A monotonic reducing automaton will be iA assigns to a state, and to a scanned symbol an incalled a mon-red-automaton for short. As an example serting sequence from a (final) set IA ⊆ (YA · 0∗)∗. see Table 1. Inserting sequences serves in a similar way as reducing sequences. They describe the inserted auxiliary 3 Robust analyzer symbols, and their inserting positions. If the value of the inserting function is λ, it will be nothing inserted.

We introduce for the robust analysis by reduction If the value is !, A inserts new item with the marking ! a new type of automaton – robust analyzer. Robust an- immediately before (to the left) the scanned item. The alyzer enhances the reducing automaton by the ability value ?000 says, that the marking ? should be inserted of inserting special auxiliary symbols, and by the abil- before the third item to the left before the scanned ity to read any input word about its input alphabet to item. the end. Robust analyzer A consists of finite control Reducing function rA : SA × (XA ∪ YA) −→ RA. unit with a finite set of states SA, and from a working rA assigns to a state, and to a scanned symbol a rehead connecting the finite control unit with a linear ducing sequence from a (final) set RA ⊆ (1 · 0∗)∗. list of items. The list of items is bounded by a left Reducing sequence is here interpreted in the same and right sentinels « and ». All other items contain way as for reducing automata. a symbol from a finite input alphabet XA, or of a finite A step of the robust analyzer A consists from the auxiliary alphabet YA. These alphabets are mutually sequence of the following actions: disjunct. Shift to the right. Analyzer A starts each step by

Robust analyzer is able to delete some items from a shift to the right of its working head to the next the list (operation of reduction). Deleted can be the item of the working list. item visited by the working head and some items po- Application of the inserting function. iA by the sitioned not far to the left from the working head. current situation (state, symbol) determines the Operations of reductions are controlled by reducing inserting sequence Is. A controlled by Is inserts sequences. Each operation of reduction is followed by new auxiliary symbols. a restart, i.e., transfer of the control unit into the Application of the reduction function. rA deter(re)starting state sA, and a placement of the work- mines by the current situation the current reducing head on the left sentinel «. It means, that the ro- ing sequence Rs. If RS is non-empty, A controlled bust analyzer, similarly as reducing automaton, works by Rs reduces (deletes) determined items from the working list. After such a reduction A finishes the of any word w ∈ Σ∗ is contained in some correct core step by a restart, i.e., moves the working head on of this word. the left sentinel «, and transfers the control unit Prefix consistence is the longest correct prefix v of into the (re)starting state sA. Then a new stage the analyzed word w. Prefix inconsistence is the shortwill be started on the reduced list. est incorrect prefix of the analyzed word w, i.e., it is If Rs is empty then A does not perform any reduc- the prefix va of w, where a ∈ Σ. Post-prefix consistion, neither the restart, and it finishes the step by tence is a suffix x of a correct core behind (to the right the following action. of) the prefix consistence, or behind some of the preApplication of the transition function. tA deter- vious post-prefix consistencies. We assign to the postmines by the current situation the new state q. prefix consistency x the incorrect sub-word xa of w. Then A continues from the state q by a further We say that xa is a post-prefix inconsistence of w (with step of the current stage. respect to L).

The last difference of A from reducing automata Our effort in the following is to deterministically, icnognssitsatsteinENtDhe∈ fSacAt. tWhaetwAill csoenetathinast otnhleysiognnes hoaflAt- iannda pmoostn-optroenfiixc (wina)yc,oannsdisteexnaccitelsyintot hloecaanliazleyzthede wproerfidxs will refine the ability of accepting and rejecting by the from DCFL. states ACC and ERR of reducing automata. For this purpose we observe the signs ! and ? inserted in the 3.2 Post-prefix robust analyzer A different stages of the computation into the gradually reduced working list. We will project the signs into Prefix consistence. A red-automaton M is prefixthe original input list in such a way that we will in- consistent when for each word u and each symbol a sert the signs into the same positions (i.e. before the (including the right sentinel) it holds the following: if same items) into which they were inserted during the ΔM (sM , u) ∈ SM and ΔM (sM , ua) 6= ERR, then ua is individual stages of the computation (robust analysis) a prefix of some word from L(M ) · {»}. by A. The following proposition is derived from the main

We denote by pA(w) a word w enriched by the result from [ 2 ]. The detailed proof is in [ 6 ]. There is signs (in the way mentioned above) inserted into the also connected with some other propositions. list during the analysis by the robust analyzer A. We Proposition 1. Monotone, prefix consistent, redwill later formulate our results using this denotation. automata characterize the class of DCFL. As an example see Fig.1. We can consider pA(w) as the output word of A.

It is shown in [7] that the notion of red-automata is

useful for the techniques of minimization. There is to 3.1 Prefix and post-prefix (in)consistencies any red-automaton M constructed an unambiguously determined state-minimal red-automaton Mm which Assumption. We assume in the following that is strongly equivalent with M .

L ⊆ Σ∗, and any symbol of Σ is a symbol of some We will show informally in the next part a method word from L. how construct for a given monotone, prefix-correct,

We call a word v inconsistent (incorrect) with re- state-minimal reducing automaton M a robust anaspect to the language L ⊆ Σ∗, if for any u, w ∈ Σ∗ is lyzer A which determines in any word w ∈ «ΣM∗ » the uvw 6∈ L. prefix-(in)consistence, and (not obligatory all) post

We can see that incorrect words can obtain proper prefix (in)consistencies. We suppose for the construcincorrect sub-words. This fact lead us to the following tion that L(M ) 6= ∅. notion. At first A will use the prefix-consistency of M for We say that a word v is an incorrect core of the word w the finding of the prefix-(in)consistency of the anawith respect to the language L, if it is a subword of w, lyzed word w. if it is incorrect with respect to the language L, and if Such a situation can occur after one, or after more it is minimal by the ordering “to be a sub-word”. stages if M will be transfered into the final rejecting

On the other hand, a word v is a correct sub-word state ERR. The computation (analysis) of M on the of a word w with respect to the language L, if w = xvy, word w until this moment we describe in the following and for some x′ , y′ is x′ vy′ ∈ L. We say that v is way: a correct core of a word w with respect to the language 1) At first M (possibly) gradually reduces the word w L, if it is a correct sub-word of w with respect to L, into the word w′ , i.e., w ⇒∗M w′ . and it is maximal by the ordering “o be a sub-word”. 2) Then in the next stage M transfers over some preThe assumption that each symbol of Σ is a symbol of fix x of the word w′ into some non-final state s ∈ SM , some word of the language L ensures that each symbol i.e., δM∗ (sM , x) = s ∈ SM , 3) Finally from the state s transfers over the next sym- its working head. The automaton A will look for a new bol a into the final state ERR, i.e., δM (s, a) = ERR. post-prefix (in)consistency behind (to the right from)

We can see that A has founded by the previous sim- the currently inserted sign !. A will take instead of ulation of M the prefix inconsistency of w. For mark- the set {ERR} as the current value of the set S the ing of the prefix inconsistency A inserts the sign ! set δM (SM , a), where a is the symbol scanned by the between the correct prefix «x and the symbol a. working head. A will continue in the robust analysis

The prefix-consistency of M ensures that M has of the remaining suffix by the schema of the cycle C1. visited in the last step described above the symbol a An ambiguous reduction. S does not contain at the first time. Therefore if w′ = «xay for some y ACC, and either does contain two different reducing then ay is a suffix of the original input word w. states of M , or does contain at least one non-final

Let us now informaly describe how A continues in state, and at least one reducing state; i.e., the robust analysis over the mentioned suffix ay of the ACC 6∈ S, and ∃n : RED(n) ∈ S 6⊆ {RED(n), ERR}. word w. We say that S fulfilling the condition above is an am

We will use the function δM for this aim. This func- biguous set. tion was introduced as an enhancement of the transi- The task for A is to work without false inconsistency tion function fM . It describes not only the transfers messages. From that reason A separates the ambigubetween the individual states, but also the tranfers be- ous part from the remaining suffix. It inserts the sign tween the indiviual subsets of the set SM ∪FM ∪{RED}, for the ambiguity ? in the place of the current ambii.e., of the set of all final and non-final states, and of guity, i.e., immediately to the left from the position a special state RED. We will use it in the following in of the working head (if the sign is not already placed order to describe the all possible (partial) computation there in some of the previous stages). At this moment of M over the suffix ay at the same time. A takes for the set S the complete set SM , and con

We let A to compute the function δM over the suffix tinues in the robust analysis behind the sign ? by the ay = a0a1 . . . a|y| starting from the set SM of all non- scheme of the cycle C1. final states of M . A will control the computation in An unambiguous reduction. The set S conthe following way. Let us initially take the set SM as tains exactly one reducing operation, and possibly bea set further denoted as SI . side it the final state ERR; i.e., ∃n : RED(n) ∈ S ⊆

Let us denote the following part of the computation {RED(n), ERR}. of A as a cycle C1. The cycle C1 is performed until for Let us denote as u the sub-word which is created the set S = δM∗ (SI , a0 . . . ai), where 0 ≤ i ≤ |y|, holds by the input symbols positioned between the last sign that ∅ ⊂ S ⊆ SM ∪ {ERR}, and S contains some non- ! or ?, and the position of the working head includfinal state. Then A performs the following action: the ing the scanned symbol. The sub-word u is because of head of A will be placed to the next item to the right, the prefix-consistency, and because of the state minand as (the current value of) the set S will be taken imality of M a sub-word of some word from L(M ). the set δM (S, ai+1). Here ends the description of C1. Moreover, the u is reduced in any word w ∈ L(M ) of

The core of the post-prefix analysis by A are the the form w = vux by the reducing seguence n, i.e, the following four cases where is not fulfilled the condition reducing sequence and the position of the reduction for the continuation of the computation by cycle C1. are determined unambiguously. A will reduce also by

Correct suffix. The set S contains the accepting n, but only the symbols from u if we consider the case state ACC; i.e., ACC ∈ S. that n can be longer then u.

If ACC ∈ S, then the current suffix of the analyzed Observation. The reducing sequence n deletes at word w by A is a suffix of some word from L(M ). least one symbol from u. This observation follows from Therefore, the current suffix cannot contain any fur- the unambiguity of the reduction of u. Let us supther inconsistency. The work of A on w is finished at pose the opposite. Then for some z, and n′ is n = this moment. n′ · 0|u|, and δM∗ (sM , zu) = RED(n).

An unambiguous inconsistency. The set S con- Let z′ be the shortest z with the properties described tains a single state – the rejecting state ERR; i.e., above. Since M is prefix-correct «z′ u is a prefix of some S = {ERR}. word from «L(M )». In the next stage occurs one of the following variant:

All the possible computations of M over the word w behind the previous inconsistency has ended at the same time in the state ERR. We have found a suffix of a correct core of the analyzed word, i.e., one of its post-prefix (in)consistency. At this moment A inserts the sign ! immediately before the position of δM∗ (sM , (z′ /n′ ) · u) = RED(n′ ) for some n′ δM∗ (sM , (z′ /n′ ) · u) = RED δM∗ (sM , (z′ /n′ ) · u) = ACC δM∗ (sM , (z′ /n′ ) · u) ∈ SM Each of the presented variant leads to a contradiction pair of inserted set U is bounded by the characteristic with the unambiguity of the reduction of the word u. constant kM which ensures that U is finite. If occurs the first one, then n′ reduces some symbol of Let us note that A stores in its finite control a suitthe sub-word u (since z′ cannot be shorter), therefore able suffix of its working list before the position of its n′ 6= n and RED(n′ ) ∈ S 6⊆ {RED(n), ERR}. By the working head. The length of this suffix need not be remaining variants is the contradiction obvious. longer then 2 · kM . It contains the input items, and

Apart from the reduction of u by n, A will insert the inserted values of the set S in the last 2 · kM steps. into the list a new item with an auxiliary symbol – the set U of pairs of an internal state and a word over an Recall, that we suppose that the automaton M is input alphabet of the length kM at most. This auxil- minimal and prefix-consistent. The minimality of M iary symbol will be used in the next stage to adjust the ensures that each state of M is reachable. The stateset of states computed by the function δM . Our goal is reachability, and the prefix-consistence of the automato avoid situation when δM∗ (s, u) = ERR 6= δM∗ (s, u/n) ton M ensure for any word u, that u is a sub-word of for some s ∈ SM . Such internal states s must be elim- some word of L(M ), if δM∗ (SM , u) 6⊆ {ERR, RED}. inated. The set U is defined by the following way: If A inserts the sign ! or ? immediately before the right sentinel » it finishes its computation. Since we – If |u| ≥ |n|, then A reduces the working list of suppose that L(M ) is non-empty, is » a suffix of some items by n and A puts a new item with the auxil- word of the language «L(M )» (i.e.,L(M ) with seniary symbol U just in front of the leftmost deleted tinels). item. U = {(s, λ) | ∃s′ ∈ SM : δM∗ (s′ , u1) = s and δM∗ (s, u2) = RED(n)} where u = u1u2 and Now we have outlined the behavior of A in the first |u2| = |n|. stage after the localization of the prefix-inconsistence. – If |u| < |n|, then A cannot reduce by the whole n as In the next stages we need also to consider the signs such a reduction would impact a part of the work- and the other auxiliary symbols inserted in the previing list in front of u; this part would be reduced by ous stages. We will not describe here these details. n1 such that n = n1n2 and |n2| = |u|. But a part of the working list in front of the rightmost marker We illustrate the outlined method by the following ! or ? can be reduced in some word of L(M ) in example. other way or even not at all. So, A will reduce Example 1. We explain the presented method by two items behind the rightmost marker ! or ? by the prefix-consistent, state-minimal mon-red-automata. reducing sequence n2 and it will insert a new item We will present two different robust analyses of the containing an auxiliary symbol U just to the right following inconsistent word «a + +(a)+)(a». of the rightmost marker. U = {(s, x) | ∃v ∈ ΣM∗ : The transition functions of the automata M1 x = v/n1 a |v| = |n1| a δM∗ (s, vu) = RED(n)}. and M3 from [ 6 ] which we use in this example are Insertion of the set U into the working list is impor- defined by the tables in the figure 1. We do not present tant as it ensures the continuity of subsequent stages here the automaton M2 from [ 6 ]. of computation. In next stage, A will use this set to The robust reduction analyses of the considered adjust the set S of internal states computed by func- word are on figures 1a, and 1b. Both figures contain tion δM . As soon as A reach the item with U , it substi- also the input word enriched by the signs ! and ? on tute S by S′ = {δM∗ (s, v) | (s, v) ∈ U }. It guarantees the corresponding places. that A enter a part of the list impacted by the last Let us note that in [ 6 ] is a detailed description reduction in such states only that led to the last re- of a construction which constructs to a given prefix duction of u by n resp. n2. consistent, state minimal mon-red-automaton M its

A uses just defined set U in such a case only when robust analyzer A. This construction implements the the new item with U is inserted just behind an item outlined method. The robust analyzer A is by this concontaining a symbol of the input alphabet or a marker. struction given unambiguously for a given M . Otherwise, when this item contains an auxiliary symbol U ′ different from both markers, then (instead of insertion of U ) A replaces U ′ with U computed in the 3.3 Guarantees of the presented method following way:

We formulate the guarantees of the presented method as theorems. The detailed proofs can be found in [6].

U = {(s, x) | ∃y, (s′ , x′ ) ∈ U ′ : x = yx′ /n1 , and δM∗ (s, y) = s′ , and δM∗ (s′ , x′ u) = RED(n) , and |y| = max{0, |n1| − |x′ |}}, Theorem 1. Let M be a prefix consistent, stateminimal mon-red-automaton and A its post-prefix rowhere n1 is a prefix of n of the length |n| − |u|. In bust analyzer. For any w ∈ «Σ∗» the following propoall cases, the length of the word x contained in any sitions hold:

a + ( ) » ⇒ s0 s1 ERR s2 ERR ERR s1 ERR RED(11) ERR ERR ACC s2 s3 ERR s2 ERR ERR s3 ERR s4 ERR RED(101) ERR s4 s5 ERR s2 ERR ERR s5 ERR s4 ERR RED(110) ERR (a) automaton M1 reducing the word a+ without brackets around it “from the left” and the +a with brackets around “from the right”

a + ( ) » ⇒ s0 s1 ERR s2 ERR ERR s1 ERR RED(11) ERR ERR ACC s2 s3 ERR s2 ERR ERR s3 ERR RED(11) ERR RED(101) ERR 1. The analyzer A reads (in one or more stages) the complete word «w» and finishes its computation in a special state END. Any computation of A is monotone. 2. If pA(«w») does not contain any sign !, then

pA(«w») = «w» and w is from L(M ). 3. If «w» ∈ «L(M )» then pA(«w») = «w». 4. If «u! is a prefix of pA(«w») and u does not contain the sign ! then u does not contain the sign ? as well, and «u is the longest correct-prefix of the word «w» with respect to the language «L(M )». 5. If !u! or ?u! is a sub-word of the word pA(«w») and u does not contain any sign ! or ? then u is a suffix of some corect core of the word «w» with respect to the language «L(M )». 6. If !u» or ?u» is a suffix of the word pA(«w»), and u does not contain any sign ! or ? then u» is a sufix of some word from «L(M )». 7. If !u? or ?u? is a sub-word of pA(«w») and u does not contain any sign ! or ? then u is a sub-word of some word from «L(M )».

We will discuss the meaning of the sign ?, and we will formulate properties of the mon-red-automaton M which ensure that its post-prefix robust analyzer A does not use the sign ? at all. This sign serves as the right sentinel for the correct sub-words of L(M ) which lead A to some of two following types of a reduction conflict. Let u be such a sub-word which is followed by ?. The first type of the conflict means that there are two different words of L(M ) containing u which lead M by reading the complete u and its prefixes to two different reductions. The second type of the conflict means that there is a transfer trough u by M which leads to a reduction, and at the same time there is an another transfer leading to the shift to the right of M from u to the next symbol.

Now we gradually introduce the notions of unambiguously reducible sub-word and unambiguously reducing red-automaton, and we will show that a postprefix robust analyzer A of M which is unambiguously reducing, need not to use the sign ? at all.

We say that a sub-word w is reducible by M if for some n holds that RED(n) ∈ δM∗ (SM , w).

We say that a sub-word w is unambiguously reducible by M if for some n holds that RED(n) ∈ δM∗ (SM , w) ⊆ {RED(n), ERR}.

We say that a sub-word which is reducible but it is not unambiguously reducible is an ambiguously reducible sub-word.

We say that M is unambiguously reducing if any of its reducible sub-words is unambiguously reducible.

Example 2. The automaton M1 from the example 1 is not unambiguously reducing. All its ambiguously reducible sub-words are presented in Table 2a. Let us « a + s0 s1RED(11)

+ ( a ) + ! S′ + ( a ) +

RED(11) « s0 « s0 « s0 « s0 « s0 ! S′ SM S′ ! S′ SM S′ ! S′ SM S′ ! S′ SM S′ ( a ) + s2 s3RED(101) a + s3 s1 RED(11) a s3 s1 a s3 s1 0 1 2 « a + ! 3 4 5 6 7 + ( a ) + ! 8 9 10 11 ) ! ( a ! » (b) The robust analyzer of M3 Fig. 1: The robust analysis of «a++(a)+)(a».

) ) ) ) ( a ( a ( a ( a ( a ! S′ )

RED(101) S!M SS′′ ! ( a ! » s2 s3 ACC » » » » » note the length of this words is not limited by any create a very special type of (in)consistencies considconstant, since for any i ≥ 0 holds that ered by the method presented in this paper. In the δM∗1 (SM1 , (+a)i+) = {ERR, RED(110), s4}. close future we will show that the set of languages recThe minimal unambiguously reducible sub-words ognized by unambiguously reducing, prefix consistent, of M1 are in Table 2b. state-minimal mon-red-automata creates a proper subclass of DCFL. sub-word w ) a) + a+ +a+ . . . ∗ δM1(SM1 , w) RED(110) RED(110) RED(11) RED(11) RED . . .

RED(101) RED(101) s4 s4 s4 (a) ambiguously reducible sub-words sub-word w +a) (a) «a+ ∗ δM1 (SM1 , w) RED(110) RED(101) RED(11) (b) minimal unambiguously reducible subwords

Table 2: Reducible sub-words of M1.

We can see that in this example the robust ana

lyzer of M3 has founded all inconsistencies in the word «a + +(a)+)(a»., i.e., it has not used the sign ? at all. This example illustrates the fact that M3 is an unambiguously reducing red-automaton. We can see that directly from Table 1b.

The meaning of the reducing unambiguity for the localization of the post-pefix inconsistencies summarizes the following theorem.

Theorem 2. Let M be a prefix consistent, state-minimal mon-red-automaton and A its post-prefix robust analyzer. If M is at the same time unambiguously reducing then its post-prefix robust analyzer A does not use the sign ? at all. That means, that A in any word from «ΣM∗ » determines the prefix inconsistency and all its post-prefix inconsistencies with respect to the language L(M ).

Corollary 1. Let M be a mon-red-automaton which is at the same time prefix-consistent and stateminimal, and A be its robust analyzer. Then there is a deterministic push-down transducer which translates any word w from ΣM∗ on the word pA(w). 4

Conclusion

The presented method can be considered as a direct

generalization of the method presented in [ 9 ], and as an essential refinement and a generalization of the method from [ 5 ]. The method in [ 9 ] is based on monotone reducing automata, the method in [ 5 ] is based on (monotone) list automata with auxiliary symbols. Both the methods are based on the so called headsymbols. The head-symbol (in)consistencies from [ 9 ]

G. V.

Cormack : An LR substring parser for noncorrecting syntax error recovery . In: Proc. of PLDI '89 , 1989 , 161 - 169 .

Jančar ,

Mráz ,

Plátek , J. Vogel: Restarting automata . In Proc. FCT'95 , Dresden , Germany, August 1995 , LNCS 965, Springer Verlag 1995, 283 - 292 .

Otto : Restarting automata . In:

Ésik , C. MartinVide, and V. Mitrana (Eds.), Recent Advances in Formal Languages and Applications, Studies in Computational Intelligence , Vol. 25 , Springer, Berlin, 2006 , 269 - 303 .

4. Gh. Pˇaun: Marcus Contextual Grammars, Kluwer, Dordrecht, Boston, London, 1997 .

5. M. Plátek: Construction of a robust parser from a deterministic reduced parser . Kybernetika 33 ( 3 ), 1997 , 311 - 332 .

6. M. Procházka: Redukční automaty a syntaktické chyby. (in Czech) Text for PhD Dissertation, it will be to achieve soon on the web .

7. M. Procházka: Concepts of syntax error recovery for monotonic reducing automata . MIS 2004 , 94 - 103 , http://ulita.ms.mff.cuni.cz/pub/MIS/MIS2004.pdf

Procházka , M. Plátek: Redukční automaty - monotonie a redukovanost . ITAT 2002 , 23 - 32 .

Procházka , M. Plátek: Redukční automaty a syntaktické chyby . Proceedings of ITAT 2011 (in Czech) , 2011 , 23 - 30 .