Localization of (in)consistencies by monotone reducing automata⋆

                                         Martin Procházka and Martin Plátek

                         Charles University, MFF UK, Department of Computer Science
     Malostranské náměstí 25, 118 00 Praha 1, Czech Republic prma@centrum.cz, martin.platek@mff.cuni.cz

Abstract. A reducing automaton (red-automaton) is        prefix-inconsistency (prefix error) in w. The prefix in-
a deterministic automaton proposed for checking word and consistency is the minimal incorrect prefix of w. Let
sub-word correctness by the use of analysis by reduction. Its
                                                         x ∈ Σ be the leftmost symbol in the word w such that
monotone version characterizes the class of deterministicw = uxv, u, v ∈ Σ ∗ , there exists a word v ′ ∈ Σ ∗ for
context-free languages (DCFL). We propose a method for   which it holds uv ′ ∈ L and there is no word v ′′ such
a construction of a deterministic monotone enhancement
                                                         that uxv ′′ ∈ L. The u is the maximal correct prefix
of any monotone reducing automaton which is able with the
                                                         of w, and ux the prefix-inconsistency of w.
help of special auxiliary symbols to localize its preﬁx and
post-preﬁx (in)consistencies, and certain types of reducing  In a similar way we can consider (in)correct infixes
                                                         for a given language L and a word w 6∈ L. We can eas-
conﬂicts. In other words this method ensures a robust anal-
                                                         ily see that a prefix-inconsistence can occur in a word
ysis by reduction without spurious error messages. We for-
mulate natural conditions for which this method ensures  at most once. Our effort is to study properties of re-
                                                         ducing automata which will ensure the detection of
the localization of all preﬁx and post-preﬁx inconsistencies
in any (incorrect) word with respect to a DCFL.          (in)correct prefixes, and/or certain types of (in)correct
                                                         infixes. The types of the (in)correct infixes studied
                                                         here are studied by different techniques already in [1].
1 Introduction                                           We call them here post-prefix (in)consistencies.
                                                             In this paper we use the advantage of the fact, that
A reducing automaton (red-automaton for short) is        to  any   deterministic context-free language L there is
a device that models the so called analysis by reduc-    a  monotone     reducing automaton recognizing L which
tion. Analysis by reduction consists in a stepwise sim-  is also  able  to detect the prefix inconsistency (error).
plification of an extended sentence (word) until a sim-  Such    type   of automaton characterizes the class of
ple sentence (word) is obtained or an error is found. It DCFL.     This  fact is shown in [6].
is based on another automata model – restarting au-          The    paper  is structured as follows. First, in Sec-
tomaton (R-automaton) introduced in [2]. Similarly to    tion  2 we  introduce  red-automata and their basic prop-
R-automaton, red-automaton can only delete symbols.      erties.  The   Section   3 creates the core of this paper.
At some place it decides to delete some of the last k    Conclusion     contains  some remarks about connections
visited symbols, where k is limited by a fixed constant  of the   presented   method   with the methods based on
and then restarts its computations, i.e. it enters its   the  so  called  head-symbols.
initial state and its head is placed on the left end of
the remaining word.
    Reducing automaton is formalized as an extension
                                                         2 Definitions and basic properties
of deterministic finite automaton. This kind of for-
malization serves here as a basic tool for the method The reducing automaton has a finite control unit and
of algorithmic localization of syntactic inconsistencies a working head attached to a list with sentinels on
(errors) for the languages from the class of DCFL. both ends. It works in certain cycles called stages. At
The notion of red-automata was introduced in order to the beginning of each stage, the head points at the
present naturaly the techniques of minimization. In [7] leftmost item behind the left sentinel, and the control
we construct to any red-automaton M an unambigu- unit is in a special initial state. In the process of the
ously determined minimal red-automaton Mm which stage the automaton moves the head from the item it
preserves the recognized language, and the set of all currently points to the next item on the right. Dur-
reductions defined by M .                                ing such a transition it changes the state of its control
    For a given language L and a word w 6∈ L, it unit according to the current state and the currently
is natural to define the maximal correct prefix and scanned symbol. The stage ends as the control unit
                                                         gets to any of special states called operations. There
 ⋆
   This work was supported by the grant projects of the are three kinds of operations: ACC, ERR, and RED. Both
   Grant Agency of the Czech Republic No. P202/10/1333 ACC and ERR-operation halts the whole computation,
   and P103/10/0783.                                     ACC accepts and ERR rejects the word in the list. The
56        Martin Procházka, Martin Plátek

RED-operation RED(n) determines how the list should             We will often use the following conventions:
                                                              ∗              ∗
be shortened. Its parameter n – a binary word of a lim-     δM  (sM , w) = δM  («w), ∆∗M (sM , w) = ∆∗M («w).
ited size – specifies which item on the left of the head        We define for the both function a further important
are to be removed from the list. Bit 1 means “remove        enhancement, namely for the final subsets S of the set
                                                                                                 ∗
the item from the list”, bit 0 means “leave the item        SM ∪ FM ∪ {RED} resp. SM ∪ FM          :
                                                                  ∗             ∗
in the list”. After all items designated for deletion are       δM  (S, u) = {δM  (s, u) | s ∈ S}
removed, the automaton resets its control unit to the           ∆∗M (S, u) = {∆∗M (s, u) | s ∈ S}
initial state and places the head at the leftmost item          Let us note that the tuple
behind the left sentinel. The string n ∈ (10∗ )+ deter-         (ΣM ∪ {«, »}, SM ∪ FM ∪ {RED}, sM , FM , δM ) is
mines, which items will be deleted from the list. If the    a finite automaton which accepts exactly all prefixes
i-th symbol of n from the right is equal to 1, then the     (words) which lead M to some reduction, or to an ac-
automaton deletes the i-th item to the left from the        ceptation, or to a rejection.
position of the head. The item scanned by the head is           We will consider in the following only the reducing
considered as the first one.                                automata which fulfills the following natural condi-
     All final states of a reducing automaton M create      tion:
                                                              ∗
a finite subset FM of the (unbounded) set { ACC, ERR }∪     δM  (sM , u) = RED(n)      =⇒       |u| ≥ |n|.
{ RED(n) | n ∈ (1·0∗ )+ }. Now we are able to introduce     Let us take: L0 (M ) = {w ∈ ΣM | ∆∗M (« w ») = ACC}.
                                                                                               ∗

reducing automata in a formal way.                          Constant. Let kM = max |n| RED(n) ∈ FM . We
     A reducing automaton (red-automaton) is a 7-tuple      call kM the characteristic constant of M .
M = (ΣM , «, », SM , sM , FM , fM ), where ΣM is a fi-      The operation of reduction. We will exactly de-
nite input alphabet, «, » 6∈ ΣM are the (left and right)    scribe a reduction of a word by a binary sequence with
sentinels, SM is the finite set of internal states,         the help of the following operation /:
sM ∈ SM is the (re)starting state, FM is the finite set         a/0 = a, a/1 = λ, λ/n = λ, u/λ = u,
of final states (operations), fM : SM × (ΣM ∪ {»}) −→       (u · a)/(n · i) = (u/n) · (a/i),
(SM ∪ FM ) is the transition function of M , which ful-     where u ∈ Σ ∗ , a ∈ Σ, n ∈ (10∗ )+ and i ∈ {0, 1}. The
fills the following condition:                              size of the strings u, n is here unbounded, moreover u
     ∀s ∈ SM : fM (s, ») ∈ FM .                             can be longer then n, and vice versa. The reduction of
We will describe the behavior of M in more details by       the word (a) by the sequence 1 · 0 · 1 is given in the
two functions enhancing the transition function fM :        following way: (a)/101 = a.
δM : (SM ∪ FM ∪ {RED}) × (ΣM ∪ {«, »}) −→ (SM ∪             Using just defined operation we can describe the way
FM ∪ {RED})                                                 how the red-automaton M reduces a word w ∈ ΣM      ∗
                                                                                                                 .
                 ∗                                 ∗
∆M : (SM ∪ FM      ) × (ΣM ∪ {«, »}) −→ (SM ∪ FM     )      The relation of reduction denoted by ⇒M is intro-
RED is a new (helping) state which is different from        duced in the following way: «w» ⇒M «w′ »,
                                      ∗
all states from SM , and the set FM     is defined in the   if ∆∗M («w») = RED(n), and «w»/n = «w′ ».
following way:                                              If «w» ⇒M «w′ » holds, we say, that the automaton M
       ∗
     FM   = FM ∪ {RED(n · 0k ) | RED(n) ∈ FM a k ≥ 1}       reduces the word w into the word w′ . We can see that
Both functions δM , ∆M for all pairs created by a state     |w| > |w′ |.
s ∈ SM and by a symbol a ∈ (ΣM ∪ {»} are equal to           The relation ⇒+ is the transitive closure of ⇒; ⇒∗ is
the function fM . We define the new functions for the       the reflexive and transitive closure of ⇒.
remaining relevant pairs in the following way:              Analysis by reduction by M is any sequence of re-
                                                            ductions «w1 » ⇒ «w2 » ⇒ . . . ⇒ «wn », which cannot
          δM (s, «) = sM         ∆M (s, «) = sM             be further prolonged. If wn ∈ L0 (M ), we speak about
                                                            accepting analysis by reduction, in the other case we
and for all a ∈ (ΣM ∪ {»}),                                 speak about rejecting analysis by reduction. Often we
                                                            will speak about analysis instead of analysis by reduc-
       δM (ACC, a) = ACC      ∆M (ACC, a) = ACC             tion.
        δM (ERR, a) = ERR   ∆M (ERR, a) = ERR               Stages. Let us recall that each computation of a red-
                                                            automaton is divided in stages. At the beginning of
     δM (RED(n), a) = RED ∆M (RED(n), a) = RED(n·0)         each stage the head points at the leftmost item be-
       δM (RED, a) = RED                                    hind the left sentinel, and the control unit is in the
                                                            (re)starting state. The stage ends as the control unit
The first enhancement of δM :                               gets to any final state (operation) from FM . There are
 ∗               ∗                ∗
δM (s, λ) = s,  δM (s, ua) = δM (δM (s, u), a)              three kinds of operations: ACC, ERR, and RED. Accord-
The first enhancement of ∆M :                               ingly, we have accepting (ACC-), rejecting (ERR-), and
∆∗M (s, λ) = s,   ∆∗M (s, ua) = ∆M (∆∗M (s, u), a)          reducing (RED-)stages.
                                                                         Localization of (in)consistencies . . .   57

Recognized language. The language recognized by             in stages. Therefore we can define the monotony for
M is defined in the following way:                          robust analyzers in the same way as for reducing au-
L(M ) = { w | «w» ⇒∗M «w′ », and w′ ∈ L0 (M ) }.            tomata.
Equivalences of red-automata. Two red-automata                  We divide auxiliary symbols into two types. Each
M1 and M2 are equivalent, if L(M1 ) = L(M2 ).               type serves to a different purpose:
    We will often consider a stronger equivalence. Let      1) for a transfer of local informations between different
us suppose that for any w ∈ (ΣM  ∗      ∗
                                     ∪ ΣM   )·{λ, »} hold   stages,
                                   1      2
at the same time                                            2) for a marking of correct and incorrect sub-words
     ∗
    δM   («w) = ACC ⇐⇒ δM  ∗
                               («w) = ACC,                  of the analyzed word, and for marking of the place of
       1                     2
 ∗                         ∗
δM1 («w) = RED(n) ⇐⇒ δM2 («w) = RED(n).                     certain types of a reducing conflict.
    We say that M1 and M2 are strongly equivalent.              The auxiliary symbols of the type 2 are called signs.
We can see that if M1 and M2 are strongly equivalent        Here we use two signs:
then they are equivalent, as well.                          ! – for marking of incorrectnesses,
                                                            ? – for marking of reducing ambiguities.
Error and correctness preserving property.
                                                            We understand under the reducing ambiguity a sub-
We can see the following usefull property:
                                                            word for which is obtained by the robust analysis an
Lemma 1. If «w1 » ⇒M «w2 », then w1 ∈ L(M ), ex-            ambiguous information about its current reduction.
actly if w2 ∈ L(M ).                                            There is a technical difference between reducing
                                                            automata, and robust parsers. The behavior of the ro-
Monotony. Monotony is an important property that            bust parser is described by the following three func-
enables to characterize the class of DCFL in terms          tions:
of monotonic reducing automata. This property was               Transition function tA : SA × (XA ∪ YA ) −→ SA .
introduced for restarting automata in [2], first. Infor-    ta determines the state, into which will be transfered
mally a red-automaton M is monotonic if the size of         the finite control from its current state after scanning
sequences of non-visited items (symbols) in individ-        the symbol from the item visited by the working head.
ual stages of any analysis by reduction by M is non-            Inserting function iA : SA × (XA ∪ YA ) −→ IA .
increasing. A monotonic reducing automaton will be          iA assigns to a state, and to a scanned symbol an in-
called a mon-red-automaton for short. As an example         serting sequence from a (final) set IA ⊆ (YA · 0∗ )∗ .
see Table 1.                                                    Inserting sequences serves in a similar way as re-
                                                            ducing sequences. They describe the inserted auxiliary
3   Robust analyzer                                         symbols, and their inserting positions. If the value of
                                                            the inserting function is λ, it will be nothing inserted.
We introduce for the robust analysis by reduction           If the value is !, A inserts new item with the marking !
a new type of automaton – robust analyzer. Robust an-       immediately before (to the left) the scanned item. The
alyzer enhances the reducing automaton by the ability       value ?000 says, that the marking ? should be inserted
of inserting special auxiliary symbols, and by the abil-    before the third item to the left before the scanned
ity to read any input word about its input alphabet to      item.
the end. Robust analyzer A consists of finite control           Reducing function rA : SA × (XA ∪ YA ) −→ RA .
unit with a finite set of states SA , and from a working    rA assigns to a state, and to a scanned symbol a re-
head connecting the finite control unit with a linear       ducing sequence from a (final) set RA ⊆ (1 · 0∗ )∗ .
list of items. The list of items is bounded by a left           Reducing sequence is here interpreted in the same
and right sentinels « and ». All other items contain        way as for reducing automata.
a symbol from a finite input alphabet XA , or of a finite       A step of the robust analyzer A consists from the
auxiliary alphabet YA . These alphabets are mutually        sequence of the following actions:
disjunct.                                                 Shift to the right. Analyzer A starts each step by
    Robust analyzer is able to delete some items from        a shift to the right of its working head to the next
the list (operation of reduction). Deleted can be the        item of the working list.
item visited by the working head and some items po- Application of the inserting function. iA by the
sitioned not far to the left from the working head.          current situation (state, symbol) determines the
Operations of reductions are controlled by reducing          inserting sequence Is. A controlled by Is inserts
sequences. Each operation of reduction is followed by        new auxiliary symbols.
a restart, i.e., transfer of the control unit into the Application of the reduction function. rA deter-
(re)starting state sA , and a placement of the work-         mines by the current situation the current reduc-
ing head on the left sentinel «. It means, that the ro-      ing sequence Rs. If RS is non-empty, A controlled
bust analyzer, similarly as reducing automaton, works        by Rs reduces (deletes) determined items from the
58      Martin Procházka, Martin Plátek

  working list. After such a reduction A finishes the     of any word w ∈ Σ ∗ is contained in some correct core
  step by a restart, i.e., moves the working head on      of this word.
  the left sentinel «, and transfers the control unit         Prefix consistence is the longest correct prefix v of
  into the (re)starting state sA . Then a new stage       the analyzed word w. Prefix inconsistence is the short-
  will be started on the reduced list.                    est incorrect prefix of the analyzed word w, i.e., it is
  If Rs is empty then A does not perform any reduc-       the prefix va of w, where a ∈ Σ. Post-prefix consis-
  tion, neither the restart, and it finishes the step by  tence is a suffix x of a correct core behind (to the right
  the following action.                                   of) the prefix consistence, or behind some of the pre-
Application of the transition function. tA deter-         vious post-prefix consistencies. We assign to the post-
  mines by the current situation the new state q.         prefix consistency x the incorrect sub-word xa of w.
  Then A continues from the state q by a further          We say that xa is a post-prefix inconsistence of w (with
  step of the current stage.                              respect to L).
                                                              Our effort in the following is to deterministically,
     The last difference of A from reducing automata
                                                          in a monotonic way, and exactly to localize the prefix
consists in the fact that A contains only one halt-
                                                          and post-prefix (in)consistencies in the analyzed words
ing state END ∈ SA . We will see that the signs of A
                                                          from DCFL.
will refine the ability of accepting and rejecting by the
states ACC and ERR of reducing automata. For this
purpose we observe the signs ! and ? inserted in the 3.2 Post-prefix robust analyzer A
different stages of the computation into the gradually
                                                          Prefix consistence. A red-automaton M is prefix-
reduced working list. We will project the signs into
                                                          consistent when for each word u and each symbol a
the original input list in such a way that we will in-
                                                          (including the right sentinel) it holds the following: if
sert the signs into the same positions (i.e. before the
                                                          ∆M (sM , u) ∈ SM and ∆M (sM , ua) 6= ERR, then ua is
same items) into which they were inserted during the
                                                          a prefix of some word from L(M ) · {»}.
individual stages of the computation (robust analysis)
                                                              The following proposition is derived from the main
by A.
                                                          result from [2]. The detailed proof is in [6]. There is
     We denote by pA (w) a word w enriched by the
                                                          also connected with some other propositions.
signs (in the way mentioned above) inserted into the
list during the analysis by the robust analyzer A. We Proposition 1. Monotone, prefix consistent, red-
will later formulate our results using this denotation. automata characterize the class of DCFL.
As an example see Fig.1. We can consider pA (w) as
the output word of A.                                         It is shown in [7] that the notion of red-automata is
                                                          useful for the techniques of minimization. There is to
                                                          any red-automaton M constructed an unambiguously
3.1 Prefix and post-prefix (in)consistencies
                                                          determined state-minimal red-automaton Mm which
Assumption. We assume in the following that is strongly equivalent with M .
L ⊆ Σ ∗ , and any symbol of Σ is a symbol of some             We will show informally in the next part a method
word from L.                                              how construct for a given monotone, prefix-correct,
     We call a word v inconsistent (incorrect) with re- state-minimal reducing automaton M a robust ana-
spect to the language L ⊆ Σ ∗ , if for any u, w ∈ Σ ∗ is lyzer A which determines in any word w ∈ «ΣM       ∗
                                                                                                              » the
uvw 6∈ L.                                                 prefix-(in)consistence, and (not obligatory all) post-
     We can see that incorrect words can obtain proper prefix (in)consistencies. We suppose for the construc-
incorrect sub-words. This fact lead us to the following tion that L(M ) 6= ∅.
notion.                                                       At first A will use the prefix-consistency of M for
We say that a word v is an incorrect core of the word w the finding of the prefix-(in)consistency of the ana-
with respect to the language L, if it is a subword of w, lyzed word w.
if it is incorrect with respect to the language L, and if     Such a situation can occur after one, or after more
it is minimal by the ordering “to be a sub-word”.         stages if M will be transfered into the final rejecting
     On the other hand, a word v is a correct sub-word state ERR. The computation (analysis) of M on the
of a word w with respect to the language L, if w = xvy, word w until this moment we describe in the following
and for some x′ , y ′ is x′ vy ′ ∈ L. We say that v is way:
a correct core of a word w with respect to the language 1) At first M (possibly) gradually reduces the word w
L, if it is a correct sub-word of w with respect to L, into the word w′ , i.e., w ⇒∗M w′ .
and it is maximal by the ordering “o be a sub-word”. 2) Then in the next stage M transfers over some pre-
The assumption that each symbol of Σ is a symbol of fix x of the word w′ into some non-final state s ∈ SM ,
                                                                 ∗
some word of the language L ensures that each symbol i.e., δM      (sM , x) = s ∈ SM ,
                                                                                 Localization of (in)consistencies . . .      59

3) Finally from the state s transfers over the next sym-           its working head. The automaton A will look for a new
bol a into the final state ERR, i.e., δM (s, a) = ERR.             post-prefix (in)consistency behind (to the right from)
     We can see that A has founded by the previous sim-            the currently inserted sign !. A will take instead of
ulation of M the prefix inconsistency of w. For mark-              the set {ERR} as the current value of the set S the
ing of the prefix inconsistency A inserts the sign !               set δM (SM , a), where a is the symbol scanned by the
between the correct prefix «x and the symbol a.                    working head. A will continue in the robust analysis
     The prefix-consistency of M ensures that M has                of the remaining suffix by the schema of the cycle C1 .
visited in the last step described above the symbol a                   An ambiguous reduction. S does not contain
at the first time. Therefore if w′ = «xay for some y               ACC, and either does contain two different reducing
then ay is a suffix of the original input word w.                  states of M , or does contain at least one non-final
     Let us now informaly describe how A continues in              state, and at least one reducing state; i.e.,
the robust analysis over the mentioned suffix ay of the            ACC 6∈ S, and ∃n : RED(n) ∈ S 6⊆ {RED(n), ERR}.
word w.                                                            We say that S fulfilling the condition above is an am-
     We will use the function δM for this aim. This func-          biguous set.
tion was introduced as an enhancement of the transi-               The task for A is to work without false inconsistency
tion function fM . It describes not only the transfers             messages. From that reason A separates the ambigu-
between the individual states, but also the tranfers be-           ous part from the remaining suffix. It inserts the sign
tween the indiviual subsets of the set SM ∪FM ∪{RED},              for the ambiguity ? in the place of the current ambi-
i.e., of the set of all final and non-final states, and of         guity, i.e., immediately to the left from the position
a special state RED. We will use it in the following in            of the working head (if the sign is not already placed
order to describe the all possible (partial) computation           there in some of the previous stages). At this moment
of M over the suffix ay at the same time.                          A takes for the set S the complete set SM , and con-
     We let A to compute the function δM over the suffix           tinues in the robust analysis behind the sign ? by the
ay = a0 a1 . . . a|y| starting from the set SM of all non-         scheme of the cycle C1 .
final states of M . A will control the computation in                   An unambiguous reduction. The set S con-
the following way. Let us initially take the set SM as             tains exactly one reducing operation, and possibly be-
a set further denoted as SI .                                      side it the final state ERR; i.e., ∃n : RED(n) ∈ S ⊆
     Let us denote the following part of the computation           {RED(n), ERR}.
of A as a cycle C1 . The cycle C1 is performed until for                Let us denote as u the sub-word which is created
                 ∗                                                 by the input symbols positioned between the last sign
the set S = δM     (SI , a0 . . . ai ), where 0 ≤ i ≤ |y|, holds
that ∅ ⊂ S ⊆ SM ∪ {ERR}, and S contains some non-                  ! or ?, and the position of the working head includ-
final state. Then A performs the following action: the             ing the scanned symbol. The sub-word u is because of
head of A will be placed to the next item to the right,            the prefix-consistency, and because of the state min-
and as (the current value of) the set S will be taken              imality of M a sub-word of some word from L(M ).
the set δM (S, ai+1 ). Here ends the description of C1 .           Moreover, the u is reduced in any word w ∈ L(M ) of
     The core of the post-prefix analysis by A are the             the form w = vux by the reducing seguence n, i.e, the
following four cases where is not fulfilled the condition          reducing sequence and the position of the reduction
for the continuation of the computation by cycle C1 .              are determined unambiguously. A will reduce also by
     Correct suffix. The set S contains the accepting              n, but only the symbols from u if we consider the case
state ACC; i.e., ACC ∈ S.                                          that n can be longer then u.
If ACC ∈ S, then the current suffix of the analyzed                     Observation. The reducing sequence n deletes at
word w by A is a suffix of some word from L(M ).                   least one symbol from u. This observation follows from
Therefore, the current suffix cannot contain any fur-              the unambiguity of the reduction of u. Let us sup-
ther inconsistency. The work of A on w is finished at              pose the opposite. Then for some z, and n′ is n =
this moment.                                                       n′ · 0|u| , and δM ∗
                                                                                        (sM , zu) = RED(n).
                                                                          ′
     An unambiguous inconsistency. The set S con-                  Let z be the shortest z with the properties described
tains a single state – the rejecting state ERR; i.e.,              above. Since M is prefix-correct «z ′ u is a prefix of some
S = {ERR}.                                                         word from «L(M )». In the next stage occurs one of the
                                                                   following variant:
    All the possible computations of M over the
                                                                         ∗
word w behind the previous inconsistency has ended                      δM (sM , (z ′ /n′ ) · u) = RED(n′′ )   for some n′′
at the same time in the state ERR. We have found                         ∗
                                                                        δM (sM , (z ′ /n′ ) · u) = RED
a suffix of a correct core of the analyzed word, i.e., one               ∗
                                                                        δM (sM , (z ′ /n′ ) · u) = ACC
of its post-prefix (in)consistency. At this moment A                     ∗
inserts the sign ! immediately before the position of                   δM (sM , (z ′ /n′ ) · u) ∈ SM
60      Martin Procházka, Martin Plátek

Each of the presented variant leads to a contradiction       pair of inserted set U is bounded by the characteristic
with the unambiguity of the reduction of the word u.         constant kM which ensures that U is finite.
If occurs the first one, then n′′ reduces some symbol of        Let us note that A stores in its finite control a suit-
the sub-word u (since z ′ cannot be shorter), therefore      able suffix of its working list before the position of its
n′′ 6= n and RED(n′′ ) ∈ S 6⊆ {RED(n), ERR}. By the          working head. The length of this suffix need not be
remaining variants is the contradiction obvious.             longer then 2 · kM . It contains the input items, and
    Apart from the reduction of u by n, A will insert        the inserted values of the set S in the last 2 · kM steps.
into the list a new item with an auxiliary symbol – the
set U of pairs of an internal state and a word over an         Recall, that we suppose that the automaton M is
input alphabet of the length kM at most. This auxil-         minimal and prefix-consistent. The minimality of M
iary symbol will be used in the next stage to adjust the     ensures that each state of M is reachable. The state-
set of states computed by the function δM . Our goal is      reachability, and the prefix-consistence of the automa-
to avoid situation when δM  ∗                  ∗
                              (s, u) = ERR 6= δM (s, u/n)    ton M ensure for any word u, that u is a sub-word of
                                                                                       ∗
for some s ∈ SM . Such internal states s must be elim-       some word of L(M ), if δM   (SM , u) 6⊆ {ERR, RED}.
inated. The set U is defined by the following way:               If A inserts the sign ! or ? immediately before the
                                                             right sentinel » it finishes its computation. Since we
 – If |u| ≥ |n|, then A reduces the working list of suppose that L(M ) is non-empty, is » a suffix of some
   items by n and A puts a new item with the auxil- word of the language «L(M )» (i.e.,L(M ) with sen-
   iary symbol U just in front of the leftmost deleted tinels).
   item. U = {(s, λ) | ∃s′ ∈ SM : δM        ∗
                                               (s′ , u1 ) =
             ∗
   s and δM (s, u2 ) = RED(n)} where u = u1 u2 and            Now we have outlined the behavior of A in the first
   |u2 | = |n|.                                              stage after the localization of the prefix-inconsistence.
 – If |u| < |n|, then A cannot reduce by the whole n as In the next stages we need also to consider the signs
   such a reduction would impact a part of the work- and the other auxiliary symbols inserted in the previ-
   ing list in front of u; this part would be reduced by ous stages. We will not describe here these details.
   n1 such that n = n1 n2 and |n2 | = |u|. But a part
   of the working list in front of the rightmost marker        We illustrate the outlined method by the following
   ! or ? can be reduced in some word of L(M ) in example.
   other way or even not at all. So, A will reduce Example 1. We explain the presented method by two
   items behind the rightmost marker ! or ? by the prefix-consistent, state-minimal mon-red-automata.
   reducing sequence n2 and it will insert a new item We will present two different robust analyses of the
   containing an auxiliary symbol U just to the right following inconsistent word «a + +(a)+)(a».
                                                         ∗
   of the rightmost marker. U = {(s, x) | ∃v ∈ ΣM          :     The transition functions of the automata M1
                               ∗
   x = v/n1 a |v| = |n1 | a δM (s, vu) = RED(n)}.            and M from [6] which we use in this example are
                                                                    3
Insertion of the set U into the working list is impor-       defined by the tables in the figure 1. We do not present
tant as it ensures the continuity of subsequent stages       here the automaton M2 from [6].
of computation. In next stage, A will use this set to            The robust reduction analyses of the considered
adjust the set S of internal states computed by func-        word are on figures 1a, and 1b. Both figures contain
tion δM . As soon as A reach the item with U , it substi-    also the input word enriched by the signs ! and ? on
tute S by S ′ = {δM ∗
                      (s, v) | (s, v) ∈ U }. It guarantees   the corresponding places.
that A enter a part of the list impacted by the last             Let us note that in [6] is a detailed description
reduction in such states only that led to the last re-       of a construction which constructs to a given prefix
duction of u by n resp. n2 .                                 consistent, state minimal mon-red-automaton M its
    A uses just defined set U in such a case only when       robust analyzer A. This construction implements the
the new item with U is inserted just behind an item          outlined method. The robust analyzer A is by this con-
containing a symbol of the input alphabet or a marker.       struction given unambiguously for a given M .
Otherwise, when this item contains an auxiliary sym-
bol U ′ different from both markers, then (instead of
insertion of U ) A replaces U ′ with U computed in the       3.3   Guarantees of the presented method
following way:                                               We formulate the guarantees of the presented method
U = {(s, x) | ∃y, (s′ , x′ ) ∈ U ′ : x = yx′ /n1 , and       as theorems. The detailed proofs can be found in [6].
 ∗
δM  (s, y) = s′ , and δM∗
                            (s′ , x′ u) = RED(n) , andTheorem 1. Let M be a prefix consistent, state-
                          ′
|y| = max{0, |n1 | − |x |}},                          minimal mon-red-automaton and A its post-prefix ro-
where n1 is a prefix of n of the length |n| − |u|. In bust analyzer. For any w ∈ «Σ ∗ » the following propo-
all cases, the length of the word x contained in any sitions hold:
                                                                                                                         Localization of (in)consistencies . . .   61

                                                                                                            1. The analyzer A reads (in one or more stages) the
                              a        +                    (           )               »
                                                                                                               complete word «w» and finishes its computation
                   ⇒ s0 s1 ERR      s2 ERR       ERR
                                                                                                               in a special state END. Any computation of A is
                     s1 ERR RED(11) ERR ERR      ACC
                                                                                                               monotone.
                     s2 s3 ERR      s2 ERR       ERR
                                                                                                            2. If pA («w») does not contain any sign !, then
                     s3 ERR s4      ERR RED(101) ERR
                     s4 s5 ERR      s2 ERR       ERR                                                           pA («w») = «w» and w is from L(M ).
                     s5 ERR s4      ERR RED(110) ERR                                                        3. If «w» ∈ «L(M )» then pA («w») = «w».
                                                                                                            4. If «u! is a prefix of pA («w») and u does not con-
                   (a) automaton M1 reducing the word
                                                                                                               tain the sign ! then u does not contain the sign ?
                   a+ without brackets around it “from
                                                                                                               as well, and «u is the longest correct-prefix of the
                   the left” and the +a with brackets
                   around “from the right”                                                                     word «w» with respect to the language «L(M )».
                                                                                                            5. If !u! or ?u! is a sub-word of the word pA («w»)
                              a        +                    (           )               »                      and u does not contain any sign ! or ? then u is
                   ⇒ s0 s1 ERR      s2 ERR       ERR
                                                                                                               a suffix of some corect core of the word «w» with
                     s1 ERR RED(11) ERR ERR      ACC                                                           respect to the language «L(M )».
                     s2 s3 ERR      s2 ERR       ERR                                                        6. If !u» or ?u» is a suffix of the word pA («w»),
                     s3 ERR RED(11) ERR RED(101) ERR                                                           and u does not contain any sign ! or ? then u» is
                                                                                                               a sufix of some word from «L(M )».
                   (b) (strongly) monotone automaton
                                                                                                            7. If !u? or ?u? is a sub-word of pA («w») and u does
                   M3 reducing a+ only “from the left”
                                                                                                               not contain any sign ! or ? then u is a sub-word
Table 1: The transition functions for M1 and M3 .
                                                                                                               of some word from «L(M )».

                                                                                                                We will discuss the meaning of the sign ?, and we
                                                                                                            will formulate properties of the mon-red-automaton M
          « a +                +           ( a ) +                          )       ( a           »
          s0 s1RED(11)
                                                                                                            which ensure that its post-prefix robust analyzer A
                                                                                                            does not use the sign ? at all. This sign serves as the
          «                 ! + ? ( a ) +                                   )       ( a           »         right sentinel for the correct sub-words of L(M ) which
          s0                   s4          s2 sRED(101)
                                               3
                             RED(11)
                                                                                                            lead A to some of two following types of a reduction
          «                 ! + ?               a               + ? ) ? ( a ! »                             conflict. Let u be such a sub-word which is followed by
          s0               SM s SM              s1         s4                       s2 s3
                                4
                            RED(11)             s3
                                                s5
                                                                 RED(11)
                                                         RED(11) RED(101)
                                                                                              ACC
                                                                                                            ?. The first type of the conflict means that there are
          0        1   2       3           4    5       6       7           8       9   10     11
                                                                                                            two different words of L(M ) containing u which lead
          « a + ! + ? ( a ) + ? ) ? ( a ! »                                                                 M by reading the complete u and its prefixes to two
                           (a) The robust analyzer of M1                                                    different reductions. The second type of the conflict
                                                                                                            means that there is a transfer trough u by M which
                                                                                                            leads to a reduction, and at the same time there is an
« a +                          + ( a ) +                                        )        ( a          »     another transfer leading to the shift to the right of M
s0   s1RED(11)                                                                                              from u to the next symbol.
                                                                                                                Now we gradually introduce the notions of unam-
«                      ! S′ + ( a ) +                                           )        ( a          »     biguously reducible sub-word and unambiguously re-
s0                           RED(11)
                                                                                                            ducing red-automaton, and we will show that a post-
«                      !
                      S′               ( a ) +                                  )        ( a          »     prefix robust analyzer A of M which is unambiguously
s0                 SM S ′          s2      s3RED(101)                                                       reducing, need not to use the sign ? at all.
                                                                                                                We say that a sub-word w is reducible by M if for
«                      !
                      S′                    a               +                   )        ( a          »     some n holds that RED(n) ∈ δM     ∗
                                                                                                                                                (SM , w).
s0                 SM S ′                  s1
                                           s3           RED(11)
                                                                                                                We say that a sub-word w is unambiguously re-
«                      !
                      S′                    a                       ! S ′′ )             ( a          »     ducible by M if for some n holds that         RED(n) ∈
                                                                                                              ∗
s0                 SM S ′                  s1
                                           s3                               RED(101)                        δM  (SM , w) ⊆ {RED(n), ERR}.
                                                                                                                We say that a sub-word which is reducible but it
«                      !
                      S′                    a                       !
                                                                   S ′′             ! ( a ! »               is not unambiguously reducible is an ambiguously re-
s0                 SM S ′                  s1                   SM S ′′                 s2   s3       ACC
                                           s3
                                                                                                            ducible sub-word.
0     1        2               3       4    5       6       7                   8        9   10       11
                                                                                                                We say that M is unambiguously reducing if any of
« a + !                        + ( a ) + !                                      ) ! ( a ! »                 its reducible sub-words is unambiguously reducible.
                           (b) The robust analyzer of M3                                                    Example 2. The automaton M1 from the example 1
     Fig. 1: The robust analysis of «a++(a)+)(a».                                                           is not unambiguously reducing. All its ambiguously
                                                                                                            reducible sub-words are presented in Table 2a. Let us
62      Martin Procházka, Martin Plátek

note the length of this words is not limited by any             create a very special type of (in)consistencies consid-
constant, since for any i ≥ 0 holds that                        ered by the method presented in this paper. In the
    ∗
   δM 1
        (SM1 , (+a)i +) = {ERR, RED(110), s4 }.                 close future we will show that the set of languages rec-
   The minimal unambiguously reducible sub-words                ognized by unambiguously reducing, prefix consistent,
of M1 are in Table 2b.                                          state-minimal mon-red-automata creates a proper
                                                                subclass of DCFL.

 sub-word w )              a)       +       a+      +a+ . . .
  ∗
 δM 1
      (S M 1 , w) RED(110) RED(110) RED(11) RED(11) RED . . .   References
                  RED(101) RED(101) s4      s4      s4
           (a) ambiguously reducible sub-words                   1. G. V. Cormack: An LR substring parser for noncor-
                                                                    recting syntax error recovery. In: Proc. of PLDI ’89,
         sub-word w +a)           (a)      «a+                      1989, 161–169.
         ∗
        δM   (S     , w) RED(110) RED(101) RED(11)               2. P. Jančar, F. Mráz, M. Plátek, J. Vogel: Restarting au-
           1    M 1
                                                                    tomata. In Proc. FCT’95, Dresden, Germany, August
       (b) minimal unambiguously reducible sub-                     1995, LNCS 965, Springer Verlag 1995, 283–292.
       words                                                     3. F. Otto: Restarting automata. In: Z. Ésik, C. Martin-
         Table 2: Reducible sub-words of M1 .                       Vide, and V. Mitrana (Eds.), Recent Advances in For-
                                                                    mal Languages and Applications, Studies in Compu-
                                                                    tational Intelligence, Vol. 25, Springer, Berlin, 2006,
                                                                    269–303.
    We can see that in this example the robust ana-              4. Gh. Pǎun: Marcus Contextual Grammars, Kluwer,
lyzer of M3 has founded all inconsistencies in the word             Dordrecht, Boston, London, 1997.
«a + +(a)+)(a»., i.e., it has not used the sign ? at all.        5. M. Plátek: Construction of a robust parser from a de-
This example illustrates the fact that M3 is an unam-               terministic reduced parser. Kybernetika 33(3), 1997,
biguously reducing red-automaton. We can see that                   311–332.
                                                                 6. M. Procházka: Redukční automaty a syntaktické chyby.
directly from Table 1b.
                                                                    (in Czech) Text for PhD Dissertation, it will be to
    The meaning of the reducing unambiguity for the                 achieve soon on the web.
                                                                 7. M. Procházka: Concepts of syntax error recovery for
localization of the post-pefix inconsistencies summa-
                                                                    monotonic reducing automata. MIS 2004, 94–103,
rizes the following theorem.
                                                                    http://ulita.ms.mﬀ.cuni.cz/pub/MIS/MIS2004.pdf
Theorem 2. Let M be a prefix consistent, state-mini-             8. M. Procházka, M. Plátek: Redukční automaty – mono-
                                                                    tonie a redukovanost. ITAT 2002, 23–32.
mal mon-red-automaton and A its post-prefix robust
                                                                 9. M. Procházka, M. Plátek: Redukční automaty a syn-
analyzer. If M is at the same time unambiguously re-                taktické chyby. Proceedings of ITAT 2011 (in Czech),
ducing then its post-prefix robust analyzer A does not              2011, 23–30.
use the sign ? at all. That means, that A in any word
          ∗
from «ΣM    » determines the prefix inconsistency and
all its post-prefix inconsistencies with respect to the
language L(M ).

Corollary 1. Let M be a mon-red-automaton which
is at the same time prefix-consistent and state-
minimal, and A be its robust analyzer. Then there is
a deterministic push-down transducer which translates
                    ∗
any word w from ΣM    on the word pA (w).


4    Conclusion
The presented method can be considered as a direct
generalization of the method presented in [9], and as
an essential refinement and a generalization of the
method from [5]. The method in [9] is based on mono-
tone reducing automata, the method in [5] is based
on (monotone) list automata with auxiliary symbols.
Both the methods are based on the so called head-
symbols. The head-symbol (in)consistencies from [9]