On Reducing Automata and Their Normalizations Martin Prochรกzka Abstract The reducing automaton, a variant of the restarting automaton, is introduced and its normalizations are studied. They provide useful features such as prefix-correctness and state minimality. The LR(0) grammar generating the same language is constructed for any monotone reducing automaton. This grammar is used to construct an equivalent monotone reducing automaton that is prefix-correct. The minimization of a set of states of any reducing automaton using the method already invented in the theory of Moore machines is described. Both normalizations can be applied to the monotone reducing automaton sequentially so that the obtained automaton is both prefix-correct and state-minimal. Keywords reducing automata, LR(0) grammars, prefix-correctness, state-minimality 1. Introduction Figure 1: Reducing automaton with a state ๐‘  in its control unit and a working head scanning an item of the working list This paper is a revised translation of the authorโ€™s disser- at position 2 that contains a symbol + . tation thesis [1]. Some parts are reworded, some supple- mented, others omitted. The reducing automaton (first mentioned in [2]), like 0 1 2 3 4 5 6 7 8 its predecessor and ideal, the restarting automaton (see ยซ a + ( ( a ) ) ยป [3]), is a device suitable for describing a syntax of both formal and natural languages. It is based on the notion of reduction analysis, i.e., a gradual truncation of the an- alyzed string that preserves both its incorrectness and ๐‘  correctness1 . The following example of a gradual sim- plification of the arithmetic expression a + ( ( a ) ) shows that the reduction analysis is quite natural. a+((a)) state-minimal reducing automaton (Section 4). On the a+( a ) other hand the mentioned differences does not prevent a+ a us to adopt results reached for restarting automata as a any reducing automaton can be simulated by a restarting automaton and vice versa while preserving (combina- The reducing automaton differs from the restarting au- tion of) properties like determinism or monotony. Here tomaton in several details: 1. It lacks a fixed size looka- we focus on monotone reducing automata, which recog- head window that is actually moved to the control unit. nize deterministic context-free languages, as shown for 2. The sequence of the head movement and the state their predecessors, deterministic monotone restarting change is reversed. 3. Positions of the reduced work- automata in [3]. list items are precisely determined. It allows us to reuse some of the concepts and methods invented in standard grammar and automata theory and simplifies the tech- 2. Basic notions and properties niques presented here: (a) construction of an LR(0) gram- mar that generates the same language as a given mono- The reducing automaton is shown in Figure 1. It processes tone reducing automaton (Section 3), (b) construction a finite list of items. The first and the last item of the list of an equivalent prefix-correct monotone reducing au- contains the delimiter ยซ and ยป respectively. Any other tomaton, and (c) construction of a strongly equivalent item contains a symbol of the finite input alphabet dif- ferent from both delimiters and a natural number, the ITATโ€™22: Information technologies โ€“ Applications and Theory, Septem- ber 23โ€“27, 2022, Zuberec, Slovakia position of the item in the list. The positions of the items Envelope-Open martproc@gmail.com (M. Prochรกzka) from left to right form an increasing sequence. The posi- ยฉ 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License tion of any item does not change during the computation. Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings CEUR Workshop Proceedings (CEUR-WS.org) http://ceur-ws.org ISSN 1613-0073 The reducing automaton resembles a finite-state automa- 1 Restarting automata preserve correctness if they are deterministic. ton. It consists of a control unit and a working head. At Reducing automata are deterministic by definition, as we shall see below. any given time the control unit is in one of the finitely Figure 2: Reducing the working list. in the initial state with the head on the left delimiter and finishes in any final or reducing state. Formally, we define the reducing automaton ๐‘€ (red- 0 1 2 3 4 5 6 7 8 automaton, for short) as a 7-tuple ยซ a + ( ( a ) ) ยป ๐‘€ = (ฮฃ, ยซ, ยป, ๐‘†, ๐‘ 0 , ๐น , ๐‘“ ), where ฮฃ is the finite input alphabet, ยซ,ยป โˆ‰ ฮฃ is the left and right delimiter of an input word, respectively, ๐‘† is the finite set of transition states, ๐‘ 0 โˆˆ ๐‘† is the initial state, ๐น RED (101 ) is the finite set of final and reducing states, ๐‘“ โˆถ ๐‘† ร— (ฮฃ โˆช {ยป}) โŸถ (๐‘† โˆช ๐น ) is the transition function. Sets ๐น and ๐‘† are disjunctive. ๐น optionally contains accepting and rejecting final states ACC and ERR and finitely many reducing states 0 1 2 3 5 7 8 of a form RED (๐‘›), where ๐‘› is a sequence of 0 and 1 that ยซ a + ( a ) ยป begins with 1 (๐‘› โˆˆ 1 โ‹… (1 |0 )โˆ— ). 0 means โ€œleave a symbol at a corresponding position in a working listโ€, 1 means โ€œdelete a symbol at a corresponding position from a working listโ€. If ๐‘“ (๐‘ , ๐‘Ž) = ๐‘  โ€ฒ , we say that the automaton ๐‘€ moves from the state ๐‘  over the symbol ๐‘Ž to the new state ๐‘  โ€ฒ , or also ๐‘ 0 that the transition function switches the control unit of the automaton from the state ๐‘  over the symbol ๐‘Ž to the new state ๐‘  โ€ฒ . We extend the transition function ๐‘“ to the domain many states and the working head reads one item of the (๐‘† โˆช๐น โˆช{RED })ร—(ฮฃโˆ— โ‹…{๐œ†, ยป}), where RED is the new auxiliary list. The automaton starts its computation in the initial state different from all states from ๐‘† and ๐œ† is an empty state with the head placed on the very first item of the word. We mark the new extended function as ๐›ฟ. The list containing the left delimiter ยซ. At each step of the function ๐›ฟ equals to the transition function ๐‘“ for all pairs computation the automaton moves its working head to (๐‘ , ๐‘Ž) โˆˆ ๐‘† ร— (ฮฃ โˆช {ยป}). For any other pair from its domain the right onto the next item and changes its state accord- it is defined as follows: ing to the current state and the symbol contained in the scanned item of the list. The new state is determined by ๐›ฟ(ACC , ๐‘Ž) = ACC ๐›ฟ(RED (๐‘›), ๐‘Ž) = RED a transition function. ๐›ฟ(ERR , ๐‘Ž) = ERR ๐›ฟ(RED , ๐‘Ž) = RED Some states of the automaton are final, some others are reducing. The final states are ACC , the accepting state, We define the reflexive and transitive closure ๐›ฟ โˆ— of the and ERR the rejecting state. Reaching the ACC state means function ๐›ฟ in the usual way. We consider only automata finishing the computation and accepting the word con- with the extended transition function satisfying the fol- tained in the working list. By switching into the ERR lowing conditions for any ๐‘  โˆˆ ๐‘†, ๐‘Ž โˆˆ ฮฃ โˆช {ยป}, ๐‘  โ€ฒ โˆˆ ๐‘† โˆช ๐น, state, the automaton terminates the computation by re- and ๐‘ข โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป}: jecting the word being processed. RED (๐‘›), ๐‘› โˆˆ 1 โ‹… (1 |0 )โˆ— , is the reducing state. Just after transition into this state, the ๐›ฟ(๐‘ , ยป) = ๐‘  โ€ฒ โŸน ๐‘ โ€ฒ โˆˆ ๐น automaton truncates the working list as prescribed by ๐›ฟ(๐‘ , ยป) = RED (๐‘›) โŸน 0 is the suffix of ๐‘› the reducing sequence ๐‘›, moves its working head to the ๐›ฟ(๐‘ , ๐‘Ž) = ACC โŸน ๐‘Ž=ยป beginning of the working list, and transfers its control ๐›ฟ โˆ— (๐‘ 0 , ๐‘ข) = RED (๐‘›) โŸน |๐‘ข| โ‰ฅ |๐‘›| unit to the initial state. The reducing sequence deter- Characteristic constant of the automaton ๐‘€ is the mines items to be removed from the list in the following length of the longest binary sequence contained in re- way: If its ๐‘–-th symbol from the right equals to 1 then the ducing states of the automaton ๐‘€, automaton removes the ๐‘–-th item from the list counting from the item under the working head to the left (count- ๐‘˜(๐‘€) = max {|๐‘›| โˆฃ RED (๐‘›) โˆˆ ๐น} (1) ing the item under the working head as the first one). If it equals to 0 the corresponding item remains in the list. We describe the reduction of a word by a reduction se- An example of truncation of the word ยซ a+((a)) ยป by the quence using the reduction operation / which we define reducing sequence 101 can be seen in the figure 2. Final as follows: state closes a certain stage of automaton computation. By ๐‘Ž/0 = ๐‘Ž ๐œ†/๐‘› = ๐œ† (๐‘ข โ‹… ๐‘Ž)/๐œ† = ๐‘ข โ‹… ๐‘Ž stage, we mean any part of the computation that starts ๐‘Ž/1 = ๐œ† ๐‘ข/๐œ† = ๐‘ข (๐‘ข โ‹… ๐‘Ž)/(๐‘› โ‹… ๐‘–) = (๐‘ข/๐‘›) โ‹… (๐‘Ž/๐‘–) where ๐‘ข โˆˆ ฮฃโˆ— , ๐‘Ž โˆˆ ฮฃโˆ— โˆช {ยป}, ๐‘› โˆˆ (10 โˆ— )โˆ— and ๐‘– โˆˆ {0 , 1 }. Here ๐›ฟ1โˆ— (๐‘ 1 , ๐‘ข) = ACC โŸบ ๐›ฟ2โˆ— (๐‘ 2 , ๐‘ข) = ACC we restrict neither the word ๐‘ข nor the reduction sequence ๐›ฟ1โˆ— (๐‘ 1 , ๐‘ข) = ERR โŸบ ๐›ฟ2โˆ— (๐‘ 2 , ๐‘ข) = ERR ๐‘› by any constant, ๐‘ข can be longer than ๐‘› and vice versa. The truncation of the words (a) , +a ยป by the reduction Using the reduction relation, we can express a basic sequences 101 , 110 respectively looks like this: property of reducing automata called error and correctness preserving property. ( a ) + a ยป / 1 0 1 / 1 1 0 Lemma 2.1. If ยซ๐‘ค1 ยป โ‡’๐‘€ ยซ๐‘ค2 ยป, then ๐‘ค1 โˆˆ ๐ฟ(๐‘€), iff = a = ยป ๐‘ค2 โˆˆ ๐ฟ(๐‘€). Based on the reduction operation, we introduce reduction Proof. Letโ€™s assume that ยซ๐‘ค1 ยป โ‡’๐‘€ ยซ๐‘ค2 ยป. 1. If ๐‘ค2 โˆˆ relation. This allows us to describe how the automaton ๐ฟ(๐‘€), then ยซ๐‘ค2 ยป โ‡’โˆ—๐‘€ ยซ๐‘คยป โˆˆ ยซ๐ฟ0 (๐‘€)ยป for some ๐‘ค and successively rewrites the processed word ๐‘ค โˆˆ ฮฃโˆ— . The ยซ๐‘ค1 ยป โ‡’๐‘€ ยซ๐‘ค2 ยป โ‡’โˆ—๐‘€ ยซ๐‘คยป is the accepting analysis for the reducing automaton ๐‘€ reduces the word ยซ๐‘คยป to the word word ๐‘ค1 . 2. If ๐‘ค1 โˆˆ ๐ฟ(๐‘€), then ยซ๐‘ค1 ยป โ‡’โˆ—๐‘€ ยซ๐‘คยป โˆˆ ยซ๐ฟ0 (๐‘€)ยป ยซ๐‘ค โ€ฒ ยป, for some ๐‘ค. This analysis starts by ยซ๐‘ค1 ยป โ‡’๐‘€ ยซ๐‘ค2 ยป, ยซ๐‘คยป โ‡’๐‘€ ยซ๐‘ค โ€ฒ ยป, otherwise ๐‘€ wouldnโ€™t be deterministic. So, we have ยซ๐‘ค2 ยป โ‡’โˆ—๐‘€ ยซ๐‘คยป โˆˆ ยซ๐ฟ0 (๐‘€)ยป. if ๐›ฟ โˆ— (๐‘ 0 , ๐‘ข) = RED (๐‘›), ๐‘คยป = ๐‘ข๐‘ฃ, ๐‘ค โ€ฒ ยป = (๐‘ข/๐‘›)โ‹…๐‘ฃ for some ๐‘ข โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป}. Obviously, |๐‘ค| > |๐‘ค โ€ฒ | and the above reduction Similarly to [3], we introduce monotony for reducing is shortening. We refer to the reflexive and transitive automata. A reducing automaton is monotone reducing closure of the reduction relation as โ‡’โˆ—๐‘€ . By reduction automaton (mon-red-automaton, for short), if โ€“ at any analysis of the automaton ๐‘€ we mean any sequence of stage โ€“ it visits all the items visited at the previous stage reductions that remain in the working list. Formally, the reducing automaton ๐‘€ is monotone if the following condition is ยซ๐‘ค1 ยป โ‡’ ยซ๐‘ค2 ยป โ‡’ โ€ฆ โ‡’ ยซ๐‘ค๐‘› ยป, satisfied for each ๐‘ค โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป}: which cannot be extended further. If ๐›ฟ โˆ— (๐‘ค๐‘› ยป) = ACC , ๐›ฟ โˆ— (๐‘ 0 , ๐‘ค) = RED (๐‘›) โŸน ๐›ฟ โˆ— (๐‘ 0 , (๐‘ค/๐‘›)) โˆ‰ {ERR , RED } we are talking about an accepting reduction analysis, otherwise it is a rejecting reduction analysis. We often use Monotony is important for characterizing the deter- the shorter term analysis instead of reduction analysis. ministic context-free languages (DCFL for short). De- The words over the alphabet ฮฃ accepted by the reduc- terministic monotone restarting automata (det-mon-R- ing automaton ๐‘€ in one stage form its simple language automata) recognize just all deterministic context-free languages. In the same way, we can characterize the ๐ฟ0 (๐‘€) = {๐‘ค โˆˆ ฮฃโˆ— โˆฃ ๐›ฟ โˆ— (๐‘ 0 , ๐‘คยป) = ACC }. DCFL class using mon-red-automata. An R-automaton It is obvious that ๐ฟ0 (๐‘€) is regular. We define the language differs from a red-automaton only in that its working accepted or also recognized by the reducing automaton head is extended by a so-called lookahead window, which ๐‘€ as the set of all words over the alphabet ฮฃ for which scans a continuous subword of fixed length to the right there exists an accepting analysis of the automaton ๐‘€, of the working head, and that it reduces the symbols scanned by the working head and the lookahead win- ๐ฟ(๐‘€) = {๐‘ค โˆˆ ฮฃโˆ— โˆฃ ยซ๐‘คยป โ‡’โˆ—๐‘€ ยซ๐‘ค โ€ฒ ยป โˆˆ ยซ๐ฟ0 (๐‘€)ยป}. dow. Thus, any R-automaton can be simulated by a red- automaton that stores the contents of the lookahead win- Let us assume that ๐‘€1 = (ฮฃ1 , ยซ, ยป, ๐‘†1 , ๐‘ 1 , ๐น1 , ๐‘“1 ) and dow in its control unit. Conversely, any red-automaton ๐‘€2 = (ฮฃ2 , ยซ, ยป, ๐‘†2 , ๐‘ 2 , ๐น2 , ๐‘“2 ) are any reducing automata. ๐‘€ can be simulated by an R-automaton with a lookahead We say that ๐‘€1 and ๐‘€2 are equivalent, if window of size ๐‘˜(๐‘€). It is obvious that at each stage the simulated and simulating automaton visit the same items ๐ฟ(๐‘€1 ) = ๐ฟ(๐‘€2 ). in the working list, so that monotony is preserved. โ€ฒ โˆ— โˆ— Representation. A red-automaton can be represented If for any ๐‘ค, ๐‘ค โˆˆ (ฮฃ1 โˆช ฮฃ2 ) by a transition table in the same way as a finite-state ๐ฟ0 (๐‘€1 ) = ๐ฟ0 (๐‘€2 ) and machine. A transition table of a red-automaton recog- โ€ฒ โ€ฒ nizing a language {a ๐‘› b ๐‘› โˆฃ ๐‘› โ‰ฅ 0} is shown in Table 1. ยซ๐‘คยป โ‡’๐‘€1 ยซ๐‘ค ยป โŸบ ยซ๐‘คยป โ‡’๐‘€2 ยซ๐‘ค ยป, Each state encodes a regular expression mentioned in then these automata are reductionally equivalent. They the first column of the Table. So, a lookahead window of are strongly equivalent if for any prefix ยซ๐‘ข of the word R-automata is in case of red-automata moved into their ยซ๐‘คยป, the following holds states. ๐›ฟ1โˆ— (๐‘ 1 , ๐‘ข) = RED (๐‘›) โŸบ ๐›ฟ2โˆ— (๐‘ 2 , ๐‘ข) = RED (๐‘›) Table 1 Table 2 Transition table. Graphical representation of grammar rules. ๐‘“ a b ยป classical our โ†’ ๐‘ 0 = ยซ ๐‘ 1 ERR ACC rule representation representation ๐‘ 1 = ยซa โˆ— a ๐‘ 1 ๐‘ 2 ERR ๐‘‹ ๐‘Ž ๐‘ 2 = ยซa โˆ— ab ERR RED (110 ) RED (110 ) ๐‘‹ โ†’ ๐‘Ž๐‘Œ โ†™ โ†˜ โ†‘ ๐‘Ž ๐‘Œ ๐‘‹โ†’๐‘Œ ๐‘‹ 3. Grammars ๐‘‹ โ†’๐‘Œ โ†“ ๐‘‹โ†’๐‘Œ ๐‘Œ For any mon-red-automaton ๐‘€, we construct a gram- ๐‘Ž mar ๐บ๐‘€ whose derivation trees correspond closely to ๐‘‹ โ†‘ computations of a given automaton ๐‘€. We show that ๐‘‹ โ†’ ๐‘Ž๐‘Œ๐‘ โ†™โ†“โ†˜ ๐‘‹โ†’๐‘Œ this grammar generates the language recognized by the ๐‘Ž ๐‘Œ ๐‘ โ†“ automaton ๐‘€ and that it is an LR(0) grammar. ๐‘ Letโ€™s assume that ๐‘€ = (ฮฃ, ยซ, ยป, ๐‘†, ๐‘ 0 , ๐น , ๐‘“ ) is a mon- ๐‘‹ ๐‘‹โ†’๐‘Œ ๐‘‹ โ†’ ๐‘Œ๐‘ โ†™ โ†˜ โ†“ red-automaton. The grammar of the automaton ๐‘€ is the ๐‘Œ ๐‘ ๐‘ grammar ๐บ๐‘€ , which is obtained by reducing the grammar ๐‘‹ ๐‘‹ ๐บ = (๐‘‰ , ๐‘ , ๐‘†, ๐‘ƒ), ๐‘‹ โ†’๐‘Ž โ†“ โ†‘ ๐‘Ž ๐‘Ž where ๐‘‰ = ฮฃ โˆช {ยซ, ยป} is the set of terminals; ๐‘ is the set ๐‘‹ of nonterminals containing 5-tuples (๐‘Ž, ๐‘ , ๐‘ข, ๐‘œ, ๐‘ฃ), where ๐‘‹ โ†’๐œ† โ†“ ๐‘‹ ๐‘Ž โˆˆ ฮฃ โˆช {ยซ, ยป}, ๐‘  โˆˆ ๐‘† โˆช (๐น โงต {ERR }), ๐‘ข โˆˆ {๐œ†, ยซ} โ‹… ฮฃโˆ— โ‹… {๐œ†, ยป}, ๐œ† |๐‘ข| โ‰ค ๐‘˜(๐‘€), ๐‘ฃ โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป}, |๐‘ฃ| โ‰ค ๐‘˜(๐‘€), and ๐‘œ โˆˆ {ACC } โˆช {RED (๐‘›) โˆฃ โˆƒ๐‘›โ€ฒ โˆถ RED (๐‘›๐‘›โ€ฒ ) โˆˆ ๐น }; ๐‘† = (ยซ, ๐‘ 0 , ๐œ†, ACC , ๐œ†) โˆˆ ๐‘ is the initial nonterminal; ๐‘ƒ is the set of rules defined as We draw derivation trees of grammar ๐บ๐‘€ differently follows: than we are used to for context-free grammars. We start with the initial nonterminal ๐‘†0 at the top right and expand ๐‘‹ โ†’ ๐‘Ž ๐‘Œ โˆˆ ๐‘ƒ, if ๐‘‹ = (๐‘Ž, ๐‘ , ๐œ†, RED (๐‘›), (๐‘/๐‘–) โ‹… ๐‘ฃ) each nonterminal as indicated in Table 2. For example ๐‘Œ = (๐‘, ๐‘“ (๐‘ , ๐‘), ๐œ†, RED (๐‘› โ‹… ๐‘–), ๐‘ฃ) the rule ๐‘‹ โ†’ ๐‘Ž๐‘Œ ๐‘ is drawn so that its right hand side is written clockwise around the left hand side nonterminal or ๐‘‹ = (๐‘Ž, ๐‘ , ๐œ†, ACC , ๐œ†) ๐‘‹, the terminal ๐‘Ž at 12 oโ€™clock, the nonterminal ๐‘Œ at 3 ๐‘Œ = (๐‘, ๐‘“ (๐‘ , ๐‘), ๐œ†, ACC , ๐œ†), oโ€™clock and the nonterminal ๐‘ at 6 oโ€™clock. This yields ๐‘‹ โ†’ ๐‘Œ โˆˆ ๐‘ƒ, if ๐‘‹ = (๐‘Ž, ๐‘ , ๐‘Ž๐‘ข, RED (๐‘›), (๐‘/๐‘–) โ‹… ๐‘ฃ) derivation tree pictures in which each maximum non- ๐‘Œ = (๐‘, ๐‘“ (๐‘ , ๐‘), ๐‘ข, RED (๐‘› โ‹… ๐‘–), ๐‘ฃ) terminal branch resembles a working list visited by the automaton during the stage of its computation. When or ๐‘‹ = (๐‘Ž, ๐‘ , ๐‘Ž๐‘ข, ACC , ๐œ†) drawing the derivation tree we prevent the edges from ๐‘Œ = (๐‘, ๐‘“ (๐‘ , ๐‘), ๐‘ข, ACC , ๐œ†), crossing. We omit ๐œ†-rules. ๐‘‹ โ†’ ๐‘Ž ๐‘Œ ๐‘ โˆˆ ๐‘ƒ, if ๐‘‹ = (๐‘Ž, ๐‘ , ๐œ†, ๐‘œ, ๐‘ฃ) Suppose ๐‘ค โ‡’๐‘€ ๐‘ค โ€ฒ for some words ๐‘ค and ๐‘ค โ€ฒ . We show how to move from a derivation tree that gives the word ๐‘Œ = (๐‘, ๐‘“ (๐‘ , ๐‘), ๐œ†, RED (1 ), ๐‘ฅ) ๐‘ค to a derivation tree that gives the word ๐‘ค โ€ฒ . First, we ๐‘ = (๐‘Ž, ๐‘ , ๐‘Ž๐‘ฅ, ๐‘œ, ๐‘ฃ), introduce the following notation: Let ๐‘‹ โ†’ ๐‘Œ ๐‘ โˆˆ ๐‘ƒ, if ๐‘‹ = (๐‘Ž, ๐‘ , ๐‘Ž๐‘ข, ๐‘œ, ๐‘ฃ) ๐‘‹ = (๐‘Ž, ๐‘ , ๐‘ข, ๐‘œ, ๐‘ฃ) ๐‘Œ = (๐‘, ๐‘“ (๐‘ , ๐‘), ๐‘ข, RED (1 ), ๐‘ฅ) ๐‘ = (๐‘Ž, ๐‘ , ๐‘Ž๐‘ฅ, ๐‘œ, ๐‘ฃ), be any nonterminal of the grammar ๐บ๐‘€ . Then ๐‘‹ โ†’ ๐‘Ž โˆˆ ๐‘ƒ, if ๐‘‹ = (๐‘Ž, RED (๐‘›), ๐œ†, RED (๐‘›), ๐œ†) ๐‘‹ ๐œ† = (๐‘Ž, ๐‘ , ๐œ†, ๐‘œ, ๐‘ฃ). or ๐‘‹ = (๐‘Ž, ACC , ๐œ†, ACC , ๐œ†), The following implications follow directly from the defi- ๐‘‹ โ†’ ๐œ† โˆˆ ๐‘ƒ, if ๐‘‹ = (๐‘Ž, RED (๐‘›), ๐‘Ž, RED (๐‘›), ๐œ†) nition of the rules of the grammar ๐บ๐‘€ , where ๐‘Ž is a symbol or ๐‘‹ = (๐‘Ž, ACC , ๐‘Ž, ACC , ๐œ†), in the first component of ๐‘‹ and ๐œ” โˆˆ ๐‘ โˆช {๐œ†}: for any ๐‘  โˆˆ ๐‘† โˆช (๐น โงต {ERR }), ๐‘ข โˆˆ {๐œ†, ยซ} โ‹… ฮฃโˆ— โ‹… {๐œ†, ยป}, |๐‘ข| โ‰ค ๐‘˜(๐‘€), ๐‘‹ โˆˆ ๐‘‰ โŸน ๐‘‹๐œ† โˆˆ ๐‘‰ (2) ๐‘ฃ โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป}, |๐‘ฃ| โ‰ค ๐‘˜(๐‘€), ๐‘› โˆˆ (10 โˆ— )+ , ๐‘– โˆˆ {0 , 1 }, ๐‘Ž โˆˆ ฮฃ โˆช {ยซ}, ๐‘‹ โ†’ ๐‘Ž ๐‘Œ๐‘ โˆˆ ๐‘ƒ โŸน ๐‘‹ = ๐‘๐œ† (3) and ๐‘ โˆˆ ฮฃ โˆช {ยป}. ๐บ (and therefore ๐บ๐‘€ as well) is obviously a context-free grammar. ๐‘‹ โ†’ ๐‘Œ ๐œ” โˆˆ ๐‘ƒ โŸน ๐‘‹ ๐œ† โ†’ ๐‘Ž ๐‘Œ ๐œ†๐œ” โˆˆ ๐‘ƒ (4) Figure 3: Function ๐‘… applied to a derivation tree ๐‘‡ with the first two branches shown in Fig. 4a. The function ๐‘… transforms these two branches into the single one shown in Fig. 4b, the rest of tree keeping unchanged. ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘–โˆ’1 ๐‘Ž๐‘– ๐‘Ž๐‘–+1 โ€ฆ ๐‘Ž๐‘š1 ๐‘Ž๐‘—โ€ฒ โ€ฆ ๐‘Ž๐‘šโ€ฒ 2 โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ ๐‘†0 โ†’ ๐‘†1 โ†’ โ‹ฏ โ†’ ๐‘†๐‘–โˆ’1 โ†’ ๐‘†๐‘– โ†’ ๐‘†๐‘–+1 โ†’ โ‹ฏ โ†’ ๐‘†๐‘š 1 โ†“ ๐‘†๐‘–โ€ฒ โ†’ โ‹ฏ โ†’ ๐‘†๐‘—โ€ฒ โ†’ โ‹ฏ โ†’ ๐‘†๐‘šโ€ฒ 2 (a) the first two maximum nonterminal paths of the original tree ๐‘‡ ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘–โˆ’1 ๐‘Ž๐‘–โ€ฒ ๐‘Ž๐‘–+1 โ€ฆ ๐‘Ž๐‘š1 ๐‘Ž๐‘—โ€ฒ โ€ฆ ๐‘Ž๐‘šโ€ฒ 2 โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ ๐‘†0 โ†’ ๐‘†1 โ†’ โ‹ฏ โ†’ ๐‘†๐‘–โˆ’1 โ†’ ๐‘†๐‘–โ€ฒ๐œ† โ†’ โ‹ฏ โ†’ ๐‘†๐‘—โ€ฒ๐œ† โ†’ โ‹ฏ โ†’ ๐‘†๐‘šโ€ฒ๐œ† 2 (b) the first maximum nonterminal path of the resulting tree ๐‘…(๐‘‡ ) ๐‘‹ โ†’ ๐œ† โˆˆ ๐‘ƒ โŸน ๐‘‹๐œ† โ†’ ๐‘Ž โˆˆ ๐‘ƒ (5) (i) ๐‘…(๐‘‡ ) is the derivation tree of the grammar ๐บ๐‘€ , ๐‘‹ โ†’ ๐‘Ž ๐‘Œ ๐œ” โˆˆ ๐‘ƒ โŸน ๐‘‹ = ๐‘‹ ๐œ† and ๐‘Œ = ๐‘Œ ๐œ† (6) (ii) ๐‘…(๐‘‡ ) contains one less maximal nonterminal path than ๐‘‡, Next, we introduce a function ๐‘… that assigns a tree ๐‘…(๐‘‡ ) to any derivation tree ๐‘‡ of the grammar ๐บ๐‘€ with at least (iii) if ๐‘‡ gives the word ๐‘ค and ๐‘…(๐‘‡ ) gives the word ๐‘ค โ€ฒ , then two maximum nonterminal paths. We show that the ๐‘ค โ‡’๐‘€ ๐‘ค โ€ฒ . resulting tree is a derivation tree of the grammar ๐บ๐‘€ . Suppose that the first two maximum nonterminal paths Proof. Suppose that the first two maximum nonterminal of the tree ๐‘‡, together with terminals they generate, are paths of the tree ๐‘‡, together with the generated terminals, drawn in Figure 4a. Note that the vertex denoted by the look like paths depicted in Figure 4a. nonterminal ๐‘†๐‘– is the branching point of these nontermi- (i). It follows from (2) that the vertices ๐‘†๐‘–โ€ฒ๐œ† , โ€ฆ, ๐‘†๐‘—โ€ฒ๐œ† , โ€ฆ, nal paths. We construct the tree ๐‘…(๐‘‡ ) in the following โ€ฒ๐œ† ๐‘†๐‘š2 in Figure 4b are nonterminals of the grammar ๐บ๐‘€ . way: The following rules are used in the first maximal non- 1. Remove the terminal vertices ๐‘Ž๐‘– , โ€ฆ, ๐‘Ž๐‘š1 and ๐‘Ž๐‘—โ€ฒ , โ€ฆ, terminal path of the tree ๐‘‡ for some ๐œ” โˆˆ ๐‘‰ โˆช {๐œ†}: โ€ฒ . ๐‘Ž๐‘š2 ๐‘†๐‘–โˆ’1 โ†’ ๐‘Ž๐‘–โˆ’1 ๐‘†๐‘– ๐œ” 2. Remove the vertices ๐‘†๐‘– , โ€ฆ , ๐‘†๐‘š1 . ๐‘†๐‘– โ†’ ๐‘Ž๐‘– ๐‘†๐‘–+1 ๐‘†๐‘–โ€ฒ 3. For each ๐‘™ โˆˆ {๐‘–, โ€ฆ , ๐‘š2 } do the following: According to (3), ๐‘†๐‘– = ๐‘†๐‘–โ€ฒ๐œ† . So a) replace the vertex ๐‘†๐‘™โ€ฒ with the vertex ๐‘†๐‘™โ€ฒ๐œ† b) add the terminal vertex ๐‘Ž๐‘™โ€ฒ , where ๐‘Ž๐‘™โ€ฒ is the ๐‘†๐‘–โˆ’1 โ†’ ๐‘Ž๐‘–โˆ’1 ๐‘†๐‘–โ€ฒ๐œ† ๐œ” symbol in the first component of the non- terminal ๐‘†๐‘™โ€ฒ๐œ† is a rule of grammar ๐บ๐‘€ . The following rules are used in the second maximal nonterminal path of the tree ๐‘‡ for c) add an edge from ๐‘†๐‘™โ€ฒ๐œ† to ๐‘Ž๐‘™โ€ฒ some ๐œ”๐‘–+1 , โ€ฆ, ๐œ”๐‘š2 โˆˆ ๐‘‰ โˆช {๐œ†}: 4. Add a horizontal edge from the vertex ๐‘†๐‘–โˆ’1 to the vertex ๐‘†๐‘™โ€ฒ๐œ† . ๐‘†๐‘–โ€ฒ โ†’ ๐‘†๐‘–+1 โ€ฒ ๐œ” ๐‘–+1 โ‹ฎ Function ๐‘… converts the first two maximum nonterminal โ€ฒ โ†’ ๐‘†โ€ฒ ๐œ” ๐‘†๐‘—โˆ’1 ๐‘— ๐‘— paths of the tree ๐‘‡ (including the terminals they generate) in Figure 4a to the path drawn in Figure 4b. ๐‘†๐‘—โ€ฒ โ†’ ๐‘Ž๐‘—โ€ฒ ๐‘†๐‘—+1 โ€ฒ ๐œ” ๐‘—+1 Lemma 3.1. For every derivation tree ๐‘‡ of the grammar โ‹ฎ ๐บ๐‘€ that lies in the domain of the function ๐‘… (contains ๐‘†๐‘šโ€ฒ 2 โˆ’1 โ†’ ๐‘Ž๐‘šโ€ฒ ๐‘†โ€ฒ ๐œ” 2 โˆ’1 ๐‘š2 ๐‘š2 at least two maximal nonterminal paths), the following ๐‘†๐‘šโ€ฒ 2 โ†’ ๐‘Ž๐‘š โ€ฒ statements hold: 2 According to (4), (5) and (6) the following rules Proof. If ยซ๐‘ค โ€ฒ ยป โ‡’๐‘€ ยซ๐‘คยป, then for some ๐‘ข, ๐‘ฃ, and ๐‘› the word ๐‘ข๐‘ฃ is a prefix of ๐‘ค โ€ฒ ยป, ๐›ฟ โˆ— (๐‘ , ๐‘ข๐‘ฃ) = RED (๐‘›), and |๐‘ฃ| = |๐‘›|. ๐‘†๐‘–โ€ฒ๐œ† โ†’ ๐‘Ž๐‘–โ€ฒ ๐‘†๐‘–+1 โ€ฒ๐œ† ๐œ” ๐‘–+1 Let ๐‘†0 โ†’ ๐‘†1 โ†’ โ€ฆ โ†’ ๐‘†๐‘š be the first maximal nonterminal โ‹ฎ path of the derivation tree ๐‘‡ and the first components of โ€ฒ๐œ† โ†’ ๐‘Ž โ€ฒ ๐‘† โ€ฒ๐œ† ๐œ” nonterminals on this path contain the terminal symbols ๐‘†๐‘—โˆ’1 ๐‘—โˆ’1 ๐‘— ๐‘— ยซ, ๐‘Ž0 , ๐‘Ž1 , โ€ฆ, ๐‘Ž๐‘š . Since the automaton ๐‘€ is monotone, โ€ฒ๐œ† โ€ฒ โ€ฒ๐œ† ๐‘†๐‘— โ†’ ๐‘Ž๐‘— ๐‘†๐‘—+1 ๐œ”๐‘—+1 ๐‘š โ‰ฅ |๐‘ข๐‘ฃ/๐‘›| = ๐‘™ and ๐‘ข๐‘ฃ/๐‘› = ๐‘Ž1 โ€ฆ ๐‘Ž๐‘˜ โ€ฆ ๐‘Ž๐‘™ , where ๐‘˜ = |๐‘ข|. The tree ๐‘‡ โ€ฒ is obtained from the tree ๐‘‡ in the following โ‹ฎ way: ๐‘†๐‘šโ€ฒ๐œ†2 โˆ’1 โ†’ ๐‘Ž๐‘š โ€ฒ ๐‘† โ€ฒ๐œ† ๐œ” 2 โˆ’1 ๐‘š2 ๐‘š2 1. Remove the terminal symbols ๐‘Ž๐‘˜ , โ€ฆ, ๐‘Ž๐‘™ . ๐‘†๐‘šโ€ฒ๐œ†2 โ†’ ๐‘Ž๐‘š โ€ฒ 2 2. Replace each nonterminal ๐‘†๐‘— , ๐‘˜ โ‰ค ๐‘— โ‰ค ๐‘™, on the are rules of the grammar ๐บ๐‘€ . The tree ๐‘…(๐‘‡ ) is therefore first nonterminal path with the nonterminals ๐‘†๐‘—โ€ฒ , the derivation tree of the grammar ๐บ๐‘€ . which we get from ๐‘†๐‘— by replacing the empty word (ii). The construction of the tree ๐‘…(๐‘‡ ) from the tree ๐‘‡ ๐œ† in the third component with the word ๐‘Ž๐‘— โ€ฆ ๐‘Ž๐‘™ . involves the removal of the suffix of the first nonterminal path just behind the branching point with the second 3. Add new nonterminals nonterminal path. All other nonterminals are eventually ๐‘†๐‘˜โ€ณ = (๐‘Ž๐‘˜ , ๐‘ ๐‘˜ , ๐œ†, ๐‘œ, ๐‘ฃ) = ๐‘†๐‘˜ , replaced by other nonterminals, and all other edges be- โ€ณ = (๐‘ , ๐‘  , ๐œ†, RED (๐‘›โ€ฒ ), ๐‘ฃ /๐‘› ), tween nonterminals in the tree remain unchanged. Thus, ๐‘†๐‘˜+๐‘– ๐‘– ๐‘˜+๐‘– ๐‘– ๐‘– ๐‘– the derivation tree ๐‘…(๐‘‡ ) contains one less maximal non- where ๐‘๐‘– is the ๐‘–-th symbol of the word ๐‘ฃ from terminal path than the derivation tree ๐‘‡. the left, ๐‘ฃ๐‘–โ€ฒ is the prefix of the word ๐‘ฃ of length ๐‘–, (iii). If the tree ๐‘‡ gives the word ๐‘ค and its first two ๐‘ฃ๐‘–โ€ฒ ๐‘ฃ๐‘– = ๐‘ฃ, ๐‘›๐‘–โ€ฒ is the prefix of the reduction sequence maximum nonterminal paths are shown in Figure 4a, of length ๐‘–, ๐‘›๐‘–โ€ฒ ๐‘›๐‘– = ๐‘› and ๐‘ ๐‘˜+๐‘– = ๐›ฟ(๐‘ ๐‘˜ , ๐‘ฃ๐‘–โ€ฒ ) for all then obviously ๐‘– โˆˆ {1, โ€ฆ , |๐‘ฃ|}. ๐‘ค = ๐‘Ž0 ๐‘Ž1 โ€ฆ ๐‘Ž๐‘– ๐‘Ž๐‘–+1 โ€ฆ ๐‘Ž๐‘š1 ๐‘ฅ 4. Add a terminal ๐‘Ž๐‘˜ and an edge from the ๐‘†๐‘˜โ€ณ ter- minal to this terminal. Similarly, we add the ter- for some ๐‘ฅ. The nonterminal ๐‘†๐‘– is the last common non- minals ๐‘1 , โ€ฆ, ๐‘|๐‘ฃ| and connect them to the new terminal of the first two nonterminal paths of the tree derivation tree with an edge leading along the ๐‘‡ and is rewritten to ๐‘Ž๐‘– ๐‘†๐‘–+1 ๐‘†๐‘–โ€ฒ using the grammar ๐บ๐‘€ โ€ณ , โ€ฆ, ๐‘† โ€ณ . line from the ๐‘†๐‘˜+1 ๐‘˜+|๐‘ฃ| rule. The definition of the grammar ๐บ๐‘€ rule for each ๐‘™ โˆˆ {๐‘–, โ€ฆ , ๐‘š1 โˆ’ 1} implies that ๐‘ ๐‘™+1 = ๐‘“ (๐‘ ๐‘™ , ๐‘Ž๐‘™+1 ), where ๐‘ ๐‘™ is 5. We join the nonterminal ๐‘†๐‘˜โ€ณ to the new derivation the state contained in the second component of the non- tree by an edge leading from the nonterminal ๐‘†๐‘˜โˆ’1 . terminal ๐‘†๐‘™ , ๐‘ ๐‘š1 = RED (๐‘›) for some reduction sequence We use the grammar rules ๐‘†๐‘˜โˆ’1 โ†’ ๐‘Ž๐‘˜โˆ’1 ๐‘†๐‘˜โ€ณ ๐‘†๐‘˜โ€ฒ . Sim- ๐‘› is the state contained in the second component of the โ€ณ , โ€ฆ, ๐‘† โ€ณ ilarly, we join the nonterminals ๐‘†๐‘˜+1 ๐‘˜+|๐‘ฃ| to nonterminal ๐‘†๐‘š1 , the derivation tree using the rules ๐‘†๐‘˜โ€ณ โ†’ ๐‘Ž๐‘˜ ๐‘†๐‘˜+1 โ€ณ , โ€ณ โ€ณ โ€ฆ, ๐‘†๐‘˜+|๐‘ฃ|โˆ’1 โ†’ ๐‘Ž๐‘˜+|๐‘ฃ|โˆ’1 ๐‘†๐‘˜+|๐‘ฃ| . ๐‘Ž๐‘– โ†‘ The above construction yields a derivation tree ๐‘‡ โ€ฒ which gives the word ยซ๐‘ค โ€ฒ ยป and ๐‘…(๐‘‡ โ€ฒ ) = ๐‘‡. We can check the (๐‘Ž๐‘– , ๐‘ ๐‘– , ๐œ†, ๐‘œ, ๐‘ฃ) = ๐‘†๐‘– โ†’ ๐‘†๐‘–+1 = (๐‘Ž๐‘–+1 , ๐‘ ๐‘–+1 , ๐œ†, RED (1 ), ๐‘ข) correctness of this construction using the definition of nonterminals and the rules of the grammar ๐บ๐‘€ . โ†“ (๐‘Ž๐‘– , ๐‘ ๐‘– , ๐‘Ž๐‘– โ‹… ๐‘ข, ๐‘œ, ๐‘ฃ) = ๐‘†๐‘–โ€ฒ Lemma 3.3. Let ๐‘ be any natural number, ๐‘‡1 , โ€ฆ, ๐‘‡๐‘ be any derivation trees of the grammar ๐บ๐‘€ , and ๐‘ค1 , โ€ฆ, ๐‘ค๐‘ for some words ๐‘ข and ๐‘ฃ and some operation ๐‘œ, and ๐‘ข = any words from ยซ ฮฃโˆ— ยป. If at the same time โ€ฒ โ€ฆ ๐‘Ž โ€ฒ . Together we get that ๐‘ค โ€ฒ = (๐‘Ž๐‘–+1 โ€ฆ ๐‘Ž๐‘š1 )/๐‘› = ๐‘Ž๐‘–+1 ๐‘—โˆ’1 ๐‘Ž0 ๐‘Ž1 โ€ฆ ๐‘Ž๐‘– ๐‘ข๐‘ฅ, and hence ๐‘ค โ‡’๐‘€ ๐‘ค โ€ฒ . โ€ข ๐‘‡๐‘– = ๐‘…(๐‘‡๐‘–+1 ) for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘ โˆ’ 1}, Lemma 3.2. For any derivation tree ๐‘‡ of the grammar ๐บ๐‘€ โ€ข ๐‘‡๐‘– gives the word ๐‘ค๐‘– for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘}, and any words ยซ๐‘คยป and ยซ๐‘ค โ€ฒ ยป, such that ๐‘‡ gives the word โ€ข ๐‘‡1 contains exactly one maximal nonterminal path, ยซ๐‘คยป and ยซ๐‘ค โ€ฒ ยป โ‡’๐‘€ ยซ๐‘คยป, there is a derivation tree ๐‘‡ โ€ฒ of the grammar ๐บ๐‘€ that gives the word ยซ๐‘ค โ€ฒ ยป and ๐‘…(๐‘‡ โ€ฒ ) = ๐‘‡. then at the same time โ€ข ๐‘ค๐‘–+1 โ‡’๐‘€ ๐‘ค๐‘– for all ๐‘– โˆˆ 1, โ€ฆ , ๐‘ โˆ’ 1, โ€ข ๐‘ค1 is accepted by the ๐‘€ automaton in a single stage. Lemma 3.4. Let ๐‘ be any natural number and ๐‘ค1 , โ€ฆ, ๐‘ค๐‘ any words from ยซ ฮฃโˆ— ยป. If at the same time Proof. We prove the theorem by induction on the nat- ural number ๐‘. Obviously, ๐‘‡๐‘– contains just ๐‘– maximal โ€ข ๐‘ค๐‘–+1 โ‡’๐‘€ ๐‘ค๐‘– for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘ โˆ’ 1}, nonterminal paths. โ€ข ๐‘ค1 is accepted by the ๐‘€ automaton in a single stage. ๐‘ = 1. Let ๐‘‡1 be the derivation tree of the grammar ๐บ๐‘€ , which gives the word ๐‘ค1 and contains exactly one then for some derivation trees ๐‘‡1 , โ€ฆ, ๐‘‡๐‘ of the grammar maximal nonterminal path ๐บ๐‘€ ๐‘†0 โ†’ ๐‘†1 โ†’โ‹ฏ โ†’ ๐‘†๐‘–โˆ’1 โ†’ ๐‘†๐‘– โ†’โ‹ฏ โ†’ ๐‘†๐‘š . โ€ข ๐‘‡๐‘– = ๐‘…(๐‘‡๐‘–+1 ) for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘ โˆ’ 1}, โ€ข ๐‘‡๐‘– gives the word ๐‘ค๐‘– for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘}. This means that no rule with two nonterminals on the right-hand side is used in this tree. Thus, only the fol- Proof. We prove the theorem by induction on the natural lowing rules are used: number ๐‘. ๐‘ = 1. Suppose that the word ๐‘ค is accepted by the ๐‘†๐‘–โˆ’1 โ†’ ๐‘Ž๐‘– | ๐‘Ž๐‘– ๐‘†๐‘– | ๐‘†๐‘– | ๐œ† automaton ๐‘€ in one stage. So for some ๐‘ 0 , โ€ฆ, ๐‘ ๐‘š (๐‘ 0 is The word in the third component of the nonterminal ๐‘†๐‘– the initial state of ๐‘€, ๐‘ ๐‘š = ACC ) and ๐‘Ž0 , ๐‘Ž1 , โ€ฆ, ๐‘Ž๐‘š (๐‘Ž0 = ยซ, is empty (for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘š}) because it is empty in the ๐‘Ž๐‘š = ยป), and for each ๐‘– โˆˆ {1, โ€ฆ , ๐‘š} initial nonterminal ๐‘†0 and for all rules ๐‘†๐‘–โˆ’1 โ†’ ๐‘Ž๐‘– ๐‘†๐‘– | ๐‘†๐‘– , ๐›ฟ(๐‘ ๐‘–โˆ’1 , ๐‘Ž๐‘– ) = ๐‘ ๐‘– . the word in the third component of the nonterminal ๐‘†๐‘– is not longer than the word in the third component of It follows that the nonterminal ๐‘†๐‘–โˆ’1 . Thus, only the following rules are used in the tree ๐‘‡1 : ๐‘†๐‘– = (๐‘Ž๐‘– , ๐‘ ๐‘– , ๐œ†, ACC , ๐œ†) ๐‘†๐‘–โˆ’1 โ†’ ๐‘Ž๐‘– | ๐‘Ž๐‘– ๐‘†๐‘– is the nonterminal of the grammar ๐บ๐‘€ , ๐‘†0 is the initial nonterminal of the grammar ๐บ๐‘€ and and the calculation tree ๐‘‡1 for the word ๐‘ค1 looks like this: ๐‘†๐‘–โˆ’1 โ†’ ๐‘Ž๐‘–โˆ’1 ๐‘†๐‘– ๐‘†๐‘š โ†’ ๐‘Ž๐‘š ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘–โˆ’1 ๐‘Ž๐‘– โ€ฆ ๐‘Ž๐‘š โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ are the rules of the grammar ๐บ๐‘€ . So we can construct the following derivation tree of the grammar ๐บ๐‘€ , which ๐‘†0 โ†’ ๐‘†1 โ†’โ‹ฏ โ†’ ๐‘†๐‘–โˆ’1 โ†’ ๐‘†๐‘– โ†’โ‹ฏ โ†’ ๐‘†๐‘š gives the word ๐‘ค = ยซ๐‘Ž1 โ€ฆ ๐‘Ž๐‘šโˆ’1 ยป: It follows immediately from the definition of the grammar ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘–โˆ’1 ๐‘Ž๐‘– โ€ฆ ยป โ†‘ โ†‘ โ†‘ โ†‘ โ†‘ ๐บ๐‘€ rules that ๐‘“ (๐‘ ๐‘–โˆ’1 , ๐‘Ž๐‘– ) = ๐‘ ๐‘– for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘š}, where ๐‘ 0 is the state in the second component of the initial nonter- โ†’ ๐‘†1 โ†’โ‹ฏ โ†’ ๐‘†๐‘–โˆ’1 โ†’ ๐‘†๐‘– โ†’โ‹ฏ โ†’ ๐‘†๐‘š ๐‘†0 minal ๐‘†0 and also the initial state of the automaton ๐‘€, ๐‘ ๐‘– is the state in the second component of the nonterminal Induction step. Assume that the theorem holds for ๐‘†๐‘– . ๐‘ = ๐‘ž โ‰ฅ 1. We show that it is also true for ๐‘ = ๐‘ž + 1. Let Because the fourth components of the nonterminals ๐‘ค be an arbitrary word of the language ยซ ๐ฟ(๐‘€) ยป that the ๐‘†๐‘–โˆ’1 and ๐‘†๐‘– are the same for all ๐‘– โˆˆ {1, โ€ฆ , ๐‘š} and there is automaton ๐‘€ accepts in exactly ๐‘ž stages. Let further ๐‘ค โ€ฒ ACC in the fourth component of the initial nonterminal be any word such that ๐‘ค โ€ฒ โ‡’๐‘€ ๐‘ค. Thus, by the induction ๐‘†0 , the operation ACC is contained in all nonterminals of assumption, some derivation tree ๐‘‡ of the grammar ๐บ๐‘€ the tree ๐‘‡1 . yields the word ๐‘ค. Then, according to 3.2, there is a The automaton ๐‘€ only accepts after moving over the derivation tree ๐‘‡ โ€ฒ which gives the word ๐‘ค โ€ฒ and ๐‘‡ = right delimiter ยป, so ๐‘Ž๐‘š = ยป. We get ๐‘Ž0 ๐‘Ž1 โ€ฆ ๐‘Ž๐‘šโˆ’1 ๐‘Ž๐‘š = ๐‘ค1 ๐‘…(๐‘‡ โ€ฒ ). So the theorem holds for the word ๐‘ค โ€ฒ , which the and automaton ๐‘€ accepts in ๐‘ž + 1 stages. ๐›ฟ โˆ— (๐‘ 0 , ๐‘Ž1 โ€ฆ ๐‘Ž๐‘šโˆ’1 ๐‘Ž๐‘š ) = ACC . Induction step. Suppose the theorem holds for ๐‘ = ๐‘ž โ‰ฅ The following theorem is a direct consequence of lem- 1. We show that it is also true for ๐‘ = ๐‘ž + 1. Let ๐‘‡ be any mata 3.3 and 3.4. derivation tree of the grammar ๐บ๐‘€ with ๐‘ž + 1 maximal Theorem 3.1. ๐ฟ(๐บ๐‘€ ) = ยซ๐ฟ(๐‘€)ยป. nonterminal paths, which yields the word ๐‘ค. Then, by 3.1, ๐‘‡ โ€ฒ = ๐‘…(๐‘‡ ) is a derivation tree of the grammar ๐บ๐‘€ The next theorem says that the grammar ๐บ๐‘€ of the with ๐‘ž maximal nonterminal paths, and if it gives the monotone reducing automaton ๐‘€ is suitable for con- word ๐‘ค โ€ฒ , then ๐‘ค โ‡’๐‘€ ๐‘ค โ€ฒ . For the tree ๐‘‡ โ€ฒ the theorem by structing a classical syntactic parser for the language induction holds, so it also holds for the tree ๐‘‡. recognized by this automaton. We define the characteristic of the item ๐‘ = ๐‘‹ โ†’ ๐›ผ . ๐›ฝ as Theorem 3.2. ๐บ๐‘€ is an LR(0) grammar. follows: Proof. First, let us recall some classical notions intro- duced in the context of LR(0) grammars, as given, for [๐‘] = ([๐‘‹ ], ๐›ผ, [๐›ฝ]) example, in [4]. The characteristic of the set of items ๐ผ of the grammar ๐บ๐‘€ We call any expression of type is defined as the following set: ๐‘‹ โ†’ ๐›ผ . ๐›ฝ, [๐ผ ] = {[๐‘] โˆฃ ๐‘ โˆˆ ๐ผ } an item of the context-free grammar ๐บ, where ๐‘‹ โ†’ ๐›ผ๐›ฝ is a rule of the grammar ๐บ. In particular, ๐‘‹ โ†’ . is an item if Checking all types of rules of the grammar ๐บ๐‘€ , we ๐‘‹ โ†’ ๐œ† is a rule of grammar ๐บ. For each rule ๐‘‹ โ†’ ๐›พ, we get that the first two components of itemโ€™s characteristic call ๐‘‹ โ†’ ๐›พ . a complete item. We say that ๐ด โ†’ ๐›ผ . ๐›ฝ is a uniquely determine its third component. More precisely, valid item for the string ๐œ” if there exists a right sentential if (๐‘1,1 , ๐›ผ1 , ๐‘1,2 ) and (๐‘2,1 , ๐›ผ2 , ๐‘2,2 ) are characteristics of two form ๐œ‰ ๐ด๐‘ข (๐‘ข is a string of terminals) such that ๐œ‰ ๐›ผ = ๐œ”. items of the grammar ๐บ๐‘€ , then We use the notation ๐ผ (๐œ”) for the set of all valid items for the string ๐œ” . (๐‘1,1 = ๐‘2,1 and ๐›ผ1 = ๐›ผ2 ) โŸน ๐‘1,2 = ๐‘2,2 A context-free grammar ๐บ = (๐‘‰ , ๐‘ , ๐‘†, ๐‘ƒ) is called an LR(0) grammar if it satisfies the following conditions: A set of items ๐ผ has a continuous characteristic if 1. The initial symbol ๐‘† does not appear on the right- [๐ผ ] = {(๐‘0 , ๐›ผ, ๐‘1 ), (๐‘1 , ๐œ†, ๐‘2 ), โ€ฆ , (๐‘๐‘›โˆ’1 , ๐œ†, ๐‘๐‘› )} hand side of any rule. for some ๐‘0 , ๐‘1 , ๐‘2 , โ€ฆ, ๐‘๐‘›โˆ’1 , ๐‘๐‘› and ๐›ผ. 2. No reduce/reduce conflict. For any string ๐›พ โˆˆ By verifying all types of rules of the grammar ๐บ๐‘€ , (๐‘‰ โˆช ๐‘ )โˆ— , there is at most one complete item in we can prove the validity of the following statements, the set ๐ผ (๐›พ ). which say that construction of the states of the item 3. No shift/reduce conflict. If a complete item occurs automaton creates only sets of items with a continuous in ๐ผ (๐›พ ), then there is no item with a terminal to characteristic. the right of the dot in ๐ผ (๐›พ ). 1. ๐ผ (๐œ†) has a continuous characteristic. The first condition is satisfied for grammar ๐บ๐‘€ , because the initial nonterminal ๐‘† does not occur on the right-hand 2. If ๐ผ has a continuous characteristic, ๐‘ = ๐ด โ†’ side of any rule of ๐บ๐‘€ . ๐›ผ . ๐ต๐›ฝ โˆˆ ๐ผ and ๐ผ๐‘ = {๐ต โ†’ . ๐›พ โˆฃ ๐ต โ†’ ๐›พ โˆˆ ๐บ๐‘€ }, then We prove the remaining two conditions using a method ๐ผ โˆช ๐ผ๐‘ has a continuous characteristic. taken from [4] which is based on the notion of character- 3. If ๐ผ has a continuous characteristic, then for each istic. We define the characteristic sequentially, first for symbol ๐‘ฅ of the grammar ๐บ๐‘€ , the set ๐ผ๐‘ฅ = {๐ด โ†’ terminal and nonterminal symbols, then for strings of ter- ๐›ผ๐‘ฅ . ๐›ฝ โˆฃ ๐ด โ†’ ๐›ผ . ๐‘ฅ๐›ฝ โˆˆ ๐ผ } also has a continuous minal and nonterminal symbols, then for item sets of the characteristic. grammar, and finally for item sets obtained by construct- ing the states of the item automaton of the grammar ๐บ๐‘€ . Thus, any state ๐ผ (๐›พ ) has a continuous characteristic. The We show that each item set that is a state of the item definition of a set of items with continuous characteris- automaton has a continuous characteristic, which results tic implies that such a set can contain at most one triple in the absence of both types of conflicts. whose third component is empty or terminal . Each com- For each terminal or nonterminal symbol ๐‘ฅ of grammar plete item ๐ด โ†’ ๐›ผ . has characteristic ([๐ด], ๐›ผ, empty ), and ๐บ๐‘€ , we define the characteristic [๐‘ฅ] of the symbol ๐‘ฅ as each item ๐ด โ†’ ๐›ผ . ๐›ฝ containing a terminal to the right follows: of the dot has characteristic ([๐ด], ๐›ผ, terminal ). There- initial , if ๐‘ฅ = ๐‘†, fore, any set of items with a continuous characteristic โŽง โŽชterminal , can contain at most one complete item, and if it does, it [๐‘ฅ] = if ๐‘ฅ โˆˆ ๐‘‰ โŽจ(๐‘Ž, ๐‘ , ๐‘ข), no longer contains any item with a terminal to the right โŽช if ๐‘ฅ = (๐‘Ž, ๐‘ , ๐‘ข, ๐‘œ, ๐‘ฃ) โˆˆ ๐‘ โงต {๐‘†} of the dot. โŽฉ for some ๐‘œ and ๐‘ฃ. Thus, each set ๐ผ (๐›พ ) satisfies the second and third con- The characteristic of the strings ๐›ผ formed by terminals ditions of the definition of LR(0) grammar and ๐บ๐‘€ is an and nonterminals of the grammar ๐บ๐‘€ is defined by the LR(0) grammar. following rule: empty , if ๐›ผ = ๐œ†, [๐›ผ] = {[๐‘ฅ], if ๐‘ฅ โˆˆ ๐‘‰ โˆช ๐‘ and ๐›ผ = ๐‘ฅ๐›ผ โ€ฒ for some ๐›ผ โ€ฒ . 4. Normalizations the first nonterminal path of any of grammarโ€™s derivation tree uses rules of types listed in the first row only, i.e. the Normalized mon-red-automata form a subclass of reduc- rules with the right hand side starting with the terminal. ing automata. A normalized mon-red-automaton is any Thus, the last rule used in the first maximal nonterminal mon-red-automaton that is both prefix-correct and state path of any derivation tree of grammar ๐บ๐‘€ is of type ๐‘‹ โ†’ minimal. In this section, we show how to construct an ๐‘Ž, and the first complete item encountered by the item equivalent normalized mon-red-automaton to any mon- automaton is of type ๐‘‹ โ†’ ๐‘Ž .. Therefore, we construct red-automaton that accepts a non-empty language. the states of the reducing automaton ๐‘€ โ€ฒ only from the We use a grammar of mon-red-automaton constructed following types of items in the previous section to construct an equivalent prefix- correct mon-red-automaton. Then we show how to min- ๐‘‹ โ†’ .๐‘Ž๐‘Œ๐‘ ๐‘‹ โ†’ .๐‘Ž๐‘Œ ๐‘‹ โ†’ .๐‘Ž imize the set of states of any red-automaton including ๐‘‹ โ†’ ๐‘Ž.๐‘Œ๐‘ ๐‘‹ โ†’ ๐‘Ž.๐‘Œ ๐‘‹ โ†’ ๐‘Ž. the elimination of unreachable states. and we consider the transitions between item sets over terminal symbols only. 4.1. Prefix-correctness Now let us describe a construction of the prefix-correct Suppose that ๐‘€ = (ฮฃ, ยซ, ยป, ๐‘†, ๐‘ 0 , ๐น , ๐‘“ ) is a monotone re- automaton using just mentioned principles. For any ๐‘ข โˆˆ ducing automaton. We say that ๐‘€ is prefix-correct if for ฮฃโˆ— and ๐‘Ž โˆˆ ฮฃ โˆช {ยซ, ยป} we define the item sets using the any word ๐‘ข โˆˆ ฮฃโˆ— and reduction sequence ๐‘› the following following rules implications hold: ๐ผ (๐œ†) = {๐‘† โ†’ . ยซ ๐›พ โˆฃ ๐‘† โ†’ ยซ ๐›พ โˆˆ ๐‘ƒ}, ๐›ฟ โˆ— (๐‘ 0 , ๐‘ข) โˆˆ ๐‘† โŸน โˆƒ๐‘ฃ โˆˆ ฮฃโˆ— โˆถ ๐‘ข๐‘ฃ โˆˆ ๐ฟ(๐‘€) ๐ผ (๐‘ข๐‘Ž) = {๐‘‹ โ†’ ๐‘Ž . ๐›ผ โˆฃ ๐‘‹ โ†’ . ๐‘Ž ๐›ผ โˆˆ ๐ผ (๐‘ข)} ๐›ฟ โˆ— (๐‘ 0 , ๐‘ข) = RED (๐‘›) โŸน โˆƒ๐‘ฃ โˆˆ ฮฃโˆ— โˆถ ๐‘ข๐‘ฃ โˆˆ ๐ฟ(๐‘€) โˆช {๐‘Œ โ†’ . ๐‘ ๐›ฝ โˆฃ ๐‘‹ โ†’ . ๐‘Ž ๐‘Œ ๐›พ โˆˆ ๐ผ (๐‘ข) a ๐‘Œ โ†’ ๐‘ ๐›ฝ โˆˆ ๐‘ƒ}, โˆ— ๐›ฟ (๐‘ 0 , ๐‘ขยป) = RED (๐‘›) โŸน ๐‘ข โˆˆ ๐ฟ(๐‘€) where ๐‘† is the starting nonterminal of the grammar ๐บ๐‘€ . We obtain the above item sets utilizing slightly modified The first two implications say that any word ๐‘ข โˆˆ ฮฃโˆ— , over which the automaton ๐‘€ moves its working head while method already used in classical theory of parsing to construct an item automaton for a given LR(0) grammar. switching its control unit from the initial state ๐‘ 0 to any Here, we consider only items of the above types. Thus, transition or reducing state, is a prefix of some word from the language ๐ฟ(๐‘€). The third implication states that any our item sets are subsets of states of the classical item word ๐‘ข ยป โˆˆ ฮฃโˆ— ยป, over which the automaton ๐‘€ moves automaton, and hence contain neither shift/reduce nor re- its working head while switching its control unit from duce/reduce conflict. A set of all nonterminal states of the reducing automaton ๐‘€ โ€ฒ consists of all constructed item the initial state ๐‘ 0 to any reducing state, is a word of the language ๐ฟ(๐‘€)ยป. However, the automaton may need a sets except ๐ผ (๐œ†) and sets containing only a complete item. few more stages to formally accept it. Its initial state is the set ๐ผ (ยซ). The reducing states are just all RED (๐‘›) for which there exists a complete item ๐‘‹ โ†’ ๐‘Ž . Theorem 4.1. For each mon-red-automaton, an equiva- of the grammar ๐บ(๐‘€ โ€ฒ ) and ๐‘‹ = (๐‘Ž, RED (๐‘›), ๐œ†, RED (๐‘›), ๐œ†). lent prefix-correct mon-red-automata can be constructed. We define the transition function as follows: Proof. Suppose that ๐‘€ = (ฮฃ, ยซ, ยป, ๐‘†, ๐‘ 0 , ๐น , ๐‘“ ) is a mon- RED (๐‘›), if ๐ผ (๐‘ข๐‘Ž) = {๐‘‹ โ†’ ๐‘Ž .} and โŽง red-automaton. First, we construct an LR(0) grammar โŽช ๐‘‹ = (๐‘Ž, RED (๐‘›), ๐œ†, RED (๐‘›), ๐œ†) ๐บ๐‘€ . Next, we construct a reducing automaton ๐‘€ โ€ฒ = โŽช for some ๐‘›, (ฮฃ, ยซ, ยป, ๐‘† โ€ฒ , ๐‘ 0โ€ฒ , ๐น โ€ฒ , ๐‘“ โ€ฒ ) based on the grammar ๐บ๐‘€ . Finally, โŽช โŽชACC , if ๐‘Ž = ยป and we show that ๐‘€ โ€ฒ is monotone, prefix-correct, and equiv- ๐‘“ โ€ฒ (๐ผ (๐‘ข), ๐‘Ž) = ๐ผ (๐‘ขยป) = {๐‘‹ โ†’ ยป .} and alent to the original reducing automaton ๐‘€. โŽจ Construction of the reducing automaton ๐‘€ resembles โ€ฒ โŽช ๐‘‹ = (ยป, ACC , ๐œ†, ACC , ๐œ†), โŽช a construction of an item automaton for the grammar โŽชERR , if ๐ผ (๐‘ข๐‘Ž) = โˆ…, ๐บ๐‘€ . Transition states of the automaton ๐‘€ โ€ฒ are defined โŽช as sets of items of this grammar, in the same way as in โŽฉ๐ผ (๐‘ข๐‘Ž), otherwise. case of an item automaton. However, we are interested in those grammar rules only that can be used in the first Equivalence of automata ๐‘€ โ€ฒ and ๐‘€. Apparently, nonterminal path of ๐บ๐‘€ โ€™s derivation tree. While the ยซ๐ฟ(๐‘€)ยป = ๐ฟ(๐บ๐‘€ ). We show that grammar ๐บ๐‘€ contains rules of the following types ๐ฟ(๐บ๐‘€ ) = ยซ๐ฟ(๐‘€ โ€ฒ )ยป. ๐‘‹ โ†’ ๐‘Ž๐‘Œ๐‘ ๐‘‹ โ†’ ๐‘Ž๐‘Œ ๐‘‹โ†’๐‘Ž Suppose that ยซ๐‘คยป โˆˆ ๐ฟ(๐บ๐‘€ ) and ๐‘‡ is the derivation tree ๐‘‹ โ†’ ๐‘Œ๐‘ ๐‘‹โ†’๐‘Œ ๐‘‹โ†’๐œ† of the grammar ๐บ๐‘€ , which gives ยซ๐‘คยป. By induction on the number ๐‘ of maximal nonterminal paths in the tree the left column from the sets ๐ผ (ยซ), โ€ฆ, ๐ผ (ยซ๐‘ข๐‘ฃ) and then use ๐‘‡, we prove that ๐‘ค โˆˆ ๐ฟ(๐‘€ โ€ฒ ). their rules in the derivation listed in the right column: If the tree ๐‘‡ contains only one maximal nonter- minal path, then ๐ผ (ยซ๐‘คยป) = {๐‘‹ โ†’ ยป . } and ๐‘‹ = ๐‘† โ‡’ ยซ . ๐‘†1 ๐›พ1 ๐‘† โ‡’ ยซ ๐‘†1 ๐›พ 1 (ยป, ACC , ๐œ†, ACC , ๐œ†), so ๐›ฟ โ€ฒโˆ— (๐ผ (ยซ), ๐‘คยป) = ACC and thus ๐‘ค โˆˆ ๐‘†1 โ†’ ๐‘Ž1 . ๐‘†2 ๐›พ2 โ‡’ ยซ ๐‘Ž1 ๐‘†2 ๐›พ2 ๐›พ1 ๐ฟ(๐‘€ โ€ฒ ). โ‹ฎ โ‹ฎ Assume that the statement holds for all derivation trees of the grammar ๐บ๐‘€ with at most ๐‘ maximal nonterminal ๐‘†๐‘˜โˆ’1 โ†’ ๐‘Ž๐‘˜โˆ’1 . ๐‘†๐‘˜ ๐›พ๐‘˜ โ‡’ ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘˜โˆ’1 ๐‘†๐‘˜ ๐›พ๐‘˜ โ€ฆ ๐›พ2 ๐›พ1 paths. We then prove that it also holds for derivation ๐‘†๐‘˜ โ†’ ๐‘Ž๐‘˜ . ๐‘†๐‘˜+1 ๐‘‹0 โ‡’ ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘˜โˆ’1 ๐‘Ž๐‘˜ ๐‘†๐‘˜+1 ๐‘‹0 ๐›พ trees with ๐‘ + 1 maximal nonterminal paths. If ๐‘‡ is a ๐‘†๐‘˜+1 โ†’ ๐‘Ž๐‘˜+1 . ๐‘†๐‘˜+2 โ‡’ ยซ ๐‘ข ๐‘Ž๐‘˜+1 ๐‘†๐‘˜+2 ๐‘‹0 ๐›พ derivation tree with ๐‘ + 1 maximal nonterminal paths, โ‹ฎ โ‹ฎ which gives the word ยซ๐‘คยป, then by Lemma 3.3, ๐‘…(๐‘‡ ) is a tree with ๐‘ maximal nonterminal paths that yields the ๐‘†๐‘™ โ†’ ๐‘Ž ๐‘™ . โ‡’ ยซ ๐‘ข ๐‘Ž๐‘˜+1 โ€ฆ ๐‘Ž๐‘™โˆ’1 ๐‘Ž๐‘™ ๐‘‹0 ๐›พ , word ยซ๐‘ค โ€ฒ ยป such that ยซ๐‘คยป โ‡’๐‘€ ยซ๐‘ค โ€ฒ ยป. By the induction assumption ๐‘ค โ€ฒ โˆˆ ๐ฟ(๐‘€ โ€ฒ ). Thus, ๐›ฟ โˆ— (๐‘ 0 , ๐‘ข๐‘Ž) = RED (๐‘›) for where ๐›พ = ๐›พ๐‘˜ โ€ฆ ๐›พ1 , ๐‘ข = ๐‘Ž1 โ€ฆ ๐‘Ž๐‘˜ , and ๐‘ฃ = ๐‘Ž๐‘˜+1 โ€ฆ ๐‘Ž๐‘™ . Since some prefix ยซ๐‘ข๐‘Ž of ยซ๐‘คยป, where ๐ผ (ยซ๐‘ข๐‘Ž) = {๐‘‹ โ†’ ๐‘Ž . } and ๐บ๐‘€ is reduced, some terminal string can be derived from ๐‘‹ = (๐‘Ž, RED (๐‘›), ๐œ†, RED (๐‘›), ๐œ†). So ๐›ฟ โ€ฒโˆ— (๐ผ (ยซ๐‘ข), ๐‘Ž) = RED (๐‘›) ๐‘‹0 . Suppose that ๐‘ฃ/๐‘› = ๐‘1 โ€ฆ ๐‘๐‘š . From the definition of and ยซ๐‘คยป โ‡’๐‘€ โ€ฒ ยซ๐‘ค โ€ฒ ยป. Therefore ๐‘ค โˆˆ ๐ฟ(๐‘€ โ€ฒ ). the grammar ๐บ๐‘€ , it follows that some of its nonterminals We can prove the reverse inclusion (ยซ๐ฟ(๐‘€ โ€ฒ )ยป โІ ๐ฟ(๐บ๐‘€ )) ๐‘‹1 , ๐‘‹2 , โ€ฆ, ๐‘‹๐‘š contain the symbols ๐‘1 , ๐‘2 , โ€ฆ, ๐‘๐‘š in their in a similar way by induction on the number of stages first component, and the grammar contains the following of the reduction analysis of ยซ๐‘คยป by the automaton ๐‘€ โ€ฒ rules using Lemma 3.2. ๐‘†๐‘˜ โ†’ ๐‘Ž๐‘˜ ๐‘†๐‘˜+1 ๐‘‹0 ๐‘†๐‘˜ = ๐‘‹0๐œ† Prefix correctness. Suppose first that the automaton ๐‘€ โ€ฒ moves from the initial state ๐ผ (ยซ) over the prefix ๐‘ข โˆˆ ฮฃโˆ— ๐‘‹0 โ†’ ๐‘‹1 ๐›พ1โ€ฒ ๐‘‹0๐œ† โ†’ ๐‘Ž๐‘˜ ๐‘‹1๐œ† ๐›พ1โ€ฒ of the word ๐‘คยป to the state ๐ผ (ยซ๐‘ข), formally ๐‘‹1 โ†’ ๐‘‹2 ๐›พ2โ€ฒ ๐‘‹1๐œ† โ†’ ๐‘1 ๐‘‹2๐œ† ๐›พ2โ€ฒ ๐›ฟ โ€ฒโˆ— (๐ผ (ยซ), ๐‘ข) = ๐ผ (ยซ๐‘ข). ๐‘‹2 โ†’ ๐‘‹3 ๐›พ3โ€ฒ ๐‘‹2๐œ† โ†’ ๐‘2 ๐‘‹3๐œ† ๐›พ3โ€ฒ โ‹ฎ โ‹ฎ This means that the items listed below in the left column can be selected from the sets ๐ผ (ยซ), โ€ฆ, ๐ผ (ยซ๐‘ข) and used in ๐‘‹๐‘š โ†’ ๐›พโ€ฒ ๐‘‹๐‘š๐œ† โ†’ ๐‘๐‘š ๐›พ โ€ฒ the derivation by ๐บ๐‘€ listed in the right column The equality of the nonterminals ๐‘†๐‘˜ and ๐‘‹0๐œ† together with ๐‘† โ‡’ ยซ . ๐‘†1 ๐›พ1 ๐‘† โ‡’ ยซ ๐‘†1 ๐›พ 1 the existence of the rules in the second column follows ๐‘†1 โ†’ ๐‘Ž 1 . ๐‘†2 ๐›พ 2 โ‡’ ยซ ๐‘Ž1 ๐‘†2 ๐›พ2 ๐›พ1 from the application of implications (3), (4), and (5). These โ‹ฎ โ‹ฎ rules allow the automaton ๐‘€ โ€ฒ to move from the state ๐ผ (ยซ๐‘ข) over the word ๐‘ฃ/๐‘› = ๐‘1 ๐‘2 โ€ฆ ๐‘๐‘š to the transition ๐‘†๐‘˜ โ†’ ๐‘Ž๐‘˜ . ๐‘†๐‘˜+1 ๐›พ๐‘˜+1 โ‡’ ยซ ๐‘Ž1 โ€ฆ ๐‘Ž๐‘˜ ๐‘†๐‘˜+1 ๐›พ๐‘˜+1 ๐›พ๐‘˜ โ€ฆ ๐›พ2 ๐›พ1 state ๐ผ (ยซ๐‘ข๐‘ฃ/๐‘›) or to ACC or to some reducing state. Thus the reducing automaton ๐‘€ โ€ฒ is monotone. where ๐‘Ž1 โ€ฆ ๐‘Ž๐‘˜ = ๐‘ข. Since the grammar ๐บ๐‘€ is reduced, all nonterminals in ๐‘†๐‘˜+1 ๐›พ๐‘˜+1 ๐›พ can be rewritten into some terminal strings, so that ๐‘ข is a prefix of some word from 4.2. State minimality ๐ฟ(๐‘€). We show how to minimize the set of transition, final, and Another possibility is that ๐›ฟ โ€ฒโˆ— (๐ผ (ยซ), ๐‘ข) = RED (๐‘›) for reducing states of a reducing automaton while preserving some reduction sequence ๐‘›. Then ๐ผ (ยซ๐‘ข) = {๐‘‹ โ†’ ๐‘Ž๐‘˜ .}, reduction analysis and, moreover, visited working list where ๐‘‹ = (๐‘Ž๐‘˜ , RED (๐‘›), ๐œ†, RED (๐‘›), ๐œ†). Just replace the last positions in each stage. item ๐‘†๐‘˜ โ†’ ๐‘Ž๐‘˜ . ๐‘†๐‘˜+1 ๐›พ๐‘˜+1 in the list above with the com- Suppose ๐‘€ = (ฮฃ, ยซ, ยป, ๐‘†, ๐‘ 0 , ๐น , ๐‘“ ) is any reducing au- plete item ๐‘†๐‘˜ โ†’ ๐‘Ž๐‘˜ . to get the derivation ๐‘†๐‘˜ โ‡’โˆ—๐บ๐‘€ ยซ ๐‘ข ๐›พ. If tomaton and ๐‘  โˆˆ ๐‘† โˆช ๐น. We call the state ๐‘  reachable if ๐‘Ž๐‘˜ โ‰  ยป, then ๐‘ข is a prefix of some word of the language ๐›ฟ โˆ— (๐‘ 0 , ๐‘ค) = ๐‘  for some ๐‘ค โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป}. A state that is not ๐ฟ(๐‘€). If ๐‘Ž๐‘˜ = ยป, then ๐›พ โ‡’โˆ—๐บ๐‘€ ๐œ† (otherwise the grammar reachable is called unreachable. ๐บ๐‘€ would generate a word outside ยซฮฃโˆ— ยป) and ๐‘ข โˆˆ ๐ฟ(๐‘€)ยป. Monotony. Suppose the automaton ๐‘€ โ€ฒ moves from Theorem 4.2. An equvalent reducing automaton with the state ๐ผ (ยซ) via the prefix ๐‘ข๐‘ฃ of the word ๐‘คยป to the state only reachable states can be constructed to any reducing RED (๐‘›) โˆˆ ๐น โ€ฒ , |๐‘ฃ| = |๐‘›|, and that the word ๐‘ข consists of automaton. the symbols ๐‘Ž1 , โ€ฆ, ๐‘Ž๐‘˜ โˆˆ ฮฃ and the word ๐‘ฃ consists of the symbols ๐‘Ž๐‘˜+1 , โ€ฆ, ๐‘Ž๐‘™โˆ’1 โˆˆ ฮฃ, ๐‘Ž๐‘™ โˆˆ ฮฃ โˆช {ยป}. As in the proof of Proof. A reducing automaton can be viewed as a finite- prefix correctness, we can now select the items listed in state machine with an alphabet ฮฃ โˆช {ยป}, a set of states ๐‘† โˆช ๐น โˆช {RED }, an initial state ๐‘ 0 , a set of final states ๐น, and a output function, marks in the input word whether and transition function ๐›ฟ. Thus, we can use the construction where to accept, reject or reduce it (and how to reduce it) of a finite-state machine containing only reachable states in the same way as the reducing automaton ๐‘€. We then from the theory of finite-state machines. construct its reduct ๐ดโ€ฒ (with the same behaviour as ๐ด) and finally convert it back to the reducing automaton ๐‘€. Let ๐‘€1 = (ฮฃ, ยซ, ยป, ๐‘†1 , ๐‘ 1 , ๐น1 , ๐‘“1 ) and ๐‘€2 = The Moore machine ๐ด = (๐‘†๐ด , ๐‘ ๐ด , ฮฃ๐ด , ฮ“๐ด , ๐›ฟ๐ด , ๐œ‡๐ด ), (ฮฃ, ยซ, ยป, ๐‘†2 , ๐‘ 2 , ๐น2 , ๐‘“2 ) be reducing automata with where ๐‘†๐ด is a set of states, ๐‘ ๐ด is an initial state, ฮฃ๐ด is the same input alphabet ฮฃ. We define the relation an input alphabet, ฮ“๐ด is an output alphabet, ๐›ฟ๐ด is a tran- โˆผ โІ ๐‘†1 ร— ๐‘†2 in the following way: ๐‘  โˆผ ๐‘  โ€ฒ , iff for each sition function, and ๐œ‡๐ด is an output function, is defined word ๐‘ค โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป} from ๐›ฟ1โˆ— (๐‘ , ๐‘ค) โˆˆ ๐น1 or ๐›ฟ2โˆ— (๐‘  โ€ฒ , ๐‘ค) โˆˆ ๐น2 it in the following way: follows that ๐›ฟ1โˆ— (๐‘ , ๐‘ค) = ๐›ฟ2โˆ— (๐‘  โ€ฒ , ๐‘ค). If ๐‘€1 = ๐‘€2 , then the relation โˆผ is the equivalence on the set of transition ๐‘†๐ด = ๐‘† โˆช ๐น โˆช {RED } states ๐‘†1 of the automaton ๐‘€1 . We call the transition ๐‘ ๐ด = ๐‘  0 states, that are in the relation โˆผ, stage-equivalent. We ฮฃ๐ด = ฮฃ โˆช {ยป} call reducing automata ๐‘€1 and ๐‘€2 stage-equivalent, if their initial states are stage-equivalent (๐‘ 1 โˆผ ๐‘ 2 ). Two ฮ“๐ด = ๐น โˆช {RED } โˆช {๐‘ 0 } stage-equivalent reducing automata either both accept, ๐›ฟ๐ด = ๐›ฟ both reject, or both reduce any word, in all three cases ๐‘  , if ๐‘  โˆˆ ๐‘† at the same place in the working list, and in the case of ๐œ‡๐ด (๐‘ ) = { 0 reduction also according to the same reduction sequence. ๐‘ , if ๐‘  โˆˆ ๐น โˆช {RED } Thus, any two stage-equivalent reducing automata are For the Moore machine ๐ด defined in this way, we con- obviously strongly equivalent and vice versa. Since the struct its reduct ๐ดโ€ฒ = (๐‘†๐ดโ€ฒ , ๐‘ ๐ดโ€ฒ , ฮฃ๐ดโ€ฒ , ฮ“๐ดโ€ฒ , ๐›ฟ๐ดโ€ฒ , ๐œ‡๐ดโ€ฒ ) by the stage equivalence and strong equivalence name the same construction given, for example, in [5]. phenomenon, we henceforth use only the term strong Finally, we move from the reduct ๐ดโ€ฒ back to the re- equivalence. ducing automaton ๐‘€ โ€ฒ = (ฮฃโ€ฒ , ยซ, ยป, ๐‘† โ€ฒ , ๐‘ 0โ€ฒ , ๐น โ€ฒ , ๐‘“ โ€ฒ ) defined as Let ๐‘€ be any reducing automaton. An automaton ๐‘€ is follows: state minimal if (i) all its states are reachable, and (ii) no its different transition states are equivalent. A reducing ฮฃโ€ฒ = ฮฃ๐ดโ€ฒ โงต {ยป} = ฮฃ๐ด โงต {ยป} = ฮฃ automaton ๐‘€ โ€ฒ is called a reduct of a reducing automaton ๐‘† โ€ฒ = {๐‘  โˆฃ ๐œ‡๐ดโ€ฒ (๐‘ ) = ๐‘ 0 } ๐‘€ if (i) ๐‘€ โ€ฒ is strongly equivalent to ๐‘€, and (ii) ๐‘€ โ€ฒ is state minimal. ๐‘ 0โ€ฒ = ๐‘ ๐ดโ€ฒ = ๐‘ ๐ด = ๐‘ 0 ๐น โ€ฒ = {๐‘  โ€ฒ โˆˆ ๐น โˆฃ โˆƒ๐‘  โˆˆ ๐‘†๐ดโ€ฒ โˆถ ๐‘  โ€ฒ = ๐œ‡๐ดโ€ฒ (๐‘ )} Theorem 4.3. A reduct can be constructed to any reducing automaton. ๐‘ โ€ฒ, if ๐‘  โ€ฒ = ๐›ฟ๐ดโ€ฒ (๐‘ , ๐‘Ž) โˆˆ ๐‘† โ€ฒ ๐‘“ โ€ฒ (๐‘ , ๐‘Ž) = { ๐œ‡๐ดโ€ฒ (๐‘  ), if ๐‘  โ€ฒ = ๐›ฟ๐ดโ€ฒ (๐‘ , ๐‘Ž) โˆ‰ ๐‘† โ€ฒ โ€ฒ Proof. Let ๐‘€ = (ฮฃ, ยซ, ยป, ๐‘†, ๐‘ 0 , ๐น , ๐‘“ ) be a reducing automa- ton. We show how to construct its reduct ๐‘€ โ€ฒ . Now it is not hard to see that the constructed reducing We convert the problem of constructing a reduct of automaton ๐‘€ โ€ฒ is a reduct of the automaton ๐‘€. a reducing automaton to the problem of constructing The strong equivalence of the reducing automata ๐‘€ a reduct of a Moore machine, which is already solved and ๐‘€ โ€ฒ follows from the following facts: 1. The set ๐น โ€ฒ in the theory of finite-state machines and finite-state contains exactly all reachable final or reducing states of transducers. the reducing automaton ๐‘€. 2. The initial state ๐‘ ๐ดโ€ฒ of the We can look at a reducing automaton as a finite-state Moore machine ๐ดโ€ฒ is behaviorally equivalent to the initial machine extended with the ability to reduce a working state ๐‘ ๐ด of the Moore machine ๐ด. Thus, for any word list. The reducing automaton ๐‘€ computes a transition ๐‘ค โˆˆ ฮฃโˆ— โ‹… {๐œ†, ยป} and the reachable reducing state RED (๐‘›) function over a word contained in a working list to de- of the reducing automaton ๐‘€, the following equivalence termine whether and where to accept, reject, or reduce holds: a word, and in the case of reduction, how to reduce it. ๐›ฟ โ€ฒโˆ— (๐‘ 0โ€ฒ , ๐‘ค) = RED (๐‘›) โŸบ Moore machine is also, in principle, a finite-state ma- โˆ— ๐œ‡๐ดโ€ฒ (๐›ฟ๐ดโ€ฒ (๐‘ ๐ดโ€ฒ , ๐‘ค)) = RED (๐‘›) โŸบ chine augmented by marking the input word. It also computes a transition function over an input word and ๐œ‡๐ด (๐›ฟ๐ดโˆ— (๐‘ ๐ด , ๐‘ค)) = RED (๐‘›) โŸบ continuously marks the input word based on its value. ๐›ฟ โˆ— (๐‘ 0 , ๐‘ค) = RED (๐‘›) The reduct ๐ดโ€ฒ of Moore machine ๐ด is a state-minimal machine that marks any word in the same way as the The same equivalences holds if we replace the reducing original machine ๐ด. So the idea is to design a Moore state RED (๐‘›) with the accepting or rejecting states ACC or machine ๐ด for a reducing automaton ๐‘€ which, by its ERR . The state-minimality of the automaton ๐‘€ โ€ฒ follows di- References rectly from the state-minimality of Moore machine ๐ดโ€ฒ and from the way we obtained the automaton ๐‘€ โ€ฒ from [1] M. Prochรกzka, Redukฤnรญ automaty a syntaktickรฉ chy- the machine ๐ดโ€ฒ . The reducing automaton ๐‘€ โ€ฒ is therefore by, Ph.D. thesis, Univerzita Karlova v Praze, fakulta a reduct of the reducing automaton ๐‘€. Matematicko-fyzikรกlnรญ, Praha, 2012. [2] M. Prochรกzka, M. Plรกtek, Redukฤnรญ automaty, mono- We say that reducing automata ๐‘€ and ๐‘€ โ€ฒ with the tonie a redukovanost, ITAT, 2002. same alphabet ฮฃ are isomorphic if there is a one to one [3] P. Janฤar, F. Mrรกz, M. Plรกtek, J. Vogel, Restarting mapping โ„Ž, automata, Lecture Notes in Computer Science 965 โ„Ž โˆถ ๐‘† โˆช ๐น โŸถ ๐‘† โ€ฒ โˆช ๐น โ€ฒ, (1995) 283โ€“292. [4] M. Chytil, Automaty a gramatiky, SNTL Praha, such that Praha, 1984. (i) โ„Ž(๐‘ ) = ๐‘  for any ๐‘  โˆˆ ๐น; [5] E. F. Moore, Gedanken-experiments on sequential machines, Annals of Mathematics studies 34 (1956) (ii) ๐›ฟ(๐‘ , ๐‘Ž) = ๐‘  โ€ฒ โŸบ ๐›ฟ โ€ฒ (โ„Ž(๐‘ ), ๐‘Ž) = โ„Ž(๐‘  โ€ฒ ) for any ๐‘  โˆˆ ๐‘†, 129โ€“153. ๐‘Ž โˆˆ ฮฃ โˆช {ยป}, and ๐‘  โ€ฒ โˆˆ ๐‘† โˆช ๐น. Theorem 4.4. Any two reducts of the same reducing au- tomaton are isomorphic. This theorem can obviously be proved by modifying the proof of the same theorem for Moore machines. Corollary 4.1. No reducing automaton has fewer states than its reduct. As the reduct is strongly equivalent to the reducing automaton for which it is constructed, the construction retains both monotony and prefix-correctness. So, for any monotone reducing automaton we can construct an equivalent monotone reducing automaton, which is both prefix-correct and state-minimal. 5. Conclusion We introduced the reducing automaton and described two of its normalizations. Further, we plan to pro- pose a construction of an equivalent mon-red-automaton with only repeatable reductions. A reduction is repeat- able if ยซ๐‘ฅ๐‘ข๐‘ฆ๐‘ฃ๐‘งยป โ‡’๐‘€ ๐‘ฅ๐‘ฆ๐‘ง implies ยซ๐‘ฅ๐‘ข ๐‘›+1 ๐‘ฆ๐‘ฃ ๐‘›+1 ๐‘งยป โ‡’๐‘€ ยซ๐‘ฅ๐‘ข ๐‘› ๐‘ฆ๐‘ฃ ๐‘› ๐‘งยป for each ๐‘› โˆˆ โ„•0 . Further, we would like to distinguish repeatable reductions reducing regular con- texts (either ๐‘ข or ๐‘ฃ is empty) and linear contexts (neither ๐‘ข nor ๐‘ฃ is empty) and normalize a monotone reducing automaton so that it reduces regular contexts from the right. Acknowledgments This paper was created as part of the Parsing and Syn- tactic Analysis seminar led by Frantiลกek Mrรกz and Martin Plรกtek at the Faculty of Mathematics and Physics of Charles University in Prague.