Non-Regularity of Complete CF(¢,$)-Grammars Martin Plátek1 , František Mráz1,* , Dana Pardubská2 and Martin Procházka1 1 Charles University, Department of Computer Science, Malostranské nám. 25, 118 00 Praha 1, Czech Republic 2 Comenius University in Bratislava, Department of Computer Science, Mlynská Dolina, 84248 Bratislava, Slovakia Abstract We study pumping analysis by reduction represented by complete CF(c| ,$)-grammars and their languages. A complete CF(c| ,$)-grammar generates both a language and its complement. Complete CF(c| ,$)-grammars serve as a tool to study the class of context-free languages that are closed under complement. Recall that the class of context-free grammars is the single class of languages from the Chomsky hierarchy that is not closed under the complement. The pumping reductions used in this paper ensure a correctness- and error-preserving pumping analysis by reduction for each word over its input alphabet. We introduce tests for each pumping reduction, which serve as tests of non-regularity for accepted and rejected languages by corresponding grammars. That can help to develop natural error localization and error recovery techniques for languages defined by complete CF(c| ,$)-grammars. Keywords complete CF(¢,$)-grammar, pure pumping infix, pumping test, non-regularity 1. Introduction rewritten. Instead, at most two continuous segments of the current word are deleted. In addition, we consider This paper is a continuation of the papers [1, 2, 3, 4], the pumping reduction analysis for languages generated inspired also by [5, 6]. We introduce and study com- by the so-called complete grammars. plete context-free grammars with sentinels, mainly their Informally, a complete grammar (with sentinels c| and pumping reductions and pumping tests. These notions $) 𝐺𝐶 is an extended context-free grammar (CFG) with are motivated by the linguistic method called analysis two initial nonterminals 𝑆𝐴 and 𝑆𝑅 . Such grammar has by reduction (here mentioned as reduction analysis); see a finite alphabet Σ of terminals not containing c| and $, [7, 8, 9, 10]. a finite alphabet of nonterminals, and a set of rewriting Reduction analysis is a method for checking the cor- rules of the form 𝑋 → 𝛼, where 𝑋 is a nonterminal rectness of an input word by stepwise rewriting some and 𝛼 is a string of terminals, nonterminals, and sen- part of the current form with a shorter one until we obtain tinels c, | $. The language generated by the grammar a simple word for which we can decide its correctness is the set {c} | · Σ* · {$}. The set of words generated easily. In general, reduction analysis is nondeterministic, from the initial nonterminal 𝑆𝐴 called the accepting lan- and in one step, we can rewrite a substring of a length guage, is a language of the form {c} | · 𝐿 · {$}, where limited by a constant with a shorter string. An input 𝐿 ⊆ Σ* , and the set of words generated from the sec- word is accepted if there is a sequence of reductions such ond initial nonterminal 𝑆𝑅 , called rejecting language, is that the final simple word is from the language. Then, exactly {c} * | · (Σ ∖ 𝐿) · {$}. intermediate words obtained during the analysis are also Pumping reduction analysis corresponds to a complete accepted. Each reduction must be error-preserving; that grammar 𝐺𝐶 when for each pair of terminal words 𝑢, 𝑣 is, no word outside the target language can be rewritten such that 𝑢 can be reduced to 𝑣, it holds that there are into a word from the language. some terminal words 𝑥1 , 𝑥2 , 𝑥3 , 𝑥4 , 𝑥5 , and a nontermi- This paper focuses mainly on a restricted version of nal 𝐴 satisfying 𝑢 = 𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 , 𝑣 = 𝑥1 𝑥3 𝑥5 , and the reduction analysis called pumping reduction analysis, 𝑆 ⇒*𝐺 𝑥1 𝐴𝑥5 ⇒*𝐺 𝑥1 𝑥2 𝐴𝑥4 𝑥5 ⇒*𝐺 𝑥1 𝑥2 𝑥3 𝑥4 𝑥5 , 𝐶 𝐶 𝐶 which has several additional properties. In each step of where 𝑆 equals 𝑆𝐴 or 𝑆𝑅 . Additionally, there exists a the pumping reduction analysis, the current word is not constant 𝑐 that depends only on grammar 𝐺𝐶 , such that each word of length at least 𝑐 can be reduced to a shorter ITAT’24: Information Technologies – Applications and Theory, Septem- ber 20–24, 2024, Drienica, Slovakia word. * Corresponding author. In general, it is undecidable whether an arbitrary con- $ martin.platek@mff.cuni.cz (M. Plátek); text-free grammar generates a regular language [11]. frantisek.mraz@mff.cuni.cz (F. Mráz); This means that no algorithm can universally determine pardubska@dcs.fmph.uniba.sk (D. Pardubská); if a given CFG produces a regular language. We propose martproc@gmail.com (M. Procházka)  0000-0003-3147-6442 (M. Plátek); 0000-0001-3869-3340 (F. Mráz); to use some tests to test the non-regularity of accepting 0000-0001-9383-8117 (D. Pardubská) and rejecting languages. The main result of the paper © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). says that if a complete grammar (with sentinels c| and $) CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 𝐺𝐶 has a switching pumping test, then its accepting and Note that we have not omitted the sentinels in the rejecting languages are non-regular. pumping infix and pumping reduction. This paper is structured as follows. Section 2 intro- If (c𝑥, | 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦$) is a pumping infix by 𝐺, then duces CF(c,$)-grammars, | pumping infixes and reductions, all words of the form c𝑥𝑢 | 1 𝑣𝑢2 𝑦$, for all integers 𝑖 ≥ 0, 𝑖 𝑖 and complete CF(c,$)-grammars. | Section 3 presents the belong to 𝐿(𝐺). main result. It is followed by a section that discusses Let 𝐺 = (𝑁, Σ ∪ {c, | $}, 𝑆, 𝑅) be a CF(c,$)-grammar, | open problems and future work. 𝑡 be the number of nonterminals of 𝐺, and 𝑘 be the maxi- mal length of the right-hand side of the rules in 𝑅. Let 𝑇 be a derivation tree according to 𝐺. If 𝑇 has more than 𝑘𝑡 2. Basic notions leaves, a path exists from a leaf to the root of 𝑇 such that it contains at least 𝑡 + 1 nodes labeled by nonterminals. Definition 1 (CF(¢,$)-grammars). Let 𝑁 and Σ be As 𝐺 has only 𝑡 nonterminals, at least two nodes on the | $ ∈ two disjoint alphabets, c, / (𝑁 ∪ Σ) and 𝐺 = (𝑁, Σ ∪ path are labeled with the same nonterminal 𝐴. In that {c, | $}, 𝑆, 𝑅) be a context-free grammar generating a lan- case, there is a pumping reduction corresponding to this guage of the form {c} | · 𝐿 · {$}, where 𝐿 ⊆ Σ* , and 𝑆 word. We say that 𝐾𝐺 = 𝑘𝑡 is the grammar number of does not occur in the right-hand side of any rule in 𝑅. We 𝐺. say that 𝐺 is a CF(c,$)-grammar. | The language 𝐿 is the Note that for each word from 𝐿(𝐺) of length greater internal language of 𝐺, and is denoted 𝐿in (𝐺). than 𝐾𝐺 , some pumping infix by 𝐺 must correspond. The closure properties of the class of context-free lan- On the other hand, each word generated by 𝐺 that is not guages imply that for a CF(c,$)-grammar | 𝐺, both lan- pumped is of length at most 𝐾𝐺 . guages 𝐿(𝐺) and 𝐿in (𝐺) are context-free. The added Note that in the above derivation (1), the length of the right sentinel $ facilitates the recognition of languages. words 𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦 is not limited. E.g., if 𝐿 is a deterministic context-free language, then it A pumping reduction 𝑤 ⇝𝑃 (𝐺) 𝑤′ corresponds to can be generated by an LR(1)-grammar. But then, 𝐿 · {$} removing a part of the derivation tree between some and {c} | · 𝐿 · {$} are both generated by simpler LR(0) two nodes 𝑟1 , 𝑟2 labeled with the same nonterminal 𝐴 grammars. The left sentinel c| is included in CF(c,$)- | occurring on a path from the root of the derivation tree grammars for compatibility with RP-automata. The class for 𝑤. of all 𝐿in (𝐺) characterizes the class CFL. 2.2. Complete CF(¢,$)-grammars Definition 3. Let 𝐺𝐶 = (𝑁, Σ ∪ {c, | $}, 𝑆, 𝑅) be a 2.1. Pumping infixes and reductions CF(c,$)-grammar. | Then 𝐺𝐶 is called a complete CF(c,$)- | grammar if Definition 2. Let 𝐺 = (𝑁, Σ ∪ {c, | $}, 𝑆, 𝑅) be a 1. 𝑆 → 𝑆𝐴 | 𝑆𝑅 , where 𝑆𝐴 , 𝑆𝑅 ∈ 𝑁 , are the only CF(c,$)-grammar, | 𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦 be words over Σ, rules in 𝑅 containing the initial nonterminal 𝑆. No |𝑢1 𝑢2 | > 0, and 𝐴 ∈ 𝑁 be a nonterminal. If other rule of 𝐺𝐶 contains 𝑆𝐴 or 𝑆𝑅 in its right- hand side. 𝑆 ⇒* c𝑥𝐴𝑦$ | ⇒* c𝑥𝑢 | * 1 𝐴𝑢2 𝑦$ ⇒ c𝑥𝑢 | 1 𝑣𝑢2 𝑦$ (1) 2. The languages 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) generated by we say that (c𝑥, | 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦$) is a pumping infix by the grammars 𝐺𝐴 = (𝑁, Σ ∪ {c, | $}, 𝑆𝐴 , 𝑅) 𝐺, and that c𝑥𝑢| 1 𝑣𝑢2 𝑦$ ⇝𝑃 (𝐺) c𝑥𝑣𝑦$ | is a pumping and 𝐺𝑅 = (𝑁, Σ ∪ {c, | $}, 𝑆𝑅 , 𝑅), respectively, reduction by 𝐺. are disjoint and complementary with respect to * If both 𝑢1 and 𝑢2 are not empty, we say that (c𝑥, | 𝑢1 , {c} | · Σ · {$}. That is, 𝐿(𝐺𝐴 ) ∩ 𝐿(𝐺𝑅 ) = ∅ and * 𝐴, 𝑣, 𝑢2 , 𝑦$) is a two-side pumping infix by 𝐺, and that 𝐿(𝐺𝐶 ) = 𝐿(𝐺𝐴 ) ∪ 𝐿(𝐺𝑅 ) = {c} | · Σ · {$}. c𝑥𝑢 | 1 𝑣𝑢2 𝑦$ ⇝𝑃 (𝐺) c𝑥𝑣𝑦$ | is a two-side pumping reduc- We will denote the grammar as 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ). Further, tion by 𝐺. we will call 𝐺𝐴 and 𝐺𝑅 as accepting and rejecting gram- If 𝑢1 = 𝜆 we say that (c𝑥, | 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦$) is a right- mar of the complete CF(c,$)-grammar | 𝐺𝐶 , respectively. side pumping infix by 𝐺, and c𝑥𝑣𝑢 | 2 𝑦$ ⇝𝑃 (𝐺) c𝑥𝑣𝑦$ | is a right-side pumping reduction by 𝐺. For each word of the form c𝑤$, | where 𝑤 ∈ Σ* , there If 𝑢2 = 𝜆 we say that (c𝑥, | 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦$) is a left- is some derivation tree 𝑇 according to 𝐺𝐶 . The node side pumping infix by 𝐺, and c𝑥𝑢 | 1 𝑣𝑦$ ⇝𝑃 (𝐺) c𝑥𝑣𝑦$ | is under the root of 𝑇 is labeled either 𝑆𝐴 or 𝑆𝑅 . If it is a left-side pumping reduction by 𝐺. 𝑆𝐴 , the word is generated by the accepting grammar 𝐺𝐴 . The relation ⇝*𝑃 (𝐺) is the reflexive and transitive closure Otherwise, it is generated by the rejecting grammar 𝐺𝑅 . of the pumping reduction relation ⇝𝑃 (𝐺) . Moreover, for each word, two or more derivation trees can exist, but all of them are accepting or all of them are rejecting. 3. Non-regularity by complete Clearly, grammar 𝐺𝐴 generates the language 𝐿(𝐺𝐴 ) = 𝑛 𝑚 CF(¢,$)-grammars {c}·𝐿 | 𝐴 ·$, where 𝐿𝐴 = {𝑎 𝑏 | 𝑛 ≥ 𝑚 ≥ 0} and gram- mar 𝐺𝑅 generates the language 𝐿(𝐺𝑅 ) = {c} | · 𝐿𝑅 · $, * If 𝐺𝐶 is a complete CF(c,$)-grammar, | then both 𝐿(𝐺𝐴 ) where 𝐿𝑅 = {𝑎, 𝑏} ∖ 𝐿𝐴 . and 𝐿(𝐺𝑅 ) are context-free languages. How can we de- As we have the following derivation according to 𝐺𝐶 cide whether those languages are regular or non-regular? 𝑆 ⇒𝐺𝐶 𝑆𝐴 ⇒𝐺𝐶 c𝐶$ | ⇒𝐺𝐶 c𝑎𝐷𝑏$| ⇒𝐺𝐶 In this section, we show some properties that help answer c𝑎𝑎𝐶𝑏𝑏$ | ⇒𝐺𝐶 c𝑎𝑎𝑎𝐷𝑏𝑏𝑏$ | ⇒𝐺𝐶 that question. c𝑎𝑎𝑎𝑎𝐶𝑏𝑏𝑏𝑏$ | ⇒𝐺𝐶 c𝑎𝑎𝑎𝑎𝑎𝑏𝑏𝑏𝑏𝑏$, | At first, we introduce a weaker notion of pumping infix that does not contain the information on which the pumping infix (c𝑎𝑎, | 𝑎𝑎, 𝐶, 𝑎𝑏, 𝑏𝑏, 𝑏𝑏$) is a pumping nonterminal is pumped. infix by 𝐺 and by 𝐺 . On the other hand, (c, | 𝑎, 𝑎𝑏, 𝑏, $) 𝐶 𝐴 is a pure pumping infix by 𝐺𝐶 and by 𝐺𝐴 such that Definition 4 (Pure pumping infix/reduction). Let there does not exist any pumping infix by 𝐺𝐶 of the form 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) = (𝑁, Σ ∪ {c, $}, 𝑆, 𝑅) be a com- | ( c, | 𝑎, 𝑋, 𝑎𝑏, 𝑏, $), where 𝑋 is a nonterminal of grammar plete CF(c, | $)-grammar, 𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦 be some words, * * * 𝐺 𝐶. 𝑥 ∈ {c} | · Σ , 𝑢1 , 𝑢2 ∈ Σ , |𝑢1 𝑢2 | > 0, 𝑦 ∈ Σ · {$}. • If 𝑥𝑢𝑛 𝑛 1 𝑣𝑢2 𝑦 is in 𝐿(𝐺𝐴 ), for each integer 𝑛 ≥ 0, Theorem 1. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ), 𝐺𝐶 = (𝑁, Σ ∪ we say that (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping in- {c, | $}, 𝑆, 𝑅) be a complete CF(c,$)-grammar, | and at least fix by 𝐺𝐴 . We say that the pair of words 𝑥𝑢1 𝑣𝑢2 𝑦, one of the following conditions is fulfilled (for some words * * * 𝑥𝑣𝑦 is pure pumping reduction by 𝐺𝐴 and write 𝑥 ∈ {c} | · Σ , 𝑢1 , 𝑣, 𝑢2 ∈ Σ , and 𝑦 ∈ Σ · {$}): 𝑥𝑢1 𝑣𝑢2 𝑦 ≡>𝐺𝐴 𝑥𝑣𝑦. (ARl) The words 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping infix by 𝐺𝐴 , and there are inte- • If 𝑥𝑢𝑛 𝑛 1 𝑣𝑢2 𝑦 is in 𝐿(𝐺𝑅 ), for each integer 𝑛 ≥ 0, gers 𝑖 ≥ 0, 𝑗 > 0 such that we say that (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping in- fix by 𝐺𝑅 . We say that the pair of words 𝑥𝑢1 𝑣𝑢2 𝑦, 𝑥𝑢𝑗·𝑚 1 𝑢𝑖+𝑗·𝑛 1 𝑣𝑢𝑖+𝑗·𝑛 2 𝑦 ∈ 𝐿(𝐺𝑅 ), 𝑥𝑣𝑦 is pure pumping reduction by 𝐺𝑅 . We write 𝑥𝑢1 𝑣𝑢2 𝑦 ≡>𝐺𝑅 𝑥𝑣𝑦. for each 𝑚 > 0, 𝑛 ≥ 0. We say that (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping infix by (ARr) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ) such that 𝐺𝐶 if it is a pure pumping infix by 𝐺𝐴 or by 𝐺𝑅 . We say 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure that the pair of words 𝑥𝑢1 𝑣𝑢2 𝑦, 𝑥𝑣𝑦 is a pure pumping pumping infix by 𝐺𝐴 , and there are integers 𝑖 ≥ 0, reduction by 𝐺𝐶 if it is a pumping reduction by 𝐺𝐴 or by 𝑗 > 0 such that 𝐺𝑅 . We write 𝑥𝑢1 𝑣𝑢2 𝑦 ≡>𝐺𝐶 𝑥𝑣𝑦. 𝑥𝑢𝑖+𝑗·𝑛 1 𝑣𝑢𝑖+𝑗·𝑛 2 𝑢𝑗·𝑚 2 𝑦 ∈ 𝐿(𝐺𝑅 ), Actually, pure pumping infix need not directly cor- respond to any pumping infix by the given complete for each 𝑚 > 0, 𝑛 ≥ 0. CF(c,$)-grammar. | This is illustrated with the following (RAl) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ) such that example. 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping infix by 𝐺𝑅 , and there are integers 𝑖 ≥ 0, Example 1. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ), 𝐺𝐶 = (𝑁, Σ ∪ 𝑗 > 0 such that {c, | $}, 𝑆, 𝑅) be a complete CF(c,$)-grammar, | where 𝑁 = {𝑆, 𝑆𝐴 , 𝑆𝑅 , 𝐴, 𝐵, 𝐶, 𝐷, 𝐸}, Σ = {𝑎, 𝑏}, 𝑆𝐴 and 𝑥𝑢𝑗·𝑚 𝑢𝑖+𝑗·𝑛 𝑣𝑢𝑖+𝑗·𝑛 𝑦 ∈ 𝐿(𝐺𝐴 ), 1 1 2 𝑆𝑅 are the initial nonterminals of the grammars 𝐺𝐴 and 𝐺𝑅 , respectively, and 𝑅 consists of the following rules: for each 𝑚 > 0, 𝑛 ≥ 0. 𝑆 → 𝑆𝐴 | 𝑆𝑅 , (RAr) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ) such that 𝑆𝐴 → c$ | | c𝐴$ | | c𝐴𝐶$ | | c𝐶$, | 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure 𝐴 → 𝑎𝐴 | 𝑎, pumping infix by 𝐺𝑅 , and there are integers 𝑖 ≥ 0, 𝐶 → 𝑎𝐷𝑏 | 𝑎𝑏, 𝑗 > 0 such that 𝐷 → 𝑎𝐶𝑏 | 𝑎𝑏, 𝑆𝑅 → c𝐵$ | | c𝐶𝐵$ | | c𝑏𝑎$ | | c𝐸𝑎𝑏$ | | 𝑥𝑢𝑖+𝑗·𝑛 1 𝑣𝑢𝑖+𝑗·𝑛 2 𝑢𝑗·𝑚 2 𝑦 ∈ 𝐿(𝐺𝐴 ), c𝑎𝑏𝐸$ | | c𝐸𝑎𝑏$, | 𝐵 → 𝑏𝐵 | 𝑏, for each 𝑚 > 0, 𝑛 ≥ 0. 𝐸 → 𝑎𝐸 | 𝑏𝐸 | 𝑎 | 𝑏. Then 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) are non-regular languages. Proof: We prove the case (ARl), whose name comes from (ARr’) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ) such that Accept-Reject-left with the meaning that the words of the 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦) is a form 𝑥, 𝑢𝑟1 𝑣𝑢𝑟2 𝑦 are generated by the accepting grammar pure pumping infix by 𝐺𝐴 , and there are integers 𝐺𝐴 and the words of the form 𝑥𝑢𝑗·𝑚 𝑢 𝑖+𝑗·𝑛 𝑣𝑢𝑖+𝑗·𝑛 𝑦 𝑖 ≥ 0, 𝑗 > 0 such that 1 1 2 are generated by the rejecting grammar 𝐺𝑅 , and they 𝑥𝑢𝑖+𝑗·𝑛 𝑣𝑢𝑖+𝑗·𝑛 𝑢𝑗·𝑚 𝑦 ∈ 𝐿(𝐺𝑅 ), contain more copies of 𝑢1 on the left from 𝑣 than the 1 2 2 number of copies of 𝑢2 to the right from 𝑣. Then, the for each 𝑚 > 0, 𝑛 ≥ 0. cases (ARr) (Accept-Reject-right), (RAl) (Reject-Accept- left), and (RAr) (Reject-Accept-right) can be shown anal- (RAl’) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ) such that ogously. 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦) is a Let 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ), where 𝑢1 and 𝑢2 are pure pumping infix by 𝐺𝑅 , and there are integers non-empty, (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) be a pure pumping infix by 𝑖 ≥ 0, 𝑗 > 0 such that 𝐺𝐴 , and 𝑖 ≥ 0, 𝑗 > 0 be integers such that 𝑥𝑢𝑗·𝑚 1 𝑢𝑖+𝑗·𝑛 1 𝑣𝑢𝑖+𝑗·𝑛 2 𝑦 ∈ 𝐿(𝐺𝐴 ), 𝑗·𝑚 𝑖+𝑗·𝑛 𝑖+𝑗·𝑛 𝑥𝑢1 𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ), for each 𝑚 > 0, 𝑛 ≥ 0. for each 𝑚 > 0, 𝑛 ≥ 0. Assume for a contradiction that 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) (RAr’) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ) such that are regular languages. According to Myhill-Nerode The- 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦) is a orem [12], a right congruence ≡ with a finite index 𝑟 pure pumping infix by 𝐺𝑅 , and there are integers exists such that language 𝐿(𝐺𝐴 ) is a union of some of 𝑖 ≥ 0, 𝑗 > 0 such that its equivalence classes. 𝑥𝑢𝑖+𝑗·𝑛 𝑣𝑢𝑖+𝑗·𝑛 𝑢𝑗·𝑚 𝑦 ∈ 𝐿(𝐺𝐴 ), {︁ 1 2 2 Consider the set of words 𝑥𝑢𝑖+𝑗·1 1 , 𝑥𝑢 𝑖+𝑗·2 1 , . . . , 𝑖+𝑗·(𝑟+1) }︁ for each 𝑚 > 0, 𝑛 ≥ 0. 𝑥𝑢1 . Obviously, there are 1 ≤ 𝑘1 < 𝑘2 ≤ 𝑟+1 Then 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) are not regular languages. such that 𝑥𝑢𝑖+𝑗·𝑘1 1 and 𝑥𝑢𝑖+𝑗·𝑘 1 2 belong to the same equivalence class 𝐶𝑙 of the equivalence ≡. By appending Any of the conditions (ARl), (ARr), (RAl), (RAr), (ARl’), 𝑣𝑢𝑖+𝑗·𝑘 2 1 𝑦 to 𝑥𝑢𝑖+𝑗·𝑘 1 1 , we obtain 𝑥𝑢𝑖+𝑗·𝑘 1 1 𝑣𝑢2𝑖+𝑗·𝑘1 𝑦 ∈ (ARr’), (RAl’), and (RAr’) is sufficient for non-regularity of 𝐿(𝐺𝐴 ), since (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping infix a complete CF(c,$)-grammar. | Now, we examine whether by 𝐺𝐴 . On the other hand, 𝑘2 = 𝑘1 + 𝑚1 for some the previous sufficient conditions for non-regularity are 𝑚1 > 0. According to condition (ARl), by append- also necessary for non-regularity. We start with the def- ing the same word 𝑣𝑢𝑖+𝑗·𝑘 2 1 𝑦 to 𝑥𝑢𝑖+𝑗·𝑘 1 2 , we obtain inition of pumping test sets and a rather technical defi- that 𝑥𝑢1 𝑖+𝑗·𝑘2 𝑣𝑢2 𝑖+𝑗·𝑘1 𝑦 = 𝑥𝑢𝑗·𝑚 1 1 𝑖+𝑗·𝑘1 𝑢 1 𝑣𝑢 𝑖+𝑗·𝑘1 2 𝑦 is nition of preserving and switching tests. Based on these in 𝐿(𝐺𝑅 ). Thus, 𝑥𝑢1 𝑖+𝑗·𝑘1 and 𝑥𝑢1 𝑖+𝑗·𝑘2 cannot be in notions, we get another condition for non-regularity in the same equivalence class 𝐶𝑙. This contradiction im- Theorem 2. plies that the language 𝐿(𝐺𝐴 ) is not regular. Since the If we know that (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping in- class of regular languages is closed under the comple- fix by a grammar 𝐺𝐴 , we have that 𝑥𝑢𝑟1 𝑣𝑢𝑟2 𝑦 is in 𝐿(𝐺𝐴 ), ment and intersection, the language 𝐿(𝐺𝑅 ) must also be for all integers 𝑟 ≥ 0. This could indicate a context-free non-regular. That finishes the proof of this case. □ dependence between the number of copies of 𝑢1 in front As a direct consequence of Theorem 1, we get the of the factor 𝑣 and the number of copies of 𝑢2 after the analogous statement for (non-pure) pumping infixes. factor 𝑣. However, it is still possible that all words of the form 𝑥𝑢𝑚 1 𝑣𝑢2 𝑦 belong to 𝐿(𝐺𝐴 ). Therefore, in the 𝑛 Corollary 1. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) = (𝑁, Σ ∪ {c, | $}, following, we define a switching test that should detect 𝑆, 𝑅) be a complete CF(c,$)-grammar, | and at least one the situation in which there is a dependence between the of the following conditions is fulfilled (for some words number of occurrences of 𝑢1 before and the number of 𝑥 ∈ {c} * * * | · Σ , 𝑢1 , 𝑣, 𝑢2 ∈ Σ , 𝑦 ∈ Σ · {$}, and a non- occurrences of 𝑢2 after 𝑣. terminal 𝐴 ∈ 𝑁 ): We define two types of test sets. The first one with a subscript ‘left’ should detect the situation when the (ARl’) There exists 𝑤 = 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ) such that number of copies of 𝑢1 can be “pumped” more times 𝑢1 and 𝑢2 are nonempty, (𝑥, 𝑢1 , 𝐴, 𝑣, 𝑢2 , 𝑦) is than the number of copies of 𝑢2 . A symmetric test set a pumping infix by 𝐺𝐴 , and there are integers should detect the situation where the number of copies 𝑖 ≥ 0, 𝑗 > 0 such that of 𝑢2 can be “pumped” more times than the number of 𝑥𝑢𝑗·𝑚 𝑢1𝑖+𝑗·𝑛 𝑣𝑢𝑖+𝑗·𝑛 𝑦 ∈ 𝐿(𝐺𝑅 ), copies of 𝑢1 . 1 2 Let us introduce the ‘left’ test set. A pumping may for each 𝑚 > 0, 𝑛 ≥ 0. require pumping several, say 𝑗, copies of 𝑢1 and 𝑢2 simultaneously in one step. Furthermore, several, say Theorem 2. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete 𝑖, copies of 𝑢1 and symmetrically of 𝑢2 could be pro- CF(c, | $)-grammar, and suppose there exists a switching duced together with the prefix 𝑥 and suffix 𝑦, respectively. pumping test set by 𝐺𝐶 . Then 𝐿(𝐺𝐴 ), and 𝐿(𝐺𝑅 ) are Hence, the left test set below contains words of the form non-regular languages. 𝑥𝑢𝑖1 𝑢𝑗·𝑚 1 𝑣𝑢2 𝑢2 𝑦, for all 𝑚 > 0 and 𝑛 ≥ 0. 𝑢𝑗·𝑛 𝑗·𝑛 𝑖 1 In order to restrict the set of pumping test sets, we Proof: We prove that both 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) are non- require that 𝑖 is not greater than 𝐾𝐺𝐶 + 2𝑡, and 𝑗 is not regular languages when the condition (AR) holds. The greater than 2𝑡, where 𝑡 denotes the number of nonter- other cases can be shown similarly. minals of 𝐺𝐶 . Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete CF(c,$)-gram- | mar with 𝑡 nonterminals, 𝜄 = (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦), where Definition 5 (Pumping test set). Let |𝑢1 | > 0, |𝑢2 | > 0, be a pure pumping infix by 𝐺𝐶 and 𝑖, 𝑗 be integers such that 𝑖 ≤ 𝐾𝐺𝐶 + 2𝑡 and 0 < 𝑗 ≤ 2𝑡, 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) and be a complete CF(c,| $)-grammar with 𝑡 nonterminals. Let 𝜏 = 𝑇 𝑝(𝐺𝐶 , 𝜄, 𝑖, 𝑗) = [𝜄, 𝑇left (𝜄, 𝑖, 𝑗), 𝑇right (𝜄, 𝑖, 𝑗)] 𝜄 = (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦), where |𝑢1 | > 0, |𝑢2 | > 0, be a pure pumping infix by 𝐺𝐶 and 𝑖, 𝑗 be integers such that be a switching pumping test set such that 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝑖 ≤ 𝐾𝐺𝐶 + 2𝑡, and 0 < 𝑗 ≤ 2𝑡: 𝐿(𝐺𝐴 ) and at least one of the following conditions is true: 1. The set 𝑇left (𝜄, 𝑖, 𝑗) = 𝑥𝑢𝑖1 𝑢𝑗·𝑚 𝑢𝑗·𝑛 𝑗·𝑛 𝑖 {︀ 1 1 𝑣𝑢2 𝑢2 𝑦 | 𝑚 > 0, 𝑛 ≥ 0 is called the left test set of 𝜄. 1. 𝑇left (𝜄, 𝑖, 𝑗) ⊆ 𝐿(𝐺𝑅 ), or }︀ 2. 𝑇right (𝜄, 𝑖, 𝑗) ⊆ 𝐿(𝐺𝑅 ). 2. The set 𝑇right (𝜄, 𝑖, 𝑗) = 𝑥𝑢𝑖1 𝑢𝑗·𝑛 𝑗·𝑛 𝑗·𝑚 𝑖 {︀ 1 𝑣𝑢2 𝑢2 𝑢2 𝑦 | 𝑚 > 0, 𝑛 ≥ 0 } is called the right test set of 𝜄. As 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ) and 𝜄 = (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) is a pure pumping infix by 𝐺𝐶 , 𝜄 is a pure pumping infix by We say that the triple 𝐺𝐴 . In case 1, the condition (ARl) of Theorem 1 is satisfied. 𝑇 𝑝(𝐺𝐶 , 𝜄, 𝑖, 𝑗) = [𝜄, 𝑇left (𝜄, 𝑖, 𝑗), 𝑇right (𝜄, 𝑖, 𝑗)] Hence, according to Theorem 1, both languages 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) are not regular. is a pumping test set by 𝐺𝐶 . In case 2, the condition (ARr) of Theorem 1 is satisfied. Definition 6 (Preserving/switching test set). Let Hence, according to Theorem 1, both languages 𝐿(𝐺𝐴 ) 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete CF(c, | $)-grammar with and 𝐿(𝐺𝑅 ) are not regular. 𝑡 nonterminals, 𝜄 = (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦), where |𝑢1 | > 0, Similarly, we can show the case where the condition |𝑢2 | > 0, be a pure pumping infix by 𝐺𝐶 , and 𝑖, 𝑗 be (RA) holds. □ integers such that 𝑖 ≤ 𝐾𝐺𝐶 + 2𝑡 and 0 < 𝑗 ≤ 2𝑡. We say that the pumping test set 𝜏 = 𝑇 𝑝(𝐺𝐶 , 𝜄, 𝑖, 𝑗) = [𝜄, 𝑇left (𝜄, 𝑖, 𝑗), 𝑇right (𝜄, 𝑖, 𝑗)] is preserving if 4. Open problems and future work (Aaa) 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ) and both sets 𝑇left (𝜄, 𝑖, 𝑗) and Many open problems are left related to our original effort 𝑇right (𝜄, 𝑖, 𝑗) are subsets of 𝐿(𝐺𝐴 ); or to compare regularity and non-regularity connected with complete CF(c,| $)-grammars. This section gives a partial (Rrr) 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ) and both sets 𝑇left (𝜄, 𝑖, 𝑗) and idea of our plans for the future. In general, we will try 𝑇right (𝜄, 𝑖, 𝑗) are subsets of 𝐿(𝐺𝑅 ). to solve the decidability questions connected with (non- We say that 𝜏 is switching if one of the following two )regularity of complete CF(c, | $)-grammars. cases is true: Test languages. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete CF(c,| $)-grammar. Let 𝑢1 and 𝑢2 be nonempty words, (AR) 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝐴 ) and [ 𝑇left (𝜄, 𝑖, 𝑗) ⊆ 𝐿(𝐺𝑅 ) and 𝜄 = (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦) be a pure pumping infix by 𝐺𝐶 . or 𝑇right (𝜄, 𝑖, 𝑗) ⊆ 𝐿(𝐺𝑅 ) ], We say that the languages 1 𝑣𝑢2 𝑦 | 𝑛, 𝑚 ≥ 0} and (RA) 𝑥𝑢1 𝑣𝑢2 𝑦 ∈ 𝐿(𝐺𝑅 ) and [ 𝑇left (𝜄, 𝑖, 𝑗) ⊆ 𝐿(𝐺𝐴 ) 𝐿(𝐺𝐴 ) ∩ {𝑥𝑢𝑛 𝑚 or 𝑇right (𝜄, 𝑖, 𝑗) ⊆ 𝐿(𝐺𝐴 ) ]. 𝐿(𝐺𝑅 ) ∩ {𝑥𝑢1 𝑣𝑢𝑚 𝑛 2 𝑦 | 𝑛, 𝑚 ≥ 0} The following theorem is a direct consequence of The- are test languages of 𝜄. We also say that the languages orem 1 and the definition of the switching pumping test are test languages of 𝐺𝐶 . set. Concerning the test languages, we have several con- jectures. Conjecture 1. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete M. Holeňa, R. Jajcay, T. Jajcayová, M. Lexa, F. Mráz, CF(c,| $)-grammar. Let 𝜄 = (𝑥, 𝑢1 , 𝑣, 𝑢2 , 𝑦), where D. Pardubská, M. Plátek (Eds.), Proceedings of the |𝑢1 | > 0 and |𝑢2 | > 0, be a pure pumping infix by 𝐺𝐶 . 23rd Conference Information Technologies – Ap- Let all pumping tests sets 𝑇 𝑝(𝐺𝐶 , 𝜄, 𝑖, 𝑗) = plications and Theory (ITAT 2023), volume 3498 of [𝜄, 𝑇left (𝜄, 𝑖, 𝑗), 𝑇right (𝜄, 𝑖, 𝑗)] of 𝜄, for all integers 𝑖, 𝑗, such CEUR Workshop Proceedings, CEUR-WS.org, 2023, that 𝑖 ≤ 𝐾𝐺𝐶 + 2𝑡 and 0 < 𝑗 ≤ 2𝑡, are preserving. Then, pp. 110–120. URL: https://ceur-ws.org/Vol-3498/ the test languages of 𝜄 are regular. paper14.pdf. [5] P. Jančar, J. Šíma, The simplest non-regular de- Conjecture 2. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete terministic context-free language, in: F. Bonchi, CF(c,| $)-grammar, and there does not exist any switch- S. J. Puglisi (Eds.), 46th International Symposium ing pumping test by 𝐺𝐶 . Then, each test language of 𝐺𝐶 on Mathematical Foundations of Computer Sci- is regular. ence, MFCS 2021, August 23–27, 2021, Tallinn, Es- tonia, volume 202 of LIPIcs, Schloss Dagstuhl – Conjecture 3. Let 𝐺𝐶 = (𝐺𝐴 , 𝐺𝑅 ) be a complete Leibniz-Zentrum für Informatik, 2021, pp. 63:1– CF(c,| $)-grammar. Then 𝐿(𝐺𝐴 ) and 𝐿(𝐺𝑅 ) are regu- 63:18. doi:10.4230/LIPIcs.MFCS.2021.63. lar if and only if all test languages of 𝐺𝐶 are regular. [6] J. Šíma, M. Plátek, One analog neuron cannot rec- ognize deterministic context-free languages, in: Remark. Note that the notions of switching test and T. Gedeon, K. W. Wong, M. Lee (Eds.), Neural Infor- preserving test give an opportunity to introduce degrees mation Processing – 26th International Conference, of regularity and degrees for non-regularity of complete ICONIP 2019, Sydney, NSW, Australia, December CF(c,| $)-grammars. That will also be one direction of 12–15, 2019, Proceedings, Part III, volume 11955 of our efforts in the future. Lecture Notes in Computer Science, Springer, 2019, pp. 77–89. doi:10.1007/978-3-030-36718-3_ References 7. [7] M. Lopatková, M. Plátek, P. Sgall, Towards a formal [1] F. Mráz, D. Pardubská, M. Plátek, J. Šíma, Pump- model for functional generative description: Anal- ing deterministic monotone restarting automata ysis by reduction and restarting automata, Prague and DCFL, in: M. Holeňa, T. Horváth, A. Kele- Bull. Math. Linguistics 87 (2007) 7–26. URL: http: menová, F. Mráz, D. Pardubská, M. Plátek, P. Sosík //ufal.mff.cuni.cz/pbml/87/lopatkova-et-al.pdf. (Eds.), Proceedings of the 20th Conference Infor- [8] P. Jančar, F. Mráz, M. Plátek, J. Vogel, Restarting mation Technologies – Applications and Theory automata, in: H. Reichel (Ed.), Fundamentals of (ITAT 2020), volume 2718 of CEUR Workshop Pro- Computation Theory, FCT ’95, volume 965 of Lec- ceedings, CEUR-WS.org, 2020, pp. 51–58. URL: http: ture Notes in Computer Science, Springer, 1995, pp. //ceur-ws.org/Vol-2718/paper13.pdf. 283–292. doi:10.1007/3-540-60249-6_60. [2] M. Plátek, F. Mráz, D. Pardubská, D. Průša, J. Šíma, [9] P. Jančar, F. Mráz, M. Plátek, J. Vogel, On mono- On separations of LR(0)-grammars by two types tonic automata with a restart operation, J. Au- of pumping patterns, in: B. Brejová, L. Cien- tom. Lang. Comb. 4 (1999) 287–311. doi:10.25596/ cialová, M. Holeňa, F. Mráz, D. Pardubská, M. Plátek, jalc-1999-287. T. Vinař (Eds.), Proceedings of the 21st Conference [10] F. Otto, Restarting automata, in: Z. Ésik, Information Technologies – Applications and The- C. Martín-Vide, V. Mitrana (Eds.), Recent Ad- ory (ITAT 2021), volume 2962 of CEUR Workshop vances in Formal Languages and Applications, Proceedings, CEUR-WS.org, 2021, pp. 140–146. URL: volume 25 of Studies in Computational Intelli- http://ceur-ws.org/Vol-2962/paper05.pdf. gence, Springer, 2006, pp. 269–303. URL: https://doi. [3] M. Plátek, F. Mráz, D. Pardubská, D. Průša, On org/10.1007/978-3-540-33461-3_11. doi:10.1007/ pumping RP-automata controlled by complete 978-3-540-33461-3_11. LRG(¢, $)-grammars, in: L. Ciencialová, M. Holeňa, [11] J. O. Shallit, A Second Course in Formal Languages R. Jajcay, T. Jajcayová, F. Mráz, D. Pardubská, and Automata Theory, Cambridge University Press, M. Plátek (Eds.), Proceedings of the 22nd Confer- 2008. ence Information Technologies – Applications and [12] J. Hopcroft, J. Ullman, Introduction to Automata Theory (ITAT 2022), volume 3226 of CEUR Work- Theory, Languages, and Computation, Addison- shop Proceedings, CEUR-WS.org, 2022, pp. 111–121. Wesley, N. Reading, MA, 1980. URL: https://ceur-ws.org/Vol-3226/paper13.pdf. [4] F. Mráz, M. Plátek, D. Pardubská, D. Průša, One- side pumping and two-side pumping by complete CF(¢, $)-grammars, in: B. Brejová, L. Ciencialová,