Teaching Prolog through Grammars David S. Warren*,† Stony Brook University, New York, USA Abstract This short paper describes how one might teach Prolog to undergraduate students using grammars as the motivating examples. Grammars for natural and formal languages provide great examples with which students are already familiar. They require only positive rules and display a fully declarative framework. This paper shows that perhaps the original motivation of natural language processing for the invention of Prolog was not happenstance. 1. Introduction This informal short paper describes the idea of teaching Prolog using context-free grammars. Grammars are a perfect framework to display and explore the declarativity and generality of the Prolog language. Grammars underlay the original motivation for the invention of Prolog by Alain Colmerauer. Context- free grammars (CFGs) are an independently formulated, studied, and understood formal mathematical structure, with much known about them and their processing [1]. So, there is an independent body of knowledge to which we can compare what Prolog does with grammars. We can compare what Prolog does to what is done by the best independently, specially developed, algorithms. Also, from the point of view of Prolog, grammars are positive programs so they don’t raise the complicated issues of negation. Students are familiar with and understand the idea of a (formal) language so they have introspective access to data that they can use to determine if a grammar is correctly representing a known language. Also grammars fundamentally involve inductive definitions, where phrases are embedded in other phrases. Students intuitively understand this property of language and can use that understanding as an entry to understanding induction and recursion in programming. Also grammars are, in some intuitive sense, “Datalog complete” in that many kinds of complexities of Datalog programs can be exemplified, and made visual, in grammars. Thus, grammars are a good focus of study to learn about what Datalog (and Prolog) programs can do. And when we use lists to represent input strings, students are introduced to Prolog structures. Also building parse trees is a natural grammatical problem, and that introduces the more general Prolog data structure. Finally, the study of grammars in Prolog highlights the importance of tabled evaluation. Many grammars for familiar languages are naturally left recursive. Native Prolog does not process them correctly. But with tabled evaluation, Prolog does. And even when native Prolog evaluation does terminate, it is often exponential where tabled evaluation is guaranteed to be polynomial. And well- known general context-free recognition algorithms, such as Earley and CKY, are the algorithms carried out by tabled evaluation. The student can see that in fact, tabled evaluation is the more natural and complete Prolog evaluation strategy, with native Prolog evaluation being best thought of as an optimization that in some cases improves time and/or space performance. There is a long history of good ideas about how to teach Prolog, recent ones including those presented in this workship, and some presented in an edited collection [2]. We hope to contribute the approach described in this paper to the mix. PEG 2024: 2nd Workshop on Prolog Education, October, 2024, Dallas, USA. $ warren@cs.stonybrook.edu (D. S. Warren)  0000-0001-7567-8156 (D. S. Warren) © 2024 Copyright for this paper by its author. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 2. Initial Considerations Prolog is noted for its natural handling of grammars, so natural in fact that it includes a special syntax for writing grammars, the “Definite Clause Grammar” (DCG) syntax. We do not use the DCG syntax in our initial presentation and development of grammars in Prolog, but instead use native Prolog syntax. The DCG notation, while convenient for knowledgeable users, can obscure the basic workings of Prolog for beginners. And since we are trying to teach Prolog through grammars, the DCG notation is just a distraction or source of confusion. At the very end of teaching grammars, one could introduce the students to the DCG notation when their relationship to Prolog is clearly understood. Also, normally grammars in Prolog use lists to represent the strings in a language. In fact difference lists are used. But for beginning students lists are relatively complex data structures, and their repre- sentation of difference lists, with their rather sophisticated use of unification to efficiently implement concatenation, is not so easy to understand. So we avoid the list representation of strings and rather use a Datalog representation [3]. Given a string, we number the spaces between the symbols in the string and store 3-ary facts to represent the words in the various positions in the string. For example, the string consisting of the words “mary kicked the ball” is represented by four Prolog database facts: word(0,mary,1). word(1,kicked,2). word(2,the,3). word(3,ball,4). In this way, a subsequence of an input sequence of symbols can be represented by the pair of position integers that begin and end the subsequence. With this representation the representation of grammars in Prolog is in fact in the Datalog sublanguage. This treatment allows students to build an understand- ing of the declarative meaning and computational characteristics of Prolog before undertaking the complexity of complex data structures and structure unification. Then, after grammars in Datalog are well understood, students can be introduced to lists and the power of structure unification and see how the same Prolog grammar can be used for generation as well as recognition. And finally, we use tabled evaluation as the default evaluation strategy for computing with grammars. We want our development to emphasize the declarative aspects of grammars, that they are a general way to inductively define languages, i.e., sets of sequences of symbols. One needn’t think about an algorithm that recognizes whether a sequence is in the language defined by a grammar in order to understand what sequences are indeed in the language. But if we use Prolog’s native evaluation strategy we have to explain why Prolog does not process left-recursive grammars correctly, but will loop infinitely on some strings. So, either we make the seemingly arbitrary (and in fact, unnecessary and awkward) restriction to grammars that do not contain left-recursion, or we use the stronger evaluation strategy of tabling. We could, of course, begin with an explanation of Prolog’s native strategy and show why the non- left-recursion restriction is necessary. However, that would intertwine the declarative understanding of grammars with the computational limitations of Prolog’s native strategy, in the end potentially confusing the students. So we strongly suggest using the tabled evaluation strategy when introducing Prolog to students using grammars. Native Prolog evaluation can come later. 3. The Development In this section we provide an outline of a sequence of topics and exercises one might use to introduce Prolog using grammars. It is important to use a very few motivating example languages throughout the development. The example languages should be already relatively familiar to students, so they can draw on their intuitions to understand the various concepts. I think perhaps the best general language is the students’ natural language. I would use English. Of course English, and any natural language, is much too big and complex to be mostly covered in a single unit. So the grammar would be highly simplified covering only a small, formalized fragment of the language. For knowledgeable computer science students a simple programming language, or fragment of a more complex programming language, might be appropriate. Even a mathematical language, like the language of algebraic expressions or equations, might be appropriate. We will first use English in this presentation and later introduce simple arithmetic expressions. We start with the introduction of the idea of a CFG using the example grammar. We use the usual notation of CFGs. The beginning of an example grammar for English might be: sentence --> noun_phrase verb_phrase noun_phrase --> proper_noun noun_phrase --> determiner noun verb_phrase --> intransitive_verb verb_phrase --> transitive_verb noun_phrase verb_phrase --> verb_phrase prepositional_phrase prepositional_phrase --> preposition noun_phrase proper_noun --> john | maria | manuel determiner --> the | a | every noun --> man | woman | student | dog | park | telescope noun --> noun prepositional_phrase intransitive_verb --> walks | talks transitive_verb --> loves | hits | walks | sees preposition --> in | with | to Depending on student backgrounds, one would describe how the grammar generates sentences: defining terminals and nonterminals, how rules replace in symbol sequences nonterminals by other symbol sequences, and sentences are such generated sequences of terminals starting with the start symbol. An example sentence in this grammar is “A man sees a student in the park with a telescope.” It would be represented in the Prolog database as: word(0,a,1). word(1,man,2). word(2,sees,3). word(3,a,4). word(4,student,5). word(5,in,6). word(6,the,7). word(7,park,8). word(8,with,9). word(9,a,10). word(10,telescope,11). Possible questions at this point: 1. How many sentences are in this language? 2. What are the noun phrases in this sentence? Give the pairs of integers that are the endpoint of the noun phrase subsequences. Note that some noun phrases are contained in other noun phrases. Note that each nonterminal determines a set of finite sequences of terminal symbols: those generated from the rules by starting with that single nonterminal as the start symbol. Next we introduce Prolog rules. Prolog defines relations, and here we want to define a relation consisting of pairs of integers. For each nonterminal we want to define a relation that contains the pair ⟨𝑖, 𝑗⟩ if and only if the input string from 𝑖 to 𝑗 can be generated by that nonterminal. Example rules would be: noun(I,J) :- word(I,man,J). .. noun_phrase(I,J) :- determiner(I,K), noun(K,J). .. And so on. We get one rule for each grammar rule, and explain the grammar vertical bar (or) symbol and what to do about it in Prolog. I would probably say it’s shorthand for multiple simpler rules, rather than introduce Prolog’s disjunction at this point. Next, we talk about Prolog’s evaluation strategy of top-down backward chaining with tabling (or memo-ization), which is done in XSB Prolog by adding :- auto\_table command to indicate tabling. Then we can pose the query :- sentence(0,10), which asks whether the 10-word sentence represented by the word/3 facts above is indeed a sentence generated by our grammar. After experi- mentation, changing the word/2 facts, we can modify the Prolog program by adding write statements at the end of the Prolog rules that define the nonterminal noun_phrase to print out the nonterminal and endpoints of recognized noun phrases. We can run this revised program see if Prolog finds the same noun phrases we do, and we can discuss the differences, if any. If we have available a bottom-up evaluator (or XSB’s subsumptive table indexing capability by for each nonterminal NT, :- table NT/2 as subsumptive,index(0).), we can run the program bottom up. We can start with a diagram that consists of the input sentence as a labeled chain graph, the nodes are integers, and the words are labels on the appropriate edges. We draw this labeled graph. Then for each new fact added by the bottom-up evaluation, we add a new edge, labeled by the nonterminal symbol and connecting the integer nodes of the added fact. This algorithm can be seen to be the CYK algorithm (originally by Itiroo Sakai [4] and rediscovered by each of Cocke, Younger, Kasami, and Schwartz). We can do the same for the original tabled evaluation of the Prolog program (seeing the added edges by adding write statements at the ends of clauses). This expanded graph is called a “chart”, and this is known as recognition using the chart parsing strategy. If the grammar is in Chomsky form (at most two symbols on the right-hand-side of any rule), then one gets the Earley recognition algorithm [5] for that grammar. 4. Possible Lectures and/or Exercises In the following we describe other aspects and uses of grammars that can be used as ways to explore Prolog. 4.1. Parse Trees We can add an argument to the nonterminal predicates (say as a new first argument) that constructs the parse tree of the string recognized. For example, we would define the rule: sentence(sentence(S,V),I,J) :- noun_phrase(S,I,K), verb_phrase(V,K,J). We can then experiment with generating parse trees for various sentences (and phrases). This introduces Prolog terms in a natural way, since terms are naturally the right kind of tree. It also introduces multiple answers when parse trees for ambiguous sentences are generated. This is a good point to discuss nondeterminism, and how it arises and is treated by Prolog. 4.2. Number Agreement Constraints in English We could explore refinements of our, admittedly very simple, grammar for English. We might elaborate it by adding plural nouns and third person plural verb forms, and then adding additional arguments to the necessary nonterminals to ensure number agreement. By adding nouns with the same singular and plural forms (e.g., “sheep”), we can show the use of unification to infer the number of some particular uses of those nouns, e.g., “the sheep walks” marks this subject as singular from the third-person verb. 4.3. Pronouns, Case, and Antecedents We could continue with the English grammar to add more complex elaborations, such as a treatment of pronouns of the appropriate case, and even finding antecedents to pronouns by accumulating a list of possible antecedent noun phrases and then choosing one of the appropriate gender from the accumulated list to assign as the coreferent of a later pronoun. This, however, would take us further into the innards of Prolog and its list processing and is probably best left for later, if at all. 4.4. Exploring an Expression Grammar An important grammar elaboration is the association of some semantics to the strings of its language. While much interesting work has been done with English grammars to this end, these concepts are probably easier for students to understand in a more formal, mathematical setting. So, to discuss issues of language semantics, my choice would be to introduce a new grammar to explore these aspects of grammar processing. I would introduce a simple arithmetic expression grammar, such as: expr --> expr + term expr --> term term --> term * primary term --> primary primary --> primary --> primary --> ( expression ) This is again left-recursive, so we need tabling to get a complete recognition strategy. With this grammar we can discuss the grammatical concepts of operator priorities and association, with overriding with the use of parentheses. 4.4.1. Semantics With this grammar we can begin to discuss the concepts of the semantics, or meaning, of phrases, and the translation of phrases into other languages. It is natural to add an argument to each nonterminal of this expression grammar to contain the computed numeric value of the recognized expression, the “meaning” of the expression. For this we must introduce Prolog’s treatment of arithmetic and the is/2 builtin predicate. We would constrain the expressions to be evaluated to have only integer leaves. (We might extend this to handle atom constants as variable names by adding a parameter of a variable-value mapping and use it to retrieve the values of the variables as identified by atom leaves. This would require the introduction of at least simple lists and their use.) 4.4.2. Translation As an example of translation, we could add an extra argument of a sequence (list) of 3-operand intermediate code instructions that when executed by an appropriate machine would evaluate the input expression. This might be of interest to computer science students interested in compilers. 4.4.3. Interpretation A bit further afield, but perhaps interesting to more advance computer science students, is an interpreter for a simple procedural programming language. The programming language grammar would generate an abstract syntax tree, which would be used by an interpreter to carry out the program provided an initial state, or perhaps a sequence of inputs. 5. Data Structures This all raises the question of how and when to introduce data structures to Prolog programmers. I would introduce general terms by adding an argument to the nonterminal predicates of a recognizer to build the parse tree, as suggested above. This is a very natural thing to do for grammars and it introduces terms in their full generality. After that I would introduce lists as a special form of term expression, a right-linear tree with the ./2 structure symbol and the [] constant as the final tail (properly explained). I would give them the library list predicates of member/2 (and maybe append/3) and give them simple exercises to use them in the context of grammars. For example, some of the suggested possible elaborations above could be done in this way. After students have some simple familiarity with using lists, I would then introduce the append/3 predicate and its definition by using it to go from a parse tree back to the string that it generates (or generates it). This problem can be used to explore complex uses of append/3 in its multiple modes. So first take the program that recognizes and constructs a parse tree, and eliminate the two input string arguments. The predicates of this program have just one argument and simply tests whether an input parse tree is indeed a parse tree for the nonterminal of the grammar. Then add one new argument that is the list of terminals generated by the associated parse tree. Here one uses append/3 to put together the strings associated with the component (sub) parse trees. So now one can give a valid parse tree and get back a list of terminals that it would generate. Note one has to be careful that the append/3 calls all terminate in the modes in which they will be called. This can be pointed out to students and used to introduce the idea of modes and their importance in keeping computations finite. Then one can think about running this parse-tree-to-string backwards, giving it a string (list) and generating a parse tree. This will assuredly break the mode requirements of the existing order of the append calls in the previous definition. So we reorder the append calls so they all comply with the finitary requirements for the modes. Now this program can be run given a string and generate a parse tree. This is another parser with the input representation of a list rather than as a chain graph in the database. At this point it may be appropriate to introduce the difference list input string representation of Prolog’s DCGs. Rather than initially explaining difference lists in their full generality, I would explain the two arguments as list of tokens passed into a predicate and the remaining list of tokens passed out of the predicate after the matched sublist has been recognized. This, I think, accords better with programming students understanding of parameters. Once it is seen how this works, then we might try to explain this as an example of the general difference list data structure, and explain how it allows concatenation to be done with simple unification. That’s why the append/3 calls that we saw above are no longer necessary. And we can do sentence recognition and generation with the same program. At this point, we might introduce and explain the DCG notation. 5.1. More with DCGs There are, of course, many more ideas we could explore using grammars and many more exercises to work through. For example, we could point out that the same DCG can use either the word/3 input string representation or the difference list representation; only the definition of (the invisible) ’C’/3 predicate needs to be changed. In fact, DCG’s can be used for defining quite general state-changing algorithms, the two added arguments being the state at entry to the predicate and the state at exit from the predicate. The “terminals” are parameters to the ’C’/3 predicate that can describe state changes. We might also note that we could extend the word/3 input representation to define a general labeled graph, not just a chain as we have done here. We could then interpret that graph as a finite state automaton with a single start state and single final accepting state. With this interpretation our grammar recognizer actually tests for intersection of the two languages: the context-free language of the DCG and the finite-state language of the automaton. Depending on the sophistication of our students, we might want to explore the computational complexity of recognition with various grammars. The following simple grammar has interesting complexity properties. It can be coded as a DCG and then explored by students generating and running example inputs. They can get a sense of complexity by generating longer and longer input strings to recognize and seeing the time it takes to recognize each. The grammar: A --> a A A A --> generates all nonempty sequences of “a”. It is not left recursive so recognition does terminate under the native Prolog evaluation strategy. However, it is exponential. With tabled evaluation, it is polynomial, cubic in fact (when the word/3 input representation is used). We might also compare the tabled evaluation of recognition for the following two grammars for all strings consisting of “a”s, one left recursive and the second right recursive: A --> A a A --> and B --> a B B --> One can build the chart for an input string of “a”s and see how the recognition complexity differs for the two grammars. The first is linear and the second is quadratic. Another interesting grammar to consider with tabled evaluation is: A --> A a A --> A b A --> It generates all sequences of “a”s and “b”s. It is left recursive, so terminates only when tabled evaluation is used. This example shows that tabled evaluation requires “full” coroutining, since the first and second rules must be applied alternately to recognize strings with both “a”s and “b”s. There is no top-down stack-based scheduling that will allow this grammar to be properly processed. 6. Conclusion We enumerate several advantages of introducing Prolog to students through grammars. 1. Language is a familiar domain: students come with strong intuitions that can be used to advantage in the development. 2. The problems are intrinsically interesting. Precise, formal specification of infinite sets of interest- ing sequences and problems of recognition and parsing are natural and interesting. 3. Recursion (and induction) are fundamental concepts for languages. They are made visible through embeddings; students know and see that noun phrases may contain other noun phrases. Recursion is demystified. Another advantage in introducing recursion and induction through grammars is that it is not obvious how to use iteration. When introducing recursion in a context in which a student knows iteration works can result in confusion and rejection. With grammars, there is no iteration, only natural recursion. 4. Unification has natural application in English, as in subject-verb number agreement: number can be determined from the subject or the verb, and if both, then they must agree. This is exactly implemented with simple unification and it is easy to see why it is useful. 5. Grammars are inherently declarative and thus center declarativity. They don’t require cut or assert or any nonlogical features. Students see how much can be done within the declarative framework, and tend to not to resort to nonlogical constructs quickly when frustrated. 6. Grammars exemplify the power of evaluation in multiple modes. Their declarativity makes changing modes easier even if the subgoals have to be reordered to satisfy mode requirements. 7. Grammars show the criticality of tabled evaluation to obtain the full benefit of declarativity. Without tabling, this entire development would feel (and be) significantly more contrived. References [1] M. A. Harrison, Introduction to Formal Language Theory, Addison-Wesley, 1979. [2] D. S. Warren, V. Dahl, T. Eiter, M. V. Hermenegildo, R. A. Kowalski, F. Rossi (Eds.), Prolog: The Next 50 Years, volume 13900 of Lecture Notes in Computer Science, Springer, 2023. [3] R. A. Kowalski, Logic for problem solving, volume 7 of The computer science library : Artificial intelligence series, North-Holland, 1979. URL: https://www.worldcat.org/oclc/05564433. [4] I. Sakai, Syntax in universal translation, in: Proceedings 1961 International Conference on Machine Translation of Languages and Applied Language Analysis, volume II, Teddington, England, 1962, pp. 593–608. [5] J. Earley, An efficient context-free parsing algorithm, Commun. ACM 13 (1970) 94–102. URL: https://doi.org/10.1145/362007.362035. doi:10.1145/362007.362035.