. Pyttern: a Python-Based Program Query Language Julien Liénard1 , Kim Mens1 and Siegfried Nijssen1 1 Université catholique de Louvain Abstract Despite an abundance of tools available for expressing and detecting structural patterns in program source code, their steep learning curve often creates a barrier for non-expert developers. To address this issue, we present Pyttern, a program query language for Python that is easy to learn and use while maintaining sufficient expressiveness for defining and detecting small structural coding idioms. Pyttern leverages a combination of Python syntax and regex-like wildcards to identify coding patterns. To evaluate the strengths and limitations of our language prototype, we conducted a study involving 30 junior developers. Participants were asked to compare our language with established program query languages by expressing and detecting small coding idioms or flaws. Based on the feedback gathered from this study, we highlight Pyttern’s strengths and areas for improvement. Keywords Programming Language, Query Language, Pattern Matching, Python, Static Program Analysis, Software Tools 1. Introduction Detecting structural patterns in program source code can be useful, to check whether developers produce high-quality code that adheres to best practices [1]. In an educational context, it can help raise students’ awareness of such practices and help educators assess their students’ understanding of programming concepts and techniques. By identifying common misconceptions in student code, personalised feedback and instruction could be provided to overcome such mistakes. Teachers of introductory programming courses (e.g., secondary school teachers), however, are not always expert developers. While program query languages (PQL) may help providing insights into common coding patterns or flaws observed in students’ programs, teachers’ lack of extensive software engineering experience may pose challenges when adopting such languages that often have a steep learning curve. To address this issue, we propose a program query language that balances usability and expressiveness, providing a tool that is intuitive and easy to learn while still sufficiently powerful to write and detect a wide range of structural code patterns. This paper presents an initial prototype of that language, which was presented to a group of junior developers with the aim of identifying the language’s strengths and limitations in practical usage. Our proposed program query language is called Pyttern, a deliberate blend of the words ‘Python’ and ‘pattern’. Pyttern is a Python-based query language that combines regular expression elements with Python syntax while performing its search on abstract syntax trees. Its main purpose is to help non-experts write expressive structural patterns to detect in programs, in a language that is easy to read and write by remaining close to the Python language it reasons about. In effect, the patterns to detect resemble Python programs with holes represented by expressive wildcards. The feedback obtained on the usage of this language during our preliminary evaluation has identified several areas of improvement that will influence Pyttern’s future development. By iteratively refining the language based on user feedback, we hope to improve its usability and efficacy, eventually making it a helpful resource for teachers looking for ways to analyze student coding patterns. BENEVOL24: The 23rd Belgium-Netherlands Software Evolution Workshop, November 21-22, Namur, Belgium $ julien.lienard@uclouvain.be (J. Liénard); kim.mens@uclouvain.be (K. Mens); siegfried.nijssen@uclouvain.be (S. Nijssen) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings 2. Related Work In our evaluation (Section 4) we compare the strengths and weaknesses of Pyttern to some other program query languages. Although many different PQLs have been proposed over the years, we limited ourselves to three of them because of the limited number of participants in our study and the time they had available. We opted for logic meta programming, AST traversal and regular expressions, because of their prevalence and popularity in the field. They also represent vastly different approaches to program querying. Regular expressions (RegEx) are a widely used instrument for finding patterns in character sequences. Codegex [2] and other program query languages or tools [3, 4, 5] often rely on regular expressions, among other techniques, to analyze and query code. A majority of lint [6] and many other program analysis tools work by parsing programs into an abstract syntax tree (AST) and traversing that tree to analyse the structure of a program. Finally, quite some research has been conducted to explore the power of logic meta programming to declare programming patterns as logic rules on top of object-oriented programs, reified as a logic knowledge base [7, 8, 9, 10, 11]. Each of these different approaches will be compared to Pyttern in Section 4. In other program query languages, like Pyttern, queries are written in a syntax close to that of the host programming language. Such queries typically take the form of a code snippet, potentially incorporating additional syntax that goes beyond the capabilities of the underlying programming language. Such queries are relatively straightforward to express, even for non-expert users, assuming that they are sufficiently familiar with programming in the host language. The level of expressiveness and precision achievable with such code queries varies, depending on the user intent, language definition and the particular search engine in use. Plain code represents the simplest form of code query, involving code snippets defined within the syntax of the host programming language itself [12, 13, 14]. Code with holes takes a different stance. Instead of relying on a search engine to determine to what similar code fragments a given code snippet matches, some search engines support queries that explicitly define one or more ‘holes’ within the code [15, 16, 17]. Such queries allow the user to specify holes as placeholders for part of the code. Some search engines go further and accept queries in custom languages that significantly extend an existing programming language [18, 19]. In particular, a main source of inspiration for Pyttern was SCRUPLE [15]. SCRUPLE is a tool to describe high-level patterns for querying over C source code. It contains a set of symbols that can be used as substitutes (‘holes’) for syntactic entities in C. The SCRUPLE pattern language derives its expressive capability from four primary categories of pattern symbols: wildcards for single syntactic entities, wildcards for collections of syntactic entities, named wildcards and some additional features. SCRUPLE uses an extended nondeterministic finite state automaton to match patterns in the code. This Code Pattern Automaton (or CPA) uses 3 types of transitions: move to the left child, move to the right neighbour and move to the next parent. It ensures that the CPA will always terminate as every node can only be checked once with those transitions. If the CPA finishes on a terminal state, there is a match. This technique served as inspiration for the matching algorithm used in Pyttern which uses the same set of transitions when matching patterns in code. But Pyttern does not use a CPA but relies on Tree Matching between the pattern and the code AST. 3. The Pyttern language Pyttern is a new pattern description language with a Python-like syntax that blends readability and flexibility, enabling developers of any skill level to define and detect various coding patterns and flaws. By incorporating explicit wildcard elements, Pyttern allows for clear and expressive definition of complex code patterns. 1 def count(lst, predicate): 2 →tot = 0 3 →for val in lst: 4 →→if predicate(val): 5 →→→tot += 1 6 →return tot Listing 1: Example of an accumulator pattern in Python. 1 def ?(?*): 2 →?accumulator = 0 3 →?:* 4 →→for ? in ?: 5 →→→?:* 6 →→→→?accumulator += ? 7 →return ?accumulator Listing 2: Accumulator pattern in Pyttern. 1 def approx_pi(n): 2 →var = 0 3 →if n > 0: 4 →→for i in range(n): 5 →→→var = (-1)**i / (2*i + 1) 6 →→var *= 4 7 →return var Listing 3: Example of a flawed instance of the accumulator pattern in Python due to the use of ’=’ instead of ’+=’ at line 5. Motivating example To illustrate this, let’s assume a teacher wants to check that their students are using an accumulator pattern as instructed to solve a specific exercise. An accumulator pattern, as illustrated in code listing 1, refers to a technique where you first initialize an accumulator variable (line 2), and then iterate over a sequence (line 3), updating the accumulator variable with the result of some operation during the iteration (line 5). In the end, the accumulator variable which retains the cumulative result of all these operations is returned as result (line 6). A possible Pyttern pattern to check whether a Python function implements an accumulator, is shown in Listing 2. The key elements of the accumulator pattern are clearly visible on lines 2 (initialisation), 4 (iteration), 6 (accumulation) and 7 (returning) of the pattern. The remainder of the pattern consists mainly of wildcards that serve as placeholders or "holes" in the program. For example, we are not interested in the exact function name or argument list on line 1, nor the exact iteration variable or range on line 4. Other wildcards such as those on lines 3 and 5 indicate that the for-loop and accumulation, respectively, can occur at any indentation level. The code shown earlier in listing 1 matches this Pyttern (accumulator) pattern. The colours indicate what wildcards match what part of the code. The program shown in listing 3 does not match this pattern. It does not correctly increment the accumulation variable on line 5 since it uses a mere assignment "=" instead of the "+=" as expected. AST matching Even though Pyttern users are not exposed to it, the core idea underlying Pyttern is to match an Abstract Syntax Tree (AST) representing a pattern, to the AST of a program in which to detect the pattern. When matching a Pyttern pattern to a program’s AST, Pyttern aligns each node in the pattern’s AST with corresponding nodes in the program’s AST. Pyttern matches nodes using their labels starting from the root node. If the pattern would contain no wildcards, the pattern would match the program code only if it is identical to the program. However, Pyttern offers more flexibility through wildcards, which allows a pattern to match a program by skipping some nodes in the program’s AST. A one-to-one correspondence between nodes in the pattern and program ASTs is hence not rigidly enforced any more when these wildcards are employed, enabling Pyttern to identify more complex code patterns where certain nodes may be omitted or repeated in the program’s structure. Pyttern’s wildcards Pyttern’s syntax and semantics thus revolve around the specification and definition of matches of patterns that capture structural code characteristics within a program. They are defined using a combination of Python and regular expression-like syntax. Table 1 summarises some of the wildcards Pyttern currently supports to represent various code patterns. To make the understanding of the language easier, each of its wildcards start with the ’?’ character. A more in depth description and examples for each of the Pyttern wildcards can be found on GitHub1 . Wildcard Meaning ? Matches a single node in the AST ?* Matches zero or more nodes in an AST list node ?{n, m} Match between n and m nodes ?name Matches a single node and names it for later reference ?[Type, ...] Match an AST node of any of the given types ?: Matches the body of this wildcard at one nesting level ?:* Matches the body of this wildcard at any nesting level Table 1 Wildcards supported by the Pyttern language. Some implementation details For wildcard matching, Pyttern employs a recursive approach. It relies on backtracking to take into account all potential instances of matching characters and wildcards, which is necessary to process the named wildcards correctly, as they lead to dependencies between matched parts of code that are far away from each other. The pattern matching begins by taking into account the program’s top-level structures and then proceeds recursively through nested parts. To construct ASTs for the Python programs on which the pattern will be matched, we use the native AST class provided by Python.2 We cannot use the same AST class to produce ASTs for patterns expressed in Pyttern, as it features special wildcard constructs that don’t exist in Python. Instead, we use an ANTLR [20] parser library for Python, along with a Python grammar extension that complies with Pyttern’s wildcard syntactic constructs. Our main motivation for choosing ANTLR is that it is relatively straightforward to extend it with additional constructs for the wildcards. Pyttern’s matching algorithm then employs a modified Depth-First Search (DFS) strategy to align and match both ASTs. This DFS traversal is adapted to account for situations where nodes may be skipped based on the wildcards used in the pattern. The DFS traversal navigates through both ASTs at the same time to match corresponding nodes. If there are no children in both trees, the algorithm moves on to the nodes’ right neighbour. Otherwise, it moves to the current nodes’ first child. The method goes to the parent nodes’ neighbour if there are no right neighbours. When matching a wildcard, the wildcard can change this sequence by directly going to the nodes’ right neighbour or the parents’ neighbour, skipping some steps in the process. We use the same principle as explained in section 2 for SCRUPLE [15]. This ensures that we always terminate the matching process. 4. Comparing Program Query Languages Experiment To assess possible strengths and limitations of Pyttern w.r.t. other program query languages (PQL), we conducted a study involving 30 junior developers. The participants were master- 1 https://github.com/JulienLie/pyttern/tree/main 2 https://docs.python.org/3.12/library/ast.html Short name Description if instead of while Code fragments without a while loop containing an if where instead there should have been a while statement. Erase argument value Code fragments that accidentally or purposefully erase an argument variable by reassigning it to a constant. Loop variable not used Code containing a loop with a loop variable that is not used inside the loop. Misplaced return In Python, nested blocks are defined using indentation. This sometimes causes a return statement to be at the wrong indentation level. Too much indentation When writing programs, less experienced students sometimes create complex solutions with lots of different indentation levels. print instead of return Some students use a print statement instead of a return statement at the end of their Python function. Table 2 Coding idioms and flaws to be detected with different PQLs. level university students in computer science or computer science engineering who followed a course on programming paradigms. As part of this course they were asked to compare different PQLs by expressing and detecting the same coding idioms or flaws in a dataset of student code that was provided to them. The requested coding idioms and flaws can be found in Table 2. The four PQLs selected for this study were: regular expressions with Python’s re module, AST traversal with Python’s ast module, our Pyttern language prototype, and the use of Prolog with a set of predicates reasoning over an AST reified as logic facts. Before the study, the participants completed anonymous surveys on their background as well as on their knowledge of the two PQLs assigned to them. The survey results are summarised in Figure 1. Our participants were primarily intermediate to advanced Python developers with little experience in program query languages and intermediate knowledge of Prolog (the basic concepts of Prolog were introduced earlier in a course they all attended). Previous knowledge of different topics for the experiment Complete Mastery 5 Expert 4 Advanced 3 Rating Intermediate 2 Beginner 1 2.312 1.000 2.062 3.375 Never Heard Of It 0 Design Patterns PQL Prolog Python Topics Figure 1: Background knowledge of the study participants Next, during a two-hour session per language, they had to write program queries in two PQLs they were assigned to. After the study, they completed another anonymous survey on how they appreciated the two PQLs they experimented with. Finally, they were asked to experiment on their own with a third PQL of their choice, and write a final report of about 5 pages, to compare the different PQLs they had tried from different points of view. Table 3 shows how the languages were distributed over the different participants, in such a way that we obtained sufficient coverage of the different PQLs, but with higher coverage for Pyttern as that was the main language we wanted to compare against. Language Pyttern AST Traversal RegEx Prolog first 6 6 4 second 8 8 third 3 6 5 2 Total 17 12 9 10 Table 3 Distribution of query languages over participants Results Since in this study we observed little or no difference due to the order in which the PQLs were assigned to the particpants, in what follows we decided to group all results, regardless of the order in which participants experimented with their first or second PQL. Several participants did confirm that Pyttern strikes a right balance between expressiveness and ease of use, although they also confirmed that this is the case for AST traversal. In the case of Pyttern, with its syntax close to Python, its ease of use comes at the cost of a lack of expressiveness for more complex queries. In the case of AST traversal, its expressiveness comes at the cost of a steeper learning curve. Figure 2 indeed shows a tendency of Pyttern being appreciated as more easy to use than other PQLs, and in particular than using AST traversal. Figure 3 shows that Pyttern scores less than the other languages on its ability to express more complex patterns. I thought this program query language was easy to use. AST RegEx Pyttern Prolog Strongly disagree Disagree Neutral Agree Strongly agree Figure 2: Box plot illustrating participants’ ratings on the ease of use of different program query languages. This program query language is sufficiently powerful to allow me to express complex patterns in sufficient level of detail. AST RegEx Pyttern Prolog Strongly disagree Disagree Neutral Agree Strongly agree Figure 3: Participants’ ratings on the expressiveness of different PQLs to capture complex code patterns. To improve Pyttern’s expressiveness, missing language features highlighted by participants were having a negation operator or the ability to combine simpler patterns into more complex ones. As opposed to expressiveness, Pyttern scores better than logic meta programming on several criteria. As emphasized in Figure 4, it was considered (perhaps surprisingly) more declarative than the other languages, including Prolog. This is due to the explicit AST representation (or complex regular expres- sions) that queries need to manipulate in the other languages, whereas in Pyttern this remains hidden under the hood. A frequent remark was that Pyttern still lacks extensive documentation. Because of that, occasionally students had to request help from the Pyttern developer. This is confirmed in Figure 5 and is also the case for logic metaprogramming, which was also a prototype. Due to its current lack of maturity Pyttern Queries in this program query language are declarative. AST RegEx Pyttern Prolog Strongly disagree Disagree Neutral Agree Strongly agree Figure 4: Box plot illustrating participants’ ratings on how declarative different query languages are. I think that I would need the support of a technical person to be able to use this program query language. AST RegEx Pyttern Prolog Strongly disagree Disagree Neutral Agree Strongly agree Figure 5: Participants’ perceptions of their need for technical support to use the various query languages. was also considered weak w.r.t. understanding how to debug queries. A tool similar to Regex101 that visually illustrates what parts of the code a query matches would be ideal. Nevertheless, despite those limitations, Pyttern was considered the easiest to learn and use without needing a lot of background other than Pyttern. 5. Conclusion and Future Work Conclusion Encouraged by the results of our user study, we feel that Pyttern could become an effective tool for finding symptoms of simple programming misconceptions [21] in small programs. The purpose of the first prototype of Pyttern presented in this paper was to collect initial feedback on its strengths and weaknesses in order to further improve it in the future. The user study indeed highlighted a number of issues that we had not considered before. We will improve upon Pyttern based on that feedback and conduct a more thorough validation on a more mature prototype in the future. Limitations One limitation comes from the fact that Pyttern’s parser compares AST structures in a quite literal manner. As a result, we sometimes have to define multiple alternative patterns. For example, consider the pattern defined in Listing 2. It will detect all accumulators that use the += statement, but not those using ?accumulator = ?accumulator + ?. To overcome this, we have to define an alternative pattern using the second statement. Adding a disjuntion operator (OR) to the language could overcome this but may come at the cost of less readable patterns. Note that Pyttern’s syntactic construct ?[Type, ...] already provides some enhanced expressiveness. If in Listing 2 we would replace line 4 by →→?[𝐹 𝑜𝑟, 𝑊 ℎ𝑖𝑙𝑒] : this would allow the looping construct in the accumulator pattern to be either a for-loop OR a while-loop. Another issue is that Pyttern expects methods and expressions to appear in exactly the order in which they were specified in the pattern. This is not always desired. For method definitions in a class the order should play no role, for example. Future work Some feedback received from the study participants has already been included in Pyttern. We are increasing the speed of matching and improving the documentation, but more can be done still. Performance could be enhanced further, especially regarding backtracking, by implementing smart memoisation or tabling techniques. We also envision improving the output and providing debugging and visualisation tools to improve the self-efficacy of Pyttern users so that they need not rely on a technical person (cf. Figure 5). We also intend to extend the expressiveness of Pyttern to include negation, the ability to express alternatives, or the ability to define more complex patterns in terms of previously defined ‘subpatterns’. Since we are building Pyttern incrementally, further validation of these new language features will be required. We already have a second experiment in preparation, where Pyttern will be validated by students following a software maintenance and evolution course. A further survey and validation with Pyttern’s intended users (non-expert teachers that quickly want to express interesting structural patterns regarding their students’ potential programming misconceptions) also needs to be envisaged. Threats to validity Several factors can affect the validity of our current findings. The limited sample size of participants in the experiment may not accurately reflect the intended user base. In particular, some participants in our study may have more development experience than our target users. We observed that despite the higher learning curve, some of our participants were more in favour of using complex regular expressions or writing parse tree visitors for Python ASTs, because it would allow them to write more complex patterns. Given that the experiment was carried out during a Master’s course, there may be bias in the way the students view the tool that the assistant and course teacher developed. We kept the surveys anonymous in an effort to mitigate this bias. For organisational reasons and to gather more data points specifically for the Pyttern language, the assignment of languages to participants was distributed semi-randomly. The distribution method and partitioning can be seen in Figure 6. This semi-random distribution could have influenced participants’ perceptions and appreciation of the different languages. Common Flaws Python AST 3/4 Pyttern Flaw 1 1/3 3/4 Flaw 2 1/3 Regex 1/4 survey Flaw 3 1/4 1/3 Flaw 4 Prolog Flaw 5 Pyttern Figure 6: Assignation method and repartition for the experiment During the study, participants who were assigned Pyttern as their first language identified some small problems and concerns with the documentation. Because the study’s goal was targeted at enhancing our early Pyttern prototype, we addressed these concerns right away. As a result, individuals who were assigned Pyttern as second language received a slightly enhanced version of the prototype, which may have resulted in a different evaluation than those who used it first. The box plots were generated based on evaluations conducted after participants had used their first two languages. After experimenting with a third language, participants were required to write a report comparing all three languages. These reports may therefore provide a more comprehensive and nuanced understanding of participants’ final assessments than the box plots alone. References [1] E. Gamma, R. Helm, R. Johnson, J. Vlissides, Design patterns: Abstraction and reuse of object- oriented design, in: ECOOP’93—Object-Oriented Programming: 7th European Conference Kaiser- slautern, Germany, July 26–30, 1993 Proceedings 7, Springer, 1993, pp. 406–431. [2] X. Zhang, Y. Zhou, S. H. Tan, Efficient pattern-based static analysis approach via regular-expression rules, in: 2023 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), 2023, pp. 132–143. doi:10.1109/SANER56733.2023.00022. [3] N. Ayewah, W. Pugh, D. Hovemeyer, J. D. Morgenthaler, J. Penix, Using static analysis to find bugs, IEEE Software 25 (2008) 22–29. doi:10.1109/MS.2008.130. [4] Spotbugs website, 2006. URL: https://spotbugs.github.io/. [5] Spotbugs rules, 2006. URL: https://spotbugs.readthedocs.io/en/stable/bugDescriptions.html. [6] S. C. Johnson, Lint, a C program checker, Bell Telephone Laboratories Murray Hill, 1977. [7] R. Wuyts, Declarative reasoning about the structure of object-oriented systems, in: Proceedings. Technology of Object-Oriented Languages. TOOLS 26 (Cat. No. 98EX176), IEEE, 1998, pp. 112–124. [8] K. Mens, I. Michiels, R. Wuyts, Supporting software development through declaratively codified programming patterns, Expert Systems with Applications 23 (2002) 405–413. URL: https://www.sciencedirect.com/science/article/pii/S0957417402000763. doi:https://doi.org/ 10.1016/S0957-4174(02)00076-3. [9] C. De Roover, T. D’Hondt, J. Brichau, C. Noguera, L. Duchien, Behavioral similarity matching using concrete source code templates in logic queries, in: Proceedings of the 2007 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, 2007, pp. 92–101. [10] E. Hajiyev, M. Verbaere, O. De Moor, Codequest: Scalable source code queries with datalog, in: ECOOP 2006–Object-Oriented Programming: 20th European Conference, Nantes, France, July 3-7, 2006. Proceedings 20, Springer, 2006, pp. 2–27. [11] K. De Volder, Jquery: A generic code browser with a declarative configuration language, in: Practical Aspects of Declarative Languages: 8th International Symposium, PADL 2006, Charleston, SC, USA, January 9-10, 2006. Proceedings 8, Springer, 2006, pp. 88–102. [12] K. Kim, D. Kim, T. F. Bissyandé, E. Choi, L. Li, J. Klein, Y. L. Traon, Facoy: a code-to-code search engine, in: Proceedings of the 40th International Conference on Software Engineering, 2018, pp. 946–957. [13] S. Zhou, H. Zhong, B. Shen, Slampa: Recommending code snippets with statistical language model, in: 2018 25th Asia-Pacific Software Engineering Conference (APSEC), IEEE, 2018, pp. 79–88. [14] S. Zhou, B. Shen, H. Zhong, Lancer: Your code tell me what you need, in: 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2019, pp. 1202–1205. [15] S. Paul, A. Prakash, A framework for source code search using program patterns, IEEE Transactions on Software Engineering 20 (1994) 463–475. [16] A. Mishne, S. Shoham, E. Yahav, Typestate-based semantic code search over partial programs, in: Proceedings of the ACM international conference on Object oriented programming systems languages and applications, 2012, pp. 997–1016. [17] R. Mukherjee, S. Chaudhuri, C. Jermaine, Searching a database of source codes using contextualized code search, arXiv preprint arXiv:2001.03277 (2020). [18] K. Inoue, Y. Miyamoto, D. M. German, T. Ishio, Code clone matching: A practical and effective approach to find code snippets, arXiv preprint arXiv:2003.05615 (2020). [19] J. Lawall, G. Muller, Coccinelle: 10 years of automated evolution in the linux kernel, in: 2018 USENIX Annual Technical Conference (USENIX ATC 18), 2018, pp. 601–614. [20] T. Parr, The Definitive ANTLR 4 Reference, 2nd ed., Pragmatic Bookshelf, 2013. [21] L. Chiodini, I. Moreno Santos, A. Gallidabino, A. Tafliovich, A. L. Santos, M. Hauswirth, A curated inventory of programming language misconceptions, in: Proceedings of the 26th ACM Conference on Innovation and Technology in Computer Science Education V. 1, 2021, pp. 380–386.