

Package-Aware Approach for Repository-Level Code Completion in Pharo

Omar AbedelKader

Stéphane Ducasse

Oleksandr Zaitsev

Romain Robbes

Guillermo Polito

CIRAD, UMR SENS, F-34000 Montpellier, France

CNRS, University of Bordeaux, Bordeaux INP, LaBRI, UMR5800, F-33400, Talence, France

Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189 CRIStAL, Park Plaza, Parc scientifique de la Haute-Borne, 40 Av. Halley Bât A, 59650 Villeneuve-d'Ascq, France

2025


Pharo offers a sophisticated completion engine based on semantic heuristics, which coordinates specific fetchers within a lazy architecture. These heuristics can be recomposed to support various activities (e.g., live programming or history usage navigation). While this system is powerful, it does not account for the repository structure when suggesting global names such as class names, class variables, or global variables. As a result, it treats all global names equally and does not prioritize classes within the same package or project. In this paper, we present a new heuristic that addresses this limitation. Our approach searches variable names in a structured manner: it begins with the package of the requesting class, then expands to other packages within the same repository, and finally considers the global namespace. We describe the logic behind this heuristic and evaluate it against the default semantic heuristic and one that directly queries the global namespace. Preliminary results indicate that the Mean Reciprocal Rank (MRR) improves, suggesting that package-aware completions deliver more accurate and relevant suggestions than the previous flat global approach.

Keywords: Repository-Level Completion, Code Completion, Pharo, Smalltalk

1. Introduction

not only be semantically relevant but also efficiently scoped to avoid information overload. One significant omission in the original design of Complishon was awareness of package structure.

While it excelled at suggesting global names such as classes or variables, it did not favor, or even adequately surface, entities from the same package as the current editing context or from related parts of the system. As shown in Figure 1, when invoking code completion within the SpPresenter class (located in the Spec2-Core package), the suggestions are drawn entirely from the global namespace, ignoring nearby classes such as SpPresenterBuilder, SpTextPresenter, or SpApplication, which are defined in the same package. This degrades the developer experience, especially in large codebases where numerous globally accessible entities can easily overshadow locally relevant ones.

Figure 1 shows the candidate completions proposed while editing in the Spec2-Core package, highlighting Complishon’s failure to leverage local package context effectively. This is critical in Pharo, where projects are modularized into packages (e.g., Spec2-Core, Spec2-Dialogs, Spec2-Interaction) that group functionally related classes. Developers often work within a small subset of the system, typically their own packages, the dependencies of those packages, and core libraries, and should not be distracted by completions from unrelated parts of the codebase. Specifically, the figure shows that the first ten completion suggestions (such as SpInteractionError, SpJobListPresenter, etc.) are not part of the Spec2-Core package, which underscores the importance of package-aware code completion that respects package boundaries.

This article proposes and evaluates a repository-level completion strategy, a simple yet effective enhancement to the Complishon architecture that ranks candidates from the current package highest, followed by suggestions from lateral packages, and only then from the rest of the global namespace. The goal is to improve both relevance and responsiveness by exploiting the modularity already present in Pharo subsystems. This approach resonates with trends in completion research that use structural or probabilistic models, such as Bayesian strategies [PLM15] or JetBrains’ log-based rankings [BKL+22], to make completions more context-aware. Our work is further motivated by principles from the moldable development paradigm, as described by Chis et al. [CNG15], which argues that tools like code completion should be extensible and adaptable to specific development contexts. Complishon aligns with these ideals by offering a plugin architecture for heuristics, which our proposed package-awareness logic builds on.

This article is structured as follows: Section 2 gives an overview of the Complishon engine and its modular heuristic-based architecture. Section 3 identifies the limitations of global-environment-based completion, presents our approach, and outlines our hypothesis for package-aware suggestions. Section 4 describes our evaluation across multiple projects and strategies. Section 5 discusses the findings and their implications for completion systems in dynamic, large-scale environments. Section 6 outlines the main limitations of our approach, including the reliance on static reference points, the use of truncated identifiers that may not reflect real-world usage, and the challenges posed by Pharo-specific naming and package structures. Section 7 situates our work within the broader landscape of static code completion. Finally, Section 8 concludes and outlines future directions for integrating adaptive and learned strategies into the Complishon engine.

2. Background

Complishon, the Pharo completion engine (see Figure 2), consists primarily of three key components: Heuristics, Lazy Fetchers, and a lazily cached Result Set. At the core, heuristics provide semantic guidance for the completion process by analyzing the Abstract Syntax Tree (AST) node located at the cursor (editor caret) and selecting the appropriate fetchers for completion suggestions. These heuristics are structured in a chain of responsibility [GHJV95], allowing a heuristic to pass handling responsibilities down the chain if it cannot process the current AST node itself. Fetchers, implemented using combinators, lazily retrieve and filter potential code completion candidates based on the context and user input, which limits the time and memory spent on candidates that are never displayed. A decorator pattern further enhances fetchers by preventing duplicate suggestions, which is particularly crucial in scenarios involving method inheritance, so that results remain relevant and unique. Fetchers use specialized filters (e.g., CoBeginsWithFilter) to match completion suggestions with the user’s partially typed input.
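The interplay of heuristics, fetcher combinators, the duplicate-removing decorator, and the prefix filter can be illustrated with a minimal Python sketch. This is not Complishon's Pharo implementation: all function names, the dictionary used as an AST node, and the hard-coded candidate lists are hypothetical and serve only to show the control flow.

```python
from itertools import chain
from typing import Callable, Iterator, List, Optional

# A fetcher is a zero-argument function that returns a fresh lazy iterator.
Fetcher = Callable[[], Iterator[str]]
# A heuristic returns a fetcher, or None to pass the node down the chain.
Heuristic = Callable[[dict], Optional[Fetcher]]


def sequence(*fetchers: Fetcher) -> Fetcher:
    """Combinator: exhaust each fetcher in turn, creating it only when reached."""
    return lambda: chain.from_iterable(f() for f in fetchers)


def without_repetition(fetcher: Fetcher) -> Fetcher:
    """Decorator: skip candidates that were already produced."""
    def fetch() -> Iterator[str]:
        seen = set()
        for name in fetcher():
            if name not in seen:
                seen.add(name)
                yield name
    return fetch


def begins_with(prefix: str, fetcher: Fetcher) -> Fetcher:
    """Filter: keep only candidates that start with the typed prefix."""
    return lambda: (name for name in fetcher() if name.startswith(prefix))


def message_heuristic(node: dict) -> Optional[Fetcher]:
    if node["kind"] != "message":
        return None  # not responsible: let the next heuristic try
    own = lambda: iter(["printOn:", "size"])                        # hypothetical data
    inherited = lambda: iter(["printOn:", "printString", "yourself"])
    return without_repetition(sequence(own, inherited))


def variable_heuristic(node: dict) -> Optional[Fetcher]:
    if node["kind"] != "variable":
        return None
    return lambda: iter(["index", "items", "Integer"])              # hypothetical data


def complete(node: dict, prefix: str) -> List[str]:
    """Walk the chain of responsibility, then filter by the typed prefix."""
    for heuristic in (message_heuristic, variable_heuristic):
        fetcher = heuristic(node)
        if fetcher is not None:
            return list(begins_with(prefix, fetcher)())
    return []
```

In this sketch the overridden selector `printOn:` appears once even though both the class and its superclass provide it, which mirrors the role the text assigns to the decorator.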

The Result Set component serves as a lazy, cached store that accumulates the suggestions provided by fetchers only as required, further enhancing efficiency. Complishon leverages AST-based analysis and uses a double-dispatch mechanism to adapt dynamically to the surrounding context. It constructs a dedicated context for code completion, considering factors such as source text and caret position. The generation and presentation of suggestions are managed by the IDE, which also integrates strategies such as case sensitivity filtering and adaptive configuration based on the structure and semantics of the current code environment. This adaptability employs heuristic-based fetchers configured by visitor patterns, informed by parsing and typing processes, to refine output and dynamically eliminate redundant or irrelevant suggestions.
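A lazily cached result set of this kind can be sketched in a few lines of Python. The class name `LazyResultSet` and its `first` method are hypothetical and do not reflect Complishon's actual API; the sketch only shows that candidates are pulled from the fetcher on demand and remembered afterwards.

```python
from itertools import islice
from typing import Iterable, List


class LazyResultSet:
    """Pull candidates from a lazy source only when asked, and cache them."""

    def __init__(self, candidates: Iterable[str]) -> None:
        self._source = iter(candidates)
        self._cache: List[str] = []

    def first(self, count: int) -> List[str]:
        """Return up to `count` candidates, fetching only the missing ones."""
        missing = count - len(self._cache)
        if missing > 0:
            self._cache.extend(islice(self._source, missing))
        return self._cache[:count]
```

Asking for the first two results consumes two candidates from the source; asking again for the same two is answered from the cache without touching the fetcher.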

The heuristics are modular, specialized for different code elements such as messages, variables, and symbols, and systematically connected in sequences forming a robust and comprehensive filtering framework. This chaining process includes sophisticated program semantics-based strategies such as prioritizing instance variables before superclass variables, self message suggestions, inherited methods, and inferred initialization constructs. Consequently, Complishon provides an efficient, contextually aware, and highly accurate completion experience tailored precisely to the user’s current coding scenario.

3. Our approach: Repository level package structure

3.1. Repository Level Completion

Although Complishon is effective in identifying global entities such as class names, class variables, or global variables, it currently does not leverage the package structure of a project effectively. As a result, it treats all global names uniformly, offering no preference to entities located in the same or related packages. This behavior contrasts with several insights from prior research, which emphasize the importance of contextual filtering and locality in improving the relevance of code completion results. For instance, Hou et al. [HP10] propose heuristic techniques to filter and sort API suggestions using type hierarchies and grouping, significantly reducing the visual clutter of irrelevant entries. Similarly, Bruch et al. [BMM09] introduced the Best Matching Neighbors (BMN) approach, which ranks suggestions based on similarity to local usage contexts, an idea analogous to prioritizing completions from the same or nearby packages. Robbes et al. [RL08a] further underscore the value of recent usage and lexical proximity, showing that local context often outperforms global frequency. More recently, Hellendoorn et al. [HPGB19] demonstrated that intra-project completions remain the hardest for current models, largely due to their inability to distinguish between local and global identifiers. In a complementary direction, Li et al. [LHL+21] proposed learned acceptance and ranking models to suppress noisy completions, optimizing not just for correctness but for developer effort.

Inspired by these works, we introduce a new heuristic in Complishon that leverages package structure to improve completion prioritization (see Figure 3). For example, when performing auto-completion within a class in the P1-Core package, typing the letter A should ideally yield suggestions in the following order: first from P1-Core, then from related packages such as P1-Extension or P1-Test, and only afterward from the global namespace.

3.2. Leveraging Project Package Structure

To enable this behavior, we designed a package-aware completion heuristic that operates in three steps:

• Identifying the current package: The completion engine determines the active package context, such as P1-Core.

• Collecting potential matches: Typing A triggers a scan for all relevant matches in local and global names.

• Prioritizing based on package proximity:

1. First, suggest entities defined within the current package (e.g., P1-Core).

2. Next, prefer suggestions from closely related packages (lateral dependencies), e.g., P1-Extension or P1-Test. These are currently inferred from naming similarity rather than formal dependencies.

3. Finally, include entities from the remaining global scope, if further suggestions are needed.

This strategy mirrors user expectations by elevating locally relevant completions, thereby reducing cognitive load and supporting the findings of prior empirical and usability studies.
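The three-tier ordering can be sketched as follows in Python. The sketch makes two assumptions that are not stated in the text: "naming similarity" is approximated by comparing the part of the package name before the first hyphen, and candidates within a tier are ordered alphabetically. The function names and the sample classes are hypothetical.

```python
from typing import Dict, List


def project_prefix(package: str) -> str:
    """Assumption: packages of one project share the text before the first '-'."""
    return package.split("-", 1)[0]


def package_aware_candidates(
    typed: str, current_package: str, package_of: Dict[str, str]
) -> List[str]:
    """Order matching global names: same package, related packages, the rest."""
    project = project_prefix(current_package)

    def tier(name: str) -> int:
        package = package_of[name]
        if package == current_package:
            return 0  # defined in the package being edited
        if project_prefix(package) == project:
            return 1  # lateral package, inferred from the shared name prefix
        return 2      # remaining global namespace

    matches = [name for name in package_of if name.startswith(typed)]
    return sorted(matches, key=lambda name: (tier(name), name))


# Hypothetical system: global name -> defining package.
GLOBALS = {
    "AbstractLayout": "Morphic-Core",
    "AccountExporter": "P1-Extension",
    "AccountTest": "P1-Test",
    "AccountView": "P1-Core",
    "Array": "Collections-Sequenceable",
}
```

With these sample data, completing `A` from P1-Core lists `AccountView` first, then the two classes from P1-Extension and P1-Test, and only then `AbstractLayout` and `Array`, whereas a flat alphabetical ordering would start with `AbstractLayout`.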

3.3. Implementation Details

This behavior is implemented through new extension points in Complishon, enabling fine-grained control over completion fetchers and filtering logic. These enhancements support dynamic reordering of suggestions based on a package-locality heuristic, which prioritizes elements from the current package and its immediate relations. For instance, when auto-completing in a class within the package P1-Core, typing the letter A will first yield suggestions from P1-Core, followed by related packages like P1-Extension or P1-Test, and finally the global namespace. This structured prioritization is intended to reduce cognitive overhead and improve developer productivity. To evaluate the impact of this heuristic, we refined our benchmarks to focus on variable completions specifically. As shown in Figure 4, invoking completion in a Spec2-Core context now highlights relevant classes such as SpIconProvider, SpPresenterBuilder, and SpTextPresenter. This example illustrates the heuristic’s effectiveness in promoting package-local relevance.

4. Evaluation

4.1. Benchmark Logic

In this work, we focus only on the completion of global variables such as class names. (In Pharo, class names are global variables; in practice, most globals considered in this study are class names.) To evaluate the effectiveness of variable name completion algorithms, we based our implementation on the benchmark methodology introduced by Robbes et al. [RL08a, RL08b]. Although their original benchmark relied on a change-based repository of program history, our approach adapts the core idea for static analysis and variable-centric benchmarking, making it applicable in contexts where historical data is unavailable. The essence of this benchmark is to test whether a completion engine can correctly suggest the original names of variables when only partial prefixes are provided. The fundamental insight is that by systematically rewriting every variable access site with increasingly longer prefixes (ranging from 2 to 8 characters), we can invoke the completion engine and evaluate whether it ranks the correct name high in its suggestions. This method simulates realistic completion scenarios and allows us to measure not just correctness but also ranking performance and user effort reduction.

Our system, implemented via the StaticBenchmarksVariables class, carries out the following steps for each method in a given package:

1. AST Extraction: The method’s AST is retrieved to analyze variable usages within the code.

2. Variable Filtering: Each variable node is examined to confirm it is global (usually indicated by an uppercase-starting name).

3. Name Masking: The variable name is programmatically shortened to generate several prefixes of increasing lengths, from 2 up to 8 characters (or the full name length, if shorter). For example, the variable OrderedCollection yields prefixes such as Or, Ord, Orde, etc.

4. Completion Invocation: For each prefix, the completion engine is triggered as if the user were requesting suggestions after typing that fragment.

5. Result Logging: The engine’s output is analyzed to determine if the original name appears among the top 10 suggestions. If found, we record the rank at which it was suggested. If not, it is considered a failure for that prefix length.
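The masking, invocation, and logging steps can be sketched in Python as follows. This is an illustration of the procedure, not the StaticBenchmarksVariables code: the function names are hypothetical, AST extraction and variable filtering are replaced by a plain list of names, and the completion engine is any function that maps a prefix to an ordered list of suggestions.

```python
from typing import Callable, Dict, Iterable, List, Optional


def prefixes(name: str, shortest: int = 2, longest: int = 8) -> List[str]:
    """Name masking: prefixes of 2 to 8 characters, capped at the name length."""
    return [name[:n] for n in range(shortest, min(longest, len(name)) + 1)]


def rank_of(name: str, suggestions: List[str], top_k: int = 10) -> Optional[int]:
    """Result logging: 1-based rank within the top-k suggestions, None on failure."""
    top = suggestions[:top_k]
    return top.index(name) + 1 if name in top else None


def benchmark(
    names: Iterable[str], complete: Callable[[str], List[str]]
) -> Dict[int, List[Optional[int]]]:
    """Run the engine on every prefix and group the ranks by prefix length."""
    ranks_by_length: Dict[int, List[Optional[int]]] = {}
    for name in names:
        for prefix in prefixes(name):
            rank = rank_of(name, complete(prefix))
            ranks_by_length.setdefault(len(prefix), []).append(rank)
    return ranks_by_length


# A toy alphabetical engine over a hypothetical vocabulary.
VOCABULARY = ["Bag", "Object", "OrderedCollection", "OrderedDictionary"]


def alphabetical_engine(prefix: str) -> List[str]:
    return sorted(name for name in VOCABULARY if name.startswith(prefix))
```

With the toy engine, `OrderedDictionary` is ranked second for every prefix up to `Ordered`, because `OrderedCollection` sorts before it, and first only once the eighth character is typed.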

This process is repeated for all global variables in all methods of the targeted package or class scope. The benchmark is implemented in the StaticBenchmarks class and uses a ResultSetBuilder to generate completions based on specified heuristics. We provide a modular API allowing execution across individual classes or entire packages. Several metrics are collected:

1. Accuracy: The proportion of cases where the original variable name appears within the top-k results (typically top-10).

2. Rank Distribution: The frequency at which the correct name appears at each specific rank position from 1st through 10th.

3. Mean Reciprocal Rank (MRR): Measures the average reciprocal rank position of the first correct prediction. Formally, given a set of queries $Q$, the MRR is calculated as:

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}$$

where $\mathrm{rank}_i$ is the rank position of the first correct prediction for the $i$-th query. MRR emphasizes the importance of placing the first correct result as high as possible and is particularly suitable when only one relevant result is sufficient per query [MC18].

4. Normalized Discounted Cumulative Gain (NDCG): Measures the usefulness of predictions based on their positions and graded relevance. The DCG over the top $p$ positions is commonly defined as:

$$\mathrm{DCG}_p = \sum_{i=1}^{p} \frac{\mathrm{rel}_i}{\log_2(i + 1)}$$

where $\mathrm{rel}_i$ represents the graded relevance of the prediction at position $i$. NDCG normalizes this score by dividing by the ideal DCG (IDCG), yielding:

$$\mathrm{NDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p}$$

thus accounting logarithmically for position discounting [MC18].

5. Timing and Memory Metrics: Includes total and average completion times and memory usage, evaluated per prefix length.
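The rank-based metrics can be computed from the logged ranks with a short Python sketch. It rests on assumptions that the text does not spell out: a failed query (no hit in the top 10) contributes zero to MRR and NDCG, relevance is binary with exactly one correct name per query so that the ideal DCG equals 1, and the position discount is 1/log2(rank + 1). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

Ranks = List[Optional[int]]  # 1-based rank of the correct name, None on failure


def accuracy(ranks: Ranks, k: int = 10) -> float:
    """Share of queries whose correct name appears within the top k."""
    return sum(1 for r in ranks if r is not None and r <= k) / len(ranks)


def rank_distribution(ranks: Ranks) -> Dict[int, int]:
    """How often the correct name was found at each rank position."""
    return dict(Counter(r for r in ranks if r is not None))


def mean_reciprocal_rank(ranks: Ranks) -> float:
    """Average of 1/rank; failures are assumed to contribute 0."""
    return sum(1.0 / r for r in ranks if r is not None) / len(ranks)


def mean_ndcg(ranks: Ranks) -> float:
    """Binary relevance with one relevant item: IDCG = 1, DCG = 1/log2(rank + 1)."""
    return sum(1.0 / math.log2(r + 1) for r in ranks if r is not None) / len(ranks)
```

For the ranks `[1, 2, None, 4]`, three of four queries succeed, so the accuracy is 0.75, and the MRR is (1 + 1/2 + 0 + 1/4) / 4 = 0.4375. All four functions expect a non-empty list of ranks.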

The benchmark can be run in various configurations, from baseline alphabetical sorters to advanced heuristic-guided engines. It can also be configured with different heuristic templates, offering flexibility to evaluate a wide range of completion strategies. Our benchmark is designed not only to test whether a completion system finds the correct result but also how efficiently it does so, reflecting real-world usage where users expect high precision with minimal typing. By measuring performance across multiple prefix lengths and analyzing rank-based metrics, we can assess the trade-offs and effectiveness of various completion algorithms.

We chose MRR as our primary evaluation metric because it effectively captures the quality of ranked suggestions in code completion tasks. MRR reflects how early the correct suggestion appears in the list, aligning closely with real user experience. This makes it particularly suitable for comparing ranking-based approaches, such as the STAN-based reranking models we evaluate, and is consistent with evaluation protocols in prior work on neural code completion [SLH+21].

4.2. Package Selection

Based on the statistical analysis of Pharo code by Zaitsev et al. [ZDA20], we selected packages that ensure broad domain diversity, including web development, visualization, software analysis, and user-interface frameworks. We also focused on projects that demonstrated active maintenance, extensive test suites, and representative structural attributes, such as method length distributions and language feature usage patterns.

We selected Pharo packages reflecting significant diversity in domain, size, and development activity. The chosen packages span essential areas such as visualization (Roassal), software analysis (Moose), web application development (Seaside), user interface framework (Spec2), and version control systems (Iceberg). These packages were carefully selected based on their substantial adoption, active maintenance, extensive test coverage, and their use of key language features including polymorphism, reflective capabilities, and Pharo-specific syntax. This selection strategy ensures our benchmarks effectively capture common programming practices, typical complexity, and diverse usage patterns prevalent within the Pharo community. A comprehensive description of the selected packages is provided in Appendix A.

Table 1 provides a comprehensive overview of the selected Pharo frameworks evaluated in our benchmarking experiments. It includes the total number of packages, classes, defined classes, and methods analyzed in each framework. Additionally, the table presents key metrics: the ratio of internal references, the number of internal references (int), and the number of external references (ext). A higher internal-reference ratio indicates stronger intra-package cohesion, reflecting more frequent references to internal entities rather than external ones. For example, Iceberg shows a relatively high internal-reference ratio (0.35), suggesting significant internal cohesion, whereas Roassal exhibits the lowest (0.14), implying a greater reliance on external references. These metrics provide context for understanding how package structure and usage patterns influence the effectiveness of our package-aware heuristic.
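As an illustration of how such reference counts could be derived, the following Python sketch classifies each reference as internal or external. It assumes, since the text does not give the formula, that a reference is internal when the referencing method and the referenced class live in the same package, and that the ratio is int / (int + ext). The function name and the sample data are hypothetical.

```python
from typing import List, Tuple

# Each reference: (package of the referencing method, package defining the class).
Reference = Tuple[str, str]


def reference_stats(references: List[Reference]) -> Tuple[int, int, float]:
    """Return (internal count, external count, assumed internal ratio)."""
    internal = sum(1 for source, target in references if source == target)
    external = len(references) - internal
    # Assumed definition of the ratio: int / (int + ext).
    ratio = internal / len(references) if references else 0.0
    return internal, external, ratio


# Hypothetical sample of four references.
SAMPLE = [
    ("Spec2-Core", "Spec2-Core"),
    ("Spec2-Core", "Kernel"),
    ("Spec2-Core", "Spec2-Core"),
    ("Spec2-Dialogs", "Spec2-Core"),
]
```

On the sample, two references stay inside their package and two leave it, giving a ratio of 0.5 under the assumed definition.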

5. Results and Discussion

5.1. Overall Results

Our evaluation demonstrates the advantages and limitations of introducing a package-aware heuristic into Pharo’s completion engine. Overall, we observed (Table 2) an improvement in MRR, indicating that developers receive more contextually relevant completion suggestions when package structure is leveraged. The average MRR improvement across all evaluated frameworks was notable, and especially pronounced in frameworks like Spec (7.59%) and Iceberg (6.09%), which exhibit well-defined package structures with strong intra-package cohesion. However, the results also reveal nuanced behavior depending on the type of package and the framework’s architectural characteristics. For instance, the Moose framework showed modest overall gains (1.05%), which were more substantial in non-test packages (1.19%) but negligible in test packages (0.31%).

This highlights a key insight: completion accuracy gains are context-sensitive, often dependent on package dependencies and naming conventions. Test packages, for example, frequently access numerous external global variables and classes, reducing the effectiveness of a strictly package-local prioritization heuristic. The Roassal framework exhibited similar mixed outcomes. Non-test packages showed improvements (2.14%), whereas test packages experienced performance degradation (-2.62%). This negative outcome indicates that the current heuristic, which infers package relationships from naming conventions alone, may inadvertently prioritize less relevant local completions in testing scenarios, where global or cross-package dependencies are prevalent.

Interestingly, Seaside and Spec demonstrated consistent improvements across both test and non-test packages, suggesting that in frameworks with strong internal modularization and delineated package structures, the heuristic significantly enhances completion relevance across diverse coding contexts. Another critical observation is that the heuristic’s effectiveness diminishes as prefix lengths increase, with the most significant improvements consistently appearing at shorter prefix lengths (2-4 characters). This outcome aligns with real-world coding behavior, where developers rely heavily on early suggestions to reduce typing effort. However, the diminishing returns at longer prefixes suggest a reduced practical advantage once developers have provided extensive typing context. Our analysis also uncovered edge cases where similarly named global variables (e.g., IceSBBrowserAbstractMethodCommand vs. IceSBBrowseFullMethodCommand) resulted in ambiguous completions. Addressing these cases may require additional heuristics or statistical models trained on historical usage data, highlighting an avenue for future work.

[Table 2: Results per framework (Iceberg, Moose, Roassal, Seaside, Spec), reported overall and, where available, separately for test and non-test packages.]

5.2. Challenges

In the appendix 8, detailed benchmarks for each evaluated framework are provided. It is important to note that, when computing averages and deltas presented in Table 2, zero values were omitted. A zero value typically represents cases where a package contains only one class with methods that are either trivial, purely for testing, or abstract (marked by self subclassResponsibility) with concrete implementations residing outside the package. Additionally, some packages might only contain extensions without complete implementations.

The occurrence of unchanged or marginal delta values (e.g., values close to 1) in MRR is often due to methods or variables sharing lengthy prefixes, such as IceSBBrowserAbstractMethodCommand and IceSBBrowseFullMethodCommand. Given that our current benchmarking evaluates prefixes limited to 8 characters, differences beyond this length remain undetected. This represents a significant limitation of our evaluation method.

To overcome this limitation, future research should explore more sophisticated completion strategies, for instance completing only the common prefix and then progressively refining the completion. Consider an example from the Moose framework: typing the letter ’M’ could immediately propose prefixes like ’Moose...’, ’MooseMSE...’, and ’MooseMSEImporter...’. Pressing ’Tab’ after selecting a prefix would insert it directly without adding extra spaces, enabling seamless continued typing. Further keystrokes could then complete subsequent portions of the name (e.g., typing ’I’ to complete ’Importer’). Such an approach could significantly reduce typing effort, allowing a long name like MooseMSEImporterTestEntity to be entered with just a few keystrokes (’M’ then ’Tab’ completes ’MooseMSE’, then ’I’ then ’Tab’ completes ’ImporterTestEntity’). Integrating this prefix-driven completion strategy into future benchmarks will provide deeper insights into code completion effectiveness, especially in projects characterized by extensive naming conventions.
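A simplified variant of this idea, in which Tab extends the typed text to the longest prefix shared by all matching names (as in shell completion), can be sketched in Python. This is not the menu of selectable prefixes described above but a reduced form of it; the function names and the three sample class names are hypothetical.

```python
from typing import List


def longest_common_prefix(names: List[str]) -> str:
    """Longest prefix shared by every name in the list."""
    if not names:
        return ""
    # The lexicographic minimum and maximum differ earliest, so comparing
    # those two is enough to find the prefix common to all names.
    low, high = min(names), max(names)
    for index, char in enumerate(low):
        if char != high[index]:
            return low[:index]
    return low


def tab_complete(typed: str, names: List[str]) -> str:
    """Extend the typed text to the prefix shared by all matching names."""
    matches = [name for name in names if name.startswith(typed)]
    return longest_common_prefix(matches) if matches else typed


# Hypothetical class names in a Moose-like project.
NAMES = ["MooseMSEImporter", "MooseMSEImporterTestEntity", "MooseMSEParser"]
```

With these names, `M` followed by Tab yields `MooseMSE`; adding `I` and pressing Tab yields `MooseMSEImporter`, which is itself a complete name and a prefix of the longer one; adding `T` and pressing Tab yields `MooseMSEImporterTestEntity`.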

6. Limitations

A key limitation of our evaluation is its dependence on static references extracted from existing code. We simulate completion sites by truncating identifier names to 2–8 characters and then measuring whether our approach ranks the correct name near the top of the suggestion list. Although this technique is standard in code-completion research, it does not fully capture how developers behave in live sessions. Results could also differ when analyzing older repositories or external libraries that diverge from Pharo’s usual package conventions.

Additionally, an important assumption underlying our heuristics is that global variables or classes are more likely to be referenced within the same package where they are defined, rather than in other packages within the same project. This assumption justifies our strategy of prioritizing classes first from the same package, followed by "friend" packages, and finally external packages. However, this assumption has not been empirically validated, and there are multiple scenarios where it may not hold true. For instance, test packages rarely reference classes from the same package but frequently reference classes from other packages within the same project. Similarly, classes defined in core packages or packages implementing design patterns such as visitors, exceptions, or commands often exhibit fewer references within their own packages but are extensively referenced by others.

To address this potential threat to validity, future work should investigate the actual degree of self-referencing within packages. If self-referencing is found to be low, prioritizing classes from the same package could negatively impact completion accuracy, as indicated by our results in Table 2 for test packages. Thus, it might be beneficial to prioritize classes from the same project rather than strictly from the same package, potentially improving completion suggestions, particularly in cases such as test packages.

Finally, names that share very long prefixes, such as IceSBBrowserAbstractMethodCommand and IceSBBrowseFullMethodCommand, remain challenging because our current methodology considers only short prefixes. Future work should extend the analysis to longer prefixes and a broader range of code bases, thereby providing a more complete picture of completion behavior and ultimately yielding more accurate, context-aware suggestions for developers.

7. Related Works

Code completion has evolved from simple syntactic suggestions to sophisticated, context-sensitive systems powered by statistical, neural, and usability-aware methods. Early work such as Robbes et al. [RL08a] introduced context-sensitive filtering based on recent usage and program history. Shortly afterward, Bruch et al. [BMM09] and Hou et al. [HP10] emphasized syntactic similarity and ranking heuristics, relying on type hierarchies, usage popularity, and manual grouping. These heuristic and rule-based approaches established foundational techniques for example-based ranking and structural filtering still used in IDE plugins today.

Statistical Approaches. Hindle et al. [HBS+12] demonstrated that software code exhibits significant regularities that can be effectively captured using statistical language models, specifically through n-gram modeling. Statistical methods soon emerged, bringing greater accuracy through learning from large corpora. Nguyen et al. [NNNN13] introduced SLAMC, combining n-gram models with semantic roles and topic modeling. Raychev et al. contributed SLANG [RVY14], a statistical language model synthesizing code completions. Proksch et al. [PLM15] introduced Bayesian models offering compact and accurate recommendations via probabilistic reasoning over API usage patterns. Nguyen et al. [NHC+16] further advanced statistical methods with APIREC, learning fine-grained API usage patterns. Raychev et al. subsequently presented DEEP3 [RBV16], employing decision-tree-based generative models.

Learning-Based Approaches. Deep neural architectures were increasingly adopted for code completion. Bielik et al. [BRV16] proposed PHOG, a grammar-aware generative model using probabilistic higher-order rules. Jin et al. [JS18] highlighted usability concerns by addressing the hidden costs of extensive suggestion lists. Hellendoorn et al. [HPGB19] provided a significant empirical analysis revealing the limitations of models in practical intra-project completions. Svyatkovskiy et al. [SLH+21] introduced a modular neural framework for code completion, leveraging static analysis and granular token encodings to design a memory-efficient reranking model with high predictive performance. Karampatsis et al. [KBR+20] introduced open-vocabulary neural models with Byte-Pair Encoding (BPE), effectively managing out-of-vocabulary issues. Matani [Mat21] proposed an efficient segment-tree-based solution for prefix-based completion without the need for statistical training. Popov et al. [POL+21] delivered a practical, time-efficient GPT-2 variant for R code completion, achieving high accuracy within strict latency constraints. Li et al. [LHL+21] proposed benefit-cost-aware metrics to filter and reorder completions, significantly reducing irrelevant suggestions.

Recent Contributions. Recent contributions emphasize real-time usability, contextual awareness, and cross-language generalization. Bibaev et al. [BKL+22] advanced towards practical systems by training rankers using real IDE usage logs and personalizing suggestions to reduce developer keystroke effort. Takerngsaksiri et al. [TTL24] introduced PyCoder, a syntax-aware transformer-based model predicting token types without explicit AST parsing. Wang et al. [WZL+25] proposed TIGER, a generate-then-rank approach using lightweight transformers for Python type inference, demonstrating state-of-the-art performance. Modern code completion systems now integrate structural analysis, statistical learning, and deep neural modeling, consistently targeting developer productivity through usability-driven metrics. Despite these advancements, significant challenges remain, especially in handling difficult intra-project completions and minimizing cognitive load. Future developments must balance sophistication, runtime efficiency, and developer experience, increasingly turning to hybrid models and log-informed personalization.

8. Conclusion and future works

Complishon employs multiple semantic heuristics to produce context-aware completions in a live programming environment. However, it originally treated the system as a flat global space, limiting the precision of suggestions for large projects. In this paper, we proposed and evaluated a package-aware completion heuristic to mitigate this issue. Our approach prioritizes local package entities, followed by similarly prefixed or related packages, and finally falls back to the global namespace. The results show that package-aware completions improve the ranking of relevant suggestions in some cases, especially when a package’s references remain largely local. Conversely, packages with major cross-package dependencies, particularly testing packages, can perform worse with naive prefix-based ordering, highlighting the need to explicitly consider package dependencies and usage patterns.

We plan to integrate more advanced dependency-aware scoping, so that Complishon understands not just one’s immediate package but also any dependencies or related packages. We will also explore hybrid approaches that combine lightweight statistical frequency analysis with structural heuristics to handle complex referencing patterns. By evolving in this direction, we aim to make Complishon an increasingly effective code-completion system for large, modular software projects.

Acknowledgments

We thank Inria and the LLM4Code défi for the funding of the first author.

Declaration on Generative AI

The authors have not employed any Generative AI tools.

[JH25] Hangzhan Jin and Mohammad Hamdaqa. Ccci: Code completion with contextual information for complex data transfer tasks using large language models. arXiv, 2025.

[JS18] Xianhao Jin and Francisco Servant. The hidden cost of code completion: understanding the impact of the recommendation-list length on its efficiency. In International Conference on Mining Software Repositories, 2018.

[KBR+20] Rafael-Michael Karampatsis, Hlib Babii, Romain Robbes, Charles Sutton, and Andrea Janes. Big code != big vocabulary: open-vocabulary models for source code. In International Conference on Software Engineering (ICSE), 2020.

[LHL+21] Jingxuan Li, Rui Huang, Wei Li, Kai Yao, and Weiguo Tan. Toward Less Hidden Cost of Code Completion with Acceptance and Ranking Models. In International Conference on Software Maintenance and Evolution (ICSME), 2021.

[Mat21] Dhruv Matani. An $o(k \log{n})$ algorithm for prefix based ranked autocomplete. arXiv, 2021.

[MC18] Bhaskar Mitra and Nick Craswell. An introduction to neural information retrieval. Foundations and Trends in Information Retrieval, 2018.

[NDG05] Oscar Nierstrasz, Stéphane Ducasse, and Tudor Gîrba. The story of Moose: an agile reengineering environment. In Michel Wermelinger and Harald Gall, editors, Proceedings of the European Software Engineering Conference, ESEC/FSE’05, pages 1–10, New York NY, 2005. ACM Press. Invited paper.

[NHC+16] Anh Tuan Nguyen, Michael Hilton, Mihai Codoban, Hoan Anh Nguyen, Lily Mast, Eli Rademacher, Tien N. Nguyen, and Danny Dig. Api code recommendation using statistical learning from fine-grained changes. In International Symposium on Foundations of Software Engineering, 2016.

[NNNN13] Tung Thanh Nguyen, Anh Tuan Nguyen, Hoan Anh Nguyen, and Tien N Nguyen. A statistical semantic language model for source code. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, pages 532–542, 2013.

[PDO20] Guillermo Polito, Stéphane Ducasse, and Allex Oliveira. Manage your code with git and iceberg, 2020.

[PLM15] Sebastian Proksch, Johannes Lerch, and Mira Mezini. Intelligent code completion with bayesian networks. Transactions on Software Engineering and Methodology (TOSEM), 1(25), 2015.

[POL+21] Artem Popov, Dmitrii Orekhov, Denis Litvinov, Nikolay Korolev, and Gleb Morgachev. Time-efficient code completion model for the r programming language. In Workshop on Natural Language Processing for Programming, 2021.

[RBV16] Veselin Raychev, Pavol Bielik, and Martin Vechev. Probabilistic model for code with decision trees. In International Conference on Object-Oriented Programming, Systems, Languages, and Applications, 2016.

[RL08a] Romain Robbes and Michele Lanza. How program history can improve code completion. In Proceedings of ASE 2008 (23rd International Conference on Automated Software Engineering), pages 317–326, 2008.

[RL08b] Romain Robbes and Michele Lanza. SpyWare: a change-aware development toolset. In Proceedings of the 30th International Conference on Software Engineering, ICSE’08, pages 847–850, New York, NY, USA, 2008. ACM.

[RL10] Romain Robbes and Michele Lanza. Improving code completion with program history. Journal of Automated Software Engineering, 17(2):181–212, 2010.

[RVY14] Veselin Raychev, Martin Vechev, and Eran Yahav. Code completion with statistical language models. In ACM SIGPLAN Notices, volume 49, pages 419–428. ACM, 2014.

[SLH+21] Alexey Svyatkovskiy, Sebastian Lee, Anna Hadjitofi, Maik Riechert, Juliana Franco, and Miltiadis Allamanis. Fast and memory-efficient neural code completion. In International Conference on Mining Software Repositories (MSR), 2021.

[TTL24] Wannita Takerngsaksiri, Chakkrit Tantithamthavorn, and Yuan-Fang Li. Syntax-aware on-the-fly code completion. Information and Software Technology, 2024.

[WZL+25] Chong Wang, Jian Zhang, Yiling Lou, Mingwei Liu, Weisong Sun, Yang Liu, and Xin Peng. TIGER: A Generating-Then-Ranking Framework for Practical Python Type Inference. In International Conference on Software Engineering (ICSE), 2025.

[ZDA20] Oleksandr Zaitsev, Stéphane Ducasse, and Nicolas Anquetil. Characterizing pharo code: A technical report. Technical report, Inria Lille Nord Europe - Laboratoire CRIStAL Université de Lille ; Arolla, January 2020.

A. Project Descriptions

Spec. Spec is a modern user interface framework integrated into Pharo. It adopts a modular architecture centered around "presenters," which allows developers to efficiently compose, nest, and manage interactive UI elements [DHDJML24]. Spec2 facilitates dynamic interface layouts, enabling real-time modifications without the need for extensive interface rebuilds, thus enhancing adaptability and responsiveness. Additionally, it supports multiple rendering backends, including Morphic and GTK+3, providing flexibility for cross-platform application development. Spec2’s design promotes streamlined communication between components, significantly simplifying interaction handling and improving maintainability.

Roassal. Roassal is a lightweight and extensible visualization engine developed in Pharo, designed to facilitate agile and interactive data visualization [Ber16]. It provides a rich set of graphical primitives and layout algorithms, enabling developers to craft domain-specific visualizations with minimal effort. Roassal supports interactive features such as zooming, dragging, and tooltips, allowing users to explore complex data structures dynamically. Its integration with the Pharo environment allows for seamless development and immediate feedback, making it an effective tool for both exploratory data analysis and the development of custom visualization tools.

Seaside. Seaside is a web application framework for Smalltalk, particularly well-integrated with Pharo, that enables the development of complex web applications through a component-based architecture [DLR07, DRS+10]. Unlike traditional web frameworks that rely on templates, Seaside allows developers to build web pages by composing stateful components, each encapsulating its rendering and behavior. It leverages continuations to manage control flow, facilitating the creation of sophisticated user interactions and workflows. Seaside’s approach promotes code reuse and modularity, and its tight integration with the Pharo development environment allows for live debugging and real-time updates, enhancing developer productivity [BDR08].

Iceberg. Iceberg is the primary version control system (VCS) integration tool within the Pharo environment, providing a seamless interface to Git repositories. It enables developers to perform standard Git operations such as cloning, committing, branching, merging, and pushing directly from the Pharo image, eliminating the need for external command-line tools. Designed to bridge the gap between Pharo’s live object model and Git’s file-based architecture, Iceberg ensures that changes made within the Pharo environment are accurately reflected in the Git repository. This integration facilitates efficient management of code versions, supports collaborative development workflows, and simplifies the process of contributing to and maintaining Pharo-based projects [PDO20].

Moose. Moose is an open-source platform for software and data analysis, developed in Pharo, that enables analysts to construct custom analysis tools and workflows [AEH+20]. It offers services such as data importation, modeling, measurement, querying, mining, and the development of interactive visual analysis tools. Moose supports the creation of dedicated analysis tools and the customization of analysis processes. It provides mechanisms for importing and meta-modeling through a generic meta-described engine, parsing using various technologies, and visualization via graph and chart engines. The platform is primarily used for software analysis but is designed to handle various types of data. Moose is released under BSD/MIT licenses [NDG05].

B. Seaside

C. Spec

D. Roassal

E. Iceberg

F. Moose

[AEH+20] Nicolas Anquetil, Anne Etien, Mahugnon Honoré Houekpetodji, Benoît Verhaeghe, [...] International Conference on Software and Systems Reuse (ICSR'20), number 12541 in LNCS, pages 119–134, December 2020.

[BDN+09] Andrew Black, Stéphane Ducasse, Oscar Nierstrasz, Damien Pollet, Damien Cassou, [...]

[BDR08] Alexandre Bergel, Stéphane Ducasse, and Lukas Renggli. Seaside - advanced composition and control flow for dynamic web applications. ERCIM News, 72, January 2008.

[Ber16] Alexandre Bergel. Agile Visualization. LULU Press, 2016.

[BKL+22] Vitaliy Bibaev, Alexey Kalina, Vadim Lomshakov, Yaroslav Golubev, Alexander Bezzubov, [...] by learning from anonymous IDE usage logs, 2022.

[BMM09] Marcel Bruch, Martin Monperrus, and Mira Mezini. Learning from examples to improve code completion systems. In Proceedings of the 7th joint meeting of the European software engineering, pages 213–222, 2009.

[BRV16] Pavol Bielik, Veselin Raychev, and Martin Vechev. PHOG: Probabilistic model for code. In International Conference on Machine Learning, 2016.

[CNG15] Andrei Chiş, Oscar Nierstrasz, and Tudor Gîrba. Towards moldable development tools. In Workshop on Evaluation and Usability of Programming Languages and Tools, 2015.

[DHDJML24] Koen De Hondt, Stéphane Ducasse, Sebastian Jordan Montano, and Esteban Lorenzano. Application Building with Spec 2.0. Book on Demand - Keepers of the lighthouse, 2024.

[DLR07] Stéphane Ducasse, Adrian Lienhard, and Lukas Renggli. Seaside: A flexible environment for building dynamic web applications. IEEE Software, 24(5):56–63, 2007.

[DRS+10] Stéphane Ducasse, Lukas Renggli, David Shaffer, Rick Zaccone, and Michael Davies. Dynamic Web Development with Seaside. Square Bracket Associates, 2010.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.

[HBS+12] Abram Hindle, Earl T Barr, Zhendong Su, Mark Gabel, and Premkumar Devanbu. On the naturalness of software. In Software Engineering (ICSE), 2012 34th International Conference on, pages 837–847. IEEE, 2012.

[HP10] Daqing Hou and David M. Pletcher. Towards a better code completion system by [...] Recommendation Systems for Software Engineering (RSSE), 2010.

[HPGB19] Vincent J Hellendoorn, Sebastian Proksch, Harald C Gall, and Alberto Bacchelli. When code completion fails: A case study on real-world completions. In 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), pages 960–970. IEEE, 2019.