Design Principles for a High-Performance Smalltalk

Dave Mason
Toronto Metropolitan University, Toronto, Canada

Abstract
In its 40+ years of existence, there have been many implementations of Smalltalk and related languages, with many different design goals. Most have emphasized its best-in-breed development environment and its rich class library. A few have focussed more on performance and/or generating stand-alone code. The system described in this paper falls in the latter camp. This paper describes the work-in-progress and the design principles that are focussed in the short term on generating high-performance stand-alone executables from an application developed within a Pharo environment. Future goals include being able to support a live IDE based on the OpenSmalltalk clients: Pharo/Squeak/Cuis.

Keywords: interpreter, compiler, runtime environment

1. Introduction

Smalltalk was created and iterated through the 1970s[1], culminating in Smalltalk-80[2, 3]. Smalltalk-80 was the version that was commercialized along corporate pathways that culminated in products by Cincom[4] and Instantiations[5] (many other commercial versions exist, notably GemStone/S[6]). A version of Smalltalk-80 was also used as the basis for ANSI Standard INCITS 319-1998[7]. Because the commercial versions were not very accessible, many “FOSS” versions have also been produced, mostly based on the specification in the “Blue Book”[2]. The most widely available versions are Pharo[8], Squeak[9], and Cuis[10], all of which run on the OpenSmalltalk-VM (see §4.1).

There are many parts of what makes up “Smalltalk”:

1. The language is one of the simplest of programming languages, with the syntax famously “completely visible on a postcard”. There are only two kinds of statements: expressions and returns (with assignment being a kind of expression). All the control structures are semantically simply the sending of messages. A straight-forward compiler to byte-codes is quite simple to build (although many tricks are applied to get good performance).
2. The image. Most Smalltalk systems are live environments where browsers, debuggers, inspectors, and compilers are all simply ways of looking at live objects running in the system.
3. A rich library of classes, including dozens of kinds of collections, which have extremely coherent APIs because of the “duck-typing” aspect of the dynamic type system.
4. The virtual machine, which interprets the byte-codes or compiles them to native code (typically in a JIT model).

In the 2017 Stack Overflow Developer Survey[11], Smalltalk was the second most loved language. A very common experience is that people who program in Smalltalk want to program everything in Smalltalk. There are a couple of ways this can be achieved:

1. Add features to the Smalltalk environments to support the particular use case, and program/run the application within Smalltalk. One way of doing this is to use “Foreign Function Interfaces” (FFIs) to access system libraries. Another way is building systems like the extremely powerful Seaside Web Framework[12] in Smalltalk itself.
2. But sometimes there are constraints that require running code in particular environments.
Examples include:

• PharoJS - “Develop in Smalltalk, Run on Javascript”, where one develops code in the rich Pharo IDE, and then generates Javascript to run either in a web browser or a NodeJS server (see §4.4).
• GNU-Smalltalk - allows writing scripts in Smalltalk, but they load and execute as command-line scripting applications (see §4.2).
• Strongtalk - generates very high-performance native executable code, to address performance requirements (see §4.3).

Here the programmer wants to program as much as possible in Smalltalk and then export the code to another environment where it will run.

2. Design Principles

These are the design principles that we believe are relevant to a Smalltalk VM in 2022.

Large memories. Memory has become extremely inexpensive, and desktops and even smartphones have gigabytes of main memory. Caches are critical, although they remain difficult to optimize for; nevertheless, having a thread's heap and stack fit comfortably within any per-core L1 cache is an obvious goal. Large datasets are a significant part of modern computing, so memory management must be tuned so these can be accessed and released without causing memory bloat.

64-bit and IEEE-754. 64-bit processors and IEEE-754 floating point are becoming ubiquitous. This makes parametric polymorphism (i.e. having all parameters be the same size) an obvious thing to do. Floating point is becoming more important, so we want to avoid allocating floating-point values on the heap. Fortunately, NaN-boxing as described in §3.1 works well.

Multi-core and threading. Processors are not likely to get appreciably faster any time soon. The only way to continue to get speedup for applications is to exploit multi-core architectures. This means efficient support for computational threads on separate cores without any global interpreter lock. It is also critical to have a parallel garbage collector that can run with minimal interaction among threads.

Fast execution. For a dynamically-typed, late-binding language like Smalltalk to be taken seriously for many applications, it must have good performance. Part of this is having fast method dispatch, to minimize the cost of that late binding. The other part is to have largish methods to allow optimizers to perform well.

3. Zag Smalltalk

The rest of this paper describes the Zag Smalltalk system. The current goal is to generate code from a Pharo-written application that can load into a runtime, be compiled as a stand-alone application, and run. The runtime and the generated code are in the Zig language[13]. Zag is intended to run on modern, multi-core, 64-bit architectures. There are many interesting details about the interpreter and run-time, but the rest of this paper will focus on important principles that we believe make this system unique and will lead to excellent performance.

3.1. Immediate Values

Immediate values are instances of classes that have constrained or no instance variables or indexable fields - some of which may have singleton values. Examples are SmallInteger, Double, Boolean, and Character. Since we are assuming all values are 64-bit values, it is natural to consider if there are ways to encode all values in 64 bits and to minimize the objects that have to be allocated in memory. Everything that can be coded as an immediate value is something that doesn't have to be garbage collected. It turns out that the IEEE floating point format - which is ubiquitous - has a lot of holes called Not-A-Number or NaN values.
Figure 1 shows the IEEE-754[14] 64-bit binary floating point number format. All the values where the exponent is 0x7FF are considered NaN (except for one positive and one negative infinity value), and these can be used for any purpose other than as valid floating point. This is a technique called NaN-boxing[15, 16].

Figure 1: Bit pattern for IEEE-754 64-bit floating-point numbers

There are two of these NaN ranges - positive and negative - however using just the negative range gives us a lot of flexibility, and improves the speed with which we can recognize the class of an immediate value. Table 1 shows the allocated ranges. All the non-float values are coded within the negative NaN range - i.e. where the top 12 bits are 0xFFF. The next 4 bits being 6 denote a reference to a heap-allocated object header, explained in §3.4. The next 4 bits being 7 represent general classes - these could be used for any class that had 32-bit unique hash values. The next 4 bits being 8-15 denote SmallInteger values.

Discovering the class of an immediate value is easy and efficient, considering the value as an unsigned 64-bit (u64) value:

1. if it is greater than or equal to SmallInteger minVal, it's a SmallInteger (class 2);
2. if it is less than or equal to the -inf value, it's a Double (class 3);
3. if it is less than the highest heap object address, it's a heap object and we need to look at the header to determine the class;
4. otherwise, extract bits 32-47 and that is the class number.

These are done in this order because SmallIntegers are the most common values; a sketch of this decoding appears at the end of this section.

Table 1: Mapping of IEEE 64-bit floats to Smalltalk Immediate Values

S+E        F    F    F     Type
0000       0000 0000 0000  double +0
0000-7FEF  xxxx xxxx xxxx  double (positive)
7FF0       0000 0000 0000  +inf
7FF0-F     xxxx xxxx xxxx  NaN (unused)
8000       0000 0000 0000  double -0
8000-FFEF  xxxx xxxx xxxx  double (negative)
FFF0       0000 0000 0000  -inf
FFF0-5     xxxx xxxx xxxx  NaN (currently unused)
FFF6       xxxx xxxx xxxx  heap object
FFF7       0001 xxxx xxxx  reserved (tag = Object)
FFF7       0002 xxxx xxxx  reserved (tag = SmallInteger)
FFF7       0003 xxxx xxxx  reserved (tag = Double)
FFF7       0004 0001 0000  False
FFF7       0005 0010 0001  True
FFF7       0006 0100 0002  UndefinedObject
FFF7       0007 aaxx xxxx  Symbol
FFF7       0008 00xx xxxx  Character
FFF8-F     xxxx xxxx xxxx  SmallInteger
FFF8       0000 0000 0000  SmallInteger minVal
FFFC       0000 0000 0000  SmallInteger 0
FFFF       FFFF FFFF FFFF  SmallInteger maxVal

Singleton Values. The encodings of nil, false, and true are the sole representatives of their respective classes. Similar encoding could be used for any other classes with singleton values, or indeed any with only a 32-bit payload.

Symbols. Symbols are encoded very efficiently so that method dispatch can be as fast as possible. The low 24 bits of the immediate value encode the symbol number, which can be used to access the string representation of the symbol. The next 8 bits encode the arity of the symbol so that operations like perform: can execute without having to access the string representation. Together, the low 32 bits of the symbol constitute its hash value and method dispatch uses that directly - see §3.5.

Character. All possible Unicode characters are encoded in the hash value of Character immediate objects.

Heap Objects. The low 48 bits encode the memory address of a heap object header. To convert an immediate heap value into the address, a simple sign extension of the 48 bits gives a full address for today's commodity hardware. Another 3 bits are available if/when a larger range is required, because every heap object is on at least an 8-byte boundary, so the low 3 bits are 0.

SmallInteger. This gives 51-bit SmallIntegers (less than the 61-bit SmallIntegers in the OpenSmalltalk-VM, but still a very large range). Converting between tagged SmallIntegers and untagged integers is a simple matter of adding or subtracting the SmallInteger 0 value. Having SmallIntegers organized this way provides many efficiencies:

• all of the comparison operations between two SmallIntegers work naturally (i.e. without having to convert to normal integers);
• adding/subtracting a normal (in-range) integer to/from a SmallInteger works naturally (detect under/overflow if the result is less than SmallInteger minVal);
• adding/subtracting a small normal integer constant (like +/- 1) to/from a SmallInteger that we know is moderate in size doesn't need to be under/overflow checked because the range of SmallInteger is so large;
• for immediate values basicIdentityHash will just be the value or'ed with SmallInteger 0, which will turn any value into a positive SmallInteger;
• or, xor, etc. with positive normal integers will work naturally;
• or, and with tagged positive integers will work naturally.
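To make the encodings concrete, the following is a minimal sketch in Zig of the class-decoding order and of arithmetic on tagged SmallIntegers. The constant and function names are illustrative assumptions, not Zag's actual implementation; only the hexadecimal boundaries are taken from Table 1.

const std = @import("std");

// Boundaries taken from Table 1 (names are illustrative).
const SMALLINT_MIN: u64 = 0xFFF8_0000_0000_0000; // SmallInteger minVal
const SMALLINT_ZERO: u64 = 0xFFFC_0000_0000_0000; // SmallInteger 0
const SMALLINT_MAX: u64 = 0xFFFF_FFFF_FFFF_FFFF; // SmallInteger maxVal
const NEG_INF: u64 = 0xFFF0_0000_0000_0000; // -inf: anything at or below this is a Double
const HEAP_LIMIT: u64 = 0xFFF7_0000_0000_0000; // heap references occupy the FFF6 range

// Class determination, tested in the order given above (SmallIntegers first).
fn classOf(value: u64) u64 {
    if (value >= SMALLINT_MIN) return 2; // SmallInteger
    if (value <= NEG_INF) return 3; // Double: any ordinary float, +/-0 or +/-inf
    if (value < HEAP_LIMIT) return 0; // placeholder meaning "consult the object header"
    return (value >> 32) & 0xFFFF; // FFF7 range: class number is in bits 32-47
}

// Adding two tagged SmallIntegers: subtract one SmallInteger 0 so the tag is
// not counted twice, then detect under/overflow as "result below minVal".
fn addSmallIntegers(a: u64, b: u64) ?u64 {
    const sum = a +% b -% SMALLINT_ZERO; // wrapping ops; overflow caught below
    if (sum < SMALLINT_MIN) return null; // caller falls back to LargeInteger
    return sum;
}

test "NaN-boxed decoding and SmallInteger addition" {
    try std.testing.expectEqual(@as(u64, 3), classOf(0)); // double +0
    try std.testing.expectEqual(@as(u64, 2), classOf(SMALLINT_ZERO)); // SmallInteger 0
    try std.testing.expectEqual(@as(u64, 7), classOf(0xFFF7_0007_0000_0042)); // a Symbol
    const three = addSmallIntegers(SMALLINT_ZERO + 1, SMALLINT_ZERO + 2).?;
    try std.testing.expectEqual(SMALLINT_ZERO + 3, three);
    try std.testing.expect(addSmallIntegers(SMALLINT_MAX, SMALLINT_ZERO + 1) == null);
}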
3.2. Multi-core Support

As is well known, per-processor speed is no longer advancing significantly, so all performance advances in the future are expected to be from using multi-core architectures to their limits. To the best of our knowledge, most Smalltalk systems continue to only support cooperative process context switching. Since we are looking for very high performance, we are structuring our system from the outset with support for multiple operating-system threads. This means that user-written code will have to use synchronization primitives to control access to shared data. However, there is very little additional synchronization required by the runtime infrastructure, beyond memory allocation described in §3.3. Only the interning of a new symbol, or manipulation of the dispatch tables described in §3.5, requires any other kind of global lock.

There are three kinds of operating system threads used by the system: mutator, collector, and I/O.

Mutator threads. These are the main execution threads. Each mutator thread has its own stack and local heap. The heap is organized as a small nursery arena where most allocations take place (currently about 3k objects). The stack is at the top of this area and grows down toward the nursery heap. Each thread has a lock, and the thread will block on this lock when interacting with other threads. The thread checks at opportune times during execution to see if another thread wants to interact. A compute-bound job would typically allocate as many threads as it could use, up to one less than the number of CPU cores available.

Global Collector thread. The global collector periodically performs a mark-sweep collection on the global arena. It interacts with the mutator threads to determine the roots for collection, as described in §3.3.

Input/Output threads. Any blocking operations with the operating system will be done by I/O threads, because they do not have to interact with the global collector. If a mutator thread needs a blocking operation, it requests the appropriate I/O thread to do the operation, leaving a reference to itself, and then blocks on its lock. When the I/O thread has completed the work, it indicates this to the mutator thread and wakes it up.

3.3. Heap Allocation

As mentioned in §3.2, this system is designed to support multiple threads, with minimal interference. Since these arenas are unique to the thread, there is no interference or interaction with other threads, so allocation is simply checking for overflow, then storing the values and advancing the heap pointer (a sketch follows below). If at any point allocation for the heap or stack would cause those pointers to cross, the live data is copied to the current one of two somewhat larger thread-local intermediate ("teen") arenas (about 9k objects), the stack area will be adjusted for any forwarded objects, and some of the contexts on the stack could be moved to the teen heap if the stack is getting too big. Live data is copied back and forth between these teen arenas until they become too full or an object has been copied 8 times, at which point older data is copied to the global arena.
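As a rough illustration of the nursery discipline just described (stack at the top growing down, heap growing up, and a collection triggered when they would cross), the following sketch shows the allocation fast path in Zig. The Arena structure and its names are assumptions for illustration, not Zag's actual allocator.

const std = @import("std");

// Illustrative per-thread arena: the stack occupies the top of the area and
// grows down; the nursery heap grows up from the bottom toward the stack.
const Arena = struct {
    memory: []u64, // the whole thread-local area, in 64-bit words
    heapTop: usize = 0, // index of the next free heap word (grows up)
    stackBottom: usize, // index of the lowest word in use by the stack (grows down)

    // Returns space for the new object, or null if the heap and stack would
    // cross, in which case the caller collects into a teen arena first.
    fn allocate(self: *Arena, words: usize) ?[]u64 {
        if (self.heapTop + words > self.stackBottom) return null;
        const slot = self.memory[self.heapTop .. self.heapTop + words];
        self.heapTop += words;
        return slot;
    }
};

test "nursery allocation is a bounds check and a pointer bump" {
    var backing = [_]u64{0} ** 16;
    var arena = Arena{ .memory = backing[0..], .stackBottom = 12 }; // words 12..15 hold the stack
    try std.testing.expect(arena.allocate(8) != null); // fits below the stack
    try std.testing.expect(arena.allocate(8) == null); // would cross: collect first
}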
The global arena is a non-moving mark-and-sweep arena, and a thread is dedicated to periodically collecting this arena. There are no pointers from the global arena to any of the per-thread arenas. The global arena uses a similar structure to Mist[17], which is to say it maintains a set of linked lists of objects of particular sizes. Instead of the binary-sized blocks that Mist uses, Zag uses fibonacci-sized blocks. For example, if we need a block for an object of size 15 we would look in the list for the next fibonacci number (21), and if not found, we'd see if the next-sized list had any, and so on up. If we find a larger block, we split it into the two smaller blocks and put them in the appropriate linked lists. If we don't find a larger block, we request a new block of memory from the operating system, allocate it into the appropriate lists and then search again. So in our example we would take a block of size 21, allocate the first 15 to the object, put the next 5 into the 5-list, and leave 1 word unusable.
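A small sketch of the size-class arithmetic behind these Fibonacci free lists follows. It only mirrors the worked example above; the names, the number of size classes, and the policy of splitting off a single Fibonacci-sized remainder are illustrative assumptions rather than Zag's actual allocator.

const std = @import("std");

// Fibonacci size classes for the free lists, in words (illustrative cut-off).
const sizeClasses = [_]usize{ 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597 };

// Index of the smallest size class that can hold the requested number of words.
fn classFor(words: usize) usize {
    var i: usize = 0;
    while (i < sizeClasses.len) : (i += 1) {
        if (sizeClasses[i] >= words) return i;
    }
    unreachable; // larger requests get their own block of pages
}

// Splitting a block: the object takes the front, the largest Fibonacci-sized
// piece of the remainder goes back on its list, and anything left is unusable.
fn split(blockWords: usize, objectWords: usize) struct { toList: usize, waste: usize } {
    const rest = blockWords - objectWords;
    var toList: usize = 0;
    var i: usize = sizeClasses.len;
    while (i > 0) {
        i -= 1;
        if (sizeClasses[i] <= rest) {
            toList = sizeClasses[i];
            break;
        }
    }
    return .{ .toList = toList, .waste = rest - toList };
}

test "an object of size 15 is served from the 21-word list" {
    try std.testing.expectEqual(@as(usize, 21), sizeClasses[classFor(15)]);
    const parts = split(21, 15);
    try std.testing.expectEqual(@as(usize, 5), parts.toList); // goes on the 5-list
    try std.testing.expectEqual(@as(usize, 1), parts.waste); // 1 word unusable
}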
For objects with arrays larger than about 2k words (16KiB), the array portion will be allocated its own block of pages, and the object will have a remote reference to the block. The advantage of this is that when the object goes away, we can release its block of pages back to the operating system and reduce the memory footprint. If the large data were to be allocated in the object itself, odds are it would never be possible to release the memory because it would get intermingled with small objects.

Mutator/Collector thread interactions. At the start of the mark phase, the global arena collector first scans known global structures for roots: the class table, the dispatch table, the symbol table, and the thread table. Then it iterates to collect roots from the mutator threads:

1. set a flag in each mutator thread to say it wants roots;
2. when the mutator notices, it will do a collection and then find the first 100 global objects referenced by the stack/heap; then, if there are more global references, it will block;
3. the collector looks for mutators that have provided their global objects, and marks all of them (or-ing 1 into the age field), then wakes up the thread if it has blocked;
4. when all the mutators have provided their root global references, the mark phase is complete.

While the mark phase is proceeding, allocations can still be made; they are simply allocated as marked. During the sweep phase, allocations can be made if there are appropriately-sized blocks available, but a mutator thread would block rather than request a new block from the operating system.

The sweep phase then goes through memory accumulating blocks of unmarked memory. If any of the components of that unused memory include references to indirect blocks, those indirect blocks are put into a list for release. Each accumulated block of memory discovered is then parcelled out to the appropriate fibonacci-size lists. Each marked allocation has the mark cleared. Once the sweep phase is complete, allocations are fully enabled and any blocked mutator threads are awakened. All the indirect blocks that were discovered to be unused are returned to the operating system. Then the collector pauses for a short period and starts the cycle over again.

3.4. Heap Objects

We are inspired by some of the basic ideas from the SPUR[18] encoding for objects on the heap, used by the OpenSmalltalk VM (see §4.1). First we have the object format tag. The bits encode the following:

• bits 0-4: encode indexable fields
  – 0: no indexable fields
  – 1: 64-bit indexable, no pointers - native words (DoubleWordArray, DoubleArray) or non-pointer Objects (Array)
  – 2-3: 32-bit indexable - low bit encodes unused half-words at end (WordArray, IntegerArray, FloatArray, WideString)
  – 4-7: 16-bit indexable - low 2 bits encode unused quarter-words at end (DoubleByteArray)
  – 8-15: byte indexable - low 3 bits encode unused bytes at end (ByteArray, String)
  – 17: 64-bit indexable with some pointers (Array)
• bits 5-6: encode instance variables
  – 0: no instance variables
  – 32: instance variables - no pointers
  – 64: instance variables - pointers
  – 96: weak - implies instance variables with pointers, even if there aren't any, because weak values are rare and they only exist to hold pointers
• bit 7: = 1 says the value is immutable

Therefore, only the following values currently have meaning:

• 32, 64: non-indexable objects with inst vars (Association et al.)
• 1-17: indexable objects with no inst vars
• 33-49, 65-81: indexable objects with inst vars (MethodContext, AdditionalMethodState, et al.)
• 96: weak non-indexable objects with inst vars (Ephemeron)
• 97-113: weak indexable objects with inst vars (WeakArray et al.)

Note that we differentiate objects that contain no pointers, either in the instance variables or the indexable values. This means that if the format anded with 80 is 0, there are no pointers, and the garbage collector can skip over the fields without having to scan for pointers (and if it does scan and finds no pointers, it changes the format to say there are no pointers).
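The pointer-scanning shortcut can be expressed as simple bit tests on the format byte. This sketch assumes the bit assignments listed above; the function names are illustrative rather than Zag's actual API.

const std = @import("std");

fn isImmutable(format: u8) bool {
    return format & 0x80 != 0; // bit 7
}

fn isWeak(format: u8) bool {
    return format & 96 == 96; // both instance-variable bits set
}

fn hasNoPointers(format: u8) bool {
    // 80 = 64 (instance variables with pointers) + 16 (set only by the
    // indexable-with-pointers format, 17), so "format anded with 80 = 0"
    // means the collector can skip scanning this object's fields.
    return format & 80 == 0;
}

test "format-byte predicates" {
    try std.testing.expect(hasNoPointers(8)); // byte-indexable, e.g. a String
    try std.testing.expect(!hasNoPointers(17)); // 64-bit indexable with some pointers
    try std.testing.expect(!hasNoPointers(64)); // instance variables with pointers
    try std.testing.expect(isWeak(97)); // a weak indexable object
    try std.testing.expect(isImmutable(0x81)); // immutable variant of format 1
}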
Heap objects are initially created as their pointer-free version. If a pointer is being stored in an object:

• if the object is immutable, throw an exception;
• if storing into the indexable part of a format 2-15 object, throw an exception;
• if the object has a pointer-free format, update it to pointer-containing;
• if the object is on the global heap, then the pointed-to object must be promoted to the global arena, recursively.

If there are both instVars and indexable fields, the length field is the number of instVars; the instVars are followed by a word containing the size of the indexable portion, which follows that word. Weak objects are rare enough that we don't bother to handle cases with no instance variables separately. If there aren't both instVars and indexable fields, the size is determined by the length field. The only difference between instVars and indexables is whether at:, size, etc. should work or give an error.

If the array length is >= 4094 (whether in the length field or the additional size word), the values are indirect, and the object will simply contain a size, the address of the indirect block, and an entry in the linked list of indirect objects. This can only occur in the global arena, so any such large objects are allocated immediately in the global arena. Table 2 describes the header word for an object.

Table 2: Object header layout

Bits  What          Characteristics
12    length        number of long-words beyond the header
4     age           0 - nursery, 1-7 teen, 8+ global
8     format        see above
24    identityHash
16    classIndex    LSB

If the length field is 4095, then this is a forwarding pointer, and the low 48 bits are the address of the real object. This can occur for several reasons:

• during mutator arena copying collection, when an object is copied, a forwarding pointer is left behind so other references to the same object can be updated properly;
• if a value is promoted from a mutator arena to the global arena, a forwarding pointer is left behind;
• if a become: exchanges two objects, the forwarding pointer will point to an exchange object.

3.4.1. Some Particular Heap Objects

Contexts. Contexts are allocated on the stack, but may be promoted to the teen arena, for example if thisContext is referenced, or if the stack becomes too large.

Strings. Strings are stored as UTF-8 sequences or as ASCII sequences. If a String contains any non-ASCII sequence, it must be scanned to determine its size or to index it, and it is immutable.

3.5. Unified Dispatch

One of the things that is expected to most significantly improve performance is the unified - or single-level - dispatch. Every class will have the full set of methods that it has been asked to respond to. When a message is sent to an object, the hash value for the selector symbol will be used to hash into the dispatch table. This is analogous to how dispatch works in a statically-typed language like Java. In Java, a method call is associated with a direct integer index into the dispatch or v-table. For a variety of reasons, this is not possible for a language like Smalltalk. However, it can be approximated with a hash from the selector to a corresponding entry in the v-table. The v-table is chosen to be large enough to create a "perfect" hash - one with no conflicts (though there may be gaps). If the selector is not found in the table, the method will be searched for in the class and its super-classes; if found, it will be compiled and added to the dispatch table. This means that once stabilized, all message dispatches will require a single hash to access the method.
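A hedged sketch of what such a per-class dispatch table might look like follows. The step that sizes the table so the hash is conflict-free is omitted, and the structure and names are illustrative assumptions rather than Zag's actual implementation.

const std = @import("std");

const CompiledMethod = struct {
    selector: u64, // the Symbol immediate; its low 32 bits are the hash (§3.1)
    // ... code pointer, literals, etc. would follow
};

const Dispatch = struct {
    // One table per class, sized so every selector the class has been asked
    // to respond to hashes to a distinct slot (a "perfect" hash with gaps).
    methods: []const ?*const CompiledMethod,

    fn lookup(self: Dispatch, selector: u64) ?*const CompiledMethod {
        const slot = (selector & 0xFFFF_FFFF) % self.methods.len; // single hash
        const found = self.methods[slot] orelse return null;
        if (found.selector == selector) return found;
        return null; // miss: search the class chain, compile, install, retry
    }
};

test "dispatch is a single hash and compare" {
    const m = CompiledMethod{ .selector = 0xFFF7_0007_0000_0042 }; // some Symbol
    var table = [_]?*const CompiledMethod{null} ** 8;
    table[(m.selector & 0xFFFF_FFFF) % table.len] = &m;
    const d = Dispatch{ .methods = table[0..] };
    try std.testing.expect(d.lookup(m.selector).? == &m);
    try std.testing.expect(d.lookup(0xFFF7_0007_0000_0043) == null); // different Symbol
}

On a miss, the dispatch machinery falls back to the conventional class-chain search and then compiles and installs the method, so the stabilized fast path remains a single hash and compare.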
3.6. High-Performance Inlining

When a method is being generated, messages sent to self, as well as methods with few versions, can be inlined. This will have a similar, but much more significant and principled, effect to the current inlining of special methods like ifTrue:ifFalse:. This is a fundamental requirement for high-performance compilation. Contrast this with traditional Smalltalk code with only a few message sends per method - this is great for developers, but death to optimizers.

For example, a method containing collect:thenSelect: where we know the class of the receiver would inline that method, replacing the block parameters with the actual block code. Then the to:do: would be inlined, which contains a whileTrue: that would be inlined as a loop (because it is a recursive definition). Then the parameters to the blocks would be inlined. All the references to self, including class, size and at:, would be inlined. With some dead-code elimination and recognition that the numbers can all be converted to native values and that the index passed to at: is guaranteed to be in the valid range, this becomes a loop very close to the most efficient possible loop.

3.7. Code Generation

We are pursuing two approaches to code generation.

Threaded execution. To support the full reflective model of execution, but with better performance than is available from a traditional interpreter, the interpretive model uses threaded execution. Threaded execution codes a method as a sequence of function addresses, each of which calls on to the next. This is quite convenient to do in Zig[13] as it has explicit support for tail calls. Figure 2 shows an example of a threaded function.

Figure 2: Zig code for the pushConst primitive

pub fn pushConst(pc: [*]const Code, tos: [*]Object, heap: [*]Object, thread: *Thread, caller: Context) Object {
    checkSpace(pc, tos, heap, thread, caller, 1);
    const newTos = tos - 1;
    newTos[0] = pc[0].object;
    return @call(tailCall, pc[1].prim, .{ pc + 2, newTos, heap, thread, caller });
}

Each function in a Code block has the same parameters: a pointer to the next "instruction", stack and heap pointers, a reference to the current Thread, and the caller's context. This function copies the next object from the code to the top of the stack, and then passes control to the next function ("prim"), passing the advanced code pointer and modified stack pointer, as well as the rest of the parameters. Figure 3 shows what a sequence of threaded code might look like. This pushes two objects (3 and 4) on the stack and then does primitive 110 (==), and we expect false as the result. This model can be easily single-stepped and is amenable to other tools.

Figure 3: A trivial sequence of threaded code

p.pushConst, 3,
p.pushConst, 4,
p.p110,
return_tos,

Exported Zig code. The other way we are currently exporting code is as Zig programs that can be compiled and linked against the runtime to produce stand-alone executable programs. This is currently only useful for benchmarking and exploring what a JIT could be expected to produce.

4. Related Work

4.1. OpenSmalltalk-VM [19, 20]

OpenSmalltalk-VM is a virtual machine (VM) for languages in the Smalltalk family (e.g. Squeak, Pharo) which is itself written in a subset of Smalltalk that can easily be translated to C. Development is done in Smalltalk. The production VM is derived by translating the core VM code to C. This is the VM that underlies the Pharo, Squeak, Cuis, and Newspeak systems. It is a high-quality VM implementation including a JIT compiler, but doesn't attain the performance of a similar model such as the V8 JavaScript engine used by NodeJS. It also has significant dependencies on a single hardware execution thread.

4.2. GNU-Smalltalk [21]

GNU Smalltalk inspired the overall structure of the heap for Zag. Historically it has run in a strictly stand-alone/scripting mode, but an IDE has become available.

4.3. Strongtalk [22]

Strongtalk was a very high-performance Smalltalk system that included partial typing. This type information was part of what made it so fast. Unfortunately, the project was abandoned.

4.4. PharoJS [23]

PharoJS inspired some of the ideas here, such as generating a stand-alone module (in the PharoJS case, to run on a web browser), as well as the mechanism for bringing in all the necessary classes and methods.

4.5. Mist [17]

Mist inspired important aspects of the global arena of the heap. Unfortunately, it appears to have been abandoned.
4.6. GildaVM [24]

GildaVM explores some interesting approaches to minimize the effect of a Global Interpreter Lock (GIL) as a way to bring multi-threading to OpenSmalltalk (see §4.1). This is to offset the known problems of the GIL that have been well documented in the Python world.

5. Status and Future Work

The current goal is to generate code from a Pharo-written application that can load into a runtime, be compiled as a stand-alone application, and run. We can currently run trivial, hand-compiled programs. Once the code generator is working, there are a variety of experiments to run, including determining how significant the unified dispatch and inlining are in affecting performance. We are very interested in benchmarking against Strongtalk. In the longer term we intend to do JIT code generation and be able to add new methods to a dispatch table dynamically. Then we will be working on generating a fully-functional system using one of the open-source Smalltalk IDEs. Another avenue is integrating a type-inference system so that the inliner can generate many more opportunities for optimization.

References

[1] D. Ingalls, The evolution of Smalltalk: From Smalltalk-72 through Squeak, Proc. ACM Program. Lang. 4 (2020). URL: https://doi.org/10.1145/3386335. doi:10.1145/3386335.
[2] A. Goldberg, D. Robson, Smalltalk-80: The Language and its Implementation, Addison-Wesley, Don Mills, Ontario, 1983. URL: https://rmod-files.lille.inria.fr/FreeBooks/BlueBook/Bluebook.pdf.
[3] G. Krasner, Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley Longman Publishing Co., Inc., Don Mills, Ontario, 1983. URL: https://rmod-files.lille.inria.fr/FreeBooks/BitsOfHistory/BitsOfHistory.pdf.
[4] Cincom, Cincom (VW) Smalltalk, Accessed 2022-06-01. URL: https://www.cincomsmalltalk.com/.
[5] Instantiations, Instantiations (VAST) Smalltalk, Accessed 2022-06-01. URL: https://www.instantiations.com/.
[6] Gemtalk, GemStone/S, Accessed 2022-06-01. URL: https://gemtalksystems.com/.
[7] INCITS, ANSI Smalltalk Standard, 1998. URL: https://webstore.ansi.org/Standards/INCITS/INCITS3191998S2012.
[8] Pharo, Pharo Smalltalk, Accessed 2022-06-01. URL: https://pharo.org/.
[9] Squeak, Squeak/Smalltalk, Accessed 2022-06-01. URL: https://squeak.org/.
[10] Cuis, Cuis Smalltalk, Accessed 2022-06-01. URL: http://cuis-smalltalk.org/.
[11] StackOverflow, 2017 most loved languages, 2017. URL: https://insights.stackoverflow.com/survey/2017#most-loved-dreaded-and-wanted.
[12] Seaside, Seaside web framework, Accessed 2022-06-01. URL: https://github.com/seasidest/seaside.
[13] Z. Foundation, Zig is a general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software, Accessed 2022-06-01. URL: https://ziglang.org.
[14] Wikipedia, IEEE-754, Accessed 2022-06-01. URL: https://en.wikipedia.org/wiki/IEEE_754.
[15] P. Duperas, NaN boxing or how to make the world dynamic, Accessed 2022-06-01. URL: https://piotrduperas.com/posts/nan-boxing.
[16] R. Nystrom, NaN boxing, Accessed 2022-06-01. URL: https://craftinginterpreters.com/optimization.html#nan-boxing.
[17] M. McClure, Mist Smalltalk, Accessed 2022-08-11. URL: https://mist-project.org/.
[18] E. Miranda, A Spur gear for Cog, Accessed 2022-06-01. URL: http://www.mirandabanda.org/cogblog/2013/09/05/a-spur-gear-for-cog/.
[19] E. Miranda, C. Béra, E. G. Boix, D. Ingalls, Two decades of Smalltalk VM development: Live VM development through simulation tools, in: Proceedings of the 10th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages, VMIL 2018, Association for Computing Machinery, New York, NY, USA, 2018, p. 57–66. URL: https://doi.org/10.1145/3281287.3281295. doi:10.1145/3281287.3281295.
[20] E. Miranda, C. Béra, OpenSmalltalk VM on GitHub, Accessed 2022-06-01. URL: https://github.com/OpenSmalltalk.
[21] G. S. Foundation, GNU Smalltalk, Accessed 2022-06-01. URL: https://www.gnu.org/software/smalltalk/.
[22] G. Bracha, D. Griswold, Strongtalk: Typechecking Smalltalk in a production environment, in: Proceedings of the Eighth Annual Conference on Object-Oriented Programming Systems, Languages, and Applications, OOPSLA '93, Association for Computing Machinery, New York, NY, USA, 1993, p. 215–230. URL: https://doi.org/10.1145/165854.165893. doi:10.1145/165854.165893.
[23] N. Bouraqadi, D. Mason, PharoJS, Accessed 2022-06-01. URL: https://pharojs.org.
[24] G. Polito, P. Tesone, E. Miranda, D. Simmons, GildaVM: a non-blocking I/O architecture for the Cog VM, in: Proceedings of the 14th Edition of the International Workshop on Smalltalk Technologies, IWST '19, 2019.