<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Memory Considerations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Toronto Metropolitan University, Toronto, Canada</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In dynamically typed programming languages, variables and expressions don't have types associated with them; rather, values have types. This means that choosing the correct machine instructions to operate on the data requires run-time dispatch on the types associated with the values. The simplest encoding is to simply store every object in a memory heap, where they are tagged with their type. This puts tremendous pressure on the memory system and the memory allocation system. The most complex encoding will put everything possible into immediate values to minimize this pressure. In this paper we outline the relative costs and needs of the various models.</p>
      </abstract>
      <kwd-group>
        <kwd>Dynamically-Typed Languages</kwd>
        <kwd>Runtime Systems</kwd>
        <kwd>Value Encoding</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption><p>Access times for the memory hierarchy.</p></caption>
        <table>
          <thead>
            <tr><th>Storage Type</th><th>Real Time Scale</th></tr>
          </thead>
          <tbody>
            <tr><td>Single CPU Instruction (at 3 GHz)</td><td>0.3 nSec</td></tr>
            <tr><td>Registers (storage for active instructions)</td><td>0.3 to 1 nSec</td></tr>
            <tr><td>Memory Caches</td><td>0.7 to 4 nSec</td></tr>
            <tr><td>Main System Memory (RAM)</td><td>10 to 20 nSec</td></tr>
            <tr><td>NVMe SSD</td><td>100 to 200 uSec</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Modern processors fetch instructions, decode them, and put them in the execution pipeline. This
means that unconditional branches and calls to fixed addresses have almost no cost, as long as the
pipeline remains relatively full. Similarly, conditional branches are relatively inexpensive, because of
branch prediction hardware and sometimes decoding along more than one potential execution path.
Indirect branches, however, are expensive, because the fetch-decode logic can only start adding to the
pipeline after the target address is available. Pipelines can execute instructions out-of-order and sometimes in
parallel, as long as dependencies are observed.</p>
      <p>There are many options for dynamically-typed language implementation, from bytecode
interpreters to threaded execution to JIT'ed native code. However, the encoding choices will affect all of the
implementations, to varying degrees.</p>
      <p>Paper outline. First, in section 2 we catalog the considerations that affect performance for a given
encoding. Section 3 discusses six possible encodings from the perspective of those considerations. Section 4
describes an experiment to compare those encodings. Finally, section 5 draws some conclusions and
identifies aspects that are beyond the scope of this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Encoding Considerations</title>
      <p>Encoding values, particularly for modern architectures, has three considerations:</p>
      <p>Determining types. Since operations must be parameterized at run time by the types of the values,
the first step is to access the types of the values. If the types are in memory, this can cause a
pipeline stall until they can be loaded.</p>
      <p>Accessing values. In order to perform operations, the values must be made available to the CPU. As
seen in Table 1, accessing values in memory is an order of magnitude slower than accessing them
in registers, and even if the values happen to be in cache they will be several times slower than in
registers.</p>
      <p>Supporting Memory Management. Dynamically-typed languages invariably have an automatic
memory management system, whether reference counting or garbage collection. Objects need to
be allocated, and removed when no longer needed. While memory management has improved
greatly over the decades, there remains a significant cost to both actions, and the more objects
allocated in memory, the greater the cost. We use the term churn to refer to the allocation and
subsequent collecting of in-memory objects.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Catalog of Encodings</title>
      <p>In an Object-Oriented language, objects can be categorized as mutable or immutable. Mutable objects
must be allocated in memory and have references passed, so that when the object is mutated the change is seen
by all other references. These allocations are typically on a heap and have a header word. Without
loss of generality, we will use 64-bit words and Zag encoding throughout this paper. For example, the
Zag memory object looks like:
n n n n n n n n n n n n a a a a f f f f f f f f h h h h h h h h h h h h h h h h h h h h h h h h c c c c c c c c c c c c c c c c
fields · · ·</p>
      <p>Here, n is the 12-bit number of words the object occupies; a is the 4-bit age of the object; f is the 8-bit
format of the object; h is the 24-bit identity hash; and c is the 16-bit class of the object. It is followed by
the fields of the object. The reference to the object is the address of the header word.</p>
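<p>As an illustrative model (the field names are descriptive, not taken from the Zag source), the header word described above can be packed and unpacked with plain shifts and masks:</p>

```python
# Illustrative model of the 64-bit Zag header word described above:
# 12-bit length (in words) | 4-bit age | 8-bit format | 24-bit identity hash | 16-bit class

def pack_header(length, age, fmt, hash24, cls):
    assert 0 <= length < 1 << 12 and 0 <= age < 1 << 4
    assert 0 <= fmt < 1 << 8 and 0 <= hash24 < 1 << 24 and 0 <= cls < 1 << 16
    return (length << 52) | (age << 48) | (fmt << 40) | (hash24 << 16) | cls

def unpack_header(h):
    return (h >> 52,                 # length in words
            (h >> 48) & 0xF,         # age
            (h >> 40) & 0xFF,        # format
            (h >> 16) & 0xFFFFFF,    # identity hash
            h & 0xFFFF)              # class
```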
      <p>There are many possible ways to encode the range of immutable values that arise in normal
calculations. In this section we discuss 6 possible encodings that encompass the range from all values
being in memory, to treating as many things as immediate as possible. We examine each from the
perspective of the three considerations outlined in section 2. Table 2 summarizes the important encoding
considerations for the encodings discussed below.</p>
      <sec id="sec-3-1">
        <title>3.1. Every Object in Memory</title>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption><p>Allocation behaviour of the encodings discussed below.</p></caption>
          <table>
            <thead>
              <tr><th>Encoding</th><th>Allocation behaviour</th></tr>
            </thead>
            <tbody>
              <tr><td>Every object in memory</td><td>every created immutable requires allocation</td></tr>
              <tr><td>Memory with SmallInteger cache</td><td>common integers avoid allocation</td></tr>
              <tr><td>Tag SmallInteger</td><td>integers avoid allocation</td></tr>
              <tr><td>Spur encoding</td><td>integer and most float values avoid allocation</td></tr>
              <tr><td>NaN encoding</td><td>most immutables avoid allocation</td></tr>
              <tr><td>Zag encoding</td><td>more immutables avoid allocation</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>This is the simplest model. Every object, mutable or immutable, is encoded as a word-aligned address:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a 0 0 0
Here the low 3 0-bits are part of the address, and the high 16 0-bits are hardware limitations.</p>
        <p>Determining the value type requires a memory access, as does accessing the actual value.
Interoperation with foreign functions is complicated. The result of every operation must be allocated in
memory. This will create a huge amount of churn, although most of the values will be very short-lived.
Immutable values nil, true, and false, and ASCII Characters are normally allocated at fixed addresses
to gain some performance by allowing hardcoded memory address comparisons, and we assume this
convention.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Every Object in Memory with SmallInteger Cache</title>
        <p>This is the same encoding as above, with a simple optimization of statically defining a set of small
integers (say -7..100). This means that if a result is in this pre-allocated set, then no allocation or
collection is required.</p>
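<p>A sketch of the cache, assuming the -7..100 range suggested above (the harness is hypothetical, not Zag code):</p>

```python
# Sketch of the SmallInteger cache: results in a fixed range reuse
# pre-allocated objects, so no allocation or collection is needed for them.

CACHE_LO, CACHE_HI = -7, 100
HEAP = [('SmallInteger', i) for i in range(CACHE_LO, CACHE_HI + 1)]

def make_int(v):
    if CACHE_LO <= v <= CACHE_HI:
        return v - CACHE_LO        # reuse the pre-allocated object: no churn
    HEAP.append(('SmallInteger', v))
    return len(HEAP) - 1           # out-of-range values still allocate
```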
        <p>This reduces churn because the commonly created objects can be easily identified and no allocation
or collection of them is required. It also may gain some memory-cache advantage, as these values will
typically be in cache.</p>
        <p><bold>3.3. Tag SmallInteger</bold></p>
        <p>Early Lisp and Smalltalk implementers noticed that the vast majority of values created were
SmallInteger objects. Most such systems tag small integers with a 1-bit tag (either in the low bit,
leaving all other addresses natural by aligning all objects on at least a word boundary, or (less commonly)
using the sign bit). This encoding was so important that the SPARC architecture had basic arithmetic
instructions that checked the low bits of the parameters to verify they were flagged as integers.</p>
        <p>Here SmallInteger values are encoded as:
s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s 1
and everything else remains encoded as the address:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a 0 0 0
Note that the low bit distinguishes the integer value from the reference.</p>
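<p>The tagging arithmetic can be sketched as follows (a model of the 1-bit scheme above, not any particular implementation; the sign handling assumes the resulting 63-bit signed integer range):</p>

```python
# The 1-bit SmallInteger tag from the diagram above: integers carry a 1
# in the low bit; word-aligned heap addresses keep it 0.

MASK = (1 << 64) - 1              # model a 64-bit machine word

def tag_int(i):                   # shift left, set the tag bit
    return ((i << 1) | 1) & MASK

def is_int(v):
    return v & 1 == 1

def untag_int(v):                 # shift right, then sign-extend from bit 62
    v >>= 1
    return v - (1 << 63) if v & (1 << 62) else v
```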
        <p>A small drawback of this encoding is that 1 bit of precision is lost, but most such systems move
automatically to big integers with unbounded precision on overflow. Furthermore, on modern, 64-bit
systems that one bit of precision is fairly irrelevant.</p>
        <p><bold>3.4. Tag SmallInteger, Character, SmallFloat (Spur Encoding)</bold></p>
        <p>The Spur [<xref ref-type="bibr" rid="ref1">1</xref>] encoding cleverly extends the 1-bit tag to 3 bits and, in addition to SmallInteger and
general (memory) objects, supports encoding for Unicode characters and a subset of floating-point values.</p>
        <p>The default encoding for everything not described below remains encoded as the address:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a 0 0 0
The encoding for SmallInteger is:
s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s 0 0 1
The encoding for Character is:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 c c c c c c c c c c c c c c c c c c c c c 0 1 0
The encoding for SmallFloat is:
e e e e e e e e m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m s 1 0 0
Examining the low 3 bits of an object with a single instruction disambiguates the encodings. Only
the pointer tag is a natural value; all the others require some decoding before use.</p>
        <p>
          The SmallFloat value maintains the full mantissa of the 64-bit IEEE-754 floating point format
(see section 3.5), but reduces the exponent range, so it can represent all 64-bit values in the range of
± 5.87d-39 to ± 6.80d+38. The exact algorithms for converting between SmallFloat and
IEEE-754 are detailed in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] but require 7-8 instructions. If a value is not encodable, it is stored in
memory.
        </p>
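<p>The low-3-bit dispatch can be sketched like this (tag values taken from the diagrams above; function names are illustrative):</p>

```python
# Spur-style dispatch sketch: the low 3 bits select the representation.
# Tag assignments follow the diagrams above: 000 pointer, 001 SmallInteger,
# 010 Character, 100 SmallFloat.

def classify(v):
    tag = v & 0b111
    if tag == 0b000:
        return 'pointer'        # natural value: usable as-is
    if tag == 0b001:
        return 'SmallInteger'   # needs untagging before arithmetic
    if tag == 0b010:
        return 'Character'
    if tag == 0b100:
        return 'SmallFloat'     # needs re-expansion to IEEE-754
    return 'unused'

def tag_char(codepoint):        # Unicode code point shifted over the tag
    return (codepoint << 3) | 0b010
```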
      </sec>
      <sec id="sec-3-3">
        <title>3.5. NaN Encoding</title>
        <p>
          NaN encoding utilizes the large number of code points in the IEEE-754 floating point encoding that do
not represent valid floating-point numbers, by using those bit patterns to represent values of other
types, including pointers and integers. This encoding has been used by Spidermonkey [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], and
was the originally planned encoding for Zag Smalltalk [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The IEEE-754 64-bit floating point number is
formatted as:
s e e e e e e e e e e e m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m
When all the exponent bits are 1 and the mantissa is non-zero, it is considered not-a-number, or NaN.
This gives a huge number of potential bit patterns that can be used to encode other values. Our overall
encoding looks like:
1 1 1 1 1 1 1 1 1 1 1 1 t t t t x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
where the t bits are a 4-bit tag: 0 encodes general references; the next 10 values encode various
kinds of immediate BlockClosure values; the next tag (0xB) is encoded as:
1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 c c c c c c c c c c c c c c c c x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x
where the c is the 16-bit class of an immediate value, with the x bits being the extra information for the class. Finally, SmallInteger
is encoded as:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s
where the s is the 50-bit, twos-complement integer value.
        </p>
        <p>This encoding recognizes all Float values with a single comparison, and they require no conversion
before use. All the other values, including references, require more manipulation to extract the class
and access the value. It can encode all immutable values with up to 32 bits of extra data. It can also
encode some of the same BlockClosure values as described in Section 3.6, but with much-restricted
parameters.</p>
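<p>A sketch of this layout (tag placement as in the diagrams above; helper names are illustrative, and canonical NaNs produced by arithmetic would need normalizing before storage, which this sketch ignores):</p>

```python
# NaN-boxing sketch following the layout above: a prefix of twelve 1-bits
# marks a boxed value; real floats are stored as themselves.
import struct

def float_bits(f):                       # raw IEEE-754 bits of a double
    return struct.unpack('<Q', struct.pack('<d', f))[0]

def is_boxed(v):                         # top 12 bits all 1, non-zero mantissa
    return (v >> 52) == 0xFFF and (v & ((1 << 52) - 1)) != 0

def box_small_int(i):                    # fourteen 1-bits, then 50 payload bits
    return (0x3FFF << 50) | (i & ((1 << 50) - 1))

def unbox_small_int(v):                  # sign-extend the 50-bit payload
    i = v & ((1 << 50) - 1)
    return i - (1 << 50) if i & (1 << 49) else i
```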
      </sec>
      <sec id="sec-3-4">
        <title>3.6. Tag Most Possible Values (Zag Encoding)</title>
        <p>
          Zag Smalltalk[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] now uses a modified-Spur encoding. It is modified in 3 ways:
1. it uses all 8 possible values of the low 3 bits;
2. all non-float, non-reference objects are encoded with an auxiliary tag;
3. it supports a larger range of float values, with fewer instructions to decode/encode.
        </p>
        <p>The default encoding for everything not described below remains encoded as
the address:
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a 0 0 0
The encoding for the 56-bit SmallInteger is:
s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s s 1 0 0 0 0 0 0 1
which is a special case of the encoding for immediate classes, which is:
x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x c c c c c 0 0 1
where the c encodes up to 32 special classes. There are several classes that encode special
BlockClosure subclasses that are common in code; for example, the immediate closure that does a
non-local return of an instance variable looks like:
a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a i i i i i i i i 0 0 0 1 0 0 0 1
where a is the address of the Context that we are to return from, and i is the 8-bit field number in the
Context object for that instance variable. The encoding for Float is similar to Spur, except there
are 6 used values for the low 3 bits:
e e e e e e e e m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m m s E E E
where e are the low 8 bits of the exponent and m are the bits of the mantissa (like Spur), but the E bits
are the high 3 bits of the exponent offset by 2. This increases the range of the representable floating
point values to all 64-bit floats smaller than ± 2.68d154, including 0 and the small fractional values.
Examining the low 3 bits of an object disambiguates the encodings. Only the pointer tag is a natural
value; all the others require some decoding before use.</p>
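<p>The immediate-class layout above can be modelled as follows (field widths taken from the diagram; the class numbering and names are illustrative, not the actual Zag assignments):</p>

```python
# Zag-style immediate sketch: low 3 bits 001 mark an immediate class,
# the next 5 bits select one of up to 32 classes, and the remaining 56
# bits are the payload (e.g. a 56-bit SmallInteger).

def make_immediate(cls5, payload56):
    return (payload56 << 8) | (cls5 << 3) | 0b001

def is_immediate(v):
    return v & 0b111 == 0b001

def immediate_class(v):
    return (v >> 3) & 0b11111

def immediate_payload(v):
    return v >> 8
```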
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Design</title>
      <p>Zag Smalltalk was originally going to use NaN-encoding, and even after the decision was made to use
the Zag-encoding described in Section 3.6, the code for both was carried in parallel, although bit-rot set
in for the NaN encoding as additional features were implemented. However the question of the best
encoding has always remained.</p>
      <p>There is a single enumeration in the Zag configuration module that determines which encoding will
be used, so in principle, Zag can be compiled with any one of these encodings. Since the decision was
made to run this experiment, much work has been done to factor out the part of the code that is aware
of the underlying encoding into a single module. This has been a more significant undertaking than
originally understood, as encoding assumptions appeared in many parts of the codebase. The refactoring
is nearing completion, but was not available for this paper.</p>
      <p>The experiment is to run the fibonacci algorithms shown in Listing 1 and Listing 2.</p>
      <sec id="sec-4-1">
        <title>Listing 1: fibonacci1</title>
        <p>fibonacci1
	self &lt; 2 ifTrue: [ ^ self ].
	^ (self - 1) fibonacci1 + (self - 2) fibonacci1</p>
      </sec>
      <sec id="sec-4-2">
        <title>Listing 2: fibonacci2</title>
        <p>fibonacci2
	^ self &lt; 2 ifTrue: [ self ] ifFalse: [ (self - 1) fibonacci2 + (self - 2) fibonacci2 ]</p>
        <p>Each will be run 4 times, with the combinations of with/without any inlining and with SmallInteger and
Float as receiver. The plan is to evaluate 30 fibonacci and 30.0 fibonacci; however, this may not
be tractable with the memory-allocating encodings. This will exercise the integer and floating-point
paths, and the version with no inlining will exercise the BlockClosure creation in both listings and
the non-local return in Listing 1.</p>
        <sec id="sec-4-2-1">
          <title>4.1. Hypothesis</title>
          <p>Zag design features other than the choice of encoding will influence some of the runtimes. In particular,
Zag has a three-tier memory system: stack, nursery, and global heap, and references are only allowed to
go to the right. None of the test runs should hit the global heap, and the nursery is a copying collector,
so should be quite efficient.</p>
          <p>Inlining is a fundamental part of the Zag code generator, and applies to the threaded execution
and the eventual JIT (the threaded code version drives the JIT). However, inlining is not done on a
per-selector basis, so “no-inlining” does not include the inlining that most Smalltalk systems do (ifTrue:,
etc.). That version will therefore create the nonlocal-return closure in the first listing and both closures in the
second version, and will send the ifTrue: and ifTrue:ifFalse: messages to the boolean result of
the comparison.</p>
          <p>The inlined version, on the other hand, will inline everything except the recursive fibonacci calls.
This also means that no context will be created on any path that doesn’t perform a send.</p>
          <p>For some runtime systems the BlockClosure objects would have to be allocated on the heap,
but Zag actually allocates them on the stack. However, this still is a moderate amount of work, so
the encodings that produce immediate values will be faster. The non-local return block is potentially
particularly tricky, because it references the context from which it must return, so if it were allocated in
the nursery it would force the context to also spill to the nursery. This would lead to significant churn
in the first 4 cases. The last 2 encodings create immediate values for some of these blocks, so do not
force any context spilling, and may, in fact, create no nursery allocations at all.</p>
          <p><bold>4.1.1. Every Object in Memory</bold></p>
          <p>The result of every addition will allocate in the nursery. This will create a great deal of churn for the
numbers.</p>
          <p><bold>4.1.2. Every Object in Memory with SmallInteger Cache</bold></p>
          <p>The first 11 fibonacci numbers are in the cache we pre-allocated, and those are the most frequently
generated values, so the integer versions of the benchmark should run significantly faster. The
floating-point version will have no speedup.</p>
          <p><bold>4.1.3. Tag SmallInteger</bold></p>
          <p>Now all integers (for the test) have immediate values, so the integer tests should speed up significantly.
The floating-point version will have no speedup.</p>
          <p><bold>4.1.4. Tag SmallInteger, Character, SmallFloat (Spur Encoding)</bold></p>
          <p>Because Spur encodes immediate values for all of the generated floating-point values, the Float
version of this should run significantly faster. The integer version should experience no speedup.</p>
          <p><bold>4.1.5. NaN Encoding</bold></p>
          <p>NaN encoding has the floating-point values already in the correct format, therefore this should have the fastest
floating-point version. The integer versions may be somewhat slower because more manipulation
is required to retrieve the integral value. Because the blocks are immediate values, the “no-inlining”
version should speed up noticeably.</p>
          <p><bold>4.1.6. Tag Most Possible Values (Zag Encoding)</bold></p>
          <p>The Zag native format should be slightly slower than Spur for inlined integers. It should also be a
bit faster than Spur for floating point, because the decoding is a couple of instructions faster. The
floating-point runs should be a bit slower than for NaN-encoding.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <sec id="sec-5-1">
        <title>Future Work</title>
        <p>Some of the encodings described in this paper have been common lore for decades, and highly applicable
to 32-bit machines. With the advent of 64-bit architectures, the potential design space has broadened
significantly. Other encodings are much more recent. The value of this paper is to describe the spectrum
of encodings, and to explore how the encoding interacts with the computer architecture.</p>
          <p>It is unfortunate that the actual experimental data is not yet available.</p>
          <p>The proposed benchmark, although useful to demonstrate some of the macro properties of the various
encodings, does not fully exercise the potential of the Zag encoding. It would be good to run a more
realistic benchmark such as DeltaBlue or Richards.</p>
          <p>A complex issue is how much JITted code can capture the flow of type information. The Zag approach
is to handle this as much as possible at the inlining stage, inferring type information, but it would be
interesting to look at how dynamic information can be made to flow.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The author has not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Béra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <article-title>A bytecode set for adaptive optimizations</article-title>
          , in: International Workshop on Smalltalk Technologies, Cambridge, United Kingdom,
          <year>2014</year>
          . URL: https://inria.hal.science/hal-01088801.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Béra</surname>
          </string-name>
          , 64 bits immediate floats,
          <year>2018</year>
          . URL: https://clementbera.wordpress.com/2018/11/09/64-bits-immediate-floats/, [Online; accessed 2025-08-10].
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Mozilla</surname>
          </string-name>
          , SpiderMonkey JavaScript engine,
          <year>2025</year>
          . URL: https://spidermonkey.dev/, [Online; accessed 2025-04-08].
        </mixed-citation>
      </ref>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mason</surname>
          </string-name>
          ,
          <article-title>Design principles for a high-performance smalltalk</article-title>
          ,
          <source>in: Proceedings of the International Workshop on Smalltalk Technologies, IWST '22</source>
          ,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5] Zag Smalltalk,
          <year>2025</year>
          . URL: https://github.com/Zag-Research/Zag-Smalltalk, [Online; accessed 2025-04-08].
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>