<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Smalltalk JIT Compilation: LLVM Experimentation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Janat Baig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dave Mason</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Toronto Metropolitan University</institution>
          ,
          <addr-line>Toronto</addr-line>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper discusses the ongoing development of the Zag Smalltalk LLVM JIT Compiler project. The project is aimed at enhancing the performance of dynamic languages through JIT compilation using LLVM. We highlight the project's rationale, emphasizing LLVM's optimization capabilities and architectural support. Our methodology focuses on generating LLVM IR code and exploring optimization strategies to improve execution efficiency.</p>
      </abstract>
      <kwd-group>
        <kwd>LLVM</kwd>
        <kwd>JIT code generation</kwd>
        <kwd>Smalltalk</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>Zag Smalltalk is a from-scratch implementation of a Smalltalk virtual machine whose low level is implemented in Zig. The project aims to implement a higher-performance Smalltalk as well as to inspire and support existing OpenSmalltalk systems. To further advance Zag's performance, the integration of a Just-In-Time (JIT) compiler using LLVM[1] is underway. This paper explores the initial efforts to develop the JIT compiler, underlining its potential to improve execution speed by leveraging LLVM's extensive architecture support and the optimizations applied to its intermediate representation (IR).</p>
      <p>Code generation models cover a spectrum ranging from interpreters handling intermediate bytecodes to compilers producing native machine code. Currently, Zag Smalltalk uses an intermediate strategy known as threaded execution, which falls between these two extremes.[2] Although that has good performance, we also want a JIT to maximize performance. For a variety of reasons we believe LLVM is the most viable path to generating a high-quality JIT as quickly as possible. We will use the fibonacci code in listing 1 for examples in this paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Zag execution model</title>
      <p>Zag uses a continuation-passing-style of execution. A send saves return addresses in a Context and
then passes control through a tail-call to the new method.</p>
      <p>A return retrieves the return addresses from the Context and then passes control again through a tail-call to the ReturnBlock described below.</p>
      <p>fn push0(pc: PC, sp: SP, process: *Process, context: *Context, sig: MethodSignature) SP {
    const newSp = sp.push(Object.from(0));
    return @call(tailCall, pc.prim(), .{ pc.next(), newSp, process, context, undefined });
}</p>
      <sec id="sec-2-1">
        <title>Listing 2: Zag calling convention</title>
        <p>Listing 2 shows a very simple function (in Zig) that illustrates the values passed through the tail-call mechanism. pc points to a threaded program counter. sp points to the top of stack. process points to the private data of this process (which includes the stack). context points to the context of the calling method. sig has the selector and class of the target method.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>As this is exploratory work, we’ve opted for a from-scratch implementation of a textual LLVM IR generator rather than invoking the LLVM IR Builder via its C API through a Foreign Function Interface (FFI). Several factors influenced this decision:
1. textual IR is easy to verify by eye, making it straightforward to ensure we are generating the correct code;
2. debugging textual IR is simpler and avoids the potential segmentation faults associated with debugging in-memory LLVM IR produced by the IR Builder, providing a smoother debugging experience;
3. LLVM has tools that can transform textual IR through various optimizations to machine-dependent object code, allowing us to evaluate the quality of our code generation and our choice of optimizations effectively.</p>
      <p>
        Given the need for rapid iteration in exploratory work, generating textual IR offers a practical and
efficient approach. Additionally, since Zag Smalltalk does not yet support a development environment,
we are initially developing our textual LLVM code generator on Pharo[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Zag Code Generation</title>
        <p>Zag Smalltalk stores all methods as Abstract Syntax Trees (ASTs). When a method is first required to be executed, it must be converted from an AST to either a threaded method or a JITed method. It is beyond the scope of this paper to discuss the decisions about when a JITed version is required.</p>
        <p>3.1.1. Abstract Syntax Tree</p>
        <p>The AST has Method and Block objects, which both have bodies composed of a sequence of expressions, optionally terminated with a return. Expressions include: variable Assign, Array, Send/Cascade, Literal, variable Reference, Self/Super, and ThisContext. This is a simplified version of the Pharo AST, and can easily be generated from Smalltalk source, or by walking a Pharo AST.</p>
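        <p>As a concrete illustration of that node set, here is a minimal sketch in Python (the class names and fields are hypothetical; Zag's actual AST objects differ in detail):</p>

```python
from dataclasses import dataclass, field

# Hypothetical model of the AST node kinds described in the text.
@dataclass
class Literal:
    value: object

@dataclass
class Send:
    receiver: object          # another expression node
    selector: str             # message selector
    args: list = field(default_factory=list)

@dataclass
class Return:
    expression: object

@dataclass
class Method:
    selector: str
    body: list                # sequence of expressions, optional Return at end

# the send (self - 1) fibonacci from the fibonacci example
node = Send(Send(Literal("self"), "-", [Literal(1)]), "fibonacci")
```

        <p>Such a tree can be built directly from Smalltalk source or by walking a Pharo AST, as described above.</p>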
        <p>Regardless of whether generating for a threaded or native target, the AST compiler works in two stages: conversion to basic blocks, followed by target code generation.</p>
        <p>3.1.2. AST to Basic Blocks</p>
        <p>
          The first stage converts the AST into a series of basic blocks, made up of a sequence of stack operations (sends, pushes, pops, returns, etc.). Each basic block ends with a send, return, branch, or class switch. Part of this stage is an optional target-independent inlining of methods and blocks. As part of inlining, many of the sends may be replaced with switches based on the class of the receiver.[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
        </p>
        <p>The first basic block is a MethodBlock. A MethodBlock sets up a compile-time stack that represents the
self, parameters, and locals of the method (or BlockClosure). The basic conversion outputs a sequence
of pushes and pops, mimicking the runtime stack as well as adding the corresponding operations to the
current basic block. When a send or class switch is required it forces the creation of a new basic block.</p>
        <p>If the block ended with a send, the new basic block will be a ReturnBlock. Because the send will execute code we know nothing about, any and all values must be represented on the stack or the heap. This is the exact semantics we expect from the stack operations, but as we will see below this requires particular attention when generating LLVM code. A ReturnBlock will be initialized with a stack as it would be on return from the message send, that is, with the stack as it was just before the send, but with the receiver and parameters of the send replaced by a result value. Then we resume the basic conversion.</p>
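        <p>The stack shape on return can be stated precisely; the following sketch (illustrative Python, not the Zag implementation; the function name is an assumption) shows the receiver and parameters being replaced by the result:</p>

```python
def stack_after_send(stack, num_args, result):
    """Model a ReturnBlock's initial stack: the stack just before the
    send, with the receiver and its num_args parameters replaced by
    the result value of the send."""
    return stack[:-(num_args + 1)] + [result]

# top of stack is the last element: receiver, then two arguments
before = ["temp", "receiver", "arg1", "arg2"]
after = stack_after_send(before, 2, "result")
# after == ["temp", "result"]
```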
        <p>If the block ended with a class switch, the new basic blocks will be InlineBlocks. Each of the new basic blocks will have a stack initialized from the stack at the point of the creation of the class switch. Because of the structural nature of the replacement of the send → class switch, all of the various paths will eventually come to a common InlineBlock which will have a stack the same as if the send were still in place, that is, with the receiver and parameters replaced by a result. A path that is a sequence of InlineBlocks is highly advantageous for a target like LLVM because they will all be part of a single function and values don’t need to be actually saved to memory, as we’ll see below. If an actual send is required along any of the paths, because there is no safe inlining that can be done, a ReturnBlock may be required, in which case the common block will be converted to a ReturnBlock, at some performance cost for a target such as LLVM.</p>
        <p>Figure 1 shows a version of fibonacci converted to basic blocks and fully inlined. Note there are
still 2 sends because the calls are recursive.</p>
        <p>After all the basic blocks have been generated, the data-flow graph is updated for the basic blocks, if it is required by the target code generator. This integrates the requires/provides values of the various blocks. We iterate over all the blocks, taking the values that are required for a given basic block and telling all of the blocks that transfer to it that they must supply those values. This may make those basic blocks now require the values, so we continue the iteration until we reach a fix-point, where nothing is changing. This analysis can reduce the memory traffic for native targets such as LLVM.</p>
        <p>3.1.3. Basic Blocks to Target Code</p>
        <p>The final stage is either generation of threaded code or JIT code. Threaded code generation is very straightforward, and will not be discussed here.</p>
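        <p>The fix-point iteration just described can be sketched as follows (a simplified Python model, not the Zag implementation; the block names and the requires/provides sets are invented for the example):</p>

```python
def propagate_requirements(blocks, preds, requires, provides):
    """Iterate to a fix-point: any block that transfers to a block
    needing some values must itself require whatever it cannot provide."""
    changed = True
    while changed:
        changed = False
        for name in blocks:
            for pred in preds.get(name, []):
                # values the predecessor must pass along
                missing = requires[name] - provides[pred]
                if not missing.issubset(requires[pred]):
                    requires[pred] = requires[pred] | missing
                    changed = True
    return requires

# tiny flow graph: entry transfers to inline, inline to common;
# "self" is provided in entry and consumed in common
blocks = ["entry", "inline", "common"]
preds = {"inline": ["entry"], "common": ["inline"]}
provides = {"entry": {"self"}, "inline": set(), "common": set()}
requires = {"entry": set(), "inline": set(), "common": {"self"}}
propagate_requirements(blocks, preds, requires, provides)
# "inline" now requires "self" so it can pass it along; "entry" does not,
# because it provides the value itself
```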
        <p>Since LLVM uses a register-transfer model, the first stage of LLVM generation is to convert the stack operations into Static Single Assignment (SSA) form, a property of the LLVM IR in which each variable is assigned exactly once and is defined before it is used. This simplifies optimization in the compiler by providing a clear and unambiguous mapping of variables to their values. In our textual IR generation, we use sequence numbers for variables to conform to SSA form: each variable is assigned a unique sequence number upon assignment, ensuring that it is defined exactly once throughout the IR. Values are retrieved from the stack on return from a message send and need to be reflected to the stack before a subsequent message send. Between sends, values live in LLVM variables (mostly registers).</p>
        <p>Listing 3 shows an extract of the LLVM code for fibonacci, including the retrieval of the value of self from the stack. It is given the name %.self because it will be accessed from all the InlineBlocks of the @fibonacci or @fibonacci.1 functions, such as the blocks %1 and %2. Other values that "change" are passed along via the phi chains.</p>
        <p>%zag.process.Process = type { [909 x i64], ptr, i64, ptr, ptr, ptr, ptr, ptr, ptr, i64, [3637 x i64], [3637 x i64] }
%zag.context.Context = type { i64, ptr, %zag.execute.PC, ptr, ptr, i64, [0 x i64] }
%zag.execute.PC = type { ptr }
%zag.execute.Code = type { &lt;{ ptr, [8 x i8] }&gt; }
%zag.execute.Stack = type { i64, i64, i64, i64 }
%zag.execute.CompiledMethod = type { i64, i64, %zag.execute.MethodSignature, ptr, [0 x %zag.execute.Code] }
%zag.execute.MethodSignature = type { i32, i16, [2 x i8] }
@fibonacci.CM = global %struct.CompiledMethod { i64 0, %union.Object { i64 0 }, %struct.MethodSignature 0, ptr verifyMethod, [3 x %union.Code] [%union.Code { ptr @fibonacci }, %union.Code { ptr @fibonacci.1 }, %union.Code { ptr @fibonacci.2 }]}, align 8
define ptr @fibonacci(ptr noundef %pc, ptr noundef %sp, ptr noundef %process, ptr noundef %context, i64 %signature) #1 {
%.self = load i64, ptr %sp, align 8
%3 = i64 -1125899906842621 ; pushLocal 2
%4 = icmp ule i64 %.self, %1 ; embed: &lt;=
br i1 %4, label %2, label %1 ; case False-&gt;%1 True-&gt;%2
%1: ...
%2: ...
}
define ptr @fibonacci.1(ptr noundef %pc, ptr noundef %sp, ptr noundef %process, ptr noundef %context, i64 %signature) #1 {
%.self = load i64, ptr %sp, align 8
...</p>
        <sec id="sec-3-1-1">
          <title>Listing 3: Extract of LLVM code for fibonacci</title>
          <p>To generate the LLVM code, we go through each basic block, mimicking the operations that will occur at runtime. If we push something on the stack, there is no memory operation: we simply assign the value to a newly named LLVM "register" such as %3 and push that register name on the compile-time stack. Then, to access the top of stack, we pop the register name off the compile-time stack and output a reference to that register. We also don’t modify the stack pointer, but rather track at compile time where the stack pointer is. If a value is required in a subsequent InlineBlock, that block will start with a phi expression such as %14 = phi ptr [ %1, %5 ], [ %39, %37 ] that specifies, for each source block, the name that the value had in that block. In this example, it’s saying to combine the value %1 from block %5 and the value %39 from block %37 into a new register %14. LLVM does register allocation so as to minimize the amount of register shuffling required. Any given register, like %14, will only be assigned in one place.</p>
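          <p>The compile-time stack behaviour just described can be modelled as follows (an illustrative Python sketch, not the Zag generator; the register numbering is invented for the example):</p>

```python
class CompileTimeStack:
    """Mimic the runtime stack at compile time: a push allocates a
    fresh SSA register name instead of performing a memory operation."""
    def __init__(self):
        self.names = []    # register names; top of stack is last
        self.counter = 2   # assume %1 and %2 are taken by block labels
        self.code = []     # emitted textual IR lines
    def push(self, ir_expr):
        self.counter += 1
        name = "%" + str(self.counter)
        self.code.append(name + " = " + ir_expr)
        self.names.append(name)
        return name
    def pop(self):
        # popping just retrieves a register name; neither the runtime
        # stack nor the stack pointer is touched in the generated code
        return self.names.pop()

st = CompileTimeStack()
st.push("add i64 %.self, -1")   # push self - 1
st.push("add i64 %.self, -2")   # push self - 2
top = st.pop()
# top is "%4", the register holding the most recent push
```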
          <p>In this way, all of the operations within a MethodBlock or a ReturnBlock may be able to be purely register operations. However, when we transfer to other code, either with a return or a send, the register values will be spilled to the stack.</p>
          <p>Once we’ve output the textual LLVM IR, we use the command line to invoke the LLVM static compiler (llc), which can perform various optimization passes on the IR. LLVM has a spectrum of built-in optimizations that can be performed in any order. Beyond these pre-defined optimization options, LLVM also provides a framework for creating custom optimization passes. As our code generation is very idiosyncratic, we will also explore these in an attempt to further enhance performance by tailoring the optimization process to our specific needs. After performing optimizations, llc outputs an object file (.o) which can be linked with the Zag runtime environment, from which we can execute benchmarks to evaluate performance.</p>
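          <p>That command-line step can be scripted; the sketch below builds an llc invocation (the file names are hypothetical; -O2 and -filetype=obj are standard llc flags, and this does not prescribe Zag's actual build setup):</p>

```python
def llc_command(ir_file, obj_file, opt_level="2"):
    """Build the llc invocation: optimize the textual IR and emit a
    relocatable object file (.o) for linking with the Zag runtime."""
    return ["llc", "-O" + opt_level, "-filetype=obj",
            "-o", obj_file, ir_file]

# e.g. subprocess.run(llc_command("fibonacci.ll", "fibonacci.o"), check=True)
cmd = llc_command("fibonacci.ll", "fibonacci.o")
```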
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>By generating textual LLVM IR code, we can explore appropriate code generation and LLVM optimizations, which allows us to validate the viability of using LLVM to produce our JITed code.</p>
      <p>LLVM code generation fits very nicely into our code generation model. Combined with our aggressive inlining, we anticipate a significant performance boost.</p>
      <sec id="sec-4-1">
        <title>Future Work</title>
        <p>If the performance improvements are compelling, we will connect the LLVM IR Builder into the Zag
runtime and generate the JITed code on the fly in Zag. As the project advances, further details and
results will be available to substantiate the anticipated benefits of this innovative approach to improving
dynamic language performance.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lattner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Adve</surname>
          </string-name>
          ,
          <article-title>Llvm: A compilation framework for lifelong program analysis &amp; transformation</article-title>
          , in:
          <source>International symposium on code generation and optimization</source>
          ,
          <source>CGO 2004</source>
          , IEEE, San Jose, CA, USA,
          <year>2004</year>
          , pp.
          <fpage>75</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Mason</surname>
          </string-name>
          ,
          <article-title>Threaded execution as a dual to native code</article-title>
          , in:
          <source>Companion Proceedings of the 7th International Conference on the Art, Science, and Engineering of Programming</source>
          , Association for Computing Machinery,
          <year>2023</year>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>11</lpage>
          . URL: https://doi.org/10.1145/3594671.3594673.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>Pharo Development Team</string-name>
          ,
          <article-title>Pharo: A pure object-oriented programming language and a powerful environment</article-title>
          , http://pharo.org/,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>D.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mason</surname>
          </string-name>
          ,
          <article-title>Inlined code generation for Smalltalk</article-title>
          , in: International Workshop on Smalltalk Technologies,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>