<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Redesigning FFI calls in Pharo: exploiting the baseline JIT for more performance and low maintenance</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Juan Ignacio Bianchi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guillermo Polito</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Lille, Inria, CNRS, Centrale Lille, UMR 9189 - CRIStAL</institution>
          ,
          <addr-line>F-59000 Lille</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Pharo programming environment relies heavily on many different C functions. Such functionality is accessed through a Foreign Function Interface (FFI). Pharo implements FFI calls through a single primitive that handles all call cases. This generalization of behavior has performance drawbacks. In this paper, we present a new design for FFI calls. The key goal of the new design is to obtain better performance for the most-used callout signatures while keeping maintenance low.</p>
      </abstract>
      <kwd-group>
        <kwd>Pharo</kwd>
        <kwd>FFI</kwd>
        <kwd>JIT</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Our solution falls back on the old generic implementation: it only supports optimizing a fixed set of function signatures defined statically in the source code. We extracted these commonly used function signatures by profiling existing applications; all non-optimized cases fall back to the old mechanism using the pre-existing primitive.</p>
      <p>Our benchmarks show a ∼12x improvement over the baseline implementation when the JIT compiler is active, and a ∼3x improvement when JIT compilation is not active. Moreover, signatures that are rarely used or complex to implement, and therefore fall back on the old implementation, see no degradation in performance.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Context: optimizing FFI calls</title>
      <p>Pharo implements FFI calls through a single primitive that implements all call cases. The most-used calls are handled the same as the ones used only once. The Pharo VM leverages libffi to support such generality. libffi is a library that handles FFI in a portable manner, implementing the calling conventions of many different architectures.</p>
      <sec id="sec-2-1">
        <title>2.1. Current implementation overview</title>
        <p>The current FFI implementation, as per Pharo 12, uses the Unified FFI framework [4] (UFFI). In UFFI, FFI function calls are done through normal methods that are bound to external functions. FFI bindings are expressed using the ffiCall: message, as shown in Figure 1. The method in the figure is bound to a function named f taking an argument arg of type int and returning a void*.</p>
        <p>MyClass &gt;&gt; myMethod: arg
^ self ffiCall: #(void* f(int arg))</p>
        <p>UFFI extends the Pharo bytecode compiler and transforms all methods sending ffiCall: to introduce a runner and an externalFunction, as illustrated by Figure 2. The runner is an object driving the external function execution, typically specifying whether the call should be synchronous or asynchronous. The externalFunction is an object gathering all necessary meta-data for the FFI call, including the function signature and the function pointer. Both the runner and the external function are generated by the UFFI plugin and stored as literals in the compiled method; they do not change during the life-cycle of the application.</p>
        <p>MyClass &gt;&gt; myMethod: arg
^ runner
invokeFunction: externalFunction
withArguments: {arg asInteger}.</p>
        <p>The actual FFI call is performed by the runner when it is sent the invokeFunction:withArguments: message. Both implementations of this message, synchronous and asynchronous, are defined as primitive methods. Moreover, this message receives an array of objects to be used as function arguments, which is built dynamically on each FFI call. This array of arguments goes through a process of transformation to native types, known as marshaling, described in the next section.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Marshaling by example</title>
        <p>Marshaling is the process of transforming objects between two different representations to allow interoperability between different technologies. When using FFI, we consider marshaling the process of converting Pharo objects to native values as expected by C functions, and doing the inverse with return values. The marshaling process is split in two: high-level marshaling and low-level marshaling. The high-level marshaling takes as input arbitrary Pharo objects and outputs primitive Pharo objects such as small integers, floats, strings, and external addresses. The low-level marshaling takes primitive Pharo objects and outputs native equivalent values.</p>
        <p>The high-level marshaling is introduced by the UFFI code transformation. In the example shown in Figure 2, the declared function signature of f expects an int. The function argument is then transformed to an integer using the asInteger message. In the case of more complex function signatures, the generated bytecode includes other kinds of messages to handle e.g., floats, structs, and strings.</p>
        <p>The low-level marshaling is implemented in the virtual machine, in the primitive. It typically requires untagging tagged objects (e.g., converting a Pharo SmallInteger to a C int) or unboxing external references (extracting the actual external address from the Pharo ExternalAddress object).</p>
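        <p>As an illustration, low-level marshaling of a small integer can be sketched in C as follows. The 1-bit tagging scheme and the function names here are our own assumptions for illustration, not the actual Pharo VM code:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 1-bit tagging scheme (illustration only): small
 * integers are stored shifted left by one with the low bit set;
 * object pointers have the low bit clear. */
#define TAG_BITS 1
#define TAG_MASK 1

static int isSmallInteger(intptr_t oop) { return (oop & TAG_MASK) == 1; }

/* Low-level marshaling: untag a tagged small integer into a C value. */
static intptr_t untagSmallInteger(intptr_t oop) {
    assert(isSmallInteger(oop));
    return oop >> TAG_BITS; /* the shift drops the tag bit */
}

/* Inverse direction, used when marshaling return values back. */
static intptr_t tagSmallInteger(intptr_t value) {
    return (value << TAG_BITS) | 1;
}
```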
      </sec>
      <sec id="sec-2-3">
        <title>2.2.1. libffi integration</title>
        <p>Performing the FFI call requires using the ffi_call function defined by libffi and statically linked with the VM source code. This function requires four different arguments, as shown in Listing 3:</p>
        <p>A cif object: a description of the external function signature. Currently, the cif pointer is built by the UFFI plugin and wrapped inside the external function object.</p>
        <p>A function pointer to call, fn: the pointer to the function to call is looked up by the UFFI plugin and wrapped inside the external function object.</p>
        <p>Arguments and return value holder: a collection of memory addresses pointing to the passed arguments, and a holder address for the return value.</p>
        <p>void ffi_call(ffi_cif *cif,
void (*fn)(void),
void *rvalue,
void **avalue);</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.3. Identified problems in the current implementation</title>
        <p>We have encountered three main problems with the current implementation:
Function signature is known at run time. The external function, its signature, and the call arguments are all accessed at run time. While function arguments change from one call to another, the external function and its signature remain stable across calls from the same call site. However, the current implementation does not take advantage of this knowledge.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Cogit JIT compiler does not allow primitive specialization</title>
        <p>The Cogit JIT compiler is a non-optimizing method compiler that does a one-to-one mapping between Pharo bytecode methods and their natively compiled code. This compiler does not automatically generate multiple versions of a single method specialized e.g., for its arguments. Thus, even if JIT compiled, a primitive will have only one native version and will not be specialized per function call signature.</p>
        <p>Generality vs performance. The entire architecture trades off performance for generality. Having a single primitive supporting all cases forces the dynamic construction of the argument array, producing unnecessary stress on the garbage collector. Moreover, both libffi and the primitive using it need to support all the existing calling conventions, the rarely used ones as well as the most used ones.</p>
        <p>Our goal. We propose a new design that splits (a) a fast path allowing specialized JIT compilation of commonly-used function signatures from (b) a slow path that implements a general form, supports all other cases, and presents performance similar to the current implementation.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Towards a more efficient FFI design</title>
      <p>We propose to extend the Pharo VM and the Opal bytecode compiler with a new FFI call bytecode supporting synchronous FFI calls. This new bytecode instruction is not meant to completely replace the current implementation. Our goal is for them to coexist: the current implementation is used as the fallback mechanism for the slow path and for asynchronous calls.</p>
      <p>We will refer to the newly introduced bytecode for dealing with FFI calls as bytecodeFFICall. bytecodeFFICall is implemented in both the interpreter and the JIT, the latter with a particularity: the JIT'ted implementation of bytecodeFFICall is specialized at compile time. The set of function signatures is fixed and defined statically in the source code. We say that those chosen function signatures are supported. For unsupported signatures, we fall back to the same primitive that the current implementation uses.</p>
      <p>The interpreter implementation of bytecodeFFICall is still general: it supports all kinds of function prototypes. Even though it is general, we found other ways to optimize FFI calls by taking advantage of the context available when compiling the new bytecode; we describe this in more detail in the following sections.</p>
      <sec id="sec-3-1">
        <title>3.1. The new Bytecode</title>
        <p>The bytecodeFFICall is a 2-byte bytecode. The first byte is the opcode and the second byte encodes two 4-bit numbers. These values are indices into the literal table of the CompiledMethod containing the bytecodes. These indices correspond to:
• A description of the external function to call. This includes not only the name and the arguments of the function but also its prototype. ?? describes what this object looks like.
• The runner, for the fallback cases, as previously described.</p>
        <p>Also, differently from the existing implementation, this bytecode avoids the creation of intermediate arrays: all function arguments are pushed to the stack. The bytecode knows the number of arguments to pop from the function meta-data found in the literals. To illustrate the new bytecode, consider the bytecode sequence of the method myMethod shown before, given in Figure 4:
MyClass &gt;&gt; myMethod: arg
pushArgument: 0
send: asInteger
ffiCall: f
returnTop</p>
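        <p>The packing of the two 4-bit literal indices into the bytecode's second byte can be sketched in C as follows. The function names and the nibble order are our own assumptions for illustration:</p>

```c
#include <stdint.h>

/* Pack two 4-bit literal indices (0-15) into the bytecode's second
 * byte: the function-description index in the high nibble, the
 * runner index in the low nibble (nibble order assumed here). */
static uint8_t packIndices(uint8_t functionIndex, uint8_t runnerIndex) {
    return (uint8_t)(((functionIndex & 0x0F) << 4) | (runnerIndex & 0x0F));
}

/* Unpack the two indices when decoding the bytecode. */
static uint8_t functionIndexOf(uint8_t operand) { return operand >> 4; }
static uint8_t runnerIndexOf(uint8_t operand)   { return operand & 0x0F; }
```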
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Fallback to the current implementation</title>
        <p>When dealing with primitives, it is common to encounter cases where the primitive just does not work; in that case it gives control back to the interpreter, which then interprets the method's fallback bytecode. For bytecodes, in contrast to primitives, there is no such notion as a bytecode failure. Instead, a customary solution is to introduce callback messages, e.g., sending a doesNotUnderstand: or mustBeBoolean message.</p>
        <p>In our case, we decided to treat failure cases in the bytecode by sending the message invokeFunction:withArguments: to the Runner object (see Figure 2), which in turn calls the general-case primitive. This allows the new implementation to not handle the errors directly.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>At the time of writing this article, our bytecodeFFICall has support for optimizing the two function
prototypes listed below. We chose two because we wanted to first try the idea before implementing all
the prototypes we would like to support in the future.</p>
      <p>• uint64_t fn(uint64_t)
• void fn(pointer)
We chose those two signatures because:
• They are simple to implement, so the machinery necessary to support them is not complex.
• Through some micro-benchmarks using BlocBenchs [5], we found that they are among the most used signatures. BlocBenchs is a project to profile and benchmark Bloc [6], a framework for graphics in Pharo that uses FFI calls. BlocBenchs already has all the profiling and benchmarking infrastructure for us to rely on, which let us extract the most used external function signatures quite easily.</p>
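      <p>The compile-time decision between the fast and slow path can be sketched as a lookup in a fixed table of supported signatures. The string encoding, the table, and the function names below are our own illustration; the actual implementation compares UFFI type descriptions:</p>

```c
#include <string.h>

/* The fixed, statically defined set of supported signatures
 * (encoded here as strings for illustration). */
static const char *supportedSignatures[] = {
    "uint64_t(uint64_t)",
    "void(pointer)",
};

/* Compile-time check: only signatures in the table take the
 * specialized fast path; everything else falls back to the
 * generic primitive. */
static int isSupportedSignature(const char *signature) {
    size_t n = sizeof(supportedSignatures) / sizeof(supportedSignatures[0]);
    for (size_t i = 0; i < n; i++)
        if (strcmp(supportedSignatures[i], signature) == 0)
            return 1;
    return 0;
}
```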
      <p>In the following section, we will describe the benchmarks we have made. When we evaluate the
performance for our supported prototypes we refer to the two previously mentioned. We say a function
prototype is supported if the JIT implementation optimizes it in the fast path.</p>
      <sec id="sec-4-1">
        <title>4.1. Benchmarks</title>
        <p>This section presents the benchmarks we ran to compare our new design against the current implementation and to see how the two new implementations (interpreted vs. JIT'ted) differ.</p>
        <p>To do the benchmarks we decided to compare the following combination of cases:
• bytecodeFFICall vs. current implementation
• Where the external function prototype is supported and where it is not.
• With the Pharo VM built just with the interpreter and no JIT (StackVM) vs. the Pharo VM built
as default (interpreter + JIT).</p>
        <p>The first case is the most important comparison we want to make: our new proposed design against the current one. For the second case, we wanted to make sure that our fallback mechanism was not introducing a negative performance impact. For the third case, we wanted to evaluate how much speed-up the JIT'ted version would bring. Comparing the interpreter-only versions tells us the overhead of doing the checks at compile time vs. at run time.
Naming. Figure 5 compares the performance of four different implementations of FFI calls over three different benchmark cases.</p>
        <p>• newStack and oldStack: the new (bytecodeFFICall) and the old (current) implementation respectively, with JIT compilation inactive, i.e., only the interpreter version of each.
• newStock and oldStock: the new (bytecodeFFICall) and the old (current) implementation respectively, on the stock Pharo VM, which has JIT compilation active.
• Supported or not supported signature refers to whether that function signature is optimized.
Methodology. For each of the microbenchmarks we measured throughput: the number of calls per second. We ran each benchmark 100 times, doing our best to avoid environment noise. Benchmarks were run on a MacBook Pro with a 2.6 GHz 6-core Intel Core i7 and 16 GB of 2400 MHz DDR4 RAM.
Results. Figure 5 shows the results of our benchmarks. With JIT compilation, bytecodeFFICall achieves an improvement of 12x over the current implementation when dealing with a supported (optimized) function signature. When JIT compilation is not active, the improvement achieved by bytecodeFFICall is 3x over the baseline.</p>
        <p>For the cases where the function signature is not supported, the figure shows that both bytecodeFFICall and the current implementation perform similarly. In the case where JIT compilation is not active, there is a big gap because the work is done at compile time rather than at run time. In this case, even though we are not specializing for the function signature, all the checks are done at compile time, so they are done only the first time, in contrast to the current implementation where they are done each time the primitive gets executed.</p>
        <p>The figure also shows how both oldStack and oldStock perform very uniformly across the three
benchmarks.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Maintainability</title>
        <p>The number of lines of code added to implement this design is 380 so far, including the changes made in the VM as well as in the bytecode compiler. The effort to implement the new design was divided into two main parts: the VM side and the bytecode compiler side.</p>
        <p>For bytecodeFFICall we implemented a new bytecode, which implies extending the bytecode set of Pharo. For maintenance purposes, this should not be an issue: it would be uncommon to have to modify these changes to the bytecode set. Most possible changes and fixes would instead take place in the VM.</p>
        <p>In the VM, we added interpreter support for the new bytecode. Again, we do not expect the interpreted version of bytecodeFFICall to be modified often. The JIT'ted version is where we expect future work to happen. As we discussed, the JIT'ted version of the new bytecode is specialized, which means that we only have to support a couple of function signatures. If, in the future, we decide to add support for a specific signature, the only method to modify would be genBytecodeFFICall, which is where all the JIT'ted implementation of bytecodeFFICall resides.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>When planning the new bytecode-based implementation we considered several different options. Figure 6 shows all the possible combinations for implementing FFI calls in the Pharo VM, including the current implementation and bytecodeFFICall.</p>
      <p>The new bytecode could be implemented in the interpreter in a generic way (all kinds of function signatures) or in a specialized way (only handling some function signatures). If the bytecode were specialized, this would imply:
• A faster interpreter: we would know beforehand that the bytecode being executed is for a specific function signature (the bytecode compiler and the UFFI plugin would make sure of that), so there would not be many checks to do. The disadvantage of this approach is its maintainability: we would end up with a much bigger bytecode set, with one bytecode for each function prototype we want to support.</p>
      <p>• Generic fallback: To deal with all the unsupported cases.</p>
      <p>If the bytecode were generic, that would imply a slower interpreter, because of all the checks we would have to perform at run time to decide what kind of function signature we are dealing with. In that case, we can still specialize it in the JIT. This second option is the one we decided to go with.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Related work</title>
      <p>This article describes a new design for FFI calls that achieves good performance while aiming at low implementation complexity. The key to our solution is to distinguish between fast and slow execution paths and apply that distinction to a mixture of bytecode, interpretation, and JIT compilation.
Bytecode design. Bytecode and instruction design has been a matter of discussion for a long time. Smalltalk and its descendants have long used execution engines based on bytecode and primitive methods, as described in the blue book [7]. Although implementations diverged over the years, the current design is architected similarly. Our solution introduces a new FFI-call bytecode that can be embedded within a method and benefit from the compilation context and literals of the embedding method. For the slow path, we decided to use a primitive method: our new bytecode can simply fall back to it by compiling a message send and letting the runtime lookup do the rest of the work.</p>
      <p>[Figure 6 rendered here as a table in the original: it tabulates the design space, FFI implemented as a primitive vs. as bytecodes, generic vs. specialized, in the interpreter vs. interpreter+JIT, placing the current implementation (generic primitive, interpreter and interpreter+JIT) and bytecodeFFICall (generic interpreter, specialized interpreter+JIT) in that space.]</p>
      <p>More recently, Béra et al. [8] introduced a new bytecode set into Pharo, namely the Sista bytecode set. The Sista bytecode set is a redesign of the bytecode set originally inherited from Squeak [9], intended to enable bytecode-to-bytecode compiler optimizations [10]. For this purpose, this new bytecode set introduces prefix bytecodes and unsafe bytecodes. Prefix bytecodes (namely extensions in the implementation) annotate existing bytecodes to extend their behavior. Unsafe bytecodes (re-)implement the behavior of existing bytecodes and primitives without safety checks (e.g., type, overflow, and bounds checks). Our work extends this existing bytecode set with a new 2-byte bytecode instruction in an unused opcode, requiring neither prefixes nor unsafe bytecodes. Our new bytecode instruction is so far limited to encoding literals in 4-bit nibbles. However, we envisage using prefix bytecodes to extend the indexable literals.</p>
      <p>FFI implementations. Many projects and programming language implementations acknowledge the importance of integration and interaction with external libraries. We find in the literature FFI implementations for Scheme [11, 12], ML [13], Java [14, 15], Lua [16, 17, 18], R [19] and Smalltalk [20, 4, 21]. Our work investigates the trade-offs that can be applied within these implementations.</p>
      <p>
        Instead of realizing a custom implementation, libffi [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] presents itself as the de-facto standard for implementing foreign function calls in open-source implementations, even accommodating research projects [22]. For example, libffi's website describes its usage in e.g., the CPython, OpenJDK, RubyFFI, Dalvik, and Racket engines. Our current implementation uses libffi for the slow fall-backs, implementing rare function signatures.
      </p>
      <p>Several research projects also considered the modularity and flexibility of the solution, proposing solutions for data-level interoperability [23], modular foreign function interfaces [24, 25], and even frameworks to configure and specify interoperability patterns [26]. Our solution aims at exploring the performance landscape of FFI from a traditional closed architecture.</p>
      <p>The GildaVM extension for the OpenSmalltalkVM redesigned FFI support to implement asynchronous calls through a global interpreter lock and software-simulated interrupts migrating the VM thread [27]. Software-simulated interrupts dynamically migrate the VM thread to allow Pharo execution to continue when an FFI callout takes longer than a threshold. The current implementation in the Pharo VM does not use such an implementation; it instead implements asynchronous calls through queues and worker threads. In this alternative implementation, developers must annotate potentially expensive function calls as asynchronous. The work described in this paper extends the current Pharo VM implementation with a new bytecode meant for synchronous FFI calls.</p>
      <sec id="sec-6-1">
        <title>Dealing with low-level concerns and FFI implementation details</title>
        <p>In the past, many works have proposed to expose low-level behavior to the Pharo programming language, a feature that has been exploited to implement FFI bindings [21]. Salgado et al. proposed Lowcode [28], a Pharo extension to support native types and operations. Similarly, Benzo proposes a so-called reflective glue for low-level programming, exposing native machine code to the high-level language [29]. Chari et al. propose Waterfall, a framework to dynamically generate primitives [30]. While we worked on the standard Pharo VM, we believe that these approaches, orthogonal to our work, could allow further experimentation with the trade-offs between performance and flexibility.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>The usage of FFI calls in Pharo is crucial: in a typical run of a Pharo image, a large number of FFI calls are performed. It makes sense to try to give something so important and widely used the best performance we can. In this work, we examined redesigns that gain performance while keeping the effort and maintenance low.</p>
      <p>This paper introduced a new design for FFI calls in Pharo. We described some of our key points in designing a new implementation for calling functions that reside outside of Pharo. We explored how a primitive can be redesigned into a bytecode, obtaining more context to work with at compile time at the call site, opening the opportunity for better performance. We found that our new design achieves a performance boost in microbenchmarks: with little maintenance cost, we were able to achieve a 12x improvement in the JIT'ted specialized cases.</p>
      <p>As future work, we want a way to easily and automatically modify the genBytecodeFFICall method to support the most used function signatures dynamically: we would not need to manually add static support for some prototypes; the system would do it depending on how much a prototype is used. In some way, it would work like the JIT compiler, only doing its work if the prototype is hot (noticing dynamically that the prototype is used a lot instead of setting it statically).
</p>
      <p>[28] R. Salgado, S. Ducasse, Lowcode: Extending Pharo with C Types to Improve Performance, in: International Workshop on Smalltalk Technologies IWST'16, Prague, Czech Republic, 2016. doi:10.1145/2991041.2991064.</p>
      <p>[29] C. Bruni, L. Fabresse, S. Ducasse, I. Stasenko, Benzo: Reflective Glue for Low-level Programming, in: International Workshop on Smalltalk Technologies 2014, 2014.</p>
      <p>[30] G. Chari, D. Garbervetsky, C. Bruni, M. Denker, S. Ducasse, Waterfall: Primitives Generation on the Fly, Technical Report, Inria, 2013.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Green</surname>
          </string-name>
          , libffi,
          <year>2024</year>
          . URL: https://sourceware.org/libffi/.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Miranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Béra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. G.</given-names>
            <surname>Boix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ingalls</surname>
          </string-name>
          ,
          <article-title>Two decades of Smalltalk VM development: live VM development through simulation tools</article-title>
          ,
          <source>in: Proceedings of International Workshop on Virtual Machines and Intermediate Languages (VMIL'18)</source>
          , ACM,
          <year>2018</year>
          , pp.
          <fpage>57</fpage>
          -
          <lpage>66</lpage>
          . doi:10.1145/3281287.3281295.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Polito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Tesone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ducasse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fabresse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rogliano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Misse-Chanabier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Phillips</surname>
          </string-name>
          ,
          <article-title>CrossISA Testing of the Pharo VM: Lessons Learned While Porting to ARMv8</article-title>
          , in
          <source>: Proceedings of the 18th international conference on Managed Programming Languages and Runtimes (MPLR '21)</source>
          , Münster, Germany,
          <year>2021</year>
          . URL: https://hal.inria.fr/hal-03332033. doi:10.1145/3475738.3480715.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>