<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Redesigning FFI calls in Pharo: exploiting the baseline JIT for more performance and low maintenance</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Juan</forename><surname>Bianchi</surname></persName>
							<email>ignbianchi@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="department">Inria</orgName>
								<orgName type="laboratory">UMR 9189</orgName>
								<orgName type="institution" key="instit1">University of Lille</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">Centrale Lille</orgName>
								<address>
									<addrLine>CRIStAL</addrLine>
									<postCode>F-59000</postCode>
									<settlement>Lille</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Guillermo</forename><surname>Polito</surname></persName>
							<email>guillermo.polito@inria.fr</email>
							<affiliation key="aff0">
								<orgName type="department">Inria</orgName>
								<orgName type="laboratory">UMR 9189</orgName>
								<orgName type="institution" key="instit1">University of Lille</orgName>
								<orgName type="institution" key="instit2">CNRS</orgName>
								<orgName type="institution" key="instit3">Centrale Lille</orgName>
								<address>
									<addrLine>CRIStAL</addrLine>
									<postCode>F-59000</postCode>
									<settlement>Lille</settlement>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Redesigning FFI calls in Pharo: exploiting the baseline JIT for more performance and low maintenance</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">8F5D1AA45E3061AE175B1194B934D9D6</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:10+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Pharo</term>
					<term>FFI</term>
					<term>JIT</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Pharo programming environment heavily relies on a lot of different C functions. Such functionality is implemented through a Foreign Function Interface (FFI). Pharo implements FFI calls through a single primitive that implements all call cases. This generalization of behavior has performance drawbacks. In this paper, we present a new design for FFI calls. The key goal of the new design is to obtain better performance for the most used callout signatures while keeping maintenance low.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The Pharo programming environment heavily relies on native libraries. For example, Pharo's IDE and graphical environment use libraries such as Cairo and SDL, which are implemented in C. Such native libraries are accessed through a Foreign Function Interface (FFI) that provides access to libraries respecting a common application binary interface (ABI). Typically, those functions are written in the C programming language and compiled with a standard compiler such as GCC or Clang.</p><p>As of today, all Pharo FFI calls are handled by a single primitive that receives as arguments the signature of the foreign function and the list of values to be used as function arguments. This primitive validates the function signature, transforms the function arguments following the signature types, and finally performs the function call using libffi <ref type="bibr" target="#b0">[1]</ref>. libffi is a library that handles FFI in a portable way, implementing the entire calling convention, i.e., the convention stating how arguments are passed to functions and how to access their result. This design suffers from performance overhead because:</p><p>• libffi trades off performance for generality.</p><p>• the single primitive that receives the function specification as an argument prevents us from JIT compiling it, since the same primitive is used from different call sites with different function signatures. This means that many checks that could be constant (i.e., the function signature does not change once an FFI call site is defined) must be performed at run time.</p><p>This paper presents a new design for FFI calls in the Pharo Virtual Machine <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>. This new design aims to obtain better performance for the most-used callout signatures while keeping maintenance low. The solution is based on the following key points:</p><p>New FFI call bytecode. 
Using a bytecode instead of a primitive allows us to benefit from the context available at compile time (e.g., method literals). Such compilation context allows us to stage several checks at JIT compile time and reduce runtime overhead.</p><p>Fallback on the old generic implementation. Our solution only optimizes a fixed set of function signatures defined statically in the source code. We extracted these commonly used function signatures by profiling existing applications. All non-optimized cases fall back to the old mechanism using the pre-existing primitive.</p><p>Our benchmarks show a ∼12x improvement over the baseline implementation when the JIT compiler is active, and a ∼3x improvement when JIT compilation is not active. Moreover, signatures that are rarely used or complex to implement, and that therefore fall back on the old implementation, see no degradation in performance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Context: optimizing FFI calls</head><p>Pharo implements FFI calls through a single primitive that implements all call cases. The most frequently used calls are handled in the same way as those used only once. The Pharo VM leverages libffi to support such generality. libffi is a library that handles FFI in a portable manner, implementing the calling conventions of many different architectures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Current implementation overview</head><p>The current FFI implementation as of <ref type="bibr">Pharo 12</ref> uses the Unified FFI framework <ref type="bibr" target="#b3">[4]</ref> (UFFI). In UFFI, FFI function calls are made through normal methods that are bound to external functions. FFI bindings are expressed using the ffiCall: message, as shown in Figure <ref type="figure">1</ref>. The figure shows a method bound to a function named f that takes an argument arg of type int and returns a void*.</p><p>MyClass &gt;&gt; myMethod: arg
    ^ self ffiCall: #(void* f(int arg))</p><p>Figure <ref type="figure">1</ref>: A Pharo method defining an FFI binding</p><p>UFFI extends the Pharo bytecode compiler and transforms all methods sending the ffiCall: message to introduce a runner and an externalFunction, as illustrated by Figure <ref type="figure">2</ref>. The runner is an object driving the external function execution, typically specifying whether the call should be synchronous or asynchronous. The externalFunction is an object gathering all the meta-data necessary for the FFI call, including the function signature and the function pointer. Both the runner and the external function are generated by the UFFI plugin, stored as literals in the compiled method, and do not change during the life-cycle of the application.</p><p>MyClass &gt;&gt; myMethod: arg
    ^ runner invokeFunction: externalFunction withArguments: {arg asInteger}.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 2:</head><p>The FFI binding as it is transformed by the UFFI framework</p><p>The actual FFI call is performed by the runner when it is sent the invokeFunction:withArguments: message. Both implementations of this message, synchronous and asynchronous, are defined as primitive methods. Moreover, this message receives an array of objects that will be used as function arguments, which is built dynamically on each FFI call. This array of arguments goes through a process of transformation to native types, also known as marshaling, and described in the next section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Marshaling by example</head><p>Marshaling is the process of transforming objects between two different representations to allow interoperability between different technologies. When using FFI, we consider marshaling the process of converting Pharo objects to native values as expected by C functions and doing the inverse with return values. The process of marshaling is split in two: a high-level marshaling and a low-level one. The high-level marshaling takes as input arbitrary Pharo objects and outputs primitive Pharo objects such as small integers, floats, strings, and external addresses. The low-level marshaling takes primitive Pharo objects and outputs equivalent native values.</p><p>The high-level marshaling is introduced by the UFFI code transformation. In the example shown in Figure <ref type="figure">2</ref>, the declared function signature of f expects an int. The function argument is then transformed to an integer using the asInteger message. For more complex function signatures, the generated bytecode includes other kinds of messages to handle, e.g., floats, structs, and strings.</p><p>The low-level marshaling is implemented in the virtual machine, in the primitive. It typically requires the untagging of tagged objects (e.g., converting a Pharo SmallInteger to a C int) or the unboxing of external references (extracting the actual external address from the Pharo object ExternalAddress).</p></div>
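<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the low-level marshaling described above, the following C sketch untags a small integer and unboxes an external address. The 3-bit tag layout, the oop type, the external_address_box struct, and all function names are assumptions made for this example. They are not taken from the Pharo VM sources, whose actual object encoding may differ.</p>
<p><![CDATA[
```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical tagged-word layout, for illustration only:
   the low TAG_BITS bits of a word hold a tag, and tag 1 marks a small integer. */
#define TAG_BITS 3
#define TAG_MASK ((uintptr_t)7)
#define SMALL_INTEGER_TAG ((uintptr_t)1)

typedef uintptr_t oop; /* an object reference as seen by the VM (assumed name) */

/* Hypothetical boxed external address: an object wrapping a raw pointer. */
typedef struct {
    void *address;
} external_address_box;

int is_small_integer(oop o) {
    return (o & TAG_MASK) == SMALL_INTEGER_TAG;
}

oop tag_small_integer(intptr_t value) {
    return ((uintptr_t)value << TAG_BITS) | SMALL_INTEGER_TAG;
}

intptr_t untag_small_integer(oop o) {
    assert(is_small_integer(o));
    /* Clear the tag, then divide rather than shift so that negative
       values are handled without relying on arithmetic right shifts. */
    return (intptr_t)(o - SMALL_INTEGER_TAG) / (intptr_t)(1 << TAG_BITS);
}

/* Low-level marshaling of one argument declared as a C int.
   Returns 1 on success, 0 if the object is not a small integer
   or its value does not fit in an int. */
int marshal_to_c_int(oop o, int *out) {
    intptr_t value;
    if (!is_small_integer(o)) return 0;
    value = untag_small_integer(o);
    if (value < INT_MIN || value > INT_MAX) return 0;
    *out = (int)value;
    return 1;
}

/* Unboxing: extract the raw address held by the wrapper object. */
void *unbox_external_address(const external_address_box *box) {
    return box->address;
}
```
]]></p>
<p>In this sketch, a failed marshal_to_c_int call corresponds to a type error in the FFI call: the argument did not match the type declared in the function signature.</p></div>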
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1.">libffi integration</head><p>Performing the FFI call requires using the ffi_call function defined by libffi and statically linked with the VM source code. This function requires four different arguments as shown in Listing 3.</p><p>A cif object: a description of the external function signature. Currently, the cif pointer is built by the UFFI plugin and wrapped inside the external function object.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A function pointer to call, fn:</head><p>The pointer to the function to call is looked up by the UFFI plugin and wrapped inside the external function object.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Arguments and return value holder.</head><p>A collection of memory addresses pointing to the passed arguments, and a holder address for the return value. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Identified problems in the current implementation</head><p>We have encountered three main problems with the current implementation:</p><p>Function signature is only exploited at run time. The external function, its signature and the call arguments are all accessed at run time. While function arguments change from one call to another, the external function and its signature remain stable across calls from the same call site. However, the current implementation does not take advantage of such knowledge.</p><p>Cogit JIT compiler does not allow primitive specialization. The Cogit JIT compiler is a non-optimizing method compiler that does a one-to-one mapping between Pharo bytecode methods and their natively compiled code. This compiler does not automatically generate multiple versions of a single method specialized, e.g., for its arguments. Thus, even if JIT compiled, a primitive will have only one native version and will not be specialized per function call signature.</p><p>Generality vs performance. The entire architecture trades off performance for generality. Having a single primitive supporting all cases forces the dynamic construction of the argument array, producing unnecessary stress on the garbage collector. Moreover, both libffi and the primitive using it need to support all the existing calling conventions, the rarely used ones as well as the most used ones.</p><p>Our goal. We propose a new design that separates (a) a fast path allowing specialized JIT compilation of commonly-used function signatures from (b) a slow path that implements a general form, supports all other cases, and performs similarly to the current implementation.</p></div>
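<div xmlns="http://www.tei-c.org/ns/1.0"><p>The first problem above can be made concrete with a small C model. It contrasts a call path that re-validates the signature on every call with one that validates once per call site and caches the result. The call_site struct, the toy validate_signature check, and the function names are all hypothetical and only serve to count how often validation runs. They do not reproduce the checks performed by the actual primitive.</p>
<p><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *signature;         /* fixed once the call site is defined */
    int checked;                   /* has the signature been validated yet? */
    int valid;                     /* cached validation result */
    unsigned long validations_run; /* instrumentation for this example */
} call_site;

/* Toy stand-in for signature validation: accepts any text containing '('. */
int validate_signature(const char *signature) {
    return signature != NULL && strchr(signature, '(') != NULL;
}

/* Models a single generic primitive: validates on every call. */
int call_with_runtime_checks(call_site *site) {
    assert(site != NULL);
    site->validations_run++;
    return validate_signature(site->signature);
}

/* Models a staged call site: validates once, then reuses the result. */
int call_with_staged_checks(call_site *site) {
    assert(site != NULL);
    if (!site->checked) {
        site->validations_run++;
        site->valid = validate_signature(site->signature);
        site->checked = 1;
    }
    return site->valid;
}
```
]]></p>
<p>After a thousand calls through each path, the first call site has run the validation a thousand times and the second only once. This is the kind of saving that becomes possible when the signature is treated as a per-call-site constant.</p></div>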
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Towards a more efficient FFI design</head><p>We propose to extend the Pharo VM and the Opal bytecode compiler with a new FFI call bytecode supporting synchronous FFI calls. This new bytecode instruction is not meant to completely replace the current implementation. Our goal is for them to coexist: the current implementation will be used as a fallback mechanism for the slow path and for asynchronous calls. We will refer to the newly introduced bytecode for dealing with FFI calls as bytecodeFFICall. bytecodeFFICall is implemented in both the interpreter and the JIT, with one particularity for the latter: the JIT'ted implementation of bytecodeFFICall is specialized at compile time. The set of function signatures is fixed and defined statically in the source code. We say that the function signatures we chose are supported. For unsupported signatures, we fall back to the same primitive that the current implementation uses.</p><p>The interpreter implementation of bytecodeFFICall is still general: it supports all kinds of function prototypes. Even though it is general, we found other ways to optimize FFI calls by taking advantage of the context available to us when compiling the new bytecode. We describe this in more detail in the following sections.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">The new Bytecode</head><p>The bytecodeFFICall is a 2-byte bytecode. The first byte is the opcode and the second byte encodes two 4-bit numbers. These values are indices into the literal table of the CompiledMethod containing the bytecode. These indices correspond to:</p><p>• A description of the external function to call. This includes not only the name and the arguments of the function but also its prototype. ?? describes what this object looks like.</p><p>• The Runner, used for the fallback cases, as previously described.</p><p>Also, unlike the existing implementation, this bytecode avoids the creation of intermediate arrays: all function arguments are pushed to the stack. The bytecode knows the number of arguments to pop from the function meta-data found in the literals. To illustrate the new bytecode, Figure <ref type="figure">4</ref> shows the bytecode sequence of the method myMethod: shown before.</p><p>MyClass &gt;&gt; myMethod: arg
    pushArgument: 0
    send: asInteger
    ffiCall: f
    returnTop</p><p>Figure <ref type="figure">4</ref>: A method calling function f using our new bytecode</p></div>
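<div xmlns="http://www.tei-c.org/ns/1.0"><p>The operand layout described above can be sketched in C as follows. The text only states that the second byte packs two 4-bit literal indices. Which nibble holds which index is not specified, so the choice made here (high nibble for the external function, low nibble for the runner) is an assumption, as are the type and function names.</p>
<p><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* The two literal indices carried by the operand byte of the FFI call bytecode. */
typedef struct {
    uint8_t function_literal_index; /* literal describing the external function */
    uint8_t runner_literal_index;   /* literal holding the runner (fallback) */
} ffi_call_operands;

/* Assumed layout: high nibble = function index, low nibble = runner index. */
ffi_call_operands decode_ffi_call_operands(uint8_t operand_byte) {
    ffi_call_operands operands;
    operands.function_literal_index = (uint8_t)(operand_byte >> 4);
    operands.runner_literal_index = (uint8_t)(operand_byte & 0x0F);
    return operands;
}

/* Inverse operation, as a bytecode compiler would perform it.
   Each index must fit in 4 bits, i.e., be in the range 0..15. */
uint8_t encode_ffi_call_operands(uint8_t function_index, uint8_t runner_index) {
    assert(function_index < 16 && runner_index < 16);
    return (uint8_t)((function_index << 4) | runner_index);
}
```
]]></p>
<p>Whatever the exact nibble order, a 4-bit index can only address sixteen literals, so both the function description and the runner must sit within the sixteen literal slots reachable by such an index.</p></div>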
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Fallback to the current implementation</head><p>Primitives commonly fail in some cases. When a primitive fails, it gives control back to the interpreter, which then interprets the method's fallback bytecode. Bytecodes, in contrast to primitives, have no notion of failure. Instead, a customary solution is to send a callback message, e.g., doesNotUnderstand: or mustBeBoolean.</p><p>In our case, we decided to treat failure cases in the bytecode by sending the message invokeFunction:withArguments: to the Runner object (see Figure <ref type="figure">2</ref>), which in turn calls the general-case primitive. This way, the new implementation does not need to handle errors directly.</p></div>
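<div xmlns="http://www.tei-c.org/ns/1.0"><p>The split between a specialized fast path and a generic fallback can be modeled in a few lines of C. This is only a sketch under stated assumptions: the signature_kind enumeration, the ffi_value union, the fallback_invoker callback (standing in for the message send to the runner), and the sample functions are all hypothetical. In addition, the switch below runs on every call for simplicity, whereas the design described in this paper resolves the signature once, when the JIT compiles the bytecode.</p>
<p><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

typedef void (*generic_fn)(void); /* type-erased function pointer */

typedef enum {
    SIG_UINT64_FROM_UINT64, /* uint64_t fn(uint64_t) */
    SIG_VOID_FROM_POINTER,  /* void fn(pointer) */
    SIG_UNSUPPORTED         /* anything else goes to the fallback */
} signature_kind;

typedef struct {
    generic_fn fn;
    signature_kind kind;
} external_function;

typedef union {
    uint64_t u64;
    void *ptr;
} ffi_value;

typedef enum { PATH_FAST, PATH_FALLBACK } call_path;

/* Stand-in for sending invokeFunction:withArguments: to the runner. */
typedef ffi_value (*fallback_invoker)(const external_function *f,
                                      const ffi_value *args, int nargs);

call_path ffi_call_bytecode(const external_function *f, const ffi_value *args,
                            int nargs, fallback_invoker fallback,
                            ffi_value *result) {
    switch (f->kind) {
    case SIG_UINT64_FROM_UINT64:
        assert(nargs == 1);
        /* Cast back to the real type before calling. */
        result->u64 = ((uint64_t (*)(uint64_t))f->fn)(args[0].u64);
        return PATH_FAST;
    case SIG_VOID_FROM_POINTER:
        assert(nargs == 1);
        ((void (*)(void *))f->fn)(args[0].ptr);
        result->u64 = 0; /* no meaningful return value */
        return PATH_FAST;
    default:
        /* Unsupported signature: delegate to the generic mechanism. */
        *result = fallback(f, args, nargs);
        return PATH_FALLBACK;
    }
}

/* Sample external functions and a toy generic fallback, for demonstration. */
uint64_t sample_double(uint64_t x) { return 2 * x; }

void sample_mark(void *p) { *(int *)p = 1; }

ffi_value sample_generic_fallback(const external_function *f,
                                  const ffi_value *args, int nargs) {
    ffi_value sum;
    int i;
    (void)f; /* a real fallback would perform the call generically */
    sum.u64 = 0;
    for (i = 0; i < nargs; i++) sum.u64 += args[i].u64;
    return sum;
}
```
]]></p>
<p>Because the fallback is reached through an ordinary callback, the fast path never has to report errors itself. It either handles a signature it knows or hands the whole call over unchanged.</p></div>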
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Evaluation</head><p>At the time of writing this article, our bytecodeFFICall supports optimizing the two function prototypes listed below. We chose only two because we wanted to first try the idea before implementing all the prototypes we would like to support in the future.</p><formula xml:id="formula_0">• uint64_t fn(uint64_t) • void fn(pointer)</formula><p>We chose those two signatures because:</p><p>• They are simple to implement, so the machinery necessary to support them is not complex.</p><p>• Through micro-benchmarks using BlocBenchs <ref type="bibr" target="#b4">[5]</ref>, we found that they are among the most used signatures. BlocBenchs is a project to profile and benchmark Bloc <ref type="bibr" target="#b5">[6]</ref>, a framework for graphics in Pharo that uses FFI calls. BlocBenchs already provides all the profiling and benchmarking infrastructure we needed, which enabled us to extract the most used external function signatures quite easily.</p><p>In the following section, we describe the benchmarks we performed. When we evaluate the performance for our supported prototypes, we refer to the two mentioned above. We say a function prototype is supported if the JIT implementation optimizes it in the fast path.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Benchmarks</head><p>This section presents the benchmarks we ran to compare our new design against the current implementation and to see how the two new implementations (interpreted vs. JIT'ted) differ.</p><p>For the benchmarks, we compared the following combinations of cases:</p><p>• bytecodeFFICall vs. the current implementation.</p><p>• Where the external function prototype is supported and where it is not.</p><p>• With the Pharo VM built with only the interpreter and no JIT (StackVM) vs. the Pharo VM built as default (interpreter + JIT).</p><p>The first case is the most important comparison we want to make: our new proposed design against the current one. For the second case, we wanted to make sure that our fallback mechanism did not introduce a negative performance impact. For the third case, we wanted to evaluate how much speed-up the JIT'ted version would bring us. Comparing the interpreter-only versions tells us the overhead of doing the checks at compile time vs. at run time.</p><p>• newStack and oldStack refer to the new (bytecodeFFICall) and the old (current) implementation respectively, with JIT compilation inactive, i.e., only the interpreter version of each.</p><p>• newStock and oldStock refer to the new (bytecodeFFICall) and the old (current) implementation respectively, with the stock Pharo VM, which has JIT compilation active.</p><p>• Supported or not supported signature refers to whether that function signature is optimized.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Methodology.</head><p>For each of the microbenchmarks we measured throughput: the number of calls per second. We ran each benchmark 100 times, doing our best to avoid environment noise. Benchmarks were run on a MacBook Pro with a 2.6 GHz 6-core Intel Core i7 and 16 GB of 2400 MHz DDR4 RAM.</p><p>Results. Figure <ref type="figure" target="#fig_1">5</ref> shows the results of our benchmarks. Our results show that with JIT compilation, bytecodeFFICall achieves an improvement of 12x over the current implementation when dealing with a supported (optimized) function signature. When JIT compilation is not active, the improvement achieved by bytecodeFFICall is 3x over the baseline.</p><p>For the cases when the function signature is not supported, the figure shows that both bytecodeFFICall and the current implementation perform similarly. When JIT compilation is not active, there is a large gap because the work is done at compile time rather than at run time. In this case, even though we are not specializing for the function signature, all the checks are done at compile time, so they are done only the first time, in contrast to the current implementation where they are done each time the primitive gets executed.</p><p>The figure also shows that both oldStack and oldStock perform very uniformly across the three benchmarks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Maintainability</head><p>The number of lines of code added to implement this design is 380 so far, including the changes made in the VM as well as in the bytecode compiler. The effort to implement the new design was divided into two main parts: the VM side and the bytecode compiler side.</p><p>For bytecodeFFICall we implemented a new bytecode, which means extending the bytecode set of Pharo. For maintenance purposes, this should not be an issue: we expect changes to this part of the bytecode set to be uncommon. Most future changes and fixes would take place in the VM.</p><p>In the VM, we added interpreter support for the new bytecode. Again, we do not expect the interpreted version of bytecodeFFICall to be modified often. The JIT'ted versions are where we expect future work to happen. As we discussed, the JIT'ted version of the new bytecode is specialized. This means that we have to support only a couple of function signatures. If we decide in the future to add support for a specific signature, the only method to modify would be genBytecodeFFICall, which is where the whole JIT'ted implementation of bytecodeFFICall resides.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Discussion</head><p>When planning the new bytecode-based implementation, we considered several options. Figure <ref type="figure" target="#fig_2">6</ref> shows all the possible combinations for implementing FFI calls in the Pharo VM, including the current implementation and bytecodeFFICall.</p><p>The new bytecode could be implemented in the interpreter in a generic way (handling all kinds of function signatures) or in a specialized way (handling only some function signatures). If the bytecode were specialized, this would imply:</p><p>• Faster interpreter: We know beforehand that the bytecode being executed is for a specific function signature (the bytecode compiler and the UFFI plugin would make sure of that), so there would be few checks to do. The disadvantage of this approach is its maintainability. We would end up with a much bigger bytecode set, with one bytecode for each function prototype that we want to support.</p><p>• Generic fallback: To deal with all the unsupported cases.</p><p>If the bytecode were generic, that would imply having a slower interpreter, because of all the checks we would have to perform at compile time to decide what kind of function signature we are dealing with. In that case, we could specialize it in the JIT. This second option is the one we decided to go with.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.">Related work</head><p>This article describes a new design for FFI calls that achieves good performance while aiming at low implementation complexity. The key to our solution is to distinguish between fast and slow execution paths and apply that distinction to a mixture of bytecode, interpretation, and JIT compilation.</p><p>Bytecode design. Bytecode and instruction design has been a matter of discussion for a long time. Smalltalk and its descendants have long used execution engines based on bytecode and primitive methods, as described in the blue book <ref type="bibr" target="#b6">[7]</ref>. Although implementations diverged over the years, the current design is architected similarly. Our solution introduces a new FFI-call bytecode that can be embedded within a method and benefit from the compilation context and literals in the embedding method. For the slow path, we decided to use a primitive method: our new bytecode can simply fall back to it by compiling a message send and letting the runtime lookup do the rest of the work. The Sista bytecode set introduced a redesign of the bytecode set originally inherited from Squeak <ref type="bibr" target="#b8">[9]</ref>, intended to support bytecode-to-bytecode compiler optimizations <ref type="bibr" target="#b9">[10]</ref>. For this purpose, this new bytecode set introduces prefix bytecodes and unsafe bytecodes. Prefix bytecodes (named extensions in the implementation) annotate existing bytecodes to extend their behavior. Unsafe bytecodes (re-)implement the behavior of existing bytecodes and primitives without safety checks (e.g., type, overflow, and bound checks). Our work extends this existing bytecode set with a new 2-byte bytecode instruction in an unused opcode, not requiring prefixes or unsafe bytecodes. Our new bytecode instruction is so far limited to encoding literals in 4-bit nibbles. 
However, we envisage using prefix bytecodes to extend the range of indexable literals. FFI Implementations. Many projects and programming language implementations acknowledge the importance of integration and interaction with external libraries. We find in the literature FFI implementations for Scheme <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12]</ref>, ML <ref type="bibr" target="#b12">[13]</ref>, Java <ref type="bibr" target="#b13">[14,</ref><ref type="bibr" target="#b14">15]</ref>, Lua <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17,</ref><ref type="bibr" target="#b17">18]</ref>, R <ref type="bibr" target="#b18">[19]</ref> and Smalltalk <ref type="bibr" target="#b19">[20,</ref><ref type="bibr" target="#b3">4,</ref><ref type="bibr" target="#b20">21]</ref>. Our work investigates the trade-offs that can be applied within these implementations.</p><p>As an alternative to a custom implementation, libffi <ref type="bibr" target="#b0">[1]</ref> presents itself as the de-facto standard for implementing foreign function calls in open-source implementations, even accommodating research projects <ref type="bibr" target="#b21">[22]</ref>. For example, libffi's website describes its usage in, e.g., the CPython, OpenJDK, Ruby-FFI, Dalvik, and Racket engines. Our current implementation uses libffi for the slow fall-backs, implementing rare function signatures.</p><p>Several research projects also considered the modularity and flexibility of the solution, proposing solutions for data-level interoperability <ref type="bibr" target="#b22">[23]</ref>, modular foreign function interfaces <ref type="bibr" target="#b23">[24,</ref><ref type="bibr" target="#b24">25]</ref>, and even frameworks to configure and specify interoperability patterns <ref type="bibr" target="#b25">[26]</ref>. Our solution aims at exploring the performance landscape of FFI from a traditional closed architecture. 
The GildaVM extension for the OpenSmalltalkVM redesigned FFI support to implement asynchronous calls through a global interpreter lock and software-simulated interrupts migrating the VM thread <ref type="bibr" target="#b26">[27]</ref>. Software-simulated interrupts dynamically migrate the VM thread to allow continuing Pharo execution when an FFI callout takes longer than a threshold. The current implementation in the Pharo VM does not use such an implementation. It implements instead asynchronous calls through queues and worker threads. In this alternative implementation, developers must annotate potentially expensive function calls as asynchronous. The work described in this paper extends the current Pharo VM implementation with a new bytecode meant for synchronous FFI calls.</p><p>Dealing with low-level concerns and FFI implementation details. In the past, many works have proposed to expose low-level behavior to the Pharo programming language, a feature that has been exploited to implement FFI bindings <ref type="bibr" target="#b20">[21]</ref>. Salgado et al. proposed lowcode <ref type="bibr" target="#b27">[28]</ref>, a Pharo extension to support native types and operations. Similarly, Benzo proposes a so-called Reflective Glue for Low-level Programming, exposing native machine code to high-level programming <ref type="bibr" target="#b28">[29]</ref>. <ref type="bibr">Chari et al.</ref> propose Waterfall, a framework to dynamically generate primitives <ref type="bibr" target="#b29">[30]</ref>. While we worked on the standard Pharo VM, we believe that these approaches, orthogonal to our work, could allow further experimentation with the trade-offs between performance and flexibility.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.">Conclusion</head><p>FFI calls are crucial in Pharo. In a typical run of a Pharo image, a large number of FFI calls are performed. It makes sense to give something so important and widely used the best performance we can get. In this work, we examined some redesigns that gain performance while keeping effort and maintenance low.</p><p>This paper introduced a new design for FFI calls in Pharo. We described our key points in designing a new implementation for calling functions that reside outside of Pharo. We explored how a primitive can be redesigned into a bytecode, obtaining more context to work with at compile time in the call site and opening the opportunity for better performance. We found that our new design achieves a boost in performance in microbenchmarks. With little maintenance cost, we were able to achieve a 12x improvement in the JIT'ted specialized cases.</p><p>As future work, we want to automatically extend the genBytecodeFFICall method to support the most used function signatures dynamically. We would then no longer need to add support for prototypes manually and statically: the system would do it depending on how much each prototype is used. In a way, it would work like the JIT compiler, doing its work only if the prototype is hot, i.e., noticing at run time that a prototype is used a lot instead of fixing the set of supported prototypes statically.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: An example of a Pharo method</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 5 :</head><label>5</label><figDesc>Figure 5: Benchmark results. Higher is better</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 6 :</head><label>6</label><figDesc>Figure 6: Implementation options for FFI calls</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title level="m" type="main">libffi</title>
		<author>
			<persName><forename type="first">A</forename><surname>Green</surname></persName>
		</author>
		<ptr target="https://sourceware.org/libffi/" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Two decades of Smalltalk VM development: live VM development through simulation tools</title>
		<author>
			<persName><forename type="first">E</forename><surname>Miranda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Béra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><forename type="middle">G</forename><surname>Boix</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Ingalls</surname></persName>
		</author>
		<idno type="DOI">10.1145/3281287.3281295</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of International Workshop on Virtual Machines and Intermediate Languages (VMIL&apos;18)</title>
				<meeting>International Workshop on Virtual Machines and Intermediate Languages (VMIL&apos;18)</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="57" to="66" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Cross-ISA Testing of the Pharo VM: Lessons Learned While Porting to ARMv8</title>
		<author>
			<persName><forename type="first">G</forename><surname>Polito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tesone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ducasse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fabresse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Rogliano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Misse-Chanabier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">H</forename><surname>Phillips</surname></persName>
		</author>
		<idno type="DOI">10.1145/3475738.3480715</idno>
		<ptr target="https://hal.inria.fr/hal-03332033" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 18th international conference on Managed Programming Languages and Runtimes (MPLR &apos;21)</title>
				<meeting>the 18th international conference on Managed Programming Languages and Runtimes (MPLR &apos;21)<address><addrLine>Münster, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Unified FFI - calling foreign functions from Pharo</title>
		<author>
			<persName><forename type="first">G</forename><surname>Polito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ducasse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tesone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Brunzie</surname></persName>
		</author>
		<ptr target="http://books.pharo.org/booklet-uffi/" />
		<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">BlocBenchs</title>
		<author>
			<persName><forename type="first">M</forename><surname>Dias</surname></persName>
		</author>
		<ptr target="https://github.com/pharo-graphics/BlocBenchs" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Bloc</title>
		<author>
			<persName><surname>Pharo</surname></persName>
		</author>
		<ptr target="https://github.com/pharo-graphics/Bloc" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<monogr>
		<title level="m" type="main">Smalltalk 80: the Language and its Implementation</title>
		<author>
			<persName><forename type="first">A</forename><surname>Goldberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Robson</surname></persName>
		</author>
		<ptr target="http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf" />
		<imprint>
			<date type="published" when="1983">1983</date>
			<publisher>Addison Wesley</publisher>
			<pubPlace>Reading, Mass</pubPlace>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A bytecode set for adaptive optimizations</title>
		<author>
			<persName><forename type="first">C</forename><surname>Béra</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Miranda</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Workshop on Smalltalk Technologies (IWST)</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Back to the future: The story of Squeak, a practical Smalltalk written in itself</title>
		<author>
			<persName><forename type="first">D</forename><surname>Ingalls</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kaehler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Maloney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Wallace</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kay</surname></persName>
		</author>
		<idno type="DOI">10.1145/263700.263754</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Object-Oriented Programming, Systems, Languages, and Applications conference (OOPSLA&apos;97)</title>
				<meeting>Object-Oriented Programming, Systems, Languages, and Applications conference (OOPSLA&apos;97)</meeting>
		<imprint>
			<publisher>ACM Press</publisher>
			<date type="published" when="1997">1997</date>
			<biblScope unit="page" from="318" to="326" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Sista: a Metacircular Architecture for Runtime Optimisation Persistence</title>
		<author>
			<persName><forename type="first">C</forename><surname>Béra</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
		<respStmt>
			<orgName>Université de Lille</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Ph.D. thesis</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Foreign interface for plt scheme</title>
		<author>
			<persName><forename type="first">E</forename><surname>Barzilay</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Orlovsky</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Fifth ACM SIGPLAN Workshop on Scheme and Functional Programming</title>
				<meeting>the Fifth ACM SIGPLAN Workshop on Scheme and Functional Programming</meeting>
		<imprint>
			<date type="published" when="2004">2004</date>
			<biblScope unit="page" from="63" to="74" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Klock</surname></persName>
		</author>
		<title level="m">The layers of larceny&apos;s foreign function interface</title>
				<imprint>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">No-longer-foreign: Teaching an ML compiler to speak C &quot;natively&quot;</title>
		<author>
			<persName><forename type="first">M</forename><surname>Blume</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronic Notes in Theoretical Computer Science</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="36" to="52" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Java native access (jna)</title>
		<author>
			<persName><forename type="first">T</forename><surname>Fast</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
		<ptr target="https://github.com/twall/jna" />
		<imprint>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Improving performance of jna by using llvm jit compiler</title>
		<author>
			<persName><forename type="first">Y.-H</forename><surname>Tsai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I.-W</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I.-C</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J.-J</forename><surname>Shann</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICIS.2013.6607886</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="483" to="488" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Passing a language through the eye of a needle</title>
		<author>
			<persName><forename type="first">R</forename><surname>Ierusalimschy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">H</forename><surname>De Figueiredo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Celes</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Communications of the ACM</title>
		<imprint>
			<biblScope unit="volume">54</biblScope>
			<biblScope unit="page" from="38" to="43" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Luajit, a just-in-time compiler for lua</title>
		<author>
			<persName><forename type="first">M</forename><surname>Pall</surname></persName>
		</author>
		<ptr target="http://luajit.org/luajit.html" />
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<analytic>
		<title level="a" type="main">A foreign function interface for pallene</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">C</forename><surname>De Paula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Ierusalimschy</surname></persName>
		</author>
		<idno type="DOI">10.1145/3561320.3561321</idno>
		<ptr target="https://doi.org/10.1145/3561320.3561321" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the XXVI Brazilian Symposium on Programming Languages, SBLP &apos;22</title>
				<meeting>the XXVI Brazilian Symposium on Programming Languages, SBLP &apos;22<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="32" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Foreign Library Interface</title>
		<author>
			<persName><forename type="first">D</forename><surname>Adler</surname></persName>
		</author>
		<idno type="DOI">10.32614/RJ-2012-004</idno>
		<ptr target="https://doi.org/10.32614/RJ-2012-004" />
	</analytic>
	<monogr>
		<title level="j">The R Journal</title>
		<imprint>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="30" to="40" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">Smalltalk in a c world</title>
		<author>
			<persName><forename type="first">D</forename><surname>Chisnall</surname></persName>
		</author>
		<idno type="DOI">10.1145/2448963.2448967</idno>
		<ptr target="https://doi.org/10.1145/2448963.2448967" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the International Workshop on Smalltalk Technologies, IWST &apos;12</title>
				<meeting>the International Workshop on Smalltalk Technologies, IWST &apos;12<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computing Machinery</publisher>
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Language-side foreign function interfaces with nativeboost</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bruni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fabresse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ducasse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Stasenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Workshop on Smalltalk Technologies</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Foreignfunctions package for macaulay2</title>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">A</forename><surname>Torrance</surname></persName>
		</author>
		<ptr target="https://arxiv.org/abs/2405.12365" />
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<monogr>
		<author>
			<persName><forename type="first">K</forename><surname>Fisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pucella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Reppy</surname></persName>
		</author>
		<title level="m">Data-level interoperability</title>
				<imprint>
			<publisher>Citeseer</publisher>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">A modular foreign function interface</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yallop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sheets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madhavapeddy</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.scico.2017.04.002</idno>
		<ptr target="https://doi.org/10.1016/j.scico.2017.04.002" />
	</analytic>
	<monogr>
		<title level="j">Science of Computer Programming</title>
		<imprint>
			<biblScope unit="volume">164</biblScope>
			<biblScope unit="page" from="82" to="97" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
	<note>special issue of selected papers from FLOPS 2016</note>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Declarative foreign function binding through generic programming</title>
		<author>
			<persName><forename type="first">J</forename><surname>Yallop</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sheets</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madhavapeddy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Functional and Logic Programming: 13th International Symposium, FLOPS 2016</title>
				<meeting><address><addrLine>Kochi, Japan</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2016">March 4-6, 2016</date>
			<biblScope unit="page" from="198" to="214" />
		</imprint>
	</monogr>
	<note>Proceedings 13</note>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">A framework for interoperability</title>
		<author>
			<persName><forename type="first">K</forename><surname>Fisher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pucella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Reppy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Electronic Notes in Theoretical Computer Science</title>
		<imprint>
			<biblScope unit="volume">59</biblScope>
			<biblScope unit="page" from="3" to="19" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">Gildavm: a non-blocking i/o architecture for the cog vm</title>
		<author>
			<persName><forename type="first">G</forename><surname>Polito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tesone</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Miranda</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Simmons</surname></persName>
		</author>
		<ptr target="https://hal.archives-ouvertes.fr/hal-02379275" />
	</analytic>
	<monogr>
		<title level="m">International Workshop on Smalltalk Technologies</title>
				<meeting><address><addrLine>Cologne, Germany</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Lowcode: Extending Pharo with C Types to Improve Performance</title>
		<author>
			<persName><forename type="first">R</forename><surname>Salgado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ducasse</surname></persName>
		</author>
		<idno type="DOI">10.1145/2991041.2991064</idno>
	</analytic>
	<monogr>
		<title level="m">International Workshop on Smalltalk Technologies IWST&apos;16</title>
				<meeting><address><addrLine>Prague, Czech Republic</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b28">
	<analytic>
		<title level="a" type="main">Benzo: Reflective glue for low-level programming</title>
		<author>
			<persName><forename type="first">C</forename><surname>Bruni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fabresse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ducasse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Stasenko</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Workshop on Smalltalk Technologies</title>
				<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b29">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Chari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Garbervetsky</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Bruni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Denker</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ducasse</surname></persName>
		</author>
		<title level="m">Waterfall: Primitives Generation on the Fly</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
		<respStmt>
			<orgName>Inria</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
