<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>The Italian Conference on CyberSecurity, April</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Building Call Graph of WebAssembly Programs via Abstract Semantics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mattia Paccamiccio</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Franco Raimondi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Loreti</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gran Sasso Science Institute</institution>
          ,
          <addr-line>Viale Francesco Crispi 7, 67100, L'Aquila</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università di Camerino</institution>
          ,
          <addr-line>Via Andrea d'Accorso 16, 62032, Camerino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>0</volume>
      <fpage>8</fpage>
      <lpage>12</lpage>
      <abstract>
        <p>WebAssembly is a binary format for code that is gaining popularity thanks to its focus on portability and performance. Currently, the most common use case for WebAssembly is execution in a browser. It is also being increasingly adopted as a stand-alone application due to its portability. The binary format of WebAssembly, however, makes it prone to being used as a vehicle for malicious software. For instance, one could embed a cryptocurrency miner in code executed by a browser. As a result, there is substantial interest in developing tools for WebAssembly security verification, information flow control, and, more generally, for verifying behavioral properties such as correct API usage. In this document, we address the issue of building call graphs for WebAssembly code. This is important because having or computing a call graph is a prerequisite for most inter-procedural verification tasks. In this paper, we propose a formal solution based on the theory of Abstract Interpretation. We compare our approach to the state-of-the-art by predicting how it would perform against a set of specifically crafted benchmark programs.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;WebAssembly</kwd>
        <kwd>Call Graph</kwd>
        <kwd>Abstract Interpretation</kwd>
        <kwd>Semantics</kwd>
        <kwd>Program Analysis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        WebAssembly “is a binary instruction format for a stack-based virtual machine, designed as a
portable compilation target for programming languages"1. WebAssembly aims to be portable and
eficient: it is actively supported by most web browsers and it is employed for rendering streamed
videos on devices such as smart televisions, set-top boxes, and other embedded devices [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
      </p>
      <p>
        WebAssembly is primarily used in conjunction with JavaScript to optimize tasks that are
typically resource-intensive and would benefit from more optimized, close-to-native
performance [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], such as rendering images, streaming videos and generally heavy computational
tasks for which interpreted languages might be too slow.
      </p>
      <p>
        In parallel with its adoption, researchers have investigated the issue of verifying WebAssembly
code directly [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. From a security point of view, this is motivated by the fact that WebAssembly,
while executing in a relatively safe sandboxed environment, is still the source of security
concerns (information leakage, tampering, policy violation, unauthorized use of computational
power, etc.).
      </p>
      <p>In terms of formal analysis, a call graph is a prerequisite for verifying several properties of
interest, such as information flow analysis, call-ordering requirements, safety, and liveness. We
define a call graph as a directed graph that encodes the possible calling relationships between
functions or methods in a program, represented as nodes, and edges between two nodes (i.e.  ,
) representing that  calls . Constructing an exact call graph is an undecidable problem. The
most adopted approach is soundly approximating all the possible calls, implying that an edge
between  and , in a sound approximation, translates to  may call . The precision of a call
graph depends on the inference rules used, which could include or ignore information about
execution paths, context, and so on.</p>
      <sec id="sec-1-1">
        <title>1.1. Contribution</title>
        <p>
          We investigate the problem of building a precise call graph for WebAssembly. The simple
execution model of WebAssembly, briefly described in Section 2, implies the need to track
values on top of the stack to resolve call sites relying on some form of indirection. Other
sound approaches can be improved in precision [
          <xref ref-type="bibr" rid="ref4">4, 5</xref>
          ], while precise approaches are unsound
(MetaDCE2, WAVM3, Manticore [6]).
        </p>
        <p>We propose an approach based on the theory of Abstract interpretation, aiming to improve
precision while retaining soundness.</p>
        <p>We lay out the paper as follows: we introduce background details about WebAssembly and
Abstract interpretation in Section 2; we describe our formal approach in Section 3; we present
related work in Section 4 and we conclude in Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>In this section, we start by giving an overview of WebAssembly. We then introduce basic notions
of call graphs, what soundness entails when analyzing WebAssembly modules, and Abstract
interpretation.</p>
      <sec id="sec-2-1">
        <title>2.1. Basics of WebAssembly</title>
        <p>WebAssembly is a simple, typed, assembly-like language whose execution model is based on a
stack machine, implementing a set of low-level instructions, less than 200 in total. It is designed
to be a universal compilation target, mostly for the LLVM family of programming languages.
The resulting binary is then run on an ad-hoc Virtual Machine (VM). The most common example
of WebAssembly Virtual Machine as of today is a sandboxed environment in a browser, but
stand-alone runtime environments for resource-constrained devices are available4. Overall, the
paradigm is very similar to the one that Java adopts: source code is compiled into bytecode and</p>
        <sec id="sec-2-1-1">
          <title>2https://github.com/WebAssembly/binaryen</title>
          <p>3https://wavm.github.io/
4https://github.com/bytecodealliance/wasm-micro-runtime
then executed by a virtual machine. Such a paradigm allows the code to be executed, in theory,
on every platform, as long as a VM is available.</p>
          <p>WebAssembly manages intra-procedural control flow predictably by using labeled jump-like
instructions5 [7]. This allows for the production of a correct control flow graph of a function by
simply analyzing the syntax, whereas in more traditional machine code, such as x86 Assembly,
control flow graph recovery is an undecidable problem [8, 9].</p>
          <p>Notice also that, in WebAssembly, function calls are performed by either the call operator
or the call_indirect operator: the latter can be compared to calling a function pointer and,
as we will see below, is the only source of lack of precision in call graph construction. In
WebAssembly function “names" actually are indices, as such, both of call instructions work
on numerical indices, either passed directly to the instruction as in call, or via the stack in
call_indirect and referring to an index in a table.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Call graphs</title>
        <p>Formally, the call graph of a program is a directed graph  = (, ) in which the set of nodes 
is the set of functions in the program, and there exists an edge (, ) ∈  ⊆  ×  if function
 calls function  in the program. Apart from the verification of properties such as call ordering
constraints, the availability of a call graph is a requirement in many other inter-procedural
analyses that need to track the flow of data in a program.</p>
        <p>The construction of an exact call graph is an undecidable problem, and call graphs are usually
computed by over-approximating the possible calls (i.e., the set  contains edges containing all
the actual calls, and additional spurious edges). Dynamic features such as function pointers
in C/C++ further complicate the issues of computing a call graph. WebAssembly implements
function pointers using tables containing references to functions.</p>
        <p>
          In object-oriented programming languages, the main approach to static call graph construction
is Class Hierarchy Analysis [10] and its improvements. The main idea of Class Hierarchy
Analysis is to employ the inheritance hierarchy to refine the set of methods that could be invoked
at a specific program location. WebAssembly does not have a class hierarchy, so this method
is not directly applicable. Instead, tools like Wassail [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] and Wasmati [11] over-approximate
calls by looking at the type of the function appearing in a call_indirect instruction (i.e., the
content of the stack is ignored). A more granular call graph can only be constructed by tracking
the values that are on top of the stack when call_indirect is used.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Threats to soundness</title>
        <p>WebAssembly modules are designed with host-module interoperability in mind. One of the
possible interactions they can have includes the host performing mutations of the running
instance of the WebAssembly module. To be more specific, instances of shared memories and
tables can be mutated, entailing that with no information available to the analysis about what
the host code does, the analysis result may be unsound. For example, our property of interest is
the call graph. If the target of a call_indirect instruction depends on a value from a memory
block mutated by the host code, the analyzer cannot infer this behavior. As another example,
5https://webassembly.github.io/spec/core/syntax/instructions.html#syntax-instr-control
we could consider a table containing function references modified by the host at some point
during the execution. For the sake of this work, we assume that the module under examination
is closed and address this issue in Section 6.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Abstract interpretation</title>
        <p>Abstract interpretation [12] is a theory of sound approximation of the semantics of computer
programs. It provides a methodology for defining rigorously sound static analyses. Its goal is to
derive information about a program’s semantics without performing an exhausting exploration
of the program under analysis. We obtain this by making the semantics decidable at the
cost of precision (Rice’s theorem, halting problem) but in a way that the semantics remain
sound compared to the concrete semantics. If a concrete interpretation is the evaluation of
concrete semantics, an abstract interpretation is the evaluation of abstract semantics. How
precise abstract semantics are when compared to concrete semantics is a matter of compromise
between computability and tractability of the problem. Due to this factor, tailoring the abstract
semantics to the program properties of interest can be an optimal design choice. Abstract
interpretation sees application in proving software properties, such as the absence of critical
errors, including detecting division-by-zero, NULL pointer dereferencing, etc.</p>
        <p>Abstract interpretation makes wide use of upper bound operators, denoted by ⊔̂︀ and used to
compute control-flow joins, and widening, denoted by ∇̂︀ and used in the computation of loops
and recursion in a way that ensures termination. These operators are sound, and they influence
the precision of the result of the analysis.</p>
        <p>
          For WebAssembly, to-date the only known abstract interpreter capable of producing a call
graph is Sturdy [5, 13]. It uses only non-relational abstractions, which can afect the precision of
the analysis. Wassail [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], too, uses notions of abstract interpretation to perform taint analysis.
3. wassilly
wassilly (WebAssembly Ain’t So Silly) is an abstract interpreter that we are actively developing. It
is tailored to constructing WebAssembly call graphs but can be extended to verify more general
properties, such as the absence of runtime errors.
        </p>
        <p>We introduce and briefly explain the architecture of the tool via the semantics we construct
and some of the peculiarities we design for this implementation. We omit the abstract semantics
of particular instructions if they are homomorphic to their concrete counterparts.</p>
      </sec>
      <sec id="sec-2-5">
        <title>3.1. Semantics</title>
        <p>We introduce concrete and abstract denotational semantics for a simplified version of
WebAssembly. Semantics is the rigorous definition of the meaning of the syntactic constructs of a
programming language.</p>
        <p>The form of semantics we use employs an explicit passing of the next function to compute
and is known as continuation semantics. We denote continuations with the symbol  .</p>
        <p>In the context of our semantics, we define a memory  , holding: a store  comprising of
bindings of values to global variable declarations; an indexed table referencing functions; a
reference to the topmost element (in-context) in the call (or context) stack,  . The global store
and in-context value stack and local variables are available for reading and updating via specific
accessors embedded in  , respectively: ., ., ..</p>
        <p>We use an environment  to keep track of local and global variable declarations.</p>
        <p>The symbol  is used as a timestamp to diferentiate running function instances.  is a
map from label statements to a tuple consisting of a continuation  and a transformation 
happening in case branching instructions such as br or br_if are interpreted with any given
label  as the argument.</p>
        <p>Our semantics take as inputs:  ,  ,  ,  , ,  and produce a function consisting of the
continuation  taking as input the new values for  ,  ,  , ,  . The “normal" continuation 
might be overridden with  ′ in case of non-linear control flow.</p>
        <p>The semantics we defined cover what we deem as the main language concerns, keeping aside
memory operations, which behave similarly to global variables, hence why they are excluded
from this work. Typing is assumed always to be i32, 32-bit integer.</p>
        <p>Auxiliary functions and notations In order to ease the reading of the semantics we define
a set of auxiliary functions. These include basic operations on the stack, boolean evaluation,
variable allocation, and a timestamp to diferentiate function instances. Some functions that
will appear in the semantics will be self-explanatory (e.g:  , ) and are not included in
this table.</p>
        <p>= ( ′, )
 ℎ   ˜ =  ′
 _  =
  ′, ℎ =   
  ′′,  =   ′ 
( ′′, , ℎ)
 _   = ( ′, ˜)
   =
  ′,  =   
 = 0 ? ( ′,  ) : ( ′, )
 ˆ︂ ̂︀ =</p>
        <p>̂︀′, ̂︀ =  ̂︀ ̂︀′, ̂︀
   = ′
   =  
   =</p>
        <sec id="sec-2-5-1">
          <title>Pop an element (value, label) from the stack.</title>
          <p>Push an element (value, label) to the stack.
 is performed twice.</p>
          <p>is performed  times.</p>
          <p>Concrete int-to-boolean evaluation:
0 is  , any non-zero value is .</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>Abstract int-to-boolean evaluation:</title>
          <p>the expression  is evaluated later.</p>
          <p>̂︀</p>
          <p>Timestamp generator.</p>
          <p>Allocates a local variable.</p>
          <p>Allocates a global variable.
  ′,  =</p>
          <p>( ) 
  ′ =    0</p>
          <p>′ ′
 (ℎ .
  ′, ˜
3.1.1. Concrete semantics
In Table 2 we define the concrete semantics of the reduced version of the WebAssembly
instruction set we studied. To avoid ambiguities, we provide a brief description of the semantics.
 ℎ  ()   ( )K
  ′,  =</p>
          <p>=
 ? J  ()K
′ : J  ( )K
′
Concrete semantics
J  (˜) (˜) ()K =
 ′ =   
  ′,  =    ′ ( ˜) 
  ′ =    ′ ( ˜)  
  =   ˜ 
  ′ = ℎ  (, ,  ) 
 ( Θ .JK (Θ→ ′) ′′ ′ )
J K =
  =  . [] 
  ′, ˜ = _ . ( . [].) 
 ˜︂, ˜ = . [].,</p>
          <p>. []. 
 ˜ =  ( , . ( , )) ˜︂, ˜</p>
          <p>J  (˜) (˜) ()K ′
J_ K =
  ′,  =   
   = ..  []</p>
          <p>J ( )K ′
JK =
  ′,  ′,  ′,  =   
 ′(  ) ′</p>
        </sec>
        <sec id="sec-2-5-3">
          <title>Evaluation of a function definition:</title>
          <p>as a definition can be recursive,
we use the   operator.</p>
        </sec>
        <sec id="sec-2-5-4">
          <title>Evaluation of a direct function call:</title>
          <p>the body is retrieved from  ,
the function parameters are bound,
then the body is evaluated.</p>
        </sec>
        <sec id="sec-2-5-5">
          <title>Evaluation of function indirection.</title>
          <p>The reference is held on a table,
and it is retrieved by
the function  .</p>
        </sec>
        <sec id="sec-2-5-6">
          <title>Evaluation of return from function.</title>
          <p>3.1.2. Abstract semantics
In Table 3 we introduce the abstract semantics functions. As already mentioned, we only
introduce the semantics of instructions that are non-homomorphic to their concrete counterparts.
The descriptions on the side provide an explanation of how the abstract semantic functions
difer from their concrete version.</p>
        </sec>
        <sec id="sec-2-5-7">
          <title>Abstract semantics</title>
          <p>JK , ̂︀̂︀ ′̂︀ ==  
 ̂︀′ =  ̂︀
  ′,  ′′ =    :
 →   =  ′ ∇̂︀ ̂︀′ 
 [ ← (),  ], 
 →   =  ′ ⊔̂︀ ̂︀′ 
 [ ← (),  ], 
  ′′̂︀′ ′̂︀ ̂︀</p>
        </sec>
        <sec id="sec-2-5-8">
          <title>Brief description</title>
        </sec>
        <sec id="sec-2-5-9">
          <title>Abstract , with respect to its</title>
          <p>concrete counterpart, computes the
widening ( ∇̂︀ ) or the upper bound (⊔̂︀)
between the “old" computation  ′,</p>
          <p>and the “new" one, ̂︀′.</p>
          <p>With this mechanism we obtain  ′′.</p>
          <p>The ∇̂︀ operator ensures termination.</p>
          <p>Abstract semantics
J_, K ′ ̂︀̂=︀̂︀  =
 ̂︀′, ̂︀ = ˆ︂ ̂︀
 ̂︀′ =  (    ′)</p>
          <p>̂︀ ̂︀
 ̂︀′ =   ¬̂︀ ̂︀′ 
  ′,  ′′ =    :
 →   =  ′ ∇̂︀ ̂︀′   [ ←
 →   =  ′ ⊔̂︀ ̂︀′   [ ←
  ′′̂︀′ ′̂︀ ̂︀ ⊔̂︀  ̂︀′  ̂︀ ̂︀
J ℎ  (; )   ( ; )K ̂︀̂︀ ̂︀ =
 ̂︀′, ̂︀ = ˆ︂ ̂︀
J  (; )K ( ̂︀ ̂︀′)̂︀ ̂︀ ⊔̂︀</p>
          <p>J  ( ; )K ( ¬̂︀ ̂︀′)̂︀ ̂︀
J_ K ̂︀̂︀ ̂︀ =
 ̂︀′, ̂︀ =  ̂︀
   = ..  []</p>
          <p>̂︀
⊔̂︀({∀  ∈ ̂︀(̂︀)} | J ( )  ̂︀′̂︀ ̂︀})
K
(),  ], 
(),  ], 
A function   is applied.</p>
          <p>This improves the precision</p>
          <p>of the analysis.
̂︀′, ̂︀′ are the memories in which,
respectively,  is true and false.</p>
          <p>̂︀</p>
        </sec>
        <sec id="sec-2-5-10">
          <title>Computes the upper bound between</title>
          <p>the computations in  and  .
Similarly to _ ,   is applied.</p>
        </sec>
        <sec id="sec-2-5-11">
          <title>Computes the upper bound between</title>
          <p>the calls to the possible callees.</p>
          <p>Target functions are type filtered.</p>
          <p>The notation omits this detail for
triviality and readability.</p>
          <p>In this semantic notation, we omit the specific semantics of the call graph for readability.
Briefly explained: any instruction that is not call i or call_indirect t will produce
an empty set of callees . The call i instruction will produce a singleton {i}, and the
call_indirect instruction will produce a set containing the function indexes obtained by
resolving the indirections generated by concretizing the abstract value on top of the stack. As
noted, we also filter possible callee functions that do not match the type t.</p>
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>3.2. Architecture of wassilly</title>
        <p>As mentioned, wassilly is a specialized tool for constructing call graphs in a “reasonably" precise,
yet sound, manner. We can achieve this by deducing what value(s) can be on top of the stack
when a call_indirect instruction is met. This can be done with a varying degree of precision.
For our work, we use a relational abstraction to represent values. We then build specialized
data structures on top of it to represent the operand stack, the call stack, and the linear memory.
As memory operations are outside the scope of this work, and the call stack does not present
any particular novelty, we will not discuss those and will focus solely on the operand stack.</p>
        <p>One of the peculiarities of our implementation is its capability to analyze WebAssembly code
directly as stack machines, relationally, and lazily. This is not the case for other analyzers for
languages based on stack machines. For example, Sturdy [13, 5], to the best of our knowledge,
is only capable of non-relational analyses when considering stack machines. Other tools,
such as Soot [14], Wimpl [15], and cmm_of_wasm6 employ some kind of explicit language
transformation step.</p>
        <p>The second peculiarity of our implementation of the operand stack consists of lazily building
abstract expressions on the fly, without necessarily evaluating them (i.e. if we are adding two
integer operands together and we are not interested in integer overflows/underflows, we do
not evaluate the expression, we pop e1, e2 from the stack and push the resulting mathematical
expression e1+e2 to the operand stack). This applies only to the cases in which it is possible
to construct an expression representing the content of the stack lazily. For instance, bitwise
operations are not performed this way.</p>
      </sec>
      <sec id="sec-2-7">
        <title>3.3. Discussion</title>
        <p>
          As the tool is under development a real experimental evaluation cannot be performed yet.
Ignoring performance issues, we demonstrate how the tool operates on the common language
concerns in WebAssembly. The systematic literature review in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] exhaustively showcases all
of the language concerns when we are interested in building call graphs.
        </p>
        <p>We will also make some brief considerations about the language concerns outside the scope
of this paper: mutations performed by the host and memory instructions.</p>
        <p>
          The work presented in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] assesses that the main language concerns that are computed
in an unsound manner are the reachability of functions in imported and exported tables. In
WebAssembly the reachability of references in exported tables should be given for granted, as
they are equivalently reachable as exported functions. It can be helpful to think of them as a
mutable set of exported function references as opposed to a set of exported functions.
        </p>
        <p>Regarding imported functions, while we do not have any information about the behavior of
such functions, unless some form of extra information is available, we can always asses that
they are reachable and can give an unconstrained function output of the correct typing, as it is
explicit in the function signature.</p>
        <p>Besides these points and concerns relative to unknown imported objects, we are potentially
able to cover most of the rest of the concerns with at least the same degree of precision as
the state of the art, while retaining soundness, as we use inference rules which consider the
semantics of a given program instead of type inference only, and a relational abstraction that
allows to derive stronger properties.</p>
        <p>Some specific language concerns, such as “10: Table init. ofset is imported from host" would
need some form of specification to provide a conclusive analysis. This means that with no
additional information available we either produce a sound and very imprecise analysis or an
unsound one, which is not a desirable result for the scope of this work. This need for additional
information takes us to the next point: table mutability.</p>
        <p>The concern of table mutability performed by the host is, in a fundamental way, similar
to the previous one: if we are to produce a sound analysis we should assume that calling a
host function can mutate the tables, and without information available we are “forced" to be
conservative and assume we do not know what function reference is held at any given table
entry, making the analysis of some programs as imprecise as an analysis based on function
types only.</p>
        <sec id="sec-2-7-1">
          <title>6https://github.com/SimonJF/cmm_of_wasm</title>
          <p>We can generalize these concerns as “Target callee is dependant on host behavior". This
covers the concerns: “9: Table is mutated by host", “10: Table init. ofset is imported from host",
“11: Memory init. ofset is imported from host".</p>
          <p>Further indications about this issue will be discussed in Section 6.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Related work</title>
      <p>WebAssembly was first released in 2017 [ 16] and can therefore be considered a relatively “young"
language. The formal specification and a reference implementation in OCaml are available
online [17]. An Isabelle specification is also available in [18].</p>
      <p>The work presented in [19] shows how WebAssembly can be used as an attack vehicle to
write to memory and cause unexpected behavior in the host environment. The tool Wasmati
introduced in [11] employs a graph-based representation of WebAssembly code, called Code
Property Graphs (CPG), and checks security properties on these using a query language. Wasmati
requires a call graph but can generate one by considering function signatures only.</p>
      <p>Even in the absence of bufer overflows and similar attacks, it is still possible for WebAssembly
to leak information in unintended ways; to address this issue, recent work has focused on
information flow analysis, see for instance [7] and [20].</p>
      <p>
        Tools that focus, more in general, on static analysis of WebAssembly include WASP [21],
Manticore [6], and Wassail [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The first two tools are based, respectively, on symbolic and
concolic execution. Wassail, instead, can be seen as a library to develop analyses. Wassail also
supports program slicing [22] and the computation of control flow graphs and call graphs. In
particular, similar to Wasmati, it can provide an over-approximation of the call graph for a
specific WebAssembly module by relying on the types of functions invoked in indirect calls (see
Section 2). Sturdy [5, 13] presents another framework to develop and perform static analysis. It
is a modular, extensible framework based on abstract interpretation. It supports WebAssembly
bytecode and out-of-the-box can perform analyses such as dead code analysis and taint analysis.
      </p>
      <p>The work presented in [15] uses a diferent approach to verifying WebAssembly code. The
authors introduce a simplified C-like Intermediate Representation called Wimpl. The goal is to
use existing tools for C instead of developing verification tools from scratch.</p>
      <p>
        The work in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] presents a comparison between tools that are capable of constructing a call
graph for WebAssembly dynamically or statically, introducing a set of micro-benchmarks that
allow the assessment of the precision of a certain analyzer via specifically crafted edge cases.
Such edge cases cover the language concerns regarding indirect calls.
      </p>
    </sec>
    <sec id="sec-4">
      <title>5. Conclusion</title>
      <p>Given the increasing importance of WebAssembly and the importance of having static
analysis support, we have given concrete and abstract semantics for a simplified version of the
WebAssembly language. We also highlighted some additional challenges which we do not cover
in this work and propose means to tackle them.</p>
      <p>The abstract semantics we specified form the basis of an analyzer based on the theory of
Abstract Interpretation that is currently being developed.</p>
      <p>We also described the approach we are pursuing in the implementation of the analyzer,
which features a lazy operand stack and the ability to directly analyze WebAssembly bytecode
relationally. Based on this, we can predict our approach to be at least as precise as the sound ones
present in the state-of-the-art, whilst potentially being capable of tackling language concerns
which, at the moment, are being computed in an unsound manner.</p>
    </sec>
    <sec id="sec-5">
      <title>6. Future work</title>
      <p>The ability of WebAssembly to work as an external mechanism to provide high-performance
computations is what makes it so popular. Unfortunately, we cannot assume in a definitive
manner that all WebAssembly modules work in isolation and are closed. Some possible future work
in improving the analysis of WebAssembly modules can and should include some mechanism
for defining or inferring host-module(s) interactions.</p>
      <p>This includes analysis of imported host functions (and subsequently analysis of the functions
depending on them) and production of function summaries or a form of specification in a custom
Domain Specific Language.</p>
      <p>Trivially, a DSL would allow to manually set the rules we would instead infer, and would
work as a sort of specification contract (i.e: the soundness of the analysis is guaranteed with
respect to the contract). A similar approach has been explored in [23] for symbolic execution
but, to the best of our knowledge, no approaches are available for abstract interpretation.</p>
      <p>Both of these approaches have pros and cons: the former keeps the analysis fully automated,
but the analysis still has to maintain soundness at the cost of precision. The latter requires
the handwriting of a specification and a degree of knowledge about the specific host program
interacting with the WebAssembly module: it is potentially more precise, as false positives can
be ruled out, but it is prone to be influenced by human error.
[5] K. Brandl, S. Erdweg, S. Keidel, N. Hansen, Modular abstract definitional interpreters
for webassembly, in: K. Ali, G. Salvaneschi (Eds.), 37th European Conference on
ObjectOriented Programming, ECOOP 2023, July 17-21, 2023, Seattle, Washington, United States,
volume 263 of LIPIcs, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2023, pp. 5:1–5:28.
URL: https://doi.org/10.4230/LIPIcs.ECOOP.2023.5. doi:10.4230/LIPIcs.ECOOP.2023.
5.
[6] M. Mossberg, F. Manzano, E. Hennenfent, A. Groce, G. Grieco, J. Feist, T. Brunson,
A. Dinaburg, Manticore: A user-friendly symbolic execution framework for binaries
and smart contracts, in: 34th IEEE/ACM International Conference on Automated Software
Engineering, ASE 2019, San Diego, CA, USA, November 11-15, 2019, IEEE, 2019, pp. 1186–
1189. URL: https://doi.org/10.1109/ASE.2019.00133. doi:10.1109/ASE.2019.00133.
[7] Q. Stiévenart, C. De Roover, Compositional information flow analysis for webassembly
programs, in: 2020 IEEE 20th International Working Conference on Source Code Analysis
and Manipulation (SCAM), IEEE, 2020, pp. 13–24.
[8] M. H. Nguyen, T. B. Nguyen, T. T. Quan, A hybrid aproach for control flow graph
construction from binary code, in: 20th Asia-Pacific Software Engineering Conference,
2013.
[9] A. Flores-Montoya, E. M. Schulte, Datalog disassembly, in: S. Capkun, F. Roesner (Eds.),
29th USENIX Security Symposium, USENIX Security 2020, August 12-14, 2020, USENIX
Association, 2020, pp. 1075–1092. URL: https://www.usenix.org/conference/usenixsecurity20/
presentation/flores-montoya.
[10] J. Dean, D. Grove, C. Chambers, Optimization of object-oriented programs using static
class hierarchy analysis, in: European Conference on Object-Oriented Programming,
Springer, 1995, pp. 77–101.
[11] T. Brito, P. Lopes, N. Santos, J. F. Santos, Wasmati: An eficient static vulnerability scanner
for webassembly, Computers &amp; Security 118 (2022) 102745. URL: https://www.sciencedirect.
com/science/article/pii/S0167404822001407. doi:https://doi.org/10.1016/j.cose.
2022.102745.
[12] P. Cousot, R. Cousot, Abstract interpretation: A unified lattice model for static analysis
of programs by construction or approximation of fixpoints, in: R. M. Graham, M. A.
Harrison, R. Sethi (Eds.), Conference Record of the Fourth ACM Symposium on Principles
of Programming Languages, Los Angeles, California, USA, January 1977, ACM, 1977, pp.
238–252. URL: https://doi.org/10.1145/512950.512973. doi:10.1145/512950.512973.
[13] S. Keidel, S. Erdweg, T. Hombücher, Combinator-based fixpoint algorithms for big-step
abstract interpreters, Proc. ACM Program. Lang. 7 (2023) 955–981. URL: https://doi.org/10.
1145/3607863. doi:10.1145/3607863.
[14] R. Vallée-Rai, P. Co, E. Gagnon, L. J. Hendren, P. Lam, V. Sundaresan, Soot - a java bytecode
optimization framework, in: S. A. MacKay, J. H. Johnson (Eds.), Proceedings of the 1999
conference of the Centre for Advanced Studies on Collaborative Research, November 8-11,
1999, Mississauga, Ontario, Canada, IBM, 1999, p. 13. URL: https://dl.acm.org/citation.cfm?
id=782008.
[15] M. Thalakottur, F. Tip, D. Lehmann, M. Pradel, Wimpl: A simple ir for static analysis of
webassembly binaries, in: Program Analysis for WebAssembly (PAW) 2022, 2022.
[16] A. Haas, A. Rossberg, D. L. Schuf, B. L. Titzer, M. Holman, D. Gohman, L. Wagner, A. Zakai,
J. Bastien, Bringing the web up to speed with webassembly, in: Proceedings of the 38th
ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
2017, pp. 185–200.
[17] A. Rossberg, WebAssembly Core Specification, 2019. URL: https://www.w3.org/TR/
wasm-core-1/.
[18] C. Watt, Mechanising and verifying the webassembly specification, in: Proceedings of the
7th ACM SIGPLAN International Conference on certified programs and proofs, 2018, pp.
53–65.
[19] D. Lehmann, J. Kinder, M. Pradel, Everything old is new again: Binary security of
WebAssembly, in: 29th USENIX Security Symposium (USENIX Security 20), USENIX
Association, 2020, pp. 217–234. URL: https://www.usenix.org/conference/usenixsecurity20/
presentation/lehmann.
[20] I. Bastys, M. Algehed, A. Sjösten, A. Sabelfeld, Secwasm: Information flow control for
webassembly, in: Program Analysis for WebAssembly (PAW), 2022.
[21] F. Marques, J. Fragoso Santos, N. Santos, P. Adão, Concolic Execution for WebAssembly,
in: K. Ali, J. Vitek (Eds.), 36th European Conference on Object-Oriented Programming
(ECOOP 2022), volume 222 of Leibniz International Proceedings in Informatics (LIPIcs),
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl, Germany, 2022, pp. 11:1–
11:29. URL: https://drops.dagstuhl.de/opus/volltexte/2022/16239. doi:10.4230/LIPIcs.</p>
      <p>ECOOP.2022.11.
[22] Q. Stiévenart, D. W. Binkley, C. De Roover, Static stack-preserving intra-procedural slicing
of webassembly binaries, in: Proceedings of the 44th International Conference on Software
Engineering, 2022, pp. 2031–2042.
[23] N. He, Z. Zhao, J. Wang, Y. Hu, S. Guo, H. Wang, G. Liang, D. Li, X. Chen, Y. Guo,
Eunomia: Enabling user-specified fine-grained search in symbolically executing webassembly
binaries, in: R. Just, G. Fraser (Eds.), Proceedings of the 32nd ACM SIGSOFT
International Symposium on Software Testing and Analysis, ISSTA 2023, Seattle, WA, USA,
July 17-21, 2023, ACM, 2023, pp. 385–397. URL: https://doi.org/10.1145/3597926.3598064.
doi:10.1145/3597926.3598064.</p>
      <sec id="sec-5-1">
        <title>Program (instruction, function)</title>
      </sec>
      <sec id="sec-5-2">
        <title>Environment</title>
      </sec>
      <sec id="sec-5-3">
        <title>Abstract environment</title>
      </sec>
      <sec id="sec-5-4">
        <title>Abstract memory</title>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ene</surname>
          </string-name>
          ,
          <article-title>How prime video updates its app for more than 8,000 device types</article-title>
          , https://www.amazon.science/blog/ how
          <article-title>-prime-video-updates-its-app-for-more-than-8-000-device-</article-title>
          <string-name>
            <surname>types</surname>
          </string-name>
          ,
          <year>2022</year>
          . Accessed:
          <fpage>2022</fpage>
          -10-11.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Schroeder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Jordan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schulz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hanley</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          <article-title>Fay, Introducing the disney+ application development kit (adk), https://medium.com/disney-streaming/ introducing-the-disney-application-development-kit-adk-</article-title>
          <string-name>
            <surname>ad85ca139073</surname>
          </string-name>
          ,
          <year>2021</year>
          . Accessed:
          <fpage>2022</fpage>
          -10-10.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Thalakottur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Tip</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pradel</surname>
          </string-name>
          ,
          <article-title>That's a tough call: Studying the challenges of call graph construction for webassembly</article-title>
          , in: R. Just, G. Fraser (Eds.),
          <source>Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis</source>
          ,
          <source>ISSTA</source>
          <year>2023</year>
          , Seattle, WA, USA, July
          <volume>17</volume>
          -
          <issue>21</issue>
          ,
          <year>2023</year>
          , ACM,
          <year>2023</year>
          , pp.
          <fpage>892</fpage>
          -
          <lpage>903</lpage>
          . URL: https: //doi.org/10.1145/3597926.3598104. doi:
          <volume>10</volume>
          .1145/3597926.3598104.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Stiévenart</surname>
          </string-name>
          , C. De Roover,
          <article-title>Wassail: a webassembly static analysis library</article-title>
          ,
          <source>in: Fifth International Workshop on Programming Technology for the Future Web</source>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>