=Paper= {{Paper |id=Vol-2908/paper15 |storemode=property |title=AXDInterpolator: A Tool for Computing Interpolants for Arrays with MaxDiff |pdfUrl=https://ceur-ws.org/Vol-2908/paper15.pdf |volume=Vol-2908 |authors=Jose Abel Castellanos Joo,Silvio Ghilardi,Alessandro Gianola,Deepak Kapur |dblpUrl=https://dblp.org/rec/conf/smt/JooGGK21 }} ==AXDInterpolator: A Tool for Computing Interpolants for Arrays with MaxDiff== https://ceur-ws.org/Vol-2908/paper15.pdf
AXDInterpolator: A Tool for Computing Interpolants
for Arrays with MaxDiff
Jose Abel Castellanos Joo1 , Silvio Ghilardi2 , Alessandro Gianola3 and Deepak Kapur1
1
  Department of Computer Science, University of New Mexico, Albuquerque, USA
2
  Dipartimento di Matematica, Università degli Studi di Milano, Milan, Italy
3
  Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy


                                         Abstract
                                         Several approaches toward quantifier-free interpolation algorithms of theories involving arrays have been
                                         proposed by extending the language using a binary function skolemizing the extensionality principle. In
                                         FoSSaCS 2021, the last three authors studied the enrichment of the McCarthy’s theory of extensional
                                         arrays with a maxdiff operation. This paper discusses the implementation of the interpolation algorithm
                                         proposed in FoSSaCS 2021 using the Z3 API. The implementation allows the user to choose iZ3,
                                         Mathsat, or SMTInterpol as interpolation engines. The tool returns a formula in SMTLIB2 format,
                                         which allows compatibility with model checkers and invariant generators using such a format. We
                                         compare our algorithm with state-of-the-art interpolation engines. Our experiments using unsatisfiable
                                         formulæ extracted with the model checker UAutomizer show the feasibility of our tool. For that purpose,
                                         we used C programs from the ReachSafety-Arrays and MemSafety-Arrays tracks of SV-COMP.

                                         Keywords
                                         Interpolation, Arrays, MaxDiff, SMT




1. Introduction
Since McMillan’s seminal papers [1, 2], interpolation has been successfully applied in software
model checking. Even if the well-known Craig’s theorem [3] guarantees the existence of
interpolants in first order logic, it does not give any information on the shape the interpolant
can have when a specific theory is involved. Nevertheless, this is crucial for the applications:
when we extract interpolants, we are typically handling a theory which might be undecidable,
but whose quantifier-free fragment is decidable for satisfiability (usually within a somewhat
‘reasonable’ computational complexity). Thus, it is desirable (although not always possible) that
the interpolant is quantifier-free, a fact which is not guaranteed in general. This is why a lot
of effort has been made in analyzing quantifier-free interpolation: given two quantifier-free
formulæ 𝐴 and 𝐵 such that 𝐴 ∧ 𝐵 is not satisfiable (modulo a theory 𝑇 ), a quantifier-free
interpolant 𝐶 is a quantifier-free formula such that 𝑇 |= 𝐴 → 𝐶, 𝑇 |= 𝐶 ∧ 𝐵 → ⊥ and such
that 𝐶 contains only the variables which occur both in 𝐴 and in 𝐵.
   The specific theories we consider here are variants of McCarthy’s theory of arrays [4] with
extensionality. This theory is suitable to formalize programs over arrays, like standard copying,

SMT 21: 19th International Workshop on Satisfiability Modulo Theories, 18–19 July 2021, online
" jose.castellanosjoo@cs.unm.edu (J. A. Castellanos Joo); silvio.ghilardi@unimi.it (S. Ghilardi); gianola@inf.unibz.it
(A. Gianola); kapur@cs.unm.edu (D. Kapur)
                                       © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073       CEUR Workshop Proceedings (CEUR-WS.org)



                                                                                                          40
comparing, searching, sorting functions; verification problems of this kind are collected, e.g., in
the SV-COMP benchmarks categories “ReachSafety-Arrays” and “MemSafety-Arrays”1 , where
safety verification tasks involving arrays of finite but unknown length are considered.
    By itself, the theory of arrays with extensionality does not have quantifier free interpo-
lation [5]; however, in [6] it was shown that quantifier-free interpolation is restored if one
enriches the language with a binary function skolemizing the extensionality axiom. Such a
Skolem function, applied to two array variables 𝑎, 𝑏, returns an index diff(𝑎, 𝑏) where 𝑎, 𝑏
differ (it returns an arbitrary value if 𝑎 is equal to 𝑏). This semantics for the diff operation is
very undetermined and does not have a significant interpretation in concrete programs. That is
why in [7] the diff operation is modified so as to obtain a defined and natural meaning: we
introduce the theory of arrays with maxdiff 𝒜ℛ𝒟(𝑇𝐼 ), where diff(𝑎, 𝑏) returns the biggest
index where 𝑎, 𝑏 differ (in case 𝑎 = 𝑏 we ask for diff(𝑎, 𝑏) to be the minimum index 0). Since
it is natural to view arrays as functions defined on initial intervals of the nonnegative integers,
this choice has a clear semantic motivation. The expressive power of the theory of arrays so
enriched becomes bigger: for instance, if we also add to the language a constant symbol 𝜖 for
the undefined array constantly equal to some ‘undefined’ value ⊥ (where ⊥ is meant to be
different from the values 𝑎[𝑖] actually in use), then we can define |𝑎| as diff(𝑎, 𝜖). In this way
we can model the fact that 𝑎 is undefined outside the interval [ 0, |𝑎| ] - this is useful to formalize
the above mentioned SV-COMP benchmarks. The effectiveness of quantifier-free interpola-
tion in the theory of arrays with maxdiff is exemplified in the Strcpy function example of
Figure 1 in [7]:2 we need to express | · | to formalize its initial states without quantifiers. In
[7], 𝒜ℛ𝒟(𝑇𝐼 ) is shown to admit quantifier-free interpolation, and a suitable algorithm based
on instantiation à-la-Herbrand and on a hierarchical approach ([8, 9, 10]) is introduced: by
exploiting iterated applications of the maxdiff operator and reducing the interpolation problem
to a reduced interpolating theory, we can compute quantifier-free interpolants out of the input
formulæ.
    In this paper, we present (in Section 3) AXDInterpolator3 , a tool that implements the interpo-
lation algorithm of [7]. Following an instantiation-based approach, the algorithm always (and
in particular for all the examples in this paper) returns a quantifier-free formula. A comparison
is done with the state-of-the-art interpolation engines that have known interpolant generation
algorithms for the theory of arrays–iZ3, MathSat, and SMTInterpol [11, 12, 13] support
the theory of arrays with maxdiff. Indeed, on examples supporting expressive features like
diff or the | · | operator, no proper comparison is possible. In case unsafe traces for such exam-
ples are considered, in contrast to AXDInterpolator, Mathsat and iZ3 cannot usually get a
quantifier-free interpolant: either the tools detect the satisfiability of the unsafe trace (since they
reason modulo a theory that is weaker than 𝒜ℛ𝒟(𝑇𝐼 )) so they cannot compute an interpolant,

    1
       https://sv-comp.sosy-lab.org/2020/benchmarks.php
    2
       The Strcpy function performs 𝑁 sequential index-wise copies of values of two arrays 𝑎, 𝑏 from 𝑎 to 𝑏 where
𝑁 is the length of both arrays: the loop invariant for this examples is 𝑎 = 𝑏 ∨ (𝑁 > diff(𝑎, 𝑏) ∧ diff(𝑎, 𝑏) ≥ 𝐼),
𝐼 being the index variable in the loop. This formula is a loop invariant because if the two arrays 𝑎, 𝑏 are different
while 𝐼 < 𝑁 then the conjunction 𝑁 > diff(𝑎, 𝑏) ∧ diff(𝑎, 𝑏) ≥ 𝐼 asserts that the possibly entry where 𝑎, 𝑏
differ has not been processed yet.
     3
       We use the prerelease version 0.9.0 of our tool: https://github.com/typesAreSpaces/AXDInterpolator/releases/
tag/0.9.0



                                                         41
or, in case the unsafe trace is correctly detected as unsat (equipping the solver with axioms for
maxdiff), they fail returning errors or quantified interpolant or timeout; for Example 1 from [7],
SMTInterpol fails. Mathsat and iZ3 fail also in the case of the example from [5], which is
well-known not to admit a quantifier-free interpolant unless the extensionality axiom is skolem-
ized with a generic diff operator: in this specific case, SMTInterpol, instead, can successfully
compute a quantifier-free interpolant using the plain diff operator, but AXDInterpolator can
compute an interpolant with maxdiff a bit faster. Running experiments over two tracks of the
SV-COMP (ReachSafety-Arrays and MemSafety-Arrays), we use unsatisfiable formulæ extracted
with the model checker UAutomizer [14]: the latter automatically transforms C programs taken
from those benchmarks into suitable SMTLIB2 files that allow to verify particular properties
of these programs. Since UAutomizer does not support the extended language of arrays with
maxdiff, the comparison among the different solvers becomes acceptable. The results show that
AXDInterpolator, iZ3, Mathsat and SMTInterpol have an overall identical behavior over
these benchmarks: interpolants are computed for almost every example, apart from a small
portion of timeout outcomes reached by AXDInterpolator as its implementation is evolving.
On more expressive examples, such as Example 1 from [7], other solvers including the most
expressive SMTInterpol fail, whereas AXDinterpolator produces useful interpolants. AXD-
Interpolator uses the Z3 API [15] and allows the user to choose among different solvers in order
to compute the interpolants in the reduced theory invoked by the hierarchical method: iZ3,
Mathsat [12] and SMTInterpol [13]. It generates a formula in SMTLIB2 format, which
guarantees compatibility with model checkers and invariant generators adopting the same
format.


2. Arrays with MaxDiff
The McCarthy theory of arrays [4] with extensionality has three sorts ARRAY, ELEM, INDEX (called
“array”, “element”, and “index” sort, respectively) and two function symbols 𝑟𝑑 (“read”) and
𝑤𝑟 (“write”) of appropriate arities. Besides the axioms for defining 𝑟𝑑 and 𝑤𝑟, it contains the
‘extensionality’ axiom: ∀𝑥, 𝑦. 𝑥 ̸= 𝑦 → (∃𝑖. 𝑟𝑑(𝑥, 𝑖) ̸= 𝑟𝑑(𝑦, 𝑖)). In [7], the Skolem function
diff for this axiom has been given a proper semantics, i.e., diff returns the biggest index
where two different arrays differ. For introducing our theory of arrays with maxdiff, we first
need to consider an index theory 𝑇𝐼 (see [7] for the general definition). The signature of an
index theory 𝑇𝐼 contains at least the binary relation symbol ≤ and the constant 0. Examples of
index theories are the theory of total orders 𝑇 𝑂, integer difference logic ℐ𝒟ℒ, integer linear
arithmetic ℒℐ𝒜, and real linear arithmetics ℒℛ𝒜. For most applications, ℐ𝒟ℒ (namely the
theory of integer numbers with 0, ordering, successor and predecessor) suffices as in this theory
one can model counters for scanning arrays.
   Given an index theory 𝑇𝐼 , we now introduce our array theory with maxdiff 𝒜ℛ𝒟(𝑇𝐼 )
(parameterized by 𝑇𝐼 ) as follows. We still have the same sorts; the language includes the symbols
of 𝑇𝐼 , the operations 𝑟𝑑, 𝑤𝑟, a binary function diff of type ARRAY × ARRAY → INDEX, as well
as constants 𝜖 and ⊥ of sorts ARRAY and ELEM, resp. The constant ⊥ models an undetermined
(e.g. undefined, not-in-use, not coming from appropriate initialization, etc.) value and 𝜀 models
the totally undefined array; the term diff(𝑥, 𝑦) returns the maximum index where 𝑥 and 𝑦



                                               42
differ and returns 0 if 𝑥 and 𝑦 are equal.4 Formally, the axioms of 𝒜ℛ𝒟(𝑇𝐼 ) include, besides
the axioms of 𝑇𝐼 , the following ones:

                      ∀𝑦, 𝑖, 𝑒.         𝑖 ≥ 0 → 𝑟𝑑(𝑤𝑟(𝑦, 𝑖, 𝑒), 𝑖) = 𝑒                                            (1)
                   ∀𝑦, 𝑖, 𝑗, 𝑒.         𝑖 ̸= 𝑗 → 𝑟𝑑(𝑤𝑟(𝑦, 𝑖, 𝑒), 𝑗) = 𝑟𝑑(𝑦, 𝑗)                                    (2)
                        ∀𝑥, 𝑦.          𝑥 ̸= 𝑦 → 𝑟𝑑(𝑥, diff(𝑥, 𝑦)) ̸= 𝑟𝑑(𝑦, diff(𝑥, 𝑦))                           (3)
                      ∀𝑥, 𝑦, 𝑖.         𝑖 > diff(𝑥, 𝑦) → 𝑟𝑑(𝑥, 𝑖) = 𝑟𝑑(𝑦, 𝑖)                                      (4)
                           ∀𝑥.          diff(𝑥, 𝑥) = 0                                                            (5)
                          ∀𝑥.𝑖          𝑖 < 0 → 𝑟𝑑(𝑥, 𝑖) = ⊥                                                      (6)
                            ∀𝑖.         𝑟𝑑(𝜀, 𝑖) = ⊥                                                              (7)

   As an effect of the above axioms, we have that an array 𝑥 is undefined outside the interval
[0, |𝑥|], where |𝑥| is defined as |𝑥| := diff(𝑥, 𝜀). Typically, this interval is finite, and it can be
shown that any satisfiable constraint is satisfiable in a model where all such intervals (involving
variables appearing in the constraint) are finite [7].
   For the interpolation algorithm from [7], we need to introduce iterated diff operations,
similarly to [10]. As we know, diff(𝑎, 𝑏) returns the biggest index where 𝑎 and 𝑏 differ (it
returns 0 if 𝑎 = 𝑏). Now we want an operator that returns the last-but-one index where 𝑎, 𝑏
differ (0 if 𝑎, 𝑏 differ in at most one index), etc. Our language is already enough expressive
for that, so we can introduce such operators explicitly as follows. Given array variables 𝑎, 𝑏,
we define by mutual recursion the sequence of array terms 𝑏1 , 𝑏2 , . . . and of index terms
diff1 (𝑎, 𝑏), diff2 (𝑎, 𝑏), . . . :

                  𝑏1 := 𝑏;                                             diff1 (𝑎, 𝑏) := diff(𝑎, 𝑏1 );
    𝑏𝑘+1 := 𝑤𝑟(𝑏𝑘 , diff𝑘 (𝑎, 𝑏), 𝑟𝑑(𝑎, diff𝑘 (𝑎, 𝑏)));                diff𝑘+1 (𝑎, 𝑏) := diff(𝑎, 𝑏𝑘+1 )

Lemma 2.1. A formula like

                              diff1 (𝑎, 𝑏) = 𝑘1 ∧ · · · · · · ∧ diff𝑙 (𝑎, 𝑏) = 𝑘𝑙                                 (8)

is equivalent modulo 𝒜ℛ𝒟 to the conjunction of the following five formulæ:

                                        𝑘1 ≥ 𝑘2 ∧ · · · ∧ 𝑘𝑙−1 ≥ 𝑘𝑙 ∧ 𝑘𝑙 ≥ 0                                      (9)
                                                                                                                 (10)
                                  ⋀︀
                                       𝑗<𝑙 (𝑘𝑗 > 𝑘𝑗+1 → 𝑟𝑑(𝑎, 𝑘𝑗 ) ̸= 𝑟𝑑(𝑏, 𝑘𝑗 ))
                                                                                                                 (11)
                                            ⋀︀
                                                 𝑗<𝑙 (𝑘𝑗 = 𝑘𝑗+1 → 𝑘𝑗 = 0)
                                                                                                                 (12)
                                   ⋀︀
                                        𝑗≤𝑙 (𝑟𝑑(𝑎, 𝑘𝑗 ) = 𝑟𝑑(𝑏, 𝑘𝑗 ) → 𝑘𝑗 = 0)
                   ∀ℎ (ℎ > 𝑘𝑙 → 𝑟𝑑(𝑎, ℎ) = 𝑟𝑑(𝑏, ℎ) ∨ ℎ = 𝑘1 ∨ · · · ∨ ℎ = 𝑘𝑙−1 )                                (13)

Separated Pairs, 𝑀 -Instantiation and Satisfiability. The key step of the interpolation
algorithm from [7] depends upon the problem of checking satisfiability (modulo 𝒜ℛ𝒟(𝑇𝐼 )) of
quantifier-free formulæ; this is solved by adapting instantiation techniques, like those from [16].
    4
    Notice that it might well be the case that diff(𝑥, 𝑦) = 0 for different 𝑥, 𝑦, but in that case 0 is the only index
where 𝑥, 𝑦 differ.



                                                           43
   The complexity 𝑐(𝑡) of a term 𝑡 is defined as the number of function symbols occurring in 𝑡.
If ℐ is a set of 𝑇𝐼 -terms, an ℐ-instance of a universal formula of the kind ∀𝑖 𝜑 is a formula of the
kind 𝜑(𝑡/𝑖) for some 𝑡 ∈ ℐ.
   A pair of sets of quantifier-free formulæ Φ = (Φ1 , Φ2 ) is a separated pair iff
   (1) Φ1 contains equalities of the form diff𝑘 (𝑎, 𝑏) = 𝑖 and 𝑎 = 𝑤𝑟(𝑏, 𝑖, 𝑒); moreover if
        it contains the equality diff𝑘 (𝑎, 𝑏) = 𝑖, it must also contain an equality of the form
        diff𝑙 (𝑎, 𝑏) = 𝑗 for every 𝑙 < 𝑘;
   (2) Φ2 contains Boolean combinations of 𝑇𝐼 -atoms and of atoms of the forms: {𝑟𝑑(𝑎, 𝑖) =
        𝑟𝑑(𝑏, 𝑗), 𝑟𝑑(𝑎, 𝑖) = 𝑒, 𝑒1 = 𝑒2 }, where 𝑎, 𝑏, 𝑖, 𝑗, 𝑒, 𝑒1 , 𝑒2 are variables or constants of
        the appropriate sorts.
The separated pair is said to be finite iff Φ1 and Φ2 are both finite. In practice, in a separated
pair Φ = (Φ1 , Φ2 ), reading 𝑟𝑑(𝑎, 𝑖) as a functional application, it turns out that the formulæ
from Φ2 can be translated into quantifier-free formulæ of the combined theory 𝑇𝐼 ∪ ℰ𝒰ℱ (the
array variables occurring in Φ2 are converted into free unary function symbols). 𝑇𝐼 ∪ ℰ𝒰ℱ
enjoys the decidability of the quantifier-free fragment and has quantifier-free interpolation
because 𝑇𝐼 is an index theory [7]: we adopt a hierarchical approach (similar to [8, 9]) and we
rely on satisfiability and interpolation algorithms for such a theory as black boxes.
   Let ℐ be a set of 𝑇𝐼 -terms and let Φ = (Φ1 , Φ2 ) be a separated pair; we let Φ(ℐ) =
(Φ1 (ℐ), Φ2 (ℐ)) be the smallest separated pair satisfying the following conditions:
     - Φ1 (ℐ) is equal to Φ1 and Φ2 (ℐ) contains Φ2 ;
     - Φ2 (ℐ) contains all ℐ-instances of the two formulæ ∀𝑖 𝑟𝑑(𝜀, 𝑖) = ⊥, ∀𝑖 (𝑖 < 0 →
        𝑟𝑑(𝑎, 𝑖) = ⊥), where 𝑎 is any array variable occurring in Φ1 or Φ2 ;
     - if Φ1 contains 𝑎 = 𝑤𝑟(𝑏, 𝑖, 𝑒), then Φ2 (ℐ) contains all the ℐ-instances of the equivalent
        formula (𝑖 ≥ 0 → 𝑟𝑑(𝑎, 𝑖) = 𝑒) ∧⋀︀∀ℎ (ℎ ̸= 𝑖 → 𝑟𝑑(𝑎, ℎ) = 𝑟𝑑(𝑏, ℎ));
     - if Φ1 contains the conjunction 𝑙𝑖=1 diff𝑖 (𝑎, 𝑏) = 𝑘𝑖 , then Φ2 (ℐ) contains the for-
        mulæ (9), (10), (11), (12) as well as all ℐ-instances of the formula (13).
For 𝑀 ∈ N ∪ {∞}, the 𝑀 -instantiation of Φ = (Φ1 , Φ2 ) is the separated pair Φ(ℐΦ𝑀 ) =
(Φ1 (ℐΦ𝑀 ), Φ2 (ℐΦ𝑀 )), where ℐΦ𝑀 is the set of 𝑇𝐼 -terms of complexity at most 𝑀 built up from
the index variables occurring in Φ1 , Φ2 . The full instantiation of Φ = (Φ1 , Φ2 ) is the separated
pair Φ(ℐΦ∞ ) = (Φ1 (ℐΦ∞ ), Φ2 (ℐΦ∞ )) (which is usually not finite). A separated pair Φ ⋀︀= (Φ1⋀︀, Φ2 )
is 𝑀 -instantiated iff Φ = Φ(ℐΦ𝑀 ); it is 𝒜ℛ𝒟(𝑇𝐼 )-satisfiable iff so it is the formula Φ1 ∧ Φ2 .
The 𝑆𝑀 𝑇 (𝒜ℛ𝒟(𝑇𝐼 )) problem is decidable for every index theory 𝑇𝐼 (and can be reduced to
𝑇𝐼 ∪ ℰ𝒰ℱ-satisfiability) thanks to the following results:

Lemma 2.2. Given a quantifier-free formula 𝜑, we can compute finitely many finite separation
pairs Φ1 = (Φ11 , Φ12 ), . . . , Φ𝑛 = (Φ𝑛1 , Φ𝑛2 ) such that 𝜑 is 𝒜ℛ𝒟(𝑇𝐼 )-satisfiable iff so is one of the
Φ𝑖 .

Lemma 2.3. The following conditions     are equivalent for a finite separation pair Φ = (Φ1 , Φ2 ):
(i) Φ is 𝒜ℛ𝒟(𝑇𝐼 )-satisfiable; (ii) Φ2 (ℐΦ0 ) is 𝑇𝐼 ∪ ℰ𝒰ℱ-satisfiable.
                                   ⋀︀


2.1. The interpolation algorithm
The interpolation algorithm from [7] is based on instantiation à-la-Herbrand. Our problem is to
compute a quantifier-free interpolant in 𝒜ℛ𝒟(𝑇𝐼 ) of two quantifier-free formulæ 𝐴 and 𝐵.



                                                    44
   We call the variables occurring in both 𝐴 and 𝐵 common variables, whereas the variables
occurring in 𝐴 (resp. in 𝐵) are called 𝐴-variables (resp. 𝐵-variables). The same terminology
applies to terms, atoms and formulæ. As shown    ⋀︀in [7], ⋀︀
                                                           both 𝐴 and 𝐵 can be given
                                                                                 ⋀︀ in the
                                                                                         ⋀︀ form of
finite separated pairs. Thus, 𝐴 is of the form 𝐴1 ∧ 𝐴2 , 𝐵 is of the form 𝐵1 ∧ 𝐵2 , for
separated pairs (𝐴1 , 𝐴2 ) and (𝐵1 , 𝐵2 ). Also, 𝐴 and 𝐵 can be assumed to be both 0-instantiated.
   The formulæ from 𝐴2 and 𝐵2 are formulæ from the signature of 𝑇𝐼 ∪ ℰ𝒰 ℱ (after rewriting
terms of the kind 𝑟𝑑(𝑎, 𝑖) to 𝑓𝑎 (𝑖), where the 𝑓𝑎 are free function symbols). Of course, if
𝐴2 ∧ 𝐵2 is 𝑇𝐼 ∪ ℰ𝒰ℱ-inconsistent, we can get our quantifier-free interpolant by using our black
box algorithm for interpolation in the weaker theory 𝑇𝐼 ∪ ℰ𝒰ℱ.
   Formally, we apply the Loop below until 𝐴2 ∧ 𝐵2 becomes 𝑇𝐼 ∪ ℰ𝒰 ℱ-inconsistent: Theorem 4
in [7] guarantees that, if 𝐴 ∧ 𝐵 is 𝒜ℛ𝒟(𝑇𝐼 )-inconsistent, then the Loop always terminates.
When 𝐴2 ∧ 𝐵2 becomes inconsistent, we can get our interpolant via the interpolation algorithm
for 𝑇𝐼 ∪ ℰ𝒰ℱ. We need a counter 𝑀 recording how many times the Loop has been executed.
 Loop (to be repeated until 𝐴2 ∧ 𝐵2 becomes inconsistent modulo 𝑇𝐼 ∪ ℰ𝒰ℱ, initially 𝑀 = 0).
Pick two distinct common ARRAY-variables 𝑐1 , 𝑐2 and 𝑛 ≥ 1 and such that no conjunct of the kind
diff𝑛 (𝑐1 , 𝑐2 ) = 𝑘 occurs in both 𝐴1 and 𝐵1 for some 𝑛 ≥ 1 (but such that for every 𝑙 < 𝑛 there
is a conjunct of the form diff𝑙 (𝑎, 𝑏) = 𝑘 occurring in both 𝐴1 and 𝐵1 ). Pick also a fresh INDEX
constant 𝑘𝑛 ; conjoin diff𝑛 (𝑐1 , 𝑐2 ) = 𝑘𝑛 to both 𝐴1 and 𝐵1 ; then 𝑀 -instantiate both 𝐴 and 𝐵.
Increase 𝑀 to 𝑀 + 1.
   Notice that the fresh index constants 𝑘𝑛 introduced during the Loop are considered common
constants and so they are considered in the 𝑀 -instantiation of both 𝐴 and 𝐵. In the output,
these constants 𝑘𝑛 need to be eliminated.


3. Implementation
Our implementation is written in C++ using the Z3 C++ API [15]. It takes as input:
    - A SMTLIB2 file using standard elements of the language.
    - An index theory 𝑇𝐼 [7]. Currently, we support quantifier free fragments of total order
      theory (QF_TO), integer difference (QF_IDL), and linear integer arithmetic (QF_LIA).
    - A positive integer indicating a maximum number of allowed iterations of the main loop. 5
    - An integer number denoting the SMT solver to use to compute the interpolant in the
      case the input is unsatisfiable in the reduced theory 𝑇𝐼 ∪ ℰ𝒰 ℱ. Currently, we support
      the SMT solvers Z3, Mathsat, and SMTInterpol.
Our extended language is parameterized by the array sorts in the input formula as well as by an
index theory. The domain sort of every array is currently implemented using the Int sort 6 .
(declare-sort A)
(declare-fun diff’A’ ((Array Int A) (Array Int A)) Int)
(declare-fun length’A’ ((Array Int A)) Int)
(declare-fun empty_array’A’ () (Array Int A)
(declare-fun undefined’A’ () A)
    5
      Index theories stronger than QF_TO require such upper bound for termination purposes in reasonable time.
    6
      The requirement of including additional information on the name of sorted-functions in our extended language
is necessary since Mathsat does not implement ad hoc polymorphism.



                                                       45
   The rest of the SMTLIB2 file should specify two assertions. The first one encodes the A-part
and the second one the B-part of the interpolation problem. In the following we describe the
algorithm in [7] by phases as it was implemented in our tool.
Preprocessing. This step takes two main parts. First, our tool scans the declaration section in
the input file to internalize the signature involving the sorts of the array elements present in the
input file. Additionally, we instantiate new declarations of uninterpreted sorts ArraySortA
which abstract the array sorts present in the input file, as well as two new functions:

(declare-fun wr ((ArraySortA Int A)) ArraySortA)
(declare-fun rd (ArraySortA Int) A)

   with a proper uninterpreted sort. The reason for this is to internalize an uninterpreted
language in the algorithm in order to perform theory reduction from 𝒜ℛ𝒟(𝑇𝐼 ) to 𝑇𝐼 ∪ ℰ𝒰ℱ.
   The second part involves processing the input formula. The algorithm in [7] handles
a conjunctive fragment situation. Thus, the algorithm uses Z3 tactics to perform a DNF
transformation and produce disjuncts of conjunctions. Every disjunct is processed by the
AXDInterpolator C++ class in our implementation which can handle conjunctive formulæ.
Each conjunctive formula aforementioned is normalized (we replace length’A’(x) subfor-
mulæ for diff’A’(x, empty-array’A’)) and flattened by introducing constants of the
right sort. During the execution of the aforementioned methods, we keep track of the 𝐼𝑛𝑡
constants as well as array constants appearing in the input formula. Index constants are used to
instantiate quantified formulæ of the respective 𝒜ℛ𝒟(𝑇𝐼 ) theory in use while array constants
are necessary to compute common pairs of arrays between the A-part and the B-part of
the interpolation problem in the main loop of the algorithm.
Standard Pairs. Separated pairs from Section 2 are called in our implementation Standard Pairs,
and their requirements are encoded as a C++-class called StandardInput. There, atoms
{𝑟𝑑(𝑎, 𝑏) = 𝑟𝑑(𝑏, 𝑗), 𝑟𝑑(𝑎, 𝑖) = 𝑒, 𝑒1 = 𝑒2 } are such that 𝑎, 𝑏 are of sort ArraySort’A’, 𝑖, 𝑗
are of sort Int, 𝑒 is of uninterpreted sort A, and 𝑒1 , 𝑒2 are of appropriate sorts.
   Given a formula, this class keeps two separate z3::expr_vector denoting the
conjunction of formulæ on the first and second component of the standard pair. We use
one instantiation of the StandardInput class for both formulæ A-part and B-part.
Additionally, this class implements methods handling the instantiations of formulæ. In-
stantiation is achieved by including an undefined z3::expr which is replaced using the
z3.substitute(z3::expr_vector const &, z3::expr_vector const&)
method.
Main Loop of the Algorithm and Interpolation Generation. First, the algorithm identifies
the set of common array constants between A-part and the B-part. If this set is empty or
contains a single element, the algorithm proceeds to instantiate an z3::solver and check
the satisfiability of the second component of the Standard Pairs of the A-part and B-part.
Otherwise, it uses an iterator to select common pair of array constants which selects elements in
a fair way 7 . Similarly to the previous case, it checks the satisfiability of the second component

    7
        A selection over 𝑛 objects can be seen as a function 𝑓 : {1, . . . , 𝑛} → N. If 𝑖 ∈ {1, . . . , 𝑛}, 𝑓 (𝑖) denotes the



                                                              46
of the Standard Pairs. If the state of the solver is unsatisfiable the loop terminates saving
the formulæ of the Standard Pairs of the A-part and B-part. Otherwise, it selects the
common pair (𝑐1 , 𝑐2 ) of array constants and ‘propagates’ this pair to each of the Standard Pairs
of the A-part and B-part. The propagation consists of adding the equation of the form
diff𝑛 (𝑐1 , 𝑐2 ) = 𝑘 to the first component of the Standard Pairs where 𝑛 is the smallest integer
number such that diff𝑛 (𝑐1 , 𝑐2 ) does not occur in the Stand Pair of both A-part and B-part
and 𝑘 is a fresh Int constant. Subsequently, both Standard Pairs are 𝑁 − 𝑖𝑛𝑠𝑡𝑎𝑛𝑡𝑖𝑎𝑡𝑒𝑑.
   If the index theory in use extends QF_TO, then the implementation uses heuristics to increase
the value of an internal number keeping track of the maximum allowed complexity on the index
terms. We noticed the implementation performs well if only 0 − 𝑖𝑛𝑠𝑡𝑎𝑛𝑡𝑖𝑎𝑡𝑖𝑜𝑛𝑠 are executed.
Better heuristics are under investigation.
   If in the main loop the solver eventually detects that the second component of the Standard
Pairs is unsatisfiable, then the implementation saves the formulæ of the Standard Pairs of the
A-part and B-part respectively. Depending on the choice of the user, the implementation
executes a command on an external shell calling the SMT-solver to compute the interpolant
using the pair of unsatisfiable formulæ mentioned before. For the latter, the implementation
‘compiles’ a temporary SMTLIB2 file for the respective SMT-solver chosen.
   Reading the interpolant result from the previous process the algorithm removes added
constants in order to provide a solution using elements from the input formula. Since no
SMT-solver supports interpolation algorithms for the QF_TO we select the QF_IDL as the
theory for the interpolation engine and use a custom rewriter implemented on Z3 in order
to relax a QF_IDL formula as a QF_TO statement. The latter is important for us in order to
remain closed in the respective language.
Correctness of the Output. If the interpolant is successfully computed, our implementation
allows the user to automatically verify the correctness of the output, that is the output is a real
interpolant in 𝒜ℛ𝒟(𝑇𝐼 ): this is done by exploiting the first-order axioms of 𝒜ℛ𝒟(𝑇𝐼 ) and by
calling a z3::solver instance using them. For the latter the implementation ‘lifts’ the result
by replacing applications of diff𝑘 to their recursive definition. This is achieved using a data
structure that memorizes these intermediate formulæ.


4. Evaluation
4.1. Some interesting examples
We present in detail an example of two formulæ 𝐴, 𝐵 such that our implementation can compute
a quantifier-free interpolant for them (in 0.066 sec), whereas the other solvers cannot, since:
(i) modulo the standard theory of arrays, they return that 𝐴 ∧ 𝐵 is sat; (ii) when enriched
with the quantified axioms of 𝒜ℛ𝒟(𝑇𝐼 ), iZ3 cannot return an answer within 15 minutes,
SMTInterpol correctly detects that 𝐴 ∧ 𝐵 is unsat but cannot return an interpolant and
Mathsat returns error since it does not support quantifiers. The example is taken from [7]
(Example 1 ):

number of times the object 𝑖 has been selected. A selection over 𝑛 objects is fair if for all 𝑖, 𝑗 in {1, . . . , 𝑛} we have
that |𝑓 (𝑖) − 𝑓 (𝑗)| ≤ 1.



                                                            47
    - A-part: diff(𝑎, 𝑐1 ) = 𝑖1 ∧ diff(𝑏, 𝑐2 ) = 𝑖1 ∧ 𝑎 = 𝑤𝑟(𝑎1 , 𝑖3 , 𝑒3 ) ∧ 𝑎1 = 𝑤𝑟(𝑏, 𝑖1 , 𝑒1 )

    - B-part: 𝑖1 < 𝑖2 ∧ 𝑖2 < 𝑖3 ∧ 𝑟𝑑(𝑐1 , 𝑖2 ) ̸= 𝑟𝑑(𝑐2 , 𝑖2 )

   Our implementation returns the following as an interpolant (and verifies its correctness):

    - Interpolant: let 𝑎!1 = ¬(𝑟𝑑(𝑐2 , diff(𝑐2 , 𝑐1 )) = 𝑟𝑑(𝑐1 , diff(𝑐2 , 𝑐1 ))) be in let 𝑎!2 =
      ¬(diff(𝑐2 , 𝑐1 ) ≤ 𝑖1 ) ∧ 𝑎!1 ∧ ¬(𝑖3 ≤ diff(𝑐2 , 𝑐1 )) be in diff2 (𝑐2 , 𝑐1 ) ≤ 𝑖1 ∧ ¬𝑎!2

   If the previous file is given the iZ3, SMTInterpol and Mathsat, then the SMT-solvers
output sat: clearly, the axioms for maxdiff are missing and 𝐴 ∧ 𝐵 is satisfiable in the standard
theory of arrays. Only AXDInterpolator is able to compute a quantifier-free interpolant.
   Consider this example from [5]: 𝐴 := 𝑥 = 𝑤𝑟(𝑦, 𝑖, 𝑒) and 𝐵 := 𝑟𝑑(𝑥, 𝑗) ̸= 𝑟𝑑(𝑦, 𝑗) ∧
𝑟𝑑(𝑥, 𝑘) ̸= 𝑟𝑑(𝑦, 𝑘) ∧ 𝑗 ̸= 𝑘. When adding the axioms for maxdiff, Mathsat and iZ3
are not able to find an interpolant, whereas both SMTInterpol and AXDInterpolator get
a quantifier-free interpolant: AXDInterpolator is a bit faster than SMTInterpol (0.062 vs
0.176 sec), and the output of the latter uses the plain diff with the undetermined semantics. The
Strcpy function from [7] gives the same results, and AXDInterpolator is still a bit faster than
SMTInterpol (0.057 vs 0.205 sec).

4.2. Benchmarks using SV-COMP and UAutomizer
We also tested our implementation on two benchmarks. We performed our experiments on a
machine with Linux 5.11.16, Intel i7-9700 4.7 GHz with 32 GB of memory. We used the model
checker UAutomizer [14] to extract their SMT Scripts which are SMTLIB2 files containing the
queries produced in order to verify some particular property of C programs. In our case, we
chose C-programs from the ReachSafety-Arrays and MemSafety-Arrays tracks of the SV-COMP
[17]. We let the machine produce SMT Scripts for 15 minutes. We used these SMT Scripts files
to compare the number of interpolants computed from unsatisfiable formulæ. For the latter
we assigned each process up to 360 seconds and 6 GB of memory 8 . All the examples do not
include the expressive features of 𝒜ℛ𝒟(𝑇𝐼 ), since they have been automatically produced by
UAutomizer: this allows us to run these examples on the other solvers iZ3, Mathsat and
SMTInterpol. We defined two outcomes: Success, and Timeout. Success is declared when
an interpolant is computed, and Timeout when the process takes more time or space than the
previously specified. No file failed (i.e., terminated without returning an interpolant). Our tool
was configured to run with QF_LIA and 1000 allowed loop repetitions with the heuristic of
only 0-instantiations. The solvers were configured to use the QF_AUFLIA theory. The
results are shown in Tables 1,2,3 and 4. The implementation is available at the following link:
https://github.com/typesAreSpaces/AXDInterpolator. The experimental results can be obtained
following the instructions in the documentation.

    8
      We were able to use 1008 unsatisfiable formulæ from the ReachSafety-Arrays track; 110 queries contained
array constants or function applications of select/store. From the MemSafety-Arrays track, we were able to obtain
758 unsatisfiable formulæ; 750 contained array related elements. We noticed the input formulæ do not contain
applications of any diff operator as expected since the proposed extension of 𝒜ℛ𝒟(𝑇𝐼 ) is relatively new.



                                                       48
                                                               AXD Interpolator
                 Subtracks                    iZ3                  Mathsat            SMTInterpol
                                       Success Timeout        Success Timeout      Success Timeout
                 array-examples          584      1             584        1         584       1
                 array-memsafety         118      0             118        0         118       0
                 termination-crafted     52       3             52         3         52        3

Table 1
Memsafety-track results - Our implementation

                 Subtracks                       iZ3               Mathsat            SMTInterpol
                                       Success     Timeout    Success Timeout      Success Timeout
                 array-examples          585          0         585        0         585       0
                 array-memsafety         118          0         118        0         118       0
                 termination-crafted     55           0         55         0         55        0

Table 2
Memsafety-track results - Other Solvers

                                                                AXD Interpolator
              Subtracks                          iZ3                Mathsat             SMTInterpol
                                       Success     Timeout     Success   Timeout     Success  Timeout
              array-cav19                31           0          31         0          31        0
              array-examples             50           0          50         0          50        0
              array-fpi                  774          21         774        21         774       21
              array-industry-pattern      8           0           8          0          8        0
              array-lopstr16             54           0          54         0          54        0
              array-patterns             11           0          11         0          11        0
              array-tiling                6           0           6          0          6        0
              reducercommutativity       53           0          53         0          53        0

Table 3
Reachsafety-track results - Our implementation

              Subtracks                          iZ3                Mathsat             SMTInterpol
                                       Success     Timeout     Success Timeout       Success Timeout
              array-cav19                31           0          31         0          31        0
              array-examples             50           0          50         0          50        0
              array-fpi                  795          0          795        0          795       0
              array-industry-pattern      8           0           8         0           8        0
              array-lopstr16             54           0          54         0          54        0
              array-patterns             11           0          11         0          11        0
              array-tiling                6           0           6         0           6        0
              reducercommutativity       53           0          53         0          53        0

Table 4
Reachsafety-track results - Other Solvers


5. Related Work
In [11] McMillan introduced an interpolating proof calculus to extract interpolants using
refutational proofs obtained from the Z3 SMT-solver. His approach benefits from the flexibility
of Z3 to handle a variety of theories and their combination: it relies on a secondary interpolation
solver in order to ‘fill the gaps’ of refutational proofs introduced by theory lemmas, which are
formulæ derived by the satellities theories encoded in Z3 without a detailed explanation in
the proof. The secondary interpolation solver only requires an interpolation algorithm for
QF_UFLIA. Since the theory of arrays is introduced using quantified formulæ, this approach
generates quantified formulæ.
   The authors in [18] employed the proof tree preserving interpolation scheme from [19] to



                                                         49
compute interpolants using a resolution proof. This approach is capable of handling mixed
literals but not mixed terms9 , so the authors introduced the idea of weakly equivalences between
arrays to handle those cases for their interpolation calculi. The authors include the diff
operation between arrays in order to compute quantifier-free interpolants, but no additionally
semantical properties are given to the diff operation.
   The authors in [10] proposed a general framework for computing interpolants via reduction
using finite instantiations. In particular, the paper discusses complete interpolation procedures
for theories of arrays and linked lists, as well as other heap-allocated data structures.


6. Conclusions
In this paper we presented AXDInterpolator, the implementation of the interpolation algorithm
from [7]. We showed the feasibility of AXDInterpolator by validating it on two benchmarks
from the SV-COMP. We also compared our implementation with state-of-the-art solvers: apart
from very few timeout outcomes, our tool handled all the examples the other solvers did. We also
managed to handle interesting examples that are not handled by the other solvers, which makes
the option of our tool appealing. Specifically, expressive examples that make use of features like
| · | cannot in general be managed by any existing solver: this suggests that AXDInterpolator
could be applied for solving more sophisticated examples coming from concrete C programs
using arrays. This analysis is a significant point to investigate in future work.
    We noticed that our implementation, in case of the SV-COMP benchmarks, is behind in time
with respect to the state-of-the-art solvers. We think the main reasons might be the following:
      - The UAutomizer model checker does not include our extended signature in their SMT
        Scripts. We might take into consideration that many industrial problems will not explicitly
        use this language, however the overall structure of many array programs commonly use
        | · | for various algorithms. Thus, we need to integrate this structure in the input formula.
      - Our implementation timeouts slightly more often than other solvers. This point can be
        further improved by doing better engineering work at the implementation level. The DNF
        conversion (in worst case, exponential) appears to be the first target to rework.
      - The current design does not perform incremental satisfiability checks. As future work
        we plan to perform incremental checks since this can improve the performance of the
        used z3::solver data structure. Incremental checks can be implemented due to the
        incremental nature of the proposed interpolation algorithm by including a hash consed
        data structure on the terms/predicates produced in the main loop of the algorithm and
        because the data structure z3::solver can keep track of previously proven assertions.


References
 [1] K. L. McMillan, Interpolation and SAT-based model checking, in: Proc. of CAV, volume
     2725 of LNCS, Springer, 2003, pp. 1–13. doi:10.1007/978-3-540-45069-6\_1.

    9
     Given an interpolation pair (A, B), a symbol 𝑠 is A-local if 𝑠 appears in the symbols of 𝐴 but not in the symbols
of 𝐵. A literal/term is mixed if it contains A-local and B-local symbols



                                                         50
 [2] K. L. McMillan, Lazy abstraction with interpolants, in: Proc. of CAV, volume 4144 of LNCS,
      Springer, 2006, pp. 123–136. doi:10.1007/11817963\_14.
 [3] W. Craig, Three uses of the Herbrand-Gentzen theorem in relating model theory and
      proof theory, J. Symbolic Logic 22 (1957) 269–285.
 [4] J. Mccarthy, Towards a mathematical science of computation, in: In IFIP Congress,
      North-Holland, 1962, pp. 21–28.
 [5] D. Kapur, R. Majumdar, C. G. Zarba, Interpolation for Data Structures, in: Proc. of
      SIGSOFT-FSE, ACM, 2006, pp. 105–116.
 [6] R. Bruttomesso, S. Ghilardi, S. Ranise, Quantifier-free interpolation of a theory of arrays,
      Log. Methods Comput. Sci. 8 (2012).
 [7] S. Ghilardi, A. Gianola, D. Kapur, Interpolation and amalgamation for arrays with maxdiff,
      in: S. Kiefer, C. Tasson (Eds.), Foundations of Software Science and Computation Structures
     - 24th International Conference, FOSSACS 2021, Held as Part of the European Joint Confer-
      ences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg,
      March 27 - April 1, 2021, Proceedings, volume 12650 of Lecture Notes in Computer Science,
      Springer, 2021, pp. 268–288.
 [8] V. Sofronie-Stokkermans, Interpolation in local theory extensions, Log. Methods Comput.
      Sci. 4 (2008).
 [9] V. Sofronie-Stokkermans, On interpolation and symbol elimination in theory extensions,
      Log. Methods Comput. Sci. 14 (2018).
[10] N. Totla, T. Wies, Complete instantiation-based interpolation, J. Autom. Reasoning 57
     (2016) 37–65.
[11] K. L. McMillan, Interpolants from Z3 proofs, in: Proc. of FMCAD, 2011, pp. 19–27.
[12] M. Bozzano, R. Bruttomesso, A. Cimatti, T. Junttila, P. van Rossum, S. Schulz, R. Sebastiani,
      MathSAT: Tight integration of SAT and mathematical decision procedures, Journal of
     Automated Reasoning (2005).
[13] J. Christ, J. Hoenicke, A. Nutz, SMTInterpol: An interpolating SMT solver, in: A. F.
      Donaldson, D. Parker (Eds.), Model Checking Software - 19th International Workshop,
      SPIN 2012, Oxford, UK, July 23-24, 2012. Proceedings, volume 7385 of Lecture Notes in
      Computer Science, Springer, 2012, pp. 248–254.
[14] M. Heizmann, J. Christ, D. Dietsch, E. Ermis, J. Hoenicke, M. Lindenmann, A. Nutz,
      C. Schilling, A. Podelski, Ultimate Automizer with SMTInterpol, in: N. Piterman, S. A.
      Smolka (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Springer
      Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 641–643.
[15] L. De Moura, N. Bjørner, Z3: An Efficient SMT Solver, in: Proceedings of the Theory
      and Practice of Software, 14th International Conference on Tools and Algorithms for
      the Construction and Analysis of Systems, TACAS’08/ETAPS’08, Springer-Verlag, Berlin,
      Heidelberg, 2008, pp. 337–340.
[16] A. R. Bradley, Z. Manna, H. B. Sipma, What’s decidable about arrays?, in: Proc. of VMCAI,
     volume 3855 of LNCS, Springer, 2006, pp. 427–442. doi:10.1007/11609773\_28.
[17] J. F. Groote, K. G. Larsen, D. Beyer, Software verification: 10th comparative evaluation
     (sv-comp 2021), Tools and Algorithms for the Construction and Analysis of Systems27th
      International Conference, TACAS 2021, Held as Part of the European Joint Conferences
      on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March



                                               51
     27 – April 1, 2021, Proceedings, Part II 12652 (2021) 401—422. URL: https://europepmc.org/
     articles/PMC7984550.
[18] J. Hoenicke, T. Schindler, Efficient interpolation for the theory of arrays, in: Proc.
     of IJCAR, volume 10900 of LNCS (LNAI), Springer, 2018, pp. 549–565. doi:10.1007/
     978-3-319-94205-6\_36.
[19] J. Christ, J. Hoenicke, A. Nutz, Proof tree preserving interpolation, in: N. Piterman,
     S. A. Smolka (Eds.), Tools and Algorithms for the Construction and Analysis of Systems,
     Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 124–138.




                                              52