Efficient Interpolation for the Theory of Arrays
                                            (work in progress)

                            Jochen Hoenicke and Tanja Schindler∗
                                 Department of Computer Science,
                                       University of Freiburg
                         {hoenicke,schindle}@informatik.uni-freiburg.de

                                                 Abstract
       Existing techniques for Craig interpolation for the quantifier-free fragment of the theory of
       arrays require a special solver. The solver needs to know in advance the partitioning (A, B)
       of the interpolation problem and needs to avoid creating AB-mixed terms to be suitable
       for interpolation. This limits the efficiency of these solvers especially when computing
       sequence and tree interpolants. We present a new approach using Proof Tree Preserving
       Interpolation and an array solver based on Weak Equivalence on Arrays. We give an
       interpolation algorithm for the lemmas produced by the array solver.


1      Introduction
Several model-checkers [1, 2, 5, 11, 12, 13, 14, 15, 17, 18] use interpolants to find candidate in-
variants to prove the correctness of software. They require efficient tools to check satisfiability
of a formula in a decidable theory and to compute interpolants (usually sequence or tree inter-
polants) for unsatisfiable formulas. Moreover, they often need to combine several theories, e.g.,
integer or bitvector theory for reasoning about numeric variables and array theory for reason-
ing about pointers. In this paper we present an interpolation procedure for the quantifier-free
fragment of the theory of arrays that allows for the combination with other theories and that
can reuse an existing unsatisfiability proof to compute interpolants efficiently.
    Our method is based on the array solver presented in [6], which fits well into existing Nelson-
Oppen frameworks. The solver generates lemmas (valid in the theory of arrays) that explain
equalities between variables shared between different theories. The variables do not necessarily
belong to the same formula in the interpolation problem and the solver does not need to know
the partitioning. Instead, we use the technique of Proof Tree Preserving Interpolation [9],
which can produce interpolants from existing proofs propagating equalities between symbols
from different partitions.
    The contribution of this paper is an algorithm to interpolate the lemmas produced by the
solver of the theory of arrays. This solver only produces two types of lemmas, namely a variant
of the read-over-write axiom and a variant of the extensionality axiom. However, the lemmas
can contain array store chains of arbitrary length which need to be handled by the interpolation
procedure. Bruttomesso et al. [4] showed that adding a diff function is sufficient to get a theory
of arrays that is closed under interpolation and in principle their algorithm can be used to
interpolate the lemmas. We give a more efficient algorithm that exploits the special shape of
these axioms.
Related Work. Brillout et al. [3] give an interpolation procedure for the combination of
Presburger Arithmetic and the quantifier-free theory of arrays. However, their procedure pro-
duces quantified interpolants. For certain cases, their algorithm removes quantifiers by simple
syntactic transformations, but in general this is not possible.
    ∗ This work is supported by the German Research Council (DFG) under HO 5606/1-1.
Efficient Interpolation for the Theory of Arrays                              Hoenicke and Schindler


    Equality interpolating theories [21, 4] define a general framework for interpolation in the
combination of quantifier-free theories. This framework handles theory lemmas that propagate
equalities between variables shared between theories, enabling Nelson-Oppen theory combina-
tion. The equalities may relate variables that come from different formulas in the interpolation
problem. A theory is equality interpolating if it can find an interpolating term for these equali-
ties that is expressed using only the symbols occuring in both parts of the interpolation problem.
    The algorithm of Yorsh and Musuvathi [21] only supports convex theories and is not ap-
plicable to the theory of arrays. Bruttomesso et al. [4] extended the framework to non-convex
theories. They also present a complete interpolation procedure for the quantifier-free theory of
arrays that works for theory combination. However, their solver depends on the partioning of
the interpolation problem and this can lead to exponential blow-up of the solving procedure.
Our interpolation procedure works on a proof produced by a more efficient array solver that is
independent of the partioning of the interpolation problem.
    Totla and Wies [20] present an interpolation method for arrays based on complete instan-
tiations. It combines the idea of [4] with local theory extension [19]. Given an interpolation
problem A and B, they define two sets K[W (A, B)] and K[W (B, A)], each using only symbols
from A resp. B, that contain the instantiations of the array axioms needed to prove unsatisfi-
abilty. Then an existing solver and interpolation procedure for uninterpreted functions can be
used to compute the interpolant. The procedure produces a quadratic blow-up on the input
formulas. We also found that their procedure fails for some extensionality lemmas, when we
used it to create candidate interpolants.
    The last two techniques require to know the partitions at solving time. Thus, when com-
puting sequence interpolants or tree interpolants, they would require either an adapted inter-
polation procedure or the solver has to run multiple times. In contrast, our method can be
easily extended to tree interpolation [7].


2     Notation
We assume standard first order logic. A theory T is given by a signature Σ and a set of axioms.
The theory of arrays TA is parameterized by an index theory and an element theory. The
signature ΣA of TA contains the select (or read ) function ·[·] and the store (or write) function
·h·  ·i. In the following, a, b, s, t will denote array terms, i, j, k index terms and v, w element
terms. For an array a, index i and element v, a[i] returns the element stored in a at i, and
ahi  vi returns a copy of a where the element at index i is replaced by the element v, leaving
a unchanged. The functions are defined by the following axioms proposed by McCarthy [16].

                                      ∀a i v. ahi  vi[i] = v                                 (idx)
                         ∀a i j v. i 6= j → ahi  vi[j] = a[j]                    (read-over-write)

We consider the variant of the extensional theory of arrays proposed by Bruttomesso et al. [4]
where the signature is extended by the function diff(·, ·) which for distinct arrays a and b
returns an index where a and b differ, and an arbitrary index else. The extensionality axiom
then becomes
                          ∀a b. a[diff(a, b)] = b[diff(a, b)] → a = b .             (ext-diff)
The authors of [4] have shown that the quantifier-free fragment of the theory of arrays with
diff, TAxDiff , is closed under interpolation. To express the interpolants conveniently we use the
                                                                      k
notation from [20] for rewriting arrays. For k ≥ 0 we define a           b for two arrays a and b

2
Efficient Interpolation for the Theory of Arrays                                             Hoenicke and Schindler


inductively as
                          0                  k+1                                        k
                      a       b := a     a         b := ahdiff(a, b)  b[diff(a, b)]i       b .
           k
Thus, a       b changes the values in a at k indices to the values stored in b. The equation
    k
a     b = b holds if and only if a and b differ at at most k indices. The indices where they differ
                                    k
are the diff terms occuring in a      b.
     An interpolation problem (A, B) is a pair of formulas where A ∧ B is unsatisfiable. A Craig
interpolant for (A, B) is a formula I such that (i) A implies I, (ii) I and B are unsatisfiable and
(iii) I contains only symbols shared between A and B. Given an interpolation problem (A, B),
the symbols shared between A and B are called shared, symbols only occuring in A are called
A-local and symbols only occuring in B, B-local. A literal, e.g. a = b, that contains A-local
and B-local symbols is called mixed.


3     Preliminaries
3.1     Proof Tree Preserving Interpolation
In this section we give a short overview of the proof tree preserving interpolation framework
presented by Christ et al. [9]. This method defines two projections ·  A and ·  B that project a
literal to its A-part resp. B-part. For a literal ` occurring only in the formula A, the projections
are `  A ≡ ` and `  B ≡ >, and similar for a literal occurring only in B. This projection can be
naturally extended to conjunctions of literals. Then a partial interpolant of a clause C occurring
in the proof tree is defined as the interpolant of A ∧ (¬C)  A and B ∧ (¬C)  B. The paper
shows that these partial interpolants can be computed inductively over the proof tree and the
partial interpolant of the root is the interpolant of A and B. For a theory lemma C, a partial
interpolant is computed from the interpolation problem (¬C  A, ¬C  B).
    The core idea of proof tree preserving interpolation is a scheme to handle mixed equalities
a = b where a is A-local and b is B-local. For these a fresh variable xab is introduced and the
projections are defined as follows.

                          (a = b)  A :≡ (a = xab )           (a = b)  B :≡ (xab = b)

Thus, a = b is equivalent to ∃xab .(a = b)  A ∧ (a = b)  B and xab is a new shared variable
that may occur in interpolants. For disequalities we follow [10] and use an auxiliary variable
xab and a Boolean auxiliary variable pxab . We define EQ(x, s) :≡ px xor x = s and define the
projections for a 6= b as

                      (a 6= b)  A :≡ EQ(xab , a)           (a 6= b)  B :≡ ¬ EQ(xab , b) .

For an interpolation problem (A∧(¬C)  A, B ∧(¬C)  B) where ¬C contains a 6= b we require as
additional symbol condition that the interpolant has the form I[EQ(xab , s1 )] . . . [EQ(xab , sn )]1 ,
where s1 , . . . , sn are shared terms and each EQ term occurs positively in I. For a resolution
step on the pivot literal a = b the following interpolation rule combines the partial interpolants
of the input clauses to a partial interpolant of the resolvent.

                C1 ∨ a = b : I1 [EQ(xab , s1 )] . . . [EQ(xab , sn )]      C2 ∨ a 6= b : I2 (xab )
                                    C1 ∨ C2 : I1 [I2 (s1 )] . . . [I2 (sn )]
    1 One can show that such an interpolant exists for every equality interpolating theory in the sense of Defini-

tion 4.1 in [4]. The terms s1 , . . . , sn are the terms v in that definition.


                                                                                                                 3
Efficient Interpolation for the Theory of Arrays                              Hoenicke and Schindler


3.2     Weakly Equivalent Arrays
In this section, we revisit the definitions and results about weakly equivalent arrays that are
used in the decision procedure for the theory of arrays presented by Christ and Hoenicke [6].
    For a formula F , let V be the set of terms that contains the array terms in F and in addition
the select terms a[i] and their indices i and for every store term ahi  vi in F the terms i, v,
a[i] and ahi  vi[i]. Let ∼ be the equivalence relation on V representing equality. The weak
equivalence graph GW is defined by its vertices, the array-valued terms in V , and its undirected
                                                         i
edges of the form (i) s1 ↔ s2 if s1 ∼ s2 and (ii) s1 ↔ s2 if s1 has the form s2 hi  ·i or vice
                                                 W
versa. If two arrays a and b are connected in G by a path P they are called weakly equivalent.
             P
We write a ⇔ b. Weakly equivalent arrays can differ only at finitely many positions given by
                                 i
Stores (P ) := {i | ∃s1 s2 . s1 ↔ s2 ∈ P }. Two arrays a and b are called weakly equivalent on
i, denoted by a ≈i b, if there exists a path P between them such that k 6∼ i holds for every
k ∈ Stores (P ). If a and b are weakly equivalent on i, they must store the same value at i. Two
arrays a and b are called weakly congruent on i, a ∼i b, if the equality a0 [j] = b0 [k] holds for
j ∼ k ∼ i and a0 ≈i a, b0 ≈i b. Also in this case they must store the same value at i.
                      P
    We use Cond(a ⇔ b), Cond(a ≈i b), Cond(a ∼i b) to denote the conjunction of the literals
v = v (resp. v 6= v 0 ), v, v 0 ∈ V , such that v ∼ v 0 (resp. v 6∼ v 0 ) is necessary to show the
       0

corresponding property. Instances of array lemmas are generated according to the following
rules:
                                   i∼j       a ≈i b    a[i], b[j] ∈ V
                                                                               (read-over-weakeq)
                                 i 6= j ∨ ¬ Cond(a ≈i b) ∨ a[i] = b[j]
                    P
                 a⇔b      ∀i ∈ Stores (P ) . a ∼i b a, b ∈ V
                             P
                                    _                                                (weakeq-ext)
                 ¬ Cond(a ⇔ b) ∨       ¬ Cond(a ∼i b) ∨ a = b
                                      i∈Stores(P )

The first rule, based on (read-over-write), propagates equalities between select terms and the
second, based on extensionality, propagates equalities on array terms. In the following, we will
describe how we can derive partial interpolants for these array lemmas.


4     Interpolants for Read-Over-Weakeq Lemmas
In this section, we show how to interpolate lemmas generated by (read-over-weakeq). A lemma
of this type (see Figure 1) explains a conflict of the form
                                    i = j ∧ Cond(a ≈i b) ∧ a[i] 6= b[j] .
The weak equivalence a ≈i b ensures that a and b are equal at i = j which contradicts a[i] 6= b[j].
   If the index equality i = j is mixed, the interpolation problem contains the shared auxiliary
variable xij for i = j. If i (resp. j) is shared, we call i (resp. j) the shared term for i = j. We
then identify four basic cases: (i) there exists a shared term for i = j and a[i] 6= b[j] is in B
or mixed, (ii) there is a shared term for i = j and a[i] 6= b[j] is A-local, (iii) both i and j are
B-local, and (iv) both i and j are A-local.

4.1     There is a Shared Term for i = j and a[i] 6= b[j] is in B or mixed
If there exists a shared term x for the index equality i = j, the interpolant I can contain terms
s[x] for shared array terms s occuring in the weak path between a and b. The basic idea is to
summarize the weak A-paths by applying rule (read-over-weakeq) on their end terms.

4
Efficient Interpolation for the Theory of Arrays                                             Hoenicke and Schindler


                       a[i]                              |                             b[j]

                                            i                        j
                                       \    —            /   /
                                k1            k2                 km−1           km
                        a                              ···                               b

Figure 1: A read-over-weakeq conflict. Solid lines represent strong (dis-)equalities, dotted lines
function-argument relations, and zigzag lines represent weak paths consisting of store steps and
array equalities.


Example 1. Consider the following read-over-weakeq conflict:
   i = j ∧ a = s1 ∧ s1 hk1  v1 i = s2 ∧ s2 hk2  v2 i = s3 ∧ s3 = b ∧ i 6= k1 ∧ i 6= k2 ∧ a[i] 6= b[j]
where a, k2 , v2 , i are A-local, b, k1 , v1 , j are B-local, and s1 , s2 , s3 are shared. Projecting the
mixed literals on A and B as described in Section 3.1 yields the interpolation problem
         A : i = xij ∧ a = s1 ∧ s2 hk2  v2 i = s3 ∧ EQ(xik1 , i) ∧ i 6= k2 ∧ EQ(xa[i]b[j] , a[i])
         B : xij = j ∧ s1 hk1  v1 i = s2 ∧ s3 = b ∧ ¬ EQ(xik1 , k1 ) ∧ ¬ EQ(xa[i]b[j] , b[j]) .
An interpolant is
                     I ≡ EQ(xa[i]b[j] , s1 [xij ]) ∧ s2 [xij ] = s3 [xij ] ∧ EQ(xik1 , xij ) .
A implies I. As a = s1 holds, we get a[xij ] = s1 [xij ]. With EQ(xa[i]b[j] , a[i]) and i = xij follows
EQ(xa[i]b[j] , s1 [xij ]). The equality s2 [xij ] = s3 [xij ] follows by applying (read-over-weakeq) on
s3 = s2 hk2  v2 i and using xij = i 6= k2 . Finally, EQ(xik1 , i) and i = xij yield EQ(xik1 , xij ).
B contradicts I. By ¬ EQ(xik1 , k1 ) and EQ(xik1 , xij ) we get xij 6= k1 . With s2 = s1 hk1 v1 i and
(read-over-weakeq) we get s1 [xij ] = s2 [xij ]. From s3 = b we get s3 [xij ] = b[xij ]. Transitivity,
EQ(xa[i]b[j] , s1 [xij ]) and s2 [xij ] = s3 [xij ] and the above select equalities yield EQ(xa[i]b[j] , b[xij ]).
With xij = j, we get a contradiction to ¬ EQ(xa[i]b[j] , b[j]).
Symbol condition for I. I contains only shared symbols and auxiliary variables. All auxiliary
variables introduced for disequalities appear in positive EQ terms.

Algorithm. Assume that the weak path P : a ≈i b is subdivided into A- and B-paths where
shared paths are added to B-paths. Let x be the shared term for i = j, i.e. x stands for i if i
is shared, for j if i is not shared but j is, and for the auxiliary variable xij if i = j is mixed.
(i) An inner A-path π of P starts and ends with a shared term: π : s1 ≈i s2 (these shared array
terms can also be auxiliary variables introduced for a mixed array equality). The summary is
s1 [x] = s2 [x]. For every B-local index disequality i 6= k on π add the disjunct x = k and for
every mixed index disequality add the disjunct EQ(xik , x). The interpolant of the subpath is
                                                            _                 _
      Iπ ≡ s1 [x] = s2 [x] ∨ FπA (x)   where FπA (x) :≡          x=k ∨              EQ(xik , x) .
                                                                 k∈Stores(π)         k∈Stores(π)
                                                                 i6=k B-local         i6=k mixed

(ii) If a[i] 6= b[j] is mixed and a[i] is A-local, the first A-path on P starts with a or a is shared,
i.e. π : a ≈i s1 (where s1 can be a). For the path π, build the term EQ(xa[i]b[j] , s1 [x]) and add
FπA (x) as in case (i).
                                   Iπ ≡ EQ(xa[i]b[j] , s1 [x]) ∨ FπA (x)

                                                                                                                 5
Efficient Interpolation for the Theory of Arrays                                         Hoenicke and Schindler


(iii) Similarly in the case where a[i] 6= b[j] is mixed and b[j] is A-local, the last A-path on P
ends with b or b is shared, π : sn ≈i b. In this case the disjunct i 6= j needs to be added if i = j
is B-local and i, j are both shared.

                               Iπ ≡ EQ(xa[i]b[j] , sn [x]) ∨ FπA (x) [ ∨ i 6= j]

(iv) For every B-path π, add the conjunct x 6= k for each A-local index disequality i 6= k, and
the conjunct EQ(xik , x) for each mixed index disequality i 6= k on π. We define
                                                 ^                 ^
            Iπ ≡ FπB (x)     where FπB (x) :≡        x 6= k ∧          EQ(xik , x) .
                                                     k∈Stores(π)         k∈Stores(π)
                                                     i6=k A-local         i6=k mixed

   The lemma interpolant is the conjunction of the above path interpolants. If i, j are shared
and i = j is A-local, add the conjunct i = j.
                                  ^             ^
                           I≡        Iπ ∧           Iπ [ ∧ i = j] .
                                    π∈A-paths         π∈B-paths


4.2     There is a Shared Term for i = j and a[i] 6= b[j] is A-local
If there exists a shared index for i = j and a[i] 6= b[j] is A-local, we build disequalities for
the B-paths instead of equalities for the A-paths. This corresponds roughly to obtaining the
interpolant of the inverse problem (B, A) by Section 4.1 and negating the resulting formula.
Only the EQ terms are not negated because of the asymmetry of the projection.
    Using the definitions of FπA and FπB from the previous section, the lemma interpolant is
                        _                                    _
            I≡                (s1 [x] 6= s2 [x] ∧ FπB (x)) ∨     FπA (x) [ ∨ i 6= j] .
                  (π:s1 ≈i s2 )∈B-paths                             π∈A-paths


4.3     Both i and j are B-local
When both i and j are B-local, we have no term x representing the weak path index i in the
interpolant I. Hence, summarizing A- (resp. B-) paths by terms of the form s1 [x] = s2 [x] ∨ FπA
(resp. s1 [x] 6= s2 [x] ∧ FπB ) as above is not possible. Instead we use the diff function to make
statements about the desired index. For instance, if a = bhi  vihj  wi for arrays a, b with
                    2                                     1
b[j] 6= w, then a      b = b and diff(a, b) = j or diff(a   b, b) = j hold.
Example 2. Consider the following conflict:

                    i = j ∧ a = s1 ∧ s1 hk  vi = s2 ∧ s2 = b ∧ i 6= k ∧ a[i] 6= b[j]

where a, b, i, j are B-local, k, v are A-local, and s1 , s2 are shared. Splitting the mixed
disequality i 6= k as described in Section 3.1 results in the interpolation problem

                       A : s1 hk  vi = s2 ∧ EQ(xik , k)
                       B : i = j ∧ a = s1 ∧ s2 = b ∧ ¬ EQ(xik , i) ∧ a[i] 6= b[j] .

An interpolant should reflect the information that s1 and s2 can differ at most at one index
satisfying the EQ term. Using diff, we can express the interpolant
                                                                         1
                         I : (s1 = s2 ∨ EQ(xik , diff(s1 , s2 ))) ∧ s1       s2 = s2 .

6
Efficient Interpolation for the Theory of Arrays                                        Hoenicke and Schindler


A implies I. The first literal in A implies that s1 and s2 differ at most at index k, yielding
   1
s1   s2 = s2 . If s1 6= s2 , then k = diff(s1 , s2 ) and EQ(xik , diff(s1 , s2 )) follows from EQ(xik , k).
B contradicts I. B implies that s1 [i] 6= s2 [i] holds. Thus, s1 = s2 cannot hold and I states
that EQ(xik , diff(s1 , s2 )) holds. With ¬ EQ(xik , i) in B, this implies diff(s1 , s2 ) 6= i. Hence,
                                                                            1
s1 and s2 must differ at least at two indices. This contradicts s1             s2 = s2 in I.
Symbol condition for I. I contains only shared symbols and the auxiliary variable for i 6= k
appears in a positive EQ term only.

   To generalize this idea, we define inductively for arrays a and b, a number m ≥ 0 and a
formula F (·) with one free parameter:

             weq(a, b, 0, F (·)) :≡      a=b
                                                                             1                         (weq)
       weq(a, b, m + 1, F (·)) :≡        (a = b ∨ F (diff(a, b))) ∧ weq(a        b, b, m, F (·)) .

The term weq(a, b, m, F (·)) states that arrays a and b differ at most at m indices and that each
index i on which they differ satisfies the formula F (i).

Algorithm. For any A-path π : s1 ≈i s2 , we count the number of stores |π| := | Stores (π) |.
Each index i where s1 and s2 differ needs to satisfy FπA (i) as defined in Section 4.1. There is
nothing to do for B-paths. The lemma interpolant is
                                     ^
                          I≡                weq(s1 , s2 , |π|, FπA (·)) .
                                      (π:s1 ≈i s2 )∈A-paths


4.4     Both i and j are A-local
In the case where both i and j are A-local, we summarize the B-paths, as all B-path ends are
shared terms. Analogously to weq, we define for arrays a, b, a number m ≥ 0 and a formula F :

           nweq(a, b, 0, F (·)) :≡      a 6= b
                                                                             1                        (nweq)
      nweq(a, b, m + 1, F (·)) :≡       (a 6= b ∧ F (diff(a, b))) ∨ nweq(a       b, b, m, F (·)) .
The term nweq(a, b, m, F (·)) expresses that either one of the first m indices i found by stepwise
rewriting a to b satisfies the formula F (i), or a and b differ at more than m indices.
   Analogously to the last section, the lemma interpolant can be expressed as
                                       _
                             I≡                nweq(s1 , s2 , |π|, FπB (·)) .
                                     (π:s1 ≈i s2 )∈B-paths


Note that this is almost the negation of the interpolant of (B, A) computed as in Section 4.3.
Only the EQ terms are not negated because of the asymmetry of the projection.


5     Interpolants for Weakeq-Ext Lemmas
In this section, we describe how to compute interpolants for array lemmas of type (weakeq-ext).
Figure 2 displays a visualization of the corresponding conflict
                                   P
                                             ^
                           Cond(a ⇔ b) ∧        Cond(a ∼i b) ∧ a 6= b .
                                                 i∈Stores(P )


                                                                                                            7
Efficient Interpolation for the Theory of Arrays                                                 Hoenicke and Schindler


                                                                 /
                                       i1               i2            im−1         im
                                a                             ...                            b
                        i1                              —                          —
                                                                      1            1
                                       k11              k21          km 1 −1
                                                                                  km 1
                                a                             ...                            b
                                                               ..
                                                                .
                        im
                                       —                                          —
                                                                      m           m
                                       k1m              k2m          km m −1
                                                                                 km m
                                a                             ...                            b

Figure 2: Visualization of a weakeq-ext lemma. Solid lines represent strong (dis-)equalities,
zigzag lines store paths and dashed zigzag lines weak paths which can contain select edges.


The main path P shows that a and b differ at most at the indices in Stores (P ) and a ∼i b
shows that a and b do not differ at index i.
    A weak path labelled with i (short: i-path) represents weak congruence on i, i.e. it can
contain select edges a0 j, k b0 where i ∼ j and i ∼ k and a0 [j] ∼ b0 [k]. We modify the methods
in Section 4 to summarize the i-paths. In the B-local case 4.3, B-local select edges make no
difference, as the weq terms are built over A-paths, and analogously for the A-local case 4.4.
However, if there are A-local select edges in the B-local case or vice versa, then k is shared or
the index equality i = k is mixed and we can use k or the auxiliary variable xik to proceed as
in the shared cases 4.1 and 4.2.
    We have to adapt the interpolation procedures in Sections 4.1 and 4.2 by adding the index
equalities that pertain to a select edge, analogously to the index disequality for a store edge
before. More specifically, we add to FπA (x) a disjunct x 6= k for each B-local i = k on an A-path,
and x 6= xik for each mixed i = k. Here, x is the shared term for the main index equality i = j
as before. For B-paths we add to FπB (x) the conjunct x = k for each A-local i = k and x = xik
for each mixed i = k. Furthermore, if there is a mixed select equality a0 [j] = b0 [k] on the weak
path, the auxiliary variable xa0 [j]b0 [k] is used in the summary for the A-path instead of s[x], i.e.,
we get a term of the form xa0 [j]b0 [k] = s2 [x] in 4.1, and analogously for 4.2.
    For (weakeq-ext) lemmas, we distinguish three cases: (i) a 6= b is in B, (ii) a 6= b is A-local,
or (iii) a 6= b is mixed.

5.1     a 6= b is in B
If the disequality a 6= b is in B, the A-paths both on the main store path and on the weak paths
have only shared path ends. Hence, we summarize A-paths similarly to Sections 4.1 and 4.3.

                                                    P
Algorithm. Divide the main path a ⇔ b into A-paths and B-paths. For each i ∈ Stores (P )
on a B-path, summarize the corresponding i-path as in Sections 4.1 or 4.3. The resulting
                                                    π
summary is denoted by Ii . For an A-path s1 ⇔ s2 use a weq term to state that each index
where s1 and s2 differ satisfies Ii (·) for some i ∈ Stores (π) where Ii is computed as in 4.1 with
the shared term · for i = j. The lemma interpolant is
                         ^                    ^                        _
                 I≡            Ii ∧                 weq(s1 , s2 , |π|,      Ii (·)) .
                                                π
                         i∈Stores(π)         (s1 ⇔s2 )∈A-paths                 i∈Stores(π)
                         π∈B-paths
8
Efficient Interpolation for the Theory of Arrays                                             Hoenicke and Schindler


5.2     a 6= b is A-local
The case where a 6= b is A-local is similar with the roles of A and B swapped. For each
i ∈ Stores (π) on an A-path π on P , interpolate the corresponding weak path as in Sections 4.2
or 4.4 and obtain Ii . For each i ∈ Stores (π) on a B-path π on P , interpolate the corresponding
weak path as in Section 4.2 using · as shared term and obtain Ii (·). The lemma interpolant is
                             _                     _                               ^
                   I≡            Ii    ∨                 nweq(s1 , s2 , |π|,           Ii (·)) .
                                              π
                        i∈Stores(π)        (s1 ⇔s2 )∈B-paths                   i∈Stores(π)
                        π∈A-paths


5.3     a 6= b is mixed
If a 6= b is mixed, where w.l.o.g. a is A-local, the outer A- and B-paths end with A-local or
B-local terms respectively. The auxiliary variable xab may not be used in store or select terms,
thus we first need to find a shared term representing a before we can summarize A-paths.

Example 3. Consider the following conflict:

                       a = shi1  v1 i ∧ b = shi2  v2 i ∧ a 6= b               (main path)
                     ∧ a[i1 ] = s1 [i1 ] ∧ b = s1 hk1  w1 i ∧ i1 6= k1         (i1 -path)
                     ∧ a = s2 hk2  w2 i ∧ i2 6= k2 ∧ b[i2 ] = s2 [i2 ]         (i2 -path)

where a, i1 , v1 , k2 , w2 are A-local, b, i2 , v2 , k1 , w1 are B-local and s, s1 , s2 are shared. An
interpolant for the conflict is

I ≡ EQ(xab , s) ∧ weq(s, s2 , 1, EQ(xi2 k2 , ·))
                                                                                                          
    ∨ nweq s, s1 , 2, EQ(xab , sh·  s1 [·]i) ∧ weq(sh·  s1 [·]i, s2 , 1, EQ(xi2 k2 , :)) ∧ EQ(xi1 k1 , ·) ,

where in the second line the symbol · refers to the diff term of the outer nweq term and the
symbol : to the diff term of the inner weq term.
A implies I. Because of a = shi1  v1 i, the arrays a and s can differ only at i1 . If a = s, we
get EQ(xab , s) by replacing a in the A-projection of a 6= b and we get weq(s, s2 , 1, EQ(xi2 k2 , ·))
from the i2 -path as in Section 4.3. Otherwise, s[i1 ] 6= s1 [i1 ], as a[i1 ] = s1 [i1 ] holds in A by the
i1 -path. By correcting s at i1 with s1 [i1 ], we get a, and therefore EQ(xab , shi1  s1 [i1 ]i) holds.
Again, we get weq(shi1  s1 [i1 ]i, s2 , 1, EQ(xi2 k2 , :)) as in Section 4.3. Finally, for i1 , we have
the literal EQ(xi1 k1 , i1 ) by projection. We know that i1 is among the diff terms between s and
s1 . Thus, the nweq term holds since i1 satisfies the nested formula.
I contradicts B. Assume that the first disjunct of I holds. By EQ(xab , s) in I and ¬ EQ(xab , b)
in B, we get s 6= b. The only potential difference is at i2 because of B. Since B contains
s2 [i2 ] = b[i2 ] this implies that s and s2 also differ at i2 . However, with the weq term in I
and ¬ EQ(xi2 k2 , i2 ) in B, we get a contradiction. Assume now that the second disjunct of I
holds. Then either s and s1 must differ at some index · which satisfies the formula inside the
nweq term, or at more than 2 positions. We know from B that s and s1 can only differ at i2
and k1 . For k1 , there is the term ¬ EQ(xi1 k1 , k1 ) in B. Hence, · can only be i2 . Because of
EQ(xab , sh·  s1 [·]i) in I and ¬ EQ(xab , b) in B, we get sh·  s1 [·]i =
                                                                          6 b. Since s and b only differ
at i2 , the difference can only be at i2 = · and sh·  s1 [·]i[i2 ] 6= b[i2 ] = s2 [i2 ]. By the inner weq
term in I, i2 has to satisfy EQ(xi2 k2 , i2 ), which is a contradiction to B.

                                                                                                                 9
Efficient Interpolation for the Theory of Arrays                                      Hoenicke and Schindler

                                                                             π
Algorithm. Identify in the main path P the first A-path a ⇔0 s1 and its store indices
Stores (π0 ) = {i1 , . . . i|π0 | }. To build an interpolant, we rewrite s1 by storing at each index
im the value a[im ]. We use s̃ to denote the intermediate arrays. We build a formula Im (s̃)
inductively over m ≤ |π0 |. This formula is an interpolant if s̃ is a shared array that differs from
a only at the indices i1 , . . . , im .
    For m = 0, i.e., a = s̃, we modify the lemma by adding the strong edge s̃ ↔ a in front of
all paths and summarize it as if it was B-local using the algorithm in Section 5.1, but drop the
                                     π
weq term for the path s̃ ↔ a ⇔0 s1 . This yields I5.1 (s̃). We define
                                      I0 (s̃) ≡ EQ(xab , s̃) ∧ I5.1 (s̃) .
   For the induction step to m + 1 we assume that s̃ only differs from a at i1 , . . . , im , im+1 .
Our goal is to find a shared index term x for im+1 and a shared value v for a[x]. We use the
im+1 -path to conclude that s̃hx  vi is equal to a at im+1 . Then we can include Im (s̃hx  vi)
computed using the induction hypothesis.
   (i) If there is a select edge on a B-subpath of the im+1 -path or if im+1 is itself shared, we
immediately get a shared term x for im+1 . If the last B-path π m+1 on the im+1 -path starts
with a mixed select equality, then the corresponding auxiliary variable is the shared value
v. Otherwise, π m+1 starts with a shared array sm+1 and v := sm+1 [x]. We summarize the
im+1 -path from a to the start of π m+1 as in Section 4.2 and get I4.2 (x). Finally, we set
                           Im+1 (s̃) ≡ I4.2 (x) ∨ (Im (s̃hx  vi) ∧ FπBm+1 (x)) .
                                                                                    π m+1
    (ii) Otherwise, we split the im+1 -path into a ∼im+1 sm+1 and sm+1 ⇔ b, where π m+1 is
the last B-subpath of the im+1 -path. If s1 and a are equal on im+1 then also s̃ and a are equal
and the interpolant is simply Im (s̃). If a and sm+1 differ on im+1 , we build an interpolant from
a ∼im+1 sm+1 as0 in 4.4 and obtain I4.4 . Otherwise, s1 and sm+1 differ on im+1 . We build the
               P                                                                            π
store path s1 ⇔ sm+1 by concatenating P and π m+1 . Using nweq on the subpaths s ⇔ s0 of
  0                                                                                   |π| 0
P we find the shared term x for im+1 . If π is in A we need to add the conjunct s        s = s0 to
obtain an interpolant. We get
      Im+1 (s̃) ≡ Im (s̃) ∨ I4.4 [for a ∼im+1 sm+1 ]
                                                                               
                                                                                       |π| 0
                         _
                   ∨        nweq s, s0 , |π|, Im (s̃h·  sm+1 [·]i) ∧ FπBm+1 (·) [ ∧ s    s = s0 ] .
                       π
                      s⇔s0 in P 0

The lemma interpolant for the mixed extensionality lemma is I ≡ I|π0 | (s1 ).


6     Conclusion and Future Work
We presented an interpolation algorithm for the quantifier-free fragment of the theory of arrays.
The algorithm uses an efficient array solver based on weak equivalence on arrays. In contrast to
most existing interpolation algorithms for arrays, the solver does not depend on the partioning
of the interpolation problem. Thus, our technique allows for efficient interpolation especially
in the context of sequence interpolants and tree interpolants where interpolants for different
partitions of the same unsatisfiable formula need to be computed.
    Because we use the framework of proof tree preserving interpolation we only need to provide
algorithms to interpolate the lemmas. These algorithms build the formulas by simply iterating
over the weak equivalence and weak congruence paths found by the array solver.
    Implementation of the algorithm in SMTInterpol [8] is ongoing work. The algorithm for
read-over-weakeq lemmas is already implemented and supports sequence and tree interpolation.

10
Efficient Interpolation for the Theory of Arrays                                 Hoenicke and Schindler


References
 [1] Pavel Andrianov, Karlheinz Friedberger, Mikhail U. Mandrykin, Vadim S. Mutilin, and Anton
     Volkov. CPA-BAM-BnB: Block-abstraction memoization and region-based memory models for
     predicate abstractions - (competition contribution). In TACAS (2), volume 10206 of Lecture
     Notes in Computer Science, pages 355–359, 2017.
 [2] Dirk Beyer, Thomas A. Henzinger, Ranjit Jhala, and Rupak Majumdar. The software model
     checker blast. STTT, 9(5-6):505–525, 2007.
 [3] Angelo Brillout, Daniel Kroening, Philipp Rümmer, and Thomas Wahl. Program verification via
     craig interpolation for presburger arithmetic with arrays. In VERIFY@IJCAR, volume 3 of EPiC
     Series in Computing, pages 31–46. EasyChair, 2010.
 [4] Roberto Bruttomesso, Silvio Ghilardi, and Silvio Ranise. Quantifier-free interpolation in combi-
     nations of equality interpolating theories. ACM Trans. Comput. Log., 15(1):5:1–5:34, 2014.
 [5] Franck Cassez, Anthony M. Sloane, Matthew Roberts, Matthew Pigram, Pongsak Suvanpong,
     and Pablo González de Aledo Marugán. Skink: Static analysis of programs in LLVM intermediate
     representation - (competition contribution). In TACAS (2), volume 10206 of Lecture Notes in
     Computer Science, pages 380–384, 2017.
 [6] Jürgen Christ and Jochen Hoenicke. Weakly equivalent arrays. In FroCos, volume 9322 of Lecture
     Notes in Computer Science, pages 119–134. Springer, 2015.
 [7] Jürgen Christ and Jochen Hoenicke. Proof tree preserving tree interpolation. J. Autom. Reasoning,
     57(1):67–95, 2016.
 [8] Jürgen Christ, Jochen Hoenicke, and Alexander Nutz. Smtinterpol: An interpolating SMT solver.
     In SPIN, volume 7385 of Lecture Notes in Computer Science, pages 248–254. Springer, 2012.
 [9] Jürgen Christ, Jochen Hoenicke, and Alexander Nutz. Proof tree preserving interpolation. In
     TACAS, volume 7795 of Lecture Notes in Computer Science, pages 124–138. Springer, 2013.
[10] Jürgen Christ, Jochen Hoenicke, and Alexander Nutz. Proof tree preserving interpolation. CoRR,
     abs/1705.05309, 2017. Improved and simplified version. http://arxiv.org/abs/1705.05309.
[11] Matthias Dangl, Stefan Löwe, and Philipp Wendler. Cpachecker with support for recursive pro-
     grams and floating-point arithmetic - (competition contribution). In TACAS, volume 9035 of
     Lecture Notes in Computer Science, pages 423–425. Springer, 2015.
[12] Marius Greitschus, Daniel Dietsch, Matthias Heizmann, Alexander Nutz, Claus Schätzle, Christian
     Schilling, Frank Schüssele, and Andreas Podelski. Ultimate taipan: Trace abstraction and abstract
     interpretation - (competition contribution). In TACAS (2), volume 10206 of Lecture Notes in
     Computer Science, pages 399–403, 2017.
[13] Matthias Heizmann, Yu-Wen Chen, Daniel Dietsch, Marius Greitschus, Alexander Nutz, Betim
     Musa, Claus Schätzle, Christian Schilling, Frank Schüssele, and Andreas Podelski. Ultimate au-
     tomizer with an on-demand construction of floyd-hoare automata - (competition contribution). In
     TACAS (2), volume 10206 of Lecture Notes in Computer Science, pages 394–398, 2017.
[14] Matthias Heizmann, Jochen Hoenicke, and Andreas Podelski. Nested interpolants. In POPL,
     pages 471–482. ACM, 2010.
[15] Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Grégoire Sutre. Lazy abstraction. In
     POPL, pages 58–70. ACM, 2002.
[16] John McCarthy. Towards a mathematical science of computation. In IFIP Congress, pages 21–28,
     1962.
[17] Kenneth L. McMillan. Lazy abstraction with interpolants. In CAV, volume 4144 of Lecture Notes
     in Computer Science, pages 123–136. Springer, 2006.
[18] Alexander Nutz, Daniel Dietsch, Mostafa Mahmoud Mohamed, and Andreas Podelski. ULTI-
     MATE KOJAK with memory safety checks - (competition contribution). In TACAS, volume 9035
     of Lecture Notes in Computer Science, pages 458–460. Springer, 2015.
[19] Viorica Sofronie-Stokkermans. Hierarchic reasoning in local theory extensions. In CADE, volume

                                                                                                    11
Efficient Interpolation for the Theory of Arrays                            Hoenicke and Schindler


     3632 of Lecture Notes in Computer Science, pages 219–234. Springer, 2005.
[20] Nishant Totla and Thomas Wies. Complete instantiation-based interpolation. J. Autom. Reason-
     ing, 57(1):37–65, 2016.
[21] Greta Yorsh and Madanlal Musuvathi. A combination method for generating interpolants. In
     CADE, volume 3632 of Lecture Notes in Computer Science, pages 353–368. Springer, 2005.


12