=Paper=
{{Paper
|id=Vol-2908/paper15
|storemode=property
|title=AXDInterpolator: A Tool for Computing Interpolants for Arrays with MaxDiff
|pdfUrl=https://ceur-ws.org/Vol-2908/paper15.pdf
|volume=Vol-2908
|authors=Jose Abel Castellanos Joo,Silvio Ghilardi,Alessandro Gianola,Deepak Kapur
|dblpUrl=https://dblp.org/rec/conf/smt/JooGGK21
}}
==AXDInterpolator: A Tool for Computing Interpolants for Arrays with MaxDiff==
AXDInterpolator: A Tool for Computing Interpolants for Arrays with MaxDiff Jose Abel Castellanos Joo1 , Silvio Ghilardi2 , Alessandro Gianola3 and Deepak Kapur1 1 Department of Computer Science, University of New Mexico, Albuquerque, USA 2 Dipartimento di Matematica, Università degli Studi di Milano, Milan, Italy 3 Faculty of Computer Science, Free University of Bozen-Bolzano, Bolzano, Italy Abstract Several approaches toward quantifier-free interpolation algorithms of theories involving arrays have been proposed by extending the language using a binary function skolemizing the extensionality principle. In FoSSaCS 2021, the last three authors studied the enrichment of the McCarthy’s theory of extensional arrays with a maxdiff operation. This paper discusses the implementation of the interpolation algorithm proposed in FoSSaCS 2021 using the Z3 API. The implementation allows the user to choose iZ3, Mathsat, or SMTInterpol as interpolation engines. The tool returns a formula in SMTLIB2 format, which allows compatibility with model checkers and invariant generators using such a format. We compare our algorithm with state-of-the-art interpolation engines. Our experiments using unsatisfiable formulæ extracted with the model checker UAutomizer show the feasibility of our tool. For that purpose, we used C programs from the ReachSafety-Arrays and MemSafety-Arrays tracks of SV-COMP. Keywords Interpolation, Arrays, MaxDiff, SMT 1. Introduction Since McMillan’s seminal papers [1, 2], interpolation has been successfully applied in software model checking. Even if the well-known Craig’s theorem [3] guarantees the existence of interpolants in first order logic, it does not give any information on the shape the interpolant can have when a specific theory is involved. Nevertheless, this is crucial for the applications: when we extract interpolants, we are typically handling a theory which might be undecidable, but whose quantifier-free fragment is decidable for satisfiability (usually within a somewhat ‘reasonable’ computational complexity). Thus, it is desirable (although not always possible) that the interpolant is quantifier-free, a fact which is not guaranteed in general. This is why a lot of effort has been made in analyzing quantifier-free interpolation: given two quantifier-free formulæ 𝐴 and 𝐵 such that 𝐴 ∧ 𝐵 is not satisfiable (modulo a theory 𝑇 ), a quantifier-free interpolant 𝐶 is a quantifier-free formula such that 𝑇 |= 𝐴 → 𝐶, 𝑇 |= 𝐶 ∧ 𝐵 → ⊥ and such that 𝐶 contains only the variables which occur both in 𝐴 and in 𝐵. The specific theories we consider here are variants of McCarthy’s theory of arrays [4] with extensionality. This theory is suitable to formalize programs over arrays, like standard copying, SMT 21: 19th International Workshop on Satisfiability Modulo Theories, 18–19 July 2021, online " jose.castellanosjoo@cs.unm.edu (J. A. Castellanos Joo); silvio.ghilardi@unimi.it (S. Ghilardi); gianola@inf.unibz.it (A. Gianola); kapur@cs.unm.edu (D. Kapur) © 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) 40 comparing, searching, sorting functions; verification problems of this kind are collected, e.g., in the SV-COMP benchmarks categories “ReachSafety-Arrays” and “MemSafety-Arrays”1 , where safety verification tasks involving arrays of finite but unknown length are considered. By itself, the theory of arrays with extensionality does not have quantifier free interpo- lation [5]; however, in [6] it was shown that quantifier-free interpolation is restored if one enriches the language with a binary function skolemizing the extensionality axiom. Such a Skolem function, applied to two array variables 𝑎, 𝑏, returns an index diff(𝑎, 𝑏) where 𝑎, 𝑏 differ (it returns an arbitrary value if 𝑎 is equal to 𝑏). This semantics for the diff operation is very undetermined and does not have a significant interpretation in concrete programs. That is why in [7] the diff operation is modified so as to obtain a defined and natural meaning: we introduce the theory of arrays with maxdiff 𝒜ℛ𝒟(𝑇𝐼 ), where diff(𝑎, 𝑏) returns the biggest index where 𝑎, 𝑏 differ (in case 𝑎 = 𝑏 we ask for diff(𝑎, 𝑏) to be the minimum index 0). Since it is natural to view arrays as functions defined on initial intervals of the nonnegative integers, this choice has a clear semantic motivation. The expressive power of the theory of arrays so enriched becomes bigger: for instance, if we also add to the language a constant symbol 𝜖 for the undefined array constantly equal to some ‘undefined’ value ⊥ (where ⊥ is meant to be different from the values 𝑎[𝑖] actually in use), then we can define |𝑎| as diff(𝑎, 𝜖). In this way we can model the fact that 𝑎 is undefined outside the interval [ 0, |𝑎| ] - this is useful to formalize the above mentioned SV-COMP benchmarks. The effectiveness of quantifier-free interpola- tion in the theory of arrays with maxdiff is exemplified in the Strcpy function example of Figure 1 in [7]:2 we need to express | · | to formalize its initial states without quantifiers. In [7], 𝒜ℛ𝒟(𝑇𝐼 ) is shown to admit quantifier-free interpolation, and a suitable algorithm based on instantiation à-la-Herbrand and on a hierarchical approach ([8, 9, 10]) is introduced: by exploiting iterated applications of the maxdiff operator and reducing the interpolation problem to a reduced interpolating theory, we can compute quantifier-free interpolants out of the input formulæ. In this paper, we present (in Section 3) AXDInterpolator3 , a tool that implements the interpo- lation algorithm of [7]. Following an instantiation-based approach, the algorithm always (and in particular for all the examples in this paper) returns a quantifier-free formula. A comparison is done with the state-of-the-art interpolation engines that have known interpolant generation algorithms for the theory of arrays–iZ3, MathSat, and SMTInterpol [11, 12, 13] support the theory of arrays with maxdiff. Indeed, on examples supporting expressive features like diff or the | · | operator, no proper comparison is possible. In case unsafe traces for such exam- ples are considered, in contrast to AXDInterpolator, Mathsat and iZ3 cannot usually get a quantifier-free interpolant: either the tools detect the satisfiability of the unsafe trace (since they reason modulo a theory that is weaker than 𝒜ℛ𝒟(𝑇𝐼 )) so they cannot compute an interpolant, 1 https://sv-comp.sosy-lab.org/2020/benchmarks.php 2 The Strcpy function performs 𝑁 sequential index-wise copies of values of two arrays 𝑎, 𝑏 from 𝑎 to 𝑏 where 𝑁 is the length of both arrays: the loop invariant for this examples is 𝑎 = 𝑏 ∨ (𝑁 > diff(𝑎, 𝑏) ∧ diff(𝑎, 𝑏) ≥ 𝐼), 𝐼 being the index variable in the loop. This formula is a loop invariant because if the two arrays 𝑎, 𝑏 are different while 𝐼 < 𝑁 then the conjunction 𝑁 > diff(𝑎, 𝑏) ∧ diff(𝑎, 𝑏) ≥ 𝐼 asserts that the possibly entry where 𝑎, 𝑏 differ has not been processed yet. 3 We use the prerelease version 0.9.0 of our tool: https://github.com/typesAreSpaces/AXDInterpolator/releases/ tag/0.9.0 41 or, in case the unsafe trace is correctly detected as unsat (equipping the solver with axioms for maxdiff), they fail returning errors or quantified interpolant or timeout; for Example 1 from [7], SMTInterpol fails. Mathsat and iZ3 fail also in the case of the example from [5], which is well-known not to admit a quantifier-free interpolant unless the extensionality axiom is skolem- ized with a generic diff operator: in this specific case, SMTInterpol, instead, can successfully compute a quantifier-free interpolant using the plain diff operator, but AXDInterpolator can compute an interpolant with maxdiff a bit faster. Running experiments over two tracks of the SV-COMP (ReachSafety-Arrays and MemSafety-Arrays), we use unsatisfiable formulæ extracted with the model checker UAutomizer [14]: the latter automatically transforms C programs taken from those benchmarks into suitable SMTLIB2 files that allow to verify particular properties of these programs. Since UAutomizer does not support the extended language of arrays with maxdiff, the comparison among the different solvers becomes acceptable. The results show that AXDInterpolator, iZ3, Mathsat and SMTInterpol have an overall identical behavior over these benchmarks: interpolants are computed for almost every example, apart from a small portion of timeout outcomes reached by AXDInterpolator as its implementation is evolving. On more expressive examples, such as Example 1 from [7], other solvers including the most expressive SMTInterpol fail, whereas AXDinterpolator produces useful interpolants. AXD- Interpolator uses the Z3 API [15] and allows the user to choose among different solvers in order to compute the interpolants in the reduced theory invoked by the hierarchical method: iZ3, Mathsat [12] and SMTInterpol [13]. It generates a formula in SMTLIB2 format, which guarantees compatibility with model checkers and invariant generators adopting the same format. 2. Arrays with MaxDiff The McCarthy theory of arrays [4] with extensionality has three sorts ARRAY, ELEM, INDEX (called “array”, “element”, and “index” sort, respectively) and two function symbols 𝑟𝑑 (“read”) and 𝑤𝑟 (“write”) of appropriate arities. Besides the axioms for defining 𝑟𝑑 and 𝑤𝑟, it contains the ‘extensionality’ axiom: ∀𝑥, 𝑦. 𝑥 ̸= 𝑦 → (∃𝑖. 𝑟𝑑(𝑥, 𝑖) ̸= 𝑟𝑑(𝑦, 𝑖)). In [7], the Skolem function diff for this axiom has been given a proper semantics, i.e., diff returns the biggest index where two different arrays differ. For introducing our theory of arrays with maxdiff, we first need to consider an index theory 𝑇𝐼 (see [7] for the general definition). The signature of an index theory 𝑇𝐼 contains at least the binary relation symbol ≤ and the constant 0. Examples of index theories are the theory of total orders 𝑇 𝑂, integer difference logic ℐ𝒟ℒ, integer linear arithmetic ℒℐ𝒜, and real linear arithmetics ℒℛ𝒜. For most applications, ℐ𝒟ℒ (namely the theory of integer numbers with 0, ordering, successor and predecessor) suffices as in this theory one can model counters for scanning arrays. Given an index theory 𝑇𝐼 , we now introduce our array theory with maxdiff 𝒜ℛ𝒟(𝑇𝐼 ) (parameterized by 𝑇𝐼 ) as follows. We still have the same sorts; the language includes the symbols of 𝑇𝐼 , the operations 𝑟𝑑, 𝑤𝑟, a binary function diff of type ARRAY × ARRAY → INDEX, as well as constants 𝜖 and ⊥ of sorts ARRAY and ELEM, resp. The constant ⊥ models an undetermined (e.g. undefined, not-in-use, not coming from appropriate initialization, etc.) value and 𝜀 models the totally undefined array; the term diff(𝑥, 𝑦) returns the maximum index where 𝑥 and 𝑦 42 differ and returns 0 if 𝑥 and 𝑦 are equal.4 Formally, the axioms of 𝒜ℛ𝒟(𝑇𝐼 ) include, besides the axioms of 𝑇𝐼 , the following ones: ∀𝑦, 𝑖, 𝑒. 𝑖 ≥ 0 → 𝑟𝑑(𝑤𝑟(𝑦, 𝑖, 𝑒), 𝑖) = 𝑒 (1) ∀𝑦, 𝑖, 𝑗, 𝑒. 𝑖 ̸= 𝑗 → 𝑟𝑑(𝑤𝑟(𝑦, 𝑖, 𝑒), 𝑗) = 𝑟𝑑(𝑦, 𝑗) (2) ∀𝑥, 𝑦. 𝑥 ̸= 𝑦 → 𝑟𝑑(𝑥, diff(𝑥, 𝑦)) ̸= 𝑟𝑑(𝑦, diff(𝑥, 𝑦)) (3) ∀𝑥, 𝑦, 𝑖. 𝑖 > diff(𝑥, 𝑦) → 𝑟𝑑(𝑥, 𝑖) = 𝑟𝑑(𝑦, 𝑖) (4) ∀𝑥. diff(𝑥, 𝑥) = 0 (5) ∀𝑥.𝑖 𝑖 < 0 → 𝑟𝑑(𝑥, 𝑖) = ⊥ (6) ∀𝑖. 𝑟𝑑(𝜀, 𝑖) = ⊥ (7) As an effect of the above axioms, we have that an array 𝑥 is undefined outside the interval [0, |𝑥|], where |𝑥| is defined as |𝑥| := diff(𝑥, 𝜀). Typically, this interval is finite, and it can be shown that any satisfiable constraint is satisfiable in a model where all such intervals (involving variables appearing in the constraint) are finite [7]. For the interpolation algorithm from [7], we need to introduce iterated diff operations, similarly to [10]. As we know, diff(𝑎, 𝑏) returns the biggest index where 𝑎 and 𝑏 differ (it returns 0 if 𝑎 = 𝑏). Now we want an operator that returns the last-but-one index where 𝑎, 𝑏 differ (0 if 𝑎, 𝑏 differ in at most one index), etc. Our language is already enough expressive for that, so we can introduce such operators explicitly as follows. Given array variables 𝑎, 𝑏, we define by mutual recursion the sequence of array terms 𝑏1 , 𝑏2 , . . . and of index terms diff1 (𝑎, 𝑏), diff2 (𝑎, 𝑏), . . . : 𝑏1 := 𝑏; diff1 (𝑎, 𝑏) := diff(𝑎, 𝑏1 ); 𝑏𝑘+1 := 𝑤𝑟(𝑏𝑘 , diff𝑘 (𝑎, 𝑏), 𝑟𝑑(𝑎, diff𝑘 (𝑎, 𝑏))); diff𝑘+1 (𝑎, 𝑏) := diff(𝑎, 𝑏𝑘+1 ) Lemma 2.1. A formula like diff1 (𝑎, 𝑏) = 𝑘1 ∧ · · · · · · ∧ diff𝑙 (𝑎, 𝑏) = 𝑘𝑙 (8) is equivalent modulo 𝒜ℛ𝒟 to the conjunction of the following five formulæ: 𝑘1 ≥ 𝑘2 ∧ · · · ∧ 𝑘𝑙−1 ≥ 𝑘𝑙 ∧ 𝑘𝑙 ≥ 0 (9) (10) ⋀︀ 𝑗<𝑙 (𝑘𝑗 > 𝑘𝑗+1 → 𝑟𝑑(𝑎, 𝑘𝑗 ) ̸= 𝑟𝑑(𝑏, 𝑘𝑗 )) (11) ⋀︀ 𝑗<𝑙 (𝑘𝑗 = 𝑘𝑗+1 → 𝑘𝑗 = 0) (12) ⋀︀ 𝑗≤𝑙 (𝑟𝑑(𝑎, 𝑘𝑗 ) = 𝑟𝑑(𝑏, 𝑘𝑗 ) → 𝑘𝑗 = 0) ∀ℎ (ℎ > 𝑘𝑙 → 𝑟𝑑(𝑎, ℎ) = 𝑟𝑑(𝑏, ℎ) ∨ ℎ = 𝑘1 ∨ · · · ∨ ℎ = 𝑘𝑙−1 ) (13) Separated Pairs, 𝑀 -Instantiation and Satisfiability. The key step of the interpolation algorithm from [7] depends upon the problem of checking satisfiability (modulo 𝒜ℛ𝒟(𝑇𝐼 )) of quantifier-free formulæ; this is solved by adapting instantiation techniques, like those from [16]. 4 Notice that it might well be the case that diff(𝑥, 𝑦) = 0 for different 𝑥, 𝑦, but in that case 0 is the only index where 𝑥, 𝑦 differ. 43 The complexity 𝑐(𝑡) of a term 𝑡 is defined as the number of function symbols occurring in 𝑡. If ℐ is a set of 𝑇𝐼 -terms, an ℐ-instance of a universal formula of the kind ∀𝑖 𝜑 is a formula of the kind 𝜑(𝑡/𝑖) for some 𝑡 ∈ ℐ. A pair of sets of quantifier-free formulæ Φ = (Φ1 , Φ2 ) is a separated pair iff (1) Φ1 contains equalities of the form diff𝑘 (𝑎, 𝑏) = 𝑖 and 𝑎 = 𝑤𝑟(𝑏, 𝑖, 𝑒); moreover if it contains the equality diff𝑘 (𝑎, 𝑏) = 𝑖, it must also contain an equality of the form diff𝑙 (𝑎, 𝑏) = 𝑗 for every 𝑙 < 𝑘; (2) Φ2 contains Boolean combinations of 𝑇𝐼 -atoms and of atoms of the forms: {𝑟𝑑(𝑎, 𝑖) = 𝑟𝑑(𝑏, 𝑗), 𝑟𝑑(𝑎, 𝑖) = 𝑒, 𝑒1 = 𝑒2 }, where 𝑎, 𝑏, 𝑖, 𝑗, 𝑒, 𝑒1 , 𝑒2 are variables or constants of the appropriate sorts. The separated pair is said to be finite iff Φ1 and Φ2 are both finite. In practice, in a separated pair Φ = (Φ1 , Φ2 ), reading 𝑟𝑑(𝑎, 𝑖) as a functional application, it turns out that the formulæ from Φ2 can be translated into quantifier-free formulæ of the combined theory 𝑇𝐼 ∪ ℰ𝒰ℱ (the array variables occurring in Φ2 are converted into free unary function symbols). 𝑇𝐼 ∪ ℰ𝒰ℱ enjoys the decidability of the quantifier-free fragment and has quantifier-free interpolation because 𝑇𝐼 is an index theory [7]: we adopt a hierarchical approach (similar to [8, 9]) and we rely on satisfiability and interpolation algorithms for such a theory as black boxes. Let ℐ be a set of 𝑇𝐼 -terms and let Φ = (Φ1 , Φ2 ) be a separated pair; we let Φ(ℐ) = (Φ1 (ℐ), Φ2 (ℐ)) be the smallest separated pair satisfying the following conditions: - Φ1 (ℐ) is equal to Φ1 and Φ2 (ℐ) contains Φ2 ; - Φ2 (ℐ) contains all ℐ-instances of the two formulæ ∀𝑖 𝑟𝑑(𝜀, 𝑖) = ⊥, ∀𝑖 (𝑖 < 0 → 𝑟𝑑(𝑎, 𝑖) = ⊥), where 𝑎 is any array variable occurring in Φ1 or Φ2 ; - if Φ1 contains 𝑎 = 𝑤𝑟(𝑏, 𝑖, 𝑒), then Φ2 (ℐ) contains all the ℐ-instances of the equivalent formula (𝑖 ≥ 0 → 𝑟𝑑(𝑎, 𝑖) = 𝑒) ∧⋀︀∀ℎ (ℎ ̸= 𝑖 → 𝑟𝑑(𝑎, ℎ) = 𝑟𝑑(𝑏, ℎ)); - if Φ1 contains the conjunction 𝑙𝑖=1 diff𝑖 (𝑎, 𝑏) = 𝑘𝑖 , then Φ2 (ℐ) contains the for- mulæ (9), (10), (11), (12) as well as all ℐ-instances of the formula (13). For 𝑀 ∈ N ∪ {∞}, the 𝑀 -instantiation of Φ = (Φ1 , Φ2 ) is the separated pair Φ(ℐΦ𝑀 ) = (Φ1 (ℐΦ𝑀 ), Φ2 (ℐΦ𝑀 )), where ℐΦ𝑀 is the set of 𝑇𝐼 -terms of complexity at most 𝑀 built up from the index variables occurring in Φ1 , Φ2 . The full instantiation of Φ = (Φ1 , Φ2 ) is the separated pair Φ(ℐΦ∞ ) = (Φ1 (ℐΦ∞ ), Φ2 (ℐΦ∞ )) (which is usually not finite). A separated pair Φ ⋀︀= (Φ1⋀︀, Φ2 ) is 𝑀 -instantiated iff Φ = Φ(ℐΦ𝑀 ); it is 𝒜ℛ𝒟(𝑇𝐼 )-satisfiable iff so it is the formula Φ1 ∧ Φ2 . The 𝑆𝑀 𝑇 (𝒜ℛ𝒟(𝑇𝐼 )) problem is decidable for every index theory 𝑇𝐼 (and can be reduced to 𝑇𝐼 ∪ ℰ𝒰ℱ-satisfiability) thanks to the following results: Lemma 2.2. Given a quantifier-free formula 𝜑, we can compute finitely many finite separation pairs Φ1 = (Φ11 , Φ12 ), . . . , Φ𝑛 = (Φ𝑛1 , Φ𝑛2 ) such that 𝜑 is 𝒜ℛ𝒟(𝑇𝐼 )-satisfiable iff so is one of the Φ𝑖 . Lemma 2.3. The following conditions are equivalent for a finite separation pair Φ = (Φ1 , Φ2 ): (i) Φ is 𝒜ℛ𝒟(𝑇𝐼 )-satisfiable; (ii) Φ2 (ℐΦ0 ) is 𝑇𝐼 ∪ ℰ𝒰ℱ-satisfiable. ⋀︀ 2.1. The interpolation algorithm The interpolation algorithm from [7] is based on instantiation à-la-Herbrand. Our problem is to compute a quantifier-free interpolant in 𝒜ℛ𝒟(𝑇𝐼 ) of two quantifier-free formulæ 𝐴 and 𝐵. 44 We call the variables occurring in both 𝐴 and 𝐵 common variables, whereas the variables occurring in 𝐴 (resp. in 𝐵) are called 𝐴-variables (resp. 𝐵-variables). The same terminology applies to terms, atoms and formulæ. As shown ⋀︀in [7], ⋀︀ both 𝐴 and 𝐵 can be given ⋀︀ in the ⋀︀ form of finite separated pairs. Thus, 𝐴 is of the form 𝐴1 ∧ 𝐴2 , 𝐵 is of the form 𝐵1 ∧ 𝐵2 , for separated pairs (𝐴1 , 𝐴2 ) and (𝐵1 , 𝐵2 ). Also, 𝐴 and 𝐵 can be assumed to be both 0-instantiated. The formulæ from 𝐴2 and 𝐵2 are formulæ from the signature of 𝑇𝐼 ∪ ℰ𝒰 ℱ (after rewriting terms of the kind 𝑟𝑑(𝑎, 𝑖) to 𝑓𝑎 (𝑖), where the 𝑓𝑎 are free function symbols). Of course, if 𝐴2 ∧ 𝐵2 is 𝑇𝐼 ∪ ℰ𝒰ℱ-inconsistent, we can get our quantifier-free interpolant by using our black box algorithm for interpolation in the weaker theory 𝑇𝐼 ∪ ℰ𝒰ℱ. Formally, we apply the Loop below until 𝐴2 ∧ 𝐵2 becomes 𝑇𝐼 ∪ ℰ𝒰 ℱ-inconsistent: Theorem 4 in [7] guarantees that, if 𝐴 ∧ 𝐵 is 𝒜ℛ𝒟(𝑇𝐼 )-inconsistent, then the Loop always terminates. When 𝐴2 ∧ 𝐵2 becomes inconsistent, we can get our interpolant via the interpolation algorithm for 𝑇𝐼 ∪ ℰ𝒰ℱ. We need a counter 𝑀 recording how many times the Loop has been executed. Loop (to be repeated until 𝐴2 ∧ 𝐵2 becomes inconsistent modulo 𝑇𝐼 ∪ ℰ𝒰ℱ, initially 𝑀 = 0). Pick two distinct common ARRAY-variables 𝑐1 , 𝑐2 and 𝑛 ≥ 1 and such that no conjunct of the kind diff𝑛 (𝑐1 , 𝑐2 ) = 𝑘 occurs in both 𝐴1 and 𝐵1 for some 𝑛 ≥ 1 (but such that for every 𝑙 < 𝑛 there is a conjunct of the form diff𝑙 (𝑎, 𝑏) = 𝑘 occurring in both 𝐴1 and 𝐵1 ). Pick also a fresh INDEX constant 𝑘𝑛 ; conjoin diff𝑛 (𝑐1 , 𝑐2 ) = 𝑘𝑛 to both 𝐴1 and 𝐵1 ; then 𝑀 -instantiate both 𝐴 and 𝐵. Increase 𝑀 to 𝑀 + 1. Notice that the fresh index constants 𝑘𝑛 introduced during the Loop are considered common constants and so they are considered in the 𝑀 -instantiation of both 𝐴 and 𝐵. In the output, these constants 𝑘𝑛 need to be eliminated. 3. Implementation Our implementation is written in C++ using the Z3 C++ API [15]. It takes as input: - A SMTLIB2 file using standard elements of the language. - An index theory 𝑇𝐼 [7]. Currently, we support quantifier free fragments of total order theory (QF_TO), integer difference (QF_IDL), and linear integer arithmetic (QF_LIA). - A positive integer indicating a maximum number of allowed iterations of the main loop. 5 - An integer number denoting the SMT solver to use to compute the interpolant in the case the input is unsatisfiable in the reduced theory 𝑇𝐼 ∪ ℰ𝒰 ℱ. Currently, we support the SMT solvers Z3, Mathsat, and SMTInterpol. Our extended language is parameterized by the array sorts in the input formula as well as by an index theory. The domain sort of every array is currently implemented using the Int sort 6 . (declare-sort A) (declare-fun diff’A’ ((Array Int A) (Array Int A)) Int) (declare-fun length’A’ ((Array Int A)) Int) (declare-fun empty_array’A’ () (Array Int A) (declare-fun undefined’A’ () A) 5 Index theories stronger than QF_TO require such upper bound for termination purposes in reasonable time. 6 The requirement of including additional information on the name of sorted-functions in our extended language is necessary since Mathsat does not implement ad hoc polymorphism. 45 The rest of the SMTLIB2 file should specify two assertions. The first one encodes the A-part and the second one the B-part of the interpolation problem. In the following we describe the algorithm in [7] by phases as it was implemented in our tool. Preprocessing. This step takes two main parts. First, our tool scans the declaration section in the input file to internalize the signature involving the sorts of the array elements present in the input file. Additionally, we instantiate new declarations of uninterpreted sorts ArraySortA which abstract the array sorts present in the input file, as well as two new functions: (declare-fun wr ((ArraySortA Int A)) ArraySortA) (declare-fun rd (ArraySortA Int) A) with a proper uninterpreted sort. The reason for this is to internalize an uninterpreted language in the algorithm in order to perform theory reduction from 𝒜ℛ𝒟(𝑇𝐼 ) to 𝑇𝐼 ∪ ℰ𝒰ℱ. The second part involves processing the input formula. The algorithm in [7] handles a conjunctive fragment situation. Thus, the algorithm uses Z3 tactics to perform a DNF transformation and produce disjuncts of conjunctions. Every disjunct is processed by the AXDInterpolator C++ class in our implementation which can handle conjunctive formulæ. Each conjunctive formula aforementioned is normalized (we replace length’A’(x) subfor- mulæ for diff’A’(x, empty-array’A’)) and flattened by introducing constants of the right sort. During the execution of the aforementioned methods, we keep track of the 𝐼𝑛𝑡 constants as well as array constants appearing in the input formula. Index constants are used to instantiate quantified formulæ of the respective 𝒜ℛ𝒟(𝑇𝐼 ) theory in use while array constants are necessary to compute common pairs of arrays between the A-part and the B-part of the interpolation problem in the main loop of the algorithm. Standard Pairs. Separated pairs from Section 2 are called in our implementation Standard Pairs, and their requirements are encoded as a C++-class called StandardInput. There, atoms {𝑟𝑑(𝑎, 𝑏) = 𝑟𝑑(𝑏, 𝑗), 𝑟𝑑(𝑎, 𝑖) = 𝑒, 𝑒1 = 𝑒2 } are such that 𝑎, 𝑏 are of sort ArraySort’A’, 𝑖, 𝑗 are of sort Int, 𝑒 is of uninterpreted sort A, and 𝑒1 , 𝑒2 are of appropriate sorts. Given a formula, this class keeps two separate z3::expr_vector denoting the conjunction of formulæ on the first and second component of the standard pair. We use one instantiation of the StandardInput class for both formulæ A-part and B-part. Additionally, this class implements methods handling the instantiations of formulæ. In- stantiation is achieved by including an undefined z3::expr which is replaced using the z3.substitute(z3::expr_vector const &, z3::expr_vector const&) method. Main Loop of the Algorithm and Interpolation Generation. First, the algorithm identifies the set of common array constants between A-part and the B-part. If this set is empty or contains a single element, the algorithm proceeds to instantiate an z3::solver and check the satisfiability of the second component of the Standard Pairs of the A-part and B-part. Otherwise, it uses an iterator to select common pair of array constants which selects elements in a fair way 7 . Similarly to the previous case, it checks the satisfiability of the second component 7 A selection over 𝑛 objects can be seen as a function 𝑓 : {1, . . . , 𝑛} → N. If 𝑖 ∈ {1, . . . , 𝑛}, 𝑓 (𝑖) denotes the 46 of the Standard Pairs. If the state of the solver is unsatisfiable the loop terminates saving the formulæ of the Standard Pairs of the A-part and B-part. Otherwise, it selects the common pair (𝑐1 , 𝑐2 ) of array constants and ‘propagates’ this pair to each of the Standard Pairs of the A-part and B-part. The propagation consists of adding the equation of the form diff𝑛 (𝑐1 , 𝑐2 ) = 𝑘 to the first component of the Standard Pairs where 𝑛 is the smallest integer number such that diff𝑛 (𝑐1 , 𝑐2 ) does not occur in the Stand Pair of both A-part and B-part and 𝑘 is a fresh Int constant. Subsequently, both Standard Pairs are 𝑁 − 𝑖𝑛𝑠𝑡𝑎𝑛𝑡𝑖𝑎𝑡𝑒𝑑. If the index theory in use extends QF_TO, then the implementation uses heuristics to increase the value of an internal number keeping track of the maximum allowed complexity on the index terms. We noticed the implementation performs well if only 0 − 𝑖𝑛𝑠𝑡𝑎𝑛𝑡𝑖𝑎𝑡𝑖𝑜𝑛𝑠 are executed. Better heuristics are under investigation. If in the main loop the solver eventually detects that the second component of the Standard Pairs is unsatisfiable, then the implementation saves the formulæ of the Standard Pairs of the A-part and B-part respectively. Depending on the choice of the user, the implementation executes a command on an external shell calling the SMT-solver to compute the interpolant using the pair of unsatisfiable formulæ mentioned before. For the latter, the implementation ‘compiles’ a temporary SMTLIB2 file for the respective SMT-solver chosen. Reading the interpolant result from the previous process the algorithm removes added constants in order to provide a solution using elements from the input formula. Since no SMT-solver supports interpolation algorithms for the QF_TO we select the QF_IDL as the theory for the interpolation engine and use a custom rewriter implemented on Z3 in order to relax a QF_IDL formula as a QF_TO statement. The latter is important for us in order to remain closed in the respective language. Correctness of the Output. If the interpolant is successfully computed, our implementation allows the user to automatically verify the correctness of the output, that is the output is a real interpolant in 𝒜ℛ𝒟(𝑇𝐼 ): this is done by exploiting the first-order axioms of 𝒜ℛ𝒟(𝑇𝐼 ) and by calling a z3::solver instance using them. For the latter the implementation ‘lifts’ the result by replacing applications of diff𝑘 to their recursive definition. This is achieved using a data structure that memorizes these intermediate formulæ. 4. Evaluation 4.1. Some interesting examples We present in detail an example of two formulæ 𝐴, 𝐵 such that our implementation can compute a quantifier-free interpolant for them (in 0.066 sec), whereas the other solvers cannot, since: (i) modulo the standard theory of arrays, they return that 𝐴 ∧ 𝐵 is sat; (ii) when enriched with the quantified axioms of 𝒜ℛ𝒟(𝑇𝐼 ), iZ3 cannot return an answer within 15 minutes, SMTInterpol correctly detects that 𝐴 ∧ 𝐵 is unsat but cannot return an interpolant and Mathsat returns error since it does not support quantifiers. The example is taken from [7] (Example 1 ): number of times the object 𝑖 has been selected. A selection over 𝑛 objects is fair if for all 𝑖, 𝑗 in {1, . . . , 𝑛} we have that |𝑓 (𝑖) − 𝑓 (𝑗)| ≤ 1. 47 - A-part: diff(𝑎, 𝑐1 ) = 𝑖1 ∧ diff(𝑏, 𝑐2 ) = 𝑖1 ∧ 𝑎 = 𝑤𝑟(𝑎1 , 𝑖3 , 𝑒3 ) ∧ 𝑎1 = 𝑤𝑟(𝑏, 𝑖1 , 𝑒1 ) - B-part: 𝑖1 < 𝑖2 ∧ 𝑖2 < 𝑖3 ∧ 𝑟𝑑(𝑐1 , 𝑖2 ) ̸= 𝑟𝑑(𝑐2 , 𝑖2 ) Our implementation returns the following as an interpolant (and verifies its correctness): - Interpolant: let 𝑎!1 = ¬(𝑟𝑑(𝑐2 , diff(𝑐2 , 𝑐1 )) = 𝑟𝑑(𝑐1 , diff(𝑐2 , 𝑐1 ))) be in let 𝑎!2 = ¬(diff(𝑐2 , 𝑐1 ) ≤ 𝑖1 ) ∧ 𝑎!1 ∧ ¬(𝑖3 ≤ diff(𝑐2 , 𝑐1 )) be in diff2 (𝑐2 , 𝑐1 ) ≤ 𝑖1 ∧ ¬𝑎!2 If the previous file is given the iZ3, SMTInterpol and Mathsat, then the SMT-solvers output sat: clearly, the axioms for maxdiff are missing and 𝐴 ∧ 𝐵 is satisfiable in the standard theory of arrays. Only AXDInterpolator is able to compute a quantifier-free interpolant. Consider this example from [5]: 𝐴 := 𝑥 = 𝑤𝑟(𝑦, 𝑖, 𝑒) and 𝐵 := 𝑟𝑑(𝑥, 𝑗) ̸= 𝑟𝑑(𝑦, 𝑗) ∧ 𝑟𝑑(𝑥, 𝑘) ̸= 𝑟𝑑(𝑦, 𝑘) ∧ 𝑗 ̸= 𝑘. When adding the axioms for maxdiff, Mathsat and iZ3 are not able to find an interpolant, whereas both SMTInterpol and AXDInterpolator get a quantifier-free interpolant: AXDInterpolator is a bit faster than SMTInterpol (0.062 vs 0.176 sec), and the output of the latter uses the plain diff with the undetermined semantics. The Strcpy function from [7] gives the same results, and AXDInterpolator is still a bit faster than SMTInterpol (0.057 vs 0.205 sec). 4.2. Benchmarks using SV-COMP and UAutomizer We also tested our implementation on two benchmarks. We performed our experiments on a machine with Linux 5.11.16, Intel i7-9700 4.7 GHz with 32 GB of memory. We used the model checker UAutomizer [14] to extract their SMT Scripts which are SMTLIB2 files containing the queries produced in order to verify some particular property of C programs. In our case, we chose C-programs from the ReachSafety-Arrays and MemSafety-Arrays tracks of the SV-COMP [17]. We let the machine produce SMT Scripts for 15 minutes. We used these SMT Scripts files to compare the number of interpolants computed from unsatisfiable formulæ. For the latter we assigned each process up to 360 seconds and 6 GB of memory 8 . All the examples do not include the expressive features of 𝒜ℛ𝒟(𝑇𝐼 ), since they have been automatically produced by UAutomizer: this allows us to run these examples on the other solvers iZ3, Mathsat and SMTInterpol. We defined two outcomes: Success, and Timeout. Success is declared when an interpolant is computed, and Timeout when the process takes more time or space than the previously specified. No file failed (i.e., terminated without returning an interpolant). Our tool was configured to run with QF_LIA and 1000 allowed loop repetitions with the heuristic of only 0-instantiations. The solvers were configured to use the QF_AUFLIA theory. The results are shown in Tables 1,2,3 and 4. The implementation is available at the following link: https://github.com/typesAreSpaces/AXDInterpolator. The experimental results can be obtained following the instructions in the documentation. 8 We were able to use 1008 unsatisfiable formulæ from the ReachSafety-Arrays track; 110 queries contained array constants or function applications of select/store. From the MemSafety-Arrays track, we were able to obtain 758 unsatisfiable formulæ; 750 contained array related elements. We noticed the input formulæ do not contain applications of any diff operator as expected since the proposed extension of 𝒜ℛ𝒟(𝑇𝐼 ) is relatively new. 48 AXD Interpolator Subtracks iZ3 Mathsat SMTInterpol Success Timeout Success Timeout Success Timeout array-examples 584 1 584 1 584 1 array-memsafety 118 0 118 0 118 0 termination-crafted 52 3 52 3 52 3 Table 1 Memsafety-track results - Our implementation Subtracks iZ3 Mathsat SMTInterpol Success Timeout Success Timeout Success Timeout array-examples 585 0 585 0 585 0 array-memsafety 118 0 118 0 118 0 termination-crafted 55 0 55 0 55 0 Table 2 Memsafety-track results - Other Solvers AXD Interpolator Subtracks iZ3 Mathsat SMTInterpol Success Timeout Success Timeout Success Timeout array-cav19 31 0 31 0 31 0 array-examples 50 0 50 0 50 0 array-fpi 774 21 774 21 774 21 array-industry-pattern 8 0 8 0 8 0 array-lopstr16 54 0 54 0 54 0 array-patterns 11 0 11 0 11 0 array-tiling 6 0 6 0 6 0 reducercommutativity 53 0 53 0 53 0 Table 3 Reachsafety-track results - Our implementation Subtracks iZ3 Mathsat SMTInterpol Success Timeout Success Timeout Success Timeout array-cav19 31 0 31 0 31 0 array-examples 50 0 50 0 50 0 array-fpi 795 0 795 0 795 0 array-industry-pattern 8 0 8 0 8 0 array-lopstr16 54 0 54 0 54 0 array-patterns 11 0 11 0 11 0 array-tiling 6 0 6 0 6 0 reducercommutativity 53 0 53 0 53 0 Table 4 Reachsafety-track results - Other Solvers 5. Related Work In [11] McMillan introduced an interpolating proof calculus to extract interpolants using refutational proofs obtained from the Z3 SMT-solver. His approach benefits from the flexibility of Z3 to handle a variety of theories and their combination: it relies on a secondary interpolation solver in order to ‘fill the gaps’ of refutational proofs introduced by theory lemmas, which are formulæ derived by the satellities theories encoded in Z3 without a detailed explanation in the proof. The secondary interpolation solver only requires an interpolation algorithm for QF_UFLIA. Since the theory of arrays is introduced using quantified formulæ, this approach generates quantified formulæ. The authors in [18] employed the proof tree preserving interpolation scheme from [19] to 49 compute interpolants using a resolution proof. This approach is capable of handling mixed literals but not mixed terms9 , so the authors introduced the idea of weakly equivalences between arrays to handle those cases for their interpolation calculi. The authors include the diff operation between arrays in order to compute quantifier-free interpolants, but no additionally semantical properties are given to the diff operation. The authors in [10] proposed a general framework for computing interpolants via reduction using finite instantiations. In particular, the paper discusses complete interpolation procedures for theories of arrays and linked lists, as well as other heap-allocated data structures. 6. Conclusions In this paper we presented AXDInterpolator, the implementation of the interpolation algorithm from [7]. We showed the feasibility of AXDInterpolator by validating it on two benchmarks from the SV-COMP. We also compared our implementation with state-of-the-art solvers: apart from very few timeout outcomes, our tool handled all the examples the other solvers did. We also managed to handle interesting examples that are not handled by the other solvers, which makes the option of our tool appealing. Specifically, expressive examples that make use of features like | · | cannot in general be managed by any existing solver: this suggests that AXDInterpolator could be applied for solving more sophisticated examples coming from concrete C programs using arrays. This analysis is a significant point to investigate in future work. We noticed that our implementation, in case of the SV-COMP benchmarks, is behind in time with respect to the state-of-the-art solvers. We think the main reasons might be the following: - The UAutomizer model checker does not include our extended signature in their SMT Scripts. We might take into consideration that many industrial problems will not explicitly use this language, however the overall structure of many array programs commonly use | · | for various algorithms. Thus, we need to integrate this structure in the input formula. - Our implementation timeouts slightly more often than other solvers. This point can be further improved by doing better engineering work at the implementation level. The DNF conversion (in worst case, exponential) appears to be the first target to rework. - The current design does not perform incremental satisfiability checks. As future work we plan to perform incremental checks since this can improve the performance of the used z3::solver data structure. Incremental checks can be implemented due to the incremental nature of the proposed interpolation algorithm by including a hash consed data structure on the terms/predicates produced in the main loop of the algorithm and because the data structure z3::solver can keep track of previously proven assertions. References [1] K. L. McMillan, Interpolation and SAT-based model checking, in: Proc. of CAV, volume 2725 of LNCS, Springer, 2003, pp. 1–13. doi:10.1007/978-3-540-45069-6\_1. 9 Given an interpolation pair (A, B), a symbol 𝑠 is A-local if 𝑠 appears in the symbols of 𝐴 but not in the symbols of 𝐵. A literal/term is mixed if it contains A-local and B-local symbols 50 [2] K. L. McMillan, Lazy abstraction with interpolants, in: Proc. of CAV, volume 4144 of LNCS, Springer, 2006, pp. 123–136. doi:10.1007/11817963\_14. [3] W. Craig, Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory, J. Symbolic Logic 22 (1957) 269–285. [4] J. Mccarthy, Towards a mathematical science of computation, in: In IFIP Congress, North-Holland, 1962, pp. 21–28. [5] D. Kapur, R. Majumdar, C. G. Zarba, Interpolation for Data Structures, in: Proc. of SIGSOFT-FSE, ACM, 2006, pp. 105–116. [6] R. Bruttomesso, S. Ghilardi, S. Ranise, Quantifier-free interpolation of a theory of arrays, Log. Methods Comput. Sci. 8 (2012). [7] S. Ghilardi, A. Gianola, D. Kapur, Interpolation and amalgamation for arrays with maxdiff, in: S. Kiefer, C. Tasson (Eds.), Foundations of Software Science and Computation Structures - 24th International Conference, FOSSACS 2021, Held as Part of the European Joint Confer- ences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 27 - April 1, 2021, Proceedings, volume 12650 of Lecture Notes in Computer Science, Springer, 2021, pp. 268–288. [8] V. Sofronie-Stokkermans, Interpolation in local theory extensions, Log. Methods Comput. Sci. 4 (2008). [9] V. Sofronie-Stokkermans, On interpolation and symbol elimination in theory extensions, Log. Methods Comput. Sci. 14 (2018). [10] N. Totla, T. Wies, Complete instantiation-based interpolation, J. Autom. Reasoning 57 (2016) 37–65. [11] K. L. McMillan, Interpolants from Z3 proofs, in: Proc. of FMCAD, 2011, pp. 19–27. [12] M. Bozzano, R. Bruttomesso, A. Cimatti, T. Junttila, P. van Rossum, S. Schulz, R. Sebastiani, MathSAT: Tight integration of SAT and mathematical decision procedures, Journal of Automated Reasoning (2005). [13] J. Christ, J. Hoenicke, A. Nutz, SMTInterpol: An interpolating SMT solver, in: A. F. Donaldson, D. Parker (Eds.), Model Checking Software - 19th International Workshop, SPIN 2012, Oxford, UK, July 23-24, 2012. Proceedings, volume 7385 of Lecture Notes in Computer Science, Springer, 2012, pp. 248–254. [14] M. Heizmann, J. Christ, D. Dietsch, E. Ermis, J. Hoenicke, M. Lindenmann, A. Nutz, C. Schilling, A. Podelski, Ultimate Automizer with SMTInterpol, in: N. Piterman, S. A. Smolka (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 641–643. [15] L. De Moura, N. Bjørner, Z3: An Efficient SMT Solver, in: Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS’08/ETAPS’08, Springer-Verlag, Berlin, Heidelberg, 2008, pp. 337–340. [16] A. R. Bradley, Z. Manna, H. B. Sipma, What’s decidable about arrays?, in: Proc. of VMCAI, volume 3855 of LNCS, Springer, 2006, pp. 427–442. doi:10.1007/11609773\_28. [17] J. F. Groote, K. G. Larsen, D. Beyer, Software verification: 10th comparative evaluation (sv-comp 2021), Tools and Algorithms for the Construction and Analysis of Systems27th International Conference, TACAS 2021, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021, Luxembourg City, Luxembourg, March 51 27 – April 1, 2021, Proceedings, Part II 12652 (2021) 401—422. URL: https://europepmc.org/ articles/PMC7984550. [18] J. Hoenicke, T. Schindler, Efficient interpolation for the theory of arrays, in: Proc. of IJCAR, volume 10900 of LNCS (LNAI), Springer, 2018, pp. 549–565. doi:10.1007/ 978-3-319-94205-6\_36. [19] J. Christ, J. Hoenicke, A. Nutz, Proof tree preserving interpolation, in: N. Piterman, S. A. Smolka (Eds.), Tools and Algorithms for the Construction and Analysis of Systems, Springer Berlin Heidelberg, Berlin, Heidelberg, 2013, pp. 124–138. 52