Introduction

Reasoning About Set Comprehensions

Edmund S. L. Lam

sllam@qatar.cmu.edu 0

Iliano Cervesato

iliano@cmu.edu 0 0 Carnegie Mellon University

Set comprehension is a mathematical notation for de ning sets on the basis of a property of their members. Although set comprehension is widely used in mathematics and some programming languages, direct support for reasoning about it is still not readily available in state-of-the-art SMT solvers. This paper presents a technique for translating formulas which express constraints involving set comprehensions into rst-order formulas that can be veri ed by o -the-shelf SMT solvers. More speci cally, we have developed a lightweight Python library that extends the popular Z3 SMT solver with the ability to reason about the satis ability of set comprehension patterns. This technique is general and can be deployed in a broad range of SMT solvers.

Introduction In particular, the model S carries information on an acceptable value of the set variable X |

in this example, X = f1; 2; 3g is the smallest solution, with X = f1; 2; 3; 4g being another one.

Our goal is to reduce satis ability in SC (Th) to satis ability in Th extended with an uninterpreted sort for sets and an uninterpreted binary predicate 2_ , which encodes set membership.

To achieve this, we translate a set formula of interest S in SC (Th) into a formula VSW of this

resulting theory, which we denote by U +Th (Th with U ninterpreted symbols). Speci cally, each appearance of a set comprehension or of a standard set operation in S, is mapped to a fresh variable together with formulas in the language of U +Th. These formulas ensure that this variable is associated with the exact elements of the original set structure. Then, we verify that S is satis able in SC (Th) by checking that VSW has a model M in U +Th, i.e., that M j=U +Th VSW. For a wide range of formulas, this veri cation step can be carried out using a state-of-the-art high-performance SMT solver such as Z3 [ 1 ]. Finally, we lift M to a model S of S, such that S j=SC (Th) S. Even though this operation is still undecidable in general [ 2 ], it is nonetheless decidable for many instances of SC (LIA) formulas and can be implemented as an e ective satis ability test operation. For instance, our work in [ 3 ] uses a conservative implementation of this test operation to determine satis ability of nite set comprehension formulas that would guarantee safety of certain compiler optimizations.

To illustrate this technique, here is the (slightly simpli ed) encoding VSW where S is the

formula from our earlier example:

VSW = >88z: z 2_ X2 $ z 2_ X3 < : : :

8y: y 2_ X2 $ (y = 10 _ y = 20 _ y = 30) >:8x: (x 10 2_ X3) $ (x 2_ X ^ x < 4) F1 : X2 = X3 F2 : X2 = f10; 20; 30g F3 : X3 = fx 10 | x < 4gx2_ X The rst formula, F1, states that X2 and X3 are extensionally equal. The second formula, F2, constrains the members of X2 to be the integers 10, 20 or 30. Hence X2 is a precise representation of the extensional set f10; 20; 30g in U +LIA. The third formula, F3, restricts X3 so that, for each element x in X that is larger than 4, x 10 must be a member of X3. Hence F3 constrains the behavior of X3 to that of fx 10 | x < 4gx2_ X in U +LIA.

The rest of the paper is organized as follows: Section 2 introduces our source and target languages, SC (LIA) and U +LIA. Section 3 de nes the encoding of SC (LIA) formulas into U +LIA formulas. Section 4 presents a conservative satis ability test operation for SC (LIA) formulas. Section 5 describes our prototype implementation of this operation, a lightweight Python library that extends Z3. Section 6 discusses our conclusions and future work. 2

Notations, Term Languages and Models

In this section, we introduce meta-notation that we will use throughout this paper and we de ne the source and target term languages SC (LIA) and U +LIA.

In discussions, we write o to denote nite sets of syntactic objects o, with ; for the empty set. We write fo; og for the extension of set o with syntactic object o, omitting the brackets when no ambiguity arises. Meta-level set membership is denoted as o 2 o. We write meta-level set comprehension as fo : o j (o)g, which represents the set containing all objects o from o that satisfy the condition (o). We write [o0=x]o for the simultaneous replacement within object o of all occurrences of variable x with o0. When traversing binding constructs and quanti ers, substitutions implicitly -rename variables to avoid capture. These are all meta-level notations, not to be confused with the syntactic objects in our source language, which also features sets.

Figure 1 de nes our source language SC (LIA) and the target language U +LIA. Terms in SC (LIA) are either arithmetic expressions, written t, or set expressions, denoted s. Arithmetic Variables x; X

Values v SC (LIA): Set Comprehensions over Linear Integer Arithmetic Arithmetic Term Arithmetic Formula Set Term Set Formula t ::= x :j v j t op t T ::= t = t j t < t j :T j T ^ T s ::= X j ftg j: ft | T gx2_ s j s [ s j s \ s j s n s

S ::= t 2_ s j s = s j s s j :S j S ^ S U +LIA: Linear Integer Arithmetic and Uninterpreted Sets Arithmetic Term Arithmetic Formula Uninterpreted Set Term Uninterpreted Set Formula Formula t ::= x :j v j t op t T ::= t = t j t < t s ::= X

S ::= t 2_ s

F; C ::= S j T j :F j F ^ F j 9x:F j 8x:F expressions correspond to integers and set expressions to sets of integers. An arithmetic expression is either a base variable x, a number v in Z, or a binary arithmetic operation t op t supported in the theory of linear integer arithmetic (e.g., + or ). A set expression is either a set variable X, an extensional set ftg, the union, intersection or di erence between two sets, or a set comprehension ft | T gx2_ s. The empty set fg is the special case of an extensional set ftg where t is empty. In a set comprehension, we refer to t as the comprehension pattern, to T as the comprehension guard, to x as the binding variable, and to s as the comprehension domain. The scope of the binding variable x is t and T . Where useful for clarity, we explicitly annotate this dependency by writing t and T as t(x) and T (x). Formulas T that appear within set comprehensions are quanti er-free arithmetic formulas over the integer domain. Atomic arithmetic formulas include equality, written t1 =: t2, and other standard predicates, for example t1 < t2. The arithmetic formulas T in comprehension guards combine them using the standard Boolean connectives | we display a minimal set consisting of : and ^ in Figure 1 but will freely use the full set of Boolean connectives in our examples. The set formulas of SC (LIA), denoted S, are : the Boolean combination of set membership t 2_ s, set equality s1 = s2 and the subset relation s1 s2.

The language of U +LIA resembles SC (LIA), with two major syntactic di erences: rst, it dispenses with all set expressions except for set variables X; second, U +LIA formulas allow the use of quanti ers over base variables.

A model is a structure M that satis es the axioms of a theory Th and a formula F .

This is denoted by M j=Th F . A model M contains mappings from the variables of F to their respective instantiations that satisfy F in Th. The lookup operation M(x) denotes the instantiated value that x is mapped to. We write Mx7!v for the model that maps x to v and is otherwise identical to M. For each predicate symbol p that appears in Th or F , the model M also contains mappings from p to the set of all valid instances of the predicate relation. The lookup operation M(p) denotes the set of valid relation instances of p. For simplicity, we assume predicates have the same arity throughout Th and F . Interpretations of the (sub)formula(s) of F and terms t that appear in F are obtained from M by the operations MJF K and MJtK. Unsatis ability of a formula F is denoted by 6j=Th F .

A model in our source language SC (LIA) is denoted by S and its satis ability judgment is

denoted by S j=SC (LIA) S where S is an SC (LIA) formula. We require that SC (LIA) contain the theory of linear integer arithmetic and general set theory. A model S contains mappings for integer variables x to integer values v and set variables X to extensional sets ftg. The sets of all integers and sets that appear in S are denoted by domZ(S) and domS(S) respectively. We will omit standard de nitions of integer term interpretation SJtK and integer formula satis ability S j=SC (LIA) T . The interpretation SJsK of set terms s is de ned as follows:

SJXK = S(X) SJft; tgK = fSJtK; SJtKg

SJfgK = ; SJft(x) | T (x)gx2_ sK = fSx7!aJt(x)K : domZ(S) j a 2 SJsK and Sx7!a j=SC (LIA) T (x)g SJs1 [ s2K = fa : domZ(S) j a 2 SJs1K or a 2 SJs2Kg SJs1 \ s2K = fa : domZ(S) j a 2 SJs1K and a 2 SJs2Kg

SJs1 n s2K = fa : domZ(S) j a 2 SJs1K and a 62 SJs2Kg This de nition is inductive and makes use of SJtK and S j=SC (LIA) T . The interpretation of a set comprehension SJft(x) | T (x)gx2_ sK is de ned as the set of all (and only) the interpretations of the comprehension pattern t(x) with the binding variable x mapped to the integers a in the interpretation of the comprehension domain s such that the corresponding instance of the comprehension guard T (x) is satis able under S. The other cases are standard.

Satis ability for set formulas, S j=SC (LIA) S, is de ned inductively over set formulas and on top of SJtK, SJSK and S j=SC (LIA) T . The key cases of this de nition are as follows: S j=SC (LIA) t 2_ s i

: S j=SC (LIA) s1 = s2 i S j=SC (LIA) s1 s2 i for all a 2 domZ(S); SJtK 2 SJsK for all a 2 domZ(S); Sx7!a j=SC (LIA) x 2_ s1 i Sx7!a j=SC (LIA) x 2_ s2

Sx7!a j=SC (LIA) x 2_ s1 only if Sx7!a j=SC (LIA) x 2_ s2 We omitted the cases for the logical connectives whose interpretations are standard. Note that membership 2_ is an interpreted predicate symbol in U +LIA: t 2_ s is satis able in S if and only if the interpretation of t is actually a member of the interpretation of s. Since 2_ is interpreted, S does not contain any mappings for 2_ (i.e., S(2_ ) = ?). Satis ability of equality, s1 =: s2 is de ned in terms of the membership relation: s1 and s2 are equal if and only if they contain the same integers. In a similar manner, satis ability of the subset relation s1 s2 is de ned in terms of membership 2_.

Models of our target language U +LIA are denoted by M, and the satis ability judgment is denoted by M j=U +LIA F . We require that U +LIA contains the theory of linear integer arithmetic and an uninterpreted domain for sets. Unlike S, M contains no mappings for set variables, since the set domain is uninterpreted. However, M( 2_) maps to the set of pairs hMJtK; Xi for every valid relation t 2_ X from F . The de nition of MJtK and M j=U +LIA F are standard and therefore omitted. 3

Encoding SC (LIA) into U +LIA

In this section, we de ne an encoding of SC (LIA) terms and formulas into U +LIA terms and fsoertmfourlmasu. laTheniscoendcinogdifnugnicstigoinvsenVbWyfotrwmo. translation functions, the set term encoding V Wset and

X set = X V W 8X > VftgWset = < ; ; >

fXg ; 8x: x 2_ X $ mem(t; x) where mem(ft; tg; x) = x =: VtW _ mem(t; x)

mem(;; x) = ? Vs1 op s2Wset = 8X > > > Vft | T gx2_ sWset = < > : 8X > > > > > > > > < > > > > > > > > : > > > :

Vs1Wset = xs1

V1; V2; X ; C1 ^ C2 ^ constr (op) where

The set term encoding of a set expression s is given by a triple (X; V; C) that we write

VsWterm = X V ; C. Here, X is a fresh variable associated with s (unless s is already a variable), V is a collection of uninterpreted set variables, and C is a U +LIA formula. We refer to X as the representative variable of s, C as the extensional context formula and V as the set variable signature. The context formula C constrains X to capture the extensional behavior of s, while V contains all local set variables in C created while encoding s. We require that V contain no duplicates and we rely on implicit -renaming to enforce this constraint. This provides an implicit mechanism to guarantee unique assignments of set variables during encoding.

Figure 2 de nes the set term encoding operation. In the case of an extensional set t = ft1; : : : ; tng, the extensional context formula 8x: x 2_ X $ (x =: t1 _ : : : _ x =: tn) constrains representative variable X to contain all and only the elements that appear in t. We encode all SC (LIA) terms ti in t by means of the encoding operation VtiW. For the element terms we consider in this paper, it is su cient to de ne their encoding as the identity function. The entry Vs1 op s2Wset covers the cases for set union, intersection and di erence. For each, we rst derive encodings for s1 and s2 to obtain representative variables X1 and X2. The variable X is assigned to represent the set s1 op s2. This is done with the help of the context formula constr (op): If op is the union operator [, the context formula state that any member x of X must be a member of either X1 or X2 and vice-versa. The cases where op is \ and n are similar. Each of these context formulas constrains the extensional behavior X to that of the union, intersection or set di erence of s1 and s2 respectively. Finally, Vft | T gx2_ sWset encodes the set comprehension as follows: The comprehension domain s is encoded into X0 with the constraints of s captured by Cdom . The comprehension ft | T gx2_ s is encoded into X with its constraints captured by context formulas Cmax and Crg . The comprehension maximality condition Cmax imposes the constraint that every member x of the comprehension domain X0

VS2Wform = F2 V2 ; C2

V:SWform = :F V ; C where VSWform = F Figure 3: Set Formula Encoding into U +LIA: VSWform = F that satis es the comprehension guard T has its corresponding comprehension pattern t as a member of X. The comprehension range restriction Crg imposes the constraint that for every member z of the set comprehension X, there exists some member x of the comprehension domain X0 whose corresponding comprehension pattern t is equal to z and satis es the comprehension guard T .

While it is tempting to replace the context formulas Cmax and Crg with the single implication formula Ccomp = 8x: (x 2_ X0 ^ T (x)) $ t(x) 2_ X (as done in our example in Section 1), Ccomp would not accurately capture the behavior of set comprehensions with expressions t(x) that are not injective functions. For instance, consider the comprehension s1 = fx%3 | >gx2_ s2 , where % is the modulus operator. Assuming that X and X0 are the representatives of set comprehension s1 and set term s2 respectively, the formula Ccomp = 8x: x 2_ X0 $ (x%3) 2_ X is incorrect in the ` ' case: it demands any x such that x%3 is a member of X must be a member of X0, which is clearly not the behavior of the set comprehension. The combination of Cmax and Crg on the other hand, is sound as long as t(x) is a total function.

Figure 3 de nes the encoding of a set formula S, denoted by VSWform = F V ; C. It encodes a SC (LIA) formula S into an U +LIA formula F under the extensional context formula C with the set variable signatures V. We call F the main encoding formula. Similarly to the term encoding operation, the formula C speci es the constraints on uninterpreted set variables amnedmVbeirssthhipe sreetlaotfioloncbayl sseitmvpalryiaubsliensgctrheaetteedrmbyetnhceodenincgodisngs.etTohfesc.aWseeVtth2e_nsWsifmorpmlyenchcoecdkesththaet the encoding of t is in the representative variable X of VsVWsWet . For Vs1 =: s2Wform , we encode s1 and s2 by applying the term encoding operation of Figure 3 to each, obtaining X1 and X2 as representatives of s1 and s2 respectively with extensional context C1 and C2. Following this, the equality constraint is represented in U +LIA as the formula 8x: x 2_ X1 $ x 2_ X2, stating that all members of X1 are members of X2 and vice-versa. Encoding set equality in this manner exercises the axiom of extensionality of set theory, enforcing the extensional equality of sets with respect to the membership relation ( 2_). The case Vs1 s2Wform is similar, except that it is encoded into a one-sided extensional membership `equality' (!). The case VS1 ^S2Wform de nes the cases for the logical connectives ^, during which we traverse the respective sub-formulas and terms containing set terms and derive their encodings. The main formula is simply the ^ logical connective applied to the respective encoded formulas.

satCheck (S) = where (

TMU ? TMU(x) TMU(X) TMU( 2_) = = = if VS form = F

W otherwise

V ; C and M j=U +LIA C ^ F M(x) ?

? (fa : domZ(M) j ha; Xi 2 M( 2_)g if X 2= V otherwise

The encoding of negation requires some care, and is the reason for separating the main encoding formula F and the extensional context formula C. Our de nition in Figure 3 encodes the negated SC (LIA) formula :(f1g f1; 2g) as follows:

V:(f1g f1; 2g)Wform = :(8x: x 2_ X1 ! x 2_ X2)

X1; X2 ; X1 = f1g ^ X2 = f1; 2g Here, we abbreviated the actual encoding of the extensional sets as X1 = f1g and X2 = f1; 2g for to be succinct. The conjunction of the main formula and context formula (i.e., :F ^ C) concisely captures the negated formula :(f1g f1; 2g), namely: :(f1g f1; 2g)

:(8x: x 2_ X1 ! x 2_ X2) ^ X1 = f1g ^ X2 = f1; 2g

Assume we had instead collapsed F and C into a single formula component, so that VSWform = F ^ C V. Then we would have to translate the above formula as : : :((8x: x 2_ X1 ! x 2_ X2) ^ X1 = f1g ^ X2 = f1; 2g) X1; X2 which is not the intended meaning of the negation as, propagating the negation, :(f1g f1; 2g) 6

:(8x: x 2_ X1 ! x 2_ X2) _ X1 6= f1g _ X2 6= f1; 2g 4

Satis ability Testing for SC (LIA)

In this section, we de ne a satis ability test operation of SC (LIA) formulas on the encoding de ned in the previous section. This operation, denoted by satCheck (S), is de ned in Figure 4. If the SC (LIA) set formula S is satis able, this operation extracts a model for S from a model of its encoding VSWform in U +LIA. Such a model of VSWform can be obtained using an o -theshelf SMT solver that supports the built-in theory of linear integer arithmetic and uninterpreted domains. More speci cally, given VSWform = F V ; C, the satis ability of S is determined by checking the satis ability of its U +LIA encoding: i.e., M j=U +LIA C ^ F . Decidability of this operation therefore depends on the decidability of j=U +LIA. If S is determined to be unsatis able, satCheck (S) returns ?. If S is determined to be satis able, it returns a decoded model TMU, which is an SC (LIA) model inferred from M. Since this operation is undecidable in general, it may not return a value at all for some formulas. The decode operator T U is de ned as follows: for integer variables x, TMU simply maps x to M(x). For a set variable X however, TMU(X) returns the set of all integers a such that ha; Xi 2 M( 2_). The membership relation of M is stripped away in TMU, i.e., TMU(2_ ) = ?.

Lemma 1 determines the soundness of the term encoding: for any set term s, we can extract a corresponding interpretation of s from its encoding. Soundness for the formula encoding is given by Theorem 2. It states that for some SC (LIA) formula S with the encoding VSWform =

F V ; C, if S is satis able under SC (LIA), then C ^ F is satis able under U +LIA.

Lemma 1 (Soundness of the Term Encoding). Let s be a SC (LIA) set term and VsWterm = X V ; C its encoding in U +LIA. For any S that contains an interpretation of s (i.e., SJsK), there exists M such that M j=U +LIA C and S = TMU.

Proof. The proof proceeds by structural induction on the encoding operation of the term s, with base cases X and f g t , we can infer that X contains all and only t . For the base case f g the values in t for the membership relation constraints (x 2_ X, such that x 2 t) imposed by the context formula of the encoding of ftg (i.e., SJsK = t). Hence, by de nition of T:U, we show that S = TMU. For the inductive case s1 op s2 and ft | T gx2_ s, we similarly show S = TMU by observing that their respective interpretations in S correspond to sets built from the membership relations (x 2_ X) captured by C.

Theorem 2 (Soundness of the Formula Encoding). Let S be a SC (LIA) formula S and VSWform = F V ; C its encoding in U +LIA. For any S such that S j=SC (LIA) S, there exists M such that M j=U +LIA C ^ F and S = TMU.

Proof. The proof proceeds by structural induction on the encoding operation of the formula S, with base cases t 2_ s, s1 =: s2 and s1 s2, whose proofs follow from the soundness of term encoding (Lemma 1).

Lemma 3 and Theorem 4 state the converse of Lemma 1 and Theorem 2 respectively, hence providing a completeness guarantee for our encoding.

TMU j=SC (LIA) X =: s.

Lemma 3 (Completeness of the Term Encoding). Let s be a SC (LIA) set term and VsWterm = X V ; C its encoding in U +LIA. For any M such that M j=U +LIA C, we have that Proof. This proof is similar to that of Lemma 1. It proceeds by structural induction on the encoding of s. The main di erence is that in each case, we show that interpretations of s obtained from S (i.e., S(X)) corresponds to that of obtained from TMU, hence TMU = S. Theorem 4 (Completeness of the Formula Encoding). Let S be a SC (LIA) formula and VSWform = F V ; C its encoding in U +LIA. For all M such that M j=U +LIA C ^ F , we have that TMU j=SC (LIA) S.

Proof. Like the proof of Theorem 2, this proof proceeds by structural induction on the encoding of formula S. The proof for the base cases follows from Lemma 3.

Corollary 1 states the soundness and completeness of the satis ability test operation satCheck , and it follows from properties 1, 2, 3 and 4.

Corollary 1 (Soundness and Completeness of satCheck ). For any SC (LIA) formula S, we have the following: satCheck (S) = S if and only if S j=SC (LIA) S.

satCheck (S) = ? if and only if 6j=SC (LIA) S. We implemented the above technique as a lightweight library with Z3 as the underlying SMT solver. This prototype is available for download at https://github.com/sllam/pysetcomp. Given an SC (Th) formula, where Th is some theory that the Z3 SMT solver supports, our Python library implements the encoding operation VSWform = F V ; C. Formulas F and C are native Z3 formulas, hence M j=U +Th F ^ C is implemented by simply passing F ^ C to the Z3 satis ability checking interfaces. Our prototype also includes a simple combinator library that provides the programmer with a convenient way to write SC (Th) formulas.

The code that implement our example in Section 1 is as follows: 1 ## Initialize sorts and variables 2 I = z3.IntSort() 3 IntSet = mkSetSort( I ) 4 x = z3.Int(`x') 5 X = z3.Const(`X', IntSet) 76 ## f10; 20; 30g =: f x 10 j x < 4 gx 2_ X 8 S = VSet(I,10,20,30) |Eq| Compre(x*10,x<4,x,X) 9 10 ## VSWform = F V ; C 11 F, C, V = transForm( S ) 12 13 ## M j=U+Th F ^ C 14 s = z3.Solver() 15 s.add(z3.And([F]+C)) 16 print s.check() -- Prints `sat'

Z3's built-in operations are explicitly prepended by z3. Lines 2{5 initializes the Z3 sorts (data domains) and variables that we need in this example: I abbreviates the Z3 integer sort, mkSetSort(I) returns a representation of the sort of sets of integer. In line 4, a Z3 integer variable x is declared, while in line 5 a Z3 variable of the sort of sets of integer is declared. Line 8 implements the actual SC (LIA) formula: VSet(I,10,20,30) builds the extensional set f10,20,30g, while Compre(x*10,x<4,x,X) builds the set c:omprehension fx 10 j x < 4gx2_ X . The in x operator Eq corresponds to the equality predicate =. Line 11 implements the encoding of S into U +LIA formulas F and C, with the set of local variables V created by the encoding procedure. Finally, lines 14{15 implements the satis ability test by feeding F ^ C to the Z3 satis ability checker.

Our prototype works on more than just integers. The following example involves tuples of integers.

X =: fh1; 8i; h2; 5i; h3; 2i; h3; 4i; h4; 8ig ^ Y =: fx 10 | x 3ghx;yi2_ X ^ x 10 2_ Y ^ hx ; yi 2_ X The corresponding code in our implementation is as follows: 1 IntPair = mkTupleSort( I, I ) 2 tup = IntPair.tup 3 pi1 = IntPair.pi1 4 5 PairSet = mkSetSort( IntPair ) 6 IntSet = mkSetSort( I ) 7 8 X = z3.Const('X', PairSet) In this example, the set comprehension Y maps a set containing pairs of integers, into a set of integers, thus requiring our system to handle many-sorted encodings. Lines 1{3 de ne a new tuple sort of pairs of integers IntPair, with tup as its data constructor and pi1 as a projection operator for the left tuple argument. Lines 5{6 declare new set sorts: PairSet is the sort of sets of integer pairs, while IntSet the sort of sets of integers. Lines 8{11 declare the variables of the respective sorts, while lines 13{16 de ne the actual formula. The in x binary relation In implements the membership relation 2_ . Encoding cs with the transForm operation and feeding its output to the Z3 solver yields a satis able result, where a satis able instance of x and y can be extracted.

The implementation of the In operator is non-trivial and requires explanation. A call to x |In| X creates an instance of a Z3 function, named mem, that is interpreted by our encoding as the binary relation x 2_ X in Z3: this function maps the pair x and X to a Z3 Boolean value. Note that the two instances of In at lines 15{16 are of di erent sort: At line 15 we have an instance with the sort Int*IntSet, while at line 16 we have IntPair*PairSet. Since Z3 does not support parametric data types, our implementation must therefore map each instance to a di erent Z3 function symbol (here mem IntSet and mem PairSet ) that are interpreted appropriately by our Python library. In order to provide a convenient interface that hides this mapping, we implement a lookup procedure inside the In operator that is akin to a simpli ed run-time version of type dictionary passing during type checking of Haskell type classes.

We have tested our implementation on a suite of set comprehension formulas of varying complexity. While most are rather small, in the future we intend to provide empirical results on more practical examples. 6

Conclusion and Future Work

In this paper, we reduced the satis ability problem for formulas featuring comprehension and other set operations over a standard theory (linear integer arithmetic in our examples) to solving satis ability constraints over this same theory augmented with a single uninterpreted sort. This technique allows the satis ability of set-based formulas to be veri ed by a wide range of o -theshelf SMT solvers that support the base theory. We have implemented a lightweight Python library that utilizes this encoding technique on the popular Z3 SMT solver. This implementation generalizes the encoding described here to a broad range of data types supported by Z3, which includes integer, real numbers, tuples, and nite lists.

In the future, we are interested in expanding our results to comprehensions over multisets. A commonly used representation for multisets relies on array maps, that associate each elements in the support domain to its multiplicity [ 5 ]. An adaptation of the technique presented in this paper that simply swaps our encoding of sets with this representation of multisets does not work however. To appreciate the added challenge, consider the task of verifying the following formula about multiset comprehensions:

M =

X1 = *x%3 | >+x2_ X2 where * + delimits multisets in the same way as f g delimited sets in the rest of the paper. In the suggested encoding, the variables X1 and X2 would be implemented as array maps: for every element x in the domain, X1[x] gives us the number of times that x occurs in X1 (its multiplicity), and similarly for X2. A naive adaptation of our technique suggests the following encoding for the above formula: (8x; m: (X2[x] = m ^ m > 0) ! X1[x%3] = m

8z; m: (X1[z] = m ^ m > 0) ! 9x:(z = x%3 ^ X2[x] = m) Although it would be adequate for injective operations on x (e.g., x + 3), this encoding fails for non-injective operations such as x%3. For instance, the formula M is satis ed for X1 = *0; 0; 0; 2+ and X2 = *3; 3; 6; 8+, yet the above encoding yields an unsatis able formula. The problem is that we have multiple elements in X2 that are mapped to the same element in X1. Since 3%3 = 0 and X2[ 3 ] = 2, the above encoding entails that X1[0] = 2. However, 6%3 = 0 also, which means that, given that X2[ 6 ] = 1, we have that X1[0] = 1 too | an absurdity. We would need to combine these results by adding the multiplicities of all x that map to the same value in X1: doing so for all x in X2 such that x%3 = 0 in this example would yield the expected result, X1[0] = 3. This has proved to be non-trivial in Z3. A possible approach to reasoning about such arithmetic aggregate operations could be adapted from [ 4 ].

[1]

De Moura and N. Bj rner. Z3: An E cient SMT Solver . In Proc. of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08 , pages 337 { 340 , Berlin, Heidelberg, 2008 . Springer-Verlag.

[2] Joseph

Halpern . Presburger Arithmetic with Unary Predicates is 11 Complete . Journal of Symbolic Logic , 56 :56{ 2 , 1991 .

[3] Edmund

S. L.

Lam and Iliano

Cervesato . Constraint Handling Rules with Multiset Comprehension Patterns . In 11th Workshop on Constraint Handling Rules , 2014 .

[4]

K. R. M.

Leino and

Monahan . Reasoning about Comprehensions with First-Order SMT Solvers . In In Proc. of the 2009 ACM symposium on Applied Computing , pages 615 { 622 . ACM, 2009 .

[5]

Piskac and

Kuncak . MUNCH | Automated Reasoner for Sets and Multisets . In IJCAR'10 , volume 6173 of Lecture Notes in Computer Science, pages 149 { 155 . Springer, 2010 .

[6]

Suter ,

Steiger , and

Kuncak . Sets with Cardinality Constraints in Satis ability Modulo Theories . In Veri cation, Model Checking, and Abstract Interpretation , pages 403 { 418 . Springer, 2011 .