<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Reasoning About Set Comprehensions</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edmund S. L. Lam</string-name>
          <email>sllam@qatar.cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Iliano Cervesato</string-name>
          <email>iliano@cmu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Carnegie Mellon University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Set comprehension is a mathematical notation for de ning sets on the basis of a property of their members. Although set comprehension is widely used in mathematics and some programming languages, direct support for reasoning about it is still not readily available in state-of-the-art SMT solvers. This paper presents a technique for translating formulas which express constraints involving set comprehensions into rst-order formulas that can be veri ed by o -the-shelf SMT solvers. More speci cally, we have developed a lightweight Python library that extends the popular Z3 SMT solver with the ability to reason about the satis ability of set comprehension patterns. This technique is general and can be deployed in a broad range of SMT solvers.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <sec id="sec-1-1">
        <title>In particular, the model S carries information on an acceptable value of the set variable X |</title>
        <p>in this example, X = f1; 2; 3g is the smallest solution, with X = f1; 2; 3; 4g being another one.</p>
        <p>Our goal is to reduce satis ability in SC (Th) to satis ability in Th extended with an
uninterpreted sort for sets and an uninterpreted binary predicate 2_ , which encodes set membership.</p>
        <sec id="sec-1-1-1">
          <title>To achieve this, we translate a set formula of interest S in SC (Th) into a formula VSW of this</title>
          <p>
            resulting theory, which we denote by U +Th (Th with U ninterpreted symbols). Speci cally,
each appearance of a set comprehension or of a standard set operation in S, is mapped to a
fresh variable together with formulas in the language of U +Th. These formulas ensure that
this variable is associated with the exact elements of the original set structure. Then, we
verify that S is satis able in SC (Th) by checking that VSW has a model M in U +Th, i.e., that
M j=U +Th VSW. For a wide range of formulas, this veri cation step can be carried out using a
state-of-the-art high-performance SMT solver such as Z3 [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. Finally, we lift M to a model S
of S, such that S j=SC (Th) S. Even though this operation is still undecidable in general [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ], it
is nonetheless decidable for many instances of SC (LIA) formulas and can be implemented as
an e ective satis ability test operation. For instance, our work in [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] uses a conservative
implementation of this test operation to determine satis ability of nite set comprehension formulas
that would guarantee safety of certain compiler optimizations.
          </p>
        </sec>
        <sec id="sec-1-1-2">
          <title>To illustrate this technique, here is the (slightly simpli ed) encoding VSW where S is the</title>
          <p>formula from our earlier example:</p>
          <p>VSW =
&gt;88z: z 2_ X2 $ z 2_ X3
&lt; : : :</p>
          <p>8y: y 2_ X2 $ (y = 10 _ y = 20 _ y = 30)
&gt;:8x: (x 10 2_ X3) $ (x 2_ X ^ x &lt; 4)
F1 : X2 = X3
F2 : X2 = f10; 20; 30g
F3 : X3 = fx 10 | x &lt; 4gx2_ X
The rst formula, F1, states that X2 and X3 are extensionally equal. The second formula,
F2, constrains the members of X2 to be the integers 10, 20 or 30. Hence X2 is a precise
representation of the extensional set f10; 20; 30g in U +LIA. The third formula, F3, restricts X3
so that, for each element x in X that is larger than 4, x 10 must be a member of X3. Hence
F3 constrains the behavior of X3 to that of fx 10 | x &lt; 4gx2_ X in U +LIA.</p>
          <p>The rest of the paper is organized as follows: Section 2 introduces our source and target
languages, SC (LIA) and U +LIA. Section 3 de nes the encoding of SC (LIA) formulas into
U +LIA formulas. Section 4 presents a conservative satis ability test operation for SC (LIA)
formulas. Section 5 describes our prototype implementation of this operation, a lightweight
Python library that extends Z3. Section 6 discusses our conclusions and future work.
2</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Notations, Term Languages and Models</title>
      <p>In this section, we introduce meta-notation that we will use throughout this paper and we
de ne the source and target term languages SC (LIA) and U +LIA.</p>
      <p>In discussions, we write o to denote nite sets of syntactic objects o, with ; for the empty
set. We write fo; og for the extension of set o with syntactic object o, omitting the brackets
when no ambiguity arises. Meta-level set membership is denoted as o 2 o. We write meta-level
set comprehension as fo : o j (o)g, which represents the set containing all objects o from o that
satisfy the condition (o). We write [o0=x]o for the simultaneous replacement within object o
of all occurrences of variable x with o0. When traversing binding constructs and quanti ers,
substitutions implicitly -rename variables to avoid capture. These are all meta-level notations,
not to be confused with the syntactic objects in our source language, which also features sets.</p>
      <p>Figure 1 de nes our source language SC (LIA) and the target language U +LIA. Terms in
SC (LIA) are either arithmetic expressions, written t, or set expressions, denoted s. Arithmetic
Variables x; X</p>
      <p>Values v
SC (LIA): Set Comprehensions over Linear Integer Arithmetic
Arithmetic Term
Arithmetic Formula
Set Term
Set Formula
t ::= x :j v j t op t
T ::= t = t j t &lt; t j :T j T ^ T
s ::= X j ftg j: ft | T gx2_ s j s [ s j s \ s j s n s</p>
      <p>S ::= t 2_ s j s = s j s s j :S j S ^ S
U +LIA: Linear Integer Arithmetic and Uninterpreted Sets
Arithmetic Term
Arithmetic Formula
Uninterpreted Set Term
Uninterpreted Set Formula
Formula
t ::= x :j v j t op t
T ::= t = t j t &lt; t
s ::= X</p>
      <p>S ::= t 2_ s</p>
      <p>F; C ::= S j T j :F j F ^ F j 9x:F j 8x:F
expressions correspond to integers and set expressions to sets of integers. An arithmetic
expression is either a base variable x, a number v in Z, or a binary arithmetic operation t op t
supported in the theory of linear integer arithmetic (e.g., + or ). A set expression is either a
set variable X, an extensional set ftg, the union, intersection or di erence between two sets, or
a set comprehension ft | T gx2_ s. The empty set fg is the special case of an extensional set ftg
where t is empty. In a set comprehension, we refer to t as the comprehension pattern, to T as
the comprehension guard, to x as the binding variable, and to s as the comprehension domain.
The scope of the binding variable x is t and T . Where useful for clarity, we explicitly annotate
this dependency by writing t and T as t(x) and T (x). Formulas T that appear within set
comprehensions are quanti er-free arithmetic formulas over the integer domain. Atomic arithmetic
formulas include equality, written t1 =: t2, and other standard predicates, for example t1 &lt; t2.
The arithmetic formulas T in comprehension guards combine them using the standard Boolean
connectives | we display a minimal set consisting of : and ^ in Figure 1 but will freely use the
full set of Boolean connectives in our examples. The set formulas of SC (LIA), denoted S, are
:
the Boolean combination of set membership t 2_ s, set equality s1 = s2 and the subset relation
s1 s2.</p>
      <p>The language of U +LIA resembles SC (LIA), with two major syntactic di erences: rst, it
dispenses with all set expressions except for set variables X; second, U +LIA formulas allow the
use of quanti ers over base variables.</p>
      <sec id="sec-2-1">
        <title>A model is a structure M that satis es the axioms of a theory Th and a formula F .</title>
        <p>This is denoted by M j=Th F . A model M contains mappings from the variables of F to
their respective instantiations that satisfy F in Th. The lookup operation M(x) denotes the
instantiated value that x is mapped to. We write Mx7!v for the model that maps x to v and is
otherwise identical to M. For each predicate symbol p that appears in Th or F , the model M
also contains mappings from p to the set of all valid instances of the predicate relation. The
lookup operation M(p) denotes the set of valid relation instances of p. For simplicity, we assume
predicates have the same arity throughout Th and F . Interpretations of the (sub)formula(s)
of F and terms t that appear in F are obtained from M by the operations MJF K and MJtK.
Unsatis ability of a formula F is denoted by 6j=Th F .</p>
      </sec>
      <sec id="sec-2-2">
        <title>A model in our source language SC (LIA) is denoted by S and its satis ability judgment is</title>
        <p>denoted by S j=SC (LIA) S where S is an SC (LIA) formula. We require that SC (LIA) contain
the theory of linear integer arithmetic and general set theory. A model S contains mappings for
integer variables x to integer values v and set variables X to extensional sets ftg. The sets of all
integers and sets that appear in S are denoted by domZ(S) and domS(S) respectively. We will
omit standard de nitions of integer term interpretation SJtK and integer formula satis ability
S j=SC (LIA) T . The interpretation SJsK of set terms s is de ned as follows:</p>
        <p>SJXK = S(X)
SJft; tgK = fSJtK; SJtKg</p>
        <p>SJfgK = ;
SJft(x) | T (x)gx2_ sK = fSx7!aJt(x)K : domZ(S) j a 2 SJsK and Sx7!a j=SC (LIA) T (x)g
SJs1 [ s2K = fa : domZ(S) j a 2 SJs1K or a 2 SJs2Kg
SJs1 \ s2K = fa : domZ(S) j a 2 SJs1K and a 2 SJs2Kg</p>
        <p>SJs1 n s2K = fa : domZ(S) j a 2 SJs1K and a 62 SJs2Kg
This de nition is inductive and makes use of SJtK and S j=SC (LIA) T . The interpretation of a set
comprehension SJft(x) | T (x)gx2_ sK is de ned as the set of all (and only) the interpretations
of the comprehension pattern t(x) with the binding variable x mapped to the integers a in
the interpretation of the comprehension domain s such that the corresponding instance of the
comprehension guard T (x) is satis able under S. The other cases are standard.</p>
        <p>Satis ability for set formulas, S j=SC (LIA) S, is de ned inductively over set formulas and on
top of SJtK, SJSK and S j=SC (LIA) T . The key cases of this de nition are as follows:
S j=SC (LIA) t 2_ s i</p>
        <p>:
S j=SC (LIA) s1 = s2 i
S j=SC (LIA) s1
s2 i
for all a 2 domZ(S);
SJtK 2 SJsK
for all a 2 domZ(S);
Sx7!a j=SC (LIA) x 2_ s1 i Sx7!a j=SC (LIA) x 2_ s2</p>
        <p>Sx7!a j=SC (LIA) x 2_ s1 only if Sx7!a j=SC (LIA) x 2_ s2
We omitted the cases for the logical connectives whose interpretations are standard. Note that
membership 2_ is an interpreted predicate symbol in U +LIA: t 2_ s is satis able in S if and only
if the interpretation of t is actually a member of the interpretation of s. Since 2_ is interpreted,
S does not contain any mappings for 2_ (i.e., S(2_ ) = ?). Satis ability of equality, s1 =: s2 is
de ned in terms of the membership relation: s1 and s2 are equal if and only if they contain
the same integers. In a similar manner, satis ability of the subset relation s1 s2 is de ned in
terms of membership 2_.</p>
        <p>Models of our target language U +LIA are denoted by M, and the satis ability judgment
is denoted by M j=U +LIA F . We require that U +LIA contains the theory of linear integer
arithmetic and an uninterpreted domain for sets. Unlike S, M contains no mappings for set
variables, since the set domain is uninterpreted. However, M( 2_) maps to the set of pairs
hMJtK; Xi for every valid relation t 2_ X from F . The de nition of MJtK and M j=U +LIA F are
standard and therefore omitted.
3</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Encoding SC (LIA) into U +LIA</title>
      <p>In this section, we de ne an encoding of SC (LIA) terms and formulas into U +LIA terms and
fsoertmfourlmasu. laTheniscoendcinogdifnugnicstigoinvsenVbWyfotrwmo. translation functions, the set term encoding V Wset and</p>
      <p>X set = X
V W
8X
&gt;
VftgWset = &lt;
; ; &gt;</p>
      <p>fXg ; 8x: x 2_ X $ mem(t; x)
where mem(ft; tg; x) = x =: VtW _ mem(t; x)</p>
      <p>mem(;; x) = ?
Vs1 op s2Wset =
8X
&gt;
&gt;
&gt;
Vft | T gx2_ sWset = &lt;
&gt;
:
8X
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&lt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
&gt;
:
&gt;
&gt;
&gt;
:</p>
      <p>Vs1Wset = xs1</p>
      <p>V1; V2; X ; C1 ^ C2 ^ constr (op)
where</p>
      <sec id="sec-3-1">
        <title>The set term encoding of a set expression s is given by a triple (X; V; C) that we write</title>
        <p>VsWterm = X V ; C. Here, X is a fresh variable associated with s (unless s is already
a variable), V is a collection of uninterpreted set variables, and C is a U +LIA formula. We
refer to X as the representative variable of s, C as the extensional context formula and V as
the set variable signature. The context formula C constrains X to capture the extensional
behavior of s, while V contains all local set variables in C created while encoding s. We require
that V contain no duplicates and we rely on implicit -renaming to enforce this constraint.
This provides an implicit mechanism to guarantee unique assignments of set variables during
encoding.</p>
        <p>Figure 2 de nes the set term encoding operation. In the case of an extensional set t =
ft1; : : : ; tng, the extensional context formula 8x: x 2_ X $ (x =: t1 _ : : : _ x =: tn) constrains
representative variable X to contain all and only the elements that appear in t. We encode
all SC (LIA) terms ti in t by means of the encoding operation VtiW. For the element terms we
consider in this paper, it is su cient to de ne their encoding as the identity function. The
entry Vs1 op s2Wset covers the cases for set union, intersection and di erence. For each, we rst
derive encodings for s1 and s2 to obtain representative variables X1 and X2. The variable X
is assigned to represent the set s1 op s2. This is done with the help of the context formula
constr (op): If op is the union operator [, the context formula state that any member x of
X must be a member of either X1 or X2 and vice-versa. The cases where op is \ and n are
similar. Each of these context formulas constrains the extensional behavior X to that of the
union, intersection or set di erence of s1 and s2 respectively. Finally, Vft | T gx2_ sWset encodes
the set comprehension as follows: The comprehension domain s is encoded into X0 with the
constraints of s captured by Cdom . The comprehension ft | T gx2_ s is encoded into X with
its constraints captured by context formulas Cmax and Crg . The comprehension maximality
condition Cmax imposes the constraint that every member x of the comprehension domain X0</p>
        <p>VS2Wform = F2 V2 ; C2</p>
        <p>V:SWform = :F V ; C where VSWform = F
Figure 3: Set Formula Encoding into U +LIA: VSWform = F
that satis es the comprehension guard T has its corresponding comprehension pattern t as a
member of X. The comprehension range restriction Crg imposes the constraint that for every
member z of the set comprehension X, there exists some member x of the comprehension domain
X0 whose corresponding comprehension pattern t is equal to z and satis es the comprehension
guard T .</p>
        <p>While it is tempting to replace the context formulas Cmax and Crg with the single implication
formula Ccomp = 8x: (x 2_ X0 ^ T (x)) $ t(x) 2_ X (as done in our example in Section 1), Ccomp
would not accurately capture the behavior of set comprehensions with expressions t(x) that
are not injective functions. For instance, consider the comprehension s1 = fx%3 | &gt;gx2_ s2 ,
where % is the modulus operator. Assuming that X and X0 are the representatives of set
comprehension s1 and set term s2 respectively, the formula Ccomp = 8x: x 2_ X0 $ (x%3) 2_ X
is incorrect in the ` ' case: it demands any x such that x%3 is a member of X must be a
member of X0, which is clearly not the behavior of the set comprehension. The combination of
Cmax and Crg on the other hand, is sound as long as t(x) is a total function.</p>
        <p>Figure 3 de nes the encoding of a set formula S, denoted by VSWform = F V ; C. It
encodes a SC (LIA) formula S into an U +LIA formula F under the extensional context formula
C with the set variable signatures V. We call F the main encoding formula. Similarly to the
term encoding operation, the formula C speci es the constraints on uninterpreted set variables
amnedmVbeirssthhipe sreetlaotfioloncbayl sseitmvpalryiaubsliensgctrheaetteedrmbyetnhceodenincgodisngs.etTohfesc.aWseeVtth2e_nsWsifmorpmlyenchcoecdkesththaet
the encoding of t is in the representative variable X of VsVWsWet . For Vs1 =: s2Wform , we encode s1
and s2 by applying the term encoding operation of Figure 3 to each, obtaining X1 and X2 as
representatives of s1 and s2 respectively with extensional context C1 and C2. Following this,
the equality constraint is represented in U +LIA as the formula 8x: x 2_ X1 $ x 2_ X2, stating
that all members of X1 are members of X2 and vice-versa. Encoding set equality in this manner
exercises the axiom of extensionality of set theory, enforcing the extensional equality of sets with
respect to the membership relation ( 2_). The case Vs1 s2Wform is similar, except that it is
encoded into a one-sided extensional membership `equality' (!). The case VS1 ^S2Wform de nes
the cases for the logical connectives ^, during which we traverse the respective sub-formulas
and terms containing set terms and derive their encodings. The main formula is simply the ^
logical connective applied to the respective encoded formulas.</p>
        <p>satCheck (S) =
where
(</p>
        <p>TMU
?
TMU(x)
TMU(X)
TMU( 2_)
=
=
=
if VS form = F</p>
        <p>W
otherwise</p>
        <p>V ; C and M j=U +LIA C ^ F
M(x)
?</p>
        <p>?
(fa : domZ(M) j ha; Xi 2 M( 2_)g
if X 2= V
otherwise</p>
        <p>The encoding of negation requires some care, and is the reason for separating the main
encoding formula F and the extensional context formula C. Our de nition in Figure 3 encodes
the negated SC (LIA) formula :(f1g f1; 2g) as follows:</p>
        <p>V:(f1g
f1; 2g)Wform = :(8x: x 2_ X1 ! x 2_ X2)</p>
        <p>X1; X2 ; X1 = f1g ^ X2 = f1; 2g
Here, we abbreviated the actual encoding of the extensional sets as X1 = f1g and X2 = f1; 2g
for to be succinct. The conjunction of the main formula and context formula (i.e., :F ^ C)
concisely captures the negated formula :(f1g f1; 2g), namely:
:(f1g
f1; 2g)</p>
        <p>:(8x: x 2_ X1 ! x 2_ X2) ^ X1 = f1g ^ X2 = f1; 2g</p>
        <p>Assume we had instead collapsed F and C into a single formula component, so that
VSWform = F ^ C V. Then we would have to translate the above formula as
: :
:((8x: x 2_ X1 ! x 2_ X2) ^ X1 = f1g ^ X2 = f1; 2g)
X1; X2
which is not the intended meaning of the negation as, propagating the negation,
:(f1g
f1; 2g)
6</p>
        <p>:(8x: x 2_ X1 ! x 2_ X2) _ X1 6= f1g _ X2 6= f1; 2g
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Satis ability Testing for SC (LIA)</title>
      <p>In this section, we de ne a satis ability test operation of SC (LIA) formulas on the encoding
de ned in the previous section. This operation, denoted by satCheck (S), is de ned in Figure 4.
If the SC (LIA) set formula S is satis able, this operation extracts a model for S from a model
of its encoding VSWform in U +LIA. Such a model of VSWform can be obtained using an o
-theshelf SMT solver that supports the built-in theory of linear integer arithmetic and uninterpreted
domains. More speci cally, given VSWform = F V ; C, the satis ability of S is determined
by checking the satis ability of its U +LIA encoding: i.e., M j=U +LIA C ^ F . Decidability
of this operation therefore depends on the decidability of j=U +LIA. If S is determined to be
unsatis able, satCheck (S) returns ?. If S is determined to be satis able, it returns a decoded
model TMU, which is an SC (LIA) model inferred from M. Since this operation is undecidable
in general, it may not return a value at all for some formulas. The decode operator T U is
de ned as follows: for integer variables x, TMU simply maps x to M(x). For a set variable X
however, TMU(X) returns the set of all integers a such that ha; Xi 2 M( 2_). The membership
relation of M is stripped away in TMU, i.e., TMU(2_ ) = ?.</p>
      <p>Lemma 1 determines the soundness of the term encoding: for any set term s, we can extract
a corresponding interpretation of s from its encoding. Soundness for the formula encoding is
given by Theorem 2. It states that for some SC (LIA) formula S with the encoding VSWform =</p>
      <sec id="sec-4-1">
        <title>F V ; C, if S is satis able under SC (LIA), then C ^ F is satis able under U +LIA.</title>
        <p>Lemma 1 (Soundness of the Term Encoding). Let s be a SC (LIA) set term and VsWterm =
X V ; C its encoding in U +LIA. For any S that contains an interpretation of s (i.e., SJsK),
there exists M such that M j=U +LIA C and S = TMU.</p>
        <p>Proof. The proof proceeds by structural induction on the encoding operation of the term s,
with base cases X and f g t , we can infer that X contains all and only
t . For the base case f g
the values in t for the membership relation constraints (x 2_ X, such that x 2 t) imposed
by the context formula of the encoding of ftg (i.e., SJsK = t). Hence, by de nition of T:U,
we show that S = TMU. For the inductive case s1 op s2 and ft | T gx2_ s, we similarly show
S = TMU by observing that their respective interpretations in S correspond to sets built from
the membership relations (x 2_ X) captured by C.</p>
        <p>Theorem 2 (Soundness of the Formula Encoding). Let S be a SC (LIA) formula S and
VSWform = F V ; C its encoding in U +LIA. For any S such that S j=SC (LIA) S, there exists
M such that M j=U +LIA C ^ F and S = TMU.</p>
        <p>Proof. The proof proceeds by structural induction on the encoding operation of the formula S,
with base cases t 2_ s, s1 =: s2 and s1 s2, whose proofs follow from the soundness of term
encoding (Lemma 1).</p>
        <p>Lemma 3 and Theorem 4 state the converse of Lemma 1 and Theorem 2 respectively, hence
providing a completeness guarantee for our encoding.</p>
        <p>TMU j=SC (LIA) X =: s.</p>
        <p>Lemma 3 (Completeness of the Term Encoding). Let s be a SC (LIA) set term and VsWterm =
X V ; C its encoding in U +LIA. For any M such that M j=U +LIA C, we have that
Proof. This proof is similar to that of Lemma 1. It proceeds by structural induction on the
encoding of s. The main di erence is that in each case, we show that interpretations of s
obtained from S (i.e., S(X)) corresponds to that of obtained from TMU, hence TMU = S.
Theorem 4 (Completeness of the Formula Encoding). Let S be a SC (LIA) formula and
VSWform = F V ; C its encoding in U +LIA. For all M such that M j=U +LIA C ^ F ,
we have that TMU j=SC (LIA) S.</p>
        <p>Proof. Like the proof of Theorem 2, this proof proceeds by structural induction on the encoding
of formula S. The proof for the base cases follows from Lemma 3.</p>
        <p>Corollary 1 states the soundness and completeness of the satis ability test operation
satCheck , and it follows from properties 1, 2, 3 and 4.</p>
        <p>Corollary 1 (Soundness and Completeness of satCheck ). For any SC (LIA) formula S, we
have the following:
satCheck (S) = S if and only if S j=SC (LIA) S.</p>
        <p>satCheck (S) = ? if and only if 6j=SC (LIA) S.
We implemented the above technique as a lightweight library with Z3 as the underlying SMT
solver. This prototype is available for download at https://github.com/sllam/pysetcomp.
Given an SC (Th) formula, where Th is some theory that the Z3 SMT solver supports, our
Python library implements the encoding operation VSWform = F V ; C. Formulas F and C
are native Z3 formulas, hence M j=U +Th F ^ C is implemented by simply passing F ^ C to the
Z3 satis ability checking interfaces. Our prototype also includes a simple combinator library
that provides the programmer with a convenient way to write SC (Th) formulas.</p>
        <p>The code that implement our example in Section 1 is as follows:
1 ## Initialize sorts and variables
2 I = z3.IntSort()
3 IntSet = mkSetSort( I )
4 x = z3.Int(`x')
5 X = z3.Const(`X', IntSet)
76 ## f10; 20; 30g =: f x 10 j x &lt; 4 gx 2_ X
8 S = VSet(I,10,20,30) |Eq| Compre(x*10,x&lt;4,x,X)
9
10 ## VSWform = F V ; C
11 F, C, V = transForm( S )
12
13 ## M j=U+Th F ^ C
14 s = z3.Solver()
15 s.add(z3.And([F]+C))
16 print s.check() -- Prints `sat'</p>
        <p>Z3's built-in operations are explicitly prepended by z3. Lines 2{5 initializes the Z3 sorts
(data domains) and variables that we need in this example: I abbreviates the Z3 integer sort,
mkSetSort(I) returns a representation of the sort of sets of integer. In line 4, a Z3 integer
variable x is declared, while in line 5 a Z3 variable of the sort of sets of integer is declared.
Line 8 implements the actual SC (LIA) formula: VSet(I,10,20,30) builds the extensional set
f10,20,30g, while Compre(x*10,x&lt;4,x,X) builds the set c:omprehension fx 10 j x &lt; 4gx2_ X .
The in x operator Eq corresponds to the equality predicate =. Line 11 implements the encoding
of S into U +LIA formulas F and C, with the set of local variables V created by the encoding
procedure. Finally, lines 14{15 implements the satis ability test by feeding F ^ C to the Z3
satis ability checker.</p>
        <p>Our prototype works on more than just integers. The following example involves tuples of
integers.</p>
        <p>X =: fh1; 8i; h2; 5i; h3; 2i; h3; 4i; h4; 8ig ^ Y =: fx 10 | x
3ghx;yi2_ X ^ x 10 2_ Y ^ hx ; yi 2_ X
The corresponding code in our implementation is as follows:
1 IntPair = mkTupleSort( I, I )
2 tup = IntPair.tup
3 pi1 = IntPair.pi1
4
5 PairSet = mkSetSort( IntPair )
6 IntSet = mkSetSort( I )
7
8 X = z3.Const('X', PairSet)
In this example, the set comprehension Y maps a set containing pairs of integers, into a set of
integers, thus requiring our system to handle many-sorted encodings. Lines 1{3 de ne a new
tuple sort of pairs of integers IntPair, with tup as its data constructor and pi1 as a projection
operator for the left tuple argument. Lines 5{6 declare new set sorts: PairSet is the sort of
sets of integer pairs, while IntSet the sort of sets of integers. Lines 8{11 declare the variables
of the respective sorts, while lines 13{16 de ne the actual formula. The in x binary relation
In implements the membership relation 2_ . Encoding cs with the transForm operation and
feeding its output to the Z3 solver yields a satis able result, where a satis able instance of x
and y can be extracted.</p>
        <p>The implementation of the In operator is non-trivial and requires explanation. A call to
x |In| X creates an instance of a Z3 function, named mem, that is interpreted by our encoding
as the binary relation x 2_ X in Z3: this function maps the pair x and X to a Z3 Boolean value.
Note that the two instances of In at lines 15{16 are of di erent sort: At line 15 we have
an instance with the sort Int*IntSet, while at line 16 we have IntPair*PairSet. Since Z3
does not support parametric data types, our implementation must therefore map each instance
to a di erent Z3 function symbol (here mem IntSet and mem PairSet ) that are interpreted
appropriately by our Python library. In order to provide a convenient interface that hides this
mapping, we implement a lookup procedure inside the In operator that is akin to a simpli ed
run-time version of type dictionary passing during type checking of Haskell type classes.</p>
        <p>We have tested our implementation on a suite of set comprehension formulas of varying
complexity. While most are rather small, in the future we intend to provide empirical results
on more practical examples.
6</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we reduced the satis ability problem for formulas featuring comprehension and
other set operations over a standard theory (linear integer arithmetic in our examples) to solving
satis ability constraints over this same theory augmented with a single uninterpreted sort. This
technique allows the satis ability of set-based formulas to be veri ed by a wide range of o
-theshelf SMT solvers that support the base theory. We have implemented a lightweight Python
library that utilizes this encoding technique on the popular Z3 SMT solver. This implementation
generalizes the encoding described here to a broad range of data types supported by Z3, which
includes integer, real numbers, tuples, and nite lists.</p>
      <p>
        In the future, we are interested in expanding our results to comprehensions over multisets. A
commonly used representation for multisets relies on array maps, that associate each elements
in the support domain to its multiplicity [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. An adaptation of the technique presented in this
paper that simply swaps our encoding of sets with this representation of multisets does not
work however. To appreciate the added challenge, consider the task of verifying the following
formula about multiset comprehensions:
      </p>
      <p>M
=</p>
      <p>:</p>
      <p>X1 = *x%3 | &gt;+x2_ X2
where * + delimits multisets in the same way as f g delimited sets in the rest of the paper.
In the suggested encoding, the variables X1 and X2 would be implemented as array maps: for
every element x in the domain, X1[x] gives us the number of times that x occurs in X1 (its
multiplicity), and similarly for X2. A naive adaptation of our technique suggests the following
encoding for the above formula:
(8x; m: (X2[x] = m ^ m &gt; 0) ! X1[x%3] = m</p>
      <p>
        8z; m: (X1[z] = m ^ m &gt; 0) ! 9x:(z = x%3 ^ X2[x] = m)
Although it would be adequate for injective operations on x (e.g., x + 3), this encoding fails
for non-injective operations such as x%3. For instance, the formula M is satis ed for X1 =
*0; 0; 0; 2+ and X2 = *3; 3; 6; 8+, yet the above encoding yields an unsatis able formula. The
problem is that we have multiple elements in X2 that are mapped to the same element in X1.
Since 3%3 = 0 and X2[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] = 2, the above encoding entails that X1[0] = 2. However, 6%3 = 0
also, which means that, given that X2[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] = 1, we have that X1[0] = 1 too | an absurdity.
We would need to combine these results by adding the multiplicities of all x that map to the
same value in X1: doing so for all x in X2 such that x%3 = 0 in this example would yield the
expected result, X1[0] = 3. This has proved to be non-trivial in Z3. A possible approach to
reasoning about such arithmetic aggregate operations could be adapted from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>De Moura</surname>
          </string-name>
          and
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Bj rner. Z3: An E cient SMT Solver</article-title>
          .
          <source>In Proc. of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08</source>
          , pages
          <fpage>337</fpage>
          {
          <fpage>340</fpage>
          , Berlin, Heidelberg,
          <year>2008</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Joseph</surname>
            <given-names>Y.</given-names>
          </string-name>
          <string-name>
            <surname>Halpern</surname>
          </string-name>
          .
          <article-title>Presburger Arithmetic with Unary Predicates is 11 Complete</article-title>
          .
          <source>Journal of Symbolic Logic</source>
          ,
          <volume>56</volume>
          :56{
          <fpage>2</fpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Edmund</surname>
            <given-names>S. L.</given-names>
          </string-name>
          <string-name>
            <surname>Lam</surname>
            and
            <given-names>Iliano</given-names>
          </string-name>
          <string-name>
            <surname>Cervesato</surname>
          </string-name>
          .
          <article-title>Constraint Handling Rules with Multiset Comprehension Patterns</article-title>
          .
          <source>In 11th Workshop on Constraint Handling Rules</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K. R. M.</given-names>
            <surname>Leino</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Monahan</surname>
          </string-name>
          .
          <article-title>Reasoning about Comprehensions with First-Order SMT Solvers</article-title>
          . In
          <source>In Proc. of the 2009 ACM symposium on Applied Computing</source>
          , pages
          <volume>615</volume>
          {
          <fpage>622</fpage>
          . ACM,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Piskac</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kuncak</surname>
          </string-name>
          . MUNCH |
          <article-title>Automated Reasoner for Sets and Multisets</article-title>
          .
          <source>In IJCAR'10</source>
          , volume
          <volume>6173</volume>
          of Lecture Notes in Computer Science, pages
          <volume>149</volume>
          {
          <fpage>155</fpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P.</given-names>
            <surname>Suter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Steiger</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Kuncak</surname>
          </string-name>
          .
          <article-title>Sets with Cardinality Constraints in Satis ability Modulo Theories</article-title>
          . In Veri cation,
          <source>Model Checking, and Abstract Interpretation</source>
          , pages
          <volume>403</volume>
          {
          <fpage>418</fpage>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>