An SMT Theory for n-Indexed Sequences Hichem Rami Ait El Hara1,2 , FranΓ§ois Bobot2 and Guillaume Bury1 1 OCamlPro, Paris, France 2 UniversitΓ© Paris-Saclay, CEA, List, F-91120, Palaiseau, France Abstract The SMT (Satisfiability Modulo Theories) theory of arrays is well-established and widely used, with various decision procedures and extensions developed for it. However, recent works suggest that developing tailored reasoning for some theories, such as sequences and strings, is more efficient than reasoning over them through axiomatization over the theory of arrays. In this paper, we are interested in reasoning over n-indexed sequences as they are found in some programming languages, such as Ada. We propose an SMT theory of n-indexed sequences and explore different ways to represent and reason over n-indexed sequences using existing theories, as well as tailored calculi for the theory. 1. Introduction In the SMT theory of sequences, sequences are viewed as a generalization of strings to non-character elements, with possibly infinite alphabets. Sequences are dynamically sized, and their theory has a rich signature. It allows selecting elements of a sequence by their index, concatenating two sequences, extracting sub-sequences, and performing other operations. The expressiveness of the theory of sequences makes it easier to represent various commonly found data structures in programming languages, such as arrays in the C language, lists in Python, etc. The theory of arrays is less expressive as it only supports selecting and storing one value at one index at a time, and arrays have fixed sizes determined by the number of inhabitants of the sort of indices. In contrast, sequences have dynamic lengths and operations allowing the selection and updating of sets of indices at a time. To use the theory of arrays to represent sequences, one would need to extend it and axiomatize the necessary properties, such as dynamic length and additional operations like concatenation and extraction. We are interested in a variant of the theory of sequences, which we call the theory of n-indexed sequences. They differ from sequences mainly in their indexing, as they are not necessarily 0-indexed but n-indexed, as their name suggests. This means they are defined as ordered collections of values of the same sort indexed from a first index 𝑛 to a last index π‘š. Such sequences are present in some programming languages like Ada. Since there is no dedicated theory for such sequences, reasoning over them cannot be done straightforwardly using the existing theories of arrays and sequences. It is therefore necessary to use extensions and axiomatizations to reason over them. In this paper, we will present the theory of n-indexed sequences, its signature and semantics, as well as different ways to reason over it using existing theories and by adapting calculi from the theory of sequences to the theory of n-indexed sequences. Related work: The SMT theory of sequences was introduced by Bjorner et al. [1]. Several contributions explored this theory, its syntax and semantics [2], and its decidability [3, 4]. Our theory of n-indexed sequences and the calculi we developed are based on the contribution by Sheng et al. [5], which in turn is based on reasoning about the theories of strings [6, 7] and arrays [8]. SMT 2024: 22nd International Workshop on Satisfiability Modulo Theories $ hra687261@gmail.com (H. R. Ait El Hara); francois.bobot@ocamlpro.com (F. Bobot); gbury@gmail.com (G. Bury) Β€ https://hra687261.github.io/ (H. R. Ait El Hara); https://gbury.eu/ (G. Bury)  0000-0001-7909-0413 (H. R. Ait El Hara); 0000-0002-6756-0788 (F. Bobot); 0009-0002-1267-251X (G. Bury) Β© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings Other contributions have extended the theory of arrays with properties that are present in sequences, such as length [9, 10, 11] and a concatenation function [12]. 2. Notation We refer to the theory of n-indexed sequences as the theory of n-sequences or the NSeq theory, and n-indexed sequence terms will be referred to as n-sequences, n-sequence terms, or NSeq terms. Their sort will be denoted as NSeq E, where E is the sort of the elements of the n-sequence. We will refer to the theory of sequences as the Seq theory. Int is the sort of integers from the theory of Linear Integer Arithmetic. min and max are the usual mathematical functions. ite is the ite SMT-LIB function; it takes a boolean expression and two expressions of the same sort and returns the first one if the boolean expression is true and the second otherwise. The let π‘₯ = 𝑣, 𝑦 symbol binds a variable π‘₯ to a value 𝑣 in a term 𝑦. In the remainder of the paper, we will use 𝑠, 𝑠𝑛 , 𝑀, 𝑀𝑛 , 𝑦1 , 𝑦2 , 𝑧1 , and 𝑧2 to represent NSeq terms, with 𝑛 being a positive integer. π‘˜, π‘˜1 , π‘˜2 , and π‘˜3 will represent fresh NSeq term variables. 𝑖 and 𝑗 will be used as integer index terms, and 𝑒 and 𝑣 will be used as NSeq element terms. We assume that all the terms we use are well-sorted. 3. The Theory of n-Indexed Sequences Table 1 The signature of the theory of n-indexed sequences SMT-LIB symbol Sort nseq.get NSeq E β†’ Int β†’ E nseq.set NSeq E β†’ Int β†’ E β†’ NSeq E nseq.first NSeq E β†’ Int nseq.last NSeq E β†’ Int nseq.const Int β†’ Int β†’ E β†’ NSeq E nseq.relocate NSeq E β†’ Int β†’ NSeq E nseq.concat NSeq E β†’ NSeq E β†’ NSeq E nseq.slice NSeq E β†’ Int β†’ Int β†’ NSeq E nseq.update NSeq E β†’ NSeq E β†’ NSeq E We present in this section the theory of n-indexed sequences. The signature of the NSeq theory is presented in the table 1. In the remainder of the paper, when referring to the symbols of the theory, the prefix "nseq." of the symbols will be omitted. The following list describes the semantics of each symbol of the theory: β€’ f 𝑠 : the first index of 𝑠. β€’ l𝑠 : the last index of 𝑠. β€’ get(𝑠, 𝑖): if the index 𝑖 is within the bounds of 𝑠, returns the element associated with 𝑖 in 𝑠; otherwise, returns an uninterpreted value. An uninterpreted value is one which is not constrained and can be any value of the right sort. β€’ set(𝑠, 𝑖, 𝑣): if 𝑖 is within the bounds of 𝑠, creates a new n-sequence in which 𝑖 is associated with 𝑣, and the other indices within the bounds are associated with the same values as the corresponding indices in 𝑠; otherwise, returns 𝑠. β€’ const(𝑓, 𝑙, 𝑣): creates an n-sequence with 𝑓 as the first index and 𝑙 as the last, and if it is not empty, all the indices within its bounds are associated with the value 𝑣. β€’ relocate(π‘Ž, 𝑓 ): given an n-sequence π‘Ž and an index 𝑓 , returns a new n-sequence 𝑏 which has 𝑓 as its first index and (𝑓 + lπ‘Ž βˆ’ f π‘Ž ) as its last index, and associates with each index 𝑖 within the bounds of 𝑏 the same value associated with the index (𝑖 βˆ’ f 𝑏 + f π‘Ž ) in π‘Ž. β€’ concat(π‘Ž, 𝑏): if π‘Ž is empty, returns 𝑏; if 𝑏 is empty, returns π‘Ž; if f 𝑏 = lπ‘Ž +1, returns a new n- sequence in which the first index is f π‘Ž , the last index is l𝑏 , and all the indices within the bounds of π‘Ž are associated with the same values as the corresponding indices in π‘Ž, as well as the indices within the bounds of 𝑏 are associated with the same values as the corresponding indices in 𝑏; if f 𝑏 ΜΈ= lπ‘Ž +1, returns π‘Ž. β€’ slice(π‘Ž, 𝑓, 𝑙): if f π‘Ž ≀ 𝑓 ≀ 𝑙 ≀ lπ‘Ž , returns a new n-sequence for which the first index is 𝑓 , the last index is 𝑙, and all the indices between 𝑓 and 𝑙 are associated with the same values as the corresponding indices in π‘Ž; otherwise, returns π‘Ž. β€’ update(π‘Ž, 𝑏): if π‘Ž is empty, or 𝑏 is empty, or the property f π‘Ž ≀ f 𝑏 ≀ l𝑏 ≀ lπ‘Ž doesn’t hold, returns π‘Ž; otherwise, returns a new n-sequence 𝑐 such that f 𝑐 = f π‘Ž and l𝑐 = lπ‘Ž , and for all the indices within the bounds of 𝑐, if they are within the bounds of 𝑏, then they are associated with the same values as they are in 𝑏, otherwise they are associated with the values to which they are associated in π‘Ž. Definition 1 (Bounds). The bounds of an n-sequence 𝑠 are its first and last index, which are respectively denoted as f 𝑠 and l𝑠 , and which correspond to the values returned by the functions nseq.first(𝑠) and nseq.last(𝑠) respectively. An index 𝑖 is said to be within the bounds of an n-sequence 𝑠 if: f 𝑠 ≀ 𝑖 ≀ l𝑠 Definition 2 (Extensionality). The theory of n-indexed sequences is extensional, which means that n-sequences that contain the same elements are equal. Therefore, given two n-sequences π‘Ž and 𝑏: f π‘Ž = f 𝑏 ∧ lπ‘Ž = l𝑏 ∧ (βˆ€π‘– : Int, f π‘Ž ≀ 𝑖 ≀ lπ‘Ž β†’ get(π‘Ž, 𝑖) = get(𝑏, 𝑖)) β†’π‘Ž=𝑏 Definition 3 (Empty n-sequence). An n-sequence 𝑠 is said to be empty if l𝑠 < f 𝑠 . Two empty n- sequences π‘Ž and 𝑏 are equal if f π‘Ž = f 𝑏 and lπ‘Ž = l𝑏 ; otherwise, they are distinct. 4. Reasoning with existing theories One way to reason over the NSeq theory is by using the theory of arrays. It is done by extending it with the symbols of the NSeq theory and adding the right axioms that follow the semantics of the corresponding symbols in the NSeq theory. Another way is to use the theory of sequences and the theory of algebraic data types. It consists in defining n-sequences as a pair of a sequence and the first index (the offset to zero): (declare-datatype NSeq (par (T) ((nseq.mk (nseq.first Int) (nseq.seq (Seq T)))))) The other symbols of the NSeq theory can also be defined using the NSeq data type defined above, for example: (define-fun nseq.last (par (T) ((s (NSeq T))) Int (+ (- (seq.len (nseq.seq s)) 1) (nseq.first s)))) (define-fun nseq.get (par (T) ((s (NSeq T)) (i Int)) T (seq.nth (nseq.seq s) (- i (nseq.first s))))) (define-fun nseq.set (par (T) ((s (NSeq T)) (i Int) (v T)) (NSeq T) (nseq.mk (nseq.first s) (seq.update (nseq.seq s) (- i (nseq.first s)) (seq.unit v))))) Except for the const function which needs to be axiomatized: (declare-fun nseq.const (par (T) (Int Int T) (NSeq T))) ;; "nseq_const" (assert (par (T) (forall ((f Int) (l Int) (v T)) (! (let ((s (nseq.const f l v))) (and (= (nseq.first s) f) (= (nseq.last s) l) (forall ((i Int)) (=> (and (<= f i) (<= i l)) (= (nseq.get s i) v))))) :pattern ((nseq.const f l v)))))) The full NSeq theory, defined using the Seq theory and Algebraic Data Types, is attached in Appendix A.1. Although this approach allows us to reason over n-indexed sequences, it is not ideal to depend on two theories to do so. Additionally, the differences in semantics between the update and 𝑠𝑙𝑖𝑐𝑒 functions of the NSeq theory, and the seq.update and seq.extract functions of the Seq theory, make the definitions relatively complex. 5. Porting Calculi from the Seq Theory to the NSeq Theory To develop our calculi over the NSeq theory, we based our work on the calculi developed by Sheng et al. [5] on the Seq theory, where two calculi were proposed. The first is called the BASE calculus, based on a string theory calculus that reduces functions selecting and storing one element at an index to concatenations of sequences. The second is called the EXT calculus, and it handles these functions using array-like reasoning. Our versions of these calculi are referred to as NS-BASE and NS-EXT, respectively. The NSeq theory differs from the Seq theory in both syntax and semantics of many symbols: β€’ const and relocate do not appear in the Seq theory, while seq.empty, seq.unit, and seq.len do not appear in the NSeq theory. β€’ The seq.nth function corresponds to the get function in the NSeq theory. β€’ seq.update from the Seq theory with a value as a third argument corresponds to set in the NSeq theory, while seq.update with a sequence as a third argument corresponds to update in the NSeq theory, which takes only two NSeq terms as arguments. β€’ seq.extract in the Seq theory takes a sequence, an offset, and a length, and corresponds to slice in the NSeq theory, which takes an n-sequence, a first index, and a last index. β€’ The concatenation function (seq.++) in Seq is n-ary, and it corresponds to concat in the NSeq theory, which is binary. Therefore, we needed to make substantial changes to the Seq theory calculi to adapt them to the NSeq theory. In this section, we present the resulting calculi. We assume that we are in a theory combination framework where reasoning over the LIA (Linear Integer Arithmetic) theory is supported, and where unsatisfiability in one of the theories implies unsatisfiability of the entire reasoning. We will only present the rules that handle the symbols of the NSeq theory. 5.1. Common calculus Definition 4 (Equivalence modulo relocation). Given two NSeq terms 𝑠1 and 𝑠2 , the terms are said to be equivalent modulo relocation, denoted with the equivalence relation 𝑠1 =π‘Ÿπ‘’π‘™π‘œπ‘ 𝑠2 , such that: 𝑠1 =π‘Ÿπ‘’π‘™π‘œπ‘ 𝑠2 ≑ l𝑠2 = l𝑠1 βˆ’ f 𝑠1 + f 𝑠2 βˆ§βˆ€π‘– : 𝐼𝑛𝑑, f 𝑠1 ≀ 𝑖 ≀ l𝑠1 β‡’ 𝑔𝑒𝑑(𝑠1 , 𝑖) = 𝑔𝑒𝑑(𝑠2 , 𝑖 βˆ’ f 𝑠1 + f 𝑠2 ) Equivalence modulo relocation represents equivalence between n-sequences relative to their starting index. Two n-sequences are equivalent modulo relocation if they have the same set of elements in the same order, but start at different indices. Definition 5 (NSeq term normal form). For simplicity and consistency with Seq theory calculi, we introduce the concatenation operator :: with the invariant: 𝑠 = 𝑠1 :: 𝑠2 =β‡’ f 𝑠 = f 𝑠1 ∧ l𝑠 = l𝑠2 ∧ f 𝑠2 = l𝑠1 +1 This operator is used to normalize NSeq terms. It differs from concat by not having to check the condition that f 𝑠2 = l𝑠1 +1 before concatenation, as it is ensured by the invariant. Assumption 1. We assume that the following simplification rewrites are applied whenever possible: 𝑠1 :: 𝑠2 β†’ 𝑠1 when l𝑠2 < f 𝑠2 (1) 𝑠1 :: 𝑠2 β†’ 𝑠2 when l𝑠1 < f 𝑠1 (2) 𝑠1 :: 𝑠2 β†’ 𝑠1 :: 𝑀1 :: ... :: 𝑀𝑛 when 𝑠2 = 𝑀1 :: ... :: 𝑀𝑛 (3) 𝑠1 :: 𝑠2 β†’ 𝑀1 :: ... :: 𝑀𝑛 :: 𝑠2 when 𝑠1 = 𝑀1 :: ... :: 𝑀𝑛 (4) (1) and (2) remove empty NSeq terms from normal form. (3) and (4) make sure that when an NSeq term appear in the normal of another NSeq term and has its own normal form, then it is replaced by its normal form. Figure 1 illustrates a set of common rules shared between the two calculi NS-BASE and NS-EXT. The rules Const-Bounds and Reloc-Bounds propagate the bounds of constant and relocated n-sequences, which are created using the const and relocate functions, respectively. The rules NS-Slice, NS-Concat, and NS-Update handle slice, concat, and update by normalizing the NSeq terms under appropriate conditions. If an NSeq term has two normal forms where distinct terms begin at the same index but end at different indices, the NS-Split rule rewrites the longer term as a concatenation of the shorter one and a fresh variable. The NS-Comp-Reloc rule propagates concatenations over equivalence modulo relocation. 5.2. The base calculus The base calculus comprises the rules in figures 1 and 2. The rules R-Get and R-Set handle the 𝑔𝑒𝑑 and 𝑠𝑒𝑑 operations by introducing a new normal form for the NSeq terms they operate on. In the R-Get rule, when 𝑖 is within the bounds of 𝑠, a new normal form of 𝑠 is introduced. It includes a constant NSeq term of size one at the 𝑖th position storing the value 𝑣, and two variables, π‘˜1 and π‘˜2 , to represent the left and right segments of NSeq term 𝑠 respectively. The R-Set rule operates similarly: when 𝑖 is within the bounds of 𝑠2 , new normal forms are introduced for both 𝑠1 and 𝑠2 . These forms share two variables, π‘˜1 and π‘˜3 , representing segments on the left and right of the 𝑖th index. 𝑠1 has a constant NSeq term of size one holding the value 𝑣 at the 𝑖th index, while 𝑠2 introduces another variable, π‘˜2 , of the same size and position. 𝑠 = const(𝑓, 𝑙, 𝑣) 𝑠1 = π‘Ÿπ‘’π‘™π‘œπ‘π‘Žπ‘‘π‘’(𝑠2 , 𝑖) Const-Bounds Reloc-Bounds f 𝑠 = 𝑓 ∧ l𝑠 = 𝑙 𝑖 = f 𝑠2 βˆ§π‘ 1 = 𝑠2 || 𝑖 ΜΈ= f 𝑠2 ∧ f 𝑠1 = 𝑖 ∧ l𝑠1 = 𝑖 + l𝑠2 βˆ’ f 𝑠2 βˆ§π‘ 1 =π‘Ÿπ‘’π‘™π‘œπ‘ 𝑠2 𝑠1 = 𝑠𝑙𝑖𝑐𝑒(𝑠, 𝑓, 𝑙) NS-Slice (𝑓 < f 𝑠 βˆ¨π‘™ < 𝑓 ∨ l𝑠 < 𝑙) ∧ 𝑠1 = 𝑠 || f 𝑠 ≀ 𝑓 ≀ 𝑙 ≀ l𝑠 βˆ§π‘  = π‘˜1 :: 𝑠1 :: π‘˜2 𝑠 = π‘π‘œπ‘›π‘π‘Žπ‘‘(𝑠1 , 𝑠2 ) NS-Concat l𝑠1 < f 𝑠1 βˆ§π‘  = 𝑠2 || (l𝑠2 < f 𝑠2 ∨ l𝑠1 +1 ΜΈ= f 𝑠2 ) ∧ 𝑠 = 𝑠1 || f 𝑠1 ≀ l𝑠1 ∧ f 𝑠2 ≀ l𝑠2 ∧ f 𝑠2 = l𝑠1 +1 ∧ 𝑠 = 𝑠1 :: 𝑠2 𝑠1 = π‘’π‘π‘‘π‘Žπ‘‘π‘’(𝑠2 , 𝑠) NS-Update (l𝑠 < f 𝑠 βˆ¨π‘“π‘  < f 𝑠2 ∨ l𝑠2 < l𝑠 ) ∧ 𝑠1 = 𝑠2 || f 𝑠2 ≀ f 𝑠 ≀ l𝑠 ≀ l𝑠2 βˆ§π‘ 1 = π‘˜1 :: 𝑠 :: π‘˜3 ∧ 𝑠2 = π‘˜1 :: π‘˜2 :: π‘˜3 𝑠 = 𝑀 :: 𝑦1 :: 𝑧1 𝑠 = 𝑀 :: 𝑦2 :: 𝑧2 NS-Split l𝑦1 = l𝑦2 βˆ§π‘¦1 = 𝑦2 || l𝑦1 > l𝑦2 βˆ§π‘¦1 = 𝑦2 :: π‘˜ ∧ f π‘˜ = l𝑦2 +1 ∧ lπ‘˜ = l𝑦1 || l𝑦1 < l𝑦2 βˆ§π‘¦2 = 𝑦1 :: π‘˜ ∧ f π‘˜ = l𝑦1 +1 ∧ lπ‘˜ = l𝑦2 𝑠1 = π‘˜1 :: π‘˜2 :: ... :: π‘˜π‘› 𝑠1 =π‘Ÿπ‘’π‘™π‘œπ‘ 𝑠2 NS-Comp-Reloc f 𝑠1 = f 𝑠2 βˆ§π‘ 1 = 𝑠2 || 𝑠2 = π‘Ÿπ‘’π‘™π‘œπ‘π‘Žπ‘‘π‘’(π‘˜1 , f 𝑠2 ) :: π‘Ÿπ‘’π‘™π‘œπ‘π‘Žπ‘‘π‘’(π‘˜2 , f π‘˜2 βˆ’ f 𝑠1 + f 𝑠2 ) :: ... :: π‘Ÿπ‘’π‘™π‘œπ‘π‘Žπ‘‘π‘’(π‘˜π‘› , f π‘˜π‘› βˆ’ f 𝑠1 + f 𝑠2 ) Figure 1: Common inference rules for the NS-BASE and NS-EXT calculi 𝑣 = 𝑔𝑒𝑑(𝑠, 𝑖) R-Get 𝑖 < f 𝑠 ∨ l𝑠 < 𝑖 || f 𝑠 ≀ 𝑖 ≀ l𝑠 βˆ§π‘  = π‘˜1 :: π‘π‘œπ‘›π‘ π‘‘(𝑖, 𝑖, 𝑣) :: π‘˜2 𝑠1 = 𝑠𝑒𝑑(𝑠2 , 𝑖, 𝑣) R-Set (𝑖 < f 𝑠2 ∨ l𝑠2 < 𝑖) ∧ 𝑠1 = 𝑠2 || f 𝑠2 ≀ 𝑖 ≀ l𝑠2 ∧ f 𝑠1 = f 𝑠2 ∧ l𝑠1 = l𝑠2 𝑠1 = π‘˜1 :: π‘π‘œπ‘›π‘ π‘‘(𝑖, 𝑖, 𝑣) :: π‘˜3 ∧ 𝑠2 = π‘˜1 :: π‘˜2 :: π‘˜3 Figure 2: NS-BASE specific inference rules 5.3. The extended calculus The extended calculus consists of the rules in figures 1 and 3. It differs from the base calculus by handling the get and set functions similarly to how they are treated in the array decision procedure described in [8]. The Get-Intro rule introduces a get operation from a set operation. The Get-Set operation is equivalent to what is commonly referred to as the read-over-write or select-over-store rule in the Array theory, allowing the application of a get operation over a set operation. The Set-Bound rule ensures that a set operation is performed within the bounds of the target NSeq term, or that the resulting NSeq term is equivalent to the one it was applied on. The Get-Concat, Set-Concat, and Set-Concat-Inv rules illustrate how get and set operations are handled when applied to an NSeq term in normal form, where 𝑣 = 𝑔𝑒𝑑(𝑠, 𝑖) 𝑠 = 𝑀1 :: ... :: 𝑀𝑛 Get-Concat 𝑖 < f 𝑠 ∨ l𝑠 < 𝑖 || f 𝑀1 ≀ 𝑖 ≀ l𝑀1 βˆ§π‘”π‘’π‘‘(𝑀1 , 𝑖) = 𝑣 || ... || f 𝑀𝑛 ≀ 𝑖 ≀ l𝑀𝑛 βˆ§π‘”π‘’π‘‘(𝑀𝑛 , 𝑖) = 𝑣 𝑠1 = 𝑠𝑒𝑑(𝑠2 , 𝑖, 𝑣) 𝑠2 = 𝑀1 :: ... :: 𝑀𝑛 Set-Concat 𝑖 < f 𝑠2 ∨ l𝑠2 < 𝑖 || 𝑠1 = π‘˜1 :: ... :: π‘˜π‘› ∧ f 𝑀1 ≀ 𝑖 ≀ l𝑀1 βˆ§π‘˜1 = 𝑠𝑒𝑑(𝑀1 , 𝑖, 𝑣)∧ f π‘˜1 = f 𝑀1 ∧ lπ‘˜1 = l𝑀1 ∧... ∧ f π‘˜π‘› = f 𝑀𝑛 ∧ lπ‘˜π‘› = l𝑀𝑛 || ... || 𝑠1 = π‘˜1 :: ... :: π‘˜π‘› ∧ f 𝑀𝑛 ≀ 𝑖 ≀ l𝑀𝑛 βˆ§π‘˜π‘› = 𝑠𝑒𝑑(𝑀𝑛 , 𝑖, 𝑣)∧ f π‘˜1 = f 𝑀1 ∧ lπ‘˜1 = l𝑀1 ∧... ∧ f π‘˜π‘› = f 𝑀𝑛 ∧ lπ‘˜π‘› = l𝑀𝑛 𝑠1 = 𝑠𝑒𝑑(𝑠2 , 𝑖, 𝑣) 𝑠1 = 𝑀1 :: ... :: 𝑀𝑛 Set-Concat-Inv 𝑖 < f 𝑠2 ∨ l𝑠2 < 𝑖 || 𝑠2 = π‘˜1 :: ... :: π‘˜π‘› ∧ f 𝑀1 ≀ 𝑖 ≀ l𝑀1 βˆ§π‘€1 = 𝑠𝑒𝑑(π‘˜1 , 𝑖, 𝑣)∧ f π‘˜1 = f 𝑀1 ∧ lπ‘˜1 = l𝑀1 ∧... ∧ f π‘˜π‘› = f 𝑀𝑛 ∧ lπ‘˜π‘› = l𝑀𝑛 || ... || 𝑠2 = π‘˜1 :: ... :: π‘˜π‘› ∧ f 𝑀𝑛 ≀ 𝑖 ≀ l𝑀𝑛 βˆ§π‘€π‘› = 𝑠𝑒𝑑(π‘˜π‘› , 𝑖, 𝑣)∧ f π‘˜1 = f 𝑀1 ∧ lπ‘˜1 = l𝑀1 ∧... ∧ f π‘˜π‘› = f 𝑀𝑛 ∧ lπ‘˜π‘› = l𝑀𝑛 𝑠 = π‘π‘œπ‘›π‘ π‘‘(𝑓, 𝑙, 𝑣) 𝑒 = 𝑔𝑒𝑑(𝑠, 𝑖) Get-Const 𝑖 < f 𝑠 ∨ l𝑠 < 𝑖 || f 𝑠 ≀ 𝑖 ≀ l𝑠 βˆ§π‘’ = 𝑣 𝑠1 = 𝑠𝑒𝑑(𝑠2 , 𝑖, 𝑣) Get-Intro 𝑖 < f 𝑠1 ∨ l𝑠1 < 𝑖 || f 𝑠1 ≀ 𝑖 ≀ l𝑠1 βˆ§π‘£ = 𝑔𝑒𝑑(𝑠1 , 𝑖) 𝑠1 = 𝑠𝑒𝑑(𝑠2 , 𝑖, 𝑣) 𝑒 = 𝑔𝑒𝑑(𝑠1 , 𝑗) Get-Set 𝑖 < f 𝑠1 ∨ l𝑠1 < 𝑖 || 𝑖 = 𝑗 ∧ f 𝑠1 ≀ 𝑖 ≀ l𝑠1 βˆ§π‘’ = 𝑣 || 𝑖 ΜΈ= 𝑗 ∧ f 𝑠1 ≀ 𝑖 ≀ l𝑠1 βˆ§π‘’ = 𝑔𝑒𝑑(𝑠2 , 𝑖) 𝑠1 = 𝑠𝑒𝑑(𝑠2 , 𝑖, 𝑣) Set-Bound 𝑠1 = 𝑠2 || f 𝑠1 ≀ 𝑖 ≀ l𝑠1 βˆ§π‘”π‘’π‘‘(𝑠2 , 𝑖) ΜΈ= 𝑣 𝑣 = 𝑔𝑒𝑑(𝑠1 , 𝑖) 𝑠1 =π‘Ÿπ‘’π‘™π‘œπ‘ 𝑠2 Get-Reloc 𝑖 < 𝑓𝑠 ∨ 𝑙𝑠 < 𝑖 || 𝑓𝑠 ≀ 𝑖 ≀ 𝑙𝑠 ∧ 𝑣 = 𝑔𝑒𝑑(𝑠2 , 𝑖 βˆ’ f 𝑠1 βˆ’ f 𝑠2 ) Figure 3: NS-EXT specific inference rules the operations affect the right component of the concatenation within its bounds. The Get-Const rule addresses the special case where a get operation is applied to a constant NSeq term. The Get-Reloc rule facilitates the propagation of constraints on index-associated values in NSeq terms from one term to others that are equivalent modulo relocation. 6. Implementation We have implemented a prototype of the described calculi in the Colibri2 CP (Constraint Programming) solver. In this section, we discuss some of the implementation choices we made. The rewriting rules described in Assumption 1 are applied whenever applicable using a callback system. When the conditions are satisfied, the corresponding rewriting rule is triggered. Equivalence modulo relocation is managed using a disjoint-set (union-find) data structure. In this data structure, the elements of the sets are NSeq terms, and the equivalence relation is defined by =π‘Ÿπ‘’π‘™π‘œπ‘ as previously specified. By definition, if two elements of an equivalence class are at the same relocation Figure 4: Number of solved goals by accumulated time in seconds on quantifier-free Seq benchmarks translated from the QF_AX SMT-LIB benchmarks offset from the representative, they are equal. The data structure maintains, for each equivalence class, a mapping from offset to an element of the class that is at that specific relocation offset from the representative. This facilitates efficient detection of such equalities with minimal overhead. 7. Experimental Results In this section, we present experimental results of the calculi described in the previous section. Currently, our experiments have focused exclusively on quantifier-free benchmarks that utilize only the theory of sequences and the theory of uninterpreted functions. These benchmarks constitute a subset of those employed in the paper [5], originally translated into the Seq theory from the QF_AX SMT-LIB benchmarks. To achieve this, we implemented support for the Seq theory in our solver by translating Seq terms into NSeq terms. The translation process is as follows: β€’ Seq terms: NSeq terms for which the first index is 0 and the last index is greater or equal than βˆ’1. β€’ seq.empty: an NSeq term of the same sort, in which the first index is 0 and the last is βˆ’1, denoted πœ–. β€’ seq.unit(𝑣): const(0, 0, 𝑣) β€’ seq.len(𝑠): l𝑠 βˆ’ f 𝑠 +1 β€’ seq.nth(𝑠, 𝑖): get(𝑠, 𝑖) β€’ seq.update(𝑠1 , 𝑖, 𝑠2 ): let(π‘Ÿ, relocate(𝑠2 , 𝑖), ite(f 𝑠1 ≀ 𝑖 ≀ l𝑠1 ∧ l𝑠1 < lπ‘Ÿ , update(𝑠1 , slice(π‘Ÿ, 𝑖, l𝑠1 )), update(𝑠1 , π‘Ÿ))) β€’ seq.extract(𝑠, 𝑖, 𝑗): ite(𝑖 < f 𝑠 ∨ l𝑠 < 𝑖 ∨ 𝑗 ≀ 0, πœ–, slice(𝑠, 𝑖, min(l𝑠 , 𝑖 + 𝑗 βˆ’ 1))) β€’ seq.++(𝑠1 , 𝑠2 , 𝑠3 , ..., 𝑠𝑛 ): let(𝑐1 , concat(𝑠1 , relocate(𝑠2 , l𝑠1 +1)), let(𝑐2 , concat(𝑐1 , relocate(𝑠3 , l𝑐1 +1)), ... concat(π‘π‘›βˆ’2 , relocate(𝑠𝑛 , lπ‘π‘›βˆ’2 +1)))) The figure 4 depicts the number of satisfiable and unsatisfiable goals solved over accumulated time using our prototype implementation and the cvc5 SMT solver with different command-line options. NS- BASE and NS-EXT refer to our implementations1 , described in section 5, which can be used by running Colibri2 with the command-line options --nseq-base and --nseq-ext, respectively. We compare our implementation with the Seq theory implementation in cvc5 (version 1.1.1). In the graphs in figure 4, cvc5 corresponds to running cvc5 with the command-line option --strings-exp, necessary for using cvc5’s solver for the Seq theory. cvc5-eager uses the same option with --seq-arrays=eager, and cvc5-lazy with --seq-arrays=lazy, these options indicate different strategies for using an array-inspired solver for the Seq theory. Examining the graph on the right, which shows performance on unsatisfiable goals, we observe that our NS-EXT implementation outperforms cvc5 in both time and number of goals solved. Meanwhile, NS-BASE initially solves more goals than cvc5, but solves fewer overall. Additionally, cvc5-eager and cvc5-lazy solve more goals in less time compared to the others. The same trends apply to the satisfiable case, with the exception that cvc5 also surpasses NS-EXT in both time and number of goals solved once the 20-second threshold is reached. In our context, focused on program verification, the performance on unsatisfiable goals holds greater significance, though the satisfiable case remains useful. Since Colibri2 constructs concrete counterex- amples before concluding satisfiability, we aim to enhance our current model generation technique for n-sequences. In the unsatisfiable case, while we compete closely with the state-of-the-art SMT solver cvc5, we have observed that some goals unsolved within a short timeout (5 seconds) also remain unsolved with longer timeouts, suggesting potential performance limitations in our propagators for the NSeq theory. It’s also notable that our translation from Seq to NSeq in Colibri2 introduces more complex terms, and Colibri2 lacks clause learning, making decisions costlier than in other SMT solvers. 8. Conclusion In this paper, we explored the topic of n-indexed sequences in SMT. We proposed a theory for such sequences and discussed approaches for reasoning over it, whether by using existing theories or by adapting calculi from the theory of sequences to this theory. Looking ahead, our future work will delve deeper into different reasoning approaches for this theory, exploring their respective strengths and weaknesses through benchmarking with n-indexed sequences. We aim to prove the correctness of our developed calculi and explore alternative methods for reasoning over n-sequences beyond traditional sequence or string reasoning. Moreover, we seek to identify additional applications for this theory beyond programming languages where n-indexed sequences are present. References [1] N. BjΓΈrner, V. Ganesh, R. Michel, M. Veanes, An SMT-LIB Format for Sequences and Regular Expressions, Strings (2012). [2] H. R. Ait El Hara, F. Bobot, G. Bury, On SMT Theory Design: The Case of Sequences, in: Kalpa Publications in Computing, volume 18, EasyChair, 2024, pp. 14–29. URL: https://easychair.org/ publications/paper/qdvJ. doi:10.29007/75tl, iSSN: 2515-1762. [3] C. A. Furia, What’s Decidable about Sequences?, in: A. Bouajjani, W.-N. Chin (Eds.), Automated Technology for Verification and Analysis, Springer, Berlin, Heidelberg, 2010, pp. 128–142. doi:10. 1007/978-3-642-15643-4_11. [4] A. JeΕΌ, A. W. Lin, O. Markgraf, P. RΓΌmmer, Decision Procedures for Sequence Theories, in: C. Enea, A. Lal (Eds.), Computer Aided Verification, Lecture Notes in Computer Science, Springer Nature Switzerland, Cham, 2023, pp. 18–40. doi:10.1007/978-3-031-37703-7_2. 1 Available at: https://git.frama-c.com/pub/colibrics/-/tree/smt2024 (commit SHA: 43024e674ef26673d2495f3b186954fa37bc3890) [5] Y. Sheng, A. NΓΆtzli, A. Reynolds, Y. Zohar, D. Dill, W. Grieskamp, J. Park, S. Qadeer, C. Barrett, C. Tinelli, Reasoning About Vectors: Satisfiability Modulo a Theory of Sequences, Journal of Automated Reasoning 67 (2023) 32. URL: https://doi.org/10.1007/s10817-023-09682-2. doi:10. 1007/s10817-023-09682-2. [6] T. Liang, A. Reynolds, C. Tinelli, C. Barrett, M. Deters, A DPLL(T) Theory Solver for a Theory of Strings and Regular Expressions, volume 8559, Springer International Publishing, Cham, 2014, pp. 646–662. URL: http://link.springer.com/10.1007/978-3-319-08867-9_43. doi:10.1007/ 978-3-319-08867-9_43, book Title: Computer Aided Verification Series Title: Lecture Notes in Computer Science. [7] M. Berzish, V. Ganesh, Y. Zheng, Z3str3: a string solver with theory-aware heuristics, in: Proceedings of the 17th Conference on Formal Methods in Computer-Aided Design, FMCAD ’17, FMCAD Inc, Austin, Texas, 2017, pp. 55–59. [8] J. Christ, J. Hoenicke, Weakly Equivalent Arrays, in: C. Lutz, S. Ranise (Eds.), Frontiers of Combining Systems, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2015, pp. 119–134. doi:10.1007/978-3-319-24246-0_8. [9] M. P. Bonacina, S. Graham-Lengrand, N. Shankar, CDSAT for Nondisjoint Theories with Shared Predicates: Arrays With Abstract Length, Satisfiability Modulo Theo- ries workshop, CEUR Workshop Proceedings 3185 (2022). URL: https://par.nsf.gov/biblio/ 10358980-cdsat-nondisjoint-theories-shared-predicates-arrays-abstract-length. [10] S. Ghilardi, A. Gianola, D. Kapur, C. Naso, Interpolation Results for Arrays with Length and MaxDiff, ACM Transactions on Computational Logic 24 (2023) 28:1–28:33. URL: https://doi.org/10. 1145/3587161. doi:10.1145/3587161. [11] A. R. Bradley, Z. Manna, H. B. Sipma, What’s Decidable About Arrays?, in: E. A. Emerson, K. S. Namjoshi (Eds.), Verification, Model Checking, and Abstract Interpretation, Springer, Berlin, Heidelberg, 2006, pp. 427–442. doi:10.1007/11609773_28. [12] Q. Wang, A. W. Appel, A Solver for Arrays with Concatenation, Journal of Automated Reasoning 67 (2023) 4. URL: https://doi.org/10.1007/s10817-022-09654-y. doi:10.1007/s10817-022-09654-y. A. Appendix A.1. Representation of n-Indexed Sequences using Sequences and Algebraic Data Types (declare-datatypes ((NSeq 1)) ((par (T) ((nseq.mk (nseq.first Int) (nseq.seq (Seq T))))))) (define-fun nseq.last (par (T) ((s (NSeq T))) Int (+ (- (seq.len (nseq.seq s)) 1) (nseq.first s)))) (define-fun nseq.get (par (T) ((s (NSeq T)) (i Int)) T (seq.nth (nseq.seq s) (- i (nseq.first s))))) (define-fun nseq.set (par (T) ((s (NSeq T)) (i Int) (v T)) (NSeq T) (nseq.mk (nseq.first s) (seq.update (nseq.seq s) (- i (nseq.first s)) (seq.unit v))))) (declare-fun nseq.const (par (T) (Int Int T) (NSeq T))) ;; "nseq_const" (assert (par (T) (forall ((f Int) (l Int) (v T)) (! (let ((s (nseq.const f l v))) (and (= (nseq.first s) f) (= (nseq.last s) l) (forall ((i Int)) (=> (and (<= f i) (<= i l)) (= (nseq.get s i) v))))) :pattern ((nseq.const f l v)))))) (define-fun nseq.relocate (par (T) ((s (NSeq T)) (f Int)) (NSeq T) (nseq.mk f (nseq.seq s)))) (define-fun nseq.concat (par (T) ((s1 (NSeq T)) (s2 (NSeq T))) (NSeq T) (ite (< (nseq.last s1) (nseq.first s1)) s2 (ite (or (< (nseq.last s2) (nseq.first s2)) (not (= (nseq.first s2) (+ (nseq.last s1) 1)))) s1 (nseq.mk (nseq.first s1) (seq.++ (nseq.seq s1) (nseq.seq s2))))))) (define-fun nseq.slice (par (T) ((s (NSeq T)) (f Int) (l Int)) (NSeq T) (ite (and (<= f l) (and (<= (nseq.first s) f) (<= l (nseq.last s)))) (nseq.mk f (seq.extract (nseq.seq s) (- f (nseq.first s)) (+ (- l f) 1))) s))) (define-fun nseq.update (par (T) ((s1 (NSeq T)) (s2 (NSeq T))) (NSeq T) (ite (and (<= (nseq.first s2) (nseq.last s2)) (<= (nseq.first s1) (nseq.first s2)) (<= (nseq.last s2) (nseq.last s1))) (nseq.mk (nseq.first s1) (seq.update (nseq.seq s1) (- (nseq.first s2) (nseq.first s1)) (nseq.seq s2))) s1)))