An SMT Theory for n-Indexed Sequences
                         Hichem Rami Ait El Hara1,2 , François Bobot2 and Guillaume Bury1
                         1
                             OCamlPro, Paris, France
                         2
                             Université Paris-Saclay, CEA, List, F-91120, Palaiseau, France


                                         Abstract
                                         The SMT (Satisfiability Modulo Theories) theory of arrays is well-established and widely used, with various
                                         decision procedures and extensions developed for it. However, recent works suggest that developing tailored
                                         reasoning for some theories, such as sequences and strings, is more efficient than reasoning over them through
                                         axiomatization over the theory of arrays. In this paper, we are interested in reasoning over n-indexed sequences as
                                         they are found in some programming languages, such as Ada. We propose an SMT theory of n-indexed sequences
                                         and explore different ways to represent and reason over n-indexed sequences using existing theories, as well as
                                         tailored calculi for the theory.


                         1. Introduction
                         In the SMT theory of sequences, sequences are viewed as a generalization of strings to non-character
                         elements, with possibly infinite alphabets. Sequences are dynamically sized, and their theory has a
                         rich signature. It allows selecting elements of a sequence by their index, concatenating two sequences,
                         extracting sub-sequences, and performing other operations. The expressiveness of the theory of
                         sequences makes it easier to represent various commonly found data structures in programming
                         languages, such as arrays in the C language, lists in Python, etc.
                            The theory of arrays is less expressive as it only supports selecting and storing one value at one index
                         at a time, and arrays have fixed sizes determined by the number of inhabitants of the sort of indices.
                         In contrast, sequences have dynamic lengths and operations allowing the selection and updating of
                         sets of indices at a time. To use the theory of arrays to represent sequences, one would need to extend
                         it and axiomatize the necessary properties, such as dynamic length and additional operations like
                         concatenation and extraction.
                            We are interested in a variant of the theory of sequences, which we call the theory of n-indexed
                         sequences. They differ from sequences mainly in their indexing, as they are not necessarily 0-indexed
                         but n-indexed, as their name suggests. This means they are defined as ordered collections of values
                         of the same sort indexed from a first index 𝑛 to a last index 𝑚. Such sequences are present in some
                         programming languages like Ada. Since there is no dedicated theory for such sequences, reasoning
                         over them cannot be done straightforwardly using the existing theories of arrays and sequences. It is
                         therefore necessary to use extensions and axiomatizations to reason over them.
                            In this paper, we will present the theory of n-indexed sequences, its signature and semantics, as well
                         as different ways to reason over it using existing theories and by adapting calculi from the theory of
                         sequences to the theory of n-indexed sequences.

                         Related work:
                         The SMT theory of sequences was introduced by Bjorner et al. [1]. Several contributions explored this
                         theory, its syntax and semantics [2], and its decidability [3, 4].
                           Our theory of n-indexed sequences and the calculi we developed are based on the contribution by
                         Sheng et al. [5], which in turn is based on reasoning about the theories of strings [6, 7] and arrays [8].

                          SMT 2024: 22nd International Workshop on Satisfiability Modulo Theories
                          $ hra687261@gmail.com (H. R. Ait El Hara); francois.bobot@ocamlpro.com (F. Bobot); gbury@gmail.com (G. Bury)
                           https://hra687261.github.io/ (H. R. Ait El Hara); https://gbury.eu/ (G. Bury)
                           0000-0001-7909-0413 (H. R. Ait El Hara); 0000-0002-6756-0788 (F. Bobot); 0009-0002-1267-251X (G. Bury)
                                      © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).


CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Other contributions have extended the theory of arrays with properties that are present in sequences,
such as length [9, 10, 11] and a concatenation function [12].


2. Notation
We refer to the theory of n-indexed sequences as the theory of n-sequences or the NSeq theory, and
n-indexed sequence terms will be referred to as n-sequences, n-sequence terms, or NSeq terms. Their
sort will be denoted as NSeq E, where E is the sort of the elements of the n-sequence. We will refer to
the theory of sequences as the Seq theory. Int is the sort of integers from the theory of Linear Integer
Arithmetic. min and max are the usual mathematical functions. ite is the ite SMT-LIB function; it
takes a boolean expression and two expressions of the same sort and returns the first one if the boolean
expression is true and the second otherwise. The let 𝑥 = 𝑣, 𝑦 symbol binds a variable 𝑥 to a value 𝑣 in
a term 𝑦.
   In the remainder of the paper, we will use 𝑠, 𝑠𝑛 , 𝑤, 𝑤𝑛 , 𝑦1 , 𝑦2 , 𝑧1 , and 𝑧2 to represent NSeq terms,
with 𝑛 being a positive integer. 𝑘, 𝑘1 , 𝑘2 , and 𝑘3 will represent fresh NSeq term variables. 𝑖 and 𝑗 will
be used as integer index terms, and 𝑢 and 𝑣 will be used as NSeq element terms. We assume that all the
terms we use are well-sorted.


3. The Theory of n-Indexed Sequences

Table 1
The signature of the theory of n-indexed sequences
    SMT-LIB symbol                     Sort
    nseq.get                           NSeq E → Int → E
    nseq.set                           NSeq E → Int → E → NSeq E
    nseq.first                         NSeq E → Int
    nseq.last                          NSeq E → Int
    nseq.const                         Int → Int → E → NSeq E
    nseq.relocate                      NSeq E → Int → NSeq E
    nseq.concat                        NSeq E → NSeq E → NSeq E
    nseq.slice                         NSeq E → Int → Int → NSeq E
    nseq.update                        NSeq E → NSeq E → NSeq E

  We present in this section the theory of n-indexed sequences. The signature of the NSeq theory is
presented in the table 1. In the remainder of the paper, when referring to the symbols of the theory, the
prefix "nseq." of the symbols will be omitted.
  The following list describes the semantics of each symbol of the theory:

    • f 𝑠 : the first index of 𝑠.
    • l𝑠 : the last index of 𝑠.
    • get(𝑠, 𝑖): if the index 𝑖 is within the bounds of 𝑠, returns the element associated with 𝑖 in 𝑠;
      otherwise, returns an uninterpreted value. An uninterpreted value is one which is not constrained
      and can be any value of the right sort.
    • set(𝑠, 𝑖, 𝑣): if 𝑖 is within the bounds of 𝑠, creates a new n-sequence in which 𝑖 is associated with 𝑣,
      and the other indices within the bounds are associated with the same values as the corresponding
      indices in 𝑠; otherwise, returns 𝑠.
    • const(𝑓, 𝑙, 𝑣): creates an n-sequence with 𝑓 as the first index and 𝑙 as the last, and if it is not
      empty, all the indices within its bounds are associated with the value 𝑣.
    • relocate(𝑎, 𝑓 ): given an n-sequence 𝑎 and an index 𝑓 , returns a new n-sequence 𝑏 which has 𝑓
      as its first index and (𝑓 + l𝑎 − f 𝑎 ) as its last index, and associates with each index 𝑖 within the
      bounds of 𝑏 the same value associated with the index (𝑖 − f 𝑏 + f 𝑎 ) in 𝑎.
    • concat(𝑎, 𝑏): if 𝑎 is empty, returns 𝑏; if 𝑏 is empty, returns 𝑎; if f 𝑏 = l𝑎 +1, returns a new n-
      sequence in which the first index is f 𝑎 , the last index is l𝑏 , and all the indices within the bounds
      of 𝑎 are associated with the same values as the corresponding indices in 𝑎, as well as the indices
      within the bounds of 𝑏 are associated with the same values as the corresponding indices in 𝑏; if
      f 𝑏 ̸= l𝑎 +1, returns 𝑎.
    • slice(𝑎, 𝑓, 𝑙): if f 𝑎 ≤ 𝑓 ≤ 𝑙 ≤ l𝑎 , returns a new n-sequence for which the first index is 𝑓 , the
      last index is 𝑙, and all the indices between 𝑓 and 𝑙 are associated with the same values as the
      corresponding indices in 𝑎; otherwise, returns 𝑎.
    • update(𝑎, 𝑏): if 𝑎 is empty, or 𝑏 is empty, or the property f 𝑎 ≤ f 𝑏 ≤ l𝑏 ≤ l𝑎 doesn’t hold, returns
      𝑎; otherwise, returns a new n-sequence 𝑐 such that f 𝑐 = f 𝑎 and l𝑐 = l𝑎 , and for all the indices
      within the bounds of 𝑐, if they are within the bounds of 𝑏, then they are associated with the same
      values as they are in 𝑏, otherwise they are associated with the values to which they are associated
      in 𝑎.

Definition 1 (Bounds). The bounds of an n-sequence 𝑠 are its first and last index, which are respectively
denoted as f 𝑠 and l𝑠 , and which correspond to the values returned by the functions nseq.first(𝑠) and
nseq.last(𝑠) respectively. An index 𝑖 is said to be within the bounds of an n-sequence 𝑠 if:
                                                f 𝑠 ≤ 𝑖 ≤ l𝑠

Definition 2 (Extensionality). The theory of n-indexed sequences is extensional, which means that
n-sequences that contain the same elements are equal. Therefore, given two n-sequences 𝑎 and 𝑏:

                                              f 𝑎 = f 𝑏 ∧ l𝑎 = l𝑏 ∧
                              (∀𝑖 : Int, f 𝑎 ≤ 𝑖 ≤ l𝑎 → get(𝑎, 𝑖) = get(𝑏, 𝑖))
                                                   →𝑎=𝑏
Definition 3 (Empty n-sequence). An n-sequence 𝑠 is said to be empty if l𝑠 < f 𝑠 . Two empty n-
sequences 𝑎 and 𝑏 are equal if f 𝑎 = f 𝑏 and l𝑎 = l𝑏 ; otherwise, they are distinct.


4. Reasoning with existing theories
One way to reason over the NSeq theory is by using the theory of arrays. It is done by extending it
with the symbols of the NSeq theory and adding the right axioms that follow the semantics of the
corresponding symbols in the NSeq theory.
  Another way is to use the theory of sequences and the theory of algebraic data types. It consists in
defining n-sequences as a pair of a sequence and the first index (the offset to zero):

(declare-datatype NSeq
  (par (T) ((nseq.mk (nseq.first Int) (nseq.seq (Seq T))))))


  The other symbols of the NSeq theory can also be defined using the NSeq data type defined above,
for example:

(define-fun nseq.last (par (T) ((s (NSeq T))) Int
  (+ (- (seq.len (nseq.seq s)) 1) (nseq.first s))))

(define-fun nseq.get (par (T) ((s (NSeq T)) (i Int)) T
  (seq.nth (nseq.seq s) (- i (nseq.first s)))))

(define-fun nseq.set (par (T) ((s (NSeq T)) (i Int) (v T)) (NSeq T)
  (nseq.mk (nseq.first s)
    (seq.update (nseq.seq s) (- i (nseq.first s)) (seq.unit v)))))
  Except for the const function which needs to be axiomatized:

(declare-fun nseq.const (par (T) (Int Int T) (NSeq T)))

;; "nseq_const"
(assert (par (T) (forall ((f Int) (l Int) (v T))
    (!
      (let ((s (nseq.const f l v)))
        (and
          (= (nseq.first s) f)
          (= (nseq.last s) l)
          (forall ((i Int))
            (=> (and (<= f i) (<= i l)) (= (nseq.get s i) v)))))
      :pattern ((nseq.const f l v))))))


   The full NSeq theory, defined using the Seq theory and Algebraic Data Types, is attached in Appendix
A.1.
   Although this approach allows us to reason over n-indexed sequences, it is not ideal to depend on two
theories to do so. Additionally, the differences in semantics between the update and 𝑠𝑙𝑖𝑐𝑒 functions of
the NSeq theory, and the seq.update and seq.extract functions of the Seq theory, make the definitions
relatively complex.


5. Porting Calculi from the Seq Theory to the NSeq Theory
To develop our calculi over the NSeq theory, we based our work on the calculi developed by Sheng et al.
[5] on the Seq theory, where two calculi were proposed. The first is called the BASE calculus, based
on a string theory calculus that reduces functions selecting and storing one element at an index to
concatenations of sequences. The second is called the EXT calculus, and it handles these functions using
array-like reasoning. Our versions of these calculi are referred to as NS-BASE and NS-EXT, respectively.
   The NSeq theory differs from the Seq theory in both syntax and semantics of many symbols:

    • const and relocate do not appear in the Seq theory, while seq.empty, seq.unit, and seq.len do
      not appear in the NSeq theory.
    • The seq.nth function corresponds to the get function in the NSeq theory.
    • seq.update from the Seq theory with a value as a third argument corresponds to set in the NSeq
      theory, while seq.update with a sequence as a third argument corresponds to update in the NSeq
      theory, which takes only two NSeq terms as arguments.
    • seq.extract in the Seq theory takes a sequence, an offset, and a length, and corresponds to slice in
      the NSeq theory, which takes an n-sequence, a first index, and a last index.
    • The concatenation function (seq.++) in Seq is n-ary, and it corresponds to concat in the NSeq
      theory, which is binary.

  Therefore, we needed to make substantial changes to the Seq theory calculi to adapt them to the
NSeq theory. In this section, we present the resulting calculi. We assume that we are in a theory
combination framework where reasoning over the LIA (Linear Integer Arithmetic) theory is supported,
and where unsatisfiability in one of the theories implies unsatisfiability of the entire reasoning. We will
only present the rules that handle the symbols of the NSeq theory.
5.1. Common calculus
Definition 4 (Equivalence modulo relocation). Given two NSeq terms 𝑠1 and 𝑠2 , the terms are said
to be equivalent modulo relocation, denoted with the equivalence relation 𝑠1 =𝑟𝑒𝑙𝑜𝑐 𝑠2 , such that:

                                                 𝑠1 =𝑟𝑒𝑙𝑜𝑐 𝑠2 ≡
          l𝑠2 = l𝑠1 − f 𝑠1 + f 𝑠2 ∧∀𝑖 : 𝐼𝑛𝑡, f 𝑠1 ≤ 𝑖 ≤ l𝑠1 ⇒ 𝑔𝑒𝑡(𝑠1 , 𝑖) = 𝑔𝑒𝑡(𝑠2 , 𝑖 − f 𝑠1 + f 𝑠2 )

Equivalence modulo relocation represents equivalence between n-sequences relative to their starting index.
Two n-sequences are equivalent modulo relocation if they have the same set of elements in the same order,
but start at different indices.

Definition 5 (NSeq term normal form). For simplicity and consistency with Seq theory calculi, we
introduce the concatenation operator :: with the invariant:

                         𝑠 = 𝑠1 :: 𝑠2     =⇒ f 𝑠 = f 𝑠1 ∧ l𝑠 = l𝑠2 ∧ f 𝑠2 = l𝑠1 +1

This operator is used to normalize NSeq terms. It differs from concat by not having to check the condition
that f 𝑠2 = l𝑠1 +1 before concatenation, as it is ensured by the invariant.

Assumption 1. We assume that the following simplification rewrites are applied whenever possible:

                   𝑠1 :: 𝑠2   →           𝑠1              when l𝑠2 < f 𝑠2                (1)
                   𝑠1 :: 𝑠2   →           𝑠2              when l𝑠1 < f 𝑠1                (2)
                   𝑠1 :: 𝑠2   → 𝑠1 :: 𝑤1 :: ... :: 𝑤𝑛     when 𝑠2 = 𝑤1 :: ... :: 𝑤𝑛      (3)
                   𝑠1 :: 𝑠2   → 𝑤1 :: ... :: 𝑤𝑛 :: 𝑠2     when 𝑠1 = 𝑤1 :: ... :: 𝑤𝑛      (4)

(1) and (2) remove empty NSeq terms from normal form. (3) and (4) make sure that when an NSeq term
appear in the normal of another NSeq term and has its own normal form, then it is replaced by its normal
form.

   Figure 1 illustrates a set of common rules shared between the two calculi NS-BASE and NS-EXT. The
rules Const-Bounds and Reloc-Bounds propagate the bounds of constant and relocated n-sequences,
which are created using the const and relocate functions, respectively. The rules NS-Slice, NS-Concat,
and NS-Update handle slice, concat, and update by normalizing the NSeq terms under appropriate
conditions. If an NSeq term has two normal forms where distinct terms begin at the same index but
end at different indices, the NS-Split rule rewrites the longer term as a concatenation of the shorter one
and a fresh variable. The NS-Comp-Reloc rule propagates concatenations over equivalence modulo
relocation.

5.2. The base calculus
The base calculus comprises the rules in figures 1 and 2. The rules R-Get and R-Set handle the 𝑔𝑒𝑡 and
𝑠𝑒𝑡 operations by introducing a new normal form for the NSeq terms they operate on. In the R-Get
rule, when 𝑖 is within the bounds of 𝑠, a new normal form of 𝑠 is introduced. It includes a constant
NSeq term of size one at the 𝑖th position storing the value 𝑣, and two variables, 𝑘1 and 𝑘2 , to represent
the left and right segments of NSeq term 𝑠 respectively. The R-Set rule operates similarly: when 𝑖 is
within the bounds of 𝑠2 , new normal forms are introduced for both 𝑠1 and 𝑠2 . These forms share two
variables, 𝑘1 and 𝑘3 , representing segments on the left and right of the 𝑖th index. 𝑠1 has a constant
NSeq term of size one holding the value 𝑣 at the 𝑖th index, while 𝑠2 introduces another variable, 𝑘2 , of
the same size and position.
                      𝑠 = const(𝑓, 𝑙, 𝑣)                                      𝑠1 = 𝑟𝑒𝑙𝑜𝑐𝑎𝑡𝑒(𝑠2 , 𝑖)
      Const-Bounds                             Reloc-Bounds
                       f 𝑠 = 𝑓 ∧ l𝑠 = 𝑙                                        𝑖 = f 𝑠2 ∧𝑠1 = 𝑠2                ||
                                                                  𝑖 ̸= f 𝑠2 ∧ f 𝑠1 = 𝑖 ∧ l𝑠1 = 𝑖 + l𝑠2 − f 𝑠2
                                                                                 ∧𝑠1 =𝑟𝑒𝑙𝑜𝑐 𝑠2

                                                      𝑠1 = 𝑠𝑙𝑖𝑐𝑒(𝑠, 𝑓, 𝑙)
                               NS-Slice
                                           (𝑓 < f 𝑠 ∨𝑙 < 𝑓 ∨ l𝑠 < 𝑙) ∧ 𝑠1 = 𝑠         ||
                                           f 𝑠 ≤ 𝑓 ≤ 𝑙 ≤ l𝑠 ∧𝑠 = 𝑘1 :: 𝑠1 :: 𝑘2

                                                       𝑠 = 𝑐𝑜𝑛𝑐𝑎𝑡(𝑠1 , 𝑠2 )
                     NS-Concat
                                                      l𝑠1 < f 𝑠1 ∧𝑠 = 𝑠2                             ||
                                           (l𝑠2 < f 𝑠2 ∨ l𝑠1 +1 ̸= f 𝑠2 ) ∧ 𝑠 = 𝑠1                   ||
                                   f 𝑠1 ≤ l𝑠1 ∧ f 𝑠2 ≤ l𝑠2 ∧ f 𝑠2 = l𝑠1 +1 ∧ 𝑠 = 𝑠1 :: 𝑠2

                                                       𝑠1 = 𝑢𝑝𝑑𝑎𝑡𝑒(𝑠2 , 𝑠)
               NS-Update
                                          (l𝑠 < f 𝑠 ∨𝑓𝑠 < f 𝑠2 ∨ l𝑠2 < l𝑠 ) ∧ 𝑠1 = 𝑠2                     ||
                               f 𝑠2 ≤ f 𝑠 ≤ l𝑠 ≤ l𝑠2 ∧𝑠1 = 𝑘1 :: 𝑠 :: 𝑘3 ∧ 𝑠2 = 𝑘1 :: 𝑘2 :: 𝑘3
                                           𝑠 = 𝑤 :: 𝑦1 :: 𝑧1    𝑠 = 𝑤 :: 𝑦2 :: 𝑧2
                      NS-Split
                                                 l𝑦1 = l𝑦2 ∧𝑦1 = 𝑦2                             ||
                                   l𝑦1 > l𝑦2 ∧𝑦1 = 𝑦2 :: 𝑘 ∧ f 𝑘 = l𝑦2 +1 ∧ l𝑘 = l𝑦1            ||
                                   l𝑦1 < l𝑦2 ∧𝑦2 = 𝑦1 :: 𝑘 ∧ f 𝑘 = l𝑦1 +1 ∧ l𝑘 = l𝑦2

                                            𝑠1 = 𝑘1 :: 𝑘2 :: ... :: 𝑘𝑛      𝑠1 =𝑟𝑒𝑙𝑜𝑐 𝑠2
             NS-Comp-Reloc
                                                      f 𝑠1 = f 𝑠2 ∧𝑠1 = 𝑠2                              ||
                                 𝑠2 = 𝑟𝑒𝑙𝑜𝑐𝑎𝑡𝑒(𝑘1 , f 𝑠2 ) :: 𝑟𝑒𝑙𝑜𝑐𝑎𝑡𝑒(𝑘2 , f 𝑘2 − f 𝑠1 + f 𝑠2 ) :: ...
                                              :: 𝑟𝑒𝑙𝑜𝑐𝑎𝑡𝑒(𝑘𝑛 , f 𝑘𝑛 − f 𝑠1 + f 𝑠2 )


Figure 1: Common inference rules for the NS-BASE and NS-EXT calculi


                                                        𝑣 = 𝑔𝑒𝑡(𝑠, 𝑖)
                           R-Get
                                                    𝑖 < f 𝑠 ∨ l𝑠 < 𝑖                       ||
                                      f 𝑠 ≤ 𝑖 ≤ l𝑠 ∧𝑠 = 𝑘1 :: 𝑐𝑜𝑛𝑠𝑡(𝑖, 𝑖, 𝑣) :: 𝑘2

                                                     𝑠1 = 𝑠𝑒𝑡(𝑠2 , 𝑖, 𝑣)
                       R-Set
                                            (𝑖 < f 𝑠2 ∨ l𝑠2 < 𝑖) ∧ 𝑠1 = 𝑠2                      ||
                                       f 𝑠2 ≤ 𝑖 ≤ l𝑠2 ∧ f 𝑠1 = f 𝑠2 ∧ l𝑠1 = l𝑠2
                                 𝑠1 = 𝑘1 :: 𝑐𝑜𝑛𝑠𝑡(𝑖, 𝑖, 𝑣) :: 𝑘3 ∧ 𝑠2 = 𝑘1 :: 𝑘2 :: 𝑘3


Figure 2: NS-BASE specific inference rules


5.3. The extended calculus
The extended calculus consists of the rules in figures 1 and 3. It differs from the base calculus by handling
the get and set functions similarly to how they are treated in the array decision procedure described
in [8]. The Get-Intro rule introduces a get operation from a set operation. The Get-Set operation is
equivalent to what is commonly referred to as the read-over-write or select-over-store rule in the Array
theory, allowing the application of a get operation over a set operation. The Set-Bound rule ensures
that a set operation is performed within the bounds of the target NSeq term, or that the resulting NSeq
term is equivalent to the one it was applied on. The Get-Concat, Set-Concat, and Set-Concat-Inv rules
illustrate how get and set operations are handled when applied to an NSeq term in normal form, where
                                              𝑣 = 𝑔𝑒𝑡(𝑠, 𝑖)        𝑠 = 𝑤1 :: ... :: 𝑤𝑛
                           Get-Concat
                                                  𝑖 < f 𝑠 ∨ l𝑠 < 𝑖               ||
                                         f 𝑤1 ≤ 𝑖 ≤ l𝑤1 ∧𝑔𝑒𝑡(𝑤1 , 𝑖) = 𝑣         ||    ... ||
                                         f 𝑤𝑛 ≤ 𝑖 ≤ l𝑤𝑛 ∧𝑔𝑒𝑡(𝑤𝑛 , 𝑖) = 𝑣

                                           𝑠1 = 𝑠𝑒𝑡(𝑠2 , 𝑖, 𝑣)      𝑠2 = 𝑤1 :: ... :: 𝑤𝑛
            Set-Concat
                                                  𝑖 < f 𝑠2 ∨ l𝑠2 < 𝑖                            ||
                           𝑠1 = 𝑘1 :: ... :: 𝑘𝑛 ∧ f 𝑤1 ≤ 𝑖 ≤ l𝑤1 ∧𝑘1 = 𝑠𝑒𝑡(𝑤1 , 𝑖, 𝑣)∧
                              f 𝑘1 = f 𝑤1 ∧ l𝑘1 = l𝑤1 ∧... ∧ f 𝑘𝑛 = f 𝑤𝑛 ∧ l𝑘𝑛 = l𝑤𝑛            || ... ||
                           𝑠1 = 𝑘1 :: ... :: 𝑘𝑛 ∧ f 𝑤𝑛 ≤ 𝑖 ≤ l𝑤𝑛 ∧𝑘𝑛 = 𝑠𝑒𝑡(𝑤𝑛 , 𝑖, 𝑣)∧
                              f 𝑘1 = f 𝑤1 ∧ l𝑘1 = l𝑤1 ∧... ∧ f 𝑘𝑛 = f 𝑤𝑛 ∧ l𝑘𝑛 = l𝑤𝑛

                                             𝑠1 = 𝑠𝑒𝑡(𝑠2 , 𝑖, 𝑣)      𝑠1 = 𝑤1 :: ... :: 𝑤𝑛
          Set-Concat-Inv
                                                    𝑖 < f 𝑠2 ∨ l𝑠2 < 𝑖                   ||
                             𝑠2 = 𝑘1 :: ... :: 𝑘𝑛 ∧ f 𝑤1 ≤ 𝑖 ≤ l𝑤1 ∧𝑤1 = 𝑠𝑒𝑡(𝑘1 , 𝑖, 𝑣)∧
                                f 𝑘1 = f 𝑤1 ∧ l𝑘1 = l𝑤1 ∧... ∧ f 𝑘𝑛 = f 𝑤𝑛 ∧ l𝑘𝑛 = l𝑤𝑛   ||             ... ||
                             𝑠2 = 𝑘1 :: ... :: 𝑘𝑛 ∧ f 𝑤𝑛 ≤ 𝑖 ≤ l𝑤𝑛 ∧𝑤𝑛 = 𝑠𝑒𝑡(𝑘𝑛 , 𝑖, 𝑣)∧
                                f 𝑘1 = f 𝑤1 ∧ l𝑘1 = l𝑤1 ∧... ∧ f 𝑘𝑛 = f 𝑤𝑛 ∧ l𝑘𝑛 = l𝑤𝑛

                                             𝑠 = 𝑐𝑜𝑛𝑠𝑡(𝑓, 𝑙, 𝑣)         𝑢 = 𝑔𝑒𝑡(𝑠, 𝑖)
                             Get-Const
                                          𝑖 < f 𝑠 ∨ l𝑠 < 𝑖 ||       f 𝑠 ≤ 𝑖 ≤ l𝑠 ∧𝑢 = 𝑣

                                                         𝑠1 = 𝑠𝑒𝑡(𝑠2 , 𝑖, 𝑣)
                     Get-Intro
                                   𝑖 < f 𝑠1 ∨ l𝑠1 < 𝑖 ||       f 𝑠1 ≤ 𝑖 ≤ l𝑠1 ∧𝑣 = 𝑔𝑒𝑡(𝑠1 , 𝑖)

                                            𝑠1 = 𝑠𝑒𝑡(𝑠2 , 𝑖, 𝑣)      𝑢 = 𝑔𝑒𝑡(𝑠1 , 𝑗)
                             Get-Set
                                                      𝑖 < f 𝑠1 ∨ l𝑠1 < 𝑖          ||
                                              𝑖 = 𝑗 ∧ f 𝑠1 ≤ 𝑖 ≤ l𝑠1 ∧𝑢 = 𝑣       ||
                                         𝑖 ̸= 𝑗 ∧ f 𝑠1 ≤ 𝑖 ≤ l𝑠1 ∧𝑢 = 𝑔𝑒𝑡(𝑠2 , 𝑖)

                                                          𝑠1 = 𝑠𝑒𝑡(𝑠2 , 𝑖, 𝑣)
                           Set-Bound
                                         𝑠1 = 𝑠2    ||     f 𝑠1 ≤ 𝑖 ≤ l𝑠1 ∧𝑔𝑒𝑡(𝑠2 , 𝑖) ̸= 𝑣

                                                𝑣 = 𝑔𝑒𝑡(𝑠1 , 𝑖)       𝑠1 =𝑟𝑒𝑙𝑜𝑐 𝑠2
                Get-Reloc
                               𝑖 < 𝑓𝑠 ∨ 𝑙𝑠 < 𝑖 ||        𝑓𝑠 ≤ 𝑖 ≤ 𝑙𝑠 ∧ 𝑣 = 𝑔𝑒𝑡(𝑠2 , 𝑖 − f 𝑠1 − f 𝑠2 )


Figure 3: NS-EXT specific inference rules


the operations affect the right component of the concatenation within its bounds. The Get-Const rule
addresses the special case where a get operation is applied to a constant NSeq term. The Get-Reloc rule
facilitates the propagation of constraints on index-associated values in NSeq terms from one term to
others that are equivalent modulo relocation.


6. Implementation
We have implemented a prototype of the described calculi in the Colibri2 CP (Constraint Programming)
solver. In this section, we discuss some of the implementation choices we made.
   The rewriting rules described in Assumption 1 are applied whenever applicable using a callback
system. When the conditions are satisfied, the corresponding rewriting rule is triggered.
   Equivalence modulo relocation is managed using a disjoint-set (union-find) data structure. In this
data structure, the elements of the sets are NSeq terms, and the equivalence relation is defined by =𝑟𝑒𝑙𝑜𝑐
as previously specified. By definition, if two elements of an equivalence class are at the same relocation
Figure 4: Number of solved goals by accumulated time in seconds on quantifier-free Seq benchmarks translated
from the QF_AX SMT-LIB benchmarks


offset from the representative, they are equal. The data structure maintains, for each equivalence
class, a mapping from offset to an element of the class that is at that specific relocation offset from the
representative. This facilitates efficient detection of such equalities with minimal overhead.


7. Experimental Results
In this section, we present experimental results of the calculi described in the previous section. Currently,
our experiments have focused exclusively on quantifier-free benchmarks that utilize only the theory
of sequences and the theory of uninterpreted functions. These benchmarks constitute a subset of
those employed in the paper [5], originally translated into the Seq theory from the QF_AX SMT-LIB
benchmarks.
   To achieve this, we implemented support for the Seq theory in our solver by translating Seq terms
into NSeq terms. The translation process is as follows:

    • Seq terms: NSeq terms for which the first index is 0 and the last index is greater or equal than −1.
    • seq.empty: an NSeq term of the same sort, in which the first index is 0 and the last is −1, denoted
      𝜖.
    • seq.unit(𝑣): const(0, 0, 𝑣)
    • seq.len(𝑠): l𝑠 − f 𝑠 +1
    • seq.nth(𝑠, 𝑖): get(𝑠, 𝑖)
    • seq.update(𝑠1 , 𝑖, 𝑠2 ):

                                    let(𝑟, relocate(𝑠2 , 𝑖), ite(f 𝑠1 ≤ 𝑖 ≤ l𝑠1 ∧ l𝑠1 < l𝑟 ,
                                       update(𝑠1 , slice(𝑟, 𝑖, l𝑠1 )), update(𝑠1 , 𝑟)))

    • seq.extract(𝑠, 𝑖, 𝑗):

                             ite(𝑖 < f 𝑠 ∨ l𝑠 < 𝑖 ∨ 𝑗 ≤ 0, 𝜖, slice(𝑠, 𝑖, min(l𝑠 , 𝑖 + 𝑗 − 1)))

    • seq.++(𝑠1 , 𝑠2 , 𝑠3 , ..., 𝑠𝑛 ):

                                         let(𝑐1 , concat(𝑠1 , relocate(𝑠2 , l𝑠1 +1)),
                                         let(𝑐2 , concat(𝑐1 , relocate(𝑠3 , l𝑐1 +1)),
                                                               ...
                                         concat(𝑐𝑛−2 , relocate(𝑠𝑛 , l𝑐𝑛−2 +1))))
   The figure 4 depicts the number of satisfiable and unsatisfiable goals solved over accumulated time
using our prototype implementation and the cvc5 SMT solver with different command-line options. NS-
BASE and NS-EXT refer to our implementations1 , described in section 5, which can be used by running
Colibri2 with the command-line options --nseq-base and --nseq-ext, respectively. We compare
our implementation with the Seq theory implementation in cvc5 (version 1.1.1). In the graphs in figure
4, cvc5 corresponds to running cvc5 with the command-line option --strings-exp, necessary for
using cvc5’s solver for the Seq theory. cvc5-eager uses the same option with --seq-arrays=eager,
and cvc5-lazy with --seq-arrays=lazy, these options indicate different strategies for using an
array-inspired solver for the Seq theory.
   Examining the graph on the right, which shows performance on unsatisfiable goals, we observe that
our NS-EXT implementation outperforms cvc5 in both time and number of goals solved. Meanwhile,
NS-BASE initially solves more goals than cvc5, but solves fewer overall. Additionally, cvc5-eager and
cvc5-lazy solve more goals in less time compared to the others. The same trends apply to the satisfiable
case, with the exception that cvc5 also surpasses NS-EXT in both time and number of goals solved once
the 20-second threshold is reached.
   In our context, focused on program verification, the performance on unsatisfiable goals holds greater
significance, though the satisfiable case remains useful. Since Colibri2 constructs concrete counterex-
amples before concluding satisfiability, we aim to enhance our current model generation technique
for n-sequences. In the unsatisfiable case, while we compete closely with the state-of-the-art SMT
solver cvc5, we have observed that some goals unsolved within a short timeout (5 seconds) also remain
unsolved with longer timeouts, suggesting potential performance limitations in our propagators for
the NSeq theory. It’s also notable that our translation from Seq to NSeq in Colibri2 introduces more
complex terms, and Colibri2 lacks clause learning, making decisions costlier than in other SMT solvers.


8. Conclusion
In this paper, we explored the topic of n-indexed sequences in SMT. We proposed a theory for such
sequences and discussed approaches for reasoning over it, whether by using existing theories or by
adapting calculi from the theory of sequences to this theory.
   Looking ahead, our future work will delve deeper into different reasoning approaches for this theory,
exploring their respective strengths and weaknesses through benchmarking with n-indexed sequences.
We aim to prove the correctness of our developed calculi and explore alternative methods for reasoning
over n-sequences beyond traditional sequence or string reasoning. Moreover, we seek to identify
additional applications for this theory beyond programming languages where n-indexed sequences are
present.


References
    [1] N. Bjørner, V. Ganesh, R. Michel, M. Veanes, An SMT-LIB Format for Sequences and Regular
        Expressions, Strings (2012).
    [2] H. R. Ait El Hara, F. Bobot, G. Bury, On SMT Theory Design: The Case of Sequences, in: Kalpa
        Publications in Computing, volume 18, EasyChair, 2024, pp. 14–29. URL: https://easychair.org/
        publications/paper/qdvJ. doi:10.29007/75tl, iSSN: 2515-1762.
    [3] C. A. Furia, What’s Decidable about Sequences?, in: A. Bouajjani, W.-N. Chin (Eds.), Automated
        Technology for Verification and Analysis, Springer, Berlin, Heidelberg, 2010, pp. 128–142. doi:10.
        1007/978-3-642-15643-4_11.
    [4] A. Jeż, A. W. Lin, O. Markgraf, P. Rümmer, Decision Procedures for Sequence Theories, in: C. Enea,
        A. Lal (Eds.), Computer Aided Verification, Lecture Notes in Computer Science, Springer Nature
        Switzerland, Cham, 2023, pp. 18–40. doi:10.1007/978-3-031-37703-7_2.

1
    Available at: https://git.frama-c.com/pub/colibrics/-/tree/smt2024 (commit SHA: 43024e674ef26673d2495f3b186954fa37bc3890)
 [5] Y. Sheng, A. Nötzli, A. Reynolds, Y. Zohar, D. Dill, W. Grieskamp, J. Park, S. Qadeer, C. Barrett,
     C. Tinelli, Reasoning About Vectors: Satisfiability Modulo a Theory of Sequences, Journal
     of Automated Reasoning 67 (2023) 32. URL: https://doi.org/10.1007/s10817-023-09682-2. doi:10.
     1007/s10817-023-09682-2.
 [6] T. Liang, A. Reynolds, C. Tinelli, C. Barrett, M. Deters, A DPLL(T) Theory Solver for a Theory
     of Strings and Regular Expressions, volume 8559, Springer International Publishing, Cham,
     2014, pp. 646–662. URL: http://link.springer.com/10.1007/978-3-319-08867-9_43. doi:10.1007/
     978-3-319-08867-9_43, book Title: Computer Aided Verification Series Title: Lecture Notes
     in Computer Science.
 [7] M. Berzish, V. Ganesh, Y. Zheng, Z3str3: a string solver with theory-aware heuristics, in:
     Proceedings of the 17th Conference on Formal Methods in Computer-Aided Design, FMCAD ’17,
     FMCAD Inc, Austin, Texas, 2017, pp. 55–59.
 [8] J. Christ, J. Hoenicke, Weakly Equivalent Arrays, in: C. Lutz, S. Ranise (Eds.), Frontiers of
     Combining Systems, Lecture Notes in Computer Science, Springer International Publishing, Cham,
     2015, pp. 119–134. doi:10.1007/978-3-319-24246-0_8.
 [9] M. P. Bonacina, S. Graham-Lengrand, N. Shankar,               CDSAT for Nondisjoint Theories
     with Shared Predicates: Arrays With Abstract Length,                Satisfiability Modulo Theo-
     ries workshop, CEUR Workshop Proceedings 3185 (2022). URL: https://par.nsf.gov/biblio/
     10358980-cdsat-nondisjoint-theories-shared-predicates-arrays-abstract-length.
[10] S. Ghilardi, A. Gianola, D. Kapur, C. Naso, Interpolation Results for Arrays with Length and
     MaxDiff, ACM Transactions on Computational Logic 24 (2023) 28:1–28:33. URL: https://doi.org/10.
     1145/3587161. doi:10.1145/3587161.
[11] A. R. Bradley, Z. Manna, H. B. Sipma, What’s Decidable About Arrays?, in: E. A. Emerson,
     K. S. Namjoshi (Eds.), Verification, Model Checking, and Abstract Interpretation, Springer, Berlin,
     Heidelberg, 2006, pp. 427–442. doi:10.1007/11609773_28.
[12] Q. Wang, A. W. Appel, A Solver for Arrays with Concatenation, Journal of Automated Reasoning 67
     (2023) 4. URL: https://doi.org/10.1007/s10817-022-09654-y. doi:10.1007/s10817-022-09654-y.


A. Appendix
A.1. Representation of n-Indexed Sequences using Sequences and Algebraic Data
     Types
(declare-datatypes ((NSeq 1))
  ((par (T) ((nseq.mk (nseq.first Int) (nseq.seq (Seq T)))))))

(define-fun nseq.last (par (T) ((s (NSeq T))) Int
  (+ (- (seq.len (nseq.seq s)) 1) (nseq.first s))))

(define-fun nseq.get (par (T) ((s (NSeq T)) (i Int)) T
  (seq.nth (nseq.seq s) (- i (nseq.first s)))))

(define-fun nseq.set (par (T) ((s (NSeq T)) (i Int) (v T)) (NSeq T)
  (nseq.mk (nseq.first s)
    (seq.update (nseq.seq s) (- i (nseq.first s)) (seq.unit v)))))

(declare-fun nseq.const (par (T) (Int Int T) (NSeq T)))

;; "nseq_const"
(assert (par (T) (forall ((f Int) (l Int) (v T))
    (!
      (let ((s (nseq.const f l v)))
        (and
          (= (nseq.first s) f)
          (= (nseq.last s) l)
          (forall ((i Int))
            (=> (and (<= f i) (<= i l)) (= (nseq.get s i) v)))))
      :pattern ((nseq.const f l v))))))

(define-fun nseq.relocate (par (T) ((s (NSeq T)) (f Int)) (NSeq T)
     (nseq.mk f (nseq.seq s))))

(define-fun nseq.concat (par (T) ((s1 (NSeq T)) (s2 (NSeq T))) (NSeq T)
  (ite (< (nseq.last s1) (nseq.first s1))
    s2
    (ite
      (or
        (< (nseq.last s2) (nseq.first s2))
        (not (= (nseq.first s2) (+ (nseq.last s1) 1))))
      s1
      (nseq.mk
        (nseq.first s1)
        (seq.++ (nseq.seq s1) (nseq.seq s2)))))))

(define-fun nseq.slice (par (T) ((s (NSeq T)) (f Int) (l Int)) (NSeq T)
  (ite
    (and
       (<= f l)
       (and (<= (nseq.first s) f) (<= l (nseq.last s))))
    (nseq.mk f (seq.extract (nseq.seq s) (- f (nseq.first s)) (+ (- l f) 1)))
    s)))

(define-fun nseq.update (par (T) ((s1 (NSeq T)) (s2 (NSeq T))) (NSeq T)
  (ite
    (and
       (<= (nseq.first s2) (nseq.last s2))
       (<= (nseq.first s1) (nseq.first s2))
       (<= (nseq.last s2) (nseq.last s1)))
    (nseq.mk (nseq.first s1)
       (seq.update
         (nseq.seq s1)
         (- (nseq.first s2) (nseq.first s1))
         (nseq.seq s2)))
     s1)))