1 Introduction

An Afficient Equivalence-Checking Algorithm for a Model of Programs with Commutative and Absorptive Statements

Vladislav Podymov

0 0 National Research University Higher School of Economics 3 Kochnovsky Proezd , Moscow 125319 , Russia

85 96

We present an efficient equivalence-checking algorithm for a propositional model of programs with semantics based on (what we call) progressive monoids on the finite set of statements generated by relations of a specific form. We consider arbitrary set of relations for commutativity (relations of the form ab = ba for statements a, b) and left absorption (relations of the form ab = b for statements a, b) properties. The main results are a polynomial-time decidability for the equivalence problem in the considered case, and an explicit description of an equivalence-checking algorithm which terminates in time polynomial in size of programs.

program models equivalence checking semigroups commutativity left absorption

1 Introduction

The paper addresses the functional equivalence problem for imperative programs: given two programs, do determine whether they compute the same function. This problem is known to be undecidable in general (follows from Rice’s theorem [ 1 ]), but is highly important to have means to determine if given programs are equivalent: theoretically, the equivalence problem complexity shows what influence the structure of programs has on their meaning, i.e. which structural properties can be analyzed and which cannot; practically, equivalence-checking algorithms can be used in a huge variety of program analysis problems: optimization, verification, refactoring, obfuscation, and so on — in which any two programs (e.g. original and transformed, or ideal and real) are tested for having the same functionality.

One way to overcome the negativity of Rice’s theorem is to provide sufficient conditions for program equivalence. For example, an optimizing compiler performs a sequence of program transformation steps, each step preserves the original program functionality, and sufficient equivalence conditions may be used to prove transformation soundness and provide optimization algorithms known to be sound. The range of sound and efficient optimization capabilities is impressive: compilers have means to remove some unreachable or dead code, to restructure control flow graph and its linear code blocks, to optimize a variety of efficiency characteristics [ 2 ]. Nevertheless, there is still a long way to go. For example, the block of code if(P(z)) {x=f(x); y=y+z;} else {x=g(x);x=v+z;} if(!P(z)) y=y+z; else x=v+z; and a much simpler block y=y+z; x=v+z; are equivalent, if we assume that the functions P, f, g are side-effect free and terminate, but simplifications of this kind are unlikely to be done automatically by modern compilers. To prove these blocks are equvalent, it is sufficient to use the following properties: 1. the value of P(z) remains unchanged; 2. the order of statements x = f(x); and y = y + z; has no effect on the result, and the same holds for x = v + z; and y = y + z; 3. the statement x = g(x); is dead if x = v + z; acts right after it, and the same holds for x = f(x); and x = v + z;

We refer to (2) as commutativity, and (3) as left absorption properties for statements, and focus on these two properties only. Suppose we have a statement a : (x1, . . . , xn) = f (y1, . . . , yk): “compute f on arguments y1, . . . , yk, and put the result into x1, . . . , xn”. Let M od(a) be the set {x1, . . . , xn}, and U se(a) be {y1, . . . , yk}. An easy way to extract commutativity and left absorption for statements is to analyze their U se- and M od-sets: if U se(a) ∩ M od(b) = M od(a) ∩ U se(b) = M od(a) ∩ M od(b) = ∅, then a and b are commutative; if M od(a) ⊆ M od(b) and M od(a) ∩ U se(b) = ∅, then b is left-absorptive for a.

To investigate the equivalence problem with respect to these properties we use the following approach: describe a program model which preserves the original program syntax but has a much simpler abstract semantics preserving some original semantical properties (programs in such a model are often called program schemata); propose an equivalence notion for schemata such that if two given schemata are equivalent, then the original programs are also equivalent; solve the equivalence problem for schemata, and use the resulting equivalence-checking algorithm to verify the original program equivalence.

We consider the model called “propositional programs” (PPs for short) proposed in [ 3, 4 ]. A PP is basically a finite automaton such that: vertices are labeled with statement symbols and correspond to program control locations in which a current data state is changed; transitions are labeled with all possible valuations for Boolean condition symbols; vertices are connected in the same way as in the original program. An interpretation which defines how a PP operates consists of: a set of abstract data states; a meaning of each statement symbol as a function for data state modification; a meaning of each condition symbol as a truth-value marking of data states. In general case these components are defined in terms of Kripke frames and models [ 5 ], but this paper focuses on commutativity and left absorption properties only: in this case data states are elements of a monoid on the set of statements generated by a set of relations of the form ab = ba (for commutativity) and ab = b (for left absorption) where a, b are statement symbols. For short, we call such a monoid a CA-monoid. We call two PPs equvalent in a CA-monoid M if for each interpretation based on M last data states of their executions are equal. All necessary definitions are given in the further sections.

The main goal of the paper is to provide an efficient equivalence-checking algorithm for PPs operating in CA-monoids. We call an algorithm efficient if it terminates in a polynomial time. Unfortunately, even for the trivial case when no commutativity and left absorption is assumed the equivalence problem for PPs is known to be co-NPcomplete [ 6 ]. But for a fixed finite signature (alphabet of statements and conditions) for the trivial case PPs are known to be closely related to finite automata [ 7 ] and have an equivalence-checking algorithm having a very low time complexity [ 8 ]. Thus, hereafter a fixed finite signature is assumed, which means that we study the hardness with respect to structural program features. Another unpleasant fact is that equivalence-checking results for automata and PPs are not interchangeable for nontrivial cases, for example, for arbitrary commutativity the equivalence problem for automata is known to be undecidable [ 9 ], while for schemata it is decidable in polynomial time for a syntactically and semantically more general case [ 10 ].

CA-monoids considered in this paper are assumed to be progressive: we cannot obtain the same element twice appending non-neutral element to the right via composition. Note that for left-cancellative monoids the progressiveness is the same as the atomicity of the neutral element. The main result of the paper is an improvement of known decidability of the equivalence problem in the considered case [ 11 ] to polynomial-time decidability (theorem 2), and an explicit description of a polynomial-time decision algorithm (section 5). Note that if commutativity and left absorption properties are extracted in the way showed above, then the resulting CA-monoid is necessarily progressive. Moreover, [ 12 ] provides a simple-to-check progressiveness criterion for CA-monoids.

To make a few remarks on related work, we may mention polynomial-time decidability results on equivalence problems for one-counter automata [ 13 ] (NLcompleteness), specific context-free languages [ 14, 15 ], specific recursive schemata [ 10 ], and decidability results for multitape automata [ 16 ] and k-valued transducers [ 17 ]. Note that there are much more results on the equivalence problem and quite a bit of them are mentioned.

The structure of the paper is as follows. Section 2 provides basic definitions. Section 3 discusses the main idea of the analysis technique and basic properties of the main object to be analyzed — a graph of consistent computations (GCC). The result of section 4 is a deep analysis of a GCC which allows to describe an polynomial-time equivalence-checking algorithm. The main theorem and the algorithm description are given in section 5. 2

Basic Definitions

Hereafter A and L are finite alphabets of basic assignment statements, or actions for short, and logical conditions respectively. A propositional program (PP) π = (L, en, ex, T, B) consists of: a finite set of control locations L including entry en and exit ex; a transition function T : (L \ {ex}) × L → L; a binding function B : L → A. We write l −→δπ l′ and l →π l′ instead of T (l, δ) = l′ for readability. We define the size |π| of a PP π to be the number of its control locations: |π| = |L|. The meaning of a PP is briefly discussed in the introduction except of the following: a logical condition may be understood as a valuation for all Boolean conditions, which means that exactly one logical condition holds for each abstract data state.

To define semantics of PPs in general case we should use Kripke frames and models [ 5 ], but in the considered case (commutativity and left absorption for actions) we simplify it using the notation of semigroup theory. A semigroup (S, ∗) consists of an arbitrary set S of elements, and a binary associative operation on S. A monoid M = (S, ∗, ε) is a semigroup (S, ∗) together with its neutral element ε. To define how a PP operates we use a monoid on the set of actions: each action from A is an element of the monoid, and each other element s can be decomposed into actions (s = a1 ∗ · · · ∗ ak, k ≥ 0, ai ∈ A for 1 ≤ i ≤ k). We write [a1 . . . ak] to denote such s. Hereafter we use only monoids on the set of actions. An interpretation I = (M, ξ) for a PP consists of a monoid M = (S, ∗, ε), and a valuation ξ : S → L for logical conditions.

A path for a program π is a sequence of program locations of the form l1 →π l2 →π . . . . For paths pt′, pt′′, by pt′ → pt′′ we denote a path which starts with a path pt′ and continues with a path pt′′. A path is called a trace if l1 is the entry of π. A path is full if δ1 it either is infinite or leads to the exit of π. A full trace is called a run. A trace l1 −→π l2 −δ→2 . . . is called an I-trace for an interpretation I if δi = ξ([B(l1) . . . B(li)]) for i ≥ 1. To simplify the notation, we write B(l1, . . . , li) along with B(l1) . . . B(li), and [l1, . . . , l1] along with [B(l1, . . . , li)]. The result of a finite I-run rn is the element [rn]. An infinite I-run loops and has the indefinite result ⊥. Two programs are I-equivalent if the results of their I-runs are equivalent. For a monoid M two programs are M equivalent if they are I-equivalent for each interpretation I = (M, ξ).

The equivalence problem investigated in this paper is the M -equivalence problem for PPs where M is a monoid: given two PPs, to check if they are M -equivalent. The problem is solved for specific monoids M , which we call progressive CA-monoids. A monoid is progressive if for each triple of its elements s, s1, s2, if s1 6= ε or s2 6= ε, then s 6= s ∗ s1 ∗ s2 Note that the elements of a progressive monoid are partially ordered w.r.t. its operation, i.e. s1 ∗ s2 6= s1 for s2 6= ε. In other words, if we introduce the following order ≤: s1 ≤ s2 iff ∃s. s2 = s1 ∗ s — then a monoid is ordered iff ≤ is a non-strict partial order. Along with ≤ we will use naturally induced orders <, ≥, >, as well as s1ks2 to denote the incomparability of s1, s2. A CA-monoid is a monoid generated by an arbitrary set of relations of the form ab = ba and ab = b (a, b ∈ A); the former one is a commutativity relation, the latter is a (left) absorption relation. A monoid generated by a set of relations can be defined as follows: – each element is a set of words over the alphabet A; – g ∈ [h] iff g can be obtained from h with a finite number of rewritings using generating relations (a subword from the left part of a relation to the right part and vice versa).

The main result of the paper is the polynomial-time decidability of the M equivalence problem for PPs for an arbitrary progressive CA-monoid M together with an explicit description of a decision algorithm. The description of the algorithm depends on a description of M , i.e. on a set of commutativity and absorption relations defining M . The algorithm is polynomial in time in size of input programs.

Graph of Consistent Computations

To simplify notation, hereafter we assume an arbitrary progressive CA-monoid M = (S, ∗, ε) to be defined, as well as PPs and π1 = (L1, en1, ex1, T1, B1) π2 = (L2, en2, ex2, T2, B2).

The main idea of how we analyze the M -equivalence problem for PPs is similar to those developed in [ 3, 4 ], [ 10 ], [ 17 ] and can be described as follows. Suppose we have an interpretation I = (M, ξ). We construct the I-traces tr1, tr2 of π1 and π2 in the following manner: we start with one-element I-traces of π1, π2; if [tr1] = [tr2], then we add the next location to both traces; if [tr1] > [tr2], then we add the next location to tr2; otherwise we add the next location to tr1. If any of the traces tri is an I-run, then we do not add anything to it. As the result we have a sequence of pairs of traces leading either to the I-runs of tr1, tr2, or to an I-run and an I-trace. Consider any such pair (tr1 → l1, tr2 → l2). To find what results can be obtained with computations starting in these traces, all we need to know is last program locations of these traces and data states they lead to. Thus, we replace this pair with a tuple (l1, l2, [tr1 → l1], [tr2 → l2]).

Hereafter we write ⌊s⌋ to denote the set of all minimal members of s, s ∈ S: {g | g ∈ s, |g| = min |u|} — and ksk to denote the length of words in ⌊s⌋. Also, we write u∈s ⌊h⌋ instead of ⌊[h]⌋, and call a word h ∈ A∗ a minimal member if h ∈ ⌊h⌋.

Let (l1, l2, s1, s2) be a tuple as described above, and s be a longest common prefix s of s1, s2: s ∗ s′1 = s1; s ∗ s′2 = s2; ksk has a maximal possible value. To check the equivalence of programs, we do not need the results themselves, but to check if the results are equal. Thus, we replace s1, s2 with s′1, s′2 in the tuple. Due to the construction rules for tuples, ks′2k ≤ 1, which means that s′2 = [α] for α ∈ A∪{λ}. Then we replace (l1, l2, s1, s2) with a reduced tuple (l1, l2, s′1, α). Looking over all possible interpretations I, we connect all possible tuples in a graph Γ , which we call a graph of consistent computations (GCC) in the following way. The vertices of Γ are all possible tuples (l1, l2, s, α), where l1, l2 are locations of π1, π2, s is a data state, and α ∈ A ∪ {λ}. If there exists an interpretation I such that tuples t1, t2 are what we get from adjacent pairs of traces obtained with the construction rules for I, then t1 and t2 are connected with an edge. To eliminate the enumeration of all interpretations, we define Γ using M , π1, and π2 only, and then prove that to analyze Γ is sufficient to understand whether π1, π2 are M -equivalent.

To explain the term “consistent”, we define the notion of consistency for traces of PPs in the following way. Traces of π1, π2 are (M -)consistent if they are I-traces for some interpretation I = (M, ξ). Thus, an alternative description of the M -equivalence problem for PPs is: given two PPs, to check if for each pair of M -consistent runs their results are equal. The following proposition shows how to check trace consistency without enumerating of all possible interpretations. if [tr1′] = [tr2′], then ∃δ. tr1′ −→δπ1 l1 and tr2′ −→δπ2 l2.

Proposition 1 ([ 3 ]). Let M be an progressive monoid, and tr1, tr2 be traces of π1, π2. Then tr1, tr2 are M -consistent iff for any their prefixes tr1′ →π1 l1, tr2′ →π2 l2 holds:

Now we give an alternative definition of Γ , which does not require enumeration of all interpretations. The nodes of Γ are 4-tuples of the form v = (l1, l2, s, α) where li ∈ Li, s ∈ S, and α ∈ A ∪ {λ}. The root of Γ is the node (en1, en2, s, α) where: if [B1(en1)] ≤ [B2(en2)], then s = ε, otherwise s = [B1(en2)]; if [B2(en2)] ≤ [B1(en1)], then α = λ, otherwise α = B2(en2). Some of the nodes (l1, l2, s, α) are terminal and have no outgoing edges: 1. l1 = ex1, α 6= λ; 2. l2 = ex2, s 6= ε; 3. l1 = ex1, l2 = ex2, and either s1 6= ε, or α 6= λ.

For cases (1), (2) (which are not mutually exclusive, but exclude (3)) we call a node disproving (the equivalence of PPs); Edges outgoing from a nonterminal node v = (l1, l2, s, α) carry the following labels (one edge for each δ ∈ L): (δ, δ) if l1 6= ex1, l2 6= ex2, and s = [α] = ε; (ǫ, δ) if l2 6= ex2, α = λ, and either l1 = ex1 or s 6= ε; (δ, ǫ) otherwise. Here ǫ is a special symbol, ǫ ∈/ L. To finish the definition of Γ , we (σ1,σ2) describe the node v′ = (l1′, l2′, s′, α′) s.t. v −−−−→ v′. If σ1 = ǫ, then l1′ = l1 and s′′ = s, otherwise l1 −σ→1 l1′ and s′′ = s ∗ [B1(l1′)]. If σ2 = ǫ, then l2′ = l2 and α′′ = α, σ2 otherwise l2 −→ l2′ and α′′ = B2(l2′). If s′′ = [α] ∗ s′′′, then s′ = s′′′ and α′ = λ, othewise s′ = s′′ and α′ = α′′.

As the GCC is defined, we should prove that it contains as many traces as needed to check if π1, π2 are M -equivalent. To do it, we show the correspondence between paths in Γ and traces of π1, π2 (lemmas 1–3), and then show what paths should we search for to check whether π1, π2 are M -equivalent (theorem 1).

To shorten the notation, we give three more definitions. An r-path is a path in Γ originating from its root. Projections pr1(ω), pr2(ω) of an r-path ω are defined as follows. pri((en1, en2, s, α)) = eni. Let ω = ω′ −(−σ1−,σ−2→) (l1, l2, s, α). Then: – pri(ω) = pri(ω′) if σi = ǫ; – pri(ω) = pri(ω′) −σ→i li otherwise.

For an infinite r-path ω we define pri(ω) as the shortest r-path s.t. for each finite prefix ω′′ of ω holds: pri(ω′′) is a prefix of pri(ω). An r-path ω is disproving in two cases: it leads to a disproving node; it is infinite, and for some its (infinite) tail ωt and some i ∈ {1, 2} holds: for a label (σ1, σ2) of any edge of ωt holds σi = ǫ, and for any node (l1, l2, s, α) of ωt the location exi is reachable from li.

Lemma 1. Let ω be an r-path. Then pr1(ω) and pr2(ω) are consistent traces of π1, π2. Moreover, if ω leads to a node (l1, l2, s, α), then the following holds: there exists s′ ∈ S s.t. s′ ∗ s = [pr1(ω)] and s′ ∗ [α] = [pr2(ω)]; if s 6= ε and α 6= λ, then s and [α] are incomparable.

Lemma 2. Let rn1, rn2 be consistent runs of π1, π2. Then there is an r-path ω s.t. pri(ω) is a prefix of rni (for i ∈ {1, 2}), and either ω is infinite, or it leads to a terminal vertex. Lemma 3. Let ω be an r-path, i ∈ {1, 2}, and rni = pri(ω) →πi pt be a run of πi. Then rni and pr3−i(ω) are consistent traces.

Lemmas 1, 2 follow from proposition 1, the left-cancellativity of M , and the definition of Γ . To prove lemma 3 by contradiction, we note that if rni and pr3−i(ω) are not consistent, then due to proposition 1 and lemma 1 there exist a proper prefix pt′ of a path pt and a proper prefix tr3′−i of pr3−i(ω) s.t. [pri(ω) →πi pt′] = [tr3′−i]. Then by lemma 1 there does not exist a prefix ω′ of ω s.t. pri(ω) = tr3′−i, which is impossible by the definition of pri.

Theorem 1. PPs π1, π2 are M -equivalent iff Γ contains no disproving paths. Proof. Necessity. If Γ contains a finite disproving path ω, then by lemma 3 the traces pr1(ω), pr2(ω) can be extended to consistent runs of π1, π2, and by lemma 2 the results of these runs are distinct. If Γ contains an infinite disproving path ω and i-th components of edge labels of some its tail are equal to ǫ, then the trace pri(ω) can be extended to a finite run rni of πi, while pr3−i is an infinite run of π3−i, and by lemma 3 the runs rni and pr3−i are consistent and have distinct results.

Sufficiency. If rn1, rn2 are consistent runs of π1, π2 with distinct results, then by lemma 2 we obtain an r-path ω which is either finite and leads to a disproving vertex due to lemma 1, or infinite, in which case exactly one of runs rn1, rn2 is infinite, and ω is an infinite disproving path.

Note that theorem 1 does not solve the M -equivalence problem for PPs, as Γ is infinite in general case. The next section provide means to build a polynomial-time traversal of Γ sufficient to solve the equivalence problem. 4

Analysis of Graph of Consistent Computations

The main idea of the further GCC analysis is to partition its nodes into a finite number of classes and prove that during the traversal it is sufficient to visit polynomially many nodes in each class, and after it we state either that π1 and π2 are not M -equivalent (lemma 7) or that other nodes in the class may be ignored in the further traversal (lemma 8). To do it, we introduce and analyze the notion which we called an absorption effect. An (absorption) effect induced by an element s ∈ S is a function aes : S → S showing how the elements of S are changed under “pressure” of absorption of s appended to the right. Formally, aes(s1) = arg min ks2k. To prove the soundness of the definition, s2:s1∗s=s2∗s we state a (what we call) commutativity of minimals: if h, g are minimal members of the same element of M , then g can be obtained from h with commutativity rewritings only (without any absorption). The soundness is provided by the following lemma. Lemma 4. An equation X ∗ s = s′ has at most one solution s′′ on X s.t. ks′′k has minimal possible value among all solutions.

Proof. Suppose we have at least two such solutions: s′1′, and s′′: s′1′ ∗ s = s′ = s′2′ ∗ s. 2 Let h1 ∈ ⌊s′1′⌋, h2 ∈ ⌊s′2′⌋, and h ∈ ⌊s⌋. Due to the commutativity of minimals, we can obtain h2h from h1h using commutativity relations only. By symmetry of commutativity relation, we get the derivation of h−h2− from h−h1− (g− is a reversed g). By the left-cancellativity of M and the same symmetry, h1 can be rewritten to h2 with commutativity only, which means s′1′ = s′2′.

The next step is to introduce a partial order on the set of all effects (which we denote by ): ae1 ≤ ae2 iff there exist s1, s2 ∈ S s.t. s1 ≤ s2, ae1 = aes1 , and ae2 = aes2 . To prove that ≤ is a partial order, we state one more algebraic property of M , which we call the inductive minimality: if h is a minimal member, a ∈ A, and [ah] 6= [h], then ah is a minimal member.

Lemma 5. The relation ≤ over is a non-strict partial order.

Proof. By the inductive minimality, an effect ae determines a set of symbols deleted from a minimal member: if ae(s) = s′ and h ∈ ⌊s⌋, then there exists h′ ∈ ⌊s′⌋ s.t. h′ can be obtained from h by deleting some symbols. If there exist ae1, ae2 s.t. ae1 ≤ ae2 and ae2 ≤ ae1, then ae1 deletes at least those symbols which are deleted by ae2 and vice versa. It means that for any minimal member h the effects ae1 and ae2 delete the same symbols from h, and thus ae1([h]) = ae2([h]).

Another pleasant property of the effects is that they can be (in some sense) computed by a finite automaton. It means that the set is finite, and that having a short description of an effect aes and an action a, we can easily compute the description of aes∗[a]. An automaton A = (Q, q0, TA, SA, MA) (over the alphabet A) consists of: a finite set of states Q including an initial state q0; a transition function T : Q × A → Q; a set of labels SA; a marking function MA : Q → SA. We denote by A(q, h) a state to which A jumps reading the word h ∈ A∗, and use the following shortenings: A(h) is A(q0, h); MA(q, h) is MA(A(q, h)); MA(h) is MA(A(h)). The notion of automaton used in this paper is a minor generalization of a well-known finite deterministic acceptor with the only difference: instead of answers “yes” and “no”, here SA is the set of all possible answers. We put the following meaning in the phrase “to compute an effect”: |SA| = ||; ae[h] = ae[g] iff MA(h) = MA(g); if [h] = [g], then A(h) = A(g). We call an automaton satisfying all these properties an ae-automaton. Note that intuitively the best way to define such an automaton is to say SA = , but formally the set cannot be used explicitly in any algorithm as it contains functions with an infinite domain. The description of an ae-automaton is based on an automata-based recognition of absorption: there exists an automaton with the following two properties: reading an input word h, it enters a state labelled with the set {a ∈ A | [a] ∗ [h−] = [h−]}; reading words h, g s.t. [h−] = [g−], it enters the same state.

Lemma 6. For any progressive CA-monoid there exists an ae-automaton.

Proof. We start with an automaton A stated by the automata-based recognition of absorption. Then we divide the states of A into equivalence classes: q and q′ are equal iff ∀h ∈ A∗. MA(q, h) = MA(q′, h). Note that we have a finite number N of such classes. Changing the labels of A to {1, . . . , N }, enumerating the equivalence classes, and assigning the label i to the i-th class, we get an automaton A′ s.t. MA′ (h) = MA′ (g) iff aeh− = aeg− . Then by changing the labels again we construct exactly N finite-state acceptors, the i-th of which says if the original answer is i. Using classical automatatheoretic results [ 8 ], we reverse the language of each acceptor, construct their Cartesian product and reassign labels back to {1, . . . , N }, and the result is exactly a required ae-automaton: MA′ (h) = MA′ (g) iff aeh = aeg, and all states were constructed for elements of S, thus all parts of the definition are satisfied.

Hereafter we assume to be given an ae-automaton A = (Q, q0, TA, SA, MA). By the definition of A, we write A([h]) along with A(h). The last two notions required for the analysis of Γ are an effect evolution and a generalized effect evolution. Intuitively, these notions show how an absorption effect is modified during a program run from some intermediate run location. An effect evolution ae˙ sh along the word h = a1 . . . ak ∈ A∗ induced by s ∈ S is defined as follows. We start with the sequence (aes, aes∗[a1], . . . , aes∗[h]). Then we delete adjacent duplicates. The result is exactly ae˙ sh. Note that any evolution is a chain in a finite (lemma 6) partially ordered (lemma 5) set, and thus the number of evolutions is finite. A generalized effect evolution ae¨qh along the word h = a1 . . . ak ∈ A∗ induced by a state q ∈ Q is defined in the same way except we start with the sequence of pairs ((aeε, aeMA(q)), (ae[a1], aeMA(q,a1)), . . . , (ae[h], aeMA(q,h))). For the same reasons the set of all generalized evolutions is finite.

There is an important property of evolutions. For any full path pt for a PP π = (L, en, ex, T, B) and any s ∈ S there exists a full path pt′ (which we call reduced) s.t.: ae˙ sB(pt) = ae˙ sB(pt′); if pt is finite, then |pt′| ≤ |L| · |Q|; if pt is infinite, then pt′ = pt′1′ → pt′2′ → pt′2′ → . . . where |pt′1′| + |pt′2′| ≤ |L| · |Q|. To prove the existence of a reduced path pt′, it is sufficient to: mark pt with states of A starting with A(s) modifying it with the actions of pt; delete loops between two repeated pairs of a program location and an automaton state; for an infinite path, replace an infinite tail corresponding to the last effect in the evolution with the repetition of a loop described above. The same property holds for generalized evolutions with an upper bound |L| · |Q|2 instead of |L| · |Q|. The following lemmas are the result of the analysis of Γ , as they provide a finite partitioning of the nodes of Γ required in the equivalence-checking algorithm.

Lemma 7. Let: q ∈ Q; l1 ∈ L1; l2 ∈ L2; α ∈ A ∪ {λ}; m = |Q|3 · |π1| · |π2|2 + 2; pairwise-distinct GCC nodes (l1, l2, si, α), i ∈ {1, . . . , m}, be reachable from the GCC root; A(s1) = · · · = A(sm) = q; li → pti be a full path of πi (i ∈ {1, 2}), and at least one of these paths is finite; ae¨B1(pt1) = ((ae11, ae21), . . . , (ae1k, ae2k)); ae˙ αB2(pt2) = q (ae˙ 1, . . . , ae˙ p); 1 ≤ k′ ≤ k; the elements ae1k′ (si), i ∈ {1, . . . , m}, be pairwise distinct; {ae1, . . . , aep} ∩ {ae2k′+1, . . . , ae2 } = ∅.

Then π1 and π2 are not M -equivalent.

Proof. The main idea is to show that at least for one of the nodes wj traces of a projection of a path ωj leading to this node can be expanded into consistent runs with different results. We show in details only one case (for other cases a proof technique works in the same way, though it looks a bit more complex): pt1 and pt2 are finite, k′ = k. W.l.o.g. we assume pt1 and pt2 to be reduced paths, thus |pti| ≤ |πi| · |Q|3−i. By proposition 1 the only way for the runs not to be consistent is if their proper prefixes lead to a common element s, and next steps are made with distinct conditions δ1, δ2 — we call it a collision. By lemma 3, for each ωp these prefixes are not shorter than the projections of ωp. Thus, we have [pr1(ωp) → pt′1] = [pr2(ωp) → pt′2] for some proper prefixes pt′1, pt′2 of pt1 and pt2. But as the elements ae1k(sp), ae1k(sr) are distinct for p 6= r, the elements [pr1(ωp) → pt′1] and [pr1(ωp) → pt′1] are also distinct (as M is left-cancellative), while [pr2(ωp) → pt′2] = [pr2(ωr) → pt′2]. Thus, a collision for different indexes p, r cannot happen for the same prefixes pt′1, pt′2 simultaneously. On the other hand, we have at most |pt1| · |pt2| ≤ |π1| · |π2| · |Q|3 possible pairs of lengths for these prefixes, which means that at least for two indexes j′, j′′ there are no collisions: in other words, expansions for these indexes are consistent. Based on the same thought about distinction of ae1k(sj′ ), ae1k(sj′′ ) as above for p, r, we conclude that at least for one of indexes j′, j′′ the results of computations are distinct. Lemma 8. Let: q ∈ Q; l1 ∈ L1; l2 ∈ L2; α ∈ A ∪ {λ}; m = |Q|3 · |π1| · |π2|2 + 2; pairwise-distinct GCC nodes vi = (l1, l2, si, α), i ∈ {1, . . . , m}, are reachable from the GCC root; A(s1) = · · · = A(sm) = q; li → pti is a full path of πi, i ∈ {1, 2}; ae¨qB1(pt1)] = ((ae11, ae21), . . . , (ae1k, ae2k)); ae˙ αB2(pt2) = (ae1, . . . , aes); 1 ≤ k′ < k, 1 ≤ s′ < s; the elements ae1k′ (si), i ∈ {1, . . . , m}, be pairwise different; ae1k′+1(s1) = · · · = ae1k′+1(sm); {ae1, . . . , aes′ } ∩ {ae2k′+1, . . . , ae2 } = ∅; aes′+1 ∈ {ae2k′+1, . . . , ae2k}; k ω be a GCC r-path leading to vm; pr1(ω) → pt1 and pr2(ω) → pt2 be consistent runs having distinct results.

Then there exists j, 1 ≤ j < m, s.t. vj belongs to some disproving r-path.

We prove lemma 8 using the same technique as for lemma 7, so due to lack of space we omit the proof. 5

Equivalence-Checking Algorithm

The resulting equivalence-checking algorithm is to traverse Γ in any way. During the traversal visited nodes are partitioned into classes defined by a state q, locations l1, l2, and a symbol α (as in lemmas 7, 8): the nodes (l1, l2, s, α) and (l1′, l2′, s′, α′) belong to the same class iff l1 = l1′, l2 = l2′, A(s) = A(s′), and α = α′. If the traversal is finished, then the absence of disproving paths can be checked directly, and the answer is given by theorem 1. If (|Q|3 · |π1| · |π2|2 + 2) nodes are visited in any class, then either the non-equivalence is stated (lemma 7), or all other nodes in the class are ignored from now on (lemma 8).

Let N be the total program size: N = |π1| + |π2|. The number of distinct node classes is O(N 2). Each class contains O(N 3) nodes. Thus, no more than O(N 5) nodes are visited during the traversal. A minimal member g ∈ ⌊h⌋ can be constructed in time O(|h|) using the inductive minimality and lemma 6. Checking if [h] ≤ [g] can be done in time O((|h| + |g|)2): construct a word g′ ∈ ⌊g⌋; if h = λ, then [h] ≤ [g]; otherwise h = ah′, where a ∈ A; if [ag] = [g] (it can be checked with A), then delete a from h and repeat the procedure; otherwise check whether g contains a letter a; if not, then [h] [g]; if yes, then pick the leftmost occurence of a in g: g = g′ag′′; if each symbol from g′ commutes with a, then delete marked a from h and g and repeat the procedure; otherwise [h] [g]. The soundness of the latter algorithm is proved in the appendix. The elimination of a longest prefix can be performed in a similar way, the difference is that we pick not the leftmost symbol of h but every symbol which can be shifted to the leftmost position with commutativity relations, and every completion is successfull. Each element of M can be encoded as any its member (e.g. constructed by a computation). During the traversal the algorithm constructs words h of length at most O(N 5), and each element comparison can be done in time O(N 10). For each vertex the algorithm does at most O(N 3) comparisons of elements of M , and this is the bottleneck of the algorithm. Thus, the algorithm performs O(N 5) steps, and each step is done in time O(N 13), and the total time is O(N 18). The last upper bound together with theorem 1 and lemmas 7, 8 states the validity of the main theorem of this paper. Theorem 2. Let M be a progressive CA-monoid on a finite set of actions. Then the M -equivalence problem for propositional programs is decidable in time polynomial in size of programs. 6

Conclusion

We want to present three thoughts concluding the paper. The first one is that the considered case: propositional programs, commutativity, and left absorption — appears to be simple enough to have a polynomial-time decidability for the equivalence problem. The second one is that (nevertheless) this case is complex enough to require a rather nontrivial technique for its analysis. Finally, the third one is that it was mentioned in the introduction that there is still a long way to go, and this statement remains unchanged, as even in a simple example in the introduction we need not only commutativity and absorption (but also dependencies between statements and conditions) to prove the equivalence. To be able to prove the equivalence of “real” fragments of code, we should consider other properties of program primitives, which is the topic of future research.

1. Rice , H.G. : Classes of recursively enumerable sets and their decision problems . Trans. Amer. Math. Soc . 89 , 25 - 59 ( 1953 )

2. Aho , A.V. , Lam , M.S. , Sethi , R. , Ullman , J.D. : Compilers: Principles, Techniques and

Tools. Pearson

Education , Inc., New York ( 2006 )

3. Zakharov , V.A. : An efficient and unified approach to the decidability of equivalence of propositional program schemes . Lect. Notes Comput. Sci . 1443 , 247 - 258 ( 1998 )

4. Zakharov , V.A. , Zakharyaschev , I.M. : On the equivalence checking problem for a model of programs related with muti-tape automata . Lect. Notes Comput. Sci . 3317 , 293âA˘ S¸ - 305 ( 2005 )

5. Harel , D. , Kozen , D. , Tiuryn

J.: Dynamic

Logic . The MIT Press ( 2000 )

6. Hunt , H.B. , Constable , R.L. , Sahni , S. : On the computational complexity of program scheme equivalence . SIAM Journal of Computing. 9, N 2 , 396 - 416 ( 1980 )

7. Rutledge , J.D.: On Ianov's program schemata . J. Assoc. Comp. Mach. 11 , 1 - 9 ( 1964 )

8. Hopcroft , J.E. , Motwani , R. , Ullman , J.D. : Introduction to automata theory, languages, and computation . Addison-Wesley, Boston ( 2003 )

9. Kozen , D. : Kleene Algebra withTests and Commutativity Conditions . In: Tools and Algorithms for Construction and Analysis of Systems , Second International Workshop, pp. 14 - 33 ( 1996 )

10. Zakharov , V.A. : On the decidability of the equivalence problem for monadic recursive programs . Theoretical Informatics and Applications. 34, N 2 , 157 - 171 ( 2000 )

11. Glushkov , V.M. , Letichevskii , A.A. : Theory of algorithms and discrete processors . Adv. Inf. Syst. Sci. 1 , 1 - 58 ( 1969 )

12. Podymov , V.V. , Zakharov , V.A. : On a semigroup model of sequential programs specified by means of two-tape automata . Belgorod State University Scientific Bulletin. History. Political science. Economics. Information technologies . 7 ( 78 ), N 14/1 , 94 - 101 ( 2010 ) (in Russian)

13. Bohm , S. , Goller , S , Jancar, P. : Equivalence of deterministic one-counter automata is NLcomplete . Proceedings of the forty-fifth annual ACM symposium on Theory of computing . 131- 140 ( 2013 )

14. Debski , W. , Fraczak , W. : Concatenation state machines and simple functions . Lecture Notes in Computer Science . 3317 , 113 - 124 ( 2004 )

15. Bastien , C. , Czyzowicz , J. , Fraczak , W. , Rytter , W. : Prime normal form and equivalence of simple grammars . Theoretical Computer Science. 363, N 2 , 124 - 134 ( 2006 )

16. Harju , T. , Karhumäki , J.: The equivalence problem of multitape finite automata . Theoretical Computer Science. 78, N .2 , pp. 347 - 355 ( 1991 )

17. de Souza, R.: On the decidability of the equivalence for k-valued transducers . Lecture Notes in Computer Science . 5257 , 252 - 263 ( 2008 )