An Afficient Equivalence-Checking Algorithm for
a Model of Programs with Commutative and Absorptive
                      Statements

                                    Vladislav Podymov

                  National Research University Higher School of Economics
                       3 Kochnovsky Proezd, Moscow 125319, Russia
                                  valdus@yandex.ru


       Abstract. We present an efficient equivalence-checking algorithm for a propo-
       sitional model of programs with semantics based on (what we call) progressive
       monoids on the finite set of statements generated by relations of a specific form.
       We consider arbitrary set of relations for commutativity (relations of the form
       ab = ba for statements a, b) and left absorption (relations of the form ab = b for
       statements a, b) properties. The main results are a polynomial-time decidability
       for the equivalence problem in the considered case, and an explicit description of
       an equivalence-checking algorithm which terminates in time polynomial in size
       of programs.

       Key words: program models, equivalence checking, semigroups, commutativity,
       left absorption


1   Introduction
The paper addresses the functional equivalence problem for imperative programs: given
two programs, do determine whether they compute the same function. This problem is
known to be undecidable in general (follows from Rice’s theorem [1]), but is highly
important to have means to determine if given programs are equivalent: theoretically,
the equivalence problem complexity shows what influence the structure of programs
has on their meaning, i.e. which structural properties can be analyzed and which cannot;
practically, equivalence-checking algorithms can be used in a huge variety of program
analysis problems: optimization, verification, refactoring, obfuscation, and so on — in
which any two programs (e.g. original and transformed, or ideal and real) are tested for
having the same functionality.
    One way to overcome the negativity of Rice’s theorem is to provide sufficient con-
ditions for program equivalence. For example, an optimizing compiler performs a se-
quence of program transformation steps, each step preserves the original program func-
tionality, and sufficient equivalence conditions may be used to prove transformation
soundness and provide optimization algorithms known to be sound. The range of sound
and efficient optimization capabilities is impressive: compilers have means to remove
some unreachable or dead code, to restructure control flow graph and its linear code
blocks, to optimize a variety of efficiency characteristics [2]. Nevertheless, there is still
a long way to go. For example, the block of code
86

    if(P(z)) {x=f(x); y=y+z;} else {x=g(x);x=v+z;}
    if(!P(z)) y=y+z; else x=v+z;
and a much simpler block y=y+z; x=v+z; are equivalent, if we assume that the
functions P, f, g are side-effect free and terminate, but simplifications of this kind
are unlikely to be done automatically by modern compilers. To prove these blocks are
equvalent, it is sufficient to use the following properties:

 1. the value of P(z) remains unchanged;
 2. the order of statements x = f(x); and y = y + z; has no effect on the result,
    and the same holds for x = v + z; and y = y + z;
 3. the statement x = g(x); is dead if x = v + z; acts right after it, and the same
    holds for x = f(x); and x = v + z;

     We refer to (2) as commutativity, and (3) as left absorption properties for statements,
and focus on these two properties only. Suppose we have a statement a : (x1 , . . . , xn ) =
f (y1 , . . . , yk ): “compute f on arguments y1 , . . . , yk , and put the result into x1 , . . . , xn ”.
Let M od(a) be the set {x1 , . . . , xn }, and U se(a) be {y1 , . . . , yk }. An easy way to
extract commutativity and left absorption for statements is to analyze their U se- and
M od-sets: if U se(a) ∩ M od(b) = M od(a) ∩ U se(b) = M od(a) ∩ M od(b) = ∅, then
a and b are commutative; if M od(a) ⊆ M od(b) and M od(a) ∩ U se(b) = ∅, then b is
left-absorptive for a.
     To investigate the equivalence problem with respect to these properties we use the
following approach: describe a program model which preserves the original program
syntax but has a much simpler abstract semantics preserving some original semantical
properties (programs in such a model are often called program schemata); propose an
equivalence notion for schemata such that if two given schemata are equivalent, then
the original programs are also equivalent; solve the equivalence problem for schemata,
and use the resulting equivalence-checking algorithm to verify the original program
equivalence.
     We consider the model called “propositional programs” (PPs for short) proposed in
[3, 4]. A PP is basically a finite automaton such that: vertices are labeled with state-
ment symbols and correspond to program control locations in which a current data state
is changed; transitions are labeled with all possible valuations for Boolean condition
symbols; vertices are connected in the same way as in the original program. An inter-
pretation which defines how a PP operates consists of: a set of abstract data states; a
meaning of each statement symbol as a function for data state modification; a mean-
ing of each condition symbol as a truth-value marking of data states. In general case
these components are defined in terms of Kripke frames and models [5], but this paper
focuses on commutativity and left absorption properties only: in this case data states
are elements of a monoid on the set of statements generated by a set of relations of
the form ab = ba (for commutativity) and ab = b (for left absorption) where a, b are
statement symbols. For short, we call such a monoid a CA-monoid. We call two PPs
equvalent in a CA-monoid M if for each interpretation based on M last data states of
their executions are equal. All necessary definitions are given in the further sections.
     The main goal of the paper is to provide an efficient equivalence-checking algo-
rithm for PPs operating in CA-monoids. We call an algorithm efficient if it terminates
                                                                                          87

in a polynomial time. Unfortunately, even for the trivial case when no commutativity
and left absorption is assumed the equivalence problem for PPs is known to be co-NP-
complete [6]. But for a fixed finite signature (alphabet of statements and conditions) for
the trivial case PPs are known to be closely related to finite automata [7] and have an
equivalence-checking algorithm having a very low time complexity [8]. Thus, hereafter
a fixed finite signature is assumed, which means that we study the hardness with respect
to structural program features. Another unpleasant fact is that equivalence-checking re-
sults for automata and PPs are not interchangeable for nontrivial cases, for example, for
arbitrary commutativity the equivalence problem for automata is known to be undecid-
able [9], while for schemata it is decidable in polynomial time for a syntactically and
semantically more general case [10].
     CA-monoids considered in this paper are assumed to be progressive: we cannot ob-
tain the same element twice appending non-neutral element to the right via composition.
Note that for left-cancellative monoids the progressiveness is the same as the atomicity
of the neutral element. The main result of the paper is an improvement of known decid-
ability of the equivalence problem in the considered case [11] to polynomial-time decid-
ability (theorem 2), and an explicit description of a polynomial-time decision algorithm
(section 5). Note that if commutativity and left absorption properties are extracted in the
way showed above, then the resulting CA-monoid is necessarily progressive. Moreover,
[12] provides a simple-to-check progressiveness criterion for CA-monoids.
     To make a few remarks on related work, we may mention polynomial-time
decidability results on equivalence problems for one-counter automata [13] (NL-
completeness), specific context-free languages [14, 15], specific recursive schemata
[10], and decidability results for multitape automata [16] and k-valued transducers [17].
Note that there are much more results on the equivalence problem and quite a bit of them
are mentioned.
     The structure of the paper is as follows. Section 2 provides basic definitions. Sec-
tion 3 discusses the main idea of the analysis technique and basic properties of the
main object to be analyzed — a graph of consistent computations (GCC). The result
of section 4 is a deep analysis of a GCC which allows to describe an polynomial-time
equivalence-checking algorithm. The main theorem and the algorithm description are
given in section 5.


2   Basic Definitions

Hereafter A and L are finite alphabets of basic assignment statements, or actions for
short, and logical conditions respectively. A propositional program (PP)

                                  π = (L, en, ex, T, B)

consists of: a finite set of control locations L including entry en and exit ex; a transition
                                                                                       δ
function T : (L \ {ex}) × L → L; a binding function B : L → A. We write l −           →π l ′
and l →π l′ instead of T (l, δ) = l′ for readability. We define the size |π| of a PP π to be
the number of its control locations: |π| = |L|. The meaning of a PP is briefly discussed
in the introduction except of the following: a logical condition may be understood as
88

a valuation for all Boolean conditions, which means that exactly one logical condition
holds for each abstract data state.
      To define semantics of PPs in general case we should use Kripke frames and mod-
els [5], but in the considered case (commutativity and left absorption for actions) we
simplify it using the notation of semigroup theory. A semigroup (S, ∗) consists of
an arbitrary set S of elements, and a binary associative operation on S. A monoid
M = (S, ∗, ε) is a semigroup (S, ∗) together with its neutral element ε. To define how
a PP operates we use a monoid on the set of actions: each action from A is an element of
the monoid, and each other element s can be decomposed into actions (s = a1 ∗· · ·∗ak ,
k ≥ 0, ai ∈ A for 1 ≤ i ≤ k). We write [a1 . . . ak ] to denote such s. Hereafter we use
only monoids on the set of actions. An interpretation I = (M, ξ) for a PP consists of a
monoid M = (S, ∗, ε), and a valuation ξ : S → L for logical conditions.
      A path for a program π is a sequence of program locations of the form l1 →π l2 →π
. . . . For paths pt′ , pt′′ , by pt′ → pt′′ we denote a path which starts with a path pt′ and
continues with a path pt′′ . A path is called a trace if l1 is the entry of π. A path is full if
                                                                                                     δ1
it either is infinite or leads to the exit of π. A full trace is called a run. A trace l1 −→            π
      δ2
l2 −→ . . . is called an I-trace for an interpretation I if δi = ξ([B(l1 ) . . . B(li )]) for
i ≥ 1. To simplify the notation, we write B(l1 , . . . , li ) along with B(l1 ) . . . B(li ), and
[l1 , . . . , l1 ] along with [B(l1 , . . . , li )]. The result of a finite I-run rn is the element [rn].
An infinite I-run loops and has the indefinite result ⊥. Two programs are I-equivalent
if the results of their I-runs are equivalent. For a monoid M two programs are M -
equivalent if they are I-equivalent for each interpretation I = (M, ξ).
      The equivalence problem investigated in this paper is the M -equivalence problem
for PPs where M is a monoid: given two PPs, to check if they are M -equivalent. The
problem is solved for specific monoids M , which we call progressive CA-monoids. A
monoid is progressive if for each triple of its elements s, s1 , s2 , if s1 6= ε or s2 6= ε,
then s 6= s ∗ s1 ∗ s2 Note that the elements of a progressive monoid are partially ordered
w.r.t. its operation, i.e. s1 ∗ s2 6= s1 for s2 6= ε. In other words, if we introduce the
following order ≤: s1 ≤ s2 iff ∃s. s2 = s1 ∗ s — then a monoid is ordered iff ≤ is
a non-strict partial order. Along with ≤ we will use naturally induced orders <, ≥, >,
as well as s1 ks2 to denote the incomparability of s1 , s2 . A CA-monoid is a monoid
generated by an arbitrary set of relations of the form ab = ba and ab = b (a, b ∈ A);
the former one is a commutativity relation, the latter is a (left) absorption relation. A
monoid generated by a set of relations can be defined as follows:

  – each element is a set of words over the alphabet A;
  – g ∈ [h] iff g can be obtained from h with a finite number of rewritings using
    generating relations (a subword from the left part of a relation to the right part and
    vice versa).

    The main result of the paper is the polynomial-time decidability of the M -
equivalence problem for PPs for an arbitrary progressive CA-monoid M together with
an explicit description of a decision algorithm. The description of the algorithm depends
on a description of M , i.e. on a set of commutativity and absorption relations defining
M . The algorithm is polynomial in time in size of input programs.
                                                                                                 89

3     Graph of Consistent Computations
To simplify notation, hereafter we assume an arbitrary progressive CA-monoid M =
(S, ∗, ε) to be defined, as well as PPs
                                  π1 = (L1 , en1 , ex1 , T1 , B1 )
and
                                  π2 = (L2 , en2 , ex2 , T2 , B2 ).
The main idea of how we analyze the M -equivalence problem for PPs is similar to
those developed in [3, 4], [10], [17] and can be described as follows. Suppose we have
an interpretation I = (M, ξ). We construct the I-traces tr1 , tr2 of π1 and π2 in the
following manner: we start with one-element I-traces of π1 , π2 ; if [tr1 ] = [tr2 ], then
we add the next location to both traces; if [tr1 ] > [tr2 ], then we add the next location to
tr2 ; otherwise we add the next location to tr1 . If any of the traces tri is an I-run, then
we do not add anything to it. As the result we have a sequence of pairs of traces leading
either to the I-runs of tr1 , tr2 , or to an I-run and an I-trace. Consider any such pair
(tr1 → l1 , tr2 → l2 ). To find what results can be obtained with computations starting
in these traces, all we need to know is last program locations of these traces and data
states they lead to. Thus, we replace this pair with a tuple (l1 , l2 , [tr1 → l1 ], [tr2 → l2 ]).
     Hereafter we write ⌊s⌋ to denote the set of all minimal members of s, s ∈ S: {g |
g ∈ s, |g| = min |u|} — and ksk to denote the length of words in ⌊s⌋. Also, we write
                u∈s
⌊h⌋ instead of ⌊[h]⌋, and call a word h ∈ A∗ a minimal member if h ∈ ⌊h⌋.
      Let (l1 , l2 , s1 , s2 ) be a tuple as described above, and s be a longest common prefix
s of s1 , s2 : s ∗ s′1 = s1 ; s ∗ s′2 = s2 ; ksk has a maximal possible value. To check the
equivalence of programs, we do not need the results themselves, but to check if the re-
sults are equal. Thus, we replace s1 , s2 with s′1 , s′2 in the tuple. Due to the construction
rules for tuples, ks′2 k ≤ 1, which means that s′2 = [α] for α ∈ A∪{λ}. Then we replace
(l1 , l2 , s1 , s2 ) with a reduced tuple (l1 , l2 , s′1 , α). Looking over all possible interpreta-
tions I, we connect all possible tuples in a graph Γ , which we call a graph of consistent
computations (GCC) in the following way. The vertices of Γ are all possible tuples
(l1 , l2 , s, α), where l1 , l2 are locations of π1 , π2 , s is a data state, and α ∈ A ∪ {λ}.
If there exists an interpretation I such that tuples t1 , t2 are what we get from adjacent
pairs of traces obtained with the construction rules for I, then t1 and t2 are connected
with an edge. To eliminate the enumeration of all interpretations, we define Γ using M ,
π1 , and π2 only, and then prove that to analyze Γ is sufficient to understand whether
π1 , π2 are M -equivalent.
      To explain the term “consistent”, we define the notion of consistency for traces of
PPs in the following way. Traces of π1 , π2 are (M -)consistent if they are I-traces for
some interpretation I = (M, ξ). Thus, an alternative description of the M -equivalence
problem for PPs is: given two PPs, to check if for each pair of M -consistent runs their
results are equal. The following proposition shows how to check trace consistency with-
out enumerating of all possible interpretations.
Proposition 1 ([3]). Let M be an progressive monoid, and tr1 , tr2 be traces of π1 , π2 .
Then tr1 , tr2 are M -consistent iff for any their prefixes tr1′ →π1 l1 , tr2′ →π2 l2 holds:
                                  δ                  δ
if [tr1′ ] = [tr2′ ], then ∃δ. tr1′ −
                                    →π1 l1 and tr2′ −
                                                    → π2 l 2 .
90

    Now we give an alternative definition of Γ , which does not require enumeration
of all interpretations. The nodes of Γ are 4-tuples of the form v = (l1 , l2 , s, α) where
li ∈ Li , s ∈ S, and α ∈ A ∪ {λ}. The root of Γ is the node (en1 , en2 , s, α) where:
if [B1 (en1 )] ≤ [B2 (en2 )], then s = ε, otherwise s = [B1 (en2 )]; if [B2 (en2 )] ≤
[B1 (en1 )], then α = λ, otherwise α = B2 (en2 ). Some of the nodes (l1 , l2 , s, α) are
terminal and have no outgoing edges:

 1. l1 = ex1 , α 6= λ;
 2. l2 = ex2 , s 6= ε;
 3. l1 = ex1 , l2 = ex2 , and either s1 6= ε, or α 6= λ.

     For cases (1), (2) (which are not mutually exclusive, but exclude (3)) we call a
node disproving (the equivalence of PPs); Edges outgoing from a nonterminal node
v = (l1 , l2 , s, α) carry the following labels (one edge for each δ ∈ L): (δ, δ) if l1 6= ex1 ,
l2 6= ex2 , and s = [α] = ε; (ǫ, δ) if l2 6= ex2 , α = λ, and either l1 = ex1 or s 6= ε;
(δ, ǫ) otherwise. Here ǫ is a special symbol, ǫ ∈    / L. To finish the definition of Γ , we
                                                  (σ1 ,σ2 )
describe the node v ′ = (l1′ , l2′ , s′ , α′ ) s.t. v −−−−→ v ′ . If σ1 = ǫ, then l1′ = l1 and
                        σ1 ′
s′′ = s, otherwise l1 −→    l1 and s′′ = s ∗ [B1 (l1′ )]. If σ2 = ǫ, then l2′ = l2 and α′′ = α,
               σ2
otherwise l2 −→ l2 and α′′ = B2 (l2′ ). If s′′ = [α] ∗ s′′′ , then s′ = s′′′ and α′ = λ,
                     ′

othewise s′ = s′′ and α′ = α′′ .
    As the GCC is defined, we should prove that it contains as many traces as needed to
check if π1 , π2 are M -equivalent. To do it, we show the correspondence between paths
in Γ and traces of π1 , π2 (lemmas 1–3), and then show what paths should we search for
to check whether π1 , π2 are M -equivalent (theorem 1).
    To shorten the notation, we give three more definitions. An r-path is a path in Γ
originating from its root. Projections pr1 (ω), pr2 (ω) of an r-path ω are defined as fol-
                                                    (σ1 ,σ2 )
lows. pri ((en1 , en2 , s, α)) = eni . Let ω = ω ′ −−−−→ (l1 , l2 , s, α). Then:

 – pri (ω) = pri (ω ′ ) if σi = ǫ;
                         σi
 – pri (ω) = pri (ω ′ ) −→   li otherwise.

    For an infinite r-path ω we define pri (ω) as the shortest r-path s.t. for each finite
prefix ω ′′ of ω holds: pri (ω ′′ ) is a prefix of pri (ω). An r-path ω is disproving in two
cases: it leads to a disproving node; it is infinite, and for some its (infinite) tail ωt and
some i ∈ {1, 2} holds: for a label (σ1 , σ2 ) of any edge of ωt holds σi = ǫ, and for any
node (l1 , l2 , s, α) of ωt the location exi is reachable from li .

Lemma 1. Let ω be an r-path. Then pr1 (ω) and pr2 (ω) are consistent traces of π1 , π2 .
Moreover, if ω leads to a node (l1 , l2 , s, α), then the following holds: there exists s′ ∈ S
s.t. s′ ∗ s = [pr1 (ω)] and s′ ∗ [α] = [pr2 (ω)]; if s 6= ε and α 6= λ, then s and [α] are
incomparable.

Lemma 2. Let rn1 , rn2 be consistent runs of π1 , π2 . Then there is an r-path ω s.t.
pri (ω) is a prefix of rni (for i ∈ {1, 2}), and either ω is infinite, or it leads to a
terminal vertex.
                                                                                            91

Lemma 3. Let ω be an r-path, i ∈ {1, 2}, and rni = pri (ω) →πi pt be a run of πi .
Then rni and pr3−i (ω) are consistent traces.

    Lemmas 1, 2 follow from proposition 1, the left-cancellativity of M , and the defi-
nition of Γ . To prove lemma 3 by contradiction, we note that if rni and pr3−i (ω) are
not consistent, then due to proposition 1 and lemma 1 there exist a proper prefix pt′ of a
                               ′
path pt and a proper prefix tr3−i of pr3−i (ω) s.t. [pri (ω) →πi pt′ ] = [tr3−i
                                                                            ′
                                                                                ]. Then by
                                         ′                      ′
lemma 1 there does not exist a prefix ω of ω s.t. pri (ω) = tr3−i , which is impossible
by the definition of pri .

Theorem 1. PPs π1 , π2 are M -equivalent iff Γ contains no disproving paths.

Proof. Necessity. If Γ contains a finite disproving path ω, then by lemma 3 the traces
pr1 (ω), pr2 (ω) can be extended to consistent runs of π1 , π2 , and by lemma 2 the results
of these runs are distinct. If Γ contains an infinite disproving path ω and i-th compo-
nents of edge labels of some its tail are equal to ǫ, then the trace pri (ω) can be extended
to a finite run rni of πi , while pr3−i is an infinite run of π3−i , and by lemma 3 the runs
rni and pr3−i are consistent and have distinct results.
    Sufficiency. If rn1 , rn2 are consistent runs of π1 , π2 with distinct results, then by
lemma 2 we obtain an r-path ω which is either finite and leads to a disproving vertex
due to lemma 1, or infinite, in which case exactly one of runs rn1 , rn2 is infinite, and
ω is an infinite disproving path.

    Note that theorem 1 does not solve the M -equivalence problem for PPs, as Γ is
infinite in general case. The next section provide means to build a polynomial-time
traversal of Γ sufficient to solve the equivalence problem.


4   Analysis of Graph of Consistent Computations

The main idea of the further GCC analysis is to partition its nodes into a finite number
of classes and prove that during the traversal it is sufficient to visit polynomially many
nodes in each class, and after it we state either that π1 and π2 are not M -equivalent
(lemma 7) or that other nodes in the class may be ignored in the further traversal (lemma
8). To do it, we introduce and analyze the notion which we called an absorption effect.
An (absorption) effect induced by an element s ∈ S is a function æs : S → S showing
how the elements of S are changed under “pressure” of absorption of s appended to the
right. Formally, æs (s1 ) = arg min ks2 k. To prove the soundness of the definition,
                             s2 :s1 ∗s=s2 ∗s
we state a (what we call) commutativity of minimals: if h, g are minimal members of
the same element of M , then g can be obtained from h with commutativity rewritings
only (without any absorption). The soundness is provided by the following lemma.

Lemma 4. An equation X ∗ s = s′ has at most one solution s′′ on X s.t. ks′′ k has
minimal possible value among all solutions.

Proof. Suppose we have at least two such solutions: s′′1 , and s′′2 : s′′1 ∗ s = s′ = s′′2 ∗ s.
Let h1 ∈ ⌊s′′1 ⌋, h2 ∈ ⌊s′′2 ⌋, and h ∈ ⌊s⌋. Due to the commutativity of minimals, we
92

can obtain h2 h from h1 h using commutativity relations only. By symmetry of com-
mutativity relation, we get the derivation of h− h−       − −
                                                  2 from h h1 (g
                                                                  −
                                                                    is a reversed g).
By the left-cancellativity of M and the same symmetry, h1 can be rewritten to h2 with
commutativity only, which means s′′1 = s′′2 .
     The next step is to introduce a partial order on the set of all effects (which we denote
by ): æ1 ≤ æ2 iff there exist s1 , s2 ∈ S s.t. s1 ≤ s2 , æ1 = æs1 , and æ2 = æs2 . To
prove that ≤ is a partial order, we state one more algebraic property of M , which we
call the inductive minimality: if h is a minimal member, a ∈ A, and [ah] 6= [h], then ah
is a minimal member.
Lemma 5. The relation ≤ over is a non-strict partial order.
Proof. By the inductive minimality, an effect æ determines a set of symbols deleted
from a minimal member: if æ(s) = s′ and h ∈ ⌊s⌋, then there exists h′ ∈ ⌊s′ ⌋ s.t. h′
can be obtained from h by deleting some symbols. If there exist æ1 , æ2 s.t. æ1 ≤ æ2
and æ2 ≤ æ1 , then æ1 deletes at least those symbols which are deleted by æ2 and vice
versa. It means that for any minimal member h the effects æ1 and æ2 delete the same
symbols from h, and thus æ1 ([h]) = æ2 ([h]).
     Another pleasant property of the effects is that they can be (in some sense) computed
by a finite automaton. It means that the set is finite, and that having a short description
of an effect æs and an action a, we can easily compute the description of æs∗[a] . An
automaton A = (Q, q0 , TA , SA , MA ) (over the alphabet A) consists of: a finite set of
states Q including an initial state q0 ; a transition function T : Q × A → Q; a set of
labels SA ; a marking function MA : Q → SA . We denote by A(q, h) a state to which A
jumps reading the word h ∈ A∗ , and use the following shortenings: A(h) is A(q0 , h);
MA (q, h) is MA (A(q, h)); MA (h) is MA (A(h)). The notion of automaton used in this
paper is a minor generalization of a well-known finite deterministic acceptor with the
only difference: instead of answers “yes” and “no”, here SA is the set of all possible
answers. We put the following meaning in the phrase “to compute an effect”: |SA | = ||;
æ[h] = æ[g] iff MA (h) = MA (g); if [h] = [g], then A(h) = A(g). We call an automaton
satisfying all these properties an æ-automaton. Note that intuitively the best way to
define such an automaton is to say SA = , but formally the set cannot be used explicitly
in any algorithm as it contains functions with an infinite domain. The description of an
æ-automaton is based on an automata-based recognition of absorption: there exists an
automaton with the following two properties: reading an input word h, it enters a state
labelled with the set {a ∈ A | [a] ∗ [h− ] = [h− ]}; reading words h, g s.t. [h− ] = [g − ],
it enters the same state.
Lemma 6. For any progressive CA-monoid there exists an æ-automaton.
Proof. We start with an automaton A stated by the automata-based recognition of ab-
sorption. Then we divide the states of A into equivalence classes: q and q ′ are equal iff
∀h ∈ A∗ . MA (q, h) = MA (q ′ , h). Note that we have a finite number N of such classes.
Changing the labels of A to {1, . . . , N }, enumerating the equivalence classes, and as-
signing the label i to the i-th class, we get an automaton A′ s.t. MA′ (h) = MA′ (g)
iff æh− = æg− . Then by changing the labels again we construct exactly N finite-state
                                                                                                   93

acceptors, the i-th of which says if the original answer is i. Using classical automata-
theoretic results [8], we reverse the language of each acceptor, construct their Cartesian
product and reassign labels back to {1, . . . , N }, and the result is exactly a required
æ-automaton: MA′ (h) = MA′ (g) iff æh = æg , and all states were constructed for
elements of S, thus all parts of the definition are satisfied.

      Hereafter we assume to be given an æ-automaton A = (Q, q0 , TA , SA , MA ).
By the definition of A, we write A([h]) along with A(h). The last two notions re-
quired for the analysis of Γ are an effect evolution and a generalized effect evo-
lution. Intuitively, these notions show how an absorption effect is modified during
a program run from some intermediate run location. An effect evolution æ̇hs along
the word h = a1 . . . ak ∈ A∗ induced by s ∈ S is defined as follows. We start
with the sequence (æs , æs∗[a1 ] , . . . , æs∗[h] ). Then we delete adjacent duplicates. The
result is exactly æ̇hs . Note that any evolution is a chain in a finite (lemma 6) par-
tially ordered (lemma 5) set, and thus the number of evolutions is finite. A gen-
eralized effect evolution æ̈hq along the word h = a1 . . . ak ∈ A∗ induced by a
state q ∈ Q is defined in the same way except we start with the sequence of pairs
((æε , æMA (q) ), (æ[a1 ] , æMA (q,a1 ) ), . . . , (æ[h] , æMA (q,h) )). For the same reasons the set
of all generalized evolutions is finite.
      There is an important property of evolutions. For any full path pt for a PP π =
(L, en, ex, T, B) and any s ∈ S there exists a full path pt′ (which we call reduced) s.t.:
                    ′
æ̇sB(pt) = æ̇sB(pt ) ; if pt is finite, then |pt′ | ≤ |L| · |Q|; if pt is infinite, then pt′ = pt′′1 →
pt′′2 → pt′′2 → . . . where |pt′′1 | + |pt′′2 | ≤ |L| · |Q|. To prove the existence of a reduced
path pt′ , it is sufficient to: mark pt with states of A starting with A(s) modifying it with
the actions of pt; delete loops between two repeated pairs of a program location and
an automaton state; for an infinite path, replace an infinite tail corresponding to the last
effect in the evolution with the repetition of a loop described above. The same property
holds for generalized evolutions with an upper bound |L| · |Q|2 instead of |L| · |Q|. The
following lemmas are the result of the analysis of Γ , as they provide a finite partitioning
of the nodes of Γ required in the equivalence-checking algorithm.

Lemma 7. Let: q ∈ Q; l1 ∈ L1 ; l2 ∈ L2 ; α ∈ A ∪ {λ}; m = |Q|3 · |π1 | · |π2 |2 + 2;
pairwise-distinct GCC nodes (l1 , l2 , si , α), i ∈ {1, . . . , m}, be reachable from the GCC
root; A(s1 ) = · · · = A(sm ) = q; li → pti be a full path of πi (i ∈ {1, 2}), and at
                                                                                           B2 (pt2 )
least one of these paths is finite; æ̈qB1 (pt1 ) = ((æ11 , æ21 ), . . . , (æ1k , æ2k )); æ̇α         =
                                                   1
(æ̇1 , . . . , æ̇p ); 1 ≤ k ≤ k; the elements æk′ (si ), i ∈ {1, . . . , m}, be pairwise distinct;
                           ′

{æ1 , . . . , æp } ∩ {æ2k′ +1 , . . . , æ2k } = ∅.
     Then π1 and π2 are not M -equivalent.

Proof. The main idea is to show that at least for one of the nodes wj traces of a pro-
jection of a path ωj leading to this node can be expanded into consistent runs with
different results. We show in details only one case (for other cases a proof technique
works in the same way, though it looks a bit more complex): pt1 and pt2 are finite,
k ′ = k. W.l.o.g. we assume pt1 and pt2 to be reduced paths, thus |pti | ≤ |πi | · |Q|3−i .
By proposition 1 the only way for the runs not to be consistent is if their proper prefixes
lead to a common element s, and next steps are made with distinct conditions δ1 , δ2
94

— we call it a collision. By lemma 3, for each ωp these prefixes are not shorter than
the projections of ωp . Thus, we have [pr1 (ωp ) → pt′1 ] = [pr2 (ωp ) → pt′2 ] for some
proper prefixes pt′1 , pt′2 of pt1 and pt2 . But as the elements æ1k (sp ), æ1k (sr ) are distinct
for p 6= r, the elements [pr1 (ωp ) → pt′1 ] and [pr1 (ωp ) → pt′1 ] are also distinct (as
M is left-cancellative), while [pr2 (ωp ) → pt′2 ] = [pr2 (ωr ) → pt′2 ]. Thus, a collision
for different indexes p, r cannot happen for the same prefixes pt′1 , pt′2 simultaneously.
On the other hand, we have at most |pt1 | · |pt2 | ≤ |π1 | · |π2 | · |Q|3 possible pairs of
lengths for these prefixes, which means that at least for two indexes j ′ , j ′′ there are
no collisions: in other words, expansions for these indexes are consistent. Based on the
same thought about distinction of æ1k (sj ′ ), æ1k (sj ′′ ) as above for p, r, we conclude that
at least for one of indexes j ′ , j ′′ the results of computations are distinct.
Lemma 8. Let: q ∈ Q; l1 ∈ L1 ; l2 ∈ L2 ; α ∈ A ∪ {λ}; m = |Q|3 · |π1 | · |π2 |2 + 2;
pairwise-distinct GCC nodes vi = (l1 , l2 , si , α), i ∈ {1, . . . , m}, are reachable from
the GCC root; A(s1 ) = · · · = A(sm ) = q; li → pti is a full path of πi , i ∈ {1, 2};
                                                        B2 (pt2 )
æ̈qB1 (pt1 )] = ((æ11 , æ21 ), . . . , (æ1k , æ2k )); æ̇α         = (æ1 , . . . , æs ); 1 ≤ k ′ < k, 1 ≤
s < s; the elements æk′ (si ), i ∈ {1, . . . , m}, be pairwise different; æ1k′ +1 (s1 ) =
  ′                             1

· · · = æ1k′ +1 (sm ); {æ1 , . . . , æs′ } ∩ {æ2k′ +1 , . . . , æ2k } = ∅; æs′ +1 ∈ {æ2k′ +1 , . . . , æ2k };
ω be a GCC r-path leading to vm ; pr1 (ω) → pt1 and pr2 (ω) → pt2 be consistent runs
having distinct results.
      Then there exists j, 1 ≤ j < m, s.t. vj belongs to some disproving r-path.
   We prove lemma 8 using the same technique as for lemma 7, so due to lack of space
we omit the proof.


5    Equivalence-Checking Algorithm
The resulting equivalence-checking algorithm is to traverse Γ in any way. During the
traversal visited nodes are partitioned into classes defined by a state q, locations l1 , l2 ,
and a symbol α (as in lemmas 7, 8): the nodes (l1 , l2 , s, α) and (l1′ , l2′ , s′ , α′ ) belong
to the same class iff l1 = l1′ , l2 = l2′ , A(s) = A(s′ ), and α = α′ . If the traversal is
finished, then the absence of disproving paths can be checked directly, and the answer
is given by theorem 1. If (|Q|3 ·|π1 |·|π2 |2 +2) nodes are visited in any class, then either
the non-equivalence is stated (lemma 7), or all other nodes in the class are ignored from
now on (lemma 8).
    Let N be the total program size: N = |π1 | + |π2 |. The number of distinct node
classes is O(N 2 ). Each class contains O(N 3 ) nodes. Thus, no more than O(N 5 ) nodes
are visited during the traversal. A minimal member g ∈ ⌊h⌋ can be constructed in
time O(|h|) using the inductive minimality and lemma 6. Checking if [h] ≤ [g] can be
done in time O((|h| + |g|)2 ): construct a word g ′ ∈ ⌊g⌋; if h = λ, then [h] ≤ [g];
otherwise h = ah′ , where a ∈ A; if [ag] = [g] (it can be checked with A), then delete
a from h and repeat the procedure; otherwise check whether g contains a letter a; if
not, then [h]  [g]; if yes, then pick the leftmost occurence of a in g: g = g ′ ag ′′ ; if
each symbol from g ′ commutes with a, then delete marked a from h and g and repeat
the procedure; otherwise [h]  [g]. The soundness of the latter algorithm is proved in
the appendix. The elimination of a longest prefix can be performed in a similar way,
                                                                                               95

the difference is that we pick not the leftmost symbol of h but every symbol which can
be shifted to the leftmost position with commutativity relations, and every completion
is successfull. Each element of M can be encoded as any its member (e.g. constructed
by a computation). During the traversal the algorithm constructs words h of length at
most O(N 5 ), and each element comparison can be done in time O(N 10 ). For each
vertex the algorithm does at most O(N 3 ) comparisons of elements of M , and this is the
bottleneck of the algorithm. Thus, the algorithm performs O(N 5 ) steps, and each step
is done in time O(N 13 ), and the total time is O(N 18 ). The last upper bound together
with theorem 1 and lemmas 7, 8 states the validity of the main theorem of this paper.
Theorem 2. Let M be a progressive CA-monoid on a finite set of actions. Then the
M -equivalence problem for propositional programs is decidable in time polynomial in
size of programs.


6   Conclusion
We want to present three thoughts concluding the paper. The first one is that the con-
sidered case: propositional programs, commutativity, and left absorption — appears to
be simple enough to have a polynomial-time decidability for the equivalence problem.
The second one is that (nevertheless) this case is complex enough to require a rather
nontrivial technique for its analysis. Finally, the third one is that it was mentioned in the
introduction that there is still a long way to go, and this statement remains unchanged,
as even in a simple example in the introduction we need not only commutativity and ab-
sorption (but also dependencies between statements and conditions) to prove the equiv-
alence. To be able to prove the equivalence of “real” fragments of code, we should
consider other properties of program primitives, which is the topic of future research.


References
 1. Rice, H.G.: Classes of recursively enumerable sets and their decision problems. Trans. Amer.
    Math. Soc. 89, 25–59 (1953)
 2. Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques and Tools.
    Pearson Education, Inc., New York (2006)
 3. Zakharov, V.A.: An efficient and unified approach to the decidability of equivalence of propo-
    sitional program schemes. Lect. Notes Comput. Sci. 1443, 247–258 (1998)
 4. Zakharov, V.A., Zakharyaschev, I.M.: On the equivalence checking problem for a model
    of programs related with muti-tape automata. Lect. Notes Comput. Sci. 3317, 293âĂŞ-305
    (2005)
 5. Harel, D., Kozen, D., Tiuryn J.: Dynamic Logic. The MIT Press (2000)
 6. Hunt, H.B., Constable, R.L., Sahni, S.: On the computational complexity of program scheme
    equivalence. SIAM Journal of Computing. 9, N 2, 396–416 (1980)
 7. Rutledge, J.D.: On Ianov’s program schemata. J. Assoc. Comp. Mach. 11, 1–9 (1964)
 8. Hopcroft, J.E., Motwani, R., Ullman, J.D.: Introduction to automata theory, languages, and
    computation. Addison-Wesley, Boston (2003)
 9. Kozen, D.: Kleene Algebra withTests and Commutativity Conditions. In: Tools and Algo-
    rithms for Construction and Analysis of Systems, Second International Workshop, pp. 14–33
    (1996)
96

10. Zakharov, V.A.: On the decidability of the equivalence problem for monadic recursive pro-
    grams. Theoretical Informatics and Applications. 34, N 2, 157–171 (2000)
11. Glushkov, V.M., Letichevskii, A.A.: Theory of algorithms and discrete processors. Adv. Inf.
    Syst. Sci. 1, 1–58 (1969)
12. Podymov, V.V., Zakharov, V.A.: On a semigroup model of sequential programs specified by
    means of two-tape automata. Belgorod State University Scientific Bulletin. History. Political
    science. Economics. Information technologies. 7(78), N 14/1, 94–101 (2010) (in Russian)
13. Bohm, S., Goller, S, Jancar, P.: Equivalence of deterministic one-counter automata is NL-
    complete. Proceedings of the forty-fifth annual ACM symposium on Theory of computing.
    131–140 (2013)
14. Debski, W., Fraczak, W.: Concatenation state machines and simple functions. Lecture Notes
    in Computer Science. 3317, 113–124 (2004)
15. Bastien, C., Czyzowicz, J., Fraczak, W., Rytter, W.: Prime normal form and equivalence of
    simple grammars. Theoretical Computer Science. 363, N 2, 124–134 (2006)
16. Harju, T., Karhumäki, J.: The equivalence problem of multitape finite automata. Theoretical
    Computer Science. 78, N .2, pp. 347–355 (1991)
17. de Souza, R.: On the decidability of the equivalence for k-valued transducers. Lecture Notes
    in Computer Science. 5257, 252–263 (2008)