CEUR-WS Vol-1689, paper 11: Relative Correctness: A Bridge Between Proving and Testing. Wided Ghardallou, Ali Mili, Nafi Diallo. https://ceur-ws.org/Vol-1689/paper11.pdf
               Relative Correctness:
       A Bridge Between Testing and Proving

                 Nafi Diallo1 , Wided Ghardallou2 , and Ali Mili1
             1
               New Jersey Institute of Technology, Newark, NJ, USA
                 2
                   University of Tunis El Manar, Tunis, Tunisia
       ncd8@njit.edu, wided.ghardallou@gmail.com, ali.mili@njit.edu



      Abstract. Relative correctness is the property of a program to be more-
      correct than another with respect to a specification. Whereas tradition-
      ally we distinguish between two categories of candidate programs, namely
      correct programs and incorrect programs, relative correctness arranges
      candidate programs on a partial ordering structure, whose maximal el-
      ements are the correct programs. Also, whereas traditionally we deploy
      proof methods on correct programs to prove their correctness and we de-
      ploy testing methods on incorrect programs to detect and remove their
      faults, relative correctness enables us to bridge this gap by showing that
      we can deploy static analytical methods to an incorrect program to prove
      that while it may be incorrect, it is still more-correct than another. We
      are evolving a technique, called debugging without testing, in which we
      can remove a fault from a program and prove that the new program is
      more-correct than the original, all without any testing (and its associated
      uncertainties/imperfections). Given that there are orders of magnitude
      more incorrect programs than correct programs in use nowadays, this
      has the potential to expand the scope of proving methods significantly.
      Also, relative correctness has other broad implications for testing and
      proving, which we briefly explore in this paper.


Keywords

absolute correctness, relative correctness, program testing, program proving, de-
bugging without testing, programming without refining, testing for relative cor-
rectness.


1   Blurring Traditional Distinctions

Relative correctness is the property of a program to be more-correct than another
with respect to a given specification; intuitively, a program P ′ is more-correct
than a program P with respect to a specification R if and only if P ′ behaves
according to R more often than P , and violates R less egregiously (i.e. in fewer
ways) than P . Whereas traditionally we distinguish between two categories of
candidate programs for a given specification R, namely correct programs and
incorrect programs, relative correctness enables us to arrange candidate pro-
grams over a partial ordering structure, whose maximal elements are the correct
programs, and all non-maximal elements are incorrect.
    Also, traditionally, proving methods and testing methods have been used on
different sets of programs:
 – Proving methods are deployed on correct programs to prove their correct-
   ness; they are of limited use when deployed on incorrect programs because
   even when a proof fails, we cannot always conclude that the program is in-
   correct, since we cannot tell whether the proof failed because the program is
   incorrect or because it was improperly documented (re: invariant assertions,
   intermediate assertions, etc). Some methods of program analysis can identify
   sources of faults when the correctness proof fails, but their scope is limited.
 – Testing methods are deployed on incorrect programs to detect, locate and
   remove their faults; they are useless when deployed on correct programs,
   because no matter how often a program runs failure-free under test, we can
   never (in practice) conclude with certainty that it is correct.
We argue that consideration of relative correctness has the potential to alter the
practice of proving methods and testing methods:
 – Once we have a formal definition of relative correctness, we can deploy prov-
   ing methods to an incorrect program to prove that while it may be incorrect,
   it is still more-correct than another. Given that there are orders of mag-
   nitude more incorrect programs than correct programs, the ability to apply
   proving methods to incorrect programs expands the scope of these methods
   significantly. This approach is discussed in section 4.
 – Relative Correctness can also alter the practice of software testing by rec-
   ognizing the difference between testing for relative correctness and testing
   for absolute correctness. When we remove a fault from a program, we ought
   to test it for relative correctness rather than absolute correctness, unless we
   have reason to believe (how do we ever?) that the fault we have just removed
   is the last fault of the program. This matter is discussed in section 5.
 – It has long been a cornerstone of software engineering wisdom that programs
   should not be developed then checked for correctness, but should instead
   be developed hand-in-hand along with their proof, with the proof leading
the way [8]; echoing David Gries, Carroll Morgan talks about developing
   programs by calculation from their specification, in the same way that a
   mathematician solves an equation by computing its root [22]. The prevailing
   paradigm for developing programs from specifications is that of refinement,
   whereby a program is derived from a specification through a sequence of
   correctness-preserving transformations based on refinement. In section 6 we
   present an alternative paradigm based on relative correctness, illustrate it
   with a simple example, and briefly compare it to related work.
In section 2 we introduce the mathematical background that is needed to carry
out our discussions, and in section 3 we introduce our definition of relative cor-
rectness for deterministic and non-deterministic programs. Also, we conclude in

section 7 by summarizing our findings, discussing related work, and sketching
future directions of research.


2     Background

2.1   Relational Mathematics

In this paper we use relations to represent specifications and programs, hence
we devote this section to discussing some operations and properties of relations.
We assume the reader familiar with elementary relational algebra [2], hence this
section is not a tutorial on relations as much as it is an introduction to some
relevant definitions and notations. We represent sets in a program-like notation
by writing variable names and associated data types (sets of values); if we write
S as:
    x: X; y: Y;
then we mean to let S be the cartesian product S = X × Y ; elements of S
are usually denoted by s and the X- (resp. Y -) component of s is denoted by
x(s) (resp. y(s)). When no ambiguity arises, we may write x for x(s), and x′
for x(s′ ), etc. A relation R on set S is a subset of S × S. Special relations on S
include the universal relation L = S × S, the identity relation I = {(s, s)|s ∈ S}
and the empty relation φ = {}. Operations on relations include the set theoretic
operations of union (∪), intersection (∩), difference (\) and complement (R̄);
they also include the converse of a relation R defined by R̂ = {(s, s′ )|(s′ , s) ∈ R},
the domain of a relation defined by dom(R) = {s|∃s′ : (s, s′ ) ∈ R}, the range
of a relation defined by rng(R) = dom(R̂), and the product of two relations R
and R′ defined by: R ◦ R′ = {(s, s′ )|∃s′′ : (s, s′′ ) ∈ R ∧ (s′′ , s′ ) ∈ R′ }; when no
ambiguity arises, we may write RR′ for R ◦ R′ .
    A relation R is said to be reflexive if and only if I ⊆ R, symmetric if and
only if R = R̂, antisymmetric if and only if R ∩ R̂ ⊆ I, asymmetric if and only if
R ∩ R̂ = φ, and transitive if and only if RR ⊆ R. A relation R is said to be total
if and only if I ⊆ RR̂ and deterministic if and only if R̂R ⊆ I. A relation R is said to be a
vector if and only if RL = R; vectors have the form R = A × S for some subset
A of S; we use them as relational representations of sets. In particular, note that
RL can be written as dom(R) × S; we use it as a relational representation of the
domain of R.
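These operations are straightforward to prototype on finite spaces. The following Python sketch (our own illustration, not part of the paper's formalism) represents a relation as a set of pairs and checks the identities stated above on a small toy space.

```python
# Relations on a finite space S, represented as Python sets of pairs.
def dom(R):          # dom(R) = {s | there is s' with (s, s') in R}
    return {s for (s, _) in R}

def rng(R):          # rng(R) = dom(converse(R))
    return {sp for (_, sp) in R}

def converse(R):     # converse of R: {(s, s') | (s', s) in R}
    return {(sp, s) for (s, sp) in R}

def product(R, Rp):  # R o R' = {(s, s') | there is s'' with (s, s'') in R and (s'', s') in R'}
    return {(s, sp) for (s, s2) in R for (s3, sp) in Rp if s2 == s3}

def vector(A, S):    # relational representation of a subset A of S: A x S
    return {(a, s) for a in A for s in S}

S = {0, 1, 2}
R = {(0, 0), (0, 1), (1, 2)}
L = vector(S, S)                            # the universal relation
assert rng(R) == dom(converse(R))
assert product(R, L) == vector(dom(R), S)   # RL = dom(R) x S
assert product(R, R) == {(0, 0), (0, 1), (0, 2)}
```

Note in particular the last two assertions: RL is indeed the vector representing dom(R), as used throughout the paper.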


2.2   Program Semantics

Given a program p on space S written in a C-like notation, we define the function
of p (denoted by P ) as the function that p defines on S, i.e. the set of pairs (s, s′ )
such that if program p starts execution in state s it terminates in state s′ ; we
may, when no ambiguity arises, refer to a program and its function by the same
name, P . Because our discussion of correctness and relative correctness refers to
a notion of refinement, we give here a definition of this property.

Definition 2.1. Given two relations R and R′ , we say that R′ refines R (abbrev:
R′ ⊒ R or R ⊑ R′ ) if and only if RL ∩ R′ L ∩ (R ∪ R′ ) = R.
Intuitively, this definition means that R′ has a larger domain than R and that
on the domain of R, R′ assigns fewer images to each argument than does R. See
Figure 1. We use refinement to define correctness.
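On a finite space, Definition 2.1 is directly executable: we can evaluate the defining equation literally. The sketch below (with toy relations of our own) does exactly that.

```python
# Definition 2.1: R' refines R iff RL ∩ R'L ∩ (R ∪ R') = R.
def refines(Rp, R, S):
    def vec(X):  # XL, i.e. dom(X) x S
        return {(s, sp) for (s, _) in X for sp in S}
    return vec(R) & vec(Rp) & (R | Rp) == R

S = {0, 1, 2, 3}
R = {(0, 0), (0, 1), (1, 1), (1, 3)}   # two images for 0 and for 1
Rp = {(0, 1), (1, 1), (2, 2)}          # larger domain, fewer images on dom(R)
assert refines(Rp, R, S)               # R' refines R
assert not refines(R, Rp, S)
assert refines(R, R, S)                # refinement is reflexive
```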

                                                    ′
                              R                   R
                      0      : 0
                                           0 XX    - 0
                         -                   XX
                                                    XX
                                                     z 1
                                                     -
                      1 XX 
                             : 1             1 XX
                        X
                          XXX
                             z
                                                 XX
                                                    Xz 2
                                                     X
                      2 XXXX - 2             2 XX
                                                 XX
                                                     -
                      3
                            XX
                             z 3             3
                                                    Xz 3
                                                     X
                                                     -


                                  Fig. 1. R′ ⊒ R



Definition 2.2. A program p on space S is said to be correct with respect to
specification R on S if and only if its function P refines R.
This definition is identical (modulo differences of notation) to traditional defini-
tions of total correctness [8, 9, 19, 22].


3     Relative Correctness
3.1   Relative Correctness: Deterministic Programs
Proposition 3.1. Due to [21]. Given a specification R and a program P , pro-
gram P is correct with respect to R if and only if (R ∩ P )L = RL.
Interpretation: RL, the domain of R, is the set of states on which execution of
candidate programs must produce correct outputs according to R; on the other
hand, (R ∩ P )L is the set of states on which execution of candidate program P
does produce correct outputs according to R. The program P is correct if and
only if these two sets are identical.
Definition 3.2. Due to [20]. Given a specification R and two deterministic pro-
grams P and P ′ , we say that P ′ is more-correct (resp. strictly more-correct) than
P with respect to R if and only if (R ∩ P ′ )L ⊇ (R ∩ P )L (resp. (R ∩ P ′ )L ⊃
(R ∩ P )L).
To contrast relative correctness with the definition of correctness given in Def-
inition 2.2, we may refer to the latter as absolute correctness. We refer to the
domain of (R ∩ P ) as the competence domain of P with respect to R; for deter-
ministic programs, to be more-correct simply means to have a larger competence
domain. See Figure 2; note that P ′ is more-correct than P but does not duplicate
the correct behavior of P .
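For deterministic programs, checking relative correctness on a finite space thus reduces to comparing competence domains. The sketch below (our own illustration) also exhibits the lack of antisymmetry: two programs can be equally correct yet compute distinct functions.

```python
# Competence domain of P with respect to R: dom(R ∩ P).
def cd(R, P):
    return {s for (s, _) in R & P}

# Definition 3.2: P' is more-correct than P w.r.t. R iff cd(R, P') ⊇ cd(R, P).
def more_correct(Pp, P, R):
    return cd(R, Pp) >= cd(R, P)

R = {(0, 0), (1, 0), (1, 2)}
P = {(0, 1), (1, 0)}            # correct on input 1 only
P2 = {(0, 2), (1, 2)}           # also correct on input 1 only, but a different function
P3 = {(0, 0), (1, 2)}           # correct on inputs 0 and 1

assert cd(R, P) == cd(R, P2) == {1} and cd(R, P3) == {0, 1}
# Equally correct, yet distinct: relative correctness is not antisymmetric.
assert more_correct(P, P2, R) and more_correct(P2, P, R) and P != P2
assert more_correct(P3, P, R) and not more_correct(P, P3, R)
```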
    In [20], we find that relative correctness satisfies the following properties:

          [Diagram omitted: specification R and programs P , P ′ on {0, 1, 2, 3}.]

                       Fig. 2. P ′ ⊒R P , Deterministic Programs


 – Ordering Properties. Relative correctness is reflexive and transitive, but not
   antisymmetric (i.e. two candidate programs could be equally correct, yet
   compute distinct functions). As a very simple example, consider R defined by
   R = {(0, 0), (1, 0), (1, 2)} and let P and P ′ be defined by P = {(0, 1), (1, 0)},
   P ′ = {(0, 2), (1, 2)}.
 – Relative correctness culminates in absolute correctness. A (absolutely) cor-
   rect program is more-correct than (or as correct as) any candidate program.
 – Relative Correctness Implies Enhanced Reliability. If P ′ is more-correct than
   P with respect to R, then it is more reliable than P ; but more-reliable is
   not equivalent to more-correct, as the latter is a logical property whereas
   the former is a stochastic property.
 – Relative Correctness and Refinement. Program P ′ refines program P if and
   only if P ′ is more-correct than P with respect to any specification R. We
   write this property as:

                              P ′ ⊒ P ⇔ (∀R : P ′ ⊒R P ).


3.2   Relative Correctness: Non-Deterministic Programs

In this section we generalize the definition of relative correctness to non-deterministic
programs, following [3]. One may ask why we should discuss the relative
correctness of non-deterministic programs when the programming language
we use is deterministic. The answer is that we want to reason about the relative
correctness of C-like programs without having to compute their function in all
its detail; for example, if program P manipulates variables x, y and z and pro-
gram P ′ is more correct than P with respect to R by virtue of dealing better
than P with variables x and y (more in keeping with R), then we want to reach
that conclusion by focusing exclusively on the behavior of P and P ′ on variables
x and y. But doing so means that we do not need to determine how P and P ′
affect variable z; hence we will deal with non-deterministic representations of P
and P ′ . We have the following definition, due to [3].
Definition 3.3. Given a specification R and two programs P and P ′ . We say
that P ′ is more-correct than P with respect to R if and only if: (R ∩ P ′ )L ⊇
(R∩P )L and (R∩P )L∩R∩P ′ ⊆ P . Also, we say that P ′ is strictly more-correct
than P with respect to R if and only if at least one of the inequalities in this
definition is strict.

Interpretation: The first clause provides that P ′ has a larger competence domain
than P . To understand the second clause, consider that the left-hand side of this
clause represents the set of (s, s′ ) pairs such that s is in the competence domain
of P (re: (R ∩ P )L) and s′ is an image of s by P ′ that violates specification R
(re: R ∩ P ′ ). What this clause is providing is that any such (s, s′ ) pair is in P ; so
that on the competence domain of P , P ′ does not violate specification R unless
P does (but P may violate R in ways that P ′ does not). In other words: P ′ is
more-correct than P with respect to R if and only if P ′ has a larger competence
domain (first clause), and violates R in fewer ways (second clause). See Figure
3: The competence domain of P is {1, 2} and that of P ′ is {1, 2, 3}. Program P ′
violates R by assigning 1 to 1 and 2 to 2; but P is guilty of the same misdeed,
in addition to also assigning 0 to 2 and 3 to 1, in violation of specification R.
Hence P ′ is more-correct than P with respect to R.
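Both clauses of Definition 3.3 can be checked mechanically on a finite space. The sketch below encodes them literally; the toy relations are hypothetical, chosen so that P ′ keeps one of P 's violations but fixes an input on which P fails.

```python
# Definition 3.3, executable on a finite space S:
#   (1) (R ∩ P')L ⊇ (R ∩ P)L          -- larger competence domain
#   (2) (R ∩ P)L ∩ ~R ∩ P' ⊆ P        -- no new violations of R on P's competence domain
def more_correct_nd(Pp, P, R, S):
    L = {(s, sp) for s in S for sp in S}
    cd_v = {(s, sp) for (s, _) in (R & P) for sp in S}    # vector (R ∩ P)L
    cd_vp = {(s, sp) for (s, _) in (R & Pp) for sp in S}  # vector (R ∩ P')L
    not_R = L - R                                         # complement of R
    return cd_vp >= cd_v and (cd_v & not_R & Pp) <= P

# Hypothetical toy example: P' keeps P's violation (0, 1) but fixes input 1.
S = {0, 1}
R = {(0, 0), (1, 1)}
P = {(0, 0), (0, 1), (1, 0)}
Pp = {(0, 0), (0, 1), (1, 1)}
assert more_correct_nd(Pp, P, R, S)
assert not more_correct_nd(P, Pp, R, S)
```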

          [Diagram omitted: specification R and programs P , P ′ on {0, 1, 2, 3}.]

                  Fig. 3. P ′ ⊒R P , Non-deterministic Specifications




4     Debugging Without Testing

4.1   What is a Fault?

Our study of relative correctness came about when we attempted to answer two
questions: what is a fault in a program? And when can we say that we have
removed a fault from a program? The first matter we have to consider is that any
definition of a fault must refer to a level of granularity at which we want to isolate
faults. The coarsest level of granularity would be to consider the whole program
as a possible fault, but that is clearly unhelpful as far as diagnosing and removing
faults; more typical levels of granularity include a line of code, a lexeme, an
expression, an assignment statement, elementary programming statements, etc.
We use the term feature to refer to any part of the source code at an appropriate
level of granularity, including non-contiguous parts.

Definition 4.1. Due to [20]. Given a specification R and a program P , a fault
in program P is any feature f that admits a substitute f ′ such that the program
P ′ obtained from P by replacing f with f ′ is strictly more-correct than P . A
fault removal in P is a pair of features (f, f ′ ) such that f is a fault in P and P ′
obtained from P by replacing f with f ′ is strictly more-correct than P .

Using this definition, we can in principle remove a fault from a program without
testing it; all we need to do is prove that the new program obtained by replacing
f with f ′ is strictly more-correct than P . But in practice it is very difficult to
use this definition, because it is difficult to compute the functions of P and P ′ ,
then the competence domains of P and P ′ , then compare them.


4.2   Proving Correctness and Incorrectness of Loops

The analysis of while loops by means of invariant relations provides a way to infer
the relative correctness of iterative programs from partial semantic information.
In [17] we discuss how to use invariant relations to prove the correctness or incor-
rectness of an iterative program of the form w: while (t) {b;}. If we let B be
the function of the loop body ({b;}) and we let T be the vector that represents
the loop condition (i.e. T = {(s, s′ )|t(s)}) then an invariant relation of the while
loop w is a reflexive transitive superset of (T ∩B). We are evolving a tool that can
generate invariant relations for common combinations of code patterns; this tool
matches an internal representation of the loop against prestored code patterns
(called the recognizers) for which it has patterns of invariant relations; actual in-
variant relations are generated by instantiating the patterns of invariant relations
with actual program variables. The following table shows, for the sake of illustra-
tion, two sample recognizers dealing with numeric variables; more information
on this tool can be found at https://selab.njit.edu/tools/fxloops.php.

        ID  Data              Statements  Invariant Relation Pattern
        1R1 int x;            x=x+a       {(s, s′ )|x mod a = x′ mod a}
            const int a>0;
        2R1 float x, y;       x=x+a;      {(s, s′ )|ay − bx = ay ′ − bx′ }
            const float a, b; y=y+b
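Both patterns can be checked dynamically on sample runs (a sanity check, not a proof). The sketch below exercises them in Python with arbitrary constants of our own choosing.

```python
# Pattern 1R1: for x = x + a with constant a > 0, the relation
# {(s, s') | x mod a = x' mod a} holds across any number of iterations.
import random

a = 7
for _ in range(100):
    x0 = random.randrange(-1000, 1000)
    x = x0
    for _ in range(random.randrange(20)):   # any number of iterations
        x = x + a
    assert x % a == x0 % a

# Pattern 2R1: for x = x + a; y = y + b, the quantity a*y - b*x is invariant,
# since a*(y+b) - b*(x+a) = a*y - b*x.
b = 5
x, y = x0, y0 = 11, -4
for _ in range(13):
    x, y = x + a, y + b
assert a * y - b * x == a * y0 - b * x0
```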

   Invariant relations are important for our purposes because they enable us to
determine whether a loop is correct or not with respect to a specification, as
shown in the following two propositions (which are due to [17]). Even though
correctness is defined in terms of the program function, invariant relations enable
us to rule on correctness or incorrectness long before we have collected all the
necessary information to compute the loop function.

Proposition 4.2. Necessary condition of correctness. Let w be a while loop of
the form w = while (t) {b} that terminates for all states in S, let V be an
invariant relation for w, and let R be a specification on S. If w is correct with
respect to R then (R ∩ V )T̄ = RL, where T̄ denotes the complement of T .

This proposition provides, in effect, that any while loop that admits an invariant
relation V that fails to meet this condition could not possibly be correct with
respect to R. We say about such an invariant relation V that it is incompatible
with the specification R. Any invariant relation that is not incompatible with
the specification is said to be compatible.

Proposition 4.3. Sufficient condition of correctness. Given a while loop w of
the form while (t) {b} that terminates for all states in its space S, and given
a specification R on S, if an invariant relation V of w satisfies the condition
   V T̄ ∩ RL ∩ (R ∪ V ∩ T̄̂ ) = R
then w is correct with respect to R, where T̄ is the complement of T and T̄̂ its converse.
This proposition provides, in effect, that if an invariant relation V meets this
condition, then it contains sufficient information to subsume the specification,
and to prove the correctness of the loop with respect to R.

4.3   Proving Relative Correctness of Loops
While in the previous section we show how invariant relations can be used to
prove the (absolute) correctness or incorrectness of a loop with respect to a
specification, in this section we explore how they can be used to prove relative
correctness of a loop over another with respect to a given specification. A propo-
sition given in [6] provides an intuitive result to the effect that if we alter a while
loop in such a way as to migrate an invariant relation from the incompatible
column to the compatible column, while preserving all the compatible invariant
relations, we obtain a while loop that is strictly more-correct; in other words,
whatever alteration we make to the while loop as described above can be con-
sidered as a monotonic fault removal. This proposition can be used to diagnose
and remove faults in a while loop:
 – Evidence that a fault exists. The first step in fault removal is to have evidence
   that a fault does indeed exist. In our case, the existence of an incompatible
   invariant relation proves that the program has a fault.
 – Locating the fault. Among all the incompatible invariant relations, we select
   that which involves the smallest number of program variables; this enables
   us to focus our attention on those statements of the program that involve
   the variables in question.
 – Removing the fault. We must modify the selected variable(s), while ensuring
   that all the compatible invariant relations are preserved. The constraint of
   preserving the compatible invariant relations is used for guidance in deciding
   how to change the selected variables.
 – Verifying that the fault has been removed. For each modification that is gen-
   erated in the previous step, we deploy the invariant relations generator and
   check whether the incompatible invariant relation identified in the second
   step has been replaced by a compatible invariant relation. Then we are sure
   that the new loop is strictly more-correct than the original loop (and the
   fault has been removed).
As an illustration of this approach, we consider the following example, which
we borrow from [7]. The space of the specification is defined by the following
variable declarations: char q[]; int let, dig, other, i, l; char c;.
The specification R that we use for relative correctness is:
R = {(s, s′ )|q ∈ list⟨αA ∪ αa ∪ ϑ ∪ σ⟩ ∧ let′ = let + #a (q) + #A (q) ∧ dig ′ =
dig + #ϑ (q) ∧ other′ = other + #σ (q)}
where list⟨T ⟩ denotes the set of lists of elements of type T , and #A , #a , #ϑ and
#σ denote the functions that to each list q assign (respectively) the number of
upper case alphabetic characters, lower case alphabetic characters, numeric digits
and symbols. The (faulty) program that we consider is w:
   {i=0; let=0; dig=0; other=0; l=strlen(q);
       while (i<l)
           {c=q[i]; i=i+1;
            if ('A'<=c && 'Z'>=c) let=let-1;
               else if ('a'<=c && 'z'>=c) let=let-1;
                    else if ('0'>c && '9'>=c) dig=dig+1;
                         else other=other+1;}}
We find the following invariant relations of this while loop, where σ1 and σ2
designate the sets of characters whose ASCII codes are less than, and (respectively)
greater than or equal to, the code of '0':
 – V0 = {(s, s′ )|q = q ′ }
 – V1 = {(s, s′ )|i ≤ i′ }
 – V2 = {(s, s′ )|dig ≤ dig ′ }
 – V3 = {(s, s′ )|other ≤ other′ }
 – V4 = {(s, s′ )|let ≥ let′ }
 – V5 = {(s, s′ )|let − #a∪A (q[i..l − 1]) = let′ − #a∪A (q ′ [i′ ..l − 1])}
 – V6 = {(s, s′ )|dig + #σ1 (q[i..l − 1]) = dig ′ + #σ1 (q ′ [i′ ..l − 1])}
 – V7 = {(s, s′ )|other + #σ2∪ϑ (q[i..l − 1]) = other′ + #σ2∪ϑ (q ′ [i′ ..l − 1])}
The following table shows which of these invariant relations are compatible, and
which are incompatible.
       Compatible Invariant Relations Incompatible Invariant Relations
       V0 , V1 , V2 , V3              V4 , V5 , V6 , V7
Because the incompatible column is non-empty, we conclude that the program is
incorrect with respect to R, hence we must enhance its correctness. To this effect,
we select the incompatible invariant relation V4 for remediation, which leads us
to focus on variable let for fault removal. Preservation of the compatible invariant
relations mandates that let be modified under the following condition: let ≤ let′ .
We propose: let=let+1;. The generation of invariant relations of the new loop
yields the following table:
       Compatible Invariant Relations Incompatible Invariant Relations
       V0 , V1 , V2 , V3 , V4′ , V5′  V6 , V7
Application of the same process one more time yields the following program:
   {i=0; let=0; dig=0; other=0; l=strlen(q);
       while (i<l)
           {c=q[i]; i=i+1;
            if ('A'<=c && 'Z'>=c) let=let+1;
               else if ('a'<=c && 'z'>=c) let=let+1;
                    else if ('0'<=c && '9'>=c) dig=dig+1;
                         else other=other+1;}}

Analysis of this program produces 8 invariant relations, which are all compatible.
       Compatible Invariant Relations                Incompatible Invariant Relations
       V0 , V1 , V2 , V3 , V4′ , V5′ , V6′ , V7′     (none)
This does not prove that the program is now correct; all it proves is that we
have no evidence (in the form of an incompatible invariant relation) that it is
incorrect. To establish correctness, we must ensure that the intersection of all
the compatible invariant relations satisfies the sufficient condition provided by
Proposition 4.3, which it does. All the faults have been removed; we now have a
correct program.
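As a complementary (dynamic, hence non-conclusive) check, we can run the corrected loop against specification R on random inputs. The transcription below is ours: we parameterize the initial counter values to match the form of R, and the helper name `classify` is hypothetical, not the paper's.

```python
# Dynamic sanity check (testing, not a proof) that the corrected loop satisfies R:
# upon termination, let, dig and other have been incremented by the numbers of
# alphabetic characters, digits and other symbols in q, respectively.
import random
import string

def classify(q, let=0, dig=0, other=0):
    # Python transcription of the corrected C-like loop.
    i, l = 0, len(q)
    while i < l:
        c = q[i]; i = i + 1
        if 'A' <= c <= 'Z':
            let = let + 1
        elif 'a' <= c <= 'z':
            let = let + 1
        elif '0' <= c <= '9':
            dig = dig + 1
        else:
            other = other + 1
    return let, dig, other

alphabet = string.ascii_letters + string.digits + '+-*/ !?'
for _ in range(200):
    q = ''.join(random.choice(alphabet) for _ in range(random.randrange(30)))
    let0, dig0, other0 = (random.randrange(5) for _ in range(3))
    let, dig, other = classify(q, let0, dig0, other0)
    assert let == let0 + sum(c.isalpha() for c in q)          # let' = let + #a(q) + #A(q)
    assert dig == dig0 + sum(c.isdigit() for c in q)          # dig' = dig + #theta(q)
    assert other == other0 + sum(not c.isalnum() for c in q)  # other' = other + #sigma(q)
```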


5   Testing for Relative Correctness
The usual process of software debugging proceeds as follows: We observe a failure
of the program; we analyze the failure and formulate a hypothesis on its cause;
we modify the source code on the basis of our hypothesis; and finally we test
the new program to ensure that it is now correct. But there is a serious flaw
in this process: when we remove a fault from an incorrect program, we have
no reason to expect the new program to be correct, unless we know (how do
we ever?) that the fault we have just removed is the last fault of the program;
hence when a fault is removed from a program, the new program ought to be
tested for relative correctness over the original program, rather than for absolute
correctness. Of course, regression testing is supposed to ensure monotonicity of
fault removal, but regression testing is essentially a test data selection matter,
whereas the difference between testing for relative correctness and testing for
absolute correctness involves other aspects. We argue that testing a program for
relative correctness has an impact on three aspects of testing, namely test data
selection, test oracle design, and test coverage assessment.
 – Test data selection. The problem of test data selection can be summarized
   as follows: We are given a large or infinite test space S, and we must select
   a small subset thereof T such that the behavior of candidate programs on
   T is a faithful predictor of their behavior on S. The difference between
   absolute correctness and relative correctness is that for absolute correctness
   with respect to specification R, the test space S is dom(R) whereas for
   relative correctness over P with respect to R the test space S is dom(R ∩ P ).
 – Test oracle design. Let Ω(s, s′ ) be the test oracle for absolute correctness
   derived from specification R. Because relative correctness over program P
   tests a candidate program P ′ for Ω only for those states on which P is
   successful, the oracle for relative correctness ω(s, s′ ) can be written as:
       ω(s, s′ ) ≡ (Ω(s, P (s)) ⇒ Ω(s, s′ )).
 – Test Coverage Assessment. It is not sufficient to know that some program
   P ′ has executed successfully on a test data set of size N using oracle ω(s, s′ );
   it is also necessary to know what percentage of the test data set satisfy the
   precondition Ω(s, P (s)) (i.e. is P ′ more-correct than P because P ′ is very
   good or because P is very bad?).
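The oracle transformation described above is mechanical. The sketch below makes it executable; the specification Ω and programs P , P ′ are hypothetical stand-ins of our own, chosen only for illustration.

```python
# Derive the relative-correctness oracle from the absolute oracle and the base
# program P:  omega(s, s') = (Omega(s, P(s)) implies Omega(s, s')).
def relative_oracle(Omega, P):
    def omega(s, sp):
        return (not Omega(s, P(s))) or Omega(s, sp)
    return omega

# Hypothetical toy specification: the output must be the square of the input.
Omega = lambda s, sp: sp == s * s
P = lambda s: s * s if s % 2 == 0 else 0    # correct on even inputs only
Pp = lambda s: s * s if s != 3 else -1      # correct everywhere except 3

omega = relative_oracle(Omega, P)
tests = range(10)
assert not all(Omega(s, Pp(s)) for s in tests)   # P' fails the absolute oracle (at 3)
assert all(omega(s, Pp(s)) for s in tests)       # but passes the relative oracle
# Strictness: P' succeeds on at least one state where P fails.
assert any(Omega(s, Pp(s)) and not Omega(s, P(s)) for s in tests)
```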

To illustrate the difference between absolute correctness and relative correctness,
we consider the same program as section 4, and we resolve to remove its faults
not by static analysis, as we did there, but by testing for relative correctness after
each fault removal. To this effect, we proceed iteratively as follows, starting from
the original program:

 1. Using muJava [18], we generate mutants of the program, and submit each
    mutant to three tests:
      – A test for absolute correctness, using oracle Ω(s, s′ ) derived from speci-
        fication R.
      – A test for relative correctness, using oracle ω(s, s′ ) derived from Ω(s, s′ ).
      – A test for strict relative correctness, which in addition to relative cor-
        rectness also ensures that there is at least one state on which the mutant
        satisfies Ω whereas the base program fails it.
 2. We select those mutants which prove to be strictly more-correct than the
    base program, make each one of them a base program on which we apply
    recursively the same procedure, starting from step 1 above.

We invoke muJava with the option of mutating statements and conditions and we
test every mutant for relative correctness, strict relative correctness and absolute
correctness using randomly generated test data of size 1000. Every invocation
of muJava generates exactly 64 mutants, which we label by indices 1 through
64; hence for example m4.53.8 is mutant 8 of mutant 53 of mutant 4 of the
original program. The outcome of this experiment is shown in Figure 4. The
arcs represent relative correctness relationships; at the bottom of this graph
is the original program, and at the top is the corrected version of the program.
Note that the test for absolute correctness came back empty-handed every time
except when muJava produced the correct program P ′, i.e. six times. The
test for relative correctness returned true for every arc in Figure 4, i.e. 25 times;
it enabled us to remove faults one at a time, until we reached a correct program.
Note also that many mutations prove to be perfectly commutative; such is the
case for 4, 8 and 53. Note further that, if we assume for the sake of argument
that our test is exhaustive, then the number of arcs emerging from each program
represents the number of faults in that program. For example, program P has
four faults even though it is three fault removals away from being correct; we
say that P has a fault density of 4 and a fault depth of 3.



6    Programming Without Refinement

For all its interest, program verification is really a dubious bargain: the idea that
we write programs in an informal, approximate manner and then try to prove their
correctness is not very sound. What is more rational is to seek means to write
programs that are certified to be correct by construction [8, 9, 22, 23]. Given a
specification, we can do so in one of two ways:

   [Diagram omitted: a lattice of mutants ordered by relative correctness.
At the bottom is the original program P ; at the first level are the mutants
m4, m8, m53 and m51; at the second level, m4.8 = m8.4, m4.53 = m53.4,
m8.53 = m53.8, m51.4, m51.8 and m51.53; at the third level, m51.4.8 = m51.8.4,
m51.4.53 = m51.53.4 and m51.8.53 = m51.53.8; at the top is the correct program
P ′ = m4.8.53 = m4.53.8 = m8.4.53 = m8.53.4 = m53.4.8 = m53.8.4
    = m51.4.8.53 = m51.4.53.8 = m51.8.4.53 = m51.8.53.4 = m51.53.4.8 = m51.53.8.4.]
          Fig. 4. Program Repair by Stepwise Correctness Enhancement


 – Preserve correctness, achieve program quality. Starting from the specifica-
   tion, we transform it progressively to make it more and more program-like,
   while preserving correctness through refinement. This is the traditional pro-
   cess of program construction by successive refinements.
 – Preserve program quality, achieve correctness. Starting from the trivial pro-
   gram abort, we transform it progressively to make it more and more-correct
   with respect to R until it is correct. In [5], we present this model as an al-
   ternative program derivation paradigm, and discuss some of its advantages;
   we refer to this approach as programming without refinement.

We illustrate this process on a program that computes the number of alphabetic
characters, numeric characters, and special symbols in an input string. We con-
sider the same space and relation as those given in Section 4, and we present a
sequence of programs P0 ... P4 that are increasingly more-correct, though not
increasingly more-refined. We do not have a methodology for how to do this in
general, hence we merely present the programs and their competence domains.

P0 {abort}. CD0 = φ.
P1 {i=0; let=0; dig=0; other=0; l=strlen(q);
    while (i<l)
      {c=q[i]; i=i+1;
       if ('A'<=c && 'Z'>=c) let+=1;}}
   CD1 = {s|q ∈ list⟨αA ⟩}.
P2 {i=0; let=0; dig=0; other=0; l=strlen(q);
    while (i<l)
      {c=q[i]; i=i+1;
       if ('A'<=c && 'Z'>=c) let+=1;
       else if ('a'<=c && 'z'>=c) let+=1;}}
   CD2 = {s|q ∈ list⟨αA ∪ αa ⟩}.
P3 {i=0; let=0; dig=0; other=0; l=strlen(q);
    while (i<l)
      {c=q[i]; i=i+1;
       if ('A'<=c && 'Z'>=c) let+=1;
       else if ('a'<=c && 'z'>=c) let+=1;
       else if ('0'<=c && '9'>=c) dig+=1;}}
   CD3 = {s|q ∈ list⟨αA ∪ αa ∪ ν⟩}.
P4 {i=0; let=0; dig=0; other=0; l=strlen(q);
    while (i<l)
      {c=q[i]; i=i+1;
       if ('A'<=c && 'Z'>=c) let+=1;
       else if ('a'<=c && 'z'>=c) let+=1;
       else if ('0'<=c && '9'>=c) dig+=1;
       else other+=1;}}
   CD4 = {s|q ∈ list⟨αA ∪ αa ∪ ν ∪ σ⟩}.
Clearly, we do have P0 ⊑R P1 ⊑R P2 ⊑R P3 ⊑R P4; also, we find CD4 =
dom(R), hence (by Proposition 3.1) P4 ⊒ R, i.e. P4 is correct with respect to R.
    This process bears a striking resemblance to test driven development (TDD)
[10, 24], in the following sense: if we let Q1 , Q2 , Q3 , ... Qn be the successive test
data samples (in the form of (input, output) pairs) used in successive iterations
of TDD, and if we let Ri be defined as Ri = ∪_{k=1}^{i} Qk , then we argue that TDD
is nothing but an instance of programming without refinement, where the speci-
fication is R = Rn and the successive competence domains are CDi = dom(Ri ).
Clearly, the competence domains are increasingly large, by construction of Ri ,
hence we do have P1 ⊑R P2 ⊑R P3 ⊑R ... ⊑R Pn−1 ⊑R Pn , where Pi is the
program obtained after considering the ith test data sample Qi .


7    Concluding Remarks
In [1, 12–14] Laprie et al. discuss various aspects of dependability, and argue
that faults are at the center of the study of dependability; and yet, they provide
a rather vague definition of a fault: A fault is the adjudged or hypothesized cause
of an error [1]. In [3, 4, 20] we attempt to give a formal definition of a fault, and
find that a way (the only way?) to do so is to introduce the concept of relative
correctness. Beyond defining faults, this concept has many implications, of which
we discuss three in this paper, as they pertain to program proving and program
testing.
 – Impact on Proving. Whereas traditionally proving methods are deployed ex-
   clusively on correct programs to prove their correctness, we argue that rela-
   tive correctness enables us to apply proving methods to an incorrect program
   to prove that despite being incorrect, it is still more-correct than another.
   Given that there are orders of magnitude more incorrect programs than cor-
   rect programs, this points to a potential expansion of the scope of static
   analysis methods.

 – Impact on Testing. We argue that when we remove a fault from a program
   we ought to test it for relative correctness rather than absolute correctness;
   of course regression testing is an attempt to test for relative correctness
   through the selection of targeted test data, but we argue that testing for
   relative correctness has an impact not only on test data selection, but also
   on oracle design and test coverage assessment.
 – Impact on Program Derivation. Whereas the traditional paradigm of pro-
   gram derivation is to proceed by successive correctness-preserving transfor-
   mations on the basis of refinement, we argue that it is also possible to proceed
   by successive correctness-enhancing transformations on the basis of relative
   correctness. One of the main advantages of our paradigm is that it models
   not only the derivation of programs from scratch, but also many aspects of
   software evolution. Given that more software is produced by evolution than
   from scratch, this approach carries significant potential in practice.
Other authors have introduced similar-sounding but distinct concepts of relative
correctness [11, 15, 16] in the context of software testing and program repair.
Their work differs from ours in terms of its specification format (executable
assertions, vs relations), its program semantics (execution traces, vs. program
functions), its definition of correctness (all assertions are true, vs refinement),
its definition of relative correctness (more valid traces, fewer invalid traces vs
larger competence domain and fewer violations), and its goals (fault removal, vs
proving, testing, derivation and evolution).
    Whereas most other authors approximate while loops by unrolling them a
number of times, we approximate them by means of invariant relations. We view
the contrast between unrolling a loop and capturing its behavior by invariant
relations as a choice between capturing all the functional details of a few itera-
tions, and capturing some functional detail of all the iterations. We argue that
capturing all the functional details is typically unnecessary (not all the loop’s
functional properties are worthy / relevant), and modeling a limited number of
iterations is typically insufficient (the behavior of the loop for a limited number
of iterations may not indicate its behavior for an arbitrary number thereof).

Acknowledgement The authors are very grateful to the anonymous reviewers
for their thoughtful, valuable feedback.

References
 [1] Algirdas Avizienis, Jean Claude Laprie, Brian Randell, and Carl E Landwehr.
     Basic concepts and taxonomy of dependable and secure computing. IEEE Trans-
     actions on Dependable and Secure Computing, 1(1):11–33, 2004.
 [2] Chris Brink, Wolfram Kahl, and Gunther Schmidt. Relational Methods in Com-
     puter Science. Advances in Computer Science. Springer Verlag, Berlin, Germany,
     1997.
 [3] Jules Desharnais, Nafi Diallo, Wided Ghardallou, Marcelo Frias, Ali Jaoua, and
     Ali Mili. Mathematics for relative correctness. In Relational and Algebraic Meth-
     ods in Computer Science, 2015, pages 191–208, Lisbon, Portugal, September 2015.


 [4] Nafi Diallo, Wided Ghardallou, and Ali Mili. Correctness and relative correctness.
     In Proceedings, 37th International Conference on Software Engineering, Firenze,
     Italy, May 20–22 2015.
 [5] Nafi Diallo, Wided Ghardallou, and Ali Mili. Program derivation by correctness
     enhancements. In Proceedings, Refinement 2015, Oslo, Norway, June 2015.
 [6] Wided Ghardallou, Nafi Diallo, Ali Mili, and Marcelo Frias. Debugging without
     testing. In Proceedings, International Conference on Software Testing, Chicago,
     IL, April 2016.
 [7] Alberto Gonzalez-Sanchez, Rui Abreu, Hans Gerhart Gross, and Arjan J.C. van
     Gemund. Prioritizing tests for fault localization through ambiguity group reduc-
     tion. In proceedings, Automated Software Engineering, Lawrence, KS, 2011.
 [8] David Gries. The Science of programming. Springer Verlag, 1981.
 [9] Eric C.R. Hehner. A Practical Theory of Programming. Prentice Hall, 1992.
[10] David Janzen and Hossein Saiedian. Test driven development: Concepts, taxon-
     omy and future direction. IEEE Computer, 38(9):43–50, September 2005.
[11] Shuvendu K. Lahiri, Kenneth L. McMillan, Rahul Sharma, and Chris Hawblitzel.
     Differential assertion checking. In Proceedings, ESEC/ SIGSOFT FSE, pages
     345–455, 2013.
[12] Jean Claude Laprie. Dependability: Basic Concepts and Terminology: in English,
     French, German, Italian and Japanese. Springer Verlag, Heidelberg, 1991.
[13] Jean Claude Laprie. Dependability —its attributes, impairments and means. In
     Predictably Dependable Computing Systems, pages 1–19. Springer Verlag, 1995.
[14] Jean Claude Laprie. Dependable computing: Concepts, challenges, directions. In
     Proceedings, COMPSAC, 2004.
[15] Francesco Logozzo and Thomas Ball. Modular and verified automatic program
     repair. In Proceedings, OOPSLA, pages 133–146, 2012.
[16] Francesco Logozzo, Shuvendu Lahiri, Manuel Faehndrich, and Sam Blackshear.
     Verification modulo versions: Towards usable verification. In Proceedings, PLDI,
     page 32, 2014.
[17] Asma Louhichi, Wided Ghardallou, Khaled Bsaies, Lamia Labed Jilani, Olfa
     Mraihi, and Ali Mili. Verifying loops with invariant relations. International Jour-
     nal of Critical Computer Based Systems, 5(1/2):78–102, 2014.
[18] Yu Seung Ma, Jeff Offutt, and Yong Rae Kwon. MuJava: An automated class
     mutation system. Software Testing, Verification and Reliability, 15(2):97–133,
     June 2005.
[19] Zohar Manna. A Mathematical Theory of Computation. McGraw Hill, 1974.
[20] Ali Mili, Marcelo Frias, and Ali Jaoua. On faults and faulty programs. In Peter
     Hoefner, Peter Jipsen, Wolfram Kahl, and Martin Eric Mueller, editors, Proceed-
     ings, RAMICS: 14th International Conference on Relational and Algebraic Meth-
     ods in Computer Science, volume 8428 of Lecture Notes in Computer Science,
     pages 191–207, Marienstatt, Germany, April 28–May 1st 2014. Springer.
[21] Harlan D. Mills, Victor R. Basili, John D. Gannon, and Dick R. Hamlet. Structured
     Programming: A Mathematical Approach. Allyn and Bacon, Boston, Ma, 1986.
[22] Carrol C. Morgan. Programming from Specifications. International Series in Com-
     puter Sciences. Prentice Hall, London, UK, 1998.
[23] Jose N. Oliveira. Programming from metaphorisms. Journal of Logic and Alge-
     braic Programming, 2016.
[24] Daniel Perelman, Sumit Gulwani, Dan Grossman, and Peter Provost. Test driven
     synthesis. In Proceedings, 35th ACM SIGPLAN Conference, PLDI, volume 49,
     pages 408–418, Edinburgh, UK, 2014.


