Automatic Testing of Operation Invariance Tobias Gödderz and Janis Voigtländer University of Bonn, Germany, {goedderz,jv}@cs.uni-bonn.de Abstract. We present an approach to automatically generating operation invari- ance tests for use with Haskell’s random testing framework QuickCheck. The motivation stems from a paper by Holdermans [8] which showed how to address certain shortcomings of straightforward testing of implementations of an abstract datatype. While effective, his solution requires extra generation work from the test engineer. Also, it may not even be doable if the person responsible for testing has no knowledge about, and program-level access to, the internals of the concrete datatype implementation under test. We propose and realize a refinement to Hol- dermans’ solution that improves on both aspects: Required operation invariance tests can be formulated even in ignorance of implementation internals, and can be automatically generated using Template Haskell. 1 Introduction It is good software engineering practice to test one’s, and possibly other’s, code. In declarative languages, QuickCheck [4] and related tools [1, 3, 10] are an attractive option, combining convenient ways of generating test input data and a DSL for expressing properties to be tested. One recurring situation where property-based testing is desirable is in connection with implementations of an abstract datatype (ADT) specification. The scenario is that some interface (API) is given as a collection of type signatures of operations along with a collection of equational axioms that are expected to hold of those operations. Then there is an, or possibly several, implementations of that specification. The user of such an implementation should not need to be concerned with its internals, but would still like to be convinced of its correctness. It seems very natural to simply turn the given axioms into QuickCheck-like properties, and to accept an implementation as correct if it passes all those axioms-become-tests. However, it has been known for a while that such an approach is not enough to uncover all possible errors [6, 9]. Subtle bugs can remain hidden, due to an unfortunate interplay of buggy implementation and programmed equality (while actual semantic equality is impossible to test in general). Recently, Holdermans [8] presented a solution that works by adding operation invariance tests, to ensure that the assumed notion of equality is not just an equivalence relation, but actually a congruence relation. His solution has its own problems though. Specifically, it requires hand-writing a certain kind of additional test input data generator. But not only is that something which may depend on implementation internals (so is not necessarily something that a user having access only to the datatype’s external API could do); it is also extra work and an additional source of potential errors: get that generator wrong by not covering enough ground, and even the additional tests will not be enough to really establish overall correctness. We set out to improve on these aspects. 2 The Need for Operation Invariance Tests This section is largely a recap of material from Holdermans’ paper. Specifically, we also use his very compelling example of how the approach of simply (and only) testing the axioms from the specification of an ADT may lead to a false sense of security. Assume a set of people is interested in integer FIFO queues, and in particular in separately specifying, implementing, and using them. The specification is just a mathematical entity, consisting of a listing of signatures of desirable operations: empty :: Queue enqueue :: Int → Queue → Queue isEmpty :: Queue → Bool dequeue :: Queue → Queue front :: Queue → Int and of axioms expected to hold: Q1 : isEmpty empty = True Q2 : isEmpty (enqueue x q) = False Q3 : front (enqueue x empty) = x Q4 : front (enqueue x q) = front q if isEmpty q = False Q5 : dequeue (enqueue x empty) = empty Q6 : dequeue (enqueue x q) = enqueue x (dequeue q) if isEmpty q = False An implementation in Haskell would be a module module Queue (Queue, empty, enqueue, isEmpty, front, dequeue) where ... that contains some definition for Queue as well as definitions for the operations adhering to the type signatures from the specification. Importantly, the type Queue is exported without any data constructors, so from the outside of the module, Queue values can only be created, manipulated, and inspected using the provided operations. A prospective user of the implementation can import the above module and call those operations in application code. Now, the user would like to have some certainty that the implementa- tion is correct. The appropriate notion of correctness is adherence to the axioms from the mathematical specification, on which user and implementer should have agreed beforehand. One way for the implementer to convince the user of the correctness is to do extensive property-based testing to establish validity of the specification’s axioms for the provided implementation. Using QuickCheck, the test suite would consist of the following properties: q1 = property (isEmpty empty == True) q2 = property (λ x q → isEmpty (enqueue x q) == False) q3 = property (λ x → front (enqueue x empty) == x) q4 = property (λ x q → isEmpty q == False =⇒ front (enqueue x q) == front q) q5 = property (λ x → dequeue (enqueue x empty) == empty) q6 = property (λ x q → isEmpty q == False =⇒ dequeue (enqueue x q) == enqueue x (dequeue q)) These could even be written down by the user person in ignorance of any internals of the implementation. However, the use of == on Queue values requires availability of an appropriate instance of the Eq type class (which is how Haskell organizes overloading of ==). Let us assume the implementer provides such an instance. Moreover, we have to assume that the implementer also provides an appropriate random generator for Queue values (the q values quantified via the lambda-abstractions above – whereas for the Int values quantified as x we can take for granted that a random generator already exists), because otherwise the properties cannot be tested. In general, the implementer may even have to provide several random Queue generators for different purposes, for example since otherwise certain preconditions in axioms might be too seldom fulfilled, thus preventing effective testing.1 But in the example here this is not an issue, since both empty and nonempty queues will appear with reasonable likelihood among the generated test data. So if a Queue implementation passes all the above tests, it should be correct, right? Unfortunately not. As Holdermans [8] demonstrates, it can happen, even under all the assumptions above, that an implementation passes all the tests but is still harmfully incorrect. Let us repeat that faulty implementation in full as well (also since we will later want to refer to it in checking the adequacy of our own testing solution to the overall problem). Here it is: data Queue = BQ [Int] [Int] bq :: [Int] → [Int] → Queue bq [ ] r = BQ (reverse r) [ ] bq f r = BQ f r empty :: Queue empty = bq [ ] [ ] enqueue :: Int → Queue → Queue enqueue x (BQ f r) = bq f (x : r) isEmpty :: Queue → Bool isEmpty (BQ f ) = null f 1 QuickCheck’s restricting conditional operator =⇒, written as ==>, does not count “A” being false as evidence for “A implies B” being true. Thus, in order to reach a certain number of positive test cases for “A implies B”, that many cases with both “A” and “B” being true must be encountered (and, of course, not a single case with “A” true but “B” false). Consequently, QuickCheck gives up testing if not enough cases with “A” being true are encountered. front :: Queue → Int front (BQ f ) = last f dequeue :: Queue → Queue dequeue (BQ f r) = bq (tail f ) r This implementation uses the “smart constructor” bq to preserve the invariant that, for every BQ f r, it holds that null f implies null r, which makes front simpler to implement. Ironically, in the implementation above the error nevertheless lies in the definition of front, which should have been head f rather than last f . As already mentioned, in order to test properties q1 –q6 , we need a definition of == for Queue and a random generator for Queue values. The implementer is kind enough to provide both (and both are perfectly sane) within the Queue module: instance Eq Queue where q == q0 = toList q == toList q0 toList :: Queue → [Int] toList (BQ f r) = f ++ reverse r instance Arbitrary Queue where arbitrary = do f ← arbitrary r ← arbitrary return (bq f r) Now we can run tests quickCheck q1 through quickCheck q6 , and find that all are satisfied. And this is not just happenstance, by incidentally not hitting a counterexample during the random generation of test input data. No, it is a mathematical fact that the implementation above satisfies the ==-equations in properties q1 –q6 . And yet the implementation is incorrect. To see this, consider: > front (dequeue (enqueue 3 (enqueue 2 (enqueue 1 empty)))) 3 Clearly, a correct implementation of a FIFO queue should have output 2 here. In fact, Q2 –Q6 can be used to prove, by equational reasoning, that front (dequeue (enqueue 3 (enqueue 2 (enqueue 1 empty)))) = 2 The key to what went wrong here (the implementation being incorrect and yet passing all the tests derived from the specification axioms) is that properties q1 –q6 do not, and in fact cannot, test for semantic equivalence. They can only be formulated using the programmed equivalence ==. That would be fine if == were not just any equivalence relation that satisfies q1 –q6 (which we know it does), but actually a congruence relation that does so. For otherwise, == cannot correspond to the semantic equivalence = intuitively used in Q1 –Q6 . Being a congruence relation means to, in addition to being an equivalence relation, be compatible with all operations from the ADT specification. And that is not the case for the implementation given above. For example, > let q = dequeue (enqueue 3 (enqueue 2 (enqueue 1 empty))) > let q0 = enqueue 3 (dequeue (enqueue 2 (enqueue 1 empty))) > q == q0 True but > front q == front q0 False That does not necessarily mean that, morally, the implementation of == given for Queue (and used in the tests) is wrong. The real bug here resides in the implementation of front, but it cannot be detected by just checking q1 –q6 ; one additionally needs appropriate compatibility axioms.2 So Holdermans [8] observes that one should check additional axioms like q7 = property (λ q q0 → q == q0 =⇒ front q == front q0 ) Actually, he wisely adds a non-emptiness check to the precondition to prevent both front q and front q0 from leading to a runtime error. But even then, the new axiom (or any of the other added compatibility axioms) does not – except in very lucky attempts – actually uncover the bug in the implementation. The problem is that it is very unlikely to hit a case with q == q0 for random q and q0 . And even if QuickCheck hits one such case every now and then, it is not very likely that it is also a counterexample to front q == front q0 . So in the vast majority of test runs QuickCheck simply gives up, and we are none the wiser about whether the compatibility axioms do hold for the given implementation or do not. Holdermans’ solution to this problem is to invest manual effort into randomly generating pairs of q and q0 that ought to be considered equivalent queues. Concretely, data Equiv a = a :≡: a instance Arbitrary (Equiv Queue) where arbitrary = do z ← arbitrary x ← from z y ← from z return (x :≡: y) where from xs = do i ← choose (0, length xs − 1) let (xs1 , xs2 ) = splitAt i xs return (bq xs1 (reverse xs2 )) Given some Show instances for Queue and Equiv Queue, testing the newly formulated property q7 = property (λ (q :≡: q0 ) → not (isEmpty q) =⇒ front q == front q0 ) would yield, for example: 2 In general, one should also convince oneself in the first place that == is indeed an equivalence relation as well (reflexive, symmetric, transitive), but often this will be a triviality. For example, any definition of the form x == x0 = f x == f x0 for some function f , like in the Eq instance for Queue, is already guaranteed to give an equivalence relation if == on the target type of f is known to be an equivalence relation. > quickCheck q7 *** Failed! Falsifiable (after 5 tests): BQ [2, 0] [ ] :≡: BQ [2] [0] That, finally, is useful information which can be used to figure out the bug in the implementation of front. But this success depends on the implementer having provided a good definition for the Arbitrary (Equiv Queue) instance above, specifically, a good from-function for “perturbing” a randomly generated queue in different ways. Note, this is not something the user could do themselves, or exercise any control over, since implementing that instance crucially depends on having access to the internals of the queue implementation module. If the implementer makes a mistake in writing that function/instance, then the bug in the original implementation of front will possibly remain unnoticed. For example, in an extreme case, the implementer may accidentally write the Arbitrary (Equiv Queue) instance in such a way that from is semantically equivalent to return, in which case q7 and all other compatibility axioms will be tested positively, despite the implementation of front still being incorrect. Holdermans notes that one might test the chosen perturbation functionality itself to ensure not having introduced an error there. But this, which would correspond to a test quickCheck (λ (q :≡: q0 ) → q == q0 ) does not help with judging the suitability of the perturbation in terms of having “enough randomness”; clearly from xs = return xs would make this test succeed as well. And the inverse property, that any pair of equivalent queues q == q0 has indeed a chance of also being generated as q :≡: q0 , is unfortunately not expressible as a QuickCheck test. Our purpose now is to avoid the need of defining an Arbitrary (Equiv Queue) in- stance. Thus, we will empower the user to detect operation invariance violations in an ADT implementation without having access to internals or relying on the implementer to generate the required pairs of equivalent values for tests. 3 Thoughts on Random Terms A simple idea would be to work purely symbolically as long as possible. For example, the values q = dequeue (enqueue 3 (enqueue 2 (enqueue 1 empty))) and q0 = enqueue 3 (dequeue (enqueue 2 (enqueue 1 empty))) that we first mentioned as evidence for a case with q == q0 but front q /= front q03 , are semantically equivalent according to Q1 –Q6 . Naturally, any pair of conceptually equivalent queues can be shown (mathematically) to be equivalent by using those original axioms. After all, that is the whole point of having the ADT specification comprising of those axioms in the first place. 3 We have /= as the negation of programmed equivalence ==. So a suitable strategy for generating terms q and q0 that would be eligible as q :≡: q0 pairs might be to start from the given signatures of operations, randomly generate a well-typed term q from those operations, then randomly apply a random selection of the axioms Q1 –Q6 at random places in the term q to obtain another term q0 . Using pairs of such q and q0 for tests like front q == front q0 (now actually computing the result, no longer working symbolically) would indeed, in principle, have the power to detect the bugs that violate operation invariance. In fact, this is exactly the strategy we initially pursued. However, it does not work out in practice. Already generating random terms is known to be non-trivial, though solutions exist [5]. Moreover applying random axioms at random places is difficult in general to get right in the sense of ending up with a probability distribution that is effective in terms of hitting cases that uncover bugs. For example, given an integer queue, only the root node can be of type Bool, which will lead to a low probability of axiom Q2 to be applicable at a randomly selected node and an even lower probability of it to be selected for application, especially for large terms. Conversely first deciding on an axiom to apply and then randomly generating a (sub)term at which it is indeed applicable leads to similar uncertainties about coverage. Additionally, there is a problem with “infeasible” terms. For integer queues, that is every term where dequeue or front is applied to a subterm t with isEmpty t being True. Assume for now that x in each term enqueue x q might only be an integer (and not a more complex subterm front q0 ).4 Then the tree of each term is a path. Its leaf will be empty, its root node one of enqueue, isEmpty, front, or dequeue. All inner nodes can only be one of enqueue or dequeue. If at any node in the path more of its descendants are of type dequeue than enqueue, then the term is not valid. The probability that a term of length n is feasible is 12 for n = 1, 1024 252 for n = 10, and about 0.125 for n = 40. This might be much worse for ADTs with more complex terms. Without prior knowledge about which operations are problematic and which axioms contribute to situations of equivalent values (queues, in the example) whose equivalence is not preserved by applying an operation, there is not much hope with the naive approach described above. But of course we precisely want to avoid having to invest any such knowledge about a specific ADT or implementation, since our goal is a generic solution to the problem. Also, working purely symbolically would not be practical for another reason: some axioms, like Q4 and Q6 , have preconditions that need to be established before application. So in the hypothetical rewriting of q into q0 by random applications of axioms described above, we would either have to symbolically prove such preconditions along the way (arbitrarily hard in general), or resort to actually computing values of subterms of q to establish whether some axiom is applicable there or not. 4 Our Solution So what can we do instead? We want to avoid having to apply arbitrarily many rewrites at random places deep in some terms. Toward a solution, let us discuss the case of a binary operation f with two argument positions to be filled by a value of the abstract 4 This could be seen as having an operation enqueue :: Queue → Queue for every x :: Int. x datatype (unlike in the case of Queue, where no operation takes more than one queue as input).5 The required operation invariance test would be that t1 :≡: t10 and t2 :≡: t20 imply f t1 t2 == f t10 t20 , where ti :≡: ti0 means that ti and ti0 are convertible, i.e., there is a sequence of axiom applications that leads from term ti to term ti0 . We claim that it would be enough to focus on the two argument positions of f separately, and to consider only single axiom applications. Indeed, assume the above test leads to a counterexample, i.e., f t1 t2 /= f t10 t20 . Then there are already t, t0 , and t00 with f t t0 /= f t t00 or f t0 t /= f t00 t and where t0 and t00 are exactly one axiom application apart. Why? By the given assumptions, we can form a sequence f t1 t2 = . . . = f t10 t2 = . . . = f t10 t20 of single axiom applications, where in the first part of that sequence axioms are only applied inside the first argument position of f , later only inside the second one. Now if t, t0 , and t00 with the claimed properties would not exist, then it would have to be the case that f t1 t2 == . . . == f t10 t2 == . . . == f t10 t20 and thus f t1 t2 == f t10 t20 by transitivity of ==6 . But this is a contradiction to f t1 t2 /= f t10 t20 . So, generalizing from the case of a binary operation, we have that in order to es- tablish operation invariance overall, it suffices to test that for every operation f it holds f . . . t0 . . . == f . . . t00 . . . for every argument position and all pairs t0 and t00 of terms related by exactly one axiom application and where the other argument positions are appropri- ately filled with random input (but with the same on left and right; this is what the “. . . ” indicate in f . . . t0 . . . == f . . . t00 . . . ). Still, the axiom application relating t0 and t00 could be somewhere deeply nested, i.e., t0 could be f1 . . . (f2 . . . (. . . (fn . . . lhs . . . ) . . . ) . . . ) . . . and t00 be f1 . . . (f2 . . . (. . . (fn . . . rhs . . . ) . . . ) . . . ) . . . , where lhs and rhs are the two sides of an instantiated axiom, while all the other parts (“. . . ”) in the two nestings agree. We could generate tests arising from this, an important observation being that for the “. . . ” parts, since they are equal on both sides (not only equivalent), we can simply use random values – no random symbolic terms are necessary, which is an important gain. Note that, as Haskell is purely functional, it is not necessary to model dependencies between argu- ments to obtain completeness. Actually, we restrict to a subset of all tests, namely ones where the axiom application in fact is at the root of t0 and t00 . That is, we only employ tests f . . . lhs . . . == f . . . rhs . . . , not e.g. f . . . (f1 . . . lhs . . . ) . . . == f . . . (f1 . . . rhs . . . ) . . . – a pragmatic decision, but also one that has turned out well so far. We have not en- countered a situation where this narrowing of test cases has missed some bug, ex- cept in very artificial examples manufactured just for that purpose. The intuitive rea- son seems to be that by doing all tests f . . . lhs . . . == f . . . rhs . . . , which of course also includes the tests f1 . . . lhs . . . == f1 . . . rhs . . . , for varying lhs/rhs as obtained from all axioms, one casts a fine enough net – situations where these all go through, along with all standard instantiations lhs == rhs obtained from the axioms, but where f . . . (f1 . . . lhs . . . ) . . . == f . . . (f1 . . . rhs . . . ) . . . would fail, appear to be extremely rare. To summarize, we propose to test operation invariance via properties of the following form, for each combination of one operation, one argument position, and one axiom: f x1 . . . (axiomlhs y1 . . . ym ) . . . xn == f x1 . . . (axiomrhs y1 . . . ym ) . . . xn 5 The consideration of binary operations here is without loss of generality. Dealing with unary operations is simpler, and dealing with operations of higher arity than two is analogous to dealing with binary operations, just requires more index fiddling in the discussion. 6 Note that while the operation invariance property of == is still under test, we were willing to simply take for granted that == is at least an equivalence relation. See also Footnote 2 where like the xi (alluded to as the “other” values above), the y j – which are meant to fill any variable positions in a symbolic axiom lhs = rhs – can be simply values. No need for symbolic terms, since neither in the xi nor in the y j positions we need to assume any further possible axiom rewrites. Also, no need to generate terms/values in an implementation-dependent way, since the signature and axiom information from the ADT specification suffices. The discussion in the paragraphs preceding this one implies that the proposed approach is sound (if a counterexample is found, then there is a genuine problem with the ADT implementation under test), but not in general complete (even if the axioms and our subset of operation invariance properties are tested exhaustively, some bug in the ADT implementation might be missed – due to our pragmatic decision not to consider deeper nested rewrites like f x1 . . . (f1 y1 . . . (axiomlhs z1 . . . zl ) . . . ym ) . . . xn == f x1 . . . (f1 y1 . . . (axiomrhs z1 . . . zl ) . . . ym ) . . . xn ). The full set of tests of the proposed form for integer queues follows. An index enqueue1 or enqueue2 indicates that the first or second argument position of enqueue is filled by the particular axiom, respectively. Parameters with a prime (i.e. q0 and x0 ) fill the other arguments of the tested operation as opposed to being variables of the relevant axiom. enqueue1 q3 = property (λ x q0 → let lhs = front (enqueue x empty) rhs = x in enqueue lhs q0 == enqueue rhs q0 ) enqueue1 q4 = property (λ x q q0 → isEmpty q == False =⇒ let lhs = front (enqueue x q) rhs = front q in enqueue lhs q0 == enqueue rhs q0 ) enqueue2 q5 = property (λ x x0 → let lhs = dequeue (enqueue x empty) rhs = empty in enqueue x0 lhs == enqueue x0 rhs) isEmpty q5 = property (λ x → let lhs = dequeue (enqueue x empty) rhs = empty in isEmpty lhs == isEmpty rhs) enqueue2 q6 = property (λ x q x0 → isEmpty q == False =⇒ let lhs = dequeue (enqueue x q) rhs = enqueue x (dequeue q) in enqueue x0 lhs == enqueue x0 rhs) isEmpty q6 = property (λ x q → isEmpty q == False =⇒ let lhs = dequeue (enqueue x q) rhs = enqueue x (dequeue q) in isEmpty lhs == isEmpty rhs) dequeue q6 = property (λ x q → isEmpty q == False =⇒ let lhs = dequeue (enqueue x q) rhs = enqueue x (dequeue q) in not (isEmpty lhs) =⇒ dequeue lhs == dequeue rhs) front q6 = property (λ x q → isEmpty q == False =⇒ let lhs = dequeue (enqueue x q) rhs = enqueue x (dequeue q) in not (isEmpty lhs) =⇒ front lhs == front rhs) There are two additional tests that syntactically belong to the others, but do not really add anything: dequeue q5 = property (λ x → let lhs = dequeue (enqueue x empty) rhs = empty in not (isEmpty lhs) =⇒ dequeue lhs == dequeue rhs) front q5 = property (λ x → let lhs = dequeue (enqueue x empty) rhs = empty in not (isEmpty lhs) =⇒ front lhs == front rhs) This is because both sides of Q5 are empty queues, but neither dequeue nor front work on those. 5 A Practical Implementation The tests shown in the previous section can (almost) strictly syntactically be derived from the specification. For every operation, every argument the operation has, and every axiom, one test can be obtained – by applying the operation at the selected argument to the axiom if the types allow it. In addition, constraints may have to be added per operation to prevent their application to invalid values, like the constraints not (isEmpty lhs) in dequeue q6 and front q6 (and in dequeue q5 and front q5 ). A tool or library that generates these tests automatically needs type information about both operations and axioms. Details about the implementation of the datatype and operations, or the specific terms in the axioms, are not necessary. As relatively arbitrarily typed code must be generated, it seems to be at least very tricky to do this with plain Haskell and without a lot of manual help. Thus, to automate our solution as much as possible, we used Template Haskell [11]. As a result, none of the shown tests need to be hand-written by the user. As a case study, we demonstrate here different ways how our tool/library (in a new module Test.OITestGenerator) can be used to generate appropriate QuickCheck tests. First, the axioms have to be specified. For this, OITestGenerator exports a datatype AxiomResult a, its constructor =!=, and a restricting conditional operator V7 that works 7 which is written as ===> akin to =⇒ as known from QuickCheck. Axioms with variables are written as functions with corresponding arguments, returning an AxiomResult a where a is the codomain of the axiom’s left- and right-hand side. q1 :: AxiomResult Bool q1 = isEmpty empty =!= True q2 :: Int → Queue → AxiomResult Bool q2 = λ x q → isEmpty (enqueue x q) =!= False q3 :: Int → AxiomResult Int q3 = λ x → front (enqueue x empty) =!= x q4 :: Int → Queue → AxiomResult Int q4 = λ x q → not (isEmpty q) V front (enqueue x q) =!= front q q5 :: Int → AxiomResult Queue q5 = λ x → dequeue (enqueue x empty) =!= empty q6 :: Int → Queue → AxiomResult Queue q6 = λ x q → not (isEmpty q) V dequeue (enqueue x q) =!= enqueue x (dequeue q) This is already enough preparation to generate the basic tests, i.e., direct translations of the axioms, with the provided function generate basic tests. It gets one argument, a list of Axioms. Axiom is a container holding a) the name of an axiom, which (the axiom) must be a function returning an AxiomResult a, and b) possibly custom generators for the function’s arguments. It has one constructor function axiom, which takes a Name – custom generators can be assigned to an Axiom via withGens, which is explained later. Then generate basic tests returns an expression of type [Property]. Property is the type of QuickCheck tests, as returned by the function property in the earlier given versions of q1 –q6 . In using generate basic tests to generate the basic tests from the above functions returning AxiomResults, two of Template Haskell’s syntactic constructs will be needed: The first are splices. A splice is written $(. . . ), where . . . is an expression. A splice may occur instead of an expression. The splice will be evaluated at compile time and the syntax tree returned by it will be inserted in its place. The second construct used is ’ . . . , where . . . is a name of a function variable or data constructor. Then ’ . . . is of type Name and its value represents the name of the . . . that was quoted. Using these constructs, we can write: adt basic tests :: [Property] adt basic tests = $(let axs = map axiom [’q1 , ’q2 , ’q3 , ’q4 , ’q5 , ’q6 ] in generate basic tests axs) Now, adt basic tests can be executed via mapM quickCheck adt basic tests. Before the, for our purposes more interesting, operation invariance tests can be generated, constraints for dequeue and front must be specified. Such a constraint function has the purpose of deciding whether a set of arguments is valid for the respective operation. Thus it takes the same arguments, but returns a Bool independently of the operation’s return type. may dequeue :: Queue → Bool may dequeue = not ◦ isEmpty may front :: Queue → Bool may front = not ◦ isEmpty Given these, the operation invariance tests can be generated by the provided function generate oi tests. For operations there exists a datatype Op similar to Axiom, with one constructor function op. The function withConstraint may be used to add a constraint to an operation.8 It may also be used multiple times on the same operation, in which case the constraints are connected with a logical “and”. adt oi tests :: [Property] adt oi tests = $(let ops = [op ’empty , op ’enqueue , op ’isEmpty , withConstraint (op ’dequeue) ’may dequeue , withConstraint (op ’front) ’may front ] axs = map axiom [’q1 , ’q2 , ’q3 , ’q4 , ’q5 , ’q6 ] in generate oi tests axs ops) Note that the repeated local definition of axs (in both adt basic tests and adt oi tests) is necessary due to Template Haskell’s stage restrictions. It is not possible to refer to a top-level declaration in the same file, because it is still being compiled when the splices are executed. Note also that empty could be omitted here from the list of operations as it takes no arguments. Running the tests automatically generated above is enough to detect the buggy implementation of front! +++ OK, passed 100 tests (100% Queue.enqueue@1/Main.q3). +++ OK, passed 100 tests (100% Queue.enqueue@1/Main.q4). +++ OK, passed 100 tests (100% Queue.enqueue@2/Main.q5). +++ OK, passed 100 tests (100% Queue.isEmpty@1/Main.q5). *** Gave up! Passed only 0 tests. *** Gave up! Passed only 0 tests. +++ OK, passed 100 tests (100% Queue.enqueue@2/Main.q6). +++ OK, passed 100 tests (100% Queue.isEmpty@1/Main.q6). +++ OK, passed 100 tests (100% Queue.dequeue@1/Main.q6). *** Failed! Falsifiable (after 5 tests): 3 BQ [4] [4, −3] The test that fails here is front q6 (which would have appeared in the output as Queue.front@1/Main.q6). Note that two tests were generated that are correctly typed but have no valid input (as already observed further above when writing down proper- ties by hand); namely dequeue q5 and front q5 alias Queue.dequeue@1/Main.q5 and 8 As opposed to constraints for axioms, which are specified using V in the function whose name is passed to axiom. Queue.front@1/Main.q5. They could be avoided by using other functions exported by OITestGenerator and explained below. Generators can be passed to an operation via withGens in a list, with one generator name for each argument, as in withGens (op ’enqueue) [’arbitrary, ’arbitrary]. It is not allowed to omit generators when using withGens; instead arbitrary must be passed if no special generator should be used for some position. The function withGens can be intermingled with withConstraint and may also be used on Axioms. Also, there is a convenience function generate axiom’s tests which takes only one Axiom and a list of Ops. It is useful when certain combinations of axioms and operations should be excluded. It can also be used when only specific argument positions of an operation should be excluded for an axiom. The function but :: Op → Arg → Op, when called as o ‘but‘ i, excludes the ith argument from o when generating tests. Arg has a single constructor function arg :: Int → Arg. The function but may be called multiple times on the same operation to exclude multiple arguments. To supplement it, there also is only :: Op → Arg → Op to include only one argument. For illustration: all q5 :: [Property] all q5 = $(let ops = [op ’empty , op ’enqueue , op ’isEmpty , op ’dequeue ‘but‘ arg 1 , op ’front ‘but‘ arg 1] in generate axiom’s tests (axiom ’q5 ) ops) Of course, in this case, dequeue and front could simply be omitted completely as they do only have one argument. Another convenience function is generate single test, which again works similarly to generate oi tests, but takes only one Axiom and one Op instead of lists, and generates only a single test. It may be used when more control is needed. enqueue1 q3 = $(generate single test (axiom ’q3 ) (op ’enqueue ‘only‘ 1)) enqueue1 q4 = $(generate single test (axiom ’q4 ) (op ’enqueue ‘only‘ 1)) enqueue2 q5 = $(generate single test (axiom ’q5 ) (op ’enqueue ‘only‘ 2)) isEmpty1 q5 = $(generate single test (axiom ’q5 ) (op ’isEmpty ‘only‘ 1)) enqueue2 q6 = $(generate single test (axiom ’q6 ) (op ’enqueue ‘only‘ 2)) isEmpty1 q6 = $(generate single test (axiom ’q6 ) (op ’isEmpty ‘only‘ 1)) dequeue1 q6 = $(generate single test (axiom ’q6 ) (op ’dequeue ‘only‘ 1)) front1 q6 = $(generate single test (axiom ’q6 ) (op ’front ‘only‘ 1)) No constraints are passed here because the superfluous tests dequeue1 q5 and front1 q5 were purposefully omitted, and because no axiom but Q5 can result in an empty queue. As writing such a list of tests is cumbersome, there is a function show all tests :: Maybe (String → Int → String → String) → [Name] → [Name] → ExpQ which takes an optional formatting function, a list of axiom names, and a list of operation names, and produces a String-typed expression whose content is exactly the code above (plus dequeue1 q5 and front1 q5 , which were removed manually from the output). single test str = $(let ops = [’empty, ’enqueue, ’isEmpty, ’dequeue, ’front ] axs = [’q1 , ’q2 , ’q3 , ’q4 , ’q5 , ’q6 ] in show all tests Nothing axs ops) If constraints have to be added to the generated code, they must be added manually. On the other hand, all ‘only‘ n could be omitted in the case above, since there is no operation with two arguments of the same type.9 Instead of Nothing above, a custom formatting function can be passed that generates the names of the properties. The implementation is available as a Cabal package at http://hackage.haskell. org/package/qc-oi-testgenerator. Insights into the implementation, as well as more discussion of formal aspects of the overall approach, can be found in the first author’s diploma thesis [7]. 6 Conclusion and Discussion We have presented and implemented an approach for automatically checking operation in- variance for Haskell implementations of abstract datatypes. The user writes down axioms by closely following standard QuickCheck syntax, and both the ordinary QuickCheck tests as well as additional operation invariance tests are derived from that. It might have been more desirable to obtain the necessary information about axioms directly from existing QuickCheck tests, freeing the user from possibly having to rewrite them. For this it would be necessary, in the implementation, to obtain the syntax tree of a given Haskell declaration and find the axiom by looking for a call to ==. Then, type information for the left- and right-hand side would be needed. As of today, neither is possible with Template Haskell: The Info data structure returned by reify has a field of type Maybe Dec to hold the syntax tree of the right-hand side of a declaration. However, according to the documentation of Template Haskell’s most current version 2.9.0.0, there is no implementation for this and the field always contains Nothing. Another way to obtain the syntax tree would be to write the axioms in expression quotations. In both cases, though, Template Haskell offers no way of obtaining the necessary type information, as those can only be retrieved for Names using reify, but not for arbitrary syntax trees. Also due to Template Haskell not offering a way of calling the type checker or other convenient ways to help generate correctly typed code yet, the implementation does not currently support polymorphic types. A workaround making it possible to test polymorphic abstract datatypes is to construct a (suitably chosen, cf. [2]) monomorphic instance and rename all participating functions. That way, reify returns the monomorphic types. On the conceptual side, it would be attractive to gain more insight into how effective the kind of tests we generate are in finding bugs in general. This might be achieved by a formalization of our approach and/or collecting experimental evidence from more case studies. Here we discuss only a small variation on the running example, which illustrates an interesting aspect concerning the implementation of equality: 9 The function generate single test throws a compile time error unless there is exactly one way to combine the given Axiom and Op. The error in the shown version of front is hidden from the basic tests only because the implemented equality compares two values by how the implementer thinks they should behave. An observational equality like the following one, which even does not touch the internals, would not so be fooled. q == q0 | isEmpty q /= isEmpty q0 = False | isEmpty q = True | otherwise = front q == front q0 ∧ dequeue q == dequeue q0 Still, the basic tests will not suffice in general. As a very artificial example, consider the queue implementation used so far, but now without the error in front, and replace the following two functions: bq f r = BQ (f ++ reverse r) [ ] enqueue x q@(BQ f r) | isEmpty q = bq f (r ++ [x]) | otherwise = BQ f (r ++ [x]) This error will never be found with the basic tests and using the above observational equality. So using operation invariance tests is still a good idea. Acknowledgments We thank the anonymous reviewers for their comments and suggestions. References [1] C. Amaral, M. Florido, and V.S. Costa. PrologCheck – Property-based testing in Prolog. In FLOPS, pages 1–17. Springer, 2014. [2] J.-P. Bernardy, P. Jansson, and K. Claessen. Testing polymorphic properties. In ESOP, pages 125–144. Springer, 2010. [3] J. Christiansen and S. Fischer. EasyCheck – Test data for free. In FLOPS, pages 322–336. Springer, 2008. [4] K. Claessen and J. Hughes. QuickCheck: A lightweight tool for random testing of Haskell programs. In ICFP, pages 268–279. ACM, 2000. [5] J. Duregård, P. Jansson, and M. Wang. Feat: Functional enumeration of algebraic types. In Haskell Symposium, pages 61–72. ACM, 2012. [6] J. Gannon, P. McMullin, and R. Hamlet. Data abstraction, implementation, specification, and testing. ACM Trans. Program. Lang. Syst., 3(3):211–223, 1981. [7] T. Gödderz. Verification of abstract data types: Automatic testing of operation invariance. Diploma thesis, University of Bonn, Germany, 2014. http://tobias.goedderz.info/ dt-inf.pdf. [8] S. Holdermans. Random testing of purely functional abstract datatypes: Guidelines for dealing with operation invariance. In PPDP, pages 275–284. ACM, 2013. [9] P. Koopman, P. Achten, and R. Plasmeijer. Model based testing with logical properties versus state machines. In IFL, pages 116–133. Springer, 2012. [10] C. Runciman, M. Naylor, and F. Lindblad. SmallCheck and Lazy SmallCheck: Automatic exhaustive testing for small values. In Haskell Symposium, pages 37–48. ACM, 2008. [11] T. Sheard and S. Peyton Jones. Template meta-programming for Haskell. In Haskell Workshop, pages 1–16. ACM, 2002.