=Paper=
{{Paper
|id=Vol-2271/paper1
|storemode=property
|title=Shaving with Occam’s Razor: Deriving Minimalist Theorem Provers for Minimal Logic
|pdfUrl=https://ceur-ws.org/Vol-2271/paper1.pdf
|volume=Vol-2271
|authors=Paul Tarau
|dblpUrl=https://dblp.org/rec/conf/rcra/Tarau18
}}
==Shaving with Occam’s Razor: Deriving Minimalist Theorem Provers for Minimal Logic==
Paul Tarau
Department of Computer Science and Engineering, University of North Texas
paul.tarau@unt.edu

Abstract. We derive a sequence of minimalist theorem provers for the implicational fragment of intuitionistic logic. Starting from Roy Dyckhoff's sound and complete LJT calculus, we apply declarative program transformation steps, while highlighting connections, via the Curry-Howard isomorphism, to type inference mechanisms for the simply typed lambda calculus. We follow a test-driven development process, testing on formulas known to be tautologies because they are inferred as types of lambda terms, as well as by using exhaustive and random formula generators. We chose Prolog as our meta-language. Being derived from essentially the same formalisms as those we are covering reduces the semantic gap and results in surprisingly concise and efficient declarative implementations. Our code is available at: https://github.com/ptarau/TypesAndProofs.

Keywords: Curry-Howard isomorphism, propositional implicational intuitionistic logic, type inference and type inhabitation, simply typed lambda terms, theorem proving, declarative algorithms, logic programming, combinatorial search algorithms.

1 Introduction

The implicational fragment of propositional intuitionistic logic can be defined by two axiom schemes:

K: A → (B → A)
S: (A → (B → C)) → ((A → B) → (A → C))

and the modus ponens inference rule:

MP: A, A → B ⊢ B

In its simplest form, the Curry-Howard isomorphism [1, 2] connects the implicational fragment of propositional intuitionistic logic (called here minimal logic) and the types of the simply typed lambda calculus. A type inference algorithm of low polynomial complexity associates a type (when it exists) to a lambda term. Harder (PSPACE-complete, see [3]) algorithms associate inhabitants to a given type expression, with the resulting lambda term (typically in normal form) serving as a witness for the existence of a proof of the corresponding tautology in minimal logic. As a consequence, a theorem prover for minimal logic can also be seen as a tool for program synthesis, as implemented by code extraction algorithms in proof assistants like Coq [4].

Besides the syntactically identical axioms for the combinators S and K and the axioms of the logic, an intuitive reason for this isomorphism between types and formulas is that one can see propositions as sets of proofs, on which functions corresponding to implications A → B act as transformers of proofs of A into proofs of B.

Our interest in theorem provers for this minimalist logic fragment has been triggered by its relation, via the Curry-Howard isomorphism, to the inverse of the problem of inferring types for lambda terms, namely type inhabitation, with direct applications to the generation of simply typed lambda terms that meet a given specification, more efficiently than trying out all possible terms of increasing sizes for a match [5].

At the same time, generating large lambda terms can help test the correctness and scalability of compilers for functional languages [6, 7] and of proof assistants. But this becomes increasingly difficult with size, as the asymptotic density of typable terms in the set of closed lambda terms has been shown to converge to 0 [8]. As a consequence, even our best generators [9], based on Boltzmann samplers, are limited to lambda terms in normal form of about size 60-70, given the very large number of retries needed to filter out untypable terms.
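Filtering out untypable terms relies on the "easy" direction of the isomorphism. As a minimal sketch (ours, for illustration; the actual generators use the algorithms of [15]), here is a type inferencer for lambda terms written, as in section 4.2 below, with l/2 for abstractions, a/2 for applications and logic variables at the leaves:

% infer_type(+Term,-Type): minimal simple-type inference sketch; for
% brevity it omits the occurs check a production version would need
% (e.g., via unify_with_occurs_check/2)
infer_type(Term,Type):-infer_type(Term,Type,[]).

infer_type(X,T,Vs):-var(X),!,lookup_var(X,T,Vs).          % leaf variable
infer_type(l(X,E),(A->B),Vs):-!,infer_type(E,B,[X:A|Vs]). % abstraction
infer_type(a(F,G),B,Vs):-                                 % application
  infer_type(F,(A->B),Vs),
  infer_type(G,A,Vs).

% look up a variable's type by identity (==), not by unification
lookup_var(X,T,[Y:TY|_]):-X==Y,!,T=TY.
lookup_var(X,T,[_|Vs]):-lookup_var(X,T,Vs).

For instance, ?- infer_type(l(X,l(Y,X)),T) yields, up to variable renaming, T = (A->B->A), the type of the combinator K.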
As SAT problems are solvable quite efficiently in practice, despite being NP-complete, one might want to see if this extends to the typical PSPACE-complete problem of finding proofs for intuitionistic propositional calculus formulas. If so, an alternate method would emerge for finding types and inhabitants of sizes comparable to those obtained from Boltzmann samplers [9].

Given the possibly subtle semantic variations between our provers, we have also focused on setting up an extensive combinatorial and random testing framework to ensure correctness, as well as to evaluate their scalability and performance.

The paper is organized as follows. Section 2 overviews sequent calculi for implicational propositional intuitionistic logic. Section 3 describes a direct encoding of the LJT calculus as a Prolog program. Section 4 describes successive derivation steps leading to simpler and/or faster programs, including nested Horn clause representations and adaptations to support classical logic via Glivenko's double-negation translation. Section 5 describes our testing framework and section 6 provides a performance evaluation. Section 7 overviews related work and section 8 concludes the paper.

2 Proof systems for implicational intuitionistic propositional logic

Initially, as in other fields of mathematics and logic, Hilbert-style axioms were considered for intuitionistic logic. While simple and directly mapped to SKI-combinators via the Curry-Howard isomorphism, their usability for automation is very limited. In fact, their inadequacy for formalizing even "hand-written" mathematics was the main trigger of Gentzen's work on natural deduction and sequent calculus, inspired by the need for formal reasoning in the foundation of mathematics [10].

Thus we start with Gentzen's own calculus for intuitionistic logic, simplified here to cover only the purely implicational fragment, given that our focus is on theorem provers working on formulas that correspond to types of simply typed lambda terms.

2.1 Gentzen's LJ calculus, restricted to the implicational fragment of propositional intuitionistic logic

We assume familiarity with basic sequent calculus notation [10]. Gentzen's original LJ calculus [10] (with the equivalent notation of [11]) uses the following rules, written here with the premises above and the conclusion below the line:

LJ1:   A, Γ ⊢ A

        A, Γ ⊢ B
LJ2:  ------------
       Γ ⊢ A → B

       A→B, Γ ⊢ A    B, Γ ⊢ G
LJ3:  -------------------------
            A→B, Γ ⊢ G

As one can easily see, in a goal-driven implementation that uses the rules in the upward direction, the premise of rule LJ3 that keeps A→B unchanged in the context would not ensure termination, as nothing prevents A and G from repeatedly trading places during the inference process (a naive transcription illustrating this is shown at the end of this section).

2.2 Dyckhoff's LJT calculus, restricted to the implicational fragment of propositional intuitionistic logic

Motivated by problems related to loop avoidance in implementing Gentzen's LJ calculus, Roy Dyckhoff [11] splits rule LJ3 into LJT3 and LJT4:

LJT1:   A, Γ ⊢ A

         A, Γ ⊢ B
LJT2:  ------------
        Γ ⊢ A → B

         B, A, Γ ⊢ G
LJT3:  ----------------
       A→B, A, Γ ⊢ G

        D→B, Γ ⊢ C→D    B, Γ ⊢ G
LJT4:  ---------------------------
           (C→D)→B, Γ ⊢ G

This avoids the need for loop checking to ensure termination. The rules work with the context Γ being a multiset, but it has been shown later [12] that Γ can be a set, with duplication in contexts eliminated.

For supporting negation, one also needs to add a rule LJT5 that deals with the special atom false. The negation of A is then defined as A → false.

LJT5:   false, Γ ⊢ G

Interestingly, as is not unusual with logic formalisms, the same calculus had been discovered independently in the 1950s by Vorob'ev and in the 1980s by Hudelmaier [13, 14].
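To make the termination problem of rule LJ3 concrete, here is a naive goal-driven Prolog transcription of LJ1-LJ3, written by us purely for illustration (it is not code from the paper's repository):

% LJ_1 and LJ_2
lj(A,Vs):-memberchk(A,Vs).
lj((A->B),Vs):-!,lj(B,[A|Vs]).
% LJ_3: A->B stays in the context of the first premise, so nothing
% shrinks and the prover can run forever
lj(G,Vs):-member((A->B),Vs),lj(A,Vs),lj(G,[B|Vs]).

On the unprovable formula (a->b)->a this program loops: proving the goal a with a->b in the context reduces to exactly the same goal with the same context. The LJT rules of subsection 2.2 remove precisely this source of non-termination, and they are what we implement next.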
3 An executable specification: Dyckhoff's LJT calculus, literally

Roy Dyckhoff has implemented the LJT calculus as a Prolog program. We have ported it to SWI-Prolog as a reference implementation (see https://github.com/ptarau/TypesAndProofs/blob/master/third_party/dyckhoff_orig.pro). However, it is a fairly large program (420 lines), partly because it covers the full set of intuitionistic connectives and partly because of the complex heuristics it implements.

This raises the question whether, in the tradition of "lean theorem provers", we can build one directly from the LJT calculus, in a goal-oriented style, by reading the rules from conclusions to premises.

Thus, we start with a simple, almost literal translation of rules LJT1 ... LJT4 to Prolog, with the values in the environment Γ held in the variable Vs.

lprove(T):-ljt(T,[]),!.

ljt(A,Vs):-memberchk(A,Vs),!.     % LJT_1
ljt((A->B),Vs):-!,ljt(B,[A|Vs]).  % LJT_2
ljt(G,Vs1):-atomic(G),            % LJT_3
  select((A->B),Vs1,Vs2),
  atomic(A),
  memberchk(A,Vs2),
  !,
  ljt(G,[B|Vs2]).
ljt(G,Vs1):-                      % LJT_4
  select(((C->D)->B),Vs1,Vs2),
  ljt((C->D),[(D->B)|Vs2]),
  !,
  ljt(G,[B|Vs2]).

Note the use of select/3 to extract a term from the environment (a nondeterministic step); termination is ensured via a multiset-ordering-based measure [11]. An example of use:

?- lprove(a->b->a).
true.

?- lprove((a->b)->a).
false.

Note also that integers can be used instead of atoms, a flexibility that we will use as needed.

Given the correctness of the LJT rule set (as proved in [11]), and given that the prover has passed our tests, it looks already quite close to our goal of a "lean" prover for minimal logic. However, with the extensive test set we have developed (see section 5), it is tempting to make it a bit simpler and faster, knowing that the smallest error will be caught instantly.

4 Simpler, faster = better?

We start with transformations that keep the underlying implicational formula unchanged.

4.1 Concentrating nondeterminism into one place

The first transformation merges the work of the two select/3 calls into a single call, observing that they do similar things after the call. This avoids redoing the same iteration over candidates for reduction, in exchange for not enforcing the application of rule LJT3 before rule LJT4.

bprove(T):-ljb(T,[]),!.

ljb(A,Vs):-memberchk(A,Vs),!.
ljb((A->B),Vs):-!,ljb(B,[A|Vs]).
ljb(G,Vs1):-
  select((A->B),Vs1,Vs2),
  ljb_imp(A,B,Vs2),
  !,
  ljb(G,[B|Vs2]).

ljb_imp((C->D),B,Vs):-!,ljb((C->D),[(D->B)|Vs]).
ljb_imp(A,_,Vs):-memberchk(A,Vs).

This simpler form will facilitate our next transformations.

4.2 Extracting the proof terms

Extracting the proof terms (lambda terms having the formulas we prove as types) is achieved by decorating the code with application nodes a/2, lambda nodes l/2 (with a logic variable as first argument) and leaf nodes (logic variables, identical to the correspondingly named ones in the first arguments of the l/2 nodes). The simplicity of the predicate bprove/1, and the fact that this is essentially the inverse of a type inference algorithm (e.g., the one in [15]), help with figuring out how the decoration mechanism works.

sprove(T):-sprove(T,_).
sprove(T,X):-ljs(X,T,[]),!.

ljs(X,A,Vs):-memberchk(X:A,Vs),!.            % leaf variable
ljs(l(X,E),(A->B),Vs):-!,ljs(E,B,[X:A|Vs]).  % lambda node
ljs(E,G,Vs1):-
  member(_:V,Vs1),head_of(V,G),!,  % fail if non-tautology
  select(S:(A->B),Vs1,Vs2),        % source of application
  ljs_imp(T,A,B,Vs2),              % target of application
  !,
  ljs(E,G,[a(S,T):B|Vs2]).         % application node
ljs_imp(E,A,_,Vs):-atomic(A),!,memberchk(E:A,Vs).
ljs_imp(l(X,l(Y,E)),(C->D),B,Vs):-ljs(E,D,[X:C,Y:(D->B)|Vs]).

head_of(_->B,G):-!,head_of(B,G).
head_of(G,G).

Thus lambda nodes decorate implication introductions, and application nodes decorate modus ponens reductions in the corresponding calculus. Note that the two clauses of ljs_imp provide the target node T of the application a(S,T); seen from the type inference side, its result type B is obtained by cancelling the source type A against the implication type A → B. Note also that we have added, in the third clause of ljs/3, a test eliminating non-tautologies, for faster rejection of unprovable formulas, and that we have modified the second clause of ljs_imp/4 to make obvious the addition of lambda binders for the assumptions C and D->B.

Calling sprove/2 on the formulas corresponding to the types of the S, K and I combinators, we obtain:

?- sprove(((0->1->2)->(0->1)->0->2),X).
X = l(A, l(B, l(C, a(a(A, C), a(B, C))))).  % S

?- sprove((0->1->0),X).
X = l(A, l(B, A)).  % K

?- sprove((0->0),X).
X = l(A, A).  % I

Note also the addition of the predicate head_of, which eliminates non-tautologies by making sure that, in the third clause of ljs, the goal G is the last element of at least one implication chain. More on why this works in subsection 4.4.

4.3 From multiset to set contexts

Replacing the multiset contexts with sets eliminates repeated computations; the larger the terms, the more likely this is to be useful. The prover pprove/1 combines the simplicity of bprove with duplicate avoidance via the add_new/3 predicate.

pprove(T):-ljp(T,[]),!.

ljp(A,Vs):-memberchk(A,Vs),!.
ljp((A->B),Vs1):-!,add_new(A,Vs1,Vs2),ljp(B,Vs2).
ljp(G,Vs1):-  % atomic(G)
  select((A->B),Vs1,Vs2),
  ljp_imp(A,B,Vs2),
  !,
  add_new(B,Vs2,Vs3),
  ljp(G,Vs3).

ljp_imp(A,_,Vs):-atomic(A),!,memberchk(A,Vs).
ljp_imp((C->D),B,Vs1):-
  add_new((D->B),Vs1,Vs2),
  ljp((C->D),Vs2).

add_new(X,Xs,Ys):-memberchk(X,Xs),!,Ys=Xs.
add_new(X,Xs,[X|Xs]).

4.4 Implicational formulas as nested Horn clauses

Given the equivalence between B1 → B2 → ... → Bn → H and (in Prolog notation) H :- B1, B2, ..., Bn, where we choose H to be the atomic formula ending a chain of implications, we can recursively transform an implicational formula into one built from nested clauses, as follows:

toHorn((A->B),(H:-Bs)):-!,toHorns((A->B),Bs,H).
toHorn(H,H).

toHorns((A->B),[HA|Bs],H):-!,toHorn(A,HA),toHorns(B,Bs,H).
toHorns(H,[],H).

Note that the transformation is reversible and that lists (instead of Prolog's conjunction chains) are used to collect the elements of the body of a clause.

?- toHorn(((0->1->2)->(0->1)->0->2),R).
R = (2:-[(2:-[0, 1]), (1:-[0]), 0]).

?- toHorn(((0->1->2->3->4)->(0->1->2)->0->2->3),R).
R = (3:-[(4:-[0, 1, 2, 3]), (2:-[0, 1]), 0, 2]).

This suggests transforming provers for implicational formulas into equivalent provers working on nested Horn clauses.

hprove(T0):-toHorn(T0,T),ljh(T,[]),!.

ljh(A,Vs):-memberchk(A,Vs),!.
ljh((B:-As),Vs1):-!,append(As,Vs1,Vs2),ljh(B,Vs2).
ljh(G,Vs1):-                 % atomic(G), G not in Vs1
  memberchk((G:-_),Vs1),     % if non-tautology, we just fail
  select((B:-As),Vs1,Vs2),   % outer select loop
  select(A,As,Bs),           % inner select loop
  ljh_imp(A,B,Vs2),          % A is in the body of B
  !,trimmed((B:-Bs),NewB),   % trim empty bodies
  ljh(G,[NewB|Vs2]).

ljh_imp(A,_B,Vs):-atomic(A),!,memberchk(A,Vs).
ljh_imp((D:-Cs),B,Vs):-
  ljh((D:-Cs),[(B:-[D])|Vs]).

trimmed((B:-[]),R):-!,R=B.
trimmed(BBs,BBs).

Note that we have also added a second select/3 call to the third clause of ljh, to give ljh_imp more chances to succeed and commit.
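As a quick illustrative check of the reversibility claim above (our example, not from the paper's repository), toHorn/2 can also be run with its second argument bound, recovering the implicational formula from its nested clause form:

?- toHorn(F, (2:-[(2:-[0,1]),(1:-[0]),0])).
F = ((0->1->2)->(0->1)->0->2).

The list-based body representation is what keeps this inverse mapping deterministic.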
Basically, the nested Horn clause form of implicational logic helps bypass some intermediate steps, by focusing on the head of a Horn clause, which corresponds to the last atom in a chain of implications. This also speeds up the rejection of non-tautologies: the third clause of ljh checks that the goal G is actually the head of at least one of the clauses in the environment Vs1. In fact, it might be worth formalizing this nested Horn-clause prover directly, as a new calculus, and studying its properties, given also its significant speed-up relative to all the other provers (as shown in section 6).

4.5 A lifting to classical logic, via Glivenko's transformation

The simplest way to turn a propositional intuitionistic theorem prover into a classical one is to use Glivenko's translation, which prefixes a formula with its double negation. By adding the atom false to the language of the formulas, and rewriting a negation ~X into X -> false, we obtain, after adding the special handling of false as the first clause of ljk/2:

:- op(425, fy, ~ ).  % negation

gprove(T0):-dneg(T0,T),kprove(T).
kprove(T0):-expand_neg(T0,T),ljk(T,[]),!.

ljk(_,Vs):-memberchk(false,Vs),!.
ljk(A,Vs):-memberchk(A,Vs),!.
ljk((A->B),Vs):-!,ljk(B,[A|Vs]).
ljk(G,Vs1):-
  select((A->B),Vs1,Vs2),
  ljk_imp(A,B,Vs2),
  !,
  ljk(G,[B|Vs2]).

ljk_imp((C->D),B,Vs):-!,ljk((C->D),[(D->B)|Vs]).
ljk_imp(A,_,Vs):-memberchk(A,Vs).

expand_neg(A,R):-atomic(A),!,R=A.
expand_neg(~A,R):-!,expand_neg(A,B),R=(B->false).
expand_neg((A->B),(X->Y)):-expand_neg(A,X),expand_neg(B,Y).

Note that the predicate kprove/1 simply extends implicational propositional calculus, as implemented by bprove/1, with the negation operator ~/1, while the predicate gprove/1 prefixes a classical formula with its double negation (the predicate dneg/2, not listed here, simply maps a formula T0 to ~ ~T0).

5 The testing framework

Correctness can be checked by searching for false positives and false negatives. A false positive is a non-tautology that the prover proves, breaking the soundness property. A false negative is a tautology that the prover fails to prove, breaking the completeness property.

While classical tautologies are easily tested (at small scale against truth tables, at medium scale with classical propositional provers and at larger scale with a SAT solver), intuitionistic provers require a more creative approach.

As a first bootstrapping step, assuming that no "gold standard" prover is available, one can look at the other side of the Curry-Howard isomorphism and rely on generators of (typable) lambda terms and generators of implicational logic formulas, with results checked against a trusted type inference algorithm. As a next step, a trusted prover can be used as a gold standard to test both for false positives and for false negatives.

5.1 Finding false negatives by generating the set of simply typed normal forms of a given size

A false negative is identified if our prover fails on a type expression known to have an inhabitant. Via the Curry-Howard isomorphism, such expressions are the types inferred for lambda terms, generated by increasing sizes. We refer to [15] for a detailed description of efficient algorithms that generate pairs of simply typed lambda terms in normal form, together with their principal types.
The variant of the code we use here is at: https://github.com/ptarau/TypesAndProofs/blob/master/allTypedNFs.pro.

5.2 Finding false positives by generating all well-formed type expressions of a given size

A false positive is identified if the prover succeeds in finding an inhabitant for a type expression that does not have one. We obtain type expressions by generating all binary trees of a given size, extracting their leaf variables and then iterating over the set of their set partitions, while unifying the variables belonging to the same block. We refer to [15] for a detailed description of the algorithms.

The code implementing the all-tree and set partition generators, as well as their integration as a type expression generator, is at: https://github.com/ptarau/TypesAndProofs/blob/master/allPartitions.pro.

We tested the predicate lprove/1, as well as all the other provers derived from it, for false negatives against the simple types of terms up to size 15 (with size defined as 2 for applications, 1 for lambdas and 0 for variables), and for false positives against all type expressions up to size 7 (with size defined as the number of internal nodes).

5.3 Testing against a trusted reference implementation

Assuming we trust an existing reference implementation (e.g., after it passes our generator-based tests), it makes sense to use it as a "gold standard". In this case, we can identify both false positives and false negatives directly, as follows:

gold_test(N,Generator,Gold,Silver,Term,Res):-
  call(Generator,N,Term),
  gold_test_one(Gold,Silver,Term,Res),
  Res\=agreement.

gold_test_one(Gold,Silver,T,Res):-
  (  call(Silver,T)
  -> \+ call(Gold,T),
     Res = wrong_success
  ;  call(Gold,T)       % Silver has failed here
  -> Res = wrong_failure
  ;  Res = agreement
  ).

When specializing to a generator of all well-formed implicational formulas, and using Dyckhoff's dprove/1 predicate as a gold standard, we have:

gold_test(N,Silver,Culprit,Unexpected):-
  gold_test(N,allImpFormulas,dprove,Silver,Culprit,Unexpected).

To test the tester, we design a prover that randomly succeeds or fails:

badprove(_) :- 0 =:= random(2).

We can now test lprove/1 and badprove/1 as follows:

?- gold_test(6,lprove,T,R).
false.  % no false positive or negative is found

?- gold_test(6,badprove,T,R).
T = (0->1->0->0->0->0->0), R = wrong_failure ;
...

?- gold_test(6,badprove,T,wrong_success).
T = (0->1->0->0->0->0->2) ;
T = (0->0->1->0->0->0->2) ;
T = (0->1->1->0->0->0->2) ;
...

More complex implicit correctness tests can be designed by comparing the behavior of a prover that handles false with Glivenko's double-negation transformation, which turns an intuitionistic propositional prover into a classical prover working on classical formulas containing implication and negation operators. After defining:

gold_classical_test(N,Silver,Culprit,Unexpected):-
  gold_test(N,allClassFormulas,tautology,Silver,Culprit,Unexpected).

we can run it against the provers gprove/1 and kprove/1, using Melvin Fitting's classical tautology prover tautology/1 [16] as a gold standard:

?- gold_classical_test(7,gprove,Culprit,Error).
false.  % no false positive or negative found

?- gold_classical_test(7,kprove,Culprit,Error).
Culprit = ((false->false)->0->0->((1->false)->false)->1),
Error = wrong_failure ;
Culprit = ((false->false)->0->1->((2->false)->false)->2),
Error = wrong_failure .
...
While gprove/1, implementing Glivenko's translation, passes the test, kprove/1, which handles intuitionistic tautologies (including negated formulas), fails on classical tautologies that are not also intuitionistic tautologies.

6 Performance and scalability testing

Once they pass the correctness tests, our provers need to be tested against large random terms. The mechanism is similar to the use of all-term generators.

6.1 Random simply typed terms, with Boltzmann samplers

We generate random simply typed normal forms, using a Boltzmann sampler along the lines of the one described in [9]. The code variant, adapted to our different term-size definition, is at: https://github.com/ptarau/TypesAndProofs/blob/master/ranNormalForms.pro. It works as follows:

?- ranTNF(10,XT,TypeSize).
XT = l(l(a(a(s(0), l(s(0))), 0))) : (((A->B)->B->C)->B->C),
TypeSize = 5.

?- ranTNF(60,XT,TypeSize),nv(XT).
XT = l(l(a(a(0, l(a(a(0, a(0, l(...))), s(s(0))))), l(l(a(a(0, a(l(...), a(..., ...))), l(0))))))) : (A->((((A->A)- ...)->D)->D)->M)->M),
TypeSize = 34.

Interestingly, partly because there is some variation in the size of the terms a Boltzmann sampler generates, and more importantly because the distribution of types favors (as seen in the second example) simple tautologies in which an atom identical to the last one occurs in the implication chain leading to it [17, 8], additional filtering mechanisms are needed if we want to use these terms for scalability tests, so as to statically reject type expressions that are large but easy to prove as intuitionistic tautologies.

6.2 Random implicational formulas

The generation of random implicational formulas is more intricate. Our code combines an implementation of Rémy's algorithm [18], along the lines of Knuth's algorithm R in [19], for the generation of random binary trees, at https://github.com/ptarau/TypesAndProofs/blob/master/RemyR.pro, with code generating random set partitions using an urn-based algorithm (see [20]), at https://github.com/ptarau/TypesAndProofs/blob/master/ranPartition.pro. We refer to [21] for a declarative implementation of a variant of Rémy's algorithm in Prolog, with code adapted for this paper at: https://github.com/ptarau/TypesAndProofs/blob/master/RemyP.pro.

As automatic Boltzmann sampler generation is limited to fixed numbers of equivalence classes, for which a context-free grammar can be given, we build our random set partition generator, which groups the logical variables in leaf position into equivalence classes, using the urn algorithm described in [20].

Once a random binary tree of size N is generated, with the ->/2 constructor labeling internal nodes, the N+1 leaves of the tree are decorated with logic variables. As variables sharing a name define equivalence classes on the set of variables, each choice of them corresponds to a set partition of the N+1 leaves.

The combined generator, which produces terms of size 1000 in a few seconds, works as follows:

?- ranImpFormula(20,F).
F = (((0->(((1->2)->1->2->2)->3)->2)->4->(3->3)->(5->2)->6->3)->7->(4->5)->(4->8)->8).

?- time(ranImpFormula(1000,_)).  % includes tabling large Stirling numbers
% 37,245,709 inferences, 7.501 CPU in 7.975 seconds (94% CPU, 4965628 Lips)

?- time(ranImpFormula(1000,_)).  % much faster now, thanks to tabling
% 107,163 inferences, 0.040 CPU in 0.044 seconds (92% CPU, 2659329 Lips)

Note that we use Prolog's tabling (a form of automated dynamic programming) to avoid costly recomputation of the (very large) Stirling numbers.
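For a self-contained picture of the tree-plus-partition scheme, here is a compact sketch of its exhaustive variant, as used for testing in section 5.2 (our illustration; the repository's allPartitions.pro and ranPartition.pro differ): binary ->/2 trees are enumerated first, and the leaves are then bound to canonical set partition labels, encoded as restricted growth sequences:

% all binary ->/2 trees with N internal nodes; the leaf variables are
% collected in a difference list
genTree(0,V,[V|Vs],Vs).
genTree(N,(L->R),Vs1,Vs3):-N>0,N1 is N-1,
  between(0,N1,NL),NR is N1-NL,
  genTree(NL,L,Vs1,Vs2),
  genTree(NR,R,Vs2,Vs3).

% bind the leaves to a restricted growth sequence: each label may exceed
% the current maximum by at most 1, so each partition occurs exactly once
rgs([],_).
rgs([V|Vs],Max):-between(0,Max,V),
  (V=:=Max->Max1 is Max+1;Max1=Max),
  rgs(Vs,Max1).

% all implicational formulas with N occurrences of ->, up to renaming
impFormula(N,F):-genTree(N,F,Vs,[]),rgs(Vs,0).

For instance, impFormula(1,F) enumerates exactly F = (0->0) and F = (0->1).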
6.3 Testing with large random terms

Testing for false positives and false negatives on random terms proceeds in a manner similar to exhaustive testing with terms of a given size. Using Roy Dyckhoff's prover as a gold standard, we can find out that our nested Horn clause prover hprove/1 handles 100 terms of size 50 exactly as the gold standard does:

?- gold_ran_imp_test(50,100,hprove,Culprit,Unexpected).
false.  % no differences with the gold standard

In fact, the size of the random terms handled by hprove/1 makes provers an appealing alternative to random lambda term generators when searching for very large lambda term / simple type pairs. Interestingly, on the side of random simply typed terms the limitations come from their vanishing density, while on the prover side they come from the known PSPACE-complete complexity of the proof procedures.

6.4 Can lean provers actually be fast? A quick performance evaluation

Our benchmarking code is at: https://github.com/ptarau/TypesAndProofs/blob/master/benchmarks.pro (a minimal harness illustrating how such a benchmark can be driven is sketched at the end of this section).

The table in Fig. 1 compares several provers on exhaustive "all-terms" benchmarks, derived from our correctness tests. First, we run them on the types inferred for all lambda terms of a given size. Next, we run them on all implicational formulas of a given size, set to be about half of the former, as the number of these grows much faster.

Prover | Input size | Time on positive examples | Time on mix of examples | Total time
lprove | 13 |   0.94  |  0.253 |   1.194
lprove | 14 |   4.34  |  5.468 |   9.809
lprove | 15 |  29.401 |  5.367 |  34.768
bprove | 13 |   0.925 |  0.202 |   1.127
bprove | 14 |   4.361 |  4.208 |   8.57
bprove | 15 |  31.378 |  4.138 |  35.516
sprove | 13 |   1.673 |  0.174 |   1.848
sprove | 14 |   7.697 |  2.815 |  10.513
sprove | 15 |  36.623 |  2.799 |  39.423
pprove | 13 |   1.386 |  0.259 |   1.646
pprove | 14 |   6.358 |  5.047 |  11.405
pprove | 15 |  29.598 |  4.993 |  34.592
hprove | 13 |   0.974 |  0.117 |   1.092
hprove | 14 |   4.193 |  1.741 |   5.935
hprove | 15 |  19.155 |  1.761 |  20.917
dprove | 13 |   5.127 |  0.777 |   5.905
dprove | 14 |  22.426 | 13.207 |  35.633
dprove | 15 | 103.634 | 13.16  | 116.795

Fig. 1. Performance of provers on exhaustive tests (times in seconds)

Note that the size of the implicational formulas in the case of the "mix of examples" is half the size of the lambda terms whose types are used in the case of the "positive examples".

The nested-Horn-clause-based hprove/1 turns out to be a clear winner. Most likely this comes from concentrating its nondeterministic choices into the two select/3 calls. These calls happen in constant space and replace, with a faster "shallow backtracking" loop, the "deep backtracking" the other provers might need to cover the same search space.

In fact, hprove/1 also seems to scale well on larger formulas, following closely the increase in the number of test formulas. In the runs below, "pos" marks formulas known to be tautologies, generated from typable lambda terms of sizes 16, 17 and 18, and "mix" marks implicational formulas of sizes 8, 8 and 9:

?- forall(between(16,18,N),bm(N,hprove)).
[prog=hprove,size=16,pos=90.257,mix=30.411,total=120.668]
[prog=hprove,size=17,pos=425.586,mix=32.214,total=457.8]
[prog=hprove,size=18,pos=2108.01,mix=657.807,total=2765.818]

Testing exhaustively on small formulas, while an accurate indicator of average speed, might not favor provers using more complex heuristics or extensive preprocessing. But if that were the case, one would expect a constant-factor ratio, rather than the fast-increasing gap we see, for instance, between Dyckhoff's original dprove and our best prover hprove.
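As promised above, here is a minimal timing harness (a sketch under our assumptions, not the repository's benchmarks.pro; it reuses the allImpFormulas/2 generator of section 5.3):

% time a prover, in seconds, over all implicational formulas of size N
bench(N,Prover,Secs):-
  statistics(runtime,[T0,_]),
  forall(allImpFormulas(N,F),
         (call(Prover,F)->true;true)),  % ignore outcomes; we only time
  statistics(runtime,[T1,_]),
  Secs is (T1-T0)/1000.

For example, ?- bench(8,hprove,S) would measure hprove/1 on the size-8 "mix" benchmark.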
7 Related work

Work derived from Gentzen's LJ calculus spans hundreds, if not thousands, of papers and books. Space constraints limit our discussion to the most closely related papers, directly focusing on algorithms for implicational intuitionistic propositional logic.

Among them, the closest are [11, 12], which we have used as starting points for deriving our provers. We have chosen to implement the LJT calculus directly rather than deriving our programs from Roy Dyckhoff's Prolog code. Similar calculi, key ideas of which made it into the Coq proof assistant's code, are described in [22].

On the other side of the Curry-Howard isomorphism, [23], described in full detail in [24], finds and/or counts the inhabitants of simple types in long normal form.

The use of hypothetical implications in Prolog, although with a different semantics than Gentzen's LJ calculus or its LJT variant, goes back as early as [25], followed by a series of Lambda-Prolog and linear-logic-related books and papers, e.g., [26]. The similarity to the propositional subsets of N-Prolog [25] and λ-Prolog [26] comes from their close connection to intuitionistic logic, although neither derives its implementation from a pure LJ-based calculus or implements termination properties along the lines of the LJT calculus. In [27], backtrackable linear and intuitionistic assumptions that mimic the implication introduction rule are used, but they do not involve arbitrarily deeply nested implicational formulas.

Overviews of closely related calculi, using the implicational subset of propositional intuitionistic logic, are [28, 12].

8 Conclusions and future work

Our empirically oriented approach has found variants of lean propositional intuitionistic provers that are comparable to their more complex peers derived from similar calculi. Among them, the nested Horn clause prover might be worth formalizing as a calculus and subjecting to deeper theoretical analysis. Given that it shares its main data structures with Prolog, it seems interesting to optimize it via partial evaluation or compilation to Prolog.

Our renewed interest in finding lightweight implementations of these classic, theoretically hard (PSPACE-complete) combinatorial search problems is also motivated by the possibility of parallel implementations using multi-core and GPU algorithms. We plan future work on formalizing the nested Horn-clause prover in sequent calculus, and on exploring compilation techniques and parallel algorithms for it. A generalization to nested Horn clauses with universally quantified variables also seems promising to explore, either with grounding techniques as used by SAT and ASP solvers, or via compilation to Prolog.

Acknowledgement

This research has been supported by NSF grant 1423324.

References

1. Howard, W.: The Formulae-as-Types Notion of Construction. In Seldin, J., Hindley, J., eds.: To H.B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism. Academic Press, London (1980) 479–490
2. Wadler, P.: Propositions as types. Commun. ACM 58 (2015) 75–84
3. Statman, R.: Intuitionistic Propositional Logic is Polynomial-Space Complete. Theor. Comput. Sci. 9 (1979) 67–72
4. The Coq development team: The Coq proof assistant reference manual (2018). Version 8.8.0
5. Tarau, P.: On Type-directed Generation of Lambda Terms.
In De Vos, M., Eiter, T., Lierler, Y., Toni, F., eds.: 31st International Conference on Logic Programming (ICLP 2015), Technical Communications, Cork, Ireland, CEUR (September 2015), available online at http://ceur-ws.org/Vol-1433/
6. Palka, M.H., Claessen, K., Russo, A., Hughes, J.: Testing an optimising compiler by generating random lambda terms. In: Proceedings of the 6th International Workshop on Automation of Software Test, AST'11, New York, NY, USA, ACM (2011) 91–97
7. Fetscher, B., Claessen, K., Palka, M.H., Hughes, J., Findler, R.B.: Making random judgments: Automatically generating well-typed terms from the definition of a type-system. In: Programming Languages and Systems - 24th European Symposium on Programming, ESOP 2015, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2015, London, UK, April 11-18, 2015, Proceedings (2015) 383–405
8. Kostrzycka, Z., Zaionc, M.: Asymptotic densities in logic and type theory. Studia Logica 88(3) (2008) 385–403
9. Bendkowski, M., Grygiel, K., Tarau, P.: Random generation of closed simply typed λ-terms: A synergy between logic programming and Boltzmann samplers. TPLP 18(1) (2018) 97–119
10. Szabo, M.E.: The Collected Papers of Gerhard Gentzen. Philosophy of Science 39(1) (1972)
11. Dyckhoff, R.: Contraction-free sequent calculi for intuitionistic logic. Journal of Symbolic Logic 57(3) (1992) 795–807
12. Dyckhoff, R.: Intuitionistic Decision Procedures Since Gentzen. In Kahle, R., Strahm, T., Studer, T., eds.: Advances in Proof Theory, Cham, Springer International Publishing (2016) 245–267
13. Hudelmaier, J.: A PROLOG Program for Intuitionistic Logic. SNS-Bericht, Universität Tübingen (1988)
14. Hudelmaier, J.: An O(n log n)-Space Decision Procedure for Intuitionistic Propositional Logic. Journal of Logic and Computation 3(1) (1993) 63–75
15. Tarau, P.: A Hiking Trip Through the Orders of Magnitude: Deriving Efficient Generators for Closed Simply-Typed Lambda Terms and Normal Forms. In Hermenegildo, M.V., Lopez-Garcia, P., eds.: Logic-Based Program Synthesis and Transformation: 26th International Symposium, LOPSTR 2016, Edinburgh, UK, Revised Selected Papers, Springer LNCS, volume 10184 (September 2017) 240–255. Best paper award
16. Fitting, M.: leanTAP Revisited. Journal of Logic and Computation 8(1) (1998) 33–47
17. Genitrini, A., Kozik, J., Zaionc, M.: Intuitionistic vs. Classical Tautologies, Quantitative Comparison. In Miculan, M., Scagnetto, I., Honsell, F., eds.: Types for Proofs and Programs, International Conference, TYPES 2007, Cividale del Friuli, Italy, May 2-5, 2007, Revised Selected Papers. Volume 4941 of Lecture Notes in Computer Science, Springer (2007) 100–109
18. Rémy, J.L.: Un procédé itératif de dénombrement d'arbres binaires et son application à leur génération aléatoire. RAIRO - Theoretical Informatics and Applications - Informatique Théorique et Applications 19(2) (1985) 179–195
19. Knuth, D.E.: The Art of Computer Programming, Volume 4, Fascicle 4: Generating All Trees–History of Combinatorial Generation. Addison-Wesley Professional (2006)
20. Stam, A.: Generation of a random partition of a finite set by an urn model. Journal of Combinatorial Theory, Series A 35(2) (1983) 231–240
21. Tarau, P.: Declarative Algorithms for Generation, Counting and Random Sampling of Term Algebras. In: Proceedings of SAC'18, ACM Symposium on Applied Computing, PL track, Pau, France, ACM (April 2018)
22. Herbelin, H.: A Lambda-Calculus Structure Isomorphic to Gentzen-Style Sequent Calculus Structure. In: Selected Papers from the 8th International Workshop on Computer Science Logic, CSL '94, London, UK, Springer-Verlag (1995) 61–75
23. Ben-Yelles, C.B.: Type assignment in the lambda-calculus: Syntax and semantics. PhD thesis, University College of Swansea (1979)
24. Hindley, J.R.: Basic Simple Type Theory. Cambridge University Press, New York, NY, USA (1997)
25. Gabbay, D.M.: N-Prolog: An extension of Prolog with hypothetical implication. II. Logical foundations, and negation as failure. The Journal of Logic Programming 2(4) (1985) 251–283
26. Miller, D., Nadathur, G.: Programming with Higher-Order Logic. Cambridge University Press, New York, NY, USA (2012)
27. Tarau, P., Dahl, V., Fall, A.: Backtrackable State with Linear Affine Implication and Assumption Grammars. In Jaffar, J., Yap, R.H., eds.: Concurrency and Parallelism, Programming, Networking, and Security. Lecture Notes in Computer Science 1179, Berlin Heidelberg, Springer (December 1996) 53–64
28. Gabbay, D., Olivetti, N.: Goal-oriented deductions. In: Handbook of Philosophical Logic. Springer (2002) 199–285