Connections: Markov Decision Processes for Classical, Intuitionistic, and Modal Connection Calculi Fredrik Rømming1 , Jens Otten2 and Sean B. Holden1 1 University of Cambridge, Department of Computer Science and Technology, The Computer Laboratory, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK. 2 University of Oslo, Gaustadalléen 23B, 0373 Oslo, Norway Abstract This paper introduces a framework for integrating Reinforcement Learning (RL) with proof search in connection calculi for classical, intuitionistic, and modal logic. We specify a mapping from the relevant connection calculi to Markov Decision Processes (MDPs), and provide a Python library implementing such MDPs. Keywords Reinforcement Learning, Automated Reasoning, Connection Calculus, Intuitionistic Logic, Modal Logic 1. Introduction Automated Theorem Proving (ATP) is concerned with determining whether a given formula is valid in a specific logic. Complementary to classical logic, intuitionistic logic is used within interactive proof assistants such as NuPRL [1] and Coq [2], while modal logics [3, 4] have appli- cations in planning, natural language processing and program verification. The time complexity of ATP in these non-classical logics is higher than in classical logic (PSPACE-complete [5, 6] compared to NP-complete [7] for the propositional fragment). So far only a few ATP systems for these first-order non-classical logics exist, even though applications would benefit from more efficient ATP systems. One approach to dealing with these non-classical logics is to encode their Kripke semantics with prefixes [8, 9]. Two powerful ATP systems—ileanCoP [10] and MleanCoP [11] for intuitionistic and modal logic, respectively—use prefixes and are based on (clausal) connection calculi for non-classical logics [12, 13, 14]. Combining ATP and Machine Learning (ML) can enhance existing ATP systems (or theorem provers) [15, 16, 17, 18]. Using ML to guide the proof search clearly has the potential to lead to more efficient ATP systems, while preserving their ability to provide formal proofs. ML can be used in fully automated ATP systems for premise selection, strategy choice [19] and inference choice. Whereas the first two approaches use ML in a pre-processing step, in the third approach ML is tightly integrated into the proof search. AReCCa 2023: Automated Reasoning with Connection Calculi, 18 September 2023, Prague, Czech Republic Envelope-Open fr409@cam.ac.uk (F. Rømming); jeotten@ifi.uio.no (J. Otten); sbh11@cl.cam.ac.uk (S. B. Holden) GLOBE http://www.cl.cam.ac.uk/~fr409 (F. Rømming); http://jens-otten.de/ (J. Otten); http://www.cl.cam.ac.uk/~sbh11 (S. B. Holden) Orcid 0000-0001-7545-4662 (F. Rømming); 0000-0002-4331-8698 (J. Otten); 0000-0001-7979-1148 (S. B. Holden) © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) AReCCa 2023 107 CEUR-WS.org CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings This paper introduces a framework for integrating Reinforcement Learning (RL) with proof search in connection calculi for classical, intuitionistic, and modal first-order logic. There are two main contributions: 1. The discussion and definition of a mapping from classical and non-classical connection calculi to Markov Decision Processes (MDPs). 2. A Python library implementing such MDPs providing seamless integration with the ML ecosystem facilitating RL experiments for in-prover guidance. We introduce the connection calculi and ML techniques in Section 2. Section 3 specifies a mapping from proof search to the RL setting. Section 4 describes the implementation of the library. The paper concludes with a summary and a plan for future work in Section 5. 2. Preliminaries 2.1. Classical, Intuitionistic and Modal Connection Calculi The following methods are based on uniform clausal connection calculi for classical, intuitionistic and modal logic. We provide a short overview of these calculi; more details can be found in [20, 12, 21, 11]. An atomic formula (denoted by 𝐴) is built up from predicate symbols (denoted by 𝑃, 𝑄, 𝑅), function symbols and term variables (𝑥, 𝑦). A (first-order) formula (denoted by 𝐹) is built up from atomic formulae, the connectives ¬, ∧, ∨, ⇒, and the first-order quantifiers ∀ and ∃. A modal formula might also include the modal operators □ and ◇. A literal 𝐿 has the form 𝐴 or ¬𝐴. In the (clausal) connection calculus a formula is represented as a matrix, which is a representation of the formula in a (prefixed) clausal form. 2.1.1. Classical Logic The classical matrix 𝑀(𝐹 ) of a formula 𝐹 is its representation as a set of clauses, where each clause is a set of literals. It is the representation of 𝐹 in disjunctive normal form, where Skolemization of the eigenvariables [22] is done in the usual way. In the graphical representation of a matrix, its clauses are arranged horizontally, while the literals of each clause are arranged vertically. In contrast to sequent calculi [22] and (standard) tableau calculi [23], the connection calculus uses a connection-driven search to find a proof for the validity of a matrix 𝑀(𝐹 ) for a given formula 𝐹. A connection is a set {𝐴1 , ¬𝐴2 } of literals with the same predicate symbol but different polarities. A term substitution 𝜎𝑇 assigns terms to variables. A connection is 𝜎𝑇 -complementary iff 𝜎𝑇 (𝐴1 ) = 𝜎𝑇 (𝐴2 ). The axiom and the three rules of the (clausal) connection calculus are given in Figure 1 (the prefixes ∶𝑝1 and ∶𝑝2 are to be ignored for classical logic) [20]. The words/language of the calculus are tuples “𝐶, 𝑀, Path”, where 𝑀 is a matrix, 𝐶 is a (subgoal) clause or 𝜀 and (the active) Path is a set of literals or 𝜀. A copy of a clause 𝐶 is made by renaming all variables in 𝐶. The rigid term substitution 𝜎 = 𝜎𝑇 is calculated by using one of the well-known term unification algorithms whenever a connection is identified. A connection proof of 𝑀 is a proof of 𝜀, 𝑀, 𝜀 in the connection calculus with a substitution 𝜎 = 𝜎𝑇 . 108 𝐶2 , 𝑀, {} and 𝐶2 is copy of 𝐶1 ∈𝑀 {}, 𝑀, 𝑃𝑎𝑡ℎ 𝜀, 𝑀, 𝜀 Axiom (A) Start (S) 𝐶, 𝑀, 𝑃𝑎𝑡ℎ∪{𝐿2 ∶𝑝2 } and {𝐿1 ∶𝑝1 , 𝐿2 ∶𝑝2 } is 𝜎-complementary 𝐶∪{𝐿1 ∶𝑝1 }, 𝑀, 𝑃𝑎𝑡ℎ∪{𝐿2 ∶𝑝2 } Reduction (R) 𝐶2 ⧵{𝐿2 ∶𝑝2 }, 𝑀, 𝑃𝑎𝑡ℎ∪{𝐿1 ∶𝑝1 } 𝐶, 𝑀, 𝑃𝑎𝑡ℎ and 𝐶2 is a copy of 𝐶1 ∈𝑀, 𝐿2 ∶𝑝2 ∈𝐶2 , 𝐶∪{𝐿1 ∶𝑝1 }, 𝑀, 𝑃𝑎𝑡ℎ {𝐿1 ∶𝑝1 , 𝐿2 ∶𝑝2 } is 𝜎-complementary Extension (E) Figure 1: The (clausal) connection calculus for classical, intuitionistic and modal logic 2.1.2. Non-Classical Logics For intuitionistic and modal logic, the matrix and the calculus are extended by prefixes, rep- resenting world paths in the Kripke semantics; see [24, 9, 8]. A prefix 𝑝 is a string consisting of variables (denoted by 𝑈 , 𝑉 , 𝑊) and constants (denoted by 𝑎, 𝑏) and assigned to each literal. Skolemization is not only used for the (first-order) eigenvariables, but extended to prefix con- stants [12]. Using the occurs-check during unification ensures that the reduction ordering [8] is acyclic. The intuitionistic and modal matrix 𝑀(𝐹 ) of a formula 𝐹 is a representation of 𝐹 in standard clausal form, in which each literal 𝐿 is marked with its prefix 𝑝 ; see [12, 13, 14]. A prefix substitution 𝜎𝑃 assigns strings to prefix variables and is calculated by a prefix unifica- tion that depends on the specific non-classical logic [12, 13, 11]. In intuitionistic and modal logic, a connection {𝐿1 ∶𝑝1 , 𝐿2 ∶𝑝2 } is 𝜎-complementary iff both its literals and prefixes can be unified under a combined substitution 𝜎 = (𝜎𝑇 , 𝜎𝑃 ); that is, additionally 𝜎𝑃 (𝑝1 ) = 𝜎𝑃 (𝑝2 ) must hold. An intuitionistic/modal connection proof of 𝑀 is a proof of 𝜀, 𝑀, 𝜀 in the connection calculus of Figure 1 with the combined (and admissible [8]) substitution 𝜎 = (𝜎𝑇 , 𝜎𝑃 ) [8, 24]. Example 1. Consider the formula 𝐹1 ∶= ( (𝑃 ∨ ∀𝑥 ¬ (𝑄 𝑥 ⇒ 𝑄 𝑐)) ∧ 𝑅 ) ⇒ (𝑃 ∧ 𝑅) . It has the following intuitionistic (prefixed) matrix 𝑀 ∶= 𝑀(𝐹1 ). 𝑀(𝐹1 ) = { {𝑃 0 ∶𝑈 𝑎2∗ , 𝑅0 ∶𝑈 𝑏2∗ }, {𝑃 1 ∶𝑈 𝑉1 , 𝑄 0 𝑥∶𝑈 𝑉 𝑊 𝑎1∗ 𝑏1∗ }, {𝑃 1 ∶𝑈 𝑉2 , 𝑄 1 𝑐∶𝑈 𝑉 𝑊 𝑎1∗ 𝑉3 }, {𝑅1 ∶𝑈 𝑉4 } } where 𝑎1∗ ∶= 𝑎1 (𝑈 , 𝑉 , 𝑥, 𝑊 ), 𝑏1∗ ∶= 𝑏1 (𝑈 , 𝑉 , 𝑥, 𝑊 ), 𝑎2∗ ∶= 𝑎2 (𝑈 ), 𝑏2∗ ∶= 𝑏2 (𝑈 ).1 𝑀 has the following graphical representation and graphical connection proof with the term substitution 𝜎𝑇 (𝑥) = 𝑐 and the prefix substitution 𝜎𝑃 (𝑉1 ) = 𝑎2∗ , 𝜎𝑃 (𝑉2 ) = 𝑎2∗ , 𝜎𝑃 (𝑉3 ) = 𝑏1∗ , 𝜎𝑃 (𝑉4 ) = 𝑏2∗ (literals of each connection are connected with a line). 𝑃 0 ∶𝑈 𝑎2∗ 𝑃 1 ∶𝑈 𝑉1 𝑃 1 ∶𝑈 𝑉2 [ [ 0 ∗] [ 0 ∗ ∗] [ 1 ] [𝑅1 ∶𝑈 𝑉4 ] ] 𝑅 ∶𝑈 𝑏2 𝑄 𝑥∶𝑈 𝑉 𝑊 𝑎1 𝑏1 𝑄 𝑐∶𝑈 𝑉 𝑊 𝑎1∗ 𝑉3 The formal connection proof of 𝑀 (where prefixes have been omitted for better readability) is shown in Figure 2. 1 The polarities 0 and 1 are used to mark non-negated and negated literals, respectively (see [23, 8]). 109 { }, 𝑀, {𝑃 0 , 𝑄 0 𝑥 ′ } A {𝑃 1 }, 𝑀, {𝑃 0 , 𝑄 0 𝑥 ′ } { }, 𝑀, {𝑃 0 } { }, 𝑀, {𝑅0 } { }, 𝑀, { } R A A A {𝑄 0 𝑥 ′ }, 𝑀, {𝑃 0 } {𝑅0 }, 𝑀, { } E E {𝑃 0 , 𝑅0 }, 𝑀, { } E 𝜀, {{𝑃 0 , 𝑅0 }, {𝑃 1 , 𝑄 0 𝑥}, {𝑃 1 , 𝑄 1 𝑐}, {𝑅1 }}, 𝜀 S Figure 2: Formal connection proof of the matrix 𝑀 = 𝑀(𝐹1 ) 2.2. MDPs and Reinforcement Learning We now provide an introduction to MDPs and RL—details can be found in [25]. Most proof procedures are search algorithms: there is an initial state, states can be modified by actions, and the goal is to find a proof state. The use of heuristics is crucial for performance. For example, in the case of saturation provers, for choosing a pair of clauses to resolve. One might imagine an agent, armed with a heuristic, acting to change the initial state of its environment into a state representing a proof. An MDP represents a more general formulation of this kind of problem. Let 𝑆 denote the set of states, and let 𝐴 denote the set of actions. When an agent performs action 𝑎 ∈ 𝐴 in state 𝑠 ∈ 𝑆, the environment moves to a new state 𝑠 ′ ∈ 𝑆 with probability 𝒮 (𝑆 ′ |𝑠, 𝑎); that is, 𝑠 ′ ∼ 𝒮 (𝑆 ′ |𝑠, 𝑎). At the same time, the agent receives a reward ℛ(𝑠 ′ ; 𝑠, 𝑎) ∈ ℝ. The tuple (𝑆, 𝐴, 𝒮 , ℛ) defines the MDP. Let subscripts 𝑡 denote the sequence of states, actions and rewards through time, and imagine the agent has a policy 𝜋 ∶ 𝑆 → 𝐴 telling it which action to employ in any given state; that is, at time 𝑡 the agent always applies 𝑎𝑡 = 𝜋(𝑠𝑡 ).2 Then, starting from a state 𝑠0 , the agent will move through states 𝑠0 → 𝑠1 ∼ 𝒮 (𝑆1 |𝑠0 , 𝜋(𝑠0 )) → 𝑠2 ∼ 𝒮 (𝑆2 |𝑠1 , 𝜋(𝑠1 )) → ⋯ and receive a sequence of rewards 𝑟0 = ℛ(𝑠1 ; 𝑠0 , 𝜋(𝑠0 )) → 𝑟1 ∼ ℛ(𝑠2 ; 𝑠1 , 𝜋(𝑠1 )) → ⋯ . A utility function 𝑈 𝜋 (𝑠) computes the overall accumulated reward associated with the use of 𝜋, starting from state 𝑠. As future rewards are often perceived as less valuable than short-term rewards, a common function is 𝑈 𝜋 (𝑠) = 𝔼 [𝑟0 + 𝜖𝑟1 + 𝜖 2 𝑟2 + ⋯] = 𝔼 [∑ 𝜖 𝑡 𝑟𝑡 ] ∞ 𝑡=0 where the expected value is with respect to the randomness governing the state transitions, and 𝜖 ∈ [0, 1] sets the trade-off between short-term and long-term rewards. An optimal policy 𝜋 ⋆ is one satisfying 𝜋 ⋆ (𝑠) = argmax𝜋 𝑈 𝜋 (𝑠), and which leads to utility 𝑈 𝜋 (𝑠). ⋆ 2 In general, policies may also be stochastic, i.e., distributions over actions given state. This is particularly useful for exploration during the learning process. 110 Both the optimal policy and its corresponding utility can be expressed by considering what happens if we take particular actions from the current state and follow the optimal policy thereafter 𝑈 𝜋 (𝑠) = max ∑ 𝒮 (𝑠 ′ |𝑠, 𝑎) (ℛ(𝑠 ′ ; 𝑠, 𝑎) + 𝜖𝑈 𝜋 (𝑠 ′ )) ⋆ ⋆ 𝑎 𝑠′ 𝜋 (𝑠) = argmax ∑ 𝒮 (𝑠 ′ |𝑠, 𝑎) (ℛ(𝑠 ′ ; 𝑠, 𝑎) + 𝜖𝑈 𝜋 (𝑠 ′ )) . ⋆ ⋆ 𝑎 𝑠′ Numerous algorithms exist for inferring an optimal policy for an MDP, depending on what is known about the MDP. If little is known, we must learn about the environment by exploring actions and their effects, and this is what RL achieves. 3. Connection Calculi as Markov Decision Processes As described in the previous section, RL concerns agents interacting with environments. In the case of proof procedures, one can consider an agent deciding which choice to make at each point in the proof search. Hence, to apply RL we need to define the proof search environment and its choice points. We now address the description of proof search procedures in connection calculi using MDPs. While the reader familiar with the connection calculus might be tempted to see the relationship between proof and MDP as straightforward, it is in fact more subtle than is apparent at first glance, and requires some care to define correctly. While the connection calculi define procedures whereby sequential decisions are made to find proofs, they do not directly define MDPs. For confluent proof calculi such as resolution, one can treat the words of the calculus as observations and inference rules as actions, since all information about the state of the proof is carried in the most recently generated word. However, this cannot be done with connection calculi, because it is unclear how to handle branching and backtracking. The words of the connection calculi do not carry enough information to know whether the current state is a goal state (a proof), or to know what inferences have or have not been attempted. Allowing dead-end states that are not proofs would be unfaithful to the underlying dynamics (provers run until they have found a proof), so backtracking should be incorporated as part of the MDP. We now specify the state space, action space, transition space, and reward function tuple (𝑆, 𝐴, 𝒮 , ℛ) for CC-MDP, one possible MDP representation of proof search in connection calculi. Definition 1 (CC-MDP State Space). The state space 𝑆 is defined as the set of all possible deriva- tions (a derivation is an incomplete proof in which some leaves are not closed by axioms) in the connection calculus together with the substitution 𝜎 = 𝜎𝑇 . Further, to keep track of the open backtracking choices, each node in the connection derivation is marked with the possible inference steps that have not been attempted so far. In general, there might be exponentially many unifiers for one prefix equation (of one connection) [12, 14, 26]. Therefore, the rigid prefix substitution 𝜎𝑃 (for the non-classical logics) is calculated only after a classical proof has been found. This has turned out to be the most efficient approach [12]. 𝜎𝑃 is therefore not part of the states in 𝑆 (see also the description of the implementation in Section 4). 111 𝜎𝑇 = {𝑥 ′ \𝑐} {¬𝑃}, 𝑀, {𝑃, 𝑄𝑥 } {}, 𝑀, {𝑃} 𝐼3 ′ {𝑄𝑥 } , 𝑀, {𝑃} {𝑅}, 𝑀, {} E 𝐼2 𝐼3 ∶ {} ′ Non-attempted: {𝑃, 𝑅}, 𝑀, {} E 𝐼1 𝐼2 ∶ {(E,{¬𝑃, ¬𝑄𝑐})} 𝜀, {{𝑃, 𝑅}, {¬𝑃, 𝑄𝑥}, {¬𝑃, ¬𝑄𝑐}, {¬𝑅}}, 𝜀 S 𝐼1 ∶ {} Figure 3: A CC-MDP state for the matrix 𝑀 Example 2. Consider the formula ( (𝑃 ∨ ∀𝑥 ¬(𝑄𝑥 ⇒ 𝑄𝑐)) ∧ 𝑅 ) ⇒ (𝑃 ∧ 𝑅) from Example 1 and its classical matrix 𝑀 = {{𝑃, 𝑅}, {¬𝑃, 𝑄𝑥}, {¬𝑃, ¬𝑄𝑐}, {¬𝑅}}. Figure 3 shows a (possible) state 𝑠 ∈ 𝑆 in the proof search of 𝑀. 𝑠 includes a derivation of 𝑀 together with the substitution 𝜎𝑇 , and for each node a list of non-attempted inference steps. Definition 2 (CC-MDP Action Space). The action space 𝐴 consists of rule application actions 𝐴inferences and a backtracking action 𝑎𝑏𝑎𝑐𝑘𝑡𝑟𝑎𝑐𝑘 . There is a rule application action for each rule in the connection calculus. Hence, a rule application action 𝑎𝑟,𝑥 ∈ 𝐴inferences is specified by the rule name 𝑟 ∈ {𝑆, 𝑅, 𝐸} (for Start, Reduction or Extension) and the associated clause and/or literal 𝑥. Specifically, an action 𝑎𝑟,𝑥 can have one of the following forms: 𝑎𝑆,𝐶2 for the Start rule, 𝑎𝑅,𝐿2 for the Reduction rule and 𝑎𝐸,𝐶2 /𝐿2 for the Extension rule. Example 3. To get from the initial state 𝜀, 𝑀, 𝜀 to the state in Figure 3 one can take the actions: 𝑎𝑆,{𝑃,𝑅} , 𝑎𝐸,{¬𝑃,𝑄𝑥 ′ }/¬𝑃 , 𝑎𝐸,{¬𝑄𝑐,¬𝑃}/¬𝑄𝑐 . These actions are a start step with the clause {𝑃, 𝑅}, followed by two extension steps connecting the leftmost literal3 in the leftmost open subgoals of the proof tree to the literal 𝑃 in the clause {¬𝑃, 𝑄𝑥 ′ } and the literal ¬𝑄𝑐 in the clause {¬𝑄𝑐, ¬𝑃}. We say that action 𝑎 is valid in state 𝑠 if action 𝑎 = 𝑎backtrack or 𝑎 is a rule application action 𝑎𝑟,𝑥 ∈ 𝐴inferences denoting a valid rule application to the leftmost literal in the leftmost open subgoal in the proof tree of 𝑠. A valid rule application takes the rigid term substitution 𝜎𝑇 into account, so 𝜎𝑇 (𝐿1 ) = 𝜎𝑇 (𝐿2 ). As for the states in the state space 𝑆, the rigid prefix substitution 𝜎𝑃 is not taken into account (and updated) for the non-classical logics. To handle proper backtracking, when a rule application action 𝑎 is taken from state 𝑠 to 𝑠 ′ , 𝑎 is no longer counted as a non-attempted inference for the node in the tableau of 𝑠 ′ corresponding to the principal node [21] of 𝑎. The special action 𝑎backtrack backtracks the state’s derivation from the leftmost literal in the leftmost subgoal to the previous choice point, which still has non- attempted inferences. This is one of many ways to model backtracking. In particular, this method ensures that the complete (leanCoP) policy: “always choose the first non-attempted inference, otherwise backtrack” can be expressed easily. In general an MDP models state transitions stochastically—performing action 𝑎 in state 𝑠 leads to a new state 𝑠 ′ ∼ 𝒮 (𝑆 ′ |𝑠, 𝑎). In a connection prover the transition is deterministic, in the sense that performing action 𝑎 in state 𝑠 leads reliably to a single outcome state 𝑠 ′ . This gives rise to the following deterministic state transition function. 3 All clauses (including subgoal clauses) are treated as ordered sets of literals. 112 Definition 3 (CC-MDP Transition Function). The transition distribution 𝒮 is defined as ⎧1 if 𝑎 is valid in 𝑠 and 𝑠 ′ is the (deterministic) result 𝒮 (𝑆 ′ = 𝑠 ′ |𝑠, 𝑎) = of applying 𝑎 to 𝑠 ⎨ ⎩0 otherwise. Notice that the transition function is necessarily deterministic, as any probabilistic transition function would not accurately describe the dynamics of the underlying system. We only consider the first literal of the leftmost open subgoal for rule application. This is because all subgoals need to be closed, so we do not consider alternative subgoals and literals within subgoals as choice points for the MDP. Including these as choice points could be interesting, but it would also increase the size of the action space and general complexity introducing a trade-off. Definition 4 (CC-MDP Reward Function). To remain faithful to the underlying problem, we consider the following relatively sparse reward function 1 if 𝑠 ′ is a proof ℛ(𝑠 ′ ; 𝑠, 𝑎) = { 0 otherwise where 𝑠 ′ is a proof iff the derivation of 𝑠 ′ is a proof under the unique (combined) substitution 𝜎 = 𝜎𝑇 (for classical logic) or 𝑠𝑖𝑔𝑚𝑎 = (𝜎𝑇 , 𝜎𝑃 ) of 𝑠 ′ . This reward function is the simplest function accurately describing the goal while preserving optimality of solutions. Example 4. Figure 4 shows the graph representation of a part of CC-MDP. ε, {{P, R}, {¬P, Qx}, {¬P, ¬Qc}, {¬R}}, ε aS,{P,R} {P, R}, M, {} S ε, {{P, R}, {¬P, Qx}, {¬P, ¬Qc}, {¬R}}, ε aE,{¬P,Qx}/¬P aE,{¬P,¬Qc}/¬Qc {Qx′ } , M, {P } {R}, M, {} {¬Qc} , M, {P } {R}, M, {} E E {P, R}, M, {} {P, R}, M, {} S S ε, {{P, R}, {¬P, Qx}, {¬P, ¬Qc}, {¬R}}, ε ε, {{P, R}, {¬P, Qx}, {¬P, ¬Qc}, {¬R}}, ε aE,{¬P,¬Qc}/¬Qc {¬P }, M, {P, Qx′ } {}, M, {P } E {Qx′ } , M, {P } {R}, M, {} E {P, R}, M, {} S ε, {{P, R}, {¬P, Qx}, {¬P, ¬Qc}, {¬R}}, ε Figure 4: Part of the CC-MDP graph (omitting 𝜎s and non-attempted inferences) 113 Proof search in the classical and non-classical connection calculi can be framed as an agent interacting with the CC-MDP environment, giving the necessary theoretical framework for applying RL to the proof search in these connection calculi. Furthermore, techniques such as positive start clauses and iterative deepening can be accounted for with minor tweaks to the components of CC-MDP. 4. Implementation 4.1. Connection Calculi as MDPs in Python Connections [27] is a Python library of connection calculi implemented as MDPs, providing environments for proof search in connection calculi. It provides OpenAI Gym/Gymnasium- like [28] environments for proof search in connection calculi for classical, intuitionistic, and modal first-order logic. It currently supports the modal logics D, T, S4, and S5, each for the constant, cumulative, and varying domains. Connections implements the basic calculi for classical, intuitionistic, and modal logic as described above, enhanced by regularity [29]. The observation and action spaces are as described in Section 3, treating literals and (first-order) terms as objects associated with locations in a matrix (represented as a list of lists of literals) and as a tableau-like proof tree. For intuitionistic and modal logic, literals and terms have an extra field for their prefixes represented as (first-order) terms. Connections is implemented natively in Python with no dependencies. As Python is the de-facto language for ML, the library provides an accessible and reproducible way to conduct RL experiments with provers based on connection calculi. Using standardized frameworks increases confidence in the correctness of the implementation, and the environments can easily be incorporated into the rest of the ML ecosystem alongside frameworks such as RLlib [30], Stable Baselines [31], PyTorch [32], Tensorflow [33] and others. Figure 5 gives an overview of Connections and how it fits into the RL setting. Connections action Connection Environments state ConnectionEnv Prover agent reward ConnectionAction ConnectionState Unification Term unification Prefix unification (D, T, S4, S5, Intuitionistic) Logical primitives Matrix Literal Term Figure 5: Overview of the Connections Python library and its main modules 114 Compared to conducting learning experiments with external calls to Prolog and OCaml imple- mentations of leanCoP, using the Connections environments drastically reduces the complexity needed to control the prover. This is due to eliminating the need for remote procedure calls, and to the imperative basis of the environments, which allow fine-grained control while respecting abstraction levels. Furthmore, using an imperative language to implement Connections allows a more direct control of the proof search, e.g. of backtracking, than is possible in the declarative Prolog programming language (see also [34]). The non-classical Connections environments inherit from the classical environment, adding logic-specific prefixes and prefix unification algorithms. The non-classical provers based on Connections perform a classical proof search, in which the prefixes of the literals in each connection are collected. After a classical proof is found, these prefixes are unified by a prefix unification algorithm to ensure that the classical proof is also a valid non-classical proof. Besides the basic calculi, the Connections environments implement two additional well- known optimizations, significantly reducing the underlying search space while preserving completeness. The start clause 𝐶1 /𝐶2 of the start rule is restricted to positive start clauses (those without negated literals, which likely represent conjecture clauses in the disjunctive normal form) and the regularity condition [29] is employed. The translation into a (prefixed) matrix is done in a pre-processing step. If the problem contains explicit axiom and conjecture formulas, the standard/naive translation into clausal form is performed for the axiom formulas, while a definitional translation [21] is performed for the conjecture formula. This approach has shown the best performance [21]. The Connections environments are not provers by themselves; they expose an interface that agents can use to train and make inferences, completing the RL agent-environment interaction loop, as shown in Figure 5. A prover in this context is an agent making consecutive steps in a Connections environment until it has found proof or timed out. 4.2. Python Connection Provers for Classic and Non-classical Logics The Connections environments can be used to build both learning and non-learning connection provers, depending on the agent used. For example, by using non-learning agents that always choose the first available action, we obtain standard Python connection provers in an elegant and straightforward way—the provers emerge from the interaction between the “always-first” agent and a Connections environment. The complete Python code implementing such a prover is shown in Figure 6. env = ConnectionEnv("problem_path") observation, info = env.reset() while True: action = env.action_space[0] # Always choose first available action observation, reward, terminated, truncated, info = env.step(action) if terminated or truncated: break Figure 6: Python code for the pyCoP prover based on Connections 115 Depending on the environment used (ConnectionEnv, IConnectionEnv or MConnectionEnv) this results in three (stand-alone) Python provers [27] for classical, intuitionistic and modal logics, called pyCoP, ipyCoP and mpyCoP respectively. These are based on the same connection calculi as the leanCoP family of theorem provers implemented in Prolog [20, 21, 12, 10, 13, 11]. By design, the pyCoP provers mimic the classical proof steps of leanCoP 1.0, using the positive start clause technique and iterative deepening on the size of the active path. However, the pyCoP provers do not reorder clauses during proof search, and integrate an enhanced regularity check [29]. This corresponds to version 1.0f of leanCoP [27]. While the main purpose of the Connections environments is to facilitate easy implementation of learned provers for classical, intuitionistic, and modal logic using the MDP + agent view of connection proof search, the pyCoP provers highlight the general (learning and non-learning) capabilities of the Connections environments and give confidence in the correctness of their implementation by showing that they can be used to emulate leanCoP, ileanCoP, and MleanCoP. 5. Conclusion We present a Python library providing a framework for ML in connection calculi for classical and non-classical logics, and with specific emphasis on facilitating experiments using RL to guide proof search. Aside from its ML-centric component, this also represents the first non-Prolog implementation of provers based on the clausal non-classical connection calculi, and using prefix unification to capture the Kripke semantics of intuitionistic and modal first-order logics. We are at present using this library to experiment with RL methods in an attempt to improve the performance of the unmodified provers, and we hope that the library inspires and facilitates others to explore their own ideas within this space. In future work, we intend to extend the library to allow us more fully to address restricted backtracking, refutation techniques, and to include both further modal logics, and non-clausal methods such as those of nanoCoP [35, 36]. References [1] R. L. Constable, et al., Implementing Mathematics with the NuPRL proof development system, Prentice–Hall, Englewood Cliffs, NJ, 1986. [2] Y. Bertot, P. Castéran, Interactive Theorem Proving and Program Development Coq’Art: The Calculus of Inductive Constructions, EATCS Series, Springer, Heidelberg, 2004. [3] P. Blackburn, J. van Bentham, F. Wolter, Handbook of Modal Logic, Elsevier, Amsterdam, 2006. [4] M. Fitting, R. L. Mendelsohn, First-Order Modal Logic, Kluwer, Dordrecht, 1998. [5] R. E. Ladner, The computational complexity of provability in systems of modal proposi- tional logic, SIAM Journal on Computing 6 (1977) 467–480. [6] R. Statman, Intuitionistic propositional logic is polynomial-space complete, Theoretical Computer Science 9 (1979) 67–72. [7] S. A. Cook, The complexity of theorem-proving procedures, in: Third Annual ACM Symposium on Theory of Computing, ACM, New York, 1971, pp. 151–158. 116 [8] L. A. Wallen, Automated Deduction in Non-Classical Logics, MIT Press, Cambridge, 1990. [9] A. Waaler, Connections in nonclassical logics, in: A. Robinson, A. Voronkov (Eds.), Handbook of Automated Reasoning, Elsevier Science, Amsterdam, 2001, pp. 1487–1578. [10] J. Otten, leanCoP 2.0 and ileanCoP 1.2: High performance lean theorem proving in classical and intuitionistic logic, in: A. Armando, P. Baumgartner, G. Dowek (Eds.), Automated Reasoning (IJCAR 2008), volume 5195 of Lecture Notes in Artificial Intelligence, Springer, Heidelberg, 2008, pp. 283–291. [11] J. Otten, MleanCoP: A connection prover for first-order modal logic, in: S. Demri, D. Kapur, C. Weidenbach (Eds.), Automated Reasoning (IJCAR 2014), volume 8562 of Lecture Notes in Artificial Intelligence, Springer, Heidelberg, 2014, pp. 269–276. [12] J. Otten, Clausal connection-based theorem proving in intuitionistic first-order logic, in: TABLEAUX 2005, volume 3702 of Lecture Notes in Artificial Intelligence, Springer, Heidelberg, 2005, pp. 245–261. [13] J. Otten, Implementing connection calculi for first-order modal logics, in: E. Ternovska, K. Korovin, S. Schulz (Eds.), 9th International Workshop on the Implementation of Logics (IWIL 2012), volume 22 of EPIC, EasyChair, 2012, pp. 18–32. [14] J. Otten, W. Bibel, Advances in connection-based automated theorem proving, in: M. Hinchey, J. P. Bowen, E.-R. Olderog (Eds.), Provably Correct Systems, NASA Mono- graphs in Systems and Software Engineering, Springer, Cham, 2017, pp. 211–241. [15] J. Urban, J. Vyskočil, P. Štěpánek, MaLeCoP Machine Learning Connection Prover, in: K. Brünnler, G. Metcalfe (Eds.), TABLEAUX 2011, Lecture Notes in Computer Science, Springer, Berlin, Heidelberg, 2011, pp. 263–277. [16] G. Irving, C. Szegedy, A. A. Alemi, N. Een, F. Chollet, J. Urban, DeepMath - Deep Sequence Models for Premise Selection, in: Advances in Neural Information Processing Systems, volume 29, Curran Associates, Inc., 2016. [17] C. Kaliszyk, J. Urban, H. Michalewski, M. Olšák, Reinforcement Learning of Theorem Proving, in: Advances in Neural Information Processing Systems, volume 31, Curran Associates, Inc., 2018. [18] Z. Zombori, J. Urban, C. E. Brown, Prolog Technology Reinforcement Learning Prover, in: N. Peltier, V. Sofronie-Stokkermans (Eds.), Automated Reasoning (IJCAR 2020), Lecture Notes in Computer Science, Springer International Publishing, Cham, 2020, pp. 489–507. [19] C. Mangla, S. B. Holden, L. Paulson, Bayesian ranking for strategy scheduling in automated theorem provers, in: J. Blanchette, L. Kovács, D. Pattinson (Eds.), Automated Reasoning (IJCAR 2022), volume 13385 of Lecture Notes in Artificial Intelligence, Springer, 2022, pp. 559–577. 19 pages. [20] J. Otten, W. Bibel, leanCoP: lean connection-based theorem proving, Journal of Symbolic Computation 36 (2003) 139–161. [21] J. Otten, Restricting backtracking in connection calculi, AI Commun. 23 (2010) 159–182. [22] G. Gentzen, Untersuchungen über das Logische Schließen, Mathematische Zeitschrift 39 (1935) 176–210, 405–431. [23] R. M. Smullyan, First-Order Logic, Ergebnisse der Mathematik und ihrer Grenzgebiete, Springer-Verlag, Berlin, Heidelberg, New York, 1968. [24] J. Otten, Non-clausal connection calculi for non-classical logics, in: R. Schmidt, C. Nalon (Eds.), TABLEAUX 2017, volume 10501 of LNAI, Springer, Cham, 2017, pp. 209–227. 117 [25] R. S. Sutton, A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition ed., MIT Press, 2018. [26] J. Otten, Advancing automated theorem proving for the modal logics D and S5, in: C. Benzmüller, J. Otten (Eds.), Automated Reasoning in Quantified Non-Classical Logics (ARQNL 2022), CEUR Workshop Proceedings, 2022, pp. 81–91. [27] F. Rømming, Connections, 2023. URL: https://github.com/fredrrom/connections. [28] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, W. Zaremba, OpenAI Gym, 2016. ArXiv:1606.01540 [cs]. [29] R. Letz, G. Stenz, Model elimination and connection tableau procedures, in: Handbook of Automated Reasoning, Elsevier Science Publishers, Amsterdam, 2001, pp. 2015–2112. [30] E. Liang, R. Liaw, R. Nishihara, P. Moritz, R. Fox, K. Goldberg, J. Gonzalez, M. Jordan, I. Stoica, RLlib: Abstractions for Distributed Reinforcement Learning, in: Proceedings of the 35th International Conference on Machine Learning, PMLR, 2018, pp. 3053–3062. ISSN: 2640-3498. [31] A. Raffin, A. Hill, A. Gleave, A. Kanervisto, M. Ernestus, N. Dormann, Stable-baselines3: Reliable reinforcement learning implementations, Journal of Machine Learning Research 22 (2021) 1–8. [32] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., Pytorch: An imperative style, high-performance deep learning library, Advances in neural information processing systems 32 (2019). [33] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Joze- fowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Va- sudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org. [34] S. B. Holden, Connect++: A new automated theorem prover based on the connection calculus, in: Proceedings of the Workshop on Automated Reasoning with Connection Calculi (AReCCa), 2023. [35] J. Otten, A non-clausal connection calculus, in: K. Brünnler, G. Metcalfe (Eds.), TABLEAUX 2011, volume 6793 of Lecture Notes in Artificial Intelligence, Springer, Heidelberg, 2011, pp. 226–241. [36] J. Otten, nanoCoP: A non-clausal connection prover, in: N. Olivetti, A. Tiwari (Eds.), Automated Reasoning (IJCAR 2016), volume 9706 of Lecture Notes in Artificial Intelligence, Springer, Heidelberg, 2016, pp. 302–312. 118