
Methods for Solving the Post Correspondence Problem and Certificate Generation

Akihiro Omori

omori.a.ab@m.titech.ac.jp 0

Yasuhiko Minamide

minamide@c.titech.ac.jp 0

0 Department of Mathematical and Computing Science, Tokyo Institute of Technology, Tokyo, Japan

The Post Correspondence Problem (PCP) is a well-known undecidable problem. Solving instances that have solutions is straightforward with exploration algorithms, but proving that an instance has no solution is challenging. This research introduces two methods to demonstrate infeasibility, including the generation of formal proofs in Isabelle/HOL.

1. Introduction

The Post Correspondence Problem (PCP), proposed by Post in 1946 [1], is undecidable. A PCP instance consists of tiles, each carrying one string on top and one on the bottom. Consider the following instance, written as top/bottom:

Tile 1: 100 / 1
Tile 2: 0 / 100
Tile 3: 1 / 00

In this example, there are three kinds of tiles, each available in infinite quantities. The problem is to determine whether it is possible to arrange one or more tiles in such a way that the top and bottom strings read the same. In this particular instance, a solution (the sequence of tile indices) is “1311322”, for which both the top and bottom read “1001100100100”.
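The stated solution can be checked mechanically. The following is a minimal sketch, assuming the tile numbering 1: 100/1, 2: 0/100, 3: 1/00, which is inferred from the quoted solution rather than read off the original figure; the helper names `read_arrangement` and `is_solution` are illustrative only.

```python
from typing import Tuple

# Tiles as (top, bottom) pairs. The numbering 1..3 is an assumption inferred
# from the solution quoted in the text.
TILES = {1: ("100", "1"), 2: ("0", "100"), 3: ("1", "00")}


def read_arrangement(indices: str) -> Tuple[str, str]:
    """Concatenate the top and bottom strings of the tiles named by `indices`."""
    top = "".join(TILES[int(i)][0] for i in indices)
    bottom = "".join(TILES[int(i)][1] for i in indices)
    return top, bottom


def is_solution(indices: str) -> bool:
    """A non-empty arrangement is a solution when both readings coincide."""
    top, bottom = read_arrangement(indices)
    return len(indices) > 0 and top == bottom


print(read_arrangement("1311322"))
print(is_solution("1311322"))
```

Checking a candidate arrangement is cheap; the hard part of PCP is ruling out every possible arrangement when no solution exists.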

For instances that have a solution, the solution can be found in finite time with an exploration algorithm. Determining that no solution exists, on the other hand, is challenging: because the problem is undecidable, no general algorithm exists for this purpose. Previous research has proposed heuristic algorithms for finding solutions [2, 3]. Ling Zhao (2003) [2] attempted to solve all the problems in PCP[3,4] and left 3,170 problems unsolved. Here PCP[3,4] denotes the set of all instances with 3 tiles whose strings have length at most 4.

This research makes the following three main contributions.

• Propose two novel algorithms to demonstrate that a PCP instance has no solution.
• Solve all problems of PCP[3,4] except for 13 problems.
• Show an example of automatic proof generation for concrete problems.

CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

2. The First Method: String Constraint Formulation

We formulate PCP as a string constraint problem.

Example 2.1 (Example of $g$ and $h$). Let the PCP instance be $I = ((1111, 1110), (1101, 1), (11, 1111))$. We denote the top and bottom strings on the $i$-th tile by $g_i$ and $h_i$, respectively. Let $g$ and $h$ be the transducers defined below. Intuitively, the transducer $g$ outputs $g_1$ for ‘1’, $g_2$ for ‘2’, and so on, and does not accept the empty string. A string $w$ of tile indices is a solution to the PCP if and only if $g(w) = h(w)$.

[Figure: the transducers $g$ (transitions 1/1111, 2/1101, 3/11) and $h$ (transitions 1/1110, 2/1, 3/1111)]

The constraint $\varphi(w) \equiv (g(w) = h(w))$ is called the string constraint of instance $I$.

The satisfiability of the string constraint $\varphi$ of a PCP instance $I$ is undecidable. We therefore consider a constraint $\varphi'$ such that $\varphi(w) \implies \varphi'(w)$ and whose satisfiability is efficiently decidable. Such a $\varphi'$ is referred to as a relaxation problem (simply, a relaxation) of $\varphi$. By showing that $\varphi'$ is unsatisfiable, we show that $\varphi$ is also unsatisfiable. For example, considering only the number of characters, $|g(w)| = |h(w)|$ is suitable as $\varphi'$. Matching Parikh images, or the number of occurrences of specific words, gives further examples of $\varphi'$. Generalizing these examples, we have the following proposition.

Proposition 2.3. Let $T$ be an arbitrary total integer-vector-output transducer. Consider

$$T(g(w)) \cap T(h(w)) \neq \emptyset \qquad (1)$$

This is a relaxation problem of $\varphi$ and is decidable. We choose some $T$ and hope that (1) is infeasible.

The condition $T(g(w)) \cap T(h(w)) \neq \emptyset$ can be reduced to the emptiness problem of a Parikh automaton constructed from $T$, $g$, $h$, and their product. The Parikh automaton emptiness algorithm we use is largely similar to the one described in Section 3 of [4], so we omit the details. Our algorithm achieved a significant speedup by applying two techniques to this algorithm: (1) delaying and dynamically adding constraints related to connectivity, and (2) reducing the problem to a natural form for Mixed Integer Programming (MIP) and leveraging a cutting-edge MIP solver.
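The simplest relaxations mentioned above can be decided without any automaton machinery. The sketch below is not the Parikh automaton algorithm of this paper; it covers only the special case where a single integer-valued measure $m$ is additive under concatenation (for example the length, or the number of occurrences of one fixed character). In that case $m(g(w)) - m(h(w)) = \sum_i x_i d_i$, where $x_i$ counts the uses of tile $i$ and $d_i = m(g_i) - m(h_i)$, so a non-empty $w$ with equal measures exists exactly when some $d_i$ is zero or the $d_i$ take both signs. The function name and the second instance are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Instance = Sequence[Tuple[str, str]]  # (top string g_i, bottom string h_i)


def additive_relaxation_satisfiable(instance: Instance,
                                    measure: Callable[[str], int]) -> bool:
    """Decide whether some non-empty tile multiset balances `measure`.

    `measure` must satisfy measure(u + v) == measure(u) + measure(v).
    A False result proves that the PCP instance has no solution; a True
    result is inconclusive.
    """
    diffs = [measure(g) - measure(h) for g, h in instance]
    has_zero = any(d == 0 for d in diffs)
    has_both_signs = min(diffs) < 0 < max(diffs)
    return has_zero or has_both_signs


# Instance of Example 2.1: the length relaxation is satisfiable (inconclusive).
example_2_1 = [("1111", "1110"), ("1101", "1"), ("11", "1111")]
print(additive_relaxation_satisfiable(example_2_1, len))

# Hypothetical instance where every top string is longer than its bottom
# string: the length relaxation is unsatisfiable, so no solution exists.
all_tops_longer = [("11", "1"), ("101", "0")]
print(additive_relaxation_satisfiable(all_tops_longer, len))
```

Each one-dimensional check is weaker than matching full Parikh images, which constrains all character counts simultaneously and, in the paper's formulation, also accounts for connectivity.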

3. The Second Method: Transition System Formulation

Intuitively, arranging tiles one by one represents a transition, and “the remaining part of the string and whether it is on the top or bottom” represents a state. We call such a pair a configuration. PCP can be formulated as a reachability problem: “Is it possible to reach the state of the empty string?”

Example 3.1. When arranging two tiles like 100/1 and 10/0 (written top/bottom), the state representing the arrangement is “top, remainder 010.” If a transition is made by appending the tile 111/01, the next state will be “top, remainder 0111.”

3.1. Problem Definition

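We formulate PCP as a reachability problem. First, we define the transition system of PCP.

Definition 3.2 (Transition System of PCP). Let $I = ((g_1, h_1), \dots, (g_n, h_n))$ be a PCP instance of size $n$ over $\Sigma$. We define the transition system $S_I = (Q, \delta, \mathit{Init}, \mathit{Bad})$ as follows.

• State set $Q = \{\mathrm{top}, \mathrm{bottom}\} \times \Sigma^*$.
• Transition function $\delta : Q \to 2^Q$ is defined as follows.

$$\delta(\mathrm{bottom}, w) = \{(\mathrm{bottom}, w') \mid \exists i \le n.\ w h_i = g_i w'\} \cup \{(\mathrm{top}, w') \mid \exists i \le n.\ w h_i w' = g_i\}$$

$$\delta(\mathrm{top}, w) = \{(\mathrm{top}, w') \mid \exists i \le n.\ w g_i = h_i w'\} \cup \{(\mathrm{bottom}, w') \mid \exists i \le n.\ w g_i w' = h_i\}$$

• Bad state set $\mathit{Bad} = \{(\mathrm{top}, \varepsilon), (\mathrm{bottom}, \varepsilon)\}$.
• Initial state set $\mathit{Init} = \delta(\mathrm{top}, \varepsilon)$.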

The states reached after arranging one tile are considered the initial states, as an empty arrangement is not valid.

In the following, $\delta$ is naturally extended to sets of states and used as $\delta : 2^Q \to 2^Q$.
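The transition function of Definition 3.2 translates almost directly into code. The following is a minimal sketch, assuming states are encoded as `(side, remainder)` pairs of Python strings and tiles as `(top, bottom)` pairs; the name `delta` and this encoding are illustrative choices, not taken from the authors' implementation.

```python
from typing import List, Set, Tuple

State = Tuple[str, str]  # (side, remainder), side is "top" or "bottom"
Tile = Tuple[str, str]   # (top string g_i, bottom string h_i)


def delta(state: State, tiles: List[Tile]) -> Set[State]:
    """One-step successors of a configuration, following Definition 3.2."""
    side, w = state
    other = "bottom" if side == "top" else "top"
    successors = set()
    for g, h in tiles:
        # `ahead` is the string on the side that currently leads, after the
        # new tile is appended; `behind` is what the other side gains.
        ahead, behind = (w + g, h) if side == "top" else (w + h, g)
        if ahead.startswith(behind):       # same side keeps leading
            successors.add((side, ahead[len(behind):]))
        if behind.startswith(ahead):       # the other side takes the lead
            successors.add((other, behind[len(ahead):]))
    return successors


# Example 3.1: from "top, remainder 010", appending tile 111/01.
print(delta(("top", "010"), [("111", "01")]))   # {('top', '0111')}

# Initial states Init = delta((top, epsilon)) for tiles 1111/1, 0/11, 1/1100.
print(delta(("top", ""), [("1111", "1"), ("0", "11"), ("1", "1100")]))
```

When both strings match exactly, both `if` branches fire and the two empty-remainder states are produced, mirroring the two set comprehensions in the definition.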

The behavior of the transition $\delta(\mathrm{bottom}, w)$ is illustrated below. When $w$ is the current state, adding $(g_i, h_i)$ results in the remaining part becoming the next state $w'$. There are two patterns: one where the same side as the previous state continues, and one where the side changes.

[Figure: (a) Pattern where the side doesn’t change; (b) Pattern where the side changes]

Definition 3.3 (Reachability Problem of PCP). Does there exist $k$ such that $\delta^k(\mathit{Init}) \cap \mathit{Bad} \neq \emptyset$?

Definition 3.4 (Inductive Invariant of PCP). A set $\mathit{Inv} \subseteq Q$ that satisfies the following three conditions is called an inductive invariant (simply, an invariant).

• $\mathit{Init} \subseteq \mathit{Inv}$
• $\mathit{Inv}$ is closed under $\delta$: $\delta(\mathit{Inv}) \subseteq \mathit{Inv}$
• $\mathit{Inv}$ does not include $\mathit{Bad}$: $\mathit{Inv} \cap \mathit{Bad} = \emptyset$

Lemma 3.5. If $\mathit{Inv}$ exists, then $\mathit{Bad}$ is unreachable from the initial states.
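To make the reachability problem and the invariant conditions concrete, the sketch below runs a plain breadth-first search over configurations. It is not one of the two methods of this paper: it only succeeds in proving unsolvability when the reachable set happens to be finite, in which case that set itself satisfies the three invariant conditions. The function names, the state budget `max_states`, and the second instance are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

State = Tuple[str, str]  # (side, remainder)
Tile = Tuple[str, str]   # (top string, bottom string)
BAD = {("top", ""), ("bottom", "")}


def successors(state: State, tiles: List[Tile]) -> List[Tuple[int, State]]:
    """(tile index, next state) pairs for one transition step."""
    side, w = state
    other = "bottom" if side == "top" else "top"
    result = []
    for i, (g, h) in enumerate(tiles, start=1):
        ahead, behind = (w + g, h) if side == "top" else (w + h, g)
        if ahead.startswith(behind):
            result.append((i, (side, ahead[len(behind):])))
        if behind.startswith(ahead):
            result.append((i, (other, behind[len(ahead):])))
    return result


def explore(tiles: List[Tile], max_states: int = 100_000):
    """Return ("solution", indices), ("invariant", states) or ("unknown", None)."""
    parent: Dict[State, Tuple[Optional[State], int]] = {}
    queue = deque()
    for i, s in successors(("top", ""), tiles):  # Init = delta((top, epsilon))
        if s not in parent:
            parent[s] = (None, i)
            queue.append(s)
    while queue:
        state = queue.popleft()
        if state in BAD:
            # Walk the parent links back to an initial state.
            path, cur = [], state
            while cur is not None:
                cur, i = parent[cur]
                path.append(i)
            return "solution", path[::-1]
        if len(parent) > max_states:
            return "unknown", None
        for i, s in successors(state, tiles):
            if s not in parent:
                parent[s] = (state, i)
                queue.append(s)
    # Every reachable state was expanded and none is bad, so the visited set
    # contains Init, is closed under delta, and is disjoint from Bad.
    return "invariant", set(parent)


print(explore([("100", "1"), ("0", "100"), ("1", "00")]))
print(explore([("10", "1"), ("0", "01")]))
```

For hard instances the reachable set is typically infinite, so this search exhausts its budget; representing infinite state sets symbolically is what makes a finite invariant possible.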

In the following section, we introduce algorithms to discover $\mathit{Inv}$.

3.2. Algorithm

For the reachability problem, many powerful algorithms such as PDR (Property Directed Reachability) [5] exist. We extended PDR and achieved some success (see Section 5). We also devised a novel ad-hoc method specific to PCP, described below.

Definition 3.6 (Configuration Automaton). Let $s \in \{\mathrm{top}, \mathrm{bottom}\}$ and let $A$ be a finite automaton over $\Sigma$. We call the pair $(s, A)$ a configuration automaton. The language of $(s, A)$ is denoted by $L(s, A)$ and defined as follows. It represents a set of states of the transition system.

$$L(s, A) = \{(s, w) \mid w \in L(A)\}$$

The aim of this algorithm is to discover a pair of configuration automata (one for top and one for bottom) that represents $\mathit{Inv}$. Note that not every invariant can be represented by such a pair, because finite automata describe only regular languages; this limits the scope of our consideration.

This algorithm manages a graph $G = (V, E)$ where each node is a configuration automaton. That is, each node is associated with a set of states of the transition system. The algorithm proceeds by expanding the overall union $L(G) = \bigcup_{v \in V} L(v)$ until it becomes an invariant.

Intuitively, an edge $(u, v)$ in this graph represents a dependency relationship. This relationship means “if $v$ cannot reach $\mathit{Bad}$, then $u$ cannot reach $\mathit{Bad}$ either”. If we can construct a graph where every node has such dependencies and does not contain any bad state, then $L(G)$ is an invariant. There are two types of this relation, as follows.

1. Inclusion relation: $L(u) \subsetneq L(v)$
2. Transition relation: $\delta(L(u)) = L(v)$

The algorithm is essentially a breadth-first search (BFS): when only the transition relation is considered, it operates like an ordinary BFS. A distinctive feature of this algorithm is that it proactively abstracts nodes. For example, when a node such as (top, 0011101) appears, the algorithm attempts to create a node like (top, .∗110.∗) (we use a regex to represent an automaton) and draw an edge to it. If this abstracted node can reach $\mathit{Bad}$, it is removed and backtracking is performed.
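The abstraction step can be illustrated with regular expressions standing in for automata, as in the text. The sketch below assumes one particular way of proposing abstractions, namely patterns of the form `.*f.*` for every factor `f` of a fixed length `k`; this candidate-generation heuristic and the names `in_language` and `candidate_abstractions` are hypothetical and are not claimed to match the authors' implementation.

```python
import re
from typing import Set, Tuple

ConfigAutomaton = Tuple[str, str]  # (side, regex standing in for automaton A)
State = Tuple[str, str]            # (side, remainder)


def in_language(node: ConfigAutomaton, state: State) -> bool:
    """Check whether (s, w) belongs to L(s, A), with A given as a regex."""
    side, pattern = node
    state_side, w = state
    return side == state_side and re.fullmatch(pattern, w) is not None


def candidate_abstractions(state: State, k: int) -> Set[ConfigAutomaton]:
    """Propose nodes (side, .*f.*) for each length-k factor f of the remainder.

    Assumed heuristic for illustration only.
    """
    side, w = state
    factors = {w[i:i + k] for i in range(len(w) - k + 1)}
    return {(side, ".*" + re.escape(f) + ".*") for f in factors}


concrete = ("top", "0011101")
abstract = ("top", ".*110.*")

# The singleton node is included in the abstracted node (inclusion relation).
print(in_language(abstract, concrete))

# The abstraction from the text is among the length-3 candidates.
print(abstract in candidate_abstractions(concrete, 3))
```

Every candidate contains the concrete state by construction, so an inclusion edge from the concrete node to the candidate is always justified; whether the candidate avoids the bad states is what the search then has to determine.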

Figure 4 shows a successful execution example for the instance with tiles 1111/1, 0/11, and 1/1100 (written top/bottom). The square nodes represent nodes with singleton languages, and the round nodes are abstracted nodes with regular expressions appearing in their labels. The dotted lines represent inclusion relations, and the solid lines represent transition relations. Note that in this figure, the transition relations are extended to $k$ (where $k \ge 1$) steps, with intermediate steps omitted.

[Figure 4: graph with nodes (top, 111), (top, .*0.*), (bottom, 100), (top, .*1.*), (top, .*00.*), (bottom, 001100), (bottom, 11100), (top, 1), (bottom, 11001100), (bottom, .*0110.*)]

4. Certificate Generation

So far, we have presented two methods that rely on complicated algorithms. However, there is a significant possibility that our implementations of these algorithms contain bugs. Even if we successfully solve all instances of PCP[3,4], our results would still be far from being considered trusted facts. Therefore, we decided to have our algorithm output proofs in the form of Isabelle/HOL code.

Another possible approach is to use Isabelle/HOL or similar tools to verify the correctness of the algorithm’s implementation. However, this makes it difficult to optimize the algorithm for speed. For instance, the first method relies on an external MIP solver for its efficiency, which makes verifying the implementation challenging. Additionally, for others to quickly trust our results, it is crucial that all instances of PCP[3,4] and their proofs are organized and verified within some proof assistant such as Isabelle/HOL.

Currently, only the second method is capable of outputting a certificate. The first method will be addressed as future work (see Section 6).

4.1. Certificate: Pair of Automata

Consider the transition system of a PCP instance. By defining the invariant concretely in Isabelle/HOL and proving each of the invariant conditions (see Definition 3.4), we can validate it. This method is independent of the implementation details used in the second method and can be utilized by various algorithms discovering invariants.

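Our implementation of the second method generates the following code.

1. Definition of the PCP instance
2. Definition of $\mathit{Inv}$
   a) The top-side automaton
   b) The bottom-side automaton
3. Proof of the closedness of $\mathit{Inv}$
   a) Definition of $\delta(\mathit{Inv})$ (in the form of a specific pair of deterministic automata)
   b) Concrete definition of the automaton for $\overline{\mathit{Inv}} \cap \delta(\mathit{Inv})$ on one side
   c) Proof of $\overline{\mathit{Inv}} \cap \delta(\mathit{Inv}) = \emptyset$ on that side
   d) The same proof for the other side, showing that $\delta(\mathit{Inv}) \subseteq \mathit{Inv}$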

Proofs of general statements such as “the existence of $\mathit{Inv}$ implies that the PCP instance has no solution” were carried out manually in advance. Examples of complete proofs can be found in the author’s GitHub repository [6].

5. Application to PCP[3,4]

In this research, we address the instances of PCP[3,4]. Ling Zhao (2003) [2] attempted to solve all these instances but left 3,170 unsolved. The list of these instances is available on his website [7]. Our goal was to solve all instances of PCP[3,4], gradually reducing the number of unsolved problems. As shown in Figure 5, the initial 3,170 unsolved problems were reduced to 127 using the first method. After several additional methods, only 13 problems remained unsolved. These remaining problems are listed on the author’s website [8].

PDR, SAT, Method 2(1), and Method 2(2) are techniques for discovering $\mathit{Inv}$. Certificate generation is implemented for those methods. The method SAT uses a SAT solver to discover $\mathit{Inv}$, while Method 2(1) and Method 2(2) differ in their abstraction methods.

[Figure 5: unsolved instances remaining after applying each method. Methods shown: Method 1, PDR, SAT, Method 2(1), Method 2(2). Case counts shown: 3170, 127, 73, 71, 26, 13.]

6. Conclusion and Future Work

We have been working toward a complete resolution of PCP[3,4] and have come close, with only 13 instances remaining unsolved. To have these results accepted as trusted facts, we also aim to provide formal proofs in Isabelle/HOL for each instance, which has been achieved for the second method. Although both goals are yet to be fully achieved, we believe they are attainable as outlined below.

To solve the remaining 13 problems, we consider two possibilities. One is to solve these instances manually. We predict that most of the 13 problems do not have solutions, and providing ad-hoc proofs by humans might be the quickest way. The other possibility involves devising new variants of the methods in this paper or investing additional computational resources. Since the manual approach can also help gain deeper insights into individual instances and PCP itself, we would like to first aim for manual resolution.

Generating certificates for the first method is challenging because it uses an external Mixed Integer Programming (MIP) solver as a subroutine. Generating a certificate for the feasibility of an MIP is straightforward, as it merely requires providing a specific solution. However, generating a certificate for infeasibility is more difficult. Cheung et al. (2017) [9] extended the existing MIP solver SCIP to output easily verifiable certificates in their own format. We believe that we can overcome this difficulty by converting these certificates into Isabelle/HOL code.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Numbers 19K11899 and 24K14891.

References

[1] E. L. Post, A variant of a recursively unsolvable problem, Bulletin of the American Mathematical Society 52 (1946) 264-268.

[2] L. Zhao, Tackling Post's correspondence problem, in: Computers and Games, Springer Berlin Heidelberg, 2003, pp. 326-344.

[3] R. J. Lorentz, Creating difficult instances of the Post correspondence problem, in: Computers and Games, Springer Berlin Heidelberg, Berlin, Heidelberg, 2001, pp. 214-228.

[4] K. N. Verma, H. Seidl, T. Schwentick, On the complexity of equational Horn clauses, 2005, pp. 337-352. doi:10.1007/11532231_25.

[5] A. R. Bradley, SAT-based model checking without unrolling, in: Proceedings of the 12th International Conference on Verification, Model Checking, and Abstract Interpretation, VMCAI'11, Springer-Verlag, Berlin, Heidelberg, 2011, pp. 70-87.

[6] A. Omori, pcp-proof, https://github.com/Mojashi/pcp-proof, 2024.

[7] L. Zhao, PCP documents, 2002. URL: https://webdocs.cs.ualberta.ca/~games/PCP.

[8] A. Omori, Unresolved problems, 2024. URL: https://pcp-vis.pages.dev/gallery.

[9] K. K. H. Cheung, A. Gleixner, D. E. Steffy, Verifying integer programming results, in: F. Eisenbrand, J. Koenemann (Eds.), Integer Programming and Combinatorial Optimization, Springer International Publishing, Cham, 2017, pp. 148-160.