From Decidability to Undecidability by
        Considering Regular Sets of Instances?

                                    Petra Wolf

       Universität Trier, Fachbereich 4 - Abteilung Informatikwissenschaften,
                               D-54286 Trier, Germany
         wolfp@informatik.uni-trier.de,           https://www.wolfp.net/


      Abstract. We are lifting classical problems from single instances to
      regular sets of instances. The task of finding a positive instance of the
      combinatorial problem P in a potentially infinite given regular set is
      equivalent to the so called intReg -problem of P , which asks for a given
      DFA A, whether the intersection of P with L(A) is non-empty. The intReg -
      problem generalizes the idea of considering multiple instances at once
      and connects classical combinatorial problems with the field of automata
      theory. While the question of the decidability of the intReg -problem has
      been answered positively for several NP and even PSPACE-complete
      problems, we are presenting some natural problems even from L with an
      undecidable intReg -problem. We also discuss alphabet sizes and different
      encoding-schemes elaborating the boundary between problem-variants
      with a decidable respectively undecidable intReg -problem.

      Keywords: Deterministic finite automaton · Regular intersection empti-
      ness problem · Undecidability


1   Introduction and Motivation
In many fields multiple problem instances are considered all at once and they
are accepted if there is at least one positive instance among them. The instances
are described through a strongly compressed representation. For instance, in
graph modification problems 1 a graph G together with several graph editing
operations is given and one asks whether G can be transformed into a graph
G0 with a certain property using up to n editing operations [2,6,15]. Here, the
graph G represents the finite set of graphs which can be generated by G using
up to n editing operations. Another example are problems with uncertainty
in the instance [3,9] where some parameters of the instance are unknown and
therefore stand for a variety of values. Finding a positive instance among plenty
of candidates is also a task in synthesis problems [4]. In [4] the authors generate
a finite set of candidate Petri nets among which they search for a solution. The
?
  Copyright c 2020 for this paper by its authors. Use permitted under Creative
  Commons License Attribution 4.0 International (CC BY 4.0). The author was
  partially supported by DFG (FE 560 / 9-1).
1
  A Dagstuhl seminar on “Graph Modification Problems” was held in 2014 [2]
2       P. Wolf

synthesis of an object with a certain property can be seen as the search for an
object with this property among several candidates.
    A natural generalization of the task of finding a positive instance in a finite
set of instances is to search in an infinite set of instances. A well studied class
of potentially infinite languages are the regular languages which are also in a
compressed way represented by finite automata or regular expressions. We call
A = (Q, Σ, δ, q0 , F ) a deterministic finite automaton (DFA for short) if Q is
a finite set of states, Σ a finite alphabet, δ : Q × Σ → Q a total transition
function, q0 ∈ Q the start state, and F ⊆ Q the set of final states. We generalize
δ to words in the usual way. We denote the language accepted by A with
L(A) = {w ∈ Σ ∗ | δ(q0 , w) ∈ F }. Asking whether the accepted language of
a DFA A contains a positive instance of a problem P is equivalent to asking
whether the intersection P ∩ L(A) is non-empty. This question was introduced
in [8] as the intReg -problem of P or intReg (P ) for a fixed problem P .

Definition 1 (intReg (P )). Given: DFA A. Question: Is L(A) ∩ P 6= ∅?

 In [1,12,13], intReg (L) was studied for languages L with low computational
 complexity, but which describe structural word-properties that have high relevance
 for combinatorics on words and formal language theory (e.g., set of primitive
words, palindromes, etc.). There, (efficient) decision procedures are obtained.
     The intReg -problem has been studied independently under the name regular
realizability problem RR(L), where the filter language L plays the role of problem
 P above, i. e., RR(L) = intReg (L) (see [1,16,17,20,21,22,23]), motivated by com-
 putational complexity questions. The aim was to present with the RR-problem
‘a specific class of algorithmic problems that represents complexities of all known
 complexity classes [. . .] in a unified way’ [22]. It turned out that RR-problems are
 universal in the sense that for any problem P , there exists an RR-problem RR(L)
with the same complexity (note that P and L are different languages). In [23]
 the authors focused on context-free filter languages and presented languages
 L for which RR(L) is either P-complete, NL-complete or has an intermediate
 complexity. In [20] the decidability of the RR-problem with filter languages over
 permutations of binary words was studied.
     In contrast, the line of research in [8,24,26,27] aims to use the intReg -problem
 as a tool to get insights into classes of hard problem as for instance the class
 of NP and PSPACE-complete problems. While the decidability of intReg (P ) for
 hard problems P such as SAT [8], ILP [26], Vertex Cover [27] and TQBF [8]
 is known, we present in this work problems, with a complexity ranging from
 contained in L to PSPACE-completeness, with an undecidable intReg -problem.
These results indicate that the decidability of the intReg -problem of a language
 does not directly coincide with its computational complexity. This study rises
 the natural question what, for instance, NP-complete problems with a decidable
intReg problem have in common that separates them from NP-complete problems
with an undecidable intReg -problem. We also examine for some problems the size
 of the input alphabet and the encoding scheme resulting in different decidability
 results of the considered intReg -problem.
                              (Un-)Decidability by Regular Sets of Instances      3

    This paper is structured as follows. First, we discuss machine languages for
several complexity classes. Then, we consider the problems of bounded and
corridor tiling, followed by bounded PCP. We will show that all of these problems
have an undecidable intReg -problem. Next, we investigate the PSPACE-complete
problem of Equivalence of Regular Expressions and prove that the problem
in a shuffled encoding has an undecidable intReg problem. As the proof only uses
the concatenation operator of regular expressions, we get the undecidability of
intReg of the so called String Equivalence Modulo Padding problem in
a shuffled encoding, which lies in L. For this problem, we will discuss different
alphabet sizes and encoding schemes and show that all other considered variants
of this problem have a decidable intReg -problem. Finally, we present a graph
problem on directed multi-hyper-graphs with an undecidable intReg -problem.
This contrasts the results in [27] where classes of graph problems with a decidable
intReg -problem are identified.
    We expect the reader to be familiar with regular languages and their descrip-
tion through finite automata and regular expressions. The reader should also be
familiar with the complexity classes L, NL, NP, and PSPACE. We refer to the
textbooks [7] and [10] for details. Missing proofs can be found in the long version
of this work [25].


2   Machine Languages

For several complexity classes, we can define machine languages which are
complete for their complexity classes. We will show that the following machine
languages have an undecidable intReg -problem. The intReg -problem of the machine
language for NP was already discussed in [8] and is listed here for the sake of
completeness.
Definition 2 (Machine Language for NL).
Given: Encoded nondeterministic Turing machine hM i, input-word x, and a
string an with n ∈ N.
Question: Does M accept x visiting only log(n) different tape-positions?
Encoding: LNL = {hM i$x$an | M is an NTM accepting x in log(n) space}.
The language LNL is complete for the class NL. Every language in NL can be
accepted by a non-deterministic Turing machine which is space-bounded by a
function f ∈ Θ(log). Since f is logspace-constructible, there exists a deterministic
TM Mf which computes f (n) on the input 0n in logarithmic space. Hence, every
fixed problem in NL can be reduced to LNL by hard-wiring the NTM M which
decides the problem and is space-bounded by f , followed by the input word w
and a unary string of size 2f (|w|) . Note that f (|w|) is logarithmically smaller
than |w| and hence can be stored using log(|w|) many cells. A logarithmically
space-bounded TM can compute an output string which is exponentially in the
size of its used memory. As can be easily verified LNL ∈ NL.
    The machine language for NP, in short LNP , is defined analogously demanding
that x is accepted in n steps, while the machine language for PSPACE, in short
4       P. Wolf

LPSPACE , demands x to be accepted in n space. With similar arguments LNP is
complete for NP and LPSPACE is complete for PSPACE.

Theorem 1. intReg (LNL ), intReg (LNP ), and intReg (LPSPACE ) are undecidable.

Proof. We give a reduction from the undecidable non-emptiness-problem for
recursively enumerable sets [10] defined as L6=∅ := {hM i | M is a nondeterministic
TM with L(M ) 6= ∅}. Let hM i be an arbitrary encoded Turing machine with the
input alphabet Σ. We define the regular language f (hM i) := R := {hM i$x$an |
x ∈ Σ ∗ , n ≥ 0}. Then, f (hM i) ∈ intReg (LNL ) ⇔ R ∩ LNL = 6 ∅ ⇔ L(M ) 6= ∅ ⇔
hM i ∈ L6=∅ . The same holds for LNP and LPSPACE . Since the emptiness-problem
for recursive enumerable sets is undecidable, the undecidability of the problems
intReg (LNL ), intReg (LNP ), and intReg (LPSPACE ) follows.                     t
                                                                                 u


3    Bounded and Corridor Tiling

The next problem we want to investigate is about the tiling of the plane. For
a given set of tile types and a fixed corner tile, the question is to fill a plane
with the given tiles under some conditions. While the problem for an infinite
plane is undecidable [14,19], it becomes NP-complete if we restrict the plane
to an n × n-square and preset the tiles on the edges of the square; it becomes
PSPACE-complete if we only restrict the width with preset tiles and ask for a
finite height, such that the plane can be tiled according to the preset tiles [5].
     First, we will give a formal definition of the problem Bounded Tiling. Then,
we will show that this problem has an undecidable intReg -problem by reducing
L6=∅ to the problem intReg (Bounded Tiling).
     A tile is a square unit where each of the edges is labeled with a color from
a finite set C of colors. The color assignment is described by tile types. A tile
type is a sequence t = (w, n, e, s) with w, n, e, s ∈ C of four symbols representing
the coloring of the left, top, right, and bottom edge color. We denote with tw ,
respectively tn , te , ts , the first, respectively second, third, and forth entry of
the tuple t. Tiles can be regarded as instances of tile types. A tile must not be
rotated or reflected. In the following problem, we give a finite set of tile types as
input. From every tile type arbitrary many tiles can be placed. The tiles have to
cover up a square grid region such that adjacent edges have to have the same
color. The grid comes with an edge coloring which contains for each border of
the square grind a sequence of colors presetting the adjacent color of tiles resting
on the edge. A tiling is a mapping from the square grid region to a set of tile
types. With hT i we denote a proper encoding of the tile type set T and with [n]
we denote the set {1, 2, . . . , n}. We call f : [n] × [n] → T a tiling function if for all
i, j ∈ [n], it holds that f (i, j)e = f (i + 1, j)w for i < n and f (i, j)n = f (i, j + 1)s
for j < n meaning that adjacent edges of the tiles have the same color. Here, the
bottom left square of a grid region is indexed by (1, 1).

Definition 3 (Bounded Tiling).
Given: Finite set T of tile types with colors from a finite color set C and an n × n
                                  (Un-)Decidability by Regular Sets of Instances              5

square grid region V with a given edge coloring.
Question: Is there a tiling function f : [n] × [n] → T that tiles V extending the
edge coloring?
Encoding: hT i followed by an edge coloring $l$t$r$b, l = l1 #l2 # . . . #ln , with t =
t1 #t2 # . . . #tn , r = r1 #r2 # . . . #rn , b = b1 #b2 # . . . #bn with li , ti , ri , bi ∈ C.
    Howard Straubing gives in his article “Tiling Problems” [19] a reduction
from the complement of the halting problem to the problem of tiling an infinite
plane. Therefore, he gives an algorithm “that takes input hM i and produces the
associated hT, ci” (where c is the given corner tile in the unrestricted case of the
problem). The tiles represent every possible transition of the Turing machine and
are constructed in a way that correctly tiled rows correspond to configurations
of the given Turing machine. The four colors of the tiles also ensure that two
adjacent rows represent two consecutive configurations. Therefore, the infinite
plane can only be tiled if and only if the Turing machine runs forever.
    Peter van Emde Boas [5] uses a similar construction to simulate Turing
machines and shows that the Bounded Tiling problem is NP-complete. For
a given nondeterministic Turing machine, the possible transitions and tape cell
labelings are transformed into a set of tile types. The input word, padded with
blank symbols, is encoded in the bottom edge coloring b and a distinguished
accepting configuration is encoded in the top edge coloring t. The left and right
borders are colored with the fixed color white which is a color only occurring
on vertical edges and which do not represent any state or alphabet letter of the
Turing machine. So, white can be seen as a neutral border color. Blank symbols
are trailed to the input word to enlarge the size of the square field to the exact
time bound of the Turing machine. The Turing machine is altered in a way
that it accepts with one distinguished accepting configuration. The tile types
are constructed in a way that this accepting configuration can be repeated over
several adjacent rows. Therefore, the constructed edge colored square region can
be correctly tiled matching the edge coloring if and only if the given Turing
machine accepts the input word within its time bound.
    With that construction in mind, we will now prove that the intReg -problem
for Bounded Tiling is undecidable.
Theorem 2. The problem intReg (Bounded Tiling) is undecidable.
Proof. We give a reduction from the undecidable problem L6=∅ . Let hM i be the
encoding of an arbitrary NTM. We construct a regular language R which contains
a positive Bounded Tiling instance if and only if M accepts at least one word.
We alter the machine M to an NTM N which behaves like M except having
only one distinguished accepting configuration, i.e., an empty tape with the head
on the first position of the former input word. According to Straubing [19] and
van Emde Boas [5], there is an algorithm which, given a TM N , produces the
corresponding set of tile types T such that a correct extending tiling of a given
edge colored square field corresponds to a sequence of successive configurations
of the given machine, starting on an input word represented through the coloring
of the bottom border.
6        P. Wolf

    Let TN be the corresponding tile type set for the NTM N and let CN
be the set of colors appearing in TN . Let CΣ ⊆ CN be the subset of colors
representing input alphabet symbols, let  ∈ CN be the white color representing
a white vertical border edge of the square grid, and let  ∈ CN be the color
representing an empty tape cell. Finally let qf ∈ CN be the color representing
the accepting state of the Turing machine. We define the regular set R as
R = L ({hTN i $ ∗ $ qf ∗ $ ∗ $ CΣ ∗ ∗ }) . The set R consists of the set of tile
types for the NTM N together with edge colorings for every possible input
word and every possible size of the field V . The top row will always contain the
accepting configuration of N padded with arbitrary many blank symbols. The
left and right borders of the field V can consist of arbitrary many white edges,
while the bottom row can encode every possible input word with arbitrary many
added blank symbols allowing an arbitrary time bound for the Turing machine.
Note that the edge coloring does not have to define a square, but the square
shape is also contained in the set R for every input word and every number of
padding symbols. Therefore, for every input word w, the set R contains every
size of squared fields with w encoded in the bottom edge coloring. The tile type
set of R is constructed in a way that in a valid tiling adjacent rows will represent
successive configurations of the Turing machine. So, for every number of steps
the TM makes on the input word, there is a square field, with the input word
encoded, in the set R preventing enough space for the configurations of the TM.
This brings us to our main claim, R ∩ Bounded Tiling 6= ∅ ⇔ L(N ) 6= ∅. t         u
  With the same argument, we can show that the PSPACE-complete problem
Corridor Tiling [5] also has an undecidable intReg -problem.


4    Bounded PCP
Another undecidable problem, which becomes decidable if we restrict the size
of the potential solution, is the Post’s Correspondence Problem (in short
PCP). We show that the NP-complete version Bounded Post Correspon-
dence Problem [7] (in short BPCP) has an undecidable intReg -problem by a
reduction from the unrestricted undecidable PCP problem [10].
Definition 4 (BPCP).
Given: Finite alphabet Σ, two sequences A = (a1 , a2 , . . . , an ), B = (b1 , b2 , . . . , bn )
of strings from Σ ∗ , and a positive integer K ≤ n.
Question: Is there a sequence i1 , i2 , . . . , ik of k ≤ K (not necessarily distinct)
positive integers ij ∈ [n] such that ai1 ai2 . . . aik = bi1 bi2 . . . bik ?
Encoding: LBP CP := {a1 #a2 # . . . #an $b1 #b2 # . . . #bn $ bin(K) | K ≤ n ∧ A =
(a1 , a2 , . . . , an ), B = (b1 , b2 , . . . , bn ) is a PCP instance with a solution ≤ K}.
The problem PCP is defined analogously but does not contain a bound K.
Theorem 3. The problem intReg (BPCP) is undecidable.
Proof. We give a reduction PCP ≤ intReg (BPCP). Let A = (a1 , a2 , . . . , an ) and
B = (b1 , b2 , . . . , bn ) be a PCP instance. We construct a regular language R
                                (Un-)Decidability by Regular Sets of Instances          7

consisting of the given PCP instance combined with every possible solution
bound K. Since K is bounded by the length of list A and B, we will pump those
lists up by repeating the last list element of both lists arbitrarily often. Because
the same element can be picked multiple times, adding elements already appearing
in the given lists does not change the solvability of the instance. We define R
as R = {a1 #a2 # . . . #an (#an )∗ $b1 #b2 # . . . #bn (#bn )∗ ${0, 1}∗ }. It holds that
R ∩ BPCP 6= ∅ if and only if there is a sequence of indexes i1 , i2 , . . . , im such
that ai1 ai2 . . . aim = bi1 bi2 . . . bim .                                          t
                                                                                      u


5    Regular Expressions in a Shuffled Encoding

In this section we show that the problem of Equivalence of Regular Ex-
pressions (in short ≡RegEx ) over a binary alphabet in a shuffled encoding has an
undecidable regular intersection emptiness problem. It turns out, that the prob-
lem is already undecidable if the regular expressions do not use alternation or the
Kleene star. Thus, also the problem of String Equivalence Modulo Padding
over a binary alphabet in a shuffled encoding has an undecidable intReg -problem.
When we consider the String Equivalence Modulo Padding problem over a
unary alphabet or in a sequential encoding, the problem becomes decidable. We
first define the problem of Equivalence of Regular Expressions (adapted
from [7]). For a regular expression E, we denote with L(E) the regular language
described by E. We use concatenation implicitly and omit the operator symbol.
The alternation is represented by |-symbols.
Definition 5 (Shuffled≡RegEx ).
Given: A word w = e1 f1 e2 f2 e3 f3 . . . en fn over the alphabet Σ ∪ {∅, , (, ), |, ∗}
such that E = e1 e2 e3 . . . en and F = f1 f2 f3 . . . fn are regular expressions over the
alphabet Σ using the operators alternation, concatenation, and Kleene star. Note
that one regular expression can be padded with  or ∅ if the regular expression are
of unequal length.
Question: Is L(E) = L(F )?
The problem of equivalence of the regular expressions is well known to be PSPACE-
complete [18]. Since we can change the encoding of an ≡RegEx instance from
shuffled to sequential and vice versa in quadratic time, the shuffled version of this
problem is also PSPACE-complete. We will show that intReg (Shuffled≡RegEx )
is undecidable by a reduction from the PCP problem [11]. For readability reasons,
we will refer to words w ∈ Shuffled≡RegEx as w = fe11 . . . fenn . From a given
PCP instance we will construct a regular language LReg , the words of which
will describe all possible solutions of the PCP instance. The words will consist
of two shuffled regular expressions using only the concatenation as an operator.
By construction, the first regular expression will be a concatenation of strings
from the A list of the PCP instance while the second regular expression will
consists of the concatenated corresponding strings from the B list. Since the
regular expressions only use concatenation, languages described by them only
contain one element each. The language LReg will contain two shuffled regular
8           P. Wolf

expressions describing the same language if and only if the PCP instance has a
valid solution.

Theorem 4. The problem intReg (Shuffled≡RegEx ) is undecidable.

Proof. We give a reduction PCP ≤ intReg (Shuffled ≡RegEx ) and translate a
given PCP instance into a regular language LReg . We emphasize references to
the regular expression defining the language LReg , while references to the regular
expressions encoded in the words of LReg are not emphasized. We also emphasize
references to the regular language of shuffled regular expressions.
    Let A = a1 , a2 , . . . , ak and B = b1 , b2 , . . . , bk be a PCP instance. We define a
regular expression, describing a regular language LReg of shuffled regular expres-
sions describing concatenations of list elements. Let LReg be defined through the
regular expression
                                    0                         +
                                     a1 a2 0              ak 0
                                                 ...
                                     b1 0 b2 0            bk 0
                        0
where the string abii0 consists of the two shuffled strings ai , bi where the shorter
string is padded with -symbols at the end until both stings have the same length.
The -symbol is here used as an alphabet symbol of the language LReg and refers
to the regular expression  which will be interpreted as {} and not to the empty
word itself. Therefore, LReg consists of all possible pairwise concatenations of
elements of the lists A and B where the concatenated strings are padded with
-symbols to have the same length.
    For every PCP instance, the described regular expression of the language
LReg can be computed by a computable total function. It remains to show that
the PCP instance A, B has a solution if and only if LReg ∩ Shuffled≡RegEx =               6 ∅.
More precisely, the intersection will contain all solutions of the PCP instance.
    First, consider the only if direction. Let i1 , i2 , . . . , in be a solution of the PCP
instance A, B such that ai1 ai2 . . . ain = bi1 bi2 . . . bin . By construction, the regular
                                                                                     0       0
language LReg contains all possible concatenations of the strings ab110 , . . . , abkk0
corresponding to the pairs (a1 , b1 ), . . . (ak , bk ) of the strings of the lists A and B.
                                                        0    0         0
Therefore, LReg also contains the word w = abii1 0 abii2 0 . . . abiin 0 . The word w consists
                                                          1    2        n

of the two shuffled regular expressions E = ai1 0 ai2 0 . . . aik 0 and F = bi1 0 bi2 0 . . . bik 0 .
Since they are both nonempty strings with padded ’s their described language
is a singleton set. By construction, we have L(E) = {ai1 ai2 . . . aik } and L(F ) =
{bi1 bi2 . . . bik }. By assumption is ai1 ai2 . . . ain = bi1 bi2 . . . bin , therefore we have
L(E) = L(F ) and w ∈ LReg ∩ Shuffled≡RegEx .
     For the other direction, assume LReg ∩ Shuffled≡RegEx 6= ∅. Let w =
ai1 0 ai2 0             0

 bi 0 bi 0
            . . . abiin 0 ∈ LReg ∩ Shuffled≡RegEx consists of the two shuffled regular
    1   2        n

expressions E = ai1 0 ai2 0 . . . aik 0 and F = bi1 0 bi2 0 . . . bik 0 . By assumption is L(E) =
L(F ). Since L(E) and L(F ) each contain only one element, from which the
describing regular expressions differ only by padded -symbols, it holds by
construction that ai1 ai2 . . . ain = bi1 bi2 . . . bin . Therefore, i1 , i2 , . . . , in is a solution
of the PCP instance.                                                                                  t
                                                                                                      u
                                  (Un-)Decidability by Regular Sets of Instances               9

    To show undecidability of the intReg (Shuffled≡RegEx ) problem we have
made use of only one operator of regular expressions, namely the concatena-
tion. If we restrict the Shuffled≡RegEx problem to regular expressions using
only letters from Σ, the -symbol and the concatenation, we get the much eas-
ier problem of Shuffled String Equivalence Modulo Padding, in short
Shuffled≡String . Since we are only using the associative operation of con-
catenation, we can get rid of brackets. All of the following problems are in the
complexity class L, since they can be solved deterministically using two pointers.

Definition 6 (Shuffled≡String ).
Given: A word w = s1 t1 s2 t2 s3 t3 . . . sn tn such that si , ti ∈ Σ ∪ {}. Question:
                                                                    ∗
Is h(s1 s2 s3 . . . sn ) = h(t1 t2 t3 . . . tn ) where h : (Σ ∪ {}) → Σ ∗ is an erasing
homomorphism which leaves all symbols in Σ unchanged and deletes the -symbols.

Corollary 1. The problem intReg (Shuffled≡String ) is undecidable.

    If we restrict the alphabet Σ to singleton sets, the Shuffled≡String becomes
decidable as this problem, considered as a language, is a context-free language.
Alternatively, if we refrain from the shuffled encoding and consider instead a
sequential encoding, the problem also becomes decidable. Here, we identify sub-
automata which accept prefixes up to the symbol $ and sub-automata which
accept suffixes starting after the symbol $. We use a homomorphism h to erase
the padding symbol  and check for each pair of prefix and suffix sub-automata
AP and AS whether h(L(AP )) ∩ h(L(AS )) 6= ∅. Details on the above discussed
variations can be found in the long version of this work [25].


6    An Undecidable intReg Problem About Graphs

In this section we consider a graph problem with an undecidable intReg -problem
which contrasts the decidability results for graph problems in [27].
    For a word w ∈ Σ ∗ and a letter σ ∈ Σ, we denote with w|σ the number of σ’s
in w. We consider directed hyper-multi-graphs with self loops and 2 to 4 vertices
per edge. More formally, we consider graphs of the form G = (V, E), where V is
a set of vertices and E a set of edges together with the function fE : E → V [2..4]
which assigns each edge with a tuple consisting of 2 to 4 vertices incident to
this edge. An edge is called a loop if all of its incident vertices are identical. We
encode G by listing its edges separated by $-signs. Vertices appearing in edges
are encoded by strings of a’s separated by #’s. To extract the encoded graph, we
define the following decoding function. For m ∈ N, 1 ≤ ki ≤ 3, i ≤ m, let
                                                         
                                  m
                                  Y        Yki
                  decdir hyp mul  api         (#aqi,j ) $ = G ,
                                      i=1        j=1

                                                                       S
where G = (V, E) with V = {vpi | i ∈ [m]} ∪ {vqi,j | j ≤ ki }, E =
{e1 , e2 , . . . , em }, fE : ei 7→ (vpi , vqi,1 , ..., vqi,ki ). We present a graph-problem over
10       P. Wolf

this class of graphs for which its intReg -problem is undecidable by encoding the
sets of derivation-trees of two given context-free grammars in Chomsky-normal-
form (CNF for short) into a regular set of directed hyper-multi-graphs. The
languages of the two grammars will share a common word w if and only if the
intersection of the constructed regular language with the graph-problem is non-
empty and contains the two derivations of w. We call G = (V, T, P, S) a context
free grammar in CNF, if V is a finite set of variables, T a finite set of terminal, P
a set of derivation rules of the from A → BC or A → a with A, B, C ∈ V , a ∈ T ,
and S ∈ V the start variable. We first give the construction and then define the
graph problem Embedded Derivation Trees, EDT for short.
Theorem 5. intReg ( EDT) is undecidable.
Proof. Let G1 = (V1 , T, P1 , S 0 ), G2 = (V2 , T, P2 , S 00 ) be two context free grammars
in CNF. We alter them, by introducing two new variables S1 and S2 , to the
grammars G1 = (V1 ∪{S1 }, T, P1 ∪{S1 → S 0 }, S1 ), G2 = (V2 ∪{S2 }, T, P2 ∪{S2 →
S 00 }, S2 ). From now on we will identify V1 ∪ {S1 } as V1 and V2 ∪ {S2 } as V2 .
W.l.o.g. assume that the sets V1 , V2 are disjoint and both grammars share the
terminal alphabet T . Note that the following problem is undecidable: Is there a
word w ∈ T ∗ such that w can be derived from G1 and from G2 ?
      Let m1 = |V1 |, m2 = |V2 |, t = |T |(|T2 |+1) , n = m1 + m2 + 2t + 2. We construct a
regular set R = R1 ·R2 , where R1 (respectively R2 ) is defined as follows: We fix an
order on the elements in V1 , V2 , T such that S1 is the first element in V1 and S2 is
the first element in V2 . We refer to the i’th element of a set S as S[i]. Let V1 [s0 ] =
S 0 , V2 [s00 ] = S 00 for integers s0 and s00 . For the derivation rule S1 → S 0 in G1 , we de-
                                                          0
fine the regular expression rs1 = a0 #a1 #a0 #as $ and for the derivation rule S2 →
                                                                 00
  00                               n−1    m1 +1      n−1
S in G2 we define rs2 = a              #a         #a     #am1 +s $ For every derivation rule
p1 = V1 [i] → V1 [j]V1 [k], i, j, k ≤ m1 in P1 , we define the regular expression rp1 =
a0 #ai (an )∗ #aj (an )∗ #ak (an )∗ $. For a derivation rule p2 = V2 [i] → V2 [j]V2 [k],
i, j, k ≤ m2 in P2 , we define rp2 = an−1 #am1 +i (an )∗ #am1 +j (an )∗ #am1 +k (an )∗ $.
We define for each j ∈ [|T |] and b ∈ {1, 2} a regular expression which encodes a
cycle of length j consisting of binary edges. We call these cycles leave-cycle later.
                                 (j+1)(j+2)
               m1 +m2 +(b−1)t+       2      −2
        jb
                            Y
                                                  ak (an )∗ #ak+1 (an )∗ $
                                                                          
       rlc =
                                     j(j+1)
                k=m1 +m2 +(b−1)t+       2
                                  (j+1)(j+2)                                  j(j+1)
                                             −1
               am1 +m2 +(b−1)t+       2           (an )∗ #am1 +m2 +(b−1)t+       2     (an )∗ $
For derivation rules q1 = V1 [i] → T [j] in P1 and q2 = V2 [i] → T [j] in P2 , we
                                       j(j+1)           j1
define: rq1 = a0 #ai (an )∗ #am1 +m2 + 2 (an )∗ $rlc       , and
                                         j(j+1)
        n−1   m1 +i n ∗      m1 +m2 +t+ 2          n ∗ j2
rq2 = a     #a     (a ) #a                      (a ) $rlc . We are now ready to define
                              S             S             ∗
R1 and R2 . We set Ri = rsi       pi ∈Pi rpi    qi ∈Pi rqi    for i ∈ {1, 2}.
    We now define our graph property such that it filters out the encoded graphs
in the regular set which consists of two derivation trees, one from G1 and one
from G2 , which derivate the same word. It is helpful to consider Figure [1] in the
long version while reading though the following arguments.
                              (Un-)Decidability by Regular Sets of Instances      11

Definition 7 (Multi-Graph Embedding φ).
Let G = (V, E) be a directed hyper-multi-graph such that each edge contains two,
three, or four vertices. The multi-graph embedding φ maps G onto a multi-graph
φ(G) = Gm = (Vm , Em ) with Em ⊆ Vm × Vm in the following way: Vm = V , for
a 4-nary edge (a, b, c, d) ∈ E we add the edges (a, c), (b, c), (b, d) to Em . For a
ternary edge (a, b, c) ∈ E, we add the edges (a, c), (b, c) to Em . Binary edges are
simply added to Em .

Definition 8 (EDT).
Input: A directed hyper-multi-graph G = (V, E) with fE : E → V [2..4] .
Question: Does the multi-graph embedding φ(G) consists fo two connected com-
ponents G1m and G2m such that the following holds. G1m contains a vertex v1
and G2m contains a vertex v2 , such that G1m \{v1 } and G2m \{v2 } are (directed)
binary trees (where edges are pointing from parents to children) in which the leave
layer consists of directed cycles (called leave-cycles). Exactly the root-node and
the parents of leave-cycles have out-degree one, all other nodes which are not
part of a leave-cycle have out-degree 2. The node v1 is connected to exactly one
child of each parent node in G1m \{v1 } except for the root as here v1 is pointing
to the root and not to the child. The only node pointing towards v1 is the root
of G1m \{v1 }. The node v1 has no further connections. The same holds for v2
with respect to G2m \{v2 }. If G1m and G2m are drawn such that v1 and v2 always
point to the left child, then the sequence of lengths of the leave-cycles of G1m
and G2m (read from left to right) must coincide. Both graphs must not contain
multi-edges and loops are only allowed as a leave-cycle. The leave-cycles are not
connected to each other.

We will first argue that for any w ∈ R with decdir hyp mul (w) being a positive
instance of EDT the sub-graphs G1m and G2m of φ(decdir hyp mul (w)) must
correspond to two derivation trees, one for G1 and one for G2 .
    Note that for i ≡ 0 (mod n) and j ≡ −1 (mod n) the only vertex labels
ai (#|$) and aj (#|$) which can be a factor of a word in R are a0 # and an−1 #.
Especially, there are no factors of the form (an )k (#|$) or (an−1 )(an )k (#|$) with
k > 0 and the vertex a0 # is only appearing in sub-graphs corresponding to
derivation rules of G1 whereas an−1 # is only appearing in sub-graphs correspond-
ing to derivation rules of G2 . Hence, for a graph G = φ(decdir hyp mul (w)) with
w ∈ R in order to consist of two disjoint graph G1m and G2m one of them must
contain a vertex encoded by a0 # and hence be constructed by G1 and the other
one must contain a vertex encoded by an−1 # and be constructed by G2 as one
of this elements is part of any regular expression rsi , rpi , rqi .
    Note that by the definition of R1 and R2 each string w ∈ R contains exactly
one factor encoding the derivation rule S1 → S 0 and one factor encoding the
derivation rule S2 → S 00 . There are no other derivation rules in which S1 or S2
appear. Hence, there will be no factor a1 (an )k # and am1 +1 (an )k # in w with
k > 0. As rs1 and rs2 are the only regular expressions allowing to create an edge,
the multi-edge embedding of which creates an arc pointing towards v1 (encoded
12        P. Wolf

by a0 #) and v2 (encoded by an−1 #), the root of the tree2 Gm1 \{v1 } necessarily
be S1 with the child S 0 and the root of the tree Gm2 \{v2 } needs to be S2 with
child S 00 . The sub-tree starting in S 0 will then correspond to a derivation tree of
G1 , when the leave-cycle of length i is interpreted as the i-th letter of T , and the
sub-tree starting in S 00 will correspond to a derivation tree of G2 .
     W.l.o.g. we will focus on Gm1 . By the definition of EDT G1m must be a
binary tree with leave-cycles and by previous arguments have the root S1 with
the single child S 0 , all other internal parent-nodes have out-degree two. As G1m
does not contain any multi-edges and loops are only allowed as leave-cycles,
every inner node V1m [i] has exactly two appearances of its encoding ak # in w
namely in the 4-nary hyper-edge where it appears as one child and in the 4-nary
hyper-edge where it appears as the parent node. The first edge corresponds to a
derivation where the variable represented by V1m [i] is ’created’ and the second
edge to a derivation where this variable is ’consumed’. Hence, following the inner
nodes gives us a derivation tree for derivations of the form A → BC in P1 .
     Every member of a cycle of length k has its own domain of representatives
which is disjoint with the domains of the other elements of the cycle. It is also
disjoint with the domains of elements of cycles with different lengths. We show
that if w ∈ R encodes a positive instance of EDT, then for each leave-cycle
Ci of length k which is the child of a node V1m [j 0 ] encoding the variable V1 [j],
the description of Ci is created by the regular expression rq1 which encodes the
derivation V1 [j] → T [k]. Let Ci [1] be the first node in the cycle Ci , i.e., the
child of V1m [j 0 ]. Then, there is a factor uv in w encoding the edge (V1m [j 0 ], Ci [1])
where u encodes the node V1m [j 0 ] and v encodes the node Ci [1]. We know that
u|a ≡ j (mod n) and v|a ≡ m1 + m2 + k(k+1)       2   (mod n). By the definition of R
the node Ci [1] can only point to a node Ci [2] encoded by a factor x for which
x|a ≡ m1 + m2 + k(k+1) 2    + 1 (mod n) for k ≥ 2, or to itself for k = 1, but the only
regular expressions which allow to create such a factor correspond to derivation
rules A → T [k]. Indeed single edges between the same type of nodes (where the
number of a’s have the same remainder modulo n) can be exchanged between
different derivation rules with the same letter T [k] on the right side but the
number of nodes in a cycle can not be altered that way without losing a cycle
structure or introducing forbidden multi-edges. Hence, we can assume that Ci is
created by a regular expression encoding the derivation rule V1 [j] → T [k].
     Replacing cycles of length k by the corresponding letters T [k] completes our
derivation tree constructed by the inner nodes with derivations of terminals. As
the trees G1m and G2m have the same sequence of cycle lengths in the leave-
level if the child pointing to v1 , respectively v2 is drawn as the left child, the
constructed derivations derive the same word.
     For the other direction, we can construct a word w ∈ R for which the graph
decdir hyp mul (w) is a Yes-instance from the two derivations of a common word
x ∈ L(G1 ) ∩ L(G2 ) as follows. We go through the derivation trees from top
to bottom, left to right and increase an index k with every new considered
2
     We interpret the cycles in the leave-layer as leaves and hence interpret the graph as
     a tree despite the fact that it contains cycles.
                               (Un-)Decidability by Regular Sets of Instances         13

appearance of a variable. We encode each derivation according to the above
defined regular expressions, s.t. for the encoding ae of a variable appearance
e ÷ n = k. Details on constructing w can be found in the long version [25]. tu


References
 1. Anderson, T., Loftus, J., Rampersad, N., Santean, N., Shallit, J.: Detecting Palin-
    dromes, Patterns and Borders in Regular Languages. Information and Computation
    207(11), 1096–1118 (2009)
 2. Bodlaender, H., Heggernes, P., Lokshtanov, D.: Graph Modification Problems
    (Dagstuhl Seminar 14071) (2014)
 3. Chaari, T., Chaabane, S., Aissani, N., Trentesaux, D.: Scheduling Under Uncertainty:
    Survey and Research Directions. In: International Conference on Advanced Logistics
    and Transport, ICALT 2014, Hammamet, Tunisia, May 1-3, 2014. pp. 229–234.
    IEEE (2014)
 4. Desel, J., Reisig, W.: The Synthesis Problem of Petri Nets. Acta Informatica 33(4),
    297–315 (1996)
 5. van Emde Boas, P.: The Convenience of Tilings. Lecture Notes in Pure and Applied
    Mathematics pp. 331–363 (1997)
 6. Gao, X., Xiao, B., Tao, D., Li, X.: A Survey of Graph Edit Distance. Pattern
    Analysis and Applications 13(1), 113–129 (2010)
 7. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
    of NP-Completeness. W.H. Freeman and Company, New York (1979)
 8. Güler, D., Krebs, A., Lange, K.J., Wolf, P.: Deciding Regular Intersection Emptiness
    of Complete Problems for PSPACE and the Polynomial Hierarchy. In: Klein, S.T.,
    Martı́n-Vide, C., Shapira, D. (eds.) Language and Automata Theory and Appli-
    cations - 12th International Conference, LATA 2018. Lecture Notes in Computer
    Science, vol. 10792, pp. 156–168. Springer (2018)
 9. Hladı́k, M.: Interval Linear Programming: A Survey. In: Linear Programming – New
    Frontiers in Theory and Applications, chap. 2, pp. 85–120. Nova Science Publishers,
    New York (2012)
10. Hopcroft, J.E., Ullman, J.D.: Formal Languages and their Relation to Automata.
    Addison-Wesley series in computer science and information processing, Addison-
    Wesley (1969)
11. Hopcroft, J.E., Ullman, J.D.: Introduction to Automata Theory Languages and
    Computation. Addison-Wesley Publishing Company, Inc. (1979)
12. Horváth, S., Karhumäki, J., Kleijn, J.: Results Concerning Palindromicity. Elektro-
    nische Informationsverarbeitung und Kybernetik 23(8/9), 441–451 (1987)
13. Ito, M., Katsura, M., Shyr, H.J., Yu, S.S.: Automata Accepting Primitive Words.
    Semigroup Forum 37(1), 45–52 (1988)
14. Kari, J.: On the Undecidability of the Tiling Problem. In: Geffert, V., Karhumäki,
    J., Bertoni, A., Preneel, B., Návrat, P., Bieliková, M. (eds.) SOFSEM 2008: Theory
    and Practice of Computer Science, 34th Conference on Current Trends in Theory
    and Practice of Computer Science. Lecture Notes in Computer Science, vol. 4910,
    pp. 74–82. Springer (2008)
15. Liu, Y., Wang, J., Guo, J.: An Overview of Kernelization Algorithms for Graph
    Modification Problems. Tsinghua Science and Technology 19(4), 346–357 (2014)
16. Rubtsov, A.A.: Regular Realizability Problems and Regular Languages. CoRR
    abs/1503.05879 (2015)
14      P. Wolf

17. Rubtsov, A.A., Vyalyi, M.N.: Regular Realizability Problems and Models of a
    Generalized Nondeterminism. CoRR abs/1105.5894 (2011)
18. Stockmeyer, L.J., Meyer, A.R.: Word Problems Requiring Exponential Time (Pre-
    liminary Report). In: Aho, A.V., Borodin, A., Constable, R.L., Floyd, R.W., Harri-
    son, M.A., Karp, R.M., Strong, H.R. (eds.) Proceedings of the 5th Annual ACM
    Symposium on Theory of Computing. pp. 1–9. ACM (1973)
19. Straubing, H.: Tiling Problems. http://www.cs.bc.edu/~straubin/cs385-07/
    tiling, accessed: 2018-09-03
20. Tarasov, S.P., Vyalyi, M.N.: Orbits of Linear Maps and Regular Languages. In: Ku-
    likov, A.S., Vereshchagin, N.K. (eds.) Computer Science - Theory and Applications
    - 6th International Computer Science Symposium in Russia, CSR 2011. Lecture
    Notes in Computer Science, vol. 6651, pp. 305–316. Springer (2011)
21. Vyalyi, M.N.: On Regular Realizability Problems. Problems of Information Trans-
    mission 47(4), 342–352 (2011)
22. Vyalyi, M.N.: On Expressive Power of Regular Realizability Problems. Problems of
    Information Transmission 49(3), 276–291 (2013)
23. Vyalyi, M.N., Rubtsov, A.A.: On Regular Realizability Problems for Context-Free
    Languages. Problems of Information Transmission 51(4), 349–360 (2015)
24. Wolf, P.: Decidability of the Regular Intersection Emptiness Problem. Master’s
    thesis, Wilhelm Schickhard Institut für Informatik, Universität Tübingen, Sand 13,
    D 72076 Tübingen, Germany (2018)
25. Wolf, P.: From Decidability to Undecidability by Considering Regular Sets of
    Instances. CoRR abs/1906.08027 (2019), http://arxiv.org/abs/1906.08027
26. Wolf, P.: On the Decidability of Finding a Positive ILP-Instance in a Regular
    Set of ILP-Instances. In: Hospodár, M., Jirásková, G., Konstantinidis, S. (eds.)
    Descriptional Complexity of Formal Systems - 21st IFIP WG 1.02 International
    Conference, DCFS 2019, Košice, Slovakia, July 17-19, 2019, Proceedings. Lecture
    Notes in Computer Science, vol. 11612, pp. 272–284. Springer (2019)
27. Wolf, P., Fernau, H.: Regular Intersection Emptiness of Graph Problems: Finding a
    Needle in a Haystack of Graphs with the Help of Automata. CoRR abs/2003.05826
    (2020)