<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fractal grammars which recover from perturbations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Whitney Tabor (whitney.tabor@uconn.edu)</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Psychology, University of Connecticut Storrs</institution>
          ,
          <addr-line>CT 06269-1020</addr-line>
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Neural symbolic integration may be a natural phenomenon of dynamical systems. Attractors-subsets of a state space to which a dynamical system returns when perturbed-are a broadly relevant dynamical systems phenomenon. The mathematical theory has mainly focused on autonomous dynamical systems (i.e., not driven by an environment) of the form f : X ! X (where x(t + 1) = f (x(t)) [iterated map] or ddxt = f (x) [differential equation]), and discovered a rich inventory of attractors, including stable fixed points, limit cycles, and chaos. Here, I focus on the iterated map case and consider certain nonautonomous dynamical systems characterized by a finite set of functions f1; f2; :::; fk : X ! X and a language on alphabet = f1; : : : ; kg of one-sided infinite strings which applies the functions in particular orders starting from a specified initial state x(0) in X. I extend the definition of attractor by considering cases where the system returns to an invariant proper subset when perturbed in the environment of the language. The news of this paper is that there is a class of nonautonomous dynamical systems that have attractors for mirror recursion languages, a type of language arguably central to natural language syntax.</p>
      </abstract>
      <kwd-group>
        <kwd>dynamical systems theory</kwd>
        <kwd>attractors</kwd>
        <kwd>asymptotic stability</kwd>
        <kwd>grammar</kwd>
        <kwd>context free languages</kwd>
        <kwd>mirror recursion languages</kwd>
        <kwd>neural-symbolic integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper approaches neural symbolic integration by interpreting certain dynamical systems, which
can be implemented in neural networks, as symbol processors. It uses insights from the classical
theory of computation (Chomsky Hierarchy) to explore and categorize the behavior of these models,
thus helping to relate them to previous, symbolically oriented work in cognitive science, especially
that on natural language syntax. It begins with a brief review of methods for symbol processing
using discrete-update recurrent neural networks, making a transition in the process, from a perspective
in terms of neural information processing to a perspective in terms of dynamical systems theory.
This leads to the question of stability—here, by ”stability”, I mean the ability of a network
processing a complex language to get back on track if something throws it off. The paper proposes a
precise definition of stability, suitable to complex language processing, and shows that at least one
interesting class of dynamical systems for language processing possesses this property. The paper
concludes with some remarks on implications for sentence processing research, dynamical systems
research, and the project of neural-symbolic integration.
        <xref ref-type="bibr" rid="ref9">Elman (1991)</xref>
        found that a discrete update recurrent neural network (the “Simple Recurrent
Network”) trained on sequences of symbols encoded as indexical bit vectors learned to keep track of
English-like center-embedding dependencies, suggesting that the network might be able to model
the phrase-structure foundation that many linguists posit for natural language
        <xref ref-type="bibr" rid="ref10 ref4">(Chomsky, 1957;
Gazdar, 1981)</xref>
        . This work indicates a path to neural-symbolic integration, but only suggestively because
the structure of the model’s computation was only observed impressionistically. A series of
related projects
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref18 ref19 ref2 ref26">(Bode´n &amp; Wiles, 2000; Pollack, 1987; Rodriguez, 2001; Siegelmann &amp; Sontag, 1991;
Siegelmann, 1999; Tabor, 2000; Wiles &amp; Elman, 1995)</xref>
        ask how networks of recurrently connected
sigmoidal units can precisely process infinite state languages of various kinds. Indeed, the range of
possible computations in networks is great, including all of the Chomsky Hierarchy
        <xref ref-type="bibr" rid="ref18">(Siegelmann &amp;
Sontag, 1991)</xref>
        . These projects all refer to a common principle: the network can use a fractal subset
of its state space to implement a stack or tape memory. A fractal is a set that is made up of smaller
replicas of itself
        <xref ref-type="bibr" rid="ref13">(Mandelbrot, 1977)</xref>
        . A key insight of all these projects is that the spatial recursive
structure of a fractal can be used to keep track of the temporal recursive structure of a complex
language.
      </p>
      <p>
        Recurrent neural networks are instances of the type of feedback systems standardly studied in
dynamical systems theory (the mathematical theory concerned with systems characterized in terms of
how they change). Dynamical systems seem to have a precise relationship to grammars: (a)
“symbolic dynamics”, a method of treating a dynamical system on a connected space as discrete symbol
processor has been very useful in characterizing chaos
        <xref ref-type="bibr" rid="ref8">(e.g., Devaney, 1989)</xref>
        , an important
dynamical phenomenon; (b) an indexed context free language gives the topology of a key trajectory of the
logistic map, a rich and much-studied dynamical system
        <xref ref-type="bibr" rid="ref7">(Crutchfield &amp; Young, 1990)</xref>
        ; (c)
dynamical systems construed as symbol processors in various ways, exhibit a rich range of computational
types, including finite languages, finite state languages, context free languages, queue-based
languages, Turing Machines, and non-computable languages
        <xref ref-type="bibr" rid="ref14 ref17 ref22">(Moore, 1998; Siegelmann, 1999; Tabor,
2009)</xref>
        . In all the cases involving computable infinite state languages, a fractal subset of the state
space supports recursive temporal computation.
      </p>
      <p>
        Generally, when an iterated map dynamical system corresponds to a particular discrete-update
neural network for recursive processing, the state space of the dynamical system corresponds to the
recurrently-connected activation space of the network; the control parameters of the dynamical
system correspond to the weights of the network; the parameterization of the dynamics associated with
a particular symbol corresponds to the activation of a particular unit corresponding to the symbol,
which has first-order or second-order connections to the recurrently connected units, thus specifying
the update behavior of those units; the branches of the fractal are typically associated with
different classes of symbols; the dynamical system may have an associated (finite) partition of the state
space which specifies which maps can be applied from which states; correspondingly, the network
may have one or more classifier layers outside the recurrent dynamics which map the recurrent state
space to next-symbol options.
        <xref ref-type="bibr" rid="ref16 ref17 ref19 ref21 ref23 ref3">(Bode´n &amp; Wiles, 2002; Rodriguez, 2001; Siegelmann, 1999; Tabor,
2000, 2003, 2011, e.g.,)</xref>
        1.2
      </p>
      <sec id="sec-1-1">
        <title>An issue: stability</title>
        <p>The results just discussed point to the rich computational capability of neural networks and related
dynamical systems. However, this computational capability is not very helpful if it is unstable—that
is if disturbance of the system through noise or other forces not tied to the computation easily cause
the system to stop computing successfully on future inputs. There are at least two senses of stability
that one might be concerned with: (i) stability with respect to changes in the state variables—e.g.,
perturbation of the activation states of the neurons in a neural network, and (ii) stability with respect
to changes in the parameter variables—e.g., perturbation of the weights of a neural network. When
small parameter perturbations do not change the topology of the dynamics, the system is said to
exhibit structural stability. Here we focus on state variable stability, noting that structural stability
is an additional property of interest which deserves further investigation (see Conclusions). If a
system is a stable language processor, then it should be able to recover from forces that temporarily
knock it off the track of grammatical processing. Evidence from the sentence processing literature
suggests that when people are disturbed in the course of processing a sentence (e.g., by a garden path
event, or a confusion due to interference) they exhibit a disturbance which lasts beyond the word
that causes the disturbance (“spill-over effects”), but typically only for a few words, with evidence
for resolution of the disturbance occurring at the end of the sentence (“sentence wrap-up effects”).
This suggests that human minds processing language have “asymptotic stability”—as processing
progresses following a disturbance, the system returns to normal processing.
Motivated by these observations, we establish, in the next section, a definition of stability for
symbol-driven dynamical automata.
2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Back in Kansas Stability</title>
      <p>In classical formal grammar theory, languages are sets of finite-length strings. In the present work,
we consider languages to be sets of one-sided infinite length strings. This simplifies the formalism.
We can assign each language on the Chomsky Hierarchy a place in this framework by
considering, for language L, the set of all one-sided infinite concatenations of strings from L, which we
denote L1, thinking of this as a model of all the sentences that a speaker hears and/or produces in
succession in their lifetime.</p>
      <p>
        Def. An iterated function system
        <xref ref-type="bibr" rid="ref1 ref11">(Barnsley, [1988]1993; Hutchinson, 1981)</xref>
        is a (finite) set of
functions IF S = ff1; : : : ; fkg that map from a space X to itself. We assume that X is a complete
metric space with metric d.
      </p>
      <p>Def. A one-sided-infinite-string language, L, is a set of one-sided infinite strings drawn from a finite
alphabet, = f1; : : : ; kg. We assume that for every member j of , there is some string of L which
contains k.</p>
      <p>For x 2 X, and S = 1 2 : : : N a finite string of symbols drawn from
IF SS (x) to denote the function f 1 (f 2 (: : : (f N (x)) : : :)).
, we use the notation
Def. Consider a point x(0) in X. The labeling, Labx(0), of IF S driven by L is a function from
points in X to the power set of , such that j 2 Labx(0)(x) iff there is a finite initial substring, S,
of some string in L such that IF SS (x(0)) = x and the string formed by adding j to the end of S is
also an initial substring of some string in L.</p>
      <p>Def. Labx(0) is said to process a symbol j at a point x iff j 2 Labx(0)(x) and the system moves from
x to fj (x). A labelling Labx(0) under IF S is said to process a string S iff starting at x(0), every
symbol of S is processed in sequence. The language of Labx(0) under IF S is the set of one-sided
infinite strings processed by Labx(0) under IF S. We say that an IF S L systems is on Labx(0)
when it visits a point x if all continuations of the current string under L are processed by Labx(0).
Def. Consider x 2 X and j 2 . The distance d((x; j); Labx(0)) from the ordered pair (x; j) to the
labeling, Labx(0), is inf fd(x; y)) : y 2 X and j 2 Labx(0)(y)g.</p>
      <p>In other words, assuming a standard (infimum-based) definition of point-set distance, the distance of
a point-symbol ordered pair, (x; j), to a labeling is the distance from x to the set of points at which
j can come next under the labelling.</p>
      <p>Def. A sequence of ordered point-symbol pairs (x(n); n) for n = f0; 1; 2; : : :g is said to
converge to a labeling Labx(0) if, for every , there exists N such that M &gt; N implies
d((x(M ); M ); Labx(0)) &lt; .</p>
      <p>Def. Consider an IF S on a complete metric space X with functions = f1; : : : ; kg and a
onesided infinite string language L on . For initial state, x(0), consider the labeling Labx(0) induced
by L. Consider the sequence of point-symbol pairs P SPS = f(x(n); n); n = 0; 1; : : :g induced
by a member, S = 1 2 : : :, of L (i.e., x((n + 1)) = f n (x(n)) for all n). If, when the system is
perturbed within a radius of x(n) for any (x(n); n) 2 P SPS and then driven henceforth by the
remaining symbols of S, it converges to Labx(0), then it is said to exhibit Back in Kansas Stability
from x(0) when driven by S. If there is a such that all futures from x(0) converge when perturbed
once within , then the system is said to exhibit Back In Kansas stability from x(0). We say, in such
a case, that points of Labx(0) with nonempty labeling, along with their labels, are an attractor of
the IF S-L.</p>
      <p>The idea of Back in Kansas Stability is that the system is expected to recover its rhythm, not under
the influence of some external signal, but rather simply through exposure to sufficient material from
the familiar language. We adopt this approach because, as noted above, studies of sentence
processing provide evidence that recovery from disturbance in sentence processing often takes place across
multiple words and seems to involve a convergence process.</p>
    </sec>
    <sec id="sec-3">
      <title>Lack of Back in Kansas Stability in existing fractal models</title>
      <p>
        We next offer some demonstrations that existing fractal computers lack Back in Kansas Stability.
As noted above,
        <xref ref-type="bibr" rid="ref14">Moore (1998)</xref>
        describes a one-dimensional dynamical automaton that implements
a stack memory for recognizing context free languages. For a stack alphabet of k symbols, Moore’s
system uses a Cantor-set with k 1 gaps between the fractal branches. Pushing and popping are
accomplished by
pushi(x)
popi(x)
=
=
x + (1
      </p>
      <p>) ki
pushi 1(x)
where 1 i k is the symbol being pushed or popped and 0 &lt; &lt; 2k1+1 is a constant, and x(0) =
1. This system has the property that grammatical processing is restricted to the interval [0, 1], but
an erroneous stack operation (e.g., popj (pushi(x)) where j 6= i) results in jxj &gt; 1 (Moore’s system
uses this property to detect ungrammatical transitions). Because push and pop are inverses across
the entire state space and grammatical processing (to empty stack) implements a corresponding pop
for every push, the displacement created by any error persists no matter how many grammatical
sentences follow the error (hence the system does not have Back in Kansas Stability). For example,
if one implements S ! , S ! 1 S 2, S ! 3 S 4 with a two-symbol stack alphabet, choosing
= 17 &lt; 2 21+1 , then the once-erroneous sequence 13121212 : : : results in an endless cycle between
-2 and 0.1429—that is, the system never returns to Labx(0), and, the magnitude of the distance
between the state and Labx(0) endlessly visits a positive constant.</p>
      <p>
        Similarly,
        <xref ref-type="bibr" rid="ref19">Tabor (2000)</xref>
        describes a fractal grammar which processes the non-finite-state context free
language, S ! A B C D, A ! a (S), B ! b (S), C ! c (S), D ! d (S). The model’s state space is
R2 with initial state (1=2; 1=2) and the IFS is given by
fa(x)
fb(x)
fc(x)
fd(x)
=
=
=
=
x2 + 1=2
      </p>
      <p>0
x 1=02
x + 1=02
2 x
0
1=2
The points with nonempty labels in Labx(0) form a ”Sierpinski Gasket”, a kind of two-dimensional
Cantor Set. In this system, as in Moore’s, all the functions are affine, and the map fd fc fb fa(x)
is identity across the whole state space. Since the language always completes every abcd sequence
that is begun, this system, like Moore’s, repeatedly revisits the magnitude of any displacement from
Labx(0), independently of the value of x(0).</p>
      <p>These systems have a form of stability generally recognized in dynamical systems theory—there
is a bound on how far the system travels away from the invariant set of interest—but they lack the
asymptotic stability that seems to characterize human behavior. The lack of asymptotic stability
is related to the fact that transitions from empty stack to empty stack implement identity across
the state space. But identity is only required for grammatical processing, which, in these cases, is
restricted to a proper subset of the state space. It will not disturb grammatical processing to modify
the map in locations away from this manifold. In the next section, we show, for an important class
of languages, a simple way of modifying the off-manifold maps, so that the system gets back onto
the manifold when it has been knocked off, provided it receives exposure to enough grammatical
material following perturbation.
4</p>
      <p>
        The Simplest Context Free Language, 1n2n
We begin with the simplest context free language, 1n2n, and then generalize to all mirror recursion
languages, Lmirror- , of the form, S ! ; S ! 11 S 12; S ! 21 S 22; : : : ; S ! 1 S 2.
Mirror recursion seems to capture the gist of center-embedding structures in languages around the
(1)
(2)
world.
        <xref ref-type="bibr" rid="ref6">Corballis (2007)</xref>
        argues that mirror recursion is the crucial recursive behavior that
distinguishes human languages from animal languages. Although it is true that no human language
exhibits center embedding easily beyond one center-embedded clause, the degradation, as one tests
successively deeper levels of embedding appears to be graceful
        <xref ref-type="bibr" rid="ref12">(Lewis, 1996)</xref>
        , and
        <xref ref-type="bibr" rid="ref5">Christiansen &amp;
Chater (1999)</xref>
        argue that a Simple Recurrent Network captures this quality of the human language
case well. Moreover,
        <xref ref-type="bibr" rid="ref16">Rodriguez (2001)</xref>
        and
        <xref ref-type="bibr" rid="ref24">Tabor et al. (2013)</xref>
        provide evidence that Simple
Recurrent Networks trained on center-embedded material are approximating fractal grammars. These
observations motivate focusing on the mirror recursion case in the effort to understand how fractal
grammars can be stable.
      </p>
      <p>Let L1n2n be the one-sided-infinite-string language, f1n2ng1. Let X = [ 2; 2] and let IF S1-d be
given by
Note that the 2-map is not affine but approximately quadratic. Figure 1 shows the IF S along
with Lab0 and illustrates recovery from a perturbation. In fact, this system always recovers from
perturbation.</p>
      <p>Thm 1. The system IF S1-d-L1n2n processes L1n2n from x(0) = 0 and is Back in Kansas stable
from that point.</p>
      <p>The proof (see Appendix 1) first demonstrates that the language L1n2n is processed by IF S1-d from
x(0) = 0. Then it shows that, starting from any point in the state space, if the system receives
the tail of any string from f1n2ng1, it will be back on Lab0 within two sentences (“finite-time”
convergence). This is a much stronger outcome than is required for Back in Kansas stability. The
rapid convergence stems from fact that the horizontal section of the 2-map resets the system when
it lands between -1 and 1. We have adopted a broad definition here, allowing also non-finite-time
convergence, because simulation experiments which we will not discuss further here suggest that
similar convergence behavior occurs in systems with a single maximum, rather than a plateau, except
that the convergence takes infinite time.
5</p>
    </sec>
    <sec id="sec-4">
      <title>General Mirror Recursion</title>
      <p>For 2 symbols, let x(0) be the origin in R +1. Define IF S( +1)-d for
by
= 1; 2; : : : ; i 2 f1; : : : ; g
fi(x)
f2i(x)
1 &lt; x1 &lt;</p>
      <p>1
0 x1=2 + 1 1</p>
      <p>x2=2</p>
      <p>BB : : : CC
= BBBB xi+1=2xi=21 CCCC</p>
      <p>BB xi+2=2 CC
@ : : : A</p>
      <p>x( + 1)=2
0 2x1 + 2 1</p>
      <p>2x2</p>
      <p>BB : : : CC
= BB 2xi CC</p>
      <p>BB 2xi+1 + 2 CC
BB 2xi+2 CC
@ : : : A
2x( + 1)
1
Here, the -1 in the state change from the middle region for fi(x) is on dimension i + 1.
Thm 2. Each system in the class fIF S( +1)-d-LMirror- g for
stable from x(0) = O, the origin in R +1.</p>
      <p>Appendix 2 sketches the proof of Theorem 2.
6</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        2 f1; 2; : : :g is Back in Kansas
This paper has defined Back in Kansas Stability for nonautonomous dynamical systems, consisting
of functions f1; : : : ; fk : X ! X on a complete metric space, driven by a language L on =
f1; 2; : : : ; kg. The definition was motivated by the fact that people who undergo a disturbance
when processing complex natural language structures seem to recover naturally during the course of
processing additional words. This idea makes a prediction that many other parsing models—those
that tie parsing strictly to the information content of received words—do not make: recovery from
a garden path might be helped by words following the disambiguating word even if these words
provide no additional structural information; some evidence in support of this idea comes from
certain types of length effects in sentence processing where long disambiguating regions are easier
than short ones—see
        <xref ref-type="bibr" rid="ref25">Tabor &amp; Hutchins (2004)</xref>
        .
      </p>
      <p>Another motivation came from the fact that stability is an important organizing principle of
dynamical systems broadly, so when working with neural network dynamical systems, it is desirable to
characterize their stability. The mirror recursion finding prompts a more specific observation: in the
case of iterated maps, the well-known attractors of autonomous dynamical systems are distinguished
by their cardinalities—a fixed point is a single point, a limit cycle contains a finite number of points,
a chaotic attractor has uncountably many points. The case of mirror recursion with Back in Kansas
Stability fills in this picture by identifying countably infinite attractive sets. It may be informative
to ask what conditions support countably infinite attractors and why they are not prominent in the
study of autonomous dynamical systems.
6.1</p>
      <sec id="sec-5-1">
        <title>Future Work</title>
        <p>
          We noted above that a dynamical system is considered structurally stable at a point in its parameter
space if small changes in the parameter values do not affect the topology of the system. Structural
stability seems, in one way, desirable for neural networks, when one is interested in learning: the
generally widely successful approach of gradient descent learning is likely to do badly in a context
where small parameter changes radically alter the system structure. Results reported in
          <xref ref-type="bibr" rid="ref22">Tabor (2009)</xref>
          for fractal grammars suggest, possibly unfortunately, that these types of systems are not structurally
stable in the vicinity of points where they model complex languages. Nevertheless, an interesting
alternative perspective may be worth considering: human languages seem to change structurally in
a continuous way (see discussion in
          <xref ref-type="bibr" rid="ref24">Tabor et al. (2013)</xref>
          ), at least when they change historically, and
possibly also when they are learned. It may be useful to invoke, in place of topological equivalence,
a notion of structural continuity—two dynamical systems are considered structurally proximal if
differences in their topology take a long time to detect. If there is structural continuity, gradient
based learning may still be feasible.
        </p>
        <p>What light does this work shed on the challenge of neural symbolic integration? Broadly, it suggests
studying neural symbolic integration by studying dynamical systems more generally. Specifically,
the results on Back in Kansas Stable point to the fact that not all dynamical computers are stable in
this sense; it also raises the question whether all computations can be stably implemented. Stability
is very likely a property of human language systems since they have to tolerate a great deal of
informational and physical buffeting. It may therefore be useful for the field of neural symbolic
integration to identify, both from a dynamics point of view and a language point of view, what
systems can be stable.
7</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Appendix 1: Proof of Thm. 1</title>
      <p>We first show that Lab0 under IF S1 d processes L1n2n (Part A). Then we show that the system is
Back in Kansas Stable from x(0) = 0 (Part B).</p>
      <sec id="sec-6-1">
        <title>Part A</title>
      </sec>
      <sec id="sec-6-2">
        <title>Part B.</title>
        <p>By definition, every string of L is processed by Lab0 under IF S1 d. Regarding strings of Lab0
under IF S1 d, note that 1’s only ever occur if x 0. If the system starts at 0 and experiences a
sequence of the form 1j 2i where 0 &lt; i &lt; j, then x &lt; 0. Therefore, substrings of the form 21j 2i1,
0 i &lt; j are not processed by Lab0. If the system starts at 0 and experiences a sequence of the
form 1j 2j where 0 j, then the system is at x = 0, where a 2 never occurs (because, under L,
balanced 1s and 2s are always followed by a 1 and 0 is never reached except via balanced 1s and
2s). Therefore substrings of the form 21j 2i1 where i &gt; j &gt; 0 never occur. Therefore the strings
processed by Lab0 under IF S1 d are all and only the strings of L1n2n .</p>
        <p>We wish to show that under perturbation up to radius r followed by grammatical continuation to
infinity, IF S returns to Labx(0) in the limit. In fact, in this case, under any perturbation, followed
by at most one complete string of the form IF S reinhabits Labx(0).</p>
        <p>First, it is useful to define some terminology. We refer to the string in which the perturbation occurs
as the perturbation string. We refer to the interval h = [ 1; 1], where the function b has a fixed
value, as the flat interval. Note that when h is in the flat interval, 2k(h) = 0 for any k 0. Note also
that if the system is in a state hp &lt; 0 and it processes a string of the form 1n2n, then it will arrive
at 0 when it processes the last 2 (this follows from the facts (i) that, at every symbol, 1n2n(hp) is
sandwiched between 1n2n(h0) and 0 and (ii) 1n2n(h0) = 0.) Whenever the state of the system is
thus sandwiched, we say that the system is ahead with respect to grammatical processing.
Turning now to the proof of B, there are two cases:
(i) The perturbation occurs at some time before the last 1 in 1j 2j for some j
0.</p>
        <p>Within this case, it is useful to consider two subcases:
(i-1) The perturbation decreases h. In this case, during processing of the perturbation string, the
system is ahead, so it arrives at 0 by the last 2 and is therefore on Labx(0).
(i-2) The perturbation increases h. In this case, the system is in [-2, -1] at the end of the string.
Therefore, the system is ahead during the processing of the subsequent string. Consequently, it
reaches Labx(0) by the end of the post-perturbation string.
(ii) The perturbation occurs after the last 1 and before the last 2 in 1j 2j for some j
0.</p>
        <p>Again, we consider two cases:
(ii-1) The perturbation decreases h. In this case, the system is still in [-2, -1] at the end of the string
and the future states follow the pattern of (i-2) above.
(ii-2) The perturbation increases h. If, after the increase, h &lt; 1, then the remaining 2’s bring the
system to the flat interval and the case is like (i-1) above. If, after the increase, 1 h 1, then the
remaining 2’s keep the system at 0 and the system is on the attractor at the start of the next string. If,
after the increase, h &gt; 1, then the system is in [-2, -1] at the end of the string and the future follows
(i-2) above.</p>
        <p>Thus, in all cases, the system returns to Labx(0) at least by the end of the post-perturbation string.
8</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Appendix 2: Sketch of Proof of Thm. 2</title>
      <p>The proof of this theorem builds on the proof of Theorem 1. The system behaves on the first
dimension as if the string were 1n2n with all the push symbols invoking 1 and all the pop symbols
invoking 2, following the same dynamics as in the 1-dimensional case above. Whenever the system
is in the flat region of the pop maps and in the second half of a sentence, dimension 1 becomes 0 and
stays there until the end of the sentence. Then, the start of the new sentence resets all the dimensions
to the appropriate values so the effect of perturbation on any dimension is removed. By Thm. 1, this
will always happen within two sentences of a perturbation.</p>
      <p>
        Furthermore, when the system, initialized to x(0) = 0, is unperturbed, f2i(x) fi(x) implements
identity for visited points, x and every stack state corresponds to a distinct point in X because the
push maps on X form a just-touching fractal
        <xref ref-type="bibr" rid="ref1">(Barnsley, [1988]1993)</xref>
        so the unperturbed system in
k + 1 dimensions processes LMirror k.
      </p>
      <sec id="sec-7-1">
        <title>Acknowledgments</title>
        <p>Thanks to Harry Dankowicz and Garrett Smith for comments on an earlier version of this work. We
gratefully acknowledge support from NSF PAC grant 1059662 and NSF INSPIRE 1246920.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Barnsley</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ([
          <year>1988</year>
          ]
          <year>1993</year>
          ).
          <article-title>Fractals everywhere</article-title>
          , 2nd ed. Boston: Academic Press.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Bode´n</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Context-free and context sensitive dynamics in recurrent neural networks</article-title>
          .
          <source>Connection Science</source>
          ,
          <volume>12</volume>
          (
          <issue>3</issue>
          ),
          <fpage>197</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bode´n</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Wiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>On learning context-free and context-sensitive languages</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <volume>13</volume>
          (
          <issue>2</issue>
          ),
          <fpage>491</fpage>
          -
          <lpage>493</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Chomsky</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>1957</year>
          ).
          <article-title>Syntactic structures</article-title>
          .
          <source>The Hague: Mouton and Co.</source>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Christiansen</surname>
            ,
            <given-names>M. H.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Chater</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Toward a connectionist model of recursion in human linguistic performance</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>23</volume>
          ,
          <fpage>157</fpage>
          -
          <lpage>205</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Corballis</surname>
            ,
            <given-names>M. C.</given-names>
          </string-name>
          (
          <year>2007</year>
          ).
          <article-title>Recursion, language, and starlings</article-title>
          .
          <source>Cognitive Science</source>
          ,
          <volume>31</volume>
          ,
          <fpage>697</fpage>
          -
          <lpage>704</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Crutchfield</surname>
            ,
            <given-names>J. P.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Young</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          (
          <year>1990</year>
          ).
          <article-title>Computation at the onset of chaos</article-title>
          . In W. H.
          <string-name>
            <surname>Zurek</surname>
          </string-name>
          (Ed.), Complexity, entropy, and
          <source>the physics of information</source>
          (pp.
          <fpage>223</fpage>
          -
          <lpage>70</lpage>
          ). Redwood City, California: Addison-Wesley.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Devaney</surname>
            ,
            <given-names>R. L.</given-names>
          </string-name>
          (
          <year>1989</year>
          ).
          <article-title>An introduction to chaotic dynamical systems</article-title>
          , 2nd ed. Redwood City, CA: Addison-Wesley.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Elman</surname>
            ,
            <given-names>J. L.</given-names>
          </string-name>
          (
          <year>1991</year>
          ).
          <article-title>Distributed representations, simple recurrent networks, and grammatical structure</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>7</volume>
          ,
          <fpage>195</fpage>
          -
          <lpage>225</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Gazdar</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>1981</year>
          ).
          <article-title>On syntactic categories</article-title>
          .
          <source>Philosophical Transactions (Series B) of the Royal Society</source>
          ,
          <volume>295</volume>
          ,
          <fpage>267</fpage>
          -
          <lpage>83</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Hutchinson</surname>
            ,
            <given-names>J. E.</given-names>
          </string-name>
          (
          <year>1981</year>
          ).
          <article-title>Fractals and self similarity</article-title>
          . Indiana University Mathematics Journal,
          <volume>30</volume>
          (
          <issue>5</issue>
          ),
          <fpage>713</fpage>
          -
          <lpage>747</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>1996</year>
          ).
          <article-title>Interference in short-term memory: The magical number two (or three) in sentence processing</article-title>
          .
          <source>Journal of Psycholinguistic Research</source>
          ,
          <volume>25</volume>
          (
          <issue>1</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <surname>Mandelbrot</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>1977</year>
          ).
          <article-title>Fractals, form, chance, and dimension</article-title>
          . San Francisco: Freeman.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>1998</year>
          ).
          <article-title>Dynamical recognizers: Real-time language recognition by analog computers</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>201</volume>
          ,
          <fpage>99</fpage>
          -
          <lpage>136</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <surname>Pollack</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1987</year>
          ).
          <article-title>On connectionist models of natural language processing</article-title>
          .
          <source>(Unpublished doctoral dissertation</source>
          , University of Illinois.)
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Rodriguez</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2001</year>
          ).
          <article-title>Simple recurrent networks learn context-free and context-sensitive languages by counting</article-title>
          .
          <source>Neural Computation</source>
          ,
          <volume>13</volume>
          (
          <issue>9</issue>
          ),
          <fpage>2093</fpage>
          -
          <lpage>2118</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <surname>Siegelmann</surname>
            ,
            <given-names>H. T.</given-names>
          </string-name>
          (
          <year>1999</year>
          ).
          <article-title>Neural networks and analog computation: Beyond the turing limit</article-title>
          . Boston: Birkha¨user.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <surname>Siegelmann</surname>
            ,
            <given-names>H. T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sontag</surname>
            ,
            <given-names>E. D.</given-names>
          </string-name>
          (
          <year>1991</year>
          ).
          <article-title>Turing computability with neural nets</article-title>
          .
          <source>Applied Mathematics Letters</source>
          ,
          <volume>4</volume>
          (
          <issue>6</issue>
          ),
          <fpage>77</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2000</year>
          ).
          <article-title>Fractal encoding of context-free grammars in connectionist networks</article-title>
          .
          <source>Expert Systems: The International Journal of Knowledge Engineering and Neural Networks</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>41</fpage>
          -
          <lpage>56</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2002</year>
          ).
          <article-title>The value of symbolic computation</article-title>
          .
          <source>Ecological Psychology</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          /2),
          <fpage>21</fpage>
          -
          <lpage>52</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2003</year>
          ).
          <article-title>Learning exponential state growth languages by hill climbing</article-title>
          .
          <source>IEEE Transactions on Neural Networks</source>
          ,
          <volume>14</volume>
          (
          <issue>2</issue>
          ),
          <fpage>444</fpage>
          -
          <lpage>446</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2009</year>
          ).
          <article-title>A dynamical systems perspective on the relationship between symbolic and non-symbolic computation</article-title>
          .
          <source>Cognitive Neurodynamics</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ),
          <fpage>415</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Recursion and recursion-like structure in ensembles of neural elements</article-title>
          . In H. Sayama,
          <string-name>
            <given-names>A.</given-names>
            <surname>Minai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Braha</surname>
          </string-name>
          , &amp; Y.
          <string-name>
            <surname>Bar-Yam</surname>
          </string-name>
          (Eds.),
          <article-title>Unifying themes in complex systems</article-title>
          .
          <source>proceedings of the viii international conference on complex systems</source>
          (p.
          <fpage>1494</fpage>
          -
          <lpage>1508</lpage>
          ). Cambridge, MA: New England Complex Systems Institute. (http//necsi.edu/events/iccs2011/proceedings.html)
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
            ,
            <given-names>P. W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Szkudlarek</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2013</year>
          ).
          <article-title>Fractal analysis illuminates the form of connectionist structural gradualness</article-title>
          .
          <source>Topics in Cognitive Science</source>
          ,
          <volume>5</volume>
          ,
          <fpage>634</fpage>
          -
          <lpage>667</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Tabor</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Hutchins</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2004</year>
          ).
          <article-title>Evidence for self-organized sentence processing: Digging in effects</article-title>
          .
          <source>Journal of Experimental Psychology: Learning, Memory, and Cognition</source>
          ,
          <volume>30</volume>
          (
          <issue>2</issue>
          ),
          <fpage>431</fpage>
          -
          <lpage>450</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <surname>Wiles</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Elman</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>1995</year>
          ).
          <article-title>Landscapes in recurrent networks</article-title>
          . In J. D.
          <string-name>
            <surname>Moore</surname>
          </string-name>
          &amp;
          <string-name>
            <surname>J. F. Lehman</surname>
          </string-name>
          (Eds.),
          <source>Proceedings of the 17th annual cognitive science conference</source>
          . Lawrence Erlbaum Associates.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>