=Paper=
{{Paper
|id=Vol-1583/CoCoNIPS_2015_paper_15
|storemode=property
|title=Fractal Grammars which Recover from Perturbations
|pdfUrl=https://ceur-ws.org/Vol-1583/CoCoNIPS_2015_paper_15.pdf
|volume=Vol-1583
|authors=Whitney Tabor
|dblpUrl=https://dblp.org/rec/conf/nips/Tabor15
}}
==Fractal Grammars which Recover from Perturbations==
Whitney Tabor (whitney.tabor@uconn.edu)
Department of Psychology, University of Connecticut, Storrs, CT 06269-1020 USA

Copyright © 2015 for this paper by its authors. Copying permitted for private and academic purposes.

Abstract

Neural symbolic integration may be a natural phenomenon of dynamical systems. Attractors—subsets of a state space to which a dynamical system returns when perturbed—are a broadly relevant dynamical systems phenomenon. The mathematical theory has mainly focused on autonomous dynamical systems (i.e., not driven by an environment) of the form f : X → X (where x(t + 1) = f(x(t)) [iterated map] or dx/dt = f(x) [differential equation]), and discovered a rich inventory of attractors, including stable fixed points, limit cycles, and chaos. Here, I focus on the iterated map case and consider certain nonautonomous dynamical systems characterized by a finite set of functions f1, f2, ..., fk : X → X and a language on alphabet Σ = {1, . . . , k} of one-sided infinite strings which applies the functions in particular orders starting from a specified initial state x(0) in X. I extend the definition of attractor by considering cases where the system returns to an invariant proper subset when perturbed in the environment of the language. The news of this paper is that there is a class of nonautonomous dynamical systems that have attractors for mirror recursion languages, a type of language arguably central to natural language syntax.

Keywords: dynamical systems theory, attractors, asymptotic stability, grammar, context free languages, mirror recursion languages, neural-symbolic integration

1 Introduction

This paper approaches neural symbolic integration by interpreting certain dynamical systems, which can be implemented in neural networks, as symbol processors. It uses insights from the classical theory of computation (the Chomsky Hierarchy) to explore and categorize the behavior of these models, thus helping to relate them to previous, symbolically oriented work in cognitive science, especially that on natural language syntax. It begins with a brief review of methods for symbol processing using discrete-update recurrent neural networks, making a transition in the process from a perspective in terms of neural information processing to a perspective in terms of dynamical systems theory. This leads to the question of stability—here, by "stability", I mean the ability of a network processing a complex language to get back on track if something throws it off. The paper proposes a precise definition of stability, suitable to complex language processing, and shows that at least one interesting class of dynamical systems for language processing possesses this property. The paper concludes with some remarks on implications for sentence processing research, dynamical systems research, and the project of neural-symbolic integration.

1.1 Elman Net and Subsequent Work

Elman (1991) found that a discrete-update recurrent neural network (the "Simple Recurrent Network") trained on sequences of symbols encoded as indexical bit vectors learned to keep track of English-like center-embedding dependencies, suggesting that the network might be able to model the phrase-structure foundation that many linguists posit for natural language (Chomsky, 1957; Gazdar, 1981). This work indicates a path to neural-symbolic integration, but only suggestively, because the structure of the model's computation was only observed impressionistically.
A series of related projects (Bodén & Wiles, 2000; Pollack, 1987; Rodriguez, 2001; Siegelmann & Sontag, 1991; Siegelmann, 1999; Tabor, 2000; Wiles & Elman, 1995) ask how networks of recurrently connected sigmoidal units can precisely process infinite-state languages of various kinds. Indeed, the range of possible computations in networks is great, including all of the Chomsky Hierarchy (Siegelmann & Sontag, 1991). These projects all refer to a common principle: the network can use a fractal subset of its state space to implement a stack or tape memory. A fractal is a set that is made up of smaller replicas of itself (Mandelbrot, 1977). A key insight of all these projects is that the spatial recursive structure of a fractal can be used to keep track of the temporal recursive structure of a complex language.

Recurrent neural networks are instances of the type of feedback systems standardly studied in dynamical systems theory (the mathematical theory concerned with systems characterized in terms of how they change). Dynamical systems seem to have a precise relationship to grammars: (a) "symbolic dynamics", a method of treating a dynamical system on a connected space as a discrete symbol processor, has been very useful in characterizing chaos (e.g., Devaney, 1989), an important dynamical phenomenon; (b) an indexed context free language gives the topology of a key trajectory of the logistic map, a rich and much-studied dynamical system (Crutchfield & Young, 1990); (c) dynamical systems construed as symbol processors in various ways exhibit a rich range of computational types, including finite languages, finite state languages, context free languages, queue-based languages, Turing Machines, and non-computable languages (Moore, 1998; Siegelmann, 1999; Tabor, 2009). In all the cases involving computable infinite-state languages, a fractal subset of the state space supports recursive temporal computation.

Generally, when an iterated map dynamical system corresponds to a particular discrete-update neural network for recursive processing, the state space of the dynamical system corresponds to the recurrently-connected activation space of the network; the control parameters of the dynamical system correspond to the weights of the network; the parameterization of the dynamics associated with a particular symbol corresponds to the activation of a particular unit corresponding to the symbol, which has first-order or second-order connections to the recurrently connected units, thus specifying the update behavior of those units; the branches of the fractal are typically associated with different classes of symbols; the dynamical system may have an associated (finite) partition of the state space which specifies which maps can be applied from which states; correspondingly, the network may have one or more classifier layers outside the recurrent dynamics which map the recurrent state space to next-symbol options (e.g., Bodén & Wiles, 2002; Rodriguez, 2001; Siegelmann, 1999; Tabor, 2000, 2003, 2011).

1.2 An issue: stability

The results just discussed point to the rich computational capability of neural networks and related dynamical systems. However, this computational capability is not very helpful if it is unstable—that is, if disturbance of the system through noise or other forces not tied to the computation easily causes the system to stop computing successfully on future inputs.
There are at least two senses of stability that one might be concerned with: (i) stability with respect to changes in the state variables—e.g., perturbation of the activation states of the neurons in a neural network—and (ii) stability with respect to changes in the parameter variables—e.g., perturbation of the weights of a neural network. When small parameter perturbations do not change the topology of the dynamics, the system is said to exhibit structural stability. Here we focus on state variable stability, noting that structural stability is an additional property of interest which deserves further investigation (see Conclusions).

If a system is a stable language processor, then it should be able to recover from forces that temporarily knock it off the track of grammatical processing. Evidence from the sentence processing literature suggests that when people are disturbed in the course of processing a sentence (e.g., by a garden path event, or a confusion due to interference), they exhibit a disturbance which lasts beyond the word that causes the disturbance ("spill-over effects"), but typically only for a few words, with evidence for resolution of the disturbance occurring at the end of the sentence ("sentence wrap-up effects"). This suggests that human minds processing language have "asymptotic stability"—as processing progresses following a disturbance, the system returns to normal processing. Motivated by these observations, we establish, in the next section, a definition of stability for symbol-driven dynamical automata.

2 Back in Kansas Stability

In classical formal grammar theory, languages are sets of finite-length strings. In the present work, we consider languages to be sets of one-sided infinite-length strings. This simplifies the formalism. We can assign each language on the Chomsky Hierarchy a place in this framework by considering, for language L, the set of all one-sided infinite concatenations of strings from L, which we denote L∞, thinking of this as a model of all the sentences that a speaker hears and/or produces in succession in their lifetime.

Def. An iterated function system (Barnsley, [1988]1993; Hutchinson, 1981) is a (finite) set of functions IFS = {f1, . . . , fk} that map from a space X to itself. We assume that X is a complete metric space with metric d.

Def. A one-sided-infinite-string language, L, is a set of one-sided infinite strings drawn from a finite alphabet, Σ = {1, . . . , k}. We assume that for every member j of Σ, there is some string of L which contains j.

For x ∈ X and S = σ1 σ2 . . . σN a finite string of symbols drawn from Σ, we use the notation IFS_S(x) to denote fσN(. . . (fσ2(fσ1(x))) . . .), that is, the maps are applied in the order in which the symbols occur.

Def. Consider a point x(0) in X. The labeling, Lab_x(0), of IFS driven by L is a function from points in X to the power set of Σ, such that j ∈ Lab_x(0)(x) iff there is a finite initial substring, S, of some string in L such that IFS_S(x(0)) = x and the string formed by adding j to the end of S is also an initial substring of some string in L.

Def. Lab_x(0) is said to process a symbol j at a point x iff j ∈ Lab_x(0)(x) and the system moves from x to fj(x). A labeling Lab_x(0) under IFS is said to process a string S iff, starting at x(0), every symbol of S is processed in sequence. The language of Lab_x(0) under IFS is the set of one-sided infinite strings processed by Lab_x(0) under IFS.
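To make the IFS_S notation and the notion of processing concrete, here is a minimal Python sketch (mine, not from the paper; the function name drive_ifs and the toy two-map IFS are purely illustrative assumptions) that applies the maps named by a finite symbol string in temporal order and records the visited states.

```python
# Minimal sketch (not from the paper): driving an iterated function system
# with a finite symbol string.  Symbols are drawn from Sigma = {1, ..., k}
# and each symbol j is associated with a map f_j : X -> X.

def drive_ifs(maps, symbols, x0):
    """Apply the maps named by `symbols` in temporal order, starting at x0.

    maps    : dict taking each symbol j to a function f_j
    symbols : finite initial substring sigma_1 ... sigma_N of a string in L
    Returns the visited states x(0), x(1), ..., x(N); the last entry is IFS_S(x0).
    """
    states = [x0]
    for s in symbols:
        states.append(maps[s](states[-1]))
    return states

# Toy two-symbol example (illustrative only, not one of the paper's systems):
toy_maps = {1: lambda x: x / 2.0, 2: lambda x: x / 2.0 + 0.5}
print(drive_ifs(toy_maps, [1, 2, 2], 0.0))   # -> [0.0, 0.0, 0.5, 0.75]
```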
We say that an IFS–L system is on Lab_x(0) when it visits a point x if all continuations of the current string under L are processed by Lab_x(0).

Def. Consider x ∈ X and j ∈ Σ. The distance d((x, j), Lab_x(0)) from the ordered pair (x, j) to the labeling, Lab_x(0), is inf{d(x, y) : y ∈ X and j ∈ Lab_x(0)(y)}. In other words, assuming a standard (infimum-based) definition of point-set distance, the distance of a point-symbol ordered pair, (x, j), to a labeling is the distance from x to the set of points at which j can come next under the labeling.

Def. A sequence of ordered point-symbol pairs (x(n), σn) for n = 0, 1, 2, . . . is said to converge to a labeling Lab_x(0) if, for every ε > 0, there exists N such that M > N implies d((x(M), σM), Lab_x(0)) < ε.

Def. Consider an IFS on a complete metric space X with functions indexed by Σ = {1, . . . , k} and a one-sided-infinite-string language L on Σ. For initial state x(0), consider the labeling Lab_x(0) induced by L. Consider the sequence of point-symbol pairs PSP_S = {(x(n), σn), n = 0, 1, . . .} induced by a member, S = σ1 σ2 . . ., of L (i.e., x(n + 1) = fσn(x(n)) for all n). If, when the system is perturbed within a radius δ of x(n) for any (x(n), σn) ∈ PSP_S and then driven henceforth by the remaining symbols of S, it converges to Lab_x(0), then it is said to exhibit Back in Kansas Stability from x(0) when driven by S. If there is a δ such that all futures from x(0) converge when perturbed once within δ, then the system is said to exhibit Back in Kansas Stability from x(0). We say, in such a case, that the points of Lab_x(0) with nonempty labeling, along with their labels, are an attractor of the IFS–L system.

The idea of Back in Kansas Stability is that the system is expected to recover its rhythm, not under the influence of some external signal, but rather simply through exposure to sufficient material from the familiar language. We adopt this approach because, as noted above, studies of sentence processing provide evidence that recovery from disturbance in sentence processing often takes place across multiple words and seems to involve a convergence process.

3 Lack of Back in Kansas Stability in existing fractal models

We next offer some demonstrations that existing fractal computers lack Back in Kansas Stability. As noted above, Moore (1998) describes a one-dimensional dynamical automaton that implements a stack memory for recognizing context free languages. For a stack alphabet of k symbols, Moore's system uses a Cantor set with k − 1 gaps between the fractal branches. Pushing and popping are accomplished by

push_i(x) = αx + (1 − α)(i/k),   pop_i(x) = push_i^{−1}(x),   (1)

where 1 ≤ i ≤ k is the symbol being pushed or popped, 0 < α < 1/(2k + 1) is a constant, and x(0) = 1. This system has the property that grammatical processing is restricted to the interval [0, 1], but an erroneous stack operation (e.g., pop_j(push_i(x)) where j ≠ i) results in |x| > 1 (Moore's system uses this property to detect ungrammatical transitions). Because push and pop are inverses across the entire state space and grammatical processing (to empty stack) implements a corresponding pop for every push, the displacement created by any error persists no matter how many grammatical sentences follow the error (hence the system does not have Back in Kansas Stability). For example, if one implements S → ε, S → 1 S 2, S → 3 S 4 with a two-symbol stack alphabet, choosing α = 1/7 < 1/(2·2 + 1), then the once-erroneous sequence 1 4 1 2 1 2 1 2 . . . results in an endless cycle between −2 and 0.1429; that is, the system never returns to Lab_x(0), and the distance between the state and Lab_x(0) endlessly returns to a positive constant.
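The persistence of the error can be checked in a few lines. The sketch below is mine, not Moore's code: it implements equation (1) with k = 2 and α = 1/7, assumes the terminal-to-stack-operation mapping suggested by the grammar (1 and 3 push the two stack symbols, 2 and 4 pop them), and drives the system with one mismatched pop followed by grammatical "1 2" sentences.

```python
# Minimal sketch (not Moore's code): equation (1) with a two-symbol stack
# alphabet (k = 2) and alpha = 1/7, under the assumed terminal mapping
# 1 -> push_1, 2 -> pop_1, 3 -> push_2, 4 -> pop_2.

ALPHA, K = 1.0 / 7.0, 2

def push(i, x):
    return ALPHA * x + (1 - ALPHA) * i / K

def pop(i, x):
    # pop_i is the inverse of push_i
    return (x - (1 - ALPHA) * i / K) / ALPHA

maps = {1: lambda x: push(1, x), 2: lambda x: pop(1, x),
        3: lambda x: push(2, x), 4: lambda x: pop(2, x)}

x = 1.0                              # Moore's initial state x(0) = 1
seq = [1, 4] + [1, 2] * 6            # one erroneous pop (4 after 1), then
                                     # grammatical "1 2" sentences forever
trajectory = []
for sym in seq:
    x = maps[sym](x)
    trajectory.append(round(x, 4))
print(trajectory)
# -> [0.5714, -2.0, 0.1429, -2.0, 0.1429, -2.0, ...]
# The displacement introduced by the single error never decays, because each
# grammatical "1 2" sentence implements the identity map on the whole line.
```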
Similarly, Tabor (2000) describes a fractal grammar which processes the non-finite-state context free language S → A B C D, A → a (S), B → b (S), C → c (S), D → d (S). The model's state space is R^2, the initial state is (1/2, 1/2), and the IFS is given by

f_a(x) = x/2 + (1/2, 0)
f_b(x) = x − (1/2, 0)
f_c(x) = x + (0, 1/2)
f_d(x) = 2(x − (0, 1/2))   (2)

The points with nonempty labels in Lab_x(0) form a "Sierpinski gasket", a kind of two-dimensional Cantor set. In this system, as in Moore's, all the functions are affine, and the map f_d ∘ f_c ∘ f_b ∘ f_a is the identity across the whole state space. Since the language always completes every abcd sequence that is begun, this system, like Moore's, repeatedly revisits the magnitude of any displacement from Lab_x(0), independently of the value of x(0).

These systems have a form of stability generally recognized in dynamical systems theory—there is a bound on how far the system travels away from the invariant set of interest—but they lack the asymptotic stability that seems to characterize human behavior. The lack of asymptotic stability is related to the fact that transitions from empty stack to empty stack implement identity across the state space. But identity is only required for grammatical processing, which, in these cases, is restricted to a proper subset of the state space. It will not disturb grammatical processing to modify the map in locations away from this manifold. In the next section, we show, for an important class of languages, a simple way of modifying the off-manifold maps, so that the system gets back onto the manifold when it has been knocked off, provided it receives exposure to enough grammatical material following perturbation.

4 The Simplest Context Free Language, 1^n 2^n

We begin with the simplest context free language, 1^n 2^n, and then generalize to all mirror recursion languages, L_Mirror-κ, of the form S → ε, S → σ11 S σ12, S → σ21 S σ22, . . . , S → σκ1 S σκ2. Mirror recursion seems to capture the gist of center-embedding structures in languages around the world. Corballis (2007) argues that mirror recursion is the crucial recursive behavior that distinguishes human languages from animal languages. Although it is true that no human language exhibits center embedding easily beyond one center-embedded clause, the degradation as one tests successively deeper levels of embedding appears to be graceful (Lewis, 1996), and Christiansen & Chater (1999) argue that a Simple Recurrent Network captures this quality of the human language case well. Moreover, Rodriguez (2001) and Tabor et al. (2013) provide evidence that Simple Recurrent Networks trained on center-embedded material are approximating fractal grammars. These observations motivate focusing on the mirror recursion case in the effort to understand how fractal grammars can be stable.

Let L_{1^n 2^n} be the one-sided-infinite-string language {1^n 2^n}∞. Let X = [−2, 2] and let IFS_1-d be given by

f1(x) = x/2 + 1

f2(x) = 2x + 2    if x < −1
        0         if −1 ≤ x < 1   (3)
        −2x + 2   if 1 ≤ x

Note that the 2-map is not affine but approximately quadratic. Figure 1 shows the IFS along with Lab_0 and illustrates recovery from a perturbation. In fact, this system always recovers from perturbation.
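The recovery behavior can be replayed directly from equation (3). The following short Python sketch is mine, not part of the paper; it implements IFS_1-d and reproduces the perturbation scenario described in the Figure 1 caption below.

```python
# Minimal sketch (not the paper's code) of IFS_1-d, equation (3), driven by
# strings of L_{1^n 2^n}: perturb after one "1", then feed the rest of the
# sentence "1 1 2 2" and one further complete sentence.

def f1(x):
    return x / 2.0 + 1.0

def f2(x):
    if x < -1.0:
        return 2.0 * x + 2.0
    if x < 1.0:          # the "flat" branch: -1 <= x < 1
        return 0.0
    return -2.0 * x + 2.0

maps = {1: f1, 2: f2}

x = 0.0
x = maps[1](x)           # process a single 1 -> x(1) = 1
x = 1.15                 # perturbation

for sym in [1, 2, 2] + [1, 1, 2, 2]:   # rest of the perturbed sentence, then
    x = maps[sym](x)                   # one complete sentence of 1^n 2^n
    print(sym, x)
# states (up to float rounding): 1.575, -1.15, -0.3, 0.85, 1.425, -0.85, 0.0
# so the system ends at 0, back on the attractor, as in Figure 1.
```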
Figure 1: Cobweb diagram of IFS_1-d driven by L_{1^n 2^n} from x(0) = 0. The numbers in curly brackets show some of the nonempty state labels. All points in the complement of A = {. . . , −7/4, −3/2, −1, 0, 1, 3/2, 7/4, . . .} have empty label. The dotted line shows a perturbed trajectory which recovers: after processing a single 1 (x(1) = 1), the IFS was perturbed to 1.15; subsequently it encountered the completion of the perturbed sentence (". . . 1 2 2"), followed by a single complete sentence ("1 1 2 2"), by which point it was at 0 and back on the attractor.

Thm 1. The system IFS_1-d–L_{1^n 2^n} processes L_{1^n 2^n} from x(0) = 0 and is Back in Kansas stable from that point.

The proof (see Appendix 1) first demonstrates that the language L_{1^n 2^n} is processed by IFS_1-d from x(0) = 0. Then it shows that, starting from any point in the state space, if the system receives the tail of any string from {1^n 2^n}∞, it will be back on Lab_0 within two sentences ("finite-time" convergence). This is a much stronger outcome than is required for Back in Kansas Stability. The rapid convergence stems from the fact that the horizontal section of the 2-map resets the system when it lands between −1 and 1. We have adopted a broad definition here, allowing also non-finite-time convergence, because simulation experiments which we will not discuss further here suggest that similar convergence behavior occurs in systems with a single maximum, rather than a plateau, except that the convergence takes infinite time.

5 General Mirror Recursion

For 2κ symbols, let x(0) be the origin in R^{κ+1}. Define IFS_(κ+1)-d for κ = 1, 2, . . . and i ∈ {1, . . . , κ} as follows, where x = (x1, . . . , x_{κ+1}) and the case depends on the first coordinate x1. The push map for the i-th pair is

f_i(x) = (x1/2 + 1, x2/2, . . . , xi/2, x_{i+1}/2 − 1, x_{i+2}/2, . . . , x_{κ+1}/2)   if x1 < −1 or 1 ≤ x1
f_i(x) = (x1/2 + 1, 0, . . . , 0, −1, 0, . . . , 0)   if −1 ≤ x1 < 1

and the corresponding pop map is

f_{2i}(x) = (2x1 + 2, 2x2, . . . , 2xi, 2x_{i+1} + 2, 2x_{i+2}, . . . , 2x_{κ+1})   if x1 < −1
f_{2i}(x) = (0, 0, . . . , 0)   if −1 ≤ x1 < 1
f_{2i}(x) = (−2x1 + 2, 2x2, . . . , 2xi, 2x_{i+1} + 2, 2x_{i+2}, . . . , 2x_{κ+1})   if 1 ≤ x1

Here, the −1 in the middle-region case of f_i(x) is on dimension i + 1.

Thm 2. Each system in the class {IFS_(κ+1)-d–L_Mirror-κ} for κ ∈ {1, 2, . . .} is Back in Kansas stable from x(0) = O, the origin in R^{κ+1}.

Appendix 2 sketches the proof of Theorem 2.
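As a concrete illustration of Theorem 2, here is a Python sketch, mine rather than the paper's code, of the push and pop maps as reconstructed in the display above, for κ = 2. The helper names push_map and pop_map, the use of NumPy, and the particular perturbation values are my own choices. Driving the perturbed system with the rest of its sentence and one further complete sentence returns it to the origin, the empty-stack state.

```python
import numpy as np

KAPPA = 2                      # number of push/pop pairs; state lives in R^(KAPPA+1)

def push_map(i, x):
    """Push map for pair i (1 <= i <= KAPPA), per the reconstruction above."""
    y = np.zeros(KAPPA + 1)
    y[0] = x[0] / 2.0 + 1.0
    if -1.0 <= x[0] < 1.0:     # middle region: other dimensions are reset
        y[i] = -1.0            # the -1 lands on dimension i + 1
    else:                      # outer regions: halve, with an extra -1 on dim i+1
        y[1:] = x[1:] / 2.0
        y[i] -= 1.0
    return y

def pop_map(i, x):
    """Pop map for pair i, per the reconstruction above."""
    if -1.0 <= x[0] < 1.0:     # flat region: jump to the origin
        return np.zeros(KAPPA + 1)
    y = 2.0 * x
    y[0] = (2.0 if x[0] < -1.0 else -2.0) * x[0] + 2.0
    y[i] += 2.0                # dimension i + 1 gets the +2 offset
    return y

x = np.zeros(KAPPA + 1)
x = push_map(1, x)                       # open pair 1
x = np.array([1.2, -0.7, 0.3])           # perturbation
for op, i in [(pop_map, 1), (push_map, 2), (pop_map, 2)]:  # finish the sentence,
    x = op(i, x)                                           # then one more sentence
    print(x)
# final state is the origin: the perturbation has been washed out
```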
6 Conclusions

This paper has defined Back in Kansas Stability for nonautonomous dynamical systems, consisting of functions f1, . . . , fk : X → X on a complete metric space, driven by a language L on Σ = {1, 2, . . . , k}. The definition was motivated by the fact that people who undergo a disturbance when processing complex natural language structures seem to recover naturally during the course of processing additional words. This idea makes a prediction that many other parsing models—those that tie parsing strictly to the information content of received words—do not make: recovery from a garden path might be helped by words following the disambiguating word even if these words provide no additional structural information; some evidence in support of this idea comes from certain types of length effects in sentence processing, where long disambiguating regions are easier than short ones—see Tabor & Hutchins (2004).

Another motivation came from the fact that stability is an important organizing principle of dynamical systems broadly, so when working with neural network dynamical systems, it is desirable to characterize their stability.

The mirror recursion finding prompts a more specific observation: in the case of iterated maps, the well-known attractors of autonomous dynamical systems are distinguished by their cardinalities—a fixed point is a single point, a limit cycle contains a finite number of points, a chaotic attractor has uncountably many points. The case of mirror recursion with Back in Kansas Stability fills in this picture by identifying countably infinite attractive sets. It may be informative to ask what conditions support countably infinite attractors and why they are not prominent in the study of autonomous dynamical systems.

6.1 Future Work

We noted above that a dynamical system is considered structurally stable at a point in its parameter space if small changes in the parameter values do not affect the topology of the system. Structural stability seems, in one way, desirable for neural networks when one is interested in learning: the widely successful approach of gradient descent learning is likely to do badly in a context where small parameter changes radically alter the system structure. Results reported in Tabor (2009) for fractal grammars suggest, possibly unfortunately, that these types of systems are not structurally stable in the vicinity of points where they model complex languages. Nevertheless, an interesting alternative perspective may be worth considering: human languages seem to change structurally in a continuous way (see discussion in Tabor et al. (2013)), at least when they change historically, and possibly also when they are learned. It may be useful to invoke, in place of topological equivalence, a notion of structural continuity—two dynamical systems are considered structurally proximal if differences in their topology take a long time to detect. If there is structural continuity, gradient-based learning may still be feasible.

What light does this work shed on the challenge of neural symbolic integration? Broadly, it suggests studying neural symbolic integration by studying dynamical systems more generally. Specifically, the results on Back in Kansas Stability show that not all dynamical computers are stable in this sense; they also raise the question of whether all computations can be stably implemented. Stability is very likely a property of human language systems, since they have to tolerate a great deal of informational and physical buffeting. It may therefore be useful for the field of neural symbolic integration to identify, both from a dynamics point of view and a language point of view, which systems can be stable.

7 Appendix 1: Proof of Thm. 1

We first show that Lab_0 under IFS_1-d processes L_{1^n 2^n} (Part A). Then we show that the system is Back in Kansas Stable from x(0) = 0 (Part B).

Part A. By definition, every string of L is processed by Lab_0 under IFS_1-d. Regarding the strings processed by Lab_0 under IFS_1-d, note that 1's only ever occur if x ≥ 0. If the system starts at 0 and experiences a sequence of the form 1^j 2^i where 0 < i < j, then x < 0. Therefore, substrings of the form 2 1^j 2^i 1, 0 ≤ i < j, are not processed by Lab_0. If the system starts at 0 and experiences a sequence of the form 1^j 2^j where 0 ≤ j, then the system is at x = 0, where a 2 never occurs (because, under L, balanced 1's and 2's are always followed by a 1, and 0 is never reached except via balanced 1's and 2's). Therefore substrings of the form 2 1^j 2^i 1 where i > j > 0 never occur. Therefore the strings processed by Lab_0 under IFS_1-d are all and only the strings of L_{1^n 2^n}.
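The two facts that Part A leans on can be spot-checked numerically. The snippet below is mine, not part of the proof; it simply simulates IFS_1-d from 0 for small j and i.

```python
# A small numeric check (mine, not part of the proof) of two facts used in
# Part A: from 0, the balanced string 1^j 2^j returns the system to exactly 0,
# while 1^j 2^i with 0 < i < j leaves the state negative.

def f1(x): return x / 2.0 + 1.0

def f2(x):
    if x < -1.0: return 2.0 * x + 2.0
    if x < 1.0:  return 0.0
    return -2.0 * x + 2.0

def run(symbols, x=0.0):
    for s in symbols:
        x = f1(x) if s == 1 else f2(x)
    return x

for j in range(1, 8):
    assert run([1] * j + [2] * j) == 0.0          # balanced -> back at 0
    for i in range(1, j):
        assert run([1] * j + [2] * i) < 0.0       # unbalanced pops -> x < 0
print("Part A lemmas hold for j up to 7")
```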
Part B. We wish to show that, under perturbation up to radius r followed by grammatical continuation to infinity, the IFS returns to Lab_x(0) in the limit. In fact, in this case, under any perturbation, once the perturbed string has been completed and at most one further complete string of the form 1^n 2^n has been processed, the IFS reinhabits Lab_x(0).

First, it is useful to define some terminology. We refer to the string in which the perturbation occurs as the perturbation string. We refer to the interval [−1, 1], on which the 2-map has the fixed value 0, as the flat interval. Note that when the state h is in the flat interval, processing any number k ≥ 1 of 2's yields IFS_{2^k}(h) = 0. Note also that if the system is in a state h_p < 0 and it processes a string of the form 1^n 2^n, then it will arrive at 0 when it processes the last 2 (this follows from the facts (i) that, at every symbol, the trajectory from h_p under 1^n 2^n is sandwiched between the corresponding trajectory from h_0 = 0 and 0, and (ii) that IFS_{1^n 2^n}(h_0) = 0). Whenever the state of the system is thus sandwiched, we say that the system is ahead with respect to grammatical processing.

Turning now to the proof of B, there are two cases:

(i) The perturbation occurs at some time before the last 1 in 1^j 2^j for some j ≥ 0. Within this case, it is useful to consider two subcases:

(i-1) The perturbation decreases h. In this case, during processing of the perturbation string, the system is ahead, so it arrives at 0 by the last 2 and is therefore on Lab_x(0).

(i-2) The perturbation increases h. In this case, the system is in [−2, −1] at the end of the string. Therefore, the system is ahead during the processing of the subsequent string. Consequently, it reaches Lab_x(0) by the end of the post-perturbation string.

(ii) The perturbation occurs after the last 1 and before the last 2 in 1^j 2^j for some j ≥ 0. Again, we consider two cases:

(ii-1) The perturbation decreases h. In this case, the system is still in [−2, −1] at the end of the string, and the future states follow the pattern of (i-2) above.

(ii-2) The perturbation increases h. If, after the increase, h < −1, then the remaining 2's bring the system to the flat interval and the case is like (i-1) above. If, after the increase, −1 ≤ h ≤ 1, then the remaining 2's keep the system at 0 and the system is on the attractor at the start of the next string. If, after the increase, h > 1, then the system is in [−2, −1] at the end of the string and the future follows (i-2) above.

Thus, in all cases, the system returns to Lab_x(0) at least by the end of the post-perturbation string.

8 Appendix 2: Sketch of Proof of Thm. 2

The proof of this theorem builds on the proof of Theorem 1. The system behaves on the first dimension as if the string were 1^n 2^n, with all the push symbols invoking 1 and all the pop symbols invoking 2, following the same dynamics as in the one-dimensional case above. Whenever the system is in the flat region of the pop maps and in the second half of a sentence, dimension 1 becomes 0 and stays there until the end of the sentence. Then the start of the new sentence resets all the dimensions to the appropriate values, so the effect of perturbation on any dimension is removed. By Thm. 1, this will always happen within two sentences of a perturbation. Furthermore, when the system, initialized to x(0) = 0, is unperturbed, f_{2i} ∘ f_i implements the identity on visited points x, and every stack state corresponds to a distinct point in X because the push maps on X form a just-touching fractal (Barnsley, [1988]1993), so the unperturbed system in κ + 1 dimensions processes L_Mirror-κ.
Acknowledgments

Thanks to Harry Dankowicz and Garrett Smith for comments on an earlier version of this work. We gratefully acknowledge support from NSF PAC grant 1059662 and NSF INSPIRE 1246920.

References

Barnsley, M. ([1988]1993). Fractals everywhere (2nd ed.). Boston: Academic Press.

Bodén, M., & Wiles, J. (2000). Context-free and context-sensitive dynamics in recurrent neural networks. Connection Science, 12(3), 197–210.

Bodén, M., & Wiles, J. (2002). On learning context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 13(2), 491–493.

Chomsky, N. (1957). Syntactic structures. The Hague: Mouton and Co.

Christiansen, M. H., & Chater, N. (1999). Toward a connectionist model of recursion in human linguistic performance. Cognitive Science, 23, 157–205.

Corballis, M. C. (2007). Recursion, language, and starlings. Cognitive Science, 31, 697–704.

Crutchfield, J. P., & Young, K. (1990). Computation at the onset of chaos. In W. H. Zurek (Ed.), Complexity, entropy, and the physics of information (pp. 223–70). Redwood City, CA: Addison-Wesley.

Devaney, R. L. (1989). An introduction to chaotic dynamical systems (2nd ed.). Redwood City, CA: Addison-Wesley.

Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.

Gazdar, G. (1981). On syntactic categories. Philosophical Transactions (Series B) of the Royal Society, 295, 267–83.

Hutchinson, J. E. (1981). Fractals and self similarity. Indiana University Mathematics Journal, 30(5), 713–747.

Lewis, R. (1996). Interference in short-term memory: The magical number two (or three) in sentence processing. Journal of Psycholinguistic Research, 25(1).

Mandelbrot, B. (1977). Fractals, form, chance, and dimension. San Francisco: Freeman.

Moore, C. (1998). Dynamical recognizers: Real-time language recognition by analog computers. Theoretical Computer Science, 201, 99–136.

Pollack, J. (1987). On connectionist models of natural language processing. Unpublished doctoral dissertation, University of Illinois.

Rodriguez, P. (2001). Simple recurrent networks learn context-free and context-sensitive languages by counting. Neural Computation, 13(9), 2093–2118.

Siegelmann, H. T. (1999). Neural networks and analog computation: Beyond the Turing limit. Boston: Birkhäuser.

Siegelmann, H. T., & Sontag, E. D. (1991). Turing computability with neural nets. Applied Mathematics Letters, 4(6), 77–80.

Tabor, W. (2000). Fractal encoding of context-free grammars in connectionist networks. Expert Systems: The International Journal of Knowledge Engineering and Neural Networks, 17(1), 41–56.

Tabor, W. (2002). The value of symbolic computation. Ecological Psychology, 14(1/2), 21–52.

Tabor, W. (2003). Learning exponential state growth languages by hill climbing. IEEE Transactions on Neural Networks, 14(2), 444–446.

Tabor, W. (2009). A dynamical systems perspective on the relationship between symbolic and non-symbolic computation. Cognitive Neurodynamics, 3(4), 415–427.

Tabor, W. (2011). Recursion and recursion-like structure in ensembles of neural elements. In H. Sayama, A. Minai, D. Braha, & Y. Bar-Yam (Eds.), Unifying themes in complex systems: Proceedings of the VIII International Conference on Complex Systems (pp. 1494–1508). Cambridge, MA: New England Complex Systems Institute. (http://necsi.edu/events/iccs2011/proceedings.html)

Tabor, W., Cho, P. W., & Szkudlarek, E. (2013). Fractal analysis illuminates the form of connectionist structural gradualness. Topics in Cognitive Science, 5, 634–667.
Tabor, W., & Hutchins, S. (2004). Evidence for self-organized sentence processing: Digging-in effects. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(2), 431–450.

Wiles, J., & Elman, J. (1995). Landscapes in recurrent networks. In J. D. Moore & J. F. Lehman (Eds.), Proceedings of the 17th Annual Cognitive Science Conference. Lawrence Erlbaum Associates.