=Paper=
{{Paper
|id=Vol-1348/maics2013_paper_16
|storemode=property
|title=Metathesis and the Genetic Algorithm: Language as a Complex Adaptive System
|pdfUrl=https://ceur-ws.org/Vol-1348/maics2013_paper_16.pdf
|volume=Vol-1348
|dblpUrl=https://dblp.org/rec/conf/maics/PalmaGOLG13
}}
==Metathesis and the Genetic Algorithm: Language as a Complex Adaptive System==
Metathesis in English and Hebrew: A Computational Account of Usage-Based Phonology

* Paul De Palma, depalma@gonzaga.edu, Department of Computer Science, Gonzaga University, Spokane, WA 99258-0026
* Sara Ganzerli, ganzerli@gonzaga.edu, Department of Civil Engineering, Gonzaga University, Spokane, WA 99258-0026
* Shannon Overbay, overbay@gonzaga.edu, Department of Mathematics, Gonzaga University, Spokane, WA 99258-0026
* George Luger, luger@cs.unm.edu, Department of Computer Science, University of New Mexico, Albuquerque, NM 87131
* Kim Glaspey, kglaspey@zagmail.gonzaga.edu, Department of Computer Science, Gonzaga University, Spokane, WA 99258-0026

===Abstract===
It is now well understood that language use shapes the acoustic delivery of phonological patterns. One common example of this type of language change-under-use is metathesis, which is the reversal of the expected linear ordering of sounds. The gradual transformation of the Spanish word chipotle to chipolte in the United States is an example of metathetic change. The Genetic Algorithm (GA) is an optimization technique loosely based on the idea of natural selection. This paper shows that the GA can provide a computational model of a usage-based account of examples of metathesis. In the process, it argues that computer models can bring precision to linguistic theory. As an example we create a GA that is able to characterize metathesis in English and then is able to achieve even better results for related expressions in modern Hebrew.

Keywords: Genetic Algorithm; metathesis; computational phonology; emergent

===Usage-Based Linguistics===
In the first paragraph of her book on usage-based phonology, Joan Bybee says that "language use plays a role in shaping the form and content of sound systems…[It] affects the nature of mental representation and in some cases the actual phonetic shape of words" (Bybee 2001, p. 1). A non-linguist might reasonably reply, "of course, what else besides use and anatomy could shape sound systems?" Professor Bybee could then show us an elegant but deeply counterintuitive body of work, beginning with that of de Saussure in the early 20th century, which argues that language use can be separated from language competence and, crucially, language competence is where the real action is. While granting the richness of the formalist program in language study, those of us coming from other disciplines might be pleased to learn that beginning in the mid-nineteen seventies, and especially with the wide availability of digitized corpora of spoken language and inexpensive computing power, the study of language as it is actually used has been receiving more attention.

Several of the ideas of usage-based linguists have particular implications for the study of sound systems. These include the notion that experience with categories of sound affects their representation: the more experience, the easier the access. Closely related are the ideas that what we know about categorization generally applies to phonological structures (see Rosch 1978, of course). Further, there is no firm separation of language structures and the rules that are applied to them—data structures and algorithms in the language of computer science—as in the formalist tradition (Chomsky and Halle 1968; Pinker 1999), but, rather, linguistic properties emerge from the complex interplay of particular languages and their use, just as do purely biological systems. In fact, in this view, language emerges from repeatedly applying underlying and general cognitive mechanisms (Bybee 2010). Finally, and more generally, a correct formal characterization of language, individually or collectively, may not be possible, and even if it were, the formalism itself does not constitute an explanation of the phenomenon under investigation. Rather, as Bybee and McClelland argue (2005), formalisms describe linguistic regularities that result from the normal process of language use and adaptation.
===Hume's Model of Metathesis===
Elizabeth Hume's (2004) study of metathesis is an especially nice example of the application of usage-based techniques to a phenomenon that has puzzled linguists for many years. (All examples of metathesis in this paper are taken from Hume.) Hume defines metathesis as "the process whereby in certain languages the expected linear ordering of sounds is reversed under certain conditions. Thus, in a string of sounds where we would expect the ordering to be …xy…, we find instead …yx…" (p. 203). For example, in recent American usage, the word chipotle can frequently be heard as chipolte, where /t/ and /l/ are shifted. A very similar kind of metathesis occurs in binyan 5 of perfective verbs in modern Hebrew. When the /-t-/ indicating the binyan 5 morpheme is followed by a stem-initial strident (/s/ or /z/, for example), the morpheme and the strident shift expected positions. Thus we have hitnakem ("he took revenge") and hidbalet ("he became prominent") but, also, histader ("he got organized") and hizdaken ("he grew old").

Perhaps the most perplexing element is that a pattern of sounds occurring in one order in language A can occur in the opposite order in language B. Consider examples drawn from Hungarian and Pawnee. In certain Hungarian forms, glottals that precede approximants surface as approximants preceding glottals (/h/ + /r/, in this case, becomes /r/ + /h/). Thus the dative tehernek ("load") becomes terhek in the plural. In Pawnee, just the opposite occurs. The expected ordering /ti-ir-hissask-kus/ becomes tihrisasku, with the glottal appearing before the approximant. According to Hume, this led metathesis to be analyzed as a phenomenon that is irregular, found in child language, the result of performance errors, or simply the result of language change. In fact, implicit in her discussion, though distinctly underplayed, is that metathesis leads to permanent language change. That is, metathesis is a diachronic phenomenon. This raises metathetic change from a mere curiosity whose regularities can be described to an element of language change. And, as Joan Bybee, a leading figure in the usage-based camp, reminds us in her recent book, "nothing in linguistics makes any sense except in light of language change" (Bybee 2010, p. 10). Although the pronunciation of /chipotle/ as /chipolte/, not simply within a linguistic generation but within a single speaker, can be accounted for by her model, Hume's work becomes really interesting when it tries to account for what was once a puzzling aspect of linguistic change. How, for instance, did the expected /hitsader/ in Modern Hebrew become /histader/? Though diachronic processes are not her primary interest, Hume's account of metathesis can be reframed in evolutionary terms. What any naturally selective process needs is an initial state, an environment that favors certain forms over others, and an output. Hume's work provides all three. The initial state, of course, is "the expected linear ordering of sounds." The output is the reverse ordering. The "certain conditions" correspond to the phonological environment that favors some forms over others.

Hume argues that metathesis requires two conditions:
* An indeterminate speech signal
* An output that conforms to existing patterns in the language.

This is another way of saying that if I don't quite understand what you just said, I'll interpret it in light of what I already know. My reinterpretation, of course, will be in the context of what I know best, namely the most frequent sounds in my lexicon. In evolutionary terms, an indeterminate speech signal is one that is not optimally suited to its environment, the "existing patterns of the language." It is important here to clarify a common misconception about natural selection. It does not claim that a given organism is optimized, that it manifests the best possible arrangement of parts. The theory does claim that differential reproduction allows an organism which is better adapted to a specific and limited environment to produce more offspring than one that is not. So, biology is neither random nor goal-directed. Hume makes a similar point about metathesis: "the goal of metathesis is not to improve the overall psychoacoustic (i.e., universal) cues of a sequence, but rather conforming to the patterns of usage of a given language is key" (p. 225). These two ideas, that frequency of use plays a role in language development (see especially Bybee 2010) and that metathesis can be reframed as an emergent phenomenon, are the ideas that interest us most and that put Hume's account squarely within the usage-based camp.
===Emergentist Models of Language===
The view that language is emergent, that it is, in fact, a complex adaptive system, has received attention in recent years. One of the earliest accounts is Lindblom's 1984 attempt to select "with the aid of a self-organizing model a 'phonological structure'" [emphasis in the original]. In fact, a snippet from that article, "DERIVE LANGUAGE FROM NONLANGUAGE!," has recently been used as a summary of the goals of usage-based linguistics (Diessel 2011). More recently, Ke and Holland (2006) note that there are two main approaches to the investigation of language origins. First, there are nativist accounts of language competence and performance that concentrate on cognitive mechanisms and their biological underpinnings. Then there are empirical accounts that concentrate on social structures and patterns of linguistic transmission. In the latter, "language could have evolved from simple communication systems through generations of learning and cultural transmission, without new biological mutations specific to language. While the human species may have evolved to be capable of learning and using language, it is more important to recognize that language itself has evolved to be learnable for humans" (Ke and Holland 2006, p. 693).

Andrew Wedel (2005) offers a nice analogy. It seems unreasonable to assert that one's ability to hold a fork is genetically encoded in any precise fashion, despite the fact that humans, as far as is known, are the only species to use them. On the other hand, the manner of fork-holding is culturally transmitted within genetically-encoded parameters, namely four fingers and an opposable thumb. We might even become better fork-holders over time, as our forks evolve to fit our gifts. This notion, that linguistic transmission occurs within species-specific parameters, is captured in the emergentist paradigm. As Ellis put it (cited in Ke and Holland 2006, p. 694), language acquisition can be explained by "simple learning mechanisms, operating in and across the human systems for perception, motor-action, and cognition as they are exposed to language data as part of that communicatively-rich human social environment by an organism eager to exploit the functionality of language" (Ellis 1998, p. 657).

Both Ke and Holland (2006) and Holland (2005) situate their work within the tradition of agent-based and complex adaptive systems. Holland—the original developer of the genetic algorithm (Holland 1975)—describes his own efforts to model language acquisition as a complex adaptive system. He uses the phrase "adaptive agent" to describe an individual collection of linguistic rules that communicates with what appears to be a linguistic environment. Some of these agents have a better fit with the environment than others. These survive to evolve still better rules.

Though these accounts are persuasive enough, the real question to be addressed is what one gets after one creates a software model of a larger system. O'Reilly and Munakata (2000) make an especially persuasive argument for why one might want to model cognitive processes, the most important piece of which for our own work is that models force investigators to be explicit about their theories. It is one thing to describe a process. It is quite another to describe it with the precision necessary to run it on a computer. Thus Hume draws on Ohala's (1993) observation that certain categories of sound, glottals and liquids for example (i.e., the closure of the glottis in bitten and /r/; see Ladefoged 2006), have "stretched out features" that can bleed over into adjacent sounds, causing indeterminacy (Hume 2004, p. 219). To construct a computer model, we would have to know how stretched out. Glottals have cues that are certainly longer than the release bursts of stops (/b/, for example). But how much longer? An empirical approach suggests itself immediately: conduct experiments. Another approach, the one implicit in emergentist theory, is to build a model and adjust its parameters until its inputs and outputs conform to the data. In a nutshell, this is what guides our efforts.
===The Genetic Algorithm===
The Genetic Algorithm (GA) is an optimization method based loosely on the idea of natural selection. Individual members of a species who are better adapted to a given environment reproduce more successfully and so pass their adaptations on to their offspring. Over time, individuals possessing the adaptation form interbreeding populations, that is, a new species. In keeping with the biological metaphor, a candidate solution in a GA is known as a chromosome. The chromosome is composed of multiple genes. A collection of chromosomes is called a population. The GA randomly generates an initial population of chromosomes which are then ranked according to a fitness function. One of the truly marvelous things about the GA is its wide applicability. We have used it to optimize structural engineering components and are currently applying it to a classic problem in graph theory (Ganzerli, S., De Palma, P. et al., 2003, 2005, 2008). As it happens, both problems are NP-Complete, in effect, computationally intractable (Overbay, S., Ganzerli, S., De Palma, P., 2006). In practice, of course, this means that those who attempt to solve these problems must be content with good-enough solutions. Though good-enough may not appeal to purists, it is exactly the kind of solution implicit in natural selection: a local adaptation to local constraints, where the structures undergoing change are themselves the product of a recursive sequence of adaptations. This can be expressed quite compactly:

<pre>
GA()
    Initialize(population);          // build initial population
    ComputeCost(population);         // apply cost function
    Sort(population);                // rank population
    while (population has not converged on a good-enough solution)
        Pair(population);            // decide which members reproduce
        Mate(population);            // exchange characteristics
        Mutate(population);          // randomly perturb genes
        Sort(population);            // rank population
        TestConvergence(population); // has a new species appeared?
</pre>
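To make that loop concrete, here is a minimal sketch of it in Java, the language our GA was written in. Everything specific to the sketch is an illustrative assumption rather than our production code: the class and method names, the symbol inventory, the one-point crossover, the convergence test, and especially the cost function, which is a stand-in (mismatches against a fixed target string) for the usage-based rules given in the next section. The pairing scheme, however, does follow the proof-of-concept approach described below: the population is ranked, the lowest-cost strings are paired, and their offspring replace the discarded bottom half.

<pre>
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Minimal GA skeleton over fixed-length strings of sound symbols.
// The cost function below is only a placeholder; the rules actually used
// to model metathesis are listed in the next section.
public class GaSketch {
    static final int POP_SIZE = 64;       // total population, as in our runs
    static final double MUTATION = 0.005; // roughly 0.5% of genes perturbed per generation
    static final char[] SOUNDS = "aeiouptkbdglrshz".toCharArray(); // illustrative inventory
    static final Random RNG = new Random();

    public static void main(String[] args) {
        List<String> population = initialize("potle", 1);                    // one base token, random fill
        for (int gen = 0; gen < 250 && !converged(population); gen++) {
            population.sort(Comparator.comparingDouble(GaSketch::cost));     // rank by cost
            population = breed(population);                                  // pair and mate
            mutate(population);                                              // perturb genes
        }
        population.sort(Comparator.comparingDouble(GaSketch::cost));
        System.out.println("Best string: " + population.get(0));
    }

    // Seed the population with copies of a word and fill the rest randomly.
    static List<String> initialize(String seed, int copies) {
        List<String> pop = new ArrayList<>();
        for (int i = 0; i < copies; i++) pop.add(seed);
        while (pop.size() < POP_SIZE) pop.add(randomString(seed.length()));
        return pop;
    }

    static String randomString(int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) sb.append(SOUNDS[RNG.nextInt(SOUNDS.length)]);
        return sb.toString();
    }

    // Placeholder cost: mismatches against an arbitrary target form.
    // A usage-based cost function would score phonetic configurations instead.
    static double cost(String s) {
        String target = "polte";
        int mismatches = 0;
        for (int i = 0; i < s.length(); i++)
            if (s.charAt(i) != target.charAt(i)) mismatches++;
        return mismatches;
    }

    // Pair the two lowest-cost strings, then the next two, and so on; the
    // 32 discarded strings are replaced by the pairs' offspring (one-point crossover).
    static List<String> breed(List<String> ranked) {
        List<String> next = new ArrayList<>(ranked.subList(0, POP_SIZE / 2));
        for (int i = 0; i + 1 < POP_SIZE / 2; i += 2) {
            int cut = 1 + RNG.nextInt(ranked.get(i).length() - 1);
            next.add(ranked.get(i).substring(0, cut) + ranked.get(i + 1).substring(cut));
            next.add(ranked.get(i + 1).substring(0, cut) + ranked.get(i).substring(cut));
        }
        return next;
    }

    // Randomly reassign roughly MUTATION of all genes in the population.
    static void mutate(List<String> pop) {
        for (int i = 0; i < pop.size(); i++) {
            char[] genes = pop.get(i).toCharArray();
            for (int j = 0; j < genes.length; j++)
                if (RNG.nextDouble() < MUTATION)
                    genes[j] = SOUNDS[RNG.nextInt(SOUNDS.length)];
            pop.set(i, new String(genes));
        }
    }

    // Crude convergence test: stop once 95% of the population matches its first member.
    static boolean converged(List<String> pop) {
        String first = pop.get(0);
        long matches = pop.stream().filter(s -> s.equals(first)).count();
        return matches >= 0.95 * POP_SIZE;
    }
}
</pre>

Swapping the placeholder cost for one that rewards and penalizes the phonetic configurations Hume identifies (a sketch follows the cost rules below) is what turns this generic optimizer into a model of metathetic change.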
The use of the GA to model metathetic change is consistent with Croft's (2000) theory of language change that he calls "utterance selection." In utterance selection, "normal replication is in essence conformity to convention in language use. Altered replication results from the violation of convention in language use. And selection is essentially the gradual establishment of a convention through language use" (p. 7). In Croft's view, the utterance corresponds to DNA, the replicators to genes, the variants in linguistic structures to alleles. The task in building a model is to find, according to Croft, those mechanisms that cause certain linguistic structures to be favored over others. These are "the causal mechanisms of selection of linguistic structures" (p. 31). Hume's work provides just such a causal mechanism. We show next that this causal mechanism can be modeled with the GA.

===Metathesis and the GA===
Hume describes several kinds of metathesis, all conforming, in one way or another, to her initial claim that metathesis results from indeterminate speech signals processed in terms of frequently occurring sequences of sounds in a given language. The chipotle/chipolte example is an instance of this recurring pattern: "a consonant with potentially weak phonetic cues often emerges in a context in which the cues are more robust than they would have been in the expected, yet non-occurring, order" (p. 209). More specifically, stop consonants are easier to perceive in prevocalic position. In fact, over one-third of the metathesis tokens that Hume identifies involve a stop consonant. In the example, [tle] is less favorable in the environment of American English than is [lte]. That is, the stop consonant before the /l/ produces an indeterminate signal for American English speakers, who proceed to shift it to the more frequent prevocalic position.

How to represent this process in a GA is the next question. Clearly, we must assign a better fitness, a lower cost, to sequences with prevocalic stop consonants than to those with postvocalic stop consonants. But, somehow, both signal indeterminacy and token frequency must be made part of this process. Here is our approach:

# Input an initial population of the base word and the target word. Chipotle is an example of a base word and chipolte is an example of the target word. Our GA works with a total population of 64 words. The relative frequency of the base and target words is a parameter. Thus, we might have one instance of the base and four of the target in the initial population.
# Generate random sequences of characters that fill out the population. So, if we seeded the population with one instance of the base and four of the target, our GA would randomly generate fifty-nine character sequences.
# Assign a fitness value to each of the sequences that comprise the population.
# Sort, pair, mate, and mutate the population. Sorting is the process of ranking by fitness value. Pairing is the process whereby strings of sounds are collected in two-tuples. As a proof of concept, we adopt a simple approach: the two lowest-cost strings are paired, followed by the next two lowest-cost, until we have 16 breeding pairs. The remaining 32 strings are discarded to make room for the progeny of our breeding pairs. Mating is the process by which the paired words pass on their genetic composition—their sounds—in the process of generating two new strings of sounds. Mutating is the random shifting of a fixed fraction of the genes in the population. This mimics the action of chemical/biological/radiological mutagens on individuals. For our purposes, it prevents the system from getting stuck in local minima (see Haupt & Haupt 1998).
# Stop when some predetermined condition is met, else go to step 3.

The cost function in any GA embodies most of the theory being modeled. The other pieces are parameters to the system. The most important of these for us is the relative frequency of the base word and the target, i.e., the initial character sequence and the target of metathetic change respectively. The cost function itself is an attempt to operationalize Hume's model. Except for a few items designed to exclude randomly generated but non-occurring phonetic sequences, it is as follows:

# A prevocalic stop is more salient than a postvocalic stop. Give a fitness boost to words with prevocalic stops.
# By observation 1, penalize words with postvocalic stops.
# Glottals, liquids, and glides (/w/, for example) tend to bleed over into adjacent sounds. This is especially true when they follow a stop. Penalize words with glottals, liquids, and glides that follow a stop.
# A stop followed by a consonant is perceptually weak. Penalize words with stops followed by consonants.
# A stop followed by a strident is perceptually weak and infrequent. Penalize words with prestrident stops. This rule is what allows our GA to generate the kind of metathetic change found in binyan 5 of perfective verbs in Modern Hebrew (/hitsader/ to /histader/) as well as another instance of English metathesis (/ask/ to /aks/).
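The five observations above can be read directly as a cost function over strings of sound symbols. The Java sketch below is one such reading and could stand in for the placeholder cost in the earlier skeleton. The character classes and the numeric weights are illustrative guesses, not the values used in our runs; as the Discussion notes, Hume's model does not fix the weights, and choosing them is exactly where the modeler must guess.

<pre>
import java.util.Set;

// One possible rendering of the five cost rules. Lower cost means better fitness.
// The symbol classes and weights are illustrative, not the values used in our runs.
public class MetathesisCost {
    static final Set<Character> VOWELS    = Set.of('a', 'e', 'i', 'o', 'u');
    static final Set<Character> STOPS     = Set.of('p', 't', 'k', 'b', 'd', 'g');
    static final Set<Character> STRIDENTS = Set.of('s', 'z');
    // Glottals, liquids, and glides: the sounds whose cues bleed into their neighbors.
    static final Set<Character> BLEEDERS  = Set.of('h', 'l', 'r', 'w', 'y');

    public static double cost(String word) {
        double cost = 0.0;
        for (int i = 0; i + 1 < word.length(); i++) {
            char cur = word.charAt(i);
            char next = word.charAt(i + 1);
            if (STOPS.contains(cur)) {
                if (VOWELS.contains(next))    cost -= 2.0; // rule 1: reward prevocalic stops
                else                          cost += 1.0; // rule 4: stop + consonant is weak
                if (BLEEDERS.contains(next))  cost += 2.0; // rule 3: stop + glottal/liquid/glide
                if (STRIDENTS.contains(next)) cost += 3.0; // rule 5: prestrident stop
            }
            if (VOWELS.contains(cur) && STOPS.contains(next))
                cost += 1.0;                               // rule 2: penalize postvocalic stops
        }
        return cost;
    }

    public static void main(String[] args) {
        System.out.println("potle: " + cost("potle") + "   polte: " + cost("polte"));
        System.out.println("itsa:  " + cost("itsa") + "   ista:  " + cost("ista"));
    }
}
</pre>

With these weights, polte scores lower (better) than potle and ista lower than itsa, which is the ordering the selection step needs if the metathesized forms are to spread through the population.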
===Method and Results===
Our GA was constructed in Java and run under Ubuntu Linux. Its cost function is designed to model, among many other instances, both the chipotle/chipolte metathesis and binyan 5 of perfective verbs in modern Hebrew, specifically hitsader/histader. Every parameter was held constant except the relative frequency of base and target sounds. Since the sounds being modeled occur in the interior of the word in both cases, the strings potle/polte and itsa/ista functioned as surrogates for the entire words. The population size was set at 64 and the mutation factor at 0.5%. For each of 1, 2, and 4 initial chipotle/hitsader tokens, the number of chipolte/histader tokens began at parity and then was doubled three times. So, for instance, if we were working with an initial population of 4 chipotle tokens, we would produce results for 4, 8, 16, and 32 chipolte tokens. Therefore, there were 12 frequency configurations, four for each set of 1, 2, or 4 chipotle tokens. For each of these 12 configurations, we ran the GA 250 times, each run consisting of 250 generations. Along the way, the chipotle/hitsader tokens disappeared. The data is summarized in Tables 1 and 2 below.

Table 1: Chipotle, 250 Runs, 250 Generations Each
{| class="wikitable"
! Ratio of Base to Target !! Generation Chipotle Disappeared !! Generation Chipolte Stabilized !! Percent of Chipolte Tokens at Stabilization
|-
| 1:1 || 3 || 119 || 73.4
|-
| 1:2 || 3 || 68 || 93.7
|-
| 1:4 || 3 || 60 || 96.8
|-
| 1:8 || 2 || 44 || 98.4
|-
| 2:2 || 3 || 90 || 92.1
|-
| 2:4 || 3 || 72 || 96.8
|-
| 2:8 || 2 || 50 || 98.4
|-
| 2:16 || 2 || 31 || 98.4
|-
| 4:4 || 3 || 58 || 96.8
|-
| 4:8 || 2 || 44 || 98.4
|-
| 4:16 || 2 || 33 || 98.4
|-
| 4:32 || 1 || 26 || 98.4
|}

Table 2: Hitsader, 250 Runs, 250 Generations Each
{| class="wikitable"
! Ratio of Base to Target !! Generation Hitsader Disappeared !! Generation Histader Stabilized !! Percent of Histader Tokens at Stabilization
|-
| 1:1 || 2 || 79 || 84.3
|-
| 1:2 || 2 || 65 || 98.4
|-
| 1:4 || 2 || 55 || 98.4
|-
| 1:8 || 2 || 39 || 98.4
|-
| 2:2 || 2 || 59 || 98.4
|-
| 2:4 || 2 || 47 || 98.4
|-
| 2:8 || 2 || 43 || 98.4
|-
| 2:16 || 1 || 33 || 98.4
|-
| 4:4 || 2 || 49 || 98.4
|-
| 4:8 || 1 || 43 || 98.4
|-
| 4:16 || 1 || 32 || 98.4
|-
| 4:32 || 1 || 23 || 100
|}
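The outer structure of this sweep is easy to state: three base counts, a target count that starts at parity and doubles three times, and 250 runs of 250 generations for each of the resulting 12 configurations. The Java sketch below shows that scaffolding only. The seeding follows steps 1 and 2 of the approach above; the GA run itself is stubbed out, and none of this is the code that produced Tables 1 and 2.

<pre>
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Scaffolding for the 12-configuration frequency sweep. runOnce() is a stub;
// a real run would evolve the population for 250 generations and report the
// generation at which the target form stabilizes.
public class SweepSketch {
    static final Random RNG = new Random();
    static final String SOUNDS = "aeiouptkbdglrshz"; // illustrative symbol inventory

    public static void main(String[] args) {
        String base = "potle", target = "polte"; // surrogates for the word interiors
        for (int nBase : new int[] {1, 2, 4}) {
            // target count starts at parity with the base count and doubles three times
            for (int nTarget = nBase; nTarget <= 8 * nBase; nTarget *= 2) {
                int total = 0;
                for (int run = 0; run < 250; run++)
                    total += runOnce(seed(base, nBase, target, nTarget));
                System.out.printf("%d:%d mean stabilization generation: %.1f%n",
                        nBase, nTarget, total / 250.0);
            }
        }
    }

    // Seed a 64-member population with the given token counts; fill the rest randomly.
    static List<String> seed(String base, int nBase, String target, int nTarget) {
        List<String> pop = new ArrayList<>();
        for (int i = 0; i < nBase; i++) pop.add(base);
        for (int i = 0; i < nTarget; i++) pop.add(target);
        while (pop.size() < 64) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < base.length(); i++)
                sb.append(SOUNDS.charAt(RNG.nextInt(SOUNDS.length())));
            pop.add(sb.toString());
        }
        return pop;
    }

    // Stub for a single 250-generation GA run.
    static int runOnce(List<String> population) {
        return 0;
    }
}
</pre>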
===Discussion and Future Research===
The data illustrate that we were able to design a computational model using the Genetic Algorithm that captures Hume's model of metathetic change. In every one of the 12 frequency configurations, the chipotle tokens disappeared from the population within three generations and the hitsader tokens within two. "Generation," of course, is the term used in the GA literature. It is not to be confused with a human generation. Further, within 60 generations, on average, chipolte tokens made up an average of 95% of the population. Hebrew metathesis performed even better, with histader tokens comprising an average of 97.3% of the population within, on average, 48 generations. At this point, it might be useful to recall Hume's two conditions for metathesis: the speech signal must be indeterminate, and the output must conform to existing patterns in the language. As we indicated with the Hungarian and Pawnee attestations above, metathesis is not just a rule-based phenomenon found in the same form cross-linguistically. Rather, it is intimately tied to existing sound patterns within a language. Said another way, metathesis is a usage-based phenomenon. Our model demonstrates this in terms of a very solid frequency effect. The maximum number of target tokens tends to stabilize more quickly and at a higher percent of the total population as the number of target tokens in the initial population increases. Further, the larger the set, where a set is defined as the number of base tokens in the initial population, the better the performance. This is illustrated most strongly when we look at data from the first and last elements of each configuration, that is, when we compare 1:1, 2:2, and 4:4 with 1:8, 2:16, and 4:32. The more frequent the target within the initial population, the more quickly the population stabilizes on the target and at a higher percent of the total population.

Nevertheless, Hume's model is underspecified from an algorithmic/computational standpoint. Though it specifies very clearly what kinds of sounds are potentially vulnerable to metathetic change and in what context, the computational modeler must guess how to weight the various phonetic factors involved and, in particular, must guess at frequency thresholds. We regard our study as a proof of concept. In future work we will build our frequency hypotheses into the rules themselves. For example, instead of simply rewarding strings with a prevocalic stop and penalizing those with a postvocalic stop, we will use transcribed corpora to estimate the frequency of both vulnerable cues and the targets of metathetic change. These frequencies will be used to weight the penalties and rewards, thus making as precise as possible important observations like, "Indeterminacy sets the stage for metathesis, and the knowledge of the sound patterns of one's language influences how the signal is processed and, thus, the order in which the sounds are parsed" (Hume 2004, pp. 209-210). Our goal is that by gathering data on vulnerable sounds in corpora of actual speech, we will be able to generate all of the instances of metathesis within a language. This will add weight to Hume's observations and perhaps be useful in accounting for and predicting other types of language change.

===Acknowledgements===
The authors would like to acknowledge the many student research assistants who have contributed their talent and enthusiasm to the Gonzaga University Center for Evolutionary Algorithms for over a decade.
===References===
* Bybee, J. 2001. Phonology and Language Use. Cambridge: Cambridge University Press.
* Bybee, J. 2010. Language, Usage and Cognition. Cambridge: Cambridge University Press.
* Bybee, J., and McClelland, J. 2005. Alternatives to the combinatorial paradigm of linguistic theory based on domain general principles of human cognition. The Linguistic Review 22, 381-410.
* Chomsky, N., and Halle, M. 1968. The Sound Pattern of English. NY: Harper and Row.
* Croft, W. 2000. Explaining Language Change: An Evolutionary Approach. Harlow, England: Pearson.
* Diessel, H. 2011. Review article: Language, usage, and cognition. Language 87(4): 830-844.
* Ellis, N. 1998. Emergentism, Connectionism, and Language Learning. Language Learning 48, 631-664.
* Ganzerli, S., and De Palma, P. 2008. Genetic Algorithms and Structural Design Using Convex Models of Uncertainty. In Y. Tsompanakis, N. Lagaros, and M. Papadrakakis, eds., Structural Design Optimization Considering Uncertainties. London: A.A. Balkema Publishers, a member of the Taylor and Francis Group.
* Ganzerli, S., De Palma, P., Stackle, P., and Brown, A. 2005. Info-gap uncertainty on structural optimization via genetic algorithms. Proceedings of the Ninth International Conference on Structural Safety and Reliability, Rome.
* Ganzerli, S., De Palma, P., Smith, J., and Burkhart, M. 2003. Efficiency of genetic algorithms for optimal structural design considering convex models of uncertainty. Proceedings of the Ninth International Conference on Applications of Statistics and Probability in Civil Engineering, San Francisco.
* Haupt, L., and Haupt, S. 1998. Practical Genetic Algorithms. New York: John Wiley and Sons.
* Holland, J. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor: The University of Michigan Press.
* Holland, J. 2005. Language acquisition as a complex adaptive system. In Minett, J., and Wang, W., eds., Language Acquisition, Change and Emergence: Essays in Evolutionary Linguistics. Hong Kong: City University of Hong Kong Press.
* Hume, E. 2004. The Indeterminacy/Attestation Model of Metathesis. Language 80(2): 203-237.
* Ke, J., and Holland, J. 2006. Language Origin from an Emergentist Perspective. Applied Linguistics 27(4): 691-716.
* Ladefoged, P. 2006. A Course in Phonetics. Boston: Thomson/Wadsworth.
* Lindblom, B., MacNeilage, P., and Studdert-Kennedy, M. 1984. Self-organizing processes and the explanation of phonological universals. In Butterworth, B., Comrie, B., and Dahl, O., eds., Explanations for Language Universals. New York: Mouton.
* O'Reilly, R., and Munakata, Y. 2000. Computational Explorations in Cognitive Neuroscience: Understanding the Mind by Simulating the Brain. Cambridge: MIT Press.
* Ohala, J. 1993. Sound change as nature's speech perception experiment. Speech Communication 13, 155-161.
* Overbay, S., Ganzerli, S., De Palma, P., Brown, A., and Stackle, P. 2006. Trusses, NP-Completeness, and Genetic Algorithms. Proceedings of the 17th Analysis and Computation Specialty Conference, St. Louis.
* Pinker, S. 1999. Words and Rules. NY: HarperCollins.
* Rosch, E. 1978. Principles of Categorization. In E. Rosch and B. Lloyd, eds., Cognition and Categorization, 27-48. Hillsdale, NJ: Lawrence Erlbaum Associates.
* Wedel, A. Contrast Maintenance in Language and the Innateness Debate. Retrieved 2/18/2013 from http://dingo.sbs.arizona.edu/~wedel/research/PDF/wedelcontrastsummary.pdf