=Paper=
{{Paper
|id=Vol-1347/paper08
|storemode=property
|title=Lexical emergentism and the "frequency-by-regularity" interaction
|pdfUrl=https://ceur-ws.org/Vol-1347/paper08.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/MarziFP15
}}
==Lexical emergentism and the "frequency-by-regularity" interaction==
Claudia Marzi, Marcello Ferro, Vito Pirrelli

Institute for Computational Linguistics - National Research Council - Pisa

{claudia.marzi,marcello.ferro,vito.pirrelli}@ilc.cnr.it

Copyright © by the paper’s authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

===Abstract===
In spite of considerable converging evidence of the role of inflectional paradigms in word acquisition and processing, little effort has been put so far into providing detailed, algorithmic models of the interaction between lexical token frequency, paradigm frequency and paradigm regularity. We propose a neuro-computational account of this interaction, and discuss some theoretical implications of preliminary experimental results.

===1 Introduction===
Over the last fifteen years, growing evidence has accrued of the role of morphological paradigms in the developmental course of word acquisition. Children have been shown to be sensitive to sub-regularities holding among paradigm cells (see, among others, Orsolini et al., 1998; Laudanna et al., 2004 on Italian; Dabrowska, 2004, 2005 on Polish; and Labelle and Morris, 2011 on French). In line with this evidence, and contrary to both rule-based (e.g. Pinker and Ullman, 2002; Albright, 2002) and connectionist approaches to word acquisition (Rumelhart and McClelland, 1986), no unique paradigm cell can be identified as the base source of all inflected forms produced by the speaker, but the structure of the entire paradigm is understood to play a fundamental role in both word acquisition and processing.

Such evidence supports a view of the mental lexicon as an emergent integrative system, whereby words are concurrently, redundantly and competitively stored (Alegre and Gordon, 1999; Baayen et al., 2007). The view assumes that all word forms are memorised in the lexicon, thus making no distinction between regular and irregular inflected forms, or between uniquely stored bases and all other non-base forms produced by the speaker on demand (see Baayen, 2007; Marzi, 2014; for a recent overview). In addition, to capture the fact that words encountered frequently exhibit different lexical properties from words encountered relatively infrequently, any model of lexical access must assume that accessing a word in some way affects the access representation of that word (e.g. Foster, 1976; Marslen-Wilson, 1993; Sandra, 1994).

In spite of such a wealth of converging evidence, however, little effort has been put so far into providing detailed, algorithmic models of the interaction between word frequency, paradigm frequency, paradigm regularity and lexical familiarity in word acquisition and processing. We offer here such an algorithmic account, and discuss some theoretical implications on the basis of computational simulations.

===2 The computational model===
In the present contribution, we use Temporal Self-organising Maps (TSOMs) to simulate dynamic effects of lexical storage, organisation and competition.

Figure 1. An integrated activation pattern for the input string “#pop$”. Note that two distinct, but topologically neighbouring nodes respond to the two p’s in pop, bearing witness to the process of selective sensitivity to time-bound instances of the same symbol type. For simplicity, only the nodes that are most highly activated by each input symbol are shaded and tagged with that symbol.

TSOMs, a variant of classical Kohonen’s SOMs (Kohonen, 2001), are dynamic memories that are trained to store and classify time-series of symbols through patterns of activation of fully interconnected nodes (Koutnik, 2007; Ferro et al., 2010; Pirrelli et al., 2011; Marzi et al., 2012). Map nodes mimic neural clusters, with inter-node connections representing neuron synapses whose weights determine the amount of influence that the activation of one node has on another node (Fig. 1). Each map node receives input connections from an input layer where the individual symbols making up a word are presented one at a time, in their order of appearance. Input connections thus convey information about the current input stimulus to map nodes. Hebbian connections, on the other hand, are strengthened each time two nodes are activated at consecutive time ticks, conveying the probabilistic expectation that one node will be activated soon after another node is activated.

When a symbol is shown on the input layer at a certain time tick, all map nodes are fired synchronously, their overall pattern of activation representing the processing response of a TSOM to the symbol at that time tick. Due to principles of topological organisation of the map’s responses, similar input stimuli (i.e. two instances of the same symbol in different contexts) tend to be associated with largely overlapping memory traces (e.g. the two p nodes activated by pop in Fig. 1). During training, nodes get gradually specialised to respond most strongly to specific time-bound instantiations of symbols, while remaining relatively inactive in the presence of other stimuli. A recurrent activation pattern associated with an input symbol occurring in a specific context can thus be seen as the map’s memory trace for that symbol in that context.

An input word is administered to a TSOM as a time series of symbols, i.e. a sequence of letters or sounds presented on the input layer one at a time. The map’s response to a word stimulus is the overall activation pattern obtained through integration of the activation patterns triggered by the individual symbols making up the word (see Fig. 1 for a simplified example with the word pop). Accordingly, if two input strings present some symbols in common (e.g. pop and cop, write and written), they will tend to activate largely overlapping patterns of strongly responsive nodes. As in the case of individual symbols, the integrated activation pattern for an input word is, at the same time, the systematic processing response of the map to an input stimulus, and the word’s memorised representation (or memory trace) in the map.

To investigate issues of “frequency-by-regularity” interaction (Ellis and Schmidt, 1998), we compared two sets of parallel experiments carried out on German verb paradigms (Marzi et al., 2014) and Italian verb paradigms. By keeping constant some input conditions, such as selection of paradigm cells and degrees of morphological redundancy within training paradigms, while varying others, such as the frequency distribution of paradigm members, we can investigate the relative contribution of input factors to the timing and pace of lexical acquisition and suggest an explanatory account of their interaction.

===3 Experimental evidence===
Fifty German and fifty Italian verb (sub)paradigms were selected among the most highly ranked paradigms by cumulative frequency in a reference corpus (CELEX Lexical database for German, Baayen et al., 1995; Paisà Corpus for Italian, Lyding et al., 2014). For each paradigm, an identical set of 15 cells was used for training, for an overall number of 750 inflected forms for each language. Each data set was administered to the map for 100 epochs under two different training regimes: a uniform distribution (UD: 5 tokens per word), and a function of real word frequency distributions in the reference corpus (SD: tokens are in the range of 1 to 1000). By varying frequency and comparing the inflectional complexity of training data across the two experiments, we expected to gain some insights into the interplay between morphological regularity (defined by levels of predictability in stem and ending allomorphy of training data in the two languages) and word frequency in word acquisition.

After training, we monitored the behaviour of the four resulting TSOMs (namely UD Italian, SD Italian, UD German and SD German) by tracking the time of acquisition of individual words, the time of acquisition of entire paradigms, and their acquisitional time span. For our present purposes, we define the time of acquisition of a single word as the training epoch from which a TSOM can accurately recall the word in question from its memory trace. Recall is a difficult task: it requires that the map has developed a clear notion of how to unfold a synchronous activation pattern (the word’s memory trace) into a sequence of nodes representing the correct letters making up the word, in the appropriate order. Likewise, for each paradigm, its time of acquisition by a map is the mean acquisition epoch of all forms belonging to the paradigm.

As a general trend, TSOMs acquire word forms by token frequency, with higher-frequency words being successfully recalled at earlier learning epochs. However, when it comes to the actual timing of paradigm acquisition, things get considerably more complex, with the notion of morphological regularity interacting non-trivially with token frequency distributions. In fact, in both German and Italian, the vast majority of paradigms are acquired earlier (p<.005) in a UD regime than in an SD regime (Fig. 2).

Figure 2: Time course of regular (left) and irregular (right) paradigms ranked by increasing learning epoch under SD (grey circles) and UD (white circles) regimes for both Italian (top) and German (bottom). Values are averaged across 5 map instances for each type.

===4 Frequency by regularity interaction===
Our simulations show that, in both languages, word forms in regular paradigms tend to be acquired earlier (significantly earlier learning epochs, p<.001), and regular paradigms are acquired more quickly (significantly shorter learning spans, i.e. lower number of epochs between the acquisition time of the first and the last member of a paradigm, p<.005) than irregular paradigms are. In German data, regular paradigms are less sensitive to token frequency effects than irregular paradigms are, as witnessed by the strong correlation (r=.95, p<.00001) between the time course of acquisition of regular paradigms in SD and UD regimes (Fig. 2, bottom left panel). Token frequency affects the acquisition of regular paradigms to a lesser extent than the acquisition of irregular ones, because regular stems can take advantage of their cumulative frequency across the whole paradigm. In fact, forms in regular paradigms exhibit a significant correlation between stem cumulative frequency and time of acquisition (r=-.40, p<.00001). Similarly, German irregular paradigms, which exhibit a predictable stem allomorphy due to a limited number of alternants, show a correlation between stem cumulative frequency and acquisition time (r=-.24, p<.00001). Conversely, in Italian, where verb conjugation exhibits more extensive and less predictable patterns of allomorphy than in German (Pirrelli, 2000), acquisition of irregular paradigms does not appear to benefit from stem cumulative token frequencies (r=.01, p>.5). This suggests that extensive allomorphy in a paradigm tends to minimise the influence of cumulative frequency on its acquisition, and isolated forms can only take advantage of their own token frequency, while taking no advantage of the frequency boost provided by other cells of the same paradigm. As a result, Italian irregular paradigms are acquired significantly (p<.005) later than their German homologues.

Our data cannot be explained away as a simple by-product of word-frequency effects. Experiments provide, in fact, evidence of interactive processing effects in word acquisition, whereby morphological regularity modulates frequency. Data analysis shows that recurrent patterns appear to determine global co-organisation of stored word forms and distributed, overlapping memory traces, which ultimately favour generalisation in lexical acquisition. Forms containing recurrent patterns can take advantage of the memory traces shared with other related forms, namely forms sharing the same stem, and connections between the nodes making up their memory traces are strengthened since patterns are shown more often in training, similarly to high-frequency isolated words.

This is particularly true for regular, highly entropic paradigms, i.e. those regular paradigms whose members exhibit uniform frequency distributions, and for irregular highly systematic paradigms. Conversely, where memory traces overlap less systematically, this effect is considerably reduced, as witnessed by the difference in time of acquisition between regular and irregular paradigms, particularly in Italian conjugation.

In TSOMs, the effects are the dynamic result of two interacting dimensions of memory self-organisation: (i) the syntagmatic or linear dimension, which controls the level of predictability and entrenchment of memory traces in the lexicon through the probabilistic distribution of weights over inter-node Hebbian connections; and (ii) the paradigmatic or vertical dimension, which controls for the number of similar, paradigmatically-related word forms that get co-activated when one member of a paradigm is input to the map (Pirrelli et al., 2014).

High-frequency words develop quick entrenchment of Hebbian connections, which eventually cause high levels of node activation in their memory traces and sparser co-activation of memory traces of other words. Strong connections and high activation levels mean high expectations for frequently activated memory traces, which are thus recalled more easily and are less confusable with other neighbouring words. Likewise, in regular and sub-regular paradigms, sharing memory traces can strengthen connections and raise node activation levels, since all related forms can take advantage of the memory traces shared with other members of the same paradigm.

Figure 3: Levels of activation strength (top) and filtering (bottom) for Italian (left) and German (right), for four regularity-by-frequency classes. Low frequency is set below the first quartile of frequency distributions in the two training sets, and high frequency above the third quartile.

This dynamic provides an algorithmic account of the observation that regularity favours acquisition of both high- and low-frequency words, as shown in Fig. 3, where we compare average levels of activation for four classes of training word forms: low-frequency regulars, low-frequency irregulars, high-frequency regulars and high-frequency irregulars.[1]

Activation levels of low-frequency words appear to be significantly stronger within regular paradigms than within irregular paradigms (Fig. 3, top). Stronger activation levels make patterns less confusable and easier to access, as witnessed by the lower level of filtering[2] required for activation patterns to be recalled accurately (Fig. 3, bottom). We observe, in fact, a highly significant correlation (r=.49, p<.00001 for both datasets) between levels of filtering and words’ learning epochs.

High-frequency words predictably show higher activation levels than low-frequency words, with an interesting difference in how frequency interacts with activation levels in regulars and irregulars. High-frequency, highly irregular words (e.g. German ist or Italian è) are stored in isolation, with highly-activated memory nodes and no co-activation with other words. As a result, they require little filtering to be recalled and are acquired very quickly. High-frequency regular paradigms, although their average frequency in both the Italian and German training sets is nearly half the average frequency of high-frequency irregulars, show levels of activation comparable to those of high-frequency irregulars, due to the facilitatory effect of having more words that consistently activate the same pattern of nodes. This evidence shows that regularity indeed modulates the interaction between frequency and activation strength, and it gives a strong indication that acquisition of regulars is typically paradigm-based, whereas acquisition of irregulars is mostly item-based.

Of course, since the notion of paradigm regularity is inherently graded, some verb systems show higher sensitivity to these effects than others. This is illustrated by German sub-regular paradigms, which present fewer and more predictable stem alternants than Italian sub-paradigms, and thus larger stem-sharing word families. Accordingly, TSOMs allocate comparatively higher levels of activation to low-frequency German sub-regulars and acquire them earlier than their Italian homologues.

The evidence reported here establishes, in our view, an important connection between aspects of morphological structure, frequency distributions of words in paradigms, and lexical acquisition in concurrent, competitive storage. Acquisition of redundant morphological patterns plays an increasingly important role in an emergent lexicon, shifting acquisitional strategies from rote memorisation (typical of irregular low-entropy paradigms) to dynamic memory-based generalisation.

[1] Frequency thresholds are set below the first quartile (low frequency) and above the third quartile (high frequency) in the frequency distribution of training word forms.

[2] Filtering an integrated activation pattern refers to the process of bringing down to zero the levels of activation of nodes that do not reach a set threshold.

===References===
Maria Alegre and Peter Gordon. 1999. Frequency effects and the representational status of regular inflections. Journal of Memory and Language, 40: 41-61.

Harald R. Baayen, Richard Piepenbrock and Leon Gulikers. 1995. The CELEX Lexical Database (CD-ROM). Philadelphia: Linguistic Data Consortium.

Harald R. Baayen. 2007. Storage and computation in the mental lexicon. In G. Jarema and G. Libben (eds.), The Mental Lexicon: Core Perspectives, 81-104. Amsterdam: Elsevier.

Lucia Colombo, Alessandro Laudanna, Maria De Martino and Cristina Brivio. 2004. Regularity and/or consistency in the production of the past participle? Brain and Language, 90: 128-142.

Ewa Dabrowska. 2004. Rules or schemata? Evidence from Polish. Language and Cognitive Processes, 19(2): 225–271.

Ewa Dabrowska. 2005. Productivity and beyond: mastering the Polish genitive inflection. Journal of Child Language, 32: 191-205.

Nick C. Ellis and Richard Schmidt. 1998. Rules or Associations in the Acquisition of Morphology? The Frequency by Regularity Interaction in Human and PDP Learning of Morphosyntax. Language and Cognitive Processes, 13: 307-336.

Marcello Ferro, Giovanni Pezzulo and Vito Pirrelli. 2010. Morphology, Memory and the Mental Lexicon. In Pirrelli, V. (ed.), Lingue e Linguaggio, IX(2): 199-238.

Jan Koutnik. 2007. Inductive Modelling of Temporal Sequences by Means of Self-organization. In Proceedings of International Workshop on Inductive Modelling. Prague: 269-277.

Marie Labelle and Lori Morris. 2011. The acquisition of a verbal paradigm: Verb Morphology in French L1 children. Prépublication. (Montréal, Québec, Canada, UQAM, département de linguistique). http://www.archipel.uqam.ca/3992/1/Labelle-Morris_AcquisitionVerbalParadigm.pdf

Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci and Vito Pirrelli. 2014. The PAISÀ Corpus of Italian Web Texts. In F. Bildhauer and R. Schäfer (eds.) Proceedings of the 9th Web as Corpus Workshop (WaC-9): 36-43. Gothenburg.

Claudia Marzi, Marcello Ferro and Vito Pirrelli. 2012. Word alignment and paradigm induction. Lingue e Linguaggio, XI(2): 251-274.

Claudia Marzi, Marcello Ferro and Vito Pirrelli. 2014. Morphological structure through lexical parsability. Lingue e Linguaggio, XIII(2): 263-290.

Claudia Marzi. 2014. Models and dynamics of the morphological lexicon in mono- and bilingual acquisition. Unpublished PhD Dissertation. University of Pavia. www.comphyslab.it/redirect/?id=claudia.marzi.en_phd

Margherita Orsolini, Rachele Fanari and Hugo Bowles. 1998. Acquiring regular and irregular inflections in a language with verb classes. Language and Cognitive Processes, 13(4): 425-464.

Steven Pinker and Michael Ullman. 2002. The past and future of the past tense. Trends in Cognitive Sciences, 6: 456-463.

Vito Pirrelli, Claudia Marzi and Marcello Ferro. 2014. Two-dimensional Wordlikeness Effects in Lexical Organisation. In: Basili R., Lenci A., Magnini B. (eds.) Proceedings of the First Italian Conference on Computational Linguistics, December 9-11, 2014. 301-305, Pisa: Pisa University Press.

Vito Pirrelli, Marcello Ferro and Basilio Calderone. 2011. Learning paradigms in time and space. Computational evidence from Romance languages. In M. Maiden, J. C. Smith, M. Goldbach and M. O. Hinzelin (eds.), Morphological Autonomy: Perspectives from Romance Inflectional Morphology, 135-157. Oxford: Oxford University Press.

Vito Pirrelli. 2000. Paradigmi in morfologia. Un approccio interdisciplinare alla flessione verbale dell'italiano. Pisa-Roma: Istituti editoriali e poligrafici internazionali.

Kim Plunkett and Virginia Marchman. 1993. From rote learning to system building – acquiring verb morphology in children and connectionist nets. Cognition, 48: 21-69.

David E. Rumelhart and James L. McClelland. 1986. On learning the past tense of English verbs. In McClelland, J.L. and Rumelhart, D.E. (eds.) Parallel distributed processing, 217-270. Cambridge: MIT Press.
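
The acquisition measures defined in Section 3 (time of acquisition of a word, time of acquisition of a paradigm, and learning span) can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that a per-epoch record of recall accuracy is available for each word, and it reads "the epoch from which a word can be recalled" as the first epoch after which recall never fails again. The function names and the epoch values for the German forms are hypothetical and are not taken from the reported simulations.

```python
from statistics import mean


def word_acquisition_epoch(recall_ok):
    """Return the 1-based epoch from which recall stays accurate, or None.

    recall_ok: list of booleans, one per training epoch (assumed input format).
    """
    epoch = None
    for i, ok in enumerate(recall_ok, start=1):
        if ok:
            if epoch is None:
                epoch = i  # candidate: start of an unbroken run of correct recalls
        else:
            epoch = None  # a later failure resets the candidate epoch
    return epoch


def paradigm_acquisition_time(form_epochs):
    """Mean acquisition epoch of all forms belonging to one paradigm."""
    return mean(form_epochs.values())


def learning_span(form_epochs):
    """Epochs between the acquisition of the first and the last paradigm member."""
    epochs = list(form_epochs.values())
    return max(epochs) - min(epochs)


# Hypothetical acquisition epochs for three forms of one paradigm.
machen = {"mache": 12, "machst": 15, "macht": 9}
print(paradigm_acquisition_time(machen), learning_span(machen))
```

Under these assumptions, a paradigm whose forms are acquired at widely different epochs has a late mean acquisition time and a long learning span, which is the pattern the paper reports for irregular paradigms.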
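
The notions of an integrated activation pattern (Section 2) and of filtering (footnote 2) can also be sketched in a few lines of Python. The paper does not state how per-symbol activation patterns are integrated, so the element-wise maximum used here is an assumption made only for illustration. The four-node toy map, the activation values and the function names are likewise hypothetical. Filtering follows the definition given in footnote 2: the activation of every node that does not reach a set threshold is brought down to zero.

```python
def integrate(patterns):
    """Combine per-symbol activation patterns into one word-level pattern.

    Assumption: integration is the element-wise maximum over time ticks.
    patterns: list of equal-length lists, one list of node activations per symbol.
    """
    return [max(values) for values in zip(*patterns)]


def filter_pattern(pattern, threshold):
    """Zero the activation of every node that does not reach the threshold."""
    return [a if a >= threshold else 0.0 for a in pattern]


# Hypothetical activations of a four-node map for the symbols p, o, p.
first_p = [0.9, 0.2, 0.1, 0.0]
o = [0.1, 0.8, 0.3, 0.0]
second_p = [0.6, 0.1, 0.2, 0.7]

word_trace = integrate([first_p, o, second_p])
print(word_trace)                       # integrated pattern for the whole word
print(filter_pattern(word_trace, 0.5))  # only strongly responsive nodes survive
```

In this toy example the two instances of p are served by different nodes, in the spirit of Fig. 1, and the weakly activated third node is removed by filtering.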