=Paper= {{Paper |id=Vol-1347/paper08 |storemode=property |title=Lexical emergentism and the "frequency-by-regularity" interaction |pdfUrl=https://ceur-ws.org/Vol-1347/paper08.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/MarziFP15 }} ==Lexical emergentism and the "frequency-by-regularity" interaction== https://ceur-ws.org/Vol-1347/paper08.pdf

Lexical emergentism and the “frequency-by-regularity” interaction
Claudia Marzi Marcello Ferro Vito Pirrelli
Institute for Computational Linguistics - National Research Council - Pisa
{claudia.marzi,marcello.ferro,vito.pirrelli}@ilc.cnr.it

Abstract

In spite of considerable converging evidence of the role of inflectional paradigms in word acquisition and processing, little effort has been put so far into providing detailed, algorithmic models of the interaction between lexical token frequency, paradigm frequency and paradigm regularity. We propose a neuro-computational account of this interaction, and discuss some theoretical implications of preliminary experimental results.

1 Introduction

Over the last fifteen years, growing evidence has accrued of the role of morphological paradigms in the developmental course of word acquisition. Children have been shown to be sensitive to sub-regularities holding among paradigm cells (see, among others, Orsolini et al., 1998; Laudanna et al., 2004 on Italian; Dabrowska, 2004, 2005 on Polish; and Labelle and Morris, 2011 on French). In line with this evidence, and contrary to both rule-based (e.g. Pinker and Ullman, 2002; Albright, 2002) and connectionist approaches to word acquisition (Rumelhart and McClelland, 1986), no unique paradigm cell can be identified as the base source of all inflected forms produced by the speaker, but the structure of the entire paradigm is understood to play a fundamental role in both word acquisition and processing.

Such evidence supports a view of the mental lexicon as an emergent integrative system, whereby words are concurrently, redundantly and competitively stored (Alegre and Gordon, 1999; Baayen et al., 2007). The view assumes that all word forms are memorised in the lexicon, thus making no distinction between regular and irregular inflected forms, or between uniquely stored bases and all other non-base forms produced by the speaker on demand (see Baayen, 2007; Marzi, 2014; for a recent overview). In addition, to capture the fact that words encountered frequently exhibit different lexical properties from words encountered relatively infrequently, any model of lexical access must assume that accessing a word in some way affects the access representation of that word (e.g. Foster, 1976; Marslen-Wilson, 1993; Sandra, 1994).

In spite of such a wealth of converging evidence, however, little effort has been put so far into providing detailed, algorithmic models of the interaction between word frequency, paradigm frequency, paradigm regularity and lexical familiarity in word acquisition and processing. We offer here such an algorithmic account, and discuss some theoretical implications on the basis of computational simulations.

2 The computational model

In the present contribution, we use Temporal Self-organising Maps (TSOMs) to simulate dynamic effects of lexical storage, organisation and competition.

Figure 1. An integrated activation pattern for the input string “#pop$”. Note that two distinct, but topologically neighbouring nodes respond to the two p’s in pop, bearing witness to the process of selective sensitivity to time-bound instances of the same symbol type. For simplicity, only the nodes that are most highly activated by each input symbol are shaded and tagged with that symbol.

TSOMs, a variant of classical Kohonen’s SOMs (Kohonen, 2001), are dynamic memories that are trained to store and classify time-series of symbols through patterns of activation of fully interconnected nodes (Koutnik, 2007; Ferro et al., 2010; Pirrelli et al., 2011; Marzi et al., 2012). Map nodes mimic neural clusters, with inter-node connections representing neuron synapses whose weights determine the amount of influence that the activation of one node has on another node (Fig. 1). Each map node receives input

Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final
Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

connections from an input layer where individual symbols making up a word are presented one at a time, in their order of appearance. Input connections thus convey information about the current input stimulus to map nodes. Hebbian connections, on the other hand, are strengthened each time two nodes are activated at consecutive time ticks, conveying the probabilistic expectation that one node will be activated soon after another node is activated.

When a symbol is shown on the input layer at a certain time tick, all map nodes are fired synchronously, their overall pattern of activation representing the processing response of a TSOM to the symbol at that time tick. Due to principles of topological organisation of the map’s responses, similar input stimuli (i.e. two instances of the same symbol in different contexts) tend to be associated with largely overlapping memory traces (e.g. the two p nodes activated by pop in Fig. 1). During training, nodes get gradually specialised to respond most strongly to specific time-bound instantiations of symbols, while remaining relatively inactive in the presence of other stimuli. A recurrent activation pattern associated with an input symbol occurring in a specific context can thus be seen as the map’s memory trace for that symbol in that context.

An input word is administered to a TSOM as a time series of symbols, i.e. a sequence of letters or sounds presented on the input layer one at a time. The map’s response to a word stimulus is the overall activation pattern obtained through integration of the activation patterns triggered by the individual symbols making up the word (see Fig. 1 for a simplified example with the word pop). Accordingly, if two input strings have some symbols in common (e.g. pop and cop, write and written), they will tend to activate largely overlapping patterns of strongly responsive nodes. As in the case of individual symbols, the integrated activation pattern for an input word is, at the same time, the systematic processing response of the map to an input stimulus, and the word’s memorised representation (or memory trace) in the map.

To investigate issues of “frequency-by-regularity” interaction (Ellis and Schmidt, 1998), we compared two sets of parallel experiments carried out on German verb paradigms (Marzi et al., 2014) and Italian verb paradigms. By keeping constant some input conditions, such as selection of paradigm cells and degrees of morphological redundancy within training paradigms, while varying others, such as the frequency distribution of paradigm members, we can investigate the relative contribution of input factors to the timing and pace of lexical acquisition and suggest an explanatory account of their interaction.

3 Experimental evidence

Fifty German and fifty Italian verb (sub)paradigms were selected among the most highly ranked paradigms by cumulative frequency in a reference corpus (CELEX Lexical database for German, Baayen et al., 1995; Paisà Corpus for Italian, Lyding et al., 2014). For each paradigm, an identical set of 15 cells was used for training, for an overall number of 750 inflected forms for each language. Each data set was administered to the map for 100 epochs under two different training regimes: a uniform distribution (UD: 5 tokens per word), and a function of real word frequency distributions in the reference corpus (SD: tokens are in the range of 1 to 1000). By varying frequency and comparing the inflectional complexity of training data across the two experiments, we expected to gain some insights into the interplay between morphological regularity (defined by levels of predictability in stem and ending allomorphy of training data in the two languages) and word frequency in word acquisition. After training, we monitored the behaviour of the four resulting TSOMs (namely UD Italian, SD Italian, UD German and SD German) by tracking the time of acquisition of individual words, the time of acquisition of entire paradigms, and their acquisitional time span. For our present purposes, we define the time of acquisition of a single word as the training epoch from which a TSOM can accurately recall the word in question from its memory trace. Recall is a difficult task that requires that the map has developed a clear notion of how to unfold a synchronous activation pattern (the word’s memory trace) into a sequence of nodes representing the correct letters making up the word, in the appropriate order. Likewise, for each paradigm, its time of acquisition by a map is the mean acquisition epoch of all forms belonging to the paradigm.

As a general trend, TSOMs acquire word forms by token frequency, with higher-frequency words being successfully recalled at earlier learning epochs. However, when it comes to the actual timing of paradigm acquisition, things get considerably more complex, with the notion of morphological regularity interacting non-trivially with token frequency distributions. In fact, in both

German and Italian, the vast majority of paradigms are acquired earlier (p<.005) in a UD regime than in an SD regime (Fig. 2).

Figure 2: Time course of regular (left) and irregular (right) paradigms ranked by increasing learning epoch under SD (grey circles) and UD (white circles) regimes for both Italian (top) and German (bottom). Values are averaged across 5 map instances for each type.

4 Frequency by regularity interaction

Our simulations show that, in both languages, word forms in regular paradigms tend to be acquired earlier (significantly earlier learning epochs, p<.001), and regular paradigms are acquired more quickly (significantly shorter learning spans, i.e. lower number of epochs between the acquisition time of the first and the last member of a paradigm, p<.005) than irregular paradigms are. In German data, regular paradigms are less sensitive to token frequency effects than irregular paradigms are, as witnessed by the strong correlation (r=.95, p<.00001) between the time course of acquisition of regular paradigms in SD and UD regimes (Fig. 2, bottom left panel). Token frequency affects the acquisition of regular paradigms to a lesser extent than the acquisition of irregular ones, because regular stems can take advantage of their cumulative frequency across the whole paradigm. In fact, forms in regular paradigms exhibit a significant correlation between stem cumulative frequency and time of acquisition (r=-.40, p<.00001). Similarly, German irregular paradigms, which exhibit predictable stem allomorphy due to a limited number of alternants, also show a correlation between stem cumulative frequency and acquisition time (r=-.24, p<.00001).

Conversely, in Italian, where verb conjugation exhibits more extensive and less predictable patterns of allomorphy than in German (Pirrelli, 2000), acquisition of irregular paradigms does not appear to benefit from stem cumulative token frequencies (r=.01, p>.5). This suggests that extensive allomorphy in a paradigm tends to minimise the influence of cumulative frequency on its acquisition, and isolated forms can only take advantage of their own token frequency, while taking no advantage of the frequency boost provided by other cells of the same paradigm. As a result, Italian irregular paradigms are acquired significantly (p<.005) later than their German homologues.

Our data cannot be explained away as a simple by-product of word-frequency effects. Experiments provide, in fact, evidence of interactive processing effects in word acquisition, whereby morphological regularity modulates frequency. Data analysis shows that recurrent patterns appear to determine global co-organisation of stored word forms and distributed, overlapping memory traces, which ultimately favour generalisation in lexical acquisition. Forms containing recurrent patterns can take advantage of the memory traces shared with other related forms, namely forms sharing the same stem, and connections between the nodes making up their memory traces are strengthened since patterns are shown more often in training, similarly to high-frequency isolated words.

This is particularly true for regular, highly entropic paradigms, i.e. those regular paradigms whose members exhibit uniform frequency distributions, and for irregular highly systematic paradigms. Conversely, where memory traces overlap less systematically, this effect is considerably reduced, as witnessed by the difference in time of acquisition between regular and irregular paradigms, particularly in Italian conjugation.

In TSOMs, the effects are the dynamic result of two interacting dimensions of memory self-organisation: (i) the syntagmatic or linear dimension, which controls the level of predictability and entrenchment of memory traces in the lexicon through the probabilistic distribution of weights over inter-node Hebbian connections; and (ii) the paradigmatic or vertical dimension, which controls for the number of

similar, paradigmatically-related word forms that get co-activated when one member of a paradigm is input to the map (Pirrelli et al., 2014).

High-frequency words develop quick entrenchment of Hebbian connections, which eventually causes high levels of node activation in their memory traces and sparser co-activation of memory traces of other words. Strong connections and high activation levels mean high expectations for frequently activated memory traces, which are thus recalled more easily and are less confusable with other neighbouring words. Likewise, in regular and sub-regular paradigms, sharing memory traces can strengthen connections and raise node activation levels, since all related forms can take advantage of the memory traces shared with other members of the same paradigm.

Figure 3: Levels of activation strength (top) and filtering (bottom) for Italian (left) and German (right), for four regularity-by-frequency classes. Low frequency is set below the first quartile of frequency distributions in the two training sets, and high frequency above the third quartile.

This dynamic provides an algorithmic account of the observation that regularity favours acquisition of both high- and low-frequency words, as shown in Fig. 3, where we compare average levels of activation for four classes of training word forms: low-frequency regulars, low-frequency irregulars, high-frequency regulars and high-frequency irregulars.¹

Activation levels of low-frequency words appear to be significantly stronger within regular paradigms than within irregular paradigms (Fig. 3, top). Stronger activation levels make patterns less confusable and easier to access, as witnessed by the lower level of filtering² required for activation patterns to be recalled accurately (Fig. 3, bottom). We observe, in fact, a highly significant correlation (r=.49, p<.00001 for both datasets) between levels of filtering and words’ learning epochs.

High-frequency words predictably show higher activation levels than low-frequency words, with an interesting difference between regulars and irregulars in how frequency interacts with activation levels. High-frequency, highly irregular words (e.g. German ist or Italian è) are stored in isolation, with highly-activated memory nodes and no co-activation with other words. As a result, they require little filtering to be recalled and are acquired very quickly. High-frequency regular paradigms, although their average frequency in both the Italian and German training sets is nearly half the average frequency of high-frequency irregulars, show levels of activation comparable to those of high-frequency irregulars, due to the facilitatory effect of having more words that consistently activate the same pattern of nodes.

This evidence shows that regularity indeed modulates the interaction between frequency and activation strength, and it gives a strong indication that acquisition of regulars is typically paradigm-based, whereas acquisition of irregulars is mostly item-based.

Of course, as the notion of paradigm regularity is inherently graded, some verb systems show higher sensitivity to these effects than others. This is illustrated by German sub-regular paradigms, which present fewer and more predictable stem alternants than Italian sub-paradigms, and thus larger stem-sharing word families. Accordingly, TSOMs allocate comparatively higher levels of activation to low-frequency German sub-regulars and acquire them earlier than their Italian homologues.

The evidence reported here establishes, in our view, an important connection between aspects of morphological structure, frequency distributions of words in paradigms, and lexical acquisition in concurrent, competitive storage. Acquisition of redundant morphological patterns plays an increasingly important role in an emergent lexicon, shifting acquisitional strategies from rote memorisation (typical of irregular low-entropy paradigms) to dynamic memory-based generalisation.
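
The three acquisition measures used in Sections 3 and 4 (a word's time of acquisition, a paradigm's time of acquisition as the mean over its forms, and the learning span between its first and last acquired member) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the per-epoch boolean recall log, and the requirement that recall stays correct until the end of training are assumptions made here, not details given in the paper.

```python
from statistics import mean


def word_acquisition_epoch(recalled):
    """Return the first 1-based epoch from which recall stays correct, or None.

    `recalled` holds one boolean per training epoch (True = the word was
    recalled accurately at that epoch). Requiring recall to remain correct
    for all later epochs is an assumption of this sketch.
    """
    epoch = None
    for i, ok in enumerate(recalled, start=1):
        if ok and epoch is None:
            epoch = i          # candidate acquisition epoch
        elif not ok:
            epoch = None       # a later failure resets the candidate
    return epoch


def paradigm_acquisition(word_epochs):
    """Return (mean acquisition epoch, learning span) for one paradigm.

    `word_epochs` maps each inflected form to its acquisition epoch.
    The span is the number of epochs between the first and the last
    acquired member of the paradigm.
    """
    epochs = list(word_epochs.values())
    return mean(epochs), max(epochs) - min(epochs)


# Hypothetical recall log and paradigm, for illustration only.
print(word_acquisition_epoch([False, True, False, True, True]))
print(paradigm_acquisition({"form_a": 4, "form_b": 10, "form_c": 7}))
```

Under these assumptions, a paradigm whose forms are acquired close together has a short span even when its mean acquisition epoch is late, which is why the two measures are reported separately.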

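The strengthening of Hebbian connections between nodes activated at consecutive time ticks, and the resulting benefit for forms that share a stem, can be illustrated with a deliberately simplified Python sketch. It is not the TSOM learning rule: each symbol stands in for a single node, connection strength is a plain co-occurrence count, and the word forms are hypothetical English examples chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_transitions(tokens):
    """Count how often one symbol is immediately followed by another.

    Each symbol stands in for the node that responds most strongly to it.
    A real TSOM instead adjusts continuous weights between
    context-sensitive nodes on a topological map.
    """
    counts = defaultdict(Counter)
    for word in tokens:
        framed = "#" + word + "$"           # start and end markers, as in "#pop$"
        for prev, nxt in zip(framed, framed[1:]):
            counts[prev][nxt] += 1           # strengthen the prev -> nxt connection
    return counts


def expectation(counts, prev, nxt):
    """Relative strength of prev -> nxt among all connections leaving prev."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


# Three forms share the stem "walk"; "went" is an isolated form.
counts = train_transitions(["walk", "walks", "walked", "went"])
print(expectation(counts, "w", "a"))   # 0.75
print(expectation(counts, "w", "e"))   # 0.25
```

In this toy setting each stem-sharing form is seen only once, yet the stem-internal connection ends up three times stronger than the one used by the isolated form, which mirrors the cumulative-frequency advantage described for regular paradigms.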
¹ Frequency thresholds are set below the first quartile (low frequency) and above the third quartile (high frequency) in the frequency distribution of training word forms.

² Filtering an integrated activation pattern refers to the process of bringing down to zero the levels of activation of nodes that do not reach a set threshold.
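
The filtering operation defined in the footnote above amounts to a simple thresholding step. The following Python sketch shows it on a flat list of activation levels; the list representation, the function name and the example values are assumptions made for illustration, since the paper does not specify how activation patterns are stored.

```python
def filter_pattern(activations, threshold):
    """Set to zero every activation level that does not reach `threshold`."""
    return [a if a >= threshold else 0.0 for a in activations]


# Hypothetical integrated activation pattern over four nodes.
pattern = [0.9, 0.2, 0.5, 0.05]
print(filter_pattern(pattern, 0.5))   # [0.9, 0.0, 0.5, 0.0]
```

On this reading, a word that needs a higher threshold before it can be recalled accurately has more weakly differentiated nodes in its memory trace.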

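The four regularity-by-frequency classes compared in Fig. 3 can be derived from token frequencies with a quartile split, as in the Python sketch below. The word labels and frequencies are invented for illustration, forms falling between the two quartiles are simply left unclassified (an assumption; the paper does not say how they are treated), and the exact cut points depend on the quantile method, here the default of `statistics.quantiles`.

```python
from statistics import quantiles


def regularity_by_frequency(forms):
    """Assign each form to one of four regularity-by-frequency classes.

    `forms` maps a word form to a (token_frequency, is_regular) pair.
    Forms below the first quartile are low-frequency, forms above the
    third quartile are high-frequency, all others are skipped.
    """
    q1, _, q3 = quantiles([freq for freq, _ in forms.values()], n=4)
    classes = {}
    for word, (freq, is_regular) in forms.items():
        if freq < q1:
            band = "low-frequency"
        elif freq > q3:
            band = "high-frequency"
        else:
            continue                       # middle of the distribution
        kind = "regular" if is_regular else "irregular"
        classes[word] = band + " " + kind
    return classes


# Hypothetical training forms: (token frequency, regular?).
forms = {
    "w1": (1, True), "w2": (2, False), "w3": (5, True), "w4": (10, False),
    "w5": (50, True), "w6": (100, False), "w7": (400, True), "w8": (1000, False),
}
print(regularity_by_frequency(forms))
```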
References

Maria Alegre and Peter Gordon. 1999. Frequency effects and the representational status of regular inflections. Journal of Memory and Language, 40: 41-61.

Harald R. Baayen, Richard Piepenbrock and Leon Gulikers. 1995. The CELEX Lexical Database (CD-ROM). Philadelphia: Linguistic Data Consortium.

Harald R. Baayen. 2007. Storage and computation in the mental lexicon. In G. Jarema and G. Libben (eds.), The Mental Lexicon: Core Perspectives, 81-104. Amsterdam: Elsevier.

Lucia Colombo, Alessandro Laudanna, Maria De Martino and Cristina Brivio. 2004. Regularity and/or consistency in the production of the past participle? Brain and Language, 90: 128-142.

Ewa Dabrowska. 2004. Rules or schemata? Evidence from Polish. Language and cognitive processes, 19 (2): 225–271.

Ewa Dabrowska. 2005. Productivity and beyond: mastering the Polish genitive inflection. Journal of child language, 32: 191-205.

Nick C. Ellis and Richard Schmidt. 1998. Rules or Associations in the Acquisition of Morphology? The Frequency by Regularity Interaction in Human and PDP Learning of Morphosyntax. Language and Cognitive Processes, 13: 307-336.

Marcello Ferro, Giovanni Pezzulo and Vito Pirrelli. 2010. Morphology, Memory and the Mental Lexicon. In Pirrelli, V. (ed.), Lingue e Linguaggio, IX(2): 199-238.

Jan Koutnik. 2007. Inductive Modelling of Temporal Sequences by Means of Self-organization. In Proceeding of International Workshop on Inductive Modelling. Prague: 269-277.

Marie Labelle and Lori Morris. 2011. The acquisition of a verbal paradigm: Verb Morphology in French L1 children. Prépublication. (Montréal, Québec, Canada, UQAM, département de linguistique). http://www.archipel.uqam.ca/3992/1/Labelle-Morris_AcquisitionVerbalParadigm.pdf

Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci and Vito Pirrelli. 2014. The PAISÀ Corpus of Italian Web Texts. In F. Bildhauer and R. Schäfer (eds.) Proceedings of the 9th Web as Corpus Workshop (WaC-9): 36-43. Gothenburg.

Claudia Marzi, Marcello Ferro and Vito Pirrelli. 2012. Word alignment and paradigm induction. Lingue e Linguaggio, XI (2): 251-274.

Claudia Marzi, Marcello Ferro and Vito Pirrelli. 2014. Morphological structure through lexical parsability. Lingue e Linguaggio, XIII (2): 263-290.

Claudia Marzi. 2014. Models and dynamics of the morphological lexicon in mono- and bilingual acquisition. Unpublished PhD Dissertation. University of Pavia. www.comphyslab.it/redirect/?id=claudia.marzi.en_phd

Margherita Orsolini, Rachele Fanari and Hugo Bowles. 1998. Acquiring regular and irregular inflections in a language with verb classes. Language and cognitive processes, 13(4): 425-464.

Steven Pinker and Michael Ullman. 2002. The past and future of the past tense. Trends in Cognitive Science, 6: 456-463.

Vito Pirrelli, Claudia Marzi and Marcello Ferro. 2014. Two-dimensional Wordlikeness Effects in Lexical Organisation. In: Basili R., Lenci A., Magnini B. (eds.) Proceedings of the First Italian Conference on Computational Linguistic, December 9-11, 2014. 301-305, Pisa: Pisa University Press.

Vito Pirrelli, Marcello Ferro and Basilio Calderone. 2011. Learning paradigms in time and space. Computational evidence from Romance languages. In M. Maiden, J. C. Smith, M. Goldbach and M. O. Hinzelin (eds.), Morphological Autonomy: Perspectives from Romance Inflectional Morphology, 135-157. Oxford: Oxford University Press.

Vito Pirrelli. 2000. Paradigmi in morfologia. Un approccio interdisciplinare alla flessione verbale dell'italiano. Pisa-Roma: Istituti editoriali e poligrafici internazionali.

Kim Plunkett and Virginia Marchman. 1993. From rote learning to system building – acquiring verb morphology in children and connectionist nets. Cognition, 48: 21-69.

David E. Rumelhart and James L. McClelland. 1986. On learning the past tense of English verbs. In McClelland, J.L. and Rumelhart, D.E. (eds.) Parallel distributed processing, 217-270. Cambridge: MIT Press.