=Paper= {{Paper |id=Vol-1347/paper08 |storemode=property |title=Lexical emergentism and the "frequency-by-regularity" interaction |pdfUrl=https://ceur-ws.org/Vol-1347/paper08.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/MarziFP15 }} ==Lexical emergentism and the "frequency-by-regularity" interaction== https://ceur-ws.org/Vol-1347/paper08.pdf
Lexical emergentism and the “frequency-by-regularity” interaction

Claudia Marzi, Marcello Ferro, Vito Pirrelli
Institute for Computational Linguistics - National Research Council - Pisa
{claudia.marzi,marcello.ferro,vito.pirrelli}@ilc.cnr.it

Copyright © by the paper’s authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

Abstract

In spite of considerable converging evidence of the role of inflectional paradigms in word acquisition and processing, little effort has been put so far into providing detailed, algorithmic models of the interaction between lexical token frequency, paradigm frequency and paradigm regularity. We propose a neuro-computational account of this interaction, and discuss some theoretical implications of preliminary experimental results.

1    Introduction

Over the last fifteen years, growing evidence has accrued of the role of morphological paradigms in the developmental course of word acquisition. Children have been shown to be sensitive to sub-regularities holding among paradigm cells (see, among others, Orsolini et al., 1998; Colombo et al., 2004 on Italian; Dabrowska, 2004, 2005 on Polish; and Labelle and Morris, 2011 on French). In line with this evidence, and contrary to both rule-based (e.g. Pinker and Ullman, 2002; Albright, 2002) and connectionist approaches to word acquisition (Rumelhart and McClelland, 1986), no unique paradigm cell can be identified as the base source of all inflected forms produced by the speaker. Rather, the structure of the entire paradigm is understood to play a fundamental role in both word acquisition and processing.

Such evidence supports a view of the mental lexicon as an emergent integrative system, whereby words are concurrently, redundantly and competitively stored (Alegre and Gordon, 1999; Baayen et al., 2007). The view assumes that all word forms are memorised in the lexicon, thus making no distinction between regular and irregular inflected forms, or between uniquely stored bases and all other non-base forms produced by the speaker on demand (see Baayen, 2007; Marzi, 2014; for a recent overview). In addition, to capture the fact that words encountered frequently exhibit different lexical properties from words encountered relatively infrequently, any model of lexical access must assume that accessing a word in some way affects the access representation of that word (e.g. Forster, 1976; Marslen-Wilson, 1993; Sandra, 1994).

In spite of such a wealth of converging evidence, however, little effort has been put so far into providing detailed, algorithmic models of the interaction between word frequency, paradigm frequency, paradigm regularity and lexical familiarity in word acquisition and processing. We offer here such an algorithmic account, and discuss some theoretical implications on the basis of computational simulations.

2    The computational model

In the present contribution, we use Temporal Self-organising Maps (TSOMs) to simulate dynamic effects of lexical storage, organisation and competition.

Figure 1. An integrated activation pattern for the input string “#pop$”. Note that two distinct, but topologically neighbouring nodes respond to the two p’s in pop, bearing witness to the process of selective sensitivity to time-bound instances of the same symbol type. For simplicity, only the nodes that are most highly activated by each input symbol are shaded and tagged with that symbol.

TSOMs, a variant of Kohonen’s classical SOMs (Kohonen, 2001), are dynamic memories that are trained to store and classify time-series of symbols through patterns of activation of fully interconnected nodes (Koutnik, 2007; Ferro et al., 2010; Pirrelli et al., 2011; Marzi et al., 2012). Map nodes mimic neural clusters, with inter-node connections representing neuron synapses whose weights determine the amount of influence that the activation of one node has on another node (Fig. 1). Each map node receives input

connections from an input layer where individual symbols making up a word are presented one at a time, in their order of appearance. Input connections thus convey information about the current input stimulus to map nodes. Hebbian connections, on the other hand, are strengthened each time two nodes are activated at consecutive time ticks, conveying the probabilistic expectation that one node will be activated soon after another node is activated.

When a symbol is shown on the input layer at a certain time tick, all map nodes are fired synchronously, their overall pattern of activation representing the processing response of a TSOM to the symbol at that time tick. Due to principles of topological organisation of the map’s responses, similar input stimuli (i.e. two instances of the same symbol in different contexts) tend to be associated with largely overlapping memory traces (e.g. the two p nodes activated by pop in Fig. 1). During training, nodes get gradually specialised to respond most strongly to specific time-bound instantiations of symbols, while remaining relatively inactive in the presence of other stimuli. A recurrent activation pattern associated with an input symbol occurring in a specific context can thus be seen as the map’s memory trace for that symbol in that context.

An input word is administered to a TSOM as a time series of symbols, i.e. a sequence of letters or sounds presented on the input layer one at a time. The map’s response to a word stimulus is the overall activation pattern obtained through integration of the activation patterns triggered by the individual symbols making up the word (see Fig. 1 for a simplified example with the word pop). Accordingly, if two input strings have some symbols in common (e.g. pop and cop, write and written), they will tend to activate largely overlapping patterns of strongly responsive nodes. As in the case of individual symbols, the integrated activation pattern for an input word is, at the same time, the systematic processing response of the map to an input stimulus, and the word’s memorised representation (or memory trace) in the map.

To investigate issues of “frequency-by-regularity” interaction (Ellis and Schmidt, 1998), we compared two sets of parallel experiments carried out on German verb paradigms (Marzi et al., 2014) and Italian verb paradigms. By keeping constant some input conditions, such as selection of paradigm cells and degrees of morphological redundancy within training paradigms, while varying others, such as the frequency distribution of paradigm members, we can investigate the relative contribution of input factors to the timing and pace of lexical acquisition and suggest an explanatory account of their interaction.

3    Experimental evidence

Fifty German and fifty Italian verb (sub)paradigms were selected among the most highly ranked paradigms by cumulative frequency in a reference corpus (CELEX Lexical database for German, Baayen et al., 1995; Paisà Corpus for Italian, Lyding et al., 2014). For each paradigm, an identical set of 15 cells was used for training, for an overall number of 750 inflected forms for each language. Each data set was administered to the map for 100 epochs under two different training regimes: a uniform distribution (UD: 5 tokens per word), and a distribution that is a function of real word frequencies in the reference corpus (SD: token counts range from 1 to 1000). By varying frequency and comparing the inflectional complexity of training data across the two experiments, we expected to gain some insights into the interplay between morphological regularity (defined by levels of predictability in stem and ending allomorphy of training data in the two languages) and word frequency in word acquisition. After training, we monitored the behaviour of the four resulting TSOMs (namely UD Italian, SD Italian, UD German and SD German) by tracking the time of acquisition of individual words, the time of acquisition of entire paradigms, and their acquisitional time span. For our present purposes, we define the time of acquisition of a single word as the training epoch from which a TSOM can accurately recall the word in question from its memory trace. Recall is a difficult task: it requires that the map has developed a clear notion of how to unfold a synchronous activation pattern (the word’s memory trace) into a sequence of nodes representing the correct letters making up the word, in the appropriate order. Likewise, for each paradigm, its time of acquisition by a map is the mean acquisition epoch of all forms belonging to the paradigm.

As a general trend, TSOMs acquire word forms by token frequency, with higher-frequency words being successfully recalled at earlier learning epochs. However, when it comes to the actual timing of paradigm acquisition, things get considerably more complex, with the notion of morphological regularity interacting non-trivially with token frequency distributions. In fact, in both




German and Italian, the vast majority of paradigms are acquired earlier (p<.005) in a UD regime than in an SD regime (Fig. 2).

Figure 2: Time course of regular (left) and irregular (right) paradigms ranked by increasing learning epoch under SD (grey circles) and UD (white circles) regimes for both Italian (top) and German (bottom). Values are averaged across 5 map instances for each type.

4    Frequency by regularity interaction

Our simulations show that, in both languages, word forms in regular paradigms tend to be acquired earlier (significantly earlier learning epochs, p<.001), and regular paradigms are acquired more quickly (significantly shorter learning spans, i.e. a lower number of epochs between the acquisition time of the first and the last member of a paradigm, p<.005) than irregular paradigms are. In the German data, regular paradigms are less sensitive to token frequency effects than irregular paradigms are, as witnessed by the strong correlation (r=.95, p<.00001) between the time course of acquisition of regular paradigms in SD and UD regimes (Fig. 2, bottom left panel). Token frequency affects the acquisition of regular paradigms to a lesser extent than the acquisition of irregular ones, because regular stems can take advantage of their cumulative frequency across the whole paradigm. In fact, forms in regular paradigms exhibit a significant correlation between stem cumulative frequency and time of acquisition (r=-.40, p<.00001). Similarly, German irregular paradigms, which exhibit a predictable stem allomorphy due to a limited number of alternants, also show a correlation between stem cumulative frequency and acquisition time (r=-.24, p<.00001).

Conversely, in Italian, where verb conjugation exhibits more extensive and less predictable patterns of allomorphy than in German (Pirrelli, 2000), acquisition of irregular paradigms does not appear to benefit from stem cumulative token frequencies (r=.01, p>.5). This suggests that extensive allomorphy in a paradigm tends to minimise the influence of cumulative frequency on its acquisition: isolated forms can only take advantage of their own token frequency, while taking no advantage of the frequency boost provided by other cells of the same paradigm. As a result, Italian irregular paradigms are acquired significantly (p<.005) later than their German homologues.

Our data cannot be explained away as a simple by-product of word-frequency effects. Experiments provide, in fact, evidence of interactive processing effects in word acquisition, whereby morphological regularity modulates frequency. Data analysis shows that recurrent patterns appear to determine global co-organisation of stored word forms and distributed, overlapping memory traces, which ultimately favour generalisation in lexical acquisition. Forms containing recurrent patterns can take advantage of the memory traces shared with other related forms, namely forms sharing the same stem. Connections between the nodes making up their memory traces are strengthened because the shared patterns are shown more often in training, much as happens with high-frequency isolated words.

This is particularly true for regular, highly entropic paradigms, i.e. those regular paradigms whose members exhibit uniform frequency distributions, and for irregular, highly systematic paradigms. Conversely, where memory traces overlap less systematically, this effect is considerably reduced, as witnessed by the difference in time of acquisition between regular and irregular paradigms, particularly in Italian conjugation.

In TSOMs, the effects are the dynamic result of two interacting dimensions of memory self-organisation: (i) the syntagmatic or linear dimension, which controls the level of predictability and entrenchment of memory traces in the lexicon through the probabilistic distribution of weights over inter-node Hebbian connections; and (ii) the paradigmatic or vertical dimension, which controls for the number of




similar, paradigmatically-related word forms that get co-activated when one member of a paradigm is input to the map (Pirrelli et al., 2014).

High-frequency words develop quick entrenchment of Hebbian connections, which eventually cause high levels of node activation in their memory traces and sparser co-activation of memory traces of other words. Strong connections and high activation levels mean high expectations for frequently activated memory traces, which are thus recalled more easily and are less confusable with other neighbouring words. Likewise, in regular and sub-regular paradigms, sharing memory traces can strengthen connections and raise node activation levels, since all related forms can take advantage of the memory traces shared with other members of the same paradigm.

Figure 3: Levels of activation strength (top) and filtering (bottom) for Italian (left) and German (right), for four regularity-by-frequency classes. Low frequency is set below the first quartile of frequency distributions in the two training sets, and high frequency above the third quartile.

This dynamic provides an algorithmic account of the observation that regularity favours acquisition of both high- and low-frequency words, as shown in Fig. 3, where we compare average levels of activation for four classes of training word forms: low-frequency regulars, low-frequency irregulars, high-frequency regulars and high-frequency irregulars.¹

Activation levels of low-frequency words appear to be significantly stronger within regular paradigms than within irregular paradigms (Fig. 3, top). Stronger activation levels make patterns less confusable and easier to access, as witnessed by the lower level of filtering² required for activation patterns to be recalled accurately (Fig. 3, bottom). We observe, in fact, a highly significant correlation (r=.49, p<.00001 for both datasets) between levels of filtering and words’ learning epochs.

High-frequency words predictably show higher activation levels than low-frequency words, with an interesting difference in how frequency interacts with activation levels in regulars and irregulars. High-frequency, highly irregular words (e.g. German ist or Italian è) are stored in isolation, with highly-activated memory nodes and no co-activation with other words. As a result, they require little filtering to be recalled and are acquired very quickly. High-frequency regular paradigms, although their average frequency in both the Italian and German training sets is nearly half the average frequency of high-frequency irregulars, show levels of activation comparable to those of high-frequency irregulars, due to the facilitatory effect of having more words that consistently activate the same pattern of nodes.

This evidence shows that regularity indeed modulates the interaction between frequency and activation strength, and it gives a strong indication that acquisition of regulars is typically paradigm-based, whereas acquisition of irregulars is mostly item-based.

Of course, since the notion of paradigm regularity is inherently graded, some verb systems show higher sensitivity to these effects than others. This is illustrated by German sub-regular paradigms, which present fewer and more predictable stem alternants than Italian sub-paradigms, and thus larger stem-sharing word families. Accordingly, TSOMs allocate comparatively higher levels of activation to low-frequency German sub-regulars and acquire them earlier than their Italian homologues.

The evidence reported here establishes, in our view, an important connection between aspects of morphological structure, frequency distributions of words in paradigms, and lexical acquisition in concurrent, competitive storage. Acquisition of redundant morphological patterns plays an increasingly important role in an emergent lexicon, shifting acquisitional strategies from rote memorisation (typical of irregular low-entropy paradigms) to dynamic memory-based generalisation.
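
As an illustration of the acquisition measures used in Sections 3 and 4 (time of acquisition of a word, time of acquisition of a paradigm as the mean over its forms, and learning span as the number of epochs between the first and the last acquired member), a minimal sketch might look as follows. The function names and the recall records are hypothetical and invented for illustration. The requirement that recall stays correct after the acquisition epoch is an assumption of this sketch, not something the experiments are stated to enforce.

```python
from statistics import mean


def word_acquisition_epoch(recalled):
    """Return the first epoch from which recall stays correct, or None.

    recalled[i] is True if the word was recalled correctly at epoch i + 1.
    Requiring recall to stay correct afterwards is an assumption of this sketch.
    """
    epoch = None
    for i, ok in enumerate(recalled, start=1):
        if not ok:
            epoch = None          # a later failure resets the candidate epoch
        elif epoch is None:
            epoch = i             # first epoch of the current run of successes
    return epoch


def paradigm_acquisition_time(word_epochs):
    """Mean acquisition epoch of all forms belonging to the paradigm."""
    return mean(word_epochs.values())


def learning_span(word_epochs):
    """Epochs between the first and the last acquired member of the paradigm."""
    return max(word_epochs.values()) - min(word_epochs.values())


# Hypothetical recall records for three forms of one paradigm (invented data).
recall_log = {
    "form_a": [False, True, True, True, True, True],
    "form_b": [False, False, True, False, True, True],
    "form_c": [False, False, False, False, False, True],
}
epochs = {form: word_acquisition_epoch(log) for form, log in recall_log.items()}
print(epochs)
print(paradigm_acquisition_time(epochs), learning_span(epochs))
```

The paradigm-level functions assume that every form has been acquired; a form that is never recalled (epoch `None`) would have to be handled separately.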

¹ Frequency thresholds are set below the first quartile (low frequency) and above the third quartile (high frequency) in the frequency distribution of training word forms.

² Filtering an integrated activation pattern refers to the process of bringing down to zero the levels of activation of nodes that do not reach a set threshold.
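
The two footnotes can be made concrete with a small sketch. It assumes that an activation pattern is a plain list of node activation levels and that quartiles are computed with Python's standard `statistics.quantiles`. The function names, the example values and the residual "mid" class are hypothetical and serve only as illustration.

```python
from statistics import quantiles


def filter_pattern(activations, threshold):
    """Zero every node activation that does not reach the threshold."""
    return [a if a >= threshold else 0.0 for a in activations]


def frequency_class(freq, all_freqs):
    """Label a token frequency as 'low', 'high' or 'mid' using quartiles.

    'low' is below the first quartile and 'high' above the third quartile;
    'mid' is a label introduced here for everything in between.
    """
    q1, _, q3 = quantiles(all_freqs, n=4)
    if freq < q1:
        return "low"
    if freq > q3:
        return "high"
    return "mid"


# Invented activation levels and token frequencies, for illustration only.
pattern = [0.9, 0.2, 0.5, 0.05]
print(filter_pattern(pattern, threshold=0.5))

training_freqs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 1000]
print([frequency_class(f, training_freqs) for f in (1, 5, 1000)])
```

Raising the threshold in `filter_pattern` leaves fewer active nodes, which is one way to read the "level of filtering" needed before a pattern can be recalled accurately.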




References

Maria Alegre and Peter Gordon. 1999. Frequency effects and the representational status of regular inflections. Journal of Memory and Language, 40: 41-61.

Harald R. Baayen, Richard Piepenbrock and Leon Gulikers. 1995. The CELEX Lexical Database (CD-ROM). Philadelphia: Linguistic Data Consortium.

Harald R. Baayen. 2007. Storage and computation in the mental lexicon. In G. Jarema and G. Libben (eds.), The Mental Lexicon: Core Perspectives, 81-104. Amsterdam: Elsevier.

Lucia Colombo, Alessandro Laudanna, Maria De Martino and Cristina Brivio. 2004. Regularity and/or consistency in the production of the past participle? Brain and Language, 90: 128-142.

Ewa Dabrowska. 2004. Rules or schemata? Evidence from Polish. Language and Cognitive Processes, 19 (2): 225-271.

Ewa Dabrowska. 2005. Productivity and beyond: mastering the Polish genitive inflection. Journal of Child Language, 32: 191-205.

Nick C. Ellis and Richard Schmidt. 1998. Rules or Associations in the Acquisition of Morphology? The Frequency by Regularity Interaction in Human and PDP Learning of Morphosyntax. Language and Cognitive Processes, 13: 307-336.

Marcello Ferro, Giovanni Pezzulo and Vito Pirrelli. 2010. Morphology, Memory and the Mental Lexicon. In Pirrelli, V. (ed.), Lingue e Linguaggio, IX (2): 199-238.

Jan Koutnik. 2007. Inductive Modelling of Temporal Sequences by Means of Self-organization. In Proceedings of the International Workshop on Inductive Modelling. Prague: 269-277.

Marie Labelle and Lori Morris. 2011. The acquisition of a verbal paradigm: Verb Morphology in French L1 children. Prépublication. (Montréal, Québec, Canada, UQAM, département de linguistique). http://www.archipel.uqam.ca/3992/1/Labelle-Morris_AcquisitionVerbalParadigm.pdf

Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci and Vito Pirrelli. 2014. The PAISÀ Corpus of Italian Web Texts. In F. Bildhauer and R. Schäfer (eds.) Proceedings of the 9th Web as Corpus Workshop (WaC-9): 36-43. Gothenburg.

Claudia Marzi, Marcello Ferro and Vito Pirrelli. 2012. Word alignment and paradigm induction. Lingue e Linguaggio, XI (2): 251-274.

Claudia Marzi, Marcello Ferro and Vito Pirrelli. 2014. Morphological structure through lexical parsability. Lingue e Linguaggio, XIII (2): 263-290.

Claudia Marzi. 2014. Models and dynamics of the morphological lexicon in mono- and bilingual acquisition. Unpublished PhD Dissertation. University of Pavia. www.comphyslab.it/redirect/?id=claudia.marzi.en_phd

Margherita Orsolini, Rachele Fanari and Hugo Bowles. 1998. Acquiring regular and irregular inflections in a language with verb classes. Language and Cognitive Processes, 13 (4): 425-464.

Steven Pinker and Michael Ullman. 2002. The past and future of the past tense. Trends in Cognitive Sciences, 6: 456-463.

Vito Pirrelli, Claudia Marzi and Marcello Ferro. 2014. Two-dimensional Wordlikeness Effects in Lexical Organisation. In: Basili R., Lenci A., Magnini B. (eds.) Proceedings of the First Italian Conference on Computational Linguistics, December 9-11, 2014. 301-305, Pisa: Pisa University Press.

Vito Pirrelli, Marcello Ferro and Basilio Calderone. 2011. Learning paradigms in time and space. Computational evidence from Romance languages. In M. Maiden, J. C. Smith, M. Goldbach and M. O. Hinzelin (eds.), Morphological Autonomy: Perspectives from Romance Inflectional Morphology, 135-157. Oxford: Oxford University Press.

Vito Pirrelli. 2000. Paradigmi in morfologia. Un approccio interdisciplinare alla flessione verbale dell'italiano. Pisa-Roma: Istituti editoriali e poligrafici internazionali.

Kim Plunkett and Virginia Marchman. 1993. From rote learning to system building – acquiring verb morphology in children and connectionist nets. Cognition, 48: 21-69.

David E. Rumelhart and James L. McClelland. 1986. On learning the past tense of English verbs. In McClelland, J.L. and Rumelhart, D.E. (eds.) Parallel distributed processing, 217-270. Cambridge: MIT Press.



