=Paper= {{Paper |id=Vol-1749/paper25 |storemode=property |title=Written Word Production and Lexical Self–Organisation: Evidence from English (Pseudo)Compounds |pdfUrl=https://ceur-ws.org/Vol-1749/paper25.pdf |volume=Vol-1749 |authors=Marcello Ferro,Franco Alberto Cardillo,Vito Pirrelli,Christina L. Gagné,Thomas L. Spalding |dblpUrl=https://dblp.org/rec/conf/clic-it/FerroCPGS16 }} ==Written Word Production and Lexical Self–Organisation: Evidence from English (Pseudo)Compounds== https://ceur-ws.org/Vol-1749/paper25.pdf
Written word production and lexical self-organisation: evidence from
                  English (pseudo)compounds
      Marcello Ferro                   Franco Alberto Cardillo                   Vito Pirrelli
    ILC-CNR Pisa, Italy                  ILC-CNR Pisa, Italy                  ILC-CNR Pisa, Italy
 marcello.ferro@ilc.cnr.it       francoalberto.cardillo@ilc.cnr.it         vito.pirrelli@ilc.cnr.it
               Christina L. Gagné                                  Thomas L. Spalding
Dept. of Psychology, University of Alberta, Canada   Dept. of Psychology, University of Alberta, Canada
              cgagne@ualberta.ca                                   spalding@ualberta.ca


Abstract

Elevation in typing latency for the initial letter of the second constituent of an English compound, relative to the latency for the final letter of the first constituent of the same compound, provides evidence that implementation of a motor plan for written compound production involves smaller constituents, in both semantically transparent and semantically opaque compounds. We investigate here the implications of this evidence for algorithmic models of lexical organisation, to show that effects of differential perception of the internal structure of compounds and pseudo-compounds can also be simulated as peripheral stages of lexical access by a self-organising connectionist architecture, even in the absence of morpho-semantic information. This complementary evidence supports a maximization-of-opportunity approach to lexical modelling, accounting for the integration of effects of pre-lexical and lexical access.

The slowdown in typing time for the first character of the second constituent of an English compound, relative to the time for the last character of the first constituent, shows that the implementation of the motor program for writing a compound is influenced by the constituents of the compound itself, whether they are semantically transparent or opaque. The present contribution offers a computational model of this evidence and assesses its impact on the organisation of the mental lexicon: the perception of the morpheme boundary between the two constituents is analysed as the result of the dynamic interaction between pre- and post-lexical access processes.

1    The evidence

A key question concerning the representation and processing of compound words has focused on whether (and, if so, how) morphological structure plays a role. The bulk of the research on this issue has come from recognition or comprehension tasks such as lexical decision or reading. However, written production provides a useful counterpart and allows researchers to examine whether morphemes are used even after a word has been accessed. One advantage of a typing task (in which the time to type each letter of a word is recorded) is that researchers can examine differences in processing difficulty at various points in the word. Previous research found an elevation in typing latency for the initial letter of the second constituent relative to the latency for the final letter of the first constituent for English (Gagné & Spalding 2014; Libben et al. 2012; Libben & Weber 2014) and German compounds (Sahel et al. 2008; Will et al. 2006). This elevation in typing latency at the morpheme boundary suggests that the system plans the output morpheme by morpheme, rather than as a whole unit, and that morphological programming is not complete when the motor system begins the output of the word (Kandel et al. 2008).

Gagné and Spalding (2016) examined the role of morphemic structure and semantic transparency on typing latency. The stimuli consisted of 200 compounds, 50 pseudo-compounds, and 250 monomorphemic words matched pairwise with the compounds and pseudo-compounds in the number of syllables and letters. The pseudo-compounds contain two words that do not function as morphemes (e.g., carpet contains car and pet). The compounds varied in whether the first and second constituents were semantically transparent. The items were displayed individually using a progressive demasking procedure and participants typed the word as the computer recorded the time required to type each letter.

The time to initiate the first letter was equivalent for monomorphemic and compound words. Typing times got faster across the word for both word types, but the rate of change was faster for monomorphemic words than for compound words. This difference was not observed when comparing monomorphemic words and pseudo-compounds.

For compounds, the rate of speed-up was slower when the first constituent was transparent than when it was opaque, but was unaffected by the transparency of the second constituent. The elevation in typing latency at the morpheme boundary was larger when the first constituent was transparent than when it was opaque, but was unaffected by the transparency of the second constituent. This difference is due to the final letter of the first constituent requiring less time to type when the first constituent was transparent than when it was opaque.

The data for the pseudo-compounds indicated that embedded morphemes influence production, even when they do not function as morphemes. Typing latency increased one letter prior to the end of the first constituent of a pseudo-compound and remained elevated through the boundary (e.g., both r and c in scarcity were elevated relative to the a).

1.1   Implications for lexical architectures

The reported evidence clearly indicates that morphemic structure is involved in written word production. The production of compounds differs from that of monomorphemic words and the semantic transparency of the two constituents leads to different effects. Furthermore, embedded pseudo-morphemes appear to influence the production of pseudo-compounds, but not in the same way that the embedded morphemes affect the production of compounds.

This appears to lend only partial support to models of lexical architecture where both compounds and their constituents are represented and processed as independent access units (Figure 1). In panel A, following Taft & Forster (1975), access and output of compounds are mediated by their constituents (Cs), but extra procedures would be needed to account for the role of semantic transparency in modulating the size of elevation in typing latency at the morpheme boundary. A supralexical account (panel B: Giraudo & Grainger 2000, Grainger et al. 1991), where constituents are activated upon compositional interpretation of compounds, cannot capture the persistence of typing effects in semantically opaque compounds (and, to an extent, in pseudo-compounds). Race models (panel C: Schreuder & Baayen 1995) posit parallel pathways for compound processing (both holistic and compositional), depending on variables such as whole word vs. constituent frequency, but it is not clear how they can account for effects of interaction between the two paths. Connectionist models (panel D: Rumelhart & McClelland 1986, Plaut & Gonnerman 2000), on the other hand, tend to dispense with specialized representational levels and access procedures, and make room for distributed effects of sublexical co-activation through overlaying patterns of processing units. A defining feature of these models is that they blur the traditional distinction between representations and processing units. We suggest that blurring this distinction can go a long way in addressing some of the issues that appear to elude models A, B and C.

Figure 1 – Four architectures of form-meaning mapping in the mental lexicon: C1+C2 designates two-word compounds and Cs mono-morphemic constituents (adapted from Diependaele et al. 2012)

Temporal Self-Organising Maps (TSOMs: Ferro et al. 2011; Marzi et al. 2014; Pirrelli et al. 2015) are a time-sensitive variant of Kohonen’s SOMs (Kohonen, 2002), where words are stored through routinized, time-bound patterns of repeatedly successful processing units. Since all input words are stored concurrently on the same layer of fully connected nodes, TSOMs account for effects of co-activation of competing representations in terms of a continuous function of distributional regularities in the input data. In what follows, starting from Gagné & Spalding’s evidence, we will focus on peripheral stages of lexical access/output, to verify if mechanisms of parallel, distributed pattern activation can account for differential processing effects between compounds and pseudo-compounds even in the
absence of morpho-semantic information. Although computational testing is carried out on TSOMs only, our discussion and concluding remarks address issues that go beyond a specific computational framework.

2     TSOMs

A TSOM consists of a grid of memory nodes with two layers of connectivity. The first layer (or I-layer) fully connects each node to the input vector, where symbols are sampled at discrete time ticks as patterns of activation ranging in the [0, 1] interval. Weights on the I-layer are adjusted in training for individual nodes to develop specialised sensitivity to particular input symbols. Each node is also connected to all other nodes through a layer of re-entrant connections (or T-layer), whose weight strength determines the amount of influence that activation of one node has on other nodes at a one-tick delay.

When an input symbol is presented at time t, the level of activation $y_i(t)$ of node i is a function of: (a) the node’s sensitivity to the current input symbol ($y_{I\_layer,i}(t)$), and (b) the re-entrant support the node receives from the map activation state at t-1 ($y_{T\_layer,i}(t) = f(y_j(t-1))$, where f is a linear function and j ranges over all map nodes). More formally:

    $$y_i(t) = \alpha \cdot y_{I\_layer,i}(t) + (1 - \alpha) \cdot y_{T\_layer,i}(t)$$

The node responding most strongly to the input symbol S at time tick t is called Best Matching Unit (hereafter BMU(S, t) or BMU(t) for short).

The map’s response to a sequence of input symbols like carpet is a chain of consecutively firing BMUs, each responding to a letter in carpet. During training, connection weights between consecutive BMUs are adjusted to the frequency distribution of input symbols in the training set, according to Hebbian principles of correlative learning. Given the bigram ab, the connection strength between BMU(a, t-1) and BMU(b, t) increases if a often precedes b (entrenchment) and decreases if b is often preceded by a symbol other than a (competition) (Figure 2, left). Combination of entrenchment and competition yields selective specialisation of chains of BMUs (Figure 2, right). If the same input symbol follows different contexts, it will tend to be responded to by more BMUs, one for each context. The stronger the probabilistic support that the input symbol receives from its preceding context, the more likely the recruitment of a dedicated BMU, and the stronger its re-entrant connection. As a result of this dynamic, high-frequency words recruit specialised node chains, whereas low-frequency words are responded to by weaker, “blended” node chains.

Figure 2 - Left: operation of Hebbian rules on potentiated (‘+’) and inhibited (‘-’) connections. Right: forward one-tick-delay connections leaving ‘A’ at time t-1. Larger nodes represent BMUs. Shades of grey indicate levels of node activation.

2.1    The Experiment

The 200 compounds and 50 pseudo-compounds used by Gagné & Spalding were used to train a 40x40 node TSOM for 100 learning epochs. Besides compounds and pseudo-compounds, the training set included 500 (pseudo)constituents as individual words (e.g. car and wash in carwash, car and pet in carpet), for a total of 750 items. At each training epoch, monomorphemic words were shown 10 times as often as compounds. We ran 5 repetitions of the experiment, and results were analysed using linear mixed effects models (LME), with experiment repetitions and training items as random variables.

To analyse differential processing effects for pseudo-compounds and compounds, we focused on two types of evidence: (i) per-letter performance of a trained TSOM in incrementally anticipating compounds and pseudo-compounds; (ii) structural connectivity of BMUs responding to letter bigrams at the C1-C2 boundary.

To anticipate a progressively presented input word, a TSOM propagates the activation of the current BMU(t) through its forward temporal connections, and outputs, at each time tick, the symbol $S_{BMU(t+1)}$ encoded on the I_layer of the most strongly (pre)activated node:

    $$BMU(t+1) = \operatorname{argmax}_{i=1,\dots,N} \; m_{h,i}, \qquad h = BMU(t)$$

where $m_{h,i}$ is the weight value on the forward temporal connection from node h to node i. Each correctly predicted symbol in the input word is assigned the prediction score of the preceding symbol incremented by 1. Otherwise, the symbol receives a 0-point score.

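The activation blend, the argmax anticipation step and the scoring rule described in Sections 2 and 2.1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation. It assumes a trained map can be summarised by a dictionary of forward temporal weights $m_{h,i}$ and a dictionary giving the symbol encoded on each node's I-layer. The node ids, the weight values, the function names (`blend_activation`, `predict_next`, `anticipation_scores`) and the choice of scoring the first letter as 0 are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy map (values invented for illustration, not from the paper).
# LABEL[i]  : symbol encoded on the I-layer of node i.
# FORWARD[h]: forward temporal weights m[h][i] leaving node h.
LABEL: Dict[int, str] = {0: "c", 1: "a", 2: "r", 3: "w", 4: "a", 5: "s", 6: "h", 7: "p"}
FORWARD: Dict[int, Dict[int, float]] = {
    0: {1: 0.9},
    1: {2: 0.8},
    2: {3: 0.3, 7: 0.6},  # after 'r', 'p' (as in carpet) is expected more than 'w'
    3: {4: 0.7},
    4: {5: 0.8},
    5: {6: 0.9},
}


def blend_activation(y_input: float, y_temporal: float, alpha: float) -> float:
    """y_i(t) = alpha * y_I_layer,i(t) + (1 - alpha) * y_T_layer,i(t)."""
    return alpha * y_input + (1.0 - alpha) * y_temporal


def predict_next(forward: Dict[int, Dict[int, float]], h: int) -> int:
    """BMU(t+1) = argmax_i m[h][i], with h = BMU(t)."""
    targets = forward[h]
    return max(targets, key=targets.get)


def anticipation_scores(word: str, bmu_chain: List[int],
                        forward: Dict[int, Dict[int, float]],
                        label: Dict[int, str]) -> List[int]:
    """Per-letter score: previous score + 1 if the letter was predicted, else 0."""
    scores: List[int] = []
    previous = 0
    for t, symbol in enumerate(word):
        if t == 0:
            score = 0  # assumption: the first letter has no preceding BMU to predict it
        else:
            predicted = label[predict_next(forward, bmu_chain[t - 1])]
            score = previous + 1 if predicted == symbol else 0
        scores.append(score)
        previous = score
    return scores


# BMU chain for "carwash" in the toy map: one node per letter, in order.
print(anticipation_scores("carwash", list(range(7)), FORWARD, LABEL))
# [0, 1, 2, 0, 1, 2, 3]
```

In this toy map the score resets to 0 at the first letter of wash, because the strongest forward connection leaving the node for r points to a competing continuation. This is the kind of drop in anticipation at the C1-C2 boundary that the analysis looks for.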
Figure 3 – Marginal plots of interaction effects between compounds vs. pseudo-compounds and letter distance to morpheme boundary in an LME model fitting anticipation of up-coming BMUs by a TSOM. Negative and positive x values indicate letter positions located, respectively, in the first and second constituent. Anticipation is plotted across whole (pseudo)compounds (top panel), and by individual constituents (bottom panel).

Figure 3 (top panel) illustrates the rate of letter anticipation across the word for both compounds and pseudo-compounds, plotted by distance to the morpheme boundary. The steeper rate for pseudo-compounds than for compounds shows that pseudo-compounds are easier to predict/anticipate than compounds. We take this evidence to be in line with evidence of a faster speedup rate in the typing of monomorphemic vs. compound words. A closer look at anticipation rates for individual constituents (Figure 3 bottom panel) shows a drop of anticipation at the C1-C2 boundary (more prominent for compounds than pseudo-compounds) with a steeper increase in C1 and C2 for pseudo-compounds, which happen to be, on average, shorter than C1 and C2 in real compounds.

To look for structural correlates of anticipation rates in the map, we conducted, for each item, a letter-by-letter analysis of values of pointwise entropy (PWH) for the connections between consecutive BMUs, namely h=BMU(t-1) and i=BMU(t):

    $$PWH(m_{h,i}) = -\log \frac{m_{h,i}}{\sum_{k} m_{h,k}}$$

The value of PWH for the connection between end-C1 and start-C2 (x = 0) has a local peak in compounds only (Figure 4). Since PWH provides a measure of how unexpected the activation of BMU(t) is, this structural evidence can account for a delay in processing and a drop in anticipation at the morpheme boundary of compounds, but not of pseudo-compounds.

Figure 4 – Marginal plots of interaction effects between compound vs. pseudo-compound constituents and letter distance to morpheme boundary in an LME model fitting pointwise entropy of forward BMU connections. Negative and positive x values indicate letter positions located, respectively, in the first and second constituent.

3    Discussion and conclusions

Trained on both compounds and pseudo-compounds, TSOMs develop a growing sensitivity to surface distributional properties of input data, turning chains of randomly connected, general-purpose nodes into specialised sub-chains of BMUs that respond to specific letter strings at specific positions. Compounds not only tend to occur, on average, less frequently than their C1/C2 constituents do as independent words (Ji et al. 2011), but they tend to present lower-frequency bigrams at the C1-C2 boundary than do pseudo-compounds. Principles of Hebbian learning allow TSOMs to capitalise on both effects. Entrenchment makes expectations for high-frequency bigrams stronger and expectations for low-frequency bigrams weaker. At the same time, the competition between C1 as an independent word and C1 as the first constituent in a C1-C2 compound biases the map’s expectation towards the most frequent event (C1 in isolation). Compound families, i.e. sets of compounds sharing C1 (windmill, windshield etc.) or C2 (snowball, basketball etc.), magnify these effects, making the map more sensitive to formal discontinuity at morpheme boundaries. When more C2s can follow the same C1 in complementary distribution, the left-to-right expectation for a particular C2 to occur, given C1, decreases. Likewise, when more C1s competitively select the same C2, the individual contribution of each C1 to the prediction of C2 decreases. We conjecture that more global effects of lexical organisation like these may eventually blur local memory
effects based on position-independent bigram frequencies.

Our simulations with TSOMs can model the correlation between continuously varying distributional regularities in the input data and peripheral levels of routinized recognition and production patterns. These patterns are in line with Gagné & Spalding’s evidence of (a) the influence of embedded pseudo-morphemes on cascaded models of written word production, and (b) faster anticipation rates for monomorphemic vs. compound words.

Further experimental results (not reported here), obtained by including compound families in the training data, confirm slower anticipation rates for true compound constituents, due to the combined effect of word frequency distributions and word compositionality in compound families. The size of a compound family can arguably be a function of the degree of productivity and semantic transparency of its members (Baroni et al. 2007). The influence of the compound family size on anticipation rates can shed light on the influence of levels of semantic transparency on compound processing. Simulation evidence suggests that the bigger the family, the stronger its influence will be. Finally, we also monitored the influence of increasing token frequencies of monomorphemic words in the training data on the map perception of constituent boundaries within compounds. As expected, for constant frequency values of compounds in the training set, the higher the token frequency of monomorphemic words, the higher the pointwise entropy of connections at the C1-C2 boundary.

A full account of Gagné & Spalding’s evidence of a graded influence of semantic transparency on compound processing is beyond the reach of the computational architecture presented here. Surface effects of discontinuity in the internal structure of compounds (as opposed to pseudo-compounds) appear to provide a purely formal, pre-lexical scaffolding for truly morpho-semantic effects to emerge at later processing stages. To model these effects, we appear to be in need of a parallel processing architecture able to effectively integrate several representational levels (orthographic, phonological, morphological, and conceptual) and different processing steps within a single distributed system (Smolka et al. 2009). Nonetheless, our simulations show that by letting compounds, pseudo-compounds and (pseudo)constituents compete for the same level of memory resources on a topological map, it is possible to account for apparently contradictory effects of a) graded perception of constituent boundary in both compounds and pseudo-compounds, apparently requiring prelexical decomposition, and b) higher anticipation rates for pseudo-compounds than compounds, supporting full form representations for lexical access.

References

Baroni M., Guevara E., & Pirrelli V. (2007) NN Compounds in Italian: Modelling Category Induction and Analogical Extension, in V. Pirrelli (Ed.) Psycho-computational issues in morphology learning and processing. Lingue e Linguaggio VI (2), 263 - 290.
Diependaele, K., Grainger, J., & Sandra, D. (2012). Derivational Morphology and Skilled Reading: An Empirical Overview. In M. J. Spivey, J. McRae, & M. F. Joanisse (Eds.) The Cambridge Handbook of Psycholinguistics, 311-332. Cambridge University Press.
Ferro M., Marzi, C., & Pirrelli, V. (2011). A Self-Organizing Model of Word Storage and Processing: Implications for Morphology Learning. Lingue e Linguaggio X (2), 209 - 226.
Gagné, C. L., & Spalding, T. L. (2014). Typing time as an index of morphological and semantic effects during English compound processing. Lingue e Linguaggio, XIII(2), 241-262.
Gagné, C. L., & Spalding, T. L. (2016). Effects of morphology and semantic transparency on typing latencies in English compound and pseudo-compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 1489-1495.
Giraudo, H., & Grainger, J. (2000). Effects of prime word frequency and cumulative root frequency in masked morphological priming. Language and Cognitive Processes, 15(4/5), 421–44.
Grainger, J., Colé, P., & Segui, J. (1991). Masked morphological priming in visual word recognition. Journal of Memory and Language, 30, 370–84.
Kandel, S., Álvarez, C. J., & Vallée, N. (2008). Morphemes also serve as processing units in handwriting production. In M. Baciu (Ed.), Neuropsychology and cognition of language. Behavioural, neuropsychological and neuroimaging studies of spoken and written language, 87–100. Kerala, India: Research Signpost.
Libben, G. (2010). Compound words, semantic transparency, and morphological transcendence. Linguistische Berichte, Sonderheft 17, 317-330.
Libben, G., & Weber, S. (2014). Semantic transparency, compounding, and the nature of independent variables. In F. Rainer, W. Dressler, F. Gardani, & H. C. Luschutzky (Eds.), Morphology and meaning. Amsterdam: Benjamins.
Libben, G., Weber, S., & Miwa, K. (2012). P3: A
  technique for the study of perception, production,
  and participant properties. The Mental Lexicon,
  7(2), 237-248. doi:10.1075/ml.7.2.05lib
Marzi, C., Ferro, M., & Pirrelli, V. (2014) Morpho-
  logical structure through lexical parsability. Lingue
  e Linguaggio XIII (2), 263-290.
Pirrelli, V., Ferro, M., & Marzi, C. (2015). Computa-
   tional complexity of abstractive morphology, In
   Baerman M., Brown D., Corbett G. (Eds.) Under-
   standing and Measuring Morphological Complexi-
   ty. 141-166, Oxford, United Kingdom: Oxford
   University Press.
Plaut, D. C., & Gonnerman, L. M. (2000). Are
   nonsemantic morphological effects incompatible
   with a distributed connectionist approach to lexical
   processing? Language and Cognitive Processes,
   15(4/5), 445–85.
Sahel, S., Nottbusch, G., Grimm, A., & Weingarten,
  R. (2008). Written production of German com-
  pounds: Effects of lexical frequency and semantic
  transparency. Written Language & Literacy, 11(2),
  211-227. doi:10.1075/wll.11.2.06sah
Schreuder, R., & Baayen, H.R. (1995) Modeling mor-
  phological processing. In Feldman, L. B. (Ed.)
  Morphological aspects of language processing,
  131–56, Hillsdale, NJ: Erlbaum.
Smolka, E., Komlosi, S., & Rösler, F. (2009). When
  semantics means less than morphology: The pro-
  cessing of German prefixed verbs. Language and
  Cognitive Processes, 24(3), 337-375.
Rumelhart, D., & McClelland, J. 1986. On learning
  the past tense of English verbs. In Rumelhart, D.E,
  McClelland J. (Eds.), Parallel distributed pro-
  cessing: Explanations in the microstructure of
  cognition, 216-271, The MIT Press.
Taft, M., & Forster, K. I. (1975). Lexical storage and
  retrieval of prefixed words. Journal of Verbal
  Learning and Verbal Behavior, 14, 638–47.
Will, U., Nottbusch, G., & Weingarten, R. (2006).
  Linguistic units in word typing: Effects of word
  presentation modes and typing delay. Written Lan-
  guage & Literacy, 9(1), 153-176.