Written word production and lexical self-organisation: evidence from English (pseudo)compounds

Marcello Ferro, Franco Alberto Cardillo, Vito Pirrelli
ILC-CNR Pisa, Italy
marcello.ferro@ilc.cnr.it, francoalberto.cardillo@ilc.cnr.it, vito.pirrelli@ilc.cnr.it

Christina L. Gagné, Thomas L. Spalding
Dept. of Psychology, University of Alberta, Canada
cgagne@ualberta.ca, spalding@ualberta.ca

Abstract

Elevation in typing latency for the initial letter of the second constituent of an English compound, relative to the latency for the final letter of the first constituent of the same compound, provides evidence that implementation of a motor plan for written compound production involves smaller constituents, in both semantically transparent and semantically opaque compounds. We investigate here the implications of this evidence for algorithmic models of lexical organisation, to show that effects of differential perception of the internal structure of compounds and pseudo-compounds can also be simulated as peripheral stages of lexical access by a self-organising connectionist architecture, even in the absence of morpho-semantic information. This complementary evidence supports a maximization-of-opportunity approach to lexical modelling, accounting for the integration of effects of pre-lexical and lexical access.

The slowdown in typing time for the first character of the second constituent of an English compound, relative to the time for the last character of the first constituent, shows that the implementation of the motor program for writing a compound is influenced by the constituents of the compound itself, whether they are semantically transparent or opaque. The present contribution offers a computational model of this evidence and assesses its impact on the organisation of the mental lexicon: the perception of the morpheme boundary between the two constituents is analysed as the result of the dynamic interaction between pre- and post-lexical access processes.

1 The evidence

A key question concerning the representation and processing of compound words has focused on whether (and, if so, how) morphological structure plays a role. The bulk of the research on this issue has come from recognition or comprehension tasks such as lexical decision or reading. However, written production provides a useful counterpart and allows researchers to examine whether morphemes are used even after a word has been accessed. One advantage of a typing task (in which the time to type each letter of a word is recorded) is that researchers can examine differences in processing difficulty at various points in the word. Previous research found an elevation in typing latency for the initial letter of the second constituent relative to the latency for the final letter of the first constituent for English (Gagné & Spalding 2014; Libben et al. 2012; Libben & Weber 2014) and German compounds (Sahel et al. 2008; Will et al. 2006). This elevation in typing latency at the morpheme boundary suggests that the system plans the output morpheme by morpheme, rather than as a whole unit, and that morphological programming is not complete when the motor system begins the output of the word (Kandel et al. 2008).

Gagné and Spalding (2016) examined the role of morphemic structure and semantic transparency on typing latency. The stimuli consisted of 200 compounds, 50 pseudo-compounds, and 250 monomorphemic words matched pairwise with the compounds and pseudo-compounds in the number of syllables and letters. The pseudo-compounds contain two words that do not function as morphemes (e.g., carpet contains car and pet). The compounds varied in whether the first and second constituents were semantically transparent. The items were displayed individually using a progressive demasking procedure, and participants typed the word while the computer recorded the time required to type each letter. The time to initiate the first letter was equivalent for monomorphemic and compound words. Typing times got faster across the word for both word types, but the rate of change was faster for monomorphemic words than for compound words. This difference was not observed when comparing monomorphemic words and pseudo-compounds.

For compounds, the rate of speed-up was slower when the first constituent was transparent than when it was opaque, but was unaffected by the transparency of the second constituent. The elevation in typing latency at the morpheme boundary was larger when the first constituent was transparent than when it was opaque, but was unaffected by the transparency of the second constituent. This difference is due to the final letter of the first constituent requiring less time to type when the first constituent was transparent than when it was opaque.

The data for the pseudo-compounds indicated that embedded morphemes influence production, even when they do not function as morphemes. Typing latency increased one letter prior to the end of the first constituent of a pseudo-compound and remained elevated through the boundary (e.g., both r and c in scarcity were elevated relative to the a).

1.1 Implications for lexical architectures

The reported evidence clearly indicates that morphemic structure is involved in written word production. The production of compounds differs from that of monomorphemic words, and the semantic transparency of the two constituents leads to different effects. Furthermore, embedded pseudo-morphemes appear to influence the production of pseudo-compounds, but not in the same way that embedded morphemes affect the production of compounds.

This appears to lend only partial support to models of lexical architecture where both compounds and their constituents are represented and processed as independent access units (Figure 1). In panel A, following Taft & Forster (1975), access and output of compounds are mediated by their constituents (Cs), but extra procedures would be needed to account for the role of semantic transparency in modulating the size of the elevation in typing latency at the morpheme boundary. A supralexical account (panel B: Giraudo & Grainger 2000; Grainger et al. 1991), where constituents are activated upon compositional interpretation of compounds, cannot capture the persistence of typing effects in semantically opaque compounds (and, to an extent, in pseudo-compounds). Race models (panel C: Schreuder & Baayen 1995) posit parallel pathways for compound processing (both holistic and compositional), depending on variables such as whole-word vs. constituent frequency, but it is not clear how they can account for effects of interaction between the two paths. Connectionist models (panel D: Rumelhart & McClelland 1986; Plaut & Gonnerman 2000), on the other hand, tend to dispense with specialized representational levels and access procedures, and make room for distributed effects of sublexical co-activation through overlaying patterns of processing units. A defining feature of these models is that they blur the traditional distinction between representations and processing units. We suggest that blurring this distinction can go a long way in addressing some of the issues that appear to elude models A, B and C.

Figure 1 – Four architectures of form-meaning mapping in the mental lexicon: C1+C2 designates two-word compounds and Cs mono-morphemic constituents (adapted from Diependaele et al. 2012).

Temporal Self-Organising Maps (TSOMs: Ferro et al. 2011; Marzi et al. 2014; Pirrelli et al. 2015) are a time-sensitive variant of Kohonen's SOMs (Kohonen 2002), where words are stored through routinized, time-bound patterns of repeatedly successful processing units. Since all input words are stored concurrently on the same layer of fully connected nodes, TSOMs account for effects of co-activation of competing representations in terms of a continuous function of distributional regularities in the input data. In what follows, starting from Gagné & Spalding's evidence, we will focus on peripheral stages of lexical access/output, to verify whether mechanisms of parallel, distributed pattern activation can account for differential processing effects between compounds and pseudo-compounds even in the absence of morpho-semantic information. Although computational testing is carried out on TSOMs only, our discussion and concluding remarks address issues that go beyond a specific computational framework.

2 TSOMs

A TSOM consists of a grid of memory nodes with two layers of connectivity. The first layer (or I-layer) fully connects each node to the input vector, where symbols are sampled at discrete time ticks as patterns of activation ranging in the [0, 1] interval. Weights on the I-layer are adjusted in training for individual nodes to develop specialised sensitivity to particular input symbols. Each node is also connected to all other nodes through a layer of re-entrant connections (or T-layer), whose weight strength determines the amount of influence that activation of one node has on other nodes at a one-tick delay.

When an input symbol is presented at time t, the level of activation $y_i(t)$ of node i is a function of: (a) the node's sensitivity to the current input symbol, $y_{I\text{-}layer,i}(t)$, and (b) the re-entrant support the node receives from the map activation state at t-1, $y_{T\text{-}layer,i}(t) = f(y_j(t-1))$, where f is a linear function and j ranges over all map nodes. More formally:

$$y_i(t) = \alpha \cdot y_{I\text{-}layer,i}(t) + (1 - \alpha) \cdot y_{T\text{-}layer,i}(t)$$

The node responding most strongly to the input symbol S at time tick t is called the Best Matching Unit (hereafter BMU(S, t), or BMU(t) for short). The map's response to a sequence of input symbols like carpet is a chain of consecutively firing BMUs, each responding to a letter in carpet. During training, connection weights between consecutive BMUs are adjusted to the frequency distribution of input symbols in the training set, according to Hebbian principles of correlative learning. Given the bigram ab, the connection strength between BMU(a, t-1) and BMU(b, t) increases if a often precedes b (entrenchment) and decreases if b is often preceded by a symbol other than a (competition) (Figure 2, left). The combination of entrenchment and competition yields selective specialisation of chains of BMUs (Figure 2, right). If the same input symbol follows different contexts, it will tend to be responded to by more BMUs, one for each context. The stronger the probabilistic support that the input symbol receives from its preceding context, the more likely the recruitment of a dedicated BMU, and the stronger its re-entrant connection. As a result of this dynamic, high-frequency words recruit specialised node chains, whereas low-frequency words are responded to by weaker, "blended" node chains.

Figure 2 – Left: operation of Hebbian rules on potentiated ('+') and inhibited ('-') connections. Right: forward one-tick-delay connections leaving 'A' at time t-1. Larger nodes represent BMUs. Shades of grey indicate levels of node activation.

2.1 The Experiment

The 200 compounds and 50 pseudo-compounds used by Gagné & Spalding were used to train a 40x40 node TSOM for 100 learning epochs. Besides compounds and pseudo-compounds, the training set included 500 (pseudo)constituents as individual words (e.g. car and wash in carwash, car and pet in carpet), for a total of 750 items. At each training epoch, monomorphemic words were shown 10 times as often as compounds. We ran 5 repetitions of the experiment, and results were analysed using linear mixed effects models (LME), with experiment repetitions and training items as random effects.

To analyse differential processing effects for pseudo-compounds and compounds, we focused on two types of evidence: (i) the per-letter performance of a trained TSOM in incrementally anticipating compounds and pseudo-compounds; (ii) the structural connectivity of BMUs responding to letter bigrams at the C1-C2 boundary.

To anticipate a progressively presented input word, a TSOM propagates the activation of the current BMU(t) through its forward temporal connections, and outputs, at each time tick, the symbol $S_{\mathrm{BMU}(t+1)}$ encoded on the I-layer of the most strongly (pre)activated node:

$$\mathrm{BMU}(t+1) = \arg\max_{i=1,\ldots,N} m_{h,i}, \qquad h = \mathrm{BMU}(t)$$

where $m_{h,i}$ is the weight value on the forward temporal connection from node h to node i, and N is the number of map nodes. Each correctly predicted symbol in the input word is assigned the prediction score of the preceding symbol incremented by 1. Otherwise, the symbol receives a score of 0.

Figure 3 – Marginal plots of interaction effects between compounds vs. pseudo-compounds and letter distance to morpheme boundary in an LME model fitting anticipation of upcoming BMUs by a TSOM. Negative and positive x values indicate letter positions located, respectively, in the first and second constituent. Anticipation is plotted across whole (pseudo)compounds (top panel), and by individual constituents (bottom panel).

Figure 3 (top panel) illustrates the rate of letter anticipation across the word for both compounds and pseudo-compounds, plotted by distance to the morpheme boundary. The steeper rate for pseudo-compounds than for compounds shows that pseudo-compounds are easier to predict/anticipate than compounds. We take this evidence to be in line with evidence of a faster speed-up rate in the typing of monomorphemic vs. compound words. A closer look at anticipation rates for individual constituents (Figure 3, bottom panel) shows a drop of anticipation at the C1-C2 boundary (more prominent for compounds than for pseudo-compounds), and a steeper increase within C1 and C2 for pseudo-compounds, whose constituents are, on average, shorter than C1 and C2 in real compounds.

To look for structural correlates of anticipation rates in the map, we conducted, for each item, a letter-by-letter analysis of values of pointwise entropy (PWH) for the connections between consecutive BMUs, namely h = BMU(t-1) and i = BMU(t):

$$PWH(m_{h,i}) = -\log \frac{m_{h,i}}{\sum_{j} m_{h,j}}$$

The value of PWH for the connection between end-C1 and start-C2 (x = 0) has a local peak in compounds only (Figure 4). Since PWH provides a measure of how unexpected the activation of BMU(t) is, this structural evidence can account for a delay in processing and a drop in anticipation at the morpheme boundary of compounds, but not of pseudo-compounds.

Figure 4 – Marginal plots of interaction effects between compound vs. pseudo-compound constituents and letter distance to morpheme boundary in an LME model fitting pointwise entropy of forward BMU connections. Negative and positive x values indicate letter positions located, respectively, in the first and second constituent.

3 Discussion and conclusions

Trained on both compounds and pseudo-compounds, TSOMs develop a growing sensitivity to surface distributional properties of input data, turning chains of randomly connected, general-purpose nodes into specialised sub-chains of BMUs that respond to specific letter strings at specific positions. Compounds not only tend to occur, on average, less frequently than their C1/C2 constituents do as independent words (Ji et al. 2011), but they also tend to present lower-frequency bigrams at the C1-C2 boundary than pseudo-compounds do. Principles of Hebbian learning allow TSOMs to capitalise on both effects. Entrenchment makes expectations for high-frequency bigrams stronger and expectations for low-frequency bigrams weaker. At the same time, the competition between C1 as an independent word and C1 as the first constituent in a C1-C2 compound biases the map's expectation towards the more frequent event (C1 in isolation). Compound families, i.e. sets of compounds sharing C1 (windmill, windshield etc.) or C2 (snowball, basketball etc.), magnify these effects, making the map more sensitive to formal discontinuity at morpheme boundaries. When more C2s can follow the same C1 in complementary distribution, the left-to-right expectation for a particular C2 to occur, given C1, decreases. Likewise, when more C1s competitively select the same C2, the individual contribution of each C1 to the prediction of C2 decreases. We conjecture that more global effects of lexical organisation like these may eventually blur local memory effects based on position-independent bigram frequencies.

Our simulations with TSOMs can model the correlation between continuously varying distributional regularities in the input data and peripheral levels of routinized recognition and production patterns. These patterns are in line with Gagné & Spalding's evidence of (a) the influence of embedded pseudo-morphemes on cascaded models of written word production, and (b) faster anticipation rates for monomorphemic vs. compound words.

Further experimental results (not reported here), obtained by including compound families in the training data, confirm slower anticipation rates for true compound constituents, due to the combined effect of word frequency distributions and word compositionality in compound families. The size of a compound family can arguably be a function of the degree of productivity and semantic transparency of its members (Baroni et al. 2007). The influence of compound family size on anticipation rates can thus shed light on the influence of levels of semantic transparency on compound processing. Simulation evidence suggests that the bigger the family, the stronger its influence will be. Finally, we also monitored the influence of increasing token frequencies of monomorphemic words in the training data on the map's perception of constituent boundaries within compounds. As expected, for constant frequency values of compounds in the training set, the higher the token frequency of monomorphemic words, the higher the pointwise entropy of connections at the C1-C2 boundary.

A full account of Gagné & Spalding's evidence of a graded influence of semantic transparency on compound processing is beyond the reach of the computational architecture presented here. Surface effects of discontinuity in the internal structure of compounds (as opposed to pseudo-compounds) appear to provide a purely formal, pre-lexical scaffolding for truly morpho-semantic effects to emerge at later processing stages. To model these effects, we appear to need a parallel processing architecture able to effectively integrate several representational levels (orthographic, phonological, morphological, and conceptual) and different processing steps within a single distributed system (Smolka et al. 2009). Nonetheless, our simulations show that by letting compounds, pseudo-compounds and (pseudo)constituents compete for the same level of memory resources on a topological map, it is possible to account for the apparently contradictory effects of (a) graded perception of constituent boundaries in both compounds and pseudo-compounds, apparently requiring prelexical decomposition, and (b) higher anticipation rates for pseudo-compounds than for compounds, supporting full-form representations for lexical access.

References

Baroni, M., Guevara, E., & Pirrelli, V. (2007). NN compounds in Italian: Modelling category induction and analogical extension. In V. Pirrelli (Ed.), Psycho-computational issues in morphology learning and processing. Lingue e Linguaggio, VI(2), 263-290.

Diependaele, K., Grainger, J., & Sandra, D. (2012). Derivational morphology and skilled reading: An empirical overview. In M. J. Spivey, J. McRae, & M. F. Joanisse (Eds.), The Cambridge Handbook of Psycholinguistics, 311-332. Cambridge University Press.

Ferro, M., Marzi, C., & Pirrelli, V. (2011). A self-organizing model of word storage and processing: Implications for morphology learning. Lingue e Linguaggio, X(2), 209-226.

Gagné, C. L., & Spalding, T. L. (2014). Typing time as an index of morphological and semantic effects during English compound processing. Lingue e Linguaggio, XIII(2), 241-262.

Gagné, C. L., & Spalding, T. L. (2016). Effects of morphology and semantic transparency on typing latencies in English compound and pseudo-compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42, 1489-1495.

Giraudo, H., & Grainger, J. (2000). Effects of prime word frequency and cumulative root frequency in masked morphological priming. Language and Cognitive Processes, 15(4/5), 421-444.

Grainger, J., Colé, P., & Segui, J. (1991). Masked morphological priming in visual word recognition. Journal of Memory and Language, 30, 370-384.

Kandel, S., Álvarez, C. J., & Vallée, N. (2008). Morphemes also serve as processing units in handwriting production. In M. Baciu (Ed.), Neuropsychology and cognition of language. Behavioural, neuropsychological and neuroimaging studies of spoken and written language, 87-100. Kerala, India: Research Signpost.

Libben, G. (2010). Compound words, semantic transparency, and morphological transcendence. Linguistische Berichte, Sonderheft 17, 317-330.

Libben, G., & Weber, S. (2014). Semantic transparency, compounding, and the nature of independent variables. In F. Rainer, W. Dressler, F. Gardani, & H. C. Luschutzky (Eds.), Morphology and meaning. Amsterdam: Benjamins.

Libben, G., Weber, S., & Miwa, K. (2012). P3: A technique for the study of perception, production, and participant properties. The Mental Lexicon, 7(2), 237-248. doi:10.1075/ml.7.2.05lib

Marzi, C., Ferro, M., & Pirrelli, V. (2014). Morphological structure through lexical parsability. Lingue e Linguaggio, XIII(2), 263-290.

Pirrelli, V., Ferro, M., & Marzi, C. (2015). Computational complexity of abstractive morphology. In M. Baerman, D. Brown, & G. Corbett (Eds.), Understanding and Measuring Morphological Complexity, 141-166. Oxford, United Kingdom: Oxford University Press.

Plaut, D. C., & Gonnerman, L. M. (2000). Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes, 15(4/5), 445-485.

Rumelhart, D., & McClelland, J. (1986). On learning the past tense of English verbs. In D. E. Rumelhart & J. McClelland (Eds.), Parallel distributed processing: Explanations in the microstructure of cognition, 216-271. The MIT Press.

Sahel, S., Nottbusch, G., Grimm, A., & Weingarten, R. (2008). Written production of German compounds: Effects of lexical frequency and semantic transparency. Written Language & Literacy, 11(2), 211-227. doi:10.1075/wll.11.2.06sah

Schreuder, R., & Baayen, H. R. (1995). Modeling morphological processing. In L. B. Feldman (Ed.), Morphological aspects of language processing, 131-156. Hillsdale, NJ: Erlbaum.

Smolka, E., Komlosi, S., & Rösler, F. (2009). When semantics means less than morphology: The processing of German prefixed verbs. Language and Cognitive Processes, 24(3), 337-375.

Taft, M., & Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14, 638-647.

Will, U., Nottbusch, G., & Weingarten, R. (2006). Linguistic units in word typing: Effects of word presentation modes and typing delay. Written Language & Literacy, 9(1), 153-176.
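As an illustration of the quantities defined in Sections 2 and 2.1 (node activation, anticipation of the next BMU with its cumulative per-letter score, and pointwise entropy of forward connections), here is a small hypothetical Python sketch. It is not the TSOM implementation used in the experiments. The map is reduced to a hand-set forward weight matrix `m[h][i]` plus the symbol each node is tuned to. Training, the I-layer sensitivity function and the Hebbian updates are not modelled. Several choices are assumptions: the linear function f is taken to be a weighted sum of the previous activations, the first letter of a word scores 0, and PWH uses the natural logarithm. The names `toy_map`, `temporal_support`, `anticipation_scores` and `pointwise_entropy` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy sketch of the TSOM quantities in Sections 2 and 2.1.
# m[h][i] is the forward temporal weight from node h to node i;
# node_symbol[k] is the symbol node k is tuned to on its I-layer (assumed given).


def temporal_support(prev_activation: Sequence[float],
                     m: Sequence[Sequence[float]], i: int) -> float:
    """One possible linear f for y_{T-layer,i}(t): sum_j m[j][i] * y_j(t-1) (assumption)."""
    return sum(m[j][i] * prev_activation[j] for j in range(len(prev_activation)))


def activation(y_input: float, y_temporal: float, alpha: float) -> float:
    """y_i(t) = alpha * y_{I-layer,i}(t) + (1 - alpha) * y_{T-layer,i}(t)."""
    return alpha * y_input + (1.0 - alpha) * y_temporal


def predict_next(m: Sequence[Sequence[float]], h: int) -> int:
    """BMU(t+1) = argmax_i m[h][i] with h = BMU(t); ties go to the lowest index."""
    row = m[h]
    return max(range(len(row)), key=lambda i: row[i])


def anticipation_scores(word: str, bmu_chain: Sequence[int],
                        m: Sequence[Sequence[float]],
                        node_symbol: Sequence[str]) -> List[int]:
    """Score of the preceding letter + 1 if a letter is correctly predicted, else 0.
    The first letter is scored 0 because no preceding BMU is modelled (assumption)."""
    scores = [0]
    for t in range(len(word) - 1):
        predicted = node_symbol[predict_next(m, bmu_chain[t])]
        scores.append(scores[-1] + 1 if predicted == word[t + 1] else 0)
    return scores


def pointwise_entropy(m: Sequence[Sequence[float]], h: int, i: int) -> float:
    """PWH(m_{h,i}) = -log(m[h][i] / sum_j m[h][j]), natural log assumed."""
    return -math.log(m[h][i] / sum(m[h]))


def toy_map() -> Tuple[List[str], List[List[float]]]:
    """A hypothetical 5-node map with hand-set forward weights."""
    node_symbol = ["c", "a", "r", "p", "s"]
    m = [
        [0.0, 0.9, 0.1, 0.0, 0.0],  # after 'c', 'a' is strongly expected
        [0.0, 0.0, 0.8, 0.1, 0.1],  # after 'a', 'r' is expected
        [0.0, 0.0, 0.0, 0.3, 0.6],  # after 'r', 's' (as in "cars") beats 'p'
        [0.0, 0.5, 0.0, 0.0, 0.5],
        [0.2, 0.2, 0.2, 0.2, 0.2],
    ]
    return node_symbol, m


if __name__ == "__main__":
    symbols, weights = toy_map()
    chain = [0, 1, 2, 3]  # BMUs responding to c, a, r, p
    print(anticipation_scores("carp", chain, weights, symbols))  # [0, 1, 2, 0]
    print(round(pointwise_entropy(weights, 2, 3), 4))  # 1.0986
```

In this toy map the r-to-p transition (as at the car|pet boundary) is weakly supported, because r is more often followed by s. Anticipation therefore drops to 0 at p. The PWH of that connection is ln 3, about 1.10 nats, against about 0.11 for the well-entrenched c-to-a connection. This mirrors, on a tiny scale, the link drawn in Section 2.1 between high PWH and a drop in anticipation.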