Using Neural Network Models to Model Cerebral Hemispheric Differences in Proc- essing Ambiguous Words Orna Peleg and Zohar Eviatar Institute of Information Processing and Decision Making Haifa University University of Haifa, Mount Carmel, Haifa 31905, Israel [opeleg , zohare]@research.haifa.ac.il Larry Manevitz and Hananel Hazan Department of Computer Science Haifa University University of Haifa,Mount Carmel,Haifa 31905,Israel [manevitz , hhazan01]@cs.haifa.ac.il Abstract In this work we relate neuropsychological studies which have shown that while both cerebral hemispheres Neuropsychological studies have shown that both process written words, they do it in somewhat different cerebral hemispheres process orthographic, phono- ways. logical and semantic aspects of written words, al- Our hypothesis was that these observed differences beit in different ways. The Left Hemisphere (LH) arise from the difference in the way interactions between is more influenced by the phonological aspect of orthographic, phonological and semantical elements occur. written words whereas lexical processing in the Specifically, in the Left Hemisphere we imagine that all Right Hemisphere (RH) is more sensitive to visual these elements influence each other directly, while in the form. We explain this phenomenon by postulating Right Hemisphere they are not all directly connected; i.e. that in the Left Hemisphere (LH) orthography, phonology is not connected directly to orthography and phonology and semantics are interconnected while hence its influence must be mitigated by semantical process- in the Right Hemisphere (RH), phonology is not ing. connected directly to orthography and hence its in- In our laboratory, we have attempted to measure subtle fluence must be mitigated by semantical process- differences in human subjects partially by using the richness ing. We test this hypothesis by complementary of Hebrew in both homophonic and heterophonic homo- human psychophysical experiments and by dual graphs (in standard orthography Hebrew is written without (one RH and one LH) computational neural net- vowels) and measuring the difference in response when pre- work model architecturally modified from Kowa- senting homographs directly to one hemisphere or the other. moto's [1993] model to follow our hypothesis. In To compare our human results with computational ones, we this paper we present the results of the computa- designed and present here a connectionist (neural network) tional model and show that the results obtained are model of each hemisphere for lexical disambiguation based analogous to the human experiments. on the well-known Kawamoto [1993] model. Our model includes two separate networks, one for each hemisphere. One network incorporates Kawamoto's 1 Introduction version in which the entire network is completely con- Abstract theoretical descriptions of processes underlying nected. (Thus orthographic, phonological and semantical mental processes are difficult to test, but can be approached "neurons" are not distinguished architecturally.) This net- in at least two ways. First, one can directly examine human work successfully simulated the time course of lexical dis- subjects with psychophysical experiments and see if the ambiguation in the Left Hemisphere. In the other network, measured responses correspond to the theoretical explana- direct connections between orthographic and phonological tions. This requires delicate design of experiments. Sec- units are removed. The speed of convergence in resolving ondly, we can try to construct artificial networks designed ambiguities were studied in these two networks under a va- according to the theoretical explanation and see if under riety of conditions simulating various kinds of priming. The such constraints the expected responses do in fact emerge. comparative results presented are analagous to the results The delicacy in this approach is to make the model as sim- obtained under our human subject testing thereby strength- ple as possible so that one can be sure that the response is in ening our belief in the correctness of our psychological ex- fact emerging from the theoretical description. Thus both planation of the processing. methods complement each other. 2 Background (reflected in the strength of the connections in the network) and by the context. Neuropsychological studies have shown that both cerebral Kawamoto’s model uses perhaps the simplest architecture hemispheres process orthographic, phonological and seman- that can suffice for LH processing during reading in general tic aspects of written words, albeit in different ways. Be- and ambiguity resolution in particular. Thivierge, Titone and havioral studies have shown that the LH is more influenced Schultz (2005) recently presented a connectionist model of by the phonological aspect of written words whereas lexical LH involvement during ambiguity resolution, in which the processing in the RH is more sensitive to visual form. In representations of the words were identical to the vectors addition, semantically ambiguous words (e.g., "bank") were used by Kawamoto. (Other computational models of reading found to result in different time-lines of meaning activation have included interconnections between orthographic, pho- in the two hemispheres. However, computational models of nological, and semantic representations [e.g., Seidenberg & reading in general and of lexical ambiguity resolution in McClelland 198]). The model proposed below incorporates particular, have not incorporated this asymmetry into their two networks, the first architectural identically to Kawa- architecture. moto’s original model, and the second architecturally modi- A large amount of psycholinguistic literature indicates fied in order to account for RH language processing. that readers utilize both frequency and context to resolve Note that Kawamoto's network, however, does not model lexical ambiguity [e.g., Duffy, Morris & Rayner 1988; Ti- hemispheric differences. tone 1998; Peleg, Giora & Fein 2001, 2004]. The idea that multiple sources of evidence (relative frequency as well as 2.2 Two-Hemisphere Model context) affect the degree to which a particular meaning is In this paper, we present a preliminary model for lexical activated and the eventual outcome of the resolution, as well disambiguation in the two cerebral hemispheres that is as the process, can be nicely captured within a neural net- based on the above work of Kawamoto. The model includes work (connectionist) approach to language processing. In two separate networks. One network incorporates Kawa- connectionist terminology, the computation of meaning is a moto’s version, and successfully simulates the time course constraint satisfaction problem: the computed meaning is of lexical disambiguation in the LH. In the other network that which satisfies the multiple constraints represented by based on the behavior of the disconnected RH of split brain the weights on connections between units in different parts patients [Zaidel & Peters, 1982], we made a change in Ka- of the network. wamoto's architecture, removing the direct connections be- 2.1 Kawamoto Model tween orthographic and phonological units. Taken together, the two networks produce processing asymmetries compa- A connectionist account of lexical ambiguity resolution rable to those found in the behavioral studies. was presented by Kawamoto [1993]. In his fully recurrent network, ambiguous and unambiguous words are repre- 2.3 The effect of frequency and context on seman- sented as distributed pattern of activity over a set of simple tic ambiguity resolution in the two cerebral processing units. Each lexical entry is represented over a 216 - bit vector divided into separate sub-vectors represent- hemispheres. ing the “spelling”, ”pronunciation”, "part of speech" and In Latin orthographies (such as English), the orthographic “meaning”. The network is trained with a simple error cor- representation (the spelling) of a word is usually associated rection algorithm by presenting it with the pattern to be with one phonological representation. Thus, most studies of learned. The result is that these patterns (the entire word lexical ambiguity have used homophonic homographs including its orthographic, phonological and semantic fea- (homonyms - a single orthographic and phonological repre- tures) become attractors in the 216-dimensional representa- sentation associated with two meanings). As a result, mod- tional space. The network is tested by presenting it with just els of hemispheric differences in lexical processing have part of the lexical entry (e.g., its spelling pattern) and testing focused mainly on semantic organization [e.g., Beeman how long various parts of the network take to settle into a 1998]. We suggest that this reliance on homonyms may pattern corresponding to a particular lexical entry. Kawa- have limited our understanding of hemispheric involvement moto trained his network in such a way that the more fre- in meaning activation, neglecting the contribution of phono- quent combination for a particular orthographic representa- logical asymmetries to hemispheric differences in semantic tion was the "deeper" attractor; i.e. the completion of the activation and has limited the range of models proposed to other features (semantic and phonological) would usually describe the process of reading in general. fall into this attractor. (This was accomplished by biasing Visual word recognition studies demonstrate that, even the learning process of the network.). However, using a though both hemispheres have access to orthographic and technological analogy of "priming" to bias the appropriate phonological representations of words, the LH is more in- completion, the resulting attractor could in fact be the less fluenced by the phonological aspects of a written word [e.g., frequent combination – which corresponds nicely to human Zaidel, 1982; Zaidel & Peters 1981; Lavidor and Ellis behavioral data. Indeed, consistent with human empirical 2003], whereas lexical processing in the RH is more sensi- results, after the network was trained, the resolution process tive to the visual form of a written word [e.g., Marsollek, was affected by the frequency of the different lexical entries Kosslyn & Squire, 1992; Marsolek, Schacter & Nicholas 1996; Lavidor and Ellis 2003]. Given that many psycholin- guistic models suggest that silent reading always includes a computation of these same representations. As a result, phonological factor [e.g., Berendt & Perfetti, 1995; Frost meaning activation in the LH is initially influenced primar- 1998; Van Orden, Pennington & Stone, 1990; Lukatela and ily by phonology [e.g., Lavidor & Ellis, 2003] resulting in Turvey 1994], it is conceivable that such asymmetries may immediate exhaustive activation of all meanings related to a also impact the assignment of meaning to written words given phonological form, regardless of frequency or contex- during on-line sentence comprehension. tual information [e.g., Burgess & Simpson 1988; Titone This study takes advantage of Hebrew orthography that in 1998; Swinney & Love, 2002]. contrast to less opaque Latin orthographies, offers an oppor- tunity to compare different types of ambiguities within the RH Structure: Phonological codes are not directly re- same language [e.g., Frost and Bentin 1992]. lated to orthographic codes and are activated indirectly via In Hebrew, letters represent mostly consonants, and vow- semantic codes. This organization predicts a different se- els can optionally be superimposed on consonants as dia- quential ordering of events in which the phonological com- critical marks. Since the vowel marks are usually omitted, putation of orthographic representations begins later than readers frequently encounter words with more than one pos- the semantic computation of these same representations. As sible interpretation. Thus, in addition to semantic ambigui- a result, lexical access in the RH is initially influenced by ties (a single orthographic and phonological form associated orthography [e.g., Lavidor & Ellis, 2003] and by semantic with multiple meanings), the relationship between the information, so that less frequent or contextually inappro- orthographical and the phonological forms of a word is also priate meanings are not immediately activated. Neverthe- frequently ambiguous. For example, the printed letter string less, these meanings can be activated later when phonologi- "‫ "מלח‬in Hebrew has two different pronunciations (/melach/ cal information becomes available [e.g., Burgess & Simpson or /malach/), each of which has a different meaning (‘salt’ 1988; Titone 1998]. or ‘sailor’). 4 Testing the Model: 3 The Model This model is tested according to the philosophy describe We propose a model that incorporates a right hemisphere in the abstract in two complementary ways: structure (i.e. network) and a left hemisphere structure (i.e. (i) By psychophysical experiments with human subjects. network) that differ in the coordination and relationships (ii) By a computational neural network model. between orthographic, phonological and semantic processes. (In this paper we mainly describe the computation network The two structures are homogeneous in the sense that all and its results). computations involve the same sources of information. If our ideas are correct and orthographic codes activate However, the time course of meaning activation and the phonological codes directly in the LH and indirectly in the relative influence of different sources of information at dif- RH, we should observe that the distinction in processing the ferent points in time during this process is different, because two kinds of word types (i.e. homophonic and heterophonic these sources of information relate to each other in different homographs) should occur at different stage in processing in ways. A graphic representation of the model is presented the LH and RH. below: Specifically within the LH these differences will be seen in the early stage of lexical access, where as with RH, these 3.1 The Split Reading Model differences will only be seen at a later point in time. 4.1 Brief Description of Preliminary Human Re- LH: Orthography Phonology Semantics sults In our lab, we have recently investigated the role phonol- ogy plays in silent reading by examining the activation of dominant and subordinate meanings of homophonic and RH: Orthography Phonology Semantics heterophonic homographs (a single orthographic representa- tion associated with two phonological representation, each associated with a different meaning) in the two hemi- LH Structure: Orthographic, phonological and semantic spheres. We used a divided visual field paradigm that al- codes are fully connected. The connections between these lows the discernment of differential hemispheric processing different sources of information are bi-directional and the of tachistoscopically presented stimuli. Heterophonic and different processes may very well run in parallel. However, homophonic homographs were used as primes in a lexical the model incorporates a sequential ordering of events that decision task, where the target words were either related to results from some processes occurring faster than others. the dominant meaning or to the subordinate meaning of the For example, in the LH, orthographic codes are directly ambiguous word, or were unrelated. We measured semantic related to both phonological and semantic codes. However, facilitation by response times. A significant interaction be- because orthography is more systematically related to pho- tween visual field of presentation (right or left), type of nology than to semantics, the phonological computation of stimulus (heterophonic or homophonic homograph) and orthographic representations is faster than the semantic type of target words suggested that heterophonic and homo- tion strength is determined by the magnitude of the learning phonic homographs were disambiguated differently in the constant and the magnitude of the error (ti - ii.). two visual fields, and by implication, in the two hemi- The activity of a single unit in both networks is represented spheres. With homophonic homographs, targets related to as a real value ranging between -1.0 and + 1.0. both dominant and subordinate meanings were activated in the RVF/LH, while in the LVF/RH only dominant meanings 1 x >1  evoked facilitated responses (panel A in Figure 1). Alterna- LIMIT =  − 1 x < − 1 [3] tively, with heterophonic homographs only dominant mean-  x otherwise ings evoked facilitated responses, and only in the LVF/RH  (panel B in Figure 1). The activity of a unit is computed from three different sources: the 1st is the sum of all outputs of other units in the Homophonic Homographs net; the 2nd is the direct input from the external environ- Facilitation due to priming 100 ment; and the 3rd is the output of the unit in the previous 80 iteration multiplied by the decay rate. 60 dom Since all units are mutually connected these influences lead 40 20 * * * sub to changes in the activity of a unit as a function of time 0 (where time changes in discrete steps). That is, the activity -20 of a unit (a) at time t + 1 is: RH LH -40     a(t + 1) = LIMIT δa(t ) + ∑ wij (t )a j (t ) + si (t ) [4] LVF RVF Figure 1 panel A: RVF/LH advantage for homophones   j   Heterophonic Homographs Where δ is a decay variable that changes from 0.7 to 1. si(t) is Facilitation due to priming 100 the influence of the input stimulus on unit ai at time (t+1), 80 dom and LIMIT bounds the activity to the range from -1.0 to +1.0. 60 sub    a (t + 1) = LIMIT δa (t ) + ∑ wij (t )a j (t )  40 20 * [5] 0 - 20   j   - 40 RH LH LVF RVF In each simulation, 12 identical LH and RH networks were used to simulate 12 subjects in an experiment. Each Figure 1 panel B: LVF/RH advantage for heterophones network was trained on 1300 learning trials. On each learn- ing trial an entry was selected randomly from the lexicon. 4.2 Computational Simulations Dominant and subordinate meanings were selected with a The units in the LH and RH network were implemented ratio of 5 to 3. After the networks were trained they were as described by Kawamoto [1993] with the following tested by presenting just the spelling part of the entry as the changes: (a) the original 48 4-letters words were replaced input (to simulate neutral context) or by presenting part of with 48 patterns representing 24 pairs of polarized Hebrew the semantic sub-vector together with the spelling (to simu- 3-letter homographs, half heterophonic and half homo- late prior contextual bias). In each simulation the input sets phonic. (b) 45 features (instead of 48) represented the the initial activation of the units. The level was set to +0.25 word's spelling and 60 features (instead of 48) represented if the corresponding input feature was positive, -0.25 if it its pronunciation. This is because the pronunciation includes was negative and 0 otherwise. In order to assess lexical ac- the vowels that were omitted from the spelling. The repre- cess, the number of iterations through the network for all the sentation for "part of speech" (all nouns) and "meaning" units in the spelling, pronunciation or meaning fields to be- remains the same as in the original model. Overall, each come saturated, was measured. A response was considered entry is represented as a vector of 270 binary-valued fea- an error if the pattern of activity did not correspond with the tures. Both networks were trained on the same input with a input, or if all the units did not saturate after 50 iterations. simple error correction algorithm [1, 2]: 4.2.1 Results and Discussion ∆ W ij = η (t i − i i )t j [1] Table 1 below presents a summary of the number of itera- tions needed for all units of homophonic and heterophonic i i = ∑ W ij t j [2] homographs to become saturated in the LH and in the RH j networks when no context, a dominant context or a subordi- nate context is presented. Where η is a scalar learning constant fixed to 0.0015, ti and tj are the target activation levels of units i and j, and ii is the net input to unit i. The magnitude of the change in connec- When homographs are presented with a biasing context, only the contextually compatible meaning is accessed in LH RH both networks, In addition dominant meanings in dominant context homo hetero homo Hetero contexts are accessed faster than subordinate meanings in subordinate contexts (Table 1). Interestingly, in the LH net- No 14.91 17.69 19.37 18.58 work, homophonic advantage in processing time disappears Dominant 7.42 7.69 8.36 8.52 when a biasing context is provided. Moreover, when Subordinate 13.24 10.47 14.27 14.76 homographs are presented with a subordinate context, it takes longer to access the subordinate meaning of homo- Table 1: homo=homophonic homographs phones homographs compare to heterophones homographs hetero=heterophonic homographs (Table 1). In both cases, as predicted phonological disam- biguation precedes meaning disambiguation (Table 2). Table 2 below presents a summary of the time to saturate Because heterophonic homographs have different pro- units in the phonological and meaning sub-vectors in the LH nunciations, these homographs involve the mapping of a (Table 2a) and in the RH (Table 2b) networks when no con- single orthographic code onto two phonological codes. As a text, a dominant context or a subordinate context is pre- result, when no context is presented, the speed of lexical sented. access is slower for heterophonic homographs then for homophonic homographs. On the other hand, when context Table 2a: is provided, the single phonological code of homophonic LH homographs is still associated with both meanings, whereas homo hetero the phonological representation of heterophonic homo- graphs is associated with only one meaning. As a result, context phono sem phono sem when homographs are presented in a subordinate context, a no 8.53 14.09 11.66 14.73 longer period of competition between dominant and subor- dominant 6.15 6.19 6.19 6.72 dinate meanings is observed in the case of homophonic homographs. In contrast, in the case of heterophonic homo- Sub-ordinate 6.85 10.67 6.70 8.60 graphs, meanings are accessed immediately after a phono- Table 2b: logical representation is computed. RH homo hetero 5 Summary context phono sem phono sem These results have important implications for the role no 14.69 18.35 14.68 16.60 phonology plays in accessing the meaning of words in silent dominant 7.19 6.71 7.47 7.17 reading. One class of models suggests that printed words Sub-ordinate 9.16 10.45 9.36 10.20 activate orthographic codes that are directly related to mean- phono=phonological subvector sem=semantic subvector ings in semantic memory. An alternative class of models asserts that access to meaning is mediated by phonology [for When homographs are presented without a biasing con- reviews see Frost 1998; Van Orden and Kloos 2005]. Our text, only the dominant meaning is accessed in both net- results supports the idea that in the LH words are read more works. However, in the LH network, meanings are accessed phonologically (from orthography to phonology to mean- faster. This is consistent with LH advantage for lexical ing), whereas in the RH, words are read more visually (from processing reported in the literature. More importantly, orthography to meaning). homophonic and heterophonic homographs are processed Overall, the two networks produce processing asym- differently in the two networks. In the LH network, lexical metries comparable to those found in behavioral studies. In access is longer for heterophonic homographs then for the LH network, orthographic units are directly related to homophonic homographs (Table 1) due to the time- both phonological and semantic units. However, because consuming competition between the two phonological rep- orthography is more systematically related to phonology resentations. Indeed, more iterations were needed for the than to semantics, the phonological computation of ortho- phonological units to become saturated in the case of heter- graphic representations is faster than the semantic computa- ophonic homographs than for homophonic homographs tion of these same representations. As a result, meaning (Table 2). This is consistent with the idea that in the LH, activation in the LH is initially influenced primarily by pho- phonological information guides early stages of meaning nology. In the RH network, phonological codes are not di- activation. Alternatively, in the RH network, phonological rectly related to orthographic codes and are activated indi- differences are less pronounced (Table 2) and processing rectly via semantic codes. This organization results a differ- times of homophonic and heterophonic homographs are ent sequential ordering of events in which the phonological similar (Table 1). This is consistent with the idea that in the computation of orthographic representations begins later RH, orthographic and semantic sources of information exert than the semantic computation of these same representa- their influence earlier than phonological information. tions. As a result, lexical access in the RH is initially more [Marsolek, C. J., Schacter, D. L., & Nicholas, C. D. 1996] influenced by orthography and by semantic. Form-specific visual priming for new associations in the right cerebral hemisphere. Memory and Cognition, 24, References 539–556. [Beeman M 1998] Coarse semantic coding and discourse [Peleg, O., Giora, R., & Fein, O. 2001] Salience and con- comprehension. In: Beeman M, Chiarello C, editors. text effects: Two are better than one. Metaphor and Right hemisphere language comprehension: Perspec- Symbol, 16, 173-192. tives from cognitive neuroscience. Mahwah (N.J.): Law- rence Erlbaum Associates. 255–284 . [Peleg, O., Giora, R., & Fein, O. 2004] Contextual strength: The Whens and hows of context effects. In I. Noveck & [Berent, I., & Perfetti, C. A. 1995] A rose is a REEZE: The D. Sperber (Eds.), experimental Pragmatics (pp.172- two-cycles of phonology assembly in reading English. 186). Basingstoke: Pagrave. Psychological Review, 102, 146-184 . [Seidenberg, M.S.,& McClelland J.L. 1989] A distributed [Burgess, C. & Simpson, G. B., 1988] Cerebral hemispheric developmental model of word recognition and naming. mechanisms in the retrieval of ambiguous word mean- Psychological Review, 96, 523-568. ings. Brain and Language, 33, 86-103. [Swinney, D. & Love, T. 2002] Context effects on lexical [Frost, R. 1998] Toward a strong phonological theory of processing during auditory sentence comprehension; on visual word recognition: True issues and false trails. the time course and neurological bases of a basic com- Psychological Bulletin, 123, 71-99. prehension process. In: Witruk, Friederici, Lachmann (Eds.). Basic Functions of Language, Reading and Read- [Frost, R. & Bentin, S. 1992] Processing phonological and ing Disability , Kluwer Academic (Section 2, ch 1, pp25- semantic ambiguity: Evidence from semantic priming at 40). different SOAs. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 58-68. [Titone, D. A. 1998] Hemispheric differences in context sensitivity during lexical ambiguity resolution. Brain [Hopfield J. J 1982] Neural networks and physical systems and Language, 65, 361-394. with emergent collective computational abilities. Pro- ceedings of the National Academy of Science, USA, 79, [Thivierge, J.P., Titone, D., & Shultz, T.R. 2005] Simulat- 2554-2558. ing frontotemporal pathways involved in lexical ambigu- ity resolution. Poster Proceedings of the Cognitive Sci- [Kawamoto, A. H. 1993] Nonlinear dynamics in the resolu- ence Society,2005. tion of lexical ambiguity: A parallel distributed process- ing account. Journal of Memory and Language , 32 , [Van Orden, G. C, & Kloos, H. 2005] The question of pho- 474-516. nology and reading. In M. S. Snowling, C. Hulme, & M. Seidenberg (Eds.). The science of reading: A handbook. [Lavidor, M . & Ellis, A. W. 2003] Orthographic and pho- Blackwell Pub. 39-60. nological priming in the two cerebral hemispheres. Lat- erality, 8, 201-223. [Van Orden, G. C., Pennington, B. F., & Stone, G. O. 1990] Word identification in reading and the promise of sub- [Lukatela, G., & Turvey, M.T. 1994a] Visual access is symbolic psycholinguistics. Psychological Review, 97, initially phonological. 1: Evidence from associative 488-522. priming by words, homophones, and pseudohomo- phones. Journal of Experimental Psychology: General, [Zaidel, E 1982] Reading in the disconnected right hemi- 123, 107-128. sphere: An aphasiological perspective Dyslexia: Neu- ronal, Cognitive and Linguistic Aspects Oxford, Perga- [Lukatela, G., & Turvey, M.T. 1994b] Visual access is ini- mon Press 35: 67-91. tially phonological. 2: Evidence from associative prim- ing by homophones, and pseudohomophones. Journal of [Zaidel, E., & Peters, A. M. 1981] phonological encoding Experimental Psychology: General, 123, 331-353. and ideographic reading by the disconnected right hemi- sphere: Two case Studies. Brain & Language, 14, 205- [Marsolek, C. J., Kosslyn, S. M., & Squire, L. R. 1992] 234. Form-specific visual priming in the right cerebral hemi- sphere. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 492–508.