Semantic representations in monolingual and bilingual connectionist networks

Nicholas Rendell (n.rendell@bbk.ac.uk)
Eddy J. Davelaar (e.davelaar@bbk.ac.uk)
Department of Psychology, Birkbeck College, University of London
Malet Street, London WC1E 7HX

Abstract

A neural network model is presented which investigates the suggestion that being multilingual contributes to an individual’s level of cognitive reserve. Two versions of this model were produced, one which learnt the names of input representations in a single language and another which learnt input representations in two languages. The languages in both models were split further into two semantic categories. The relationship between the representations of both semantic categories in the first language of each of the two versions was investigated. Further manipulations came in the form of changing the sensitivity of the artificial neurons in the network and varying the hidden layer size with respect to variable levels of brain reserve. Findings were not immediately interpretable in terms of age-related decline in the absence of a behavioral measure. However, the variance in trajectories of category separation provides a cautionary tale against the interpretation of any measures gained at discrete intervals.

Keywords: ageing; connectionist model; language; bilingualism

Introduction

Recent studies of bilingual and multilingual individuals have demonstrated some offsetting of normal cognitive ageing (Kavé, Eyal, Shorek, & Cohen-Mansfield, 2008) and protective effects against the onset of the cognitive symptoms of dementia (Bialystok, Craik, & Freedman, 2007). These effects are presented in opposition to linguistic deficits also reported in bilinguals (Bialystok, 2008). This study examines both of these consequences by comparing representations of picture categories in monolingual and bilingual networks.

The contribution of language in offsetting age-related cognitive deficits is one of a number of factors, known collectively as the latent variable cognitive reserve (Stern, 2003, 2009). The existence of variability in levels of protective factors is evidenced in studies which have demonstrated a poor relationship between an individual’s cognitive intactness in vivo and levels of brain pathology post mortem (Mortimer, 1997; Valenzuela & Sachdev, 2006). Given the number of different factors which contribute to cognitive reserve, it may be difficult to untangle them. Years of education has been linked to the ability to stave off age-related decline (Albert et al., 1995; Barnes, Tager, Satariano, & Yaffe, 2004; Scarmeas, Albert, Manly, & Stern, 2006). Therefore, it is reasonable to assume that years of education, childhood intelligence or, more simply, a nurturing environment may be a moderating factor for the relationship between multilingualism and cognitive reserve.

More recent studies of multilingualism and its association with cognitive reserve have attempted to control for education and intelligence. Bak, Nissan, Allerhand, & Deary (2014) utilised the Lothian Birth Cohort, a group of English native speakers of European origin who were initially tested for a level of intelligence at age 11 in 1947. This allowed the authors to control for childhood intelligence, gender and socioeconomic status. The participants, now 73, were tested on fluid intelligence, memory, speed of information processing, reading and verbal fluency. The results demonstrated a protective effect of bilingualism with no negative effects of having more than one language. Reading, verbal fluency and general intelligence were the most affected, and general intelligence in particular was related to improvement in executive processes. Of note was the similarity in performance between active (using second language) and passive (not required to use second language) bilinguals. This contradicts a view of cognitive reserve resulting from the continual practice of cognitive mechanisms. However, the increase in general intelligence in both active and passive bilinguals suggests that the effect of initial use of a second language is sufficient to upgrade cognitive processes. This is also demonstrated in the advantages for acquiring a second language in later life.

To understand why bilingualism confers an advantage to cognitive ageing, the cognitive mechanisms involved in speaking more than one language must be unpacked. In terms of nonverbal effects, these are wholly positive. Initial findings in a study comparing English-only-speaking Canadian children with their French-English-speaking counterparts on verbal and nonverbal tests showed that the bilingual children outperformed the monolinguals in almost all aspects, especially the nonverbal intelligence tests (Peal & Lambert, 1962). Equivalence was found in visual perception but advantages were found in symbol manipulation. Such early findings may be subject to the criticism of a lack of control for potential confounds (Bialystok, 2001). However, the study demonstrated at least the potential for cognitive improvement in bilingual individuals. Further, the apparent improvement in nonverbal abilities for increased effort in the cognitive domain of language refutes the modular notion of cognitive processing (Bialystok, Craik, Green, & Gollan, 2009).

Studies in metalinguistic capabilities have uncovered the mechanisms behind the cognitive advantage in bilingual individuals. For example, Bialystok (1988) found that bilingual children demonstrated an advantage in tasks requiring cognitive control. Further, in error checking and explanation of ungrammatical sentences, Galambos & Goldin-Meadow (1990) found that bilingual children performed better in the trials which required a change in the focus of attention. However, both monolinguals and bilinguals performed equally on the actual explanation of the errors. Nonlinguistic studies have also demonstrated benefits for bilinguals in executive control tasks using perceptual stimuli (Costa, Hernández, & Sebastián-Gallés, 2008). This suggests that bilingualism provides a holistic strengthening of executive control processes. This assertion has been supported with neuroimaging studies which demonstrate stronger resting-state connectivity in the frontal lobe for bilingual rather than monolingual individuals (Grady, Luk, Craik, & Bialystok, 2015; Luk, Bialystok, Craik, & Grady, 2011). Gold (2014) asserts that increased activity with frontal regions as a result of bilingualism serves to protect against age-related decline within those circuits related to executive processing.

Whilst the cognitive advantages of bilingualism appear well documented, the linguistic deficits associated with having more than one language are equally well researched. For example, it is generally accepted that one of the predominant negative effects of bilingualism concerns vocabulary size. This is generally smaller compared to monolinguals for both languages spoken (Mahon & Crutchley, 2006; Portocarrero, Burright, & Donovick, 2007). However, equivalence in vocabulary size for L1 between monolinguals and bilinguals has been found in very young children (age 24 months; Poulin-Dubois, Bialystok, Blaye, Polonia, & Yott, 2013). In addition to size of lexicon, bilinguals also appear to have more trouble accessing particular words. Picture naming tasks have shown that bilinguals are slower than their monolingual counterparts (e.g. Gollan, Montoya, Fennema-Notestine, & Morris, 2005). Further, verbal fluency tasks, in which participants are asked to name as many words as possible for a given category or categories, have demonstrated a disadvantage for bilinguals (e.g. Rosselli et al., 2000). Tip of the tongue errors (Gollan & Acenas, 2004) are also more frequent in multilingual speakers, and it is also reported that bilinguals have trouble identifying specific words through noise (Rogers, Lister, Febo, Besing, & Abrams, 2006).

The aim of this study was to investigate any differences in the development of storage of representations between monolingual and bilingual groups of simulants. Two neural network models were trained to remember the names of a number of ‘pictures’ in one (monolingual) or two (bilingual) languages. Ageing of the networks was simulated by adjusting the gain of the transfer function (Li, Lindenberger, & Sikström, 2001; Servan-Schreiber, Printz, & Cohen, 1990).

Method

Architecture

The model in this study was a simple three layer, feedforward, back-propagating neural network. Two versions were produced, a monolingual version and a bilingual version. The hidden layer was varied in size for this investigation since it represented a view of passive reserve (Stern, 2009) which could easily be manipulated. For both models, the hidden layer size was manipulated to contain 5, 10, 15 or 20 nodes.

Stimulus Patterns

Given that the focus of study for the models was representation storage in the hidden layer rather than performance, a compromise between an artificial language and a realistic corpus was used for input. The inputs used in both models were patterns of 26 binary digits. The first 20 digits were randomised, with the addition of a further six inputs. The first three of these represented a language tag. This was added to the experimental paradigm to guarantee separation of the two sets of pictures in the bilingual model, since the rest of the input consisted of a random pattern. The final three binary digits of each input presentation related to the membership of semantic category A or B. 34 input patterns were used in the monolingual model and the monolingual input set was augmented with a further 34 patterns for the bilingual model, making 68 in total for the bilingual model.

The input ‘words’ used for the monolingual network were taken from a dataset of English phonemes which had been converted to a binary input set using a set of 19 features (Thomas & Karmiloff-Smith, 2003). The input set for the monolingual network comprised 34 English words, with a further 34 Greek words produced for the bilingual model. The English words (L1) were used in both the monolingual and bilingual models and the Greek words represented the second language in the bilingual model (L2). Both monolingual and bilingual models had 40 output nodes. In the monolingual model, the first 19 nodes in each output vector were taken up by English words whilst the rest of the output nodes were left at zero. This set of output vectors was the same for the first 34 output vectors in the bilingual model. However, for the second 34 output vectors the last 21 units were used, since they related to the Greek names for the input patterns.

Training

Both networks were initially trained for 800 epochs. For comparison, test data was introduced to both monolingual and bilingual networks in the form of both categories of L1 only. 50 simulants were trained in this manner, with the starting weights for each seeded randomly from a uniform distribution between 0 and 1. All of the following analyses represent mean scores. Training for both networks took around 200 epochs for the error to reach an asymptotic state. Overall, error settled at a slightly higher level in the bilingual network. This can be attributed to the increase in constraints in the bilingual network, as it needed to accommodate the same number of ‘pictures’ as the monolingual network but in both languages.
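The architecture, stimulus coding and training regime described above can be illustrated with a short sketch. This is a minimal plain-Python illustration under stated assumptions, not the implementation used in the study. The layer sizes (26 inputs, 5 to 20 hidden nodes, 40 outputs), the 19/21 split of output units, the uniform starting weights between 0 and 1 and the gain parameter of the log-sigmoid come from the text. Everything else is hypothetical: random bits stand in for the pictures and for the English and Greek names, the tag and category codings are invented, categories simply alternate between items, the bilingual set reuses the same picture bits with a different tag, bias units are added, and the learning rate and online sum-squared-error updates are arbitrary choices.

```python
import math
import random

N_PICTURE_BITS, N_TAG_BITS, N_CATEGORY_BITS = 20, 3, 3   # 26 input units
N_L1_UNITS, N_L2_UNITS = 19, 21                          # 40 output units


def logistic(x, gain=1.0):
    """Log-sigmoid transfer function; `gain` sets its slope."""
    return 1.0 / (1.0 + math.exp(-gain * x))


def make_pictures(n, rng):
    """Random stand-ins: (picture bits, category, L1 name, L2 name)."""
    pictures = []
    for i in range(n):
        bits = [rng.randint(0, 1) for _ in range(N_PICTURE_BITS)]
        category = i % 2  # assumption: items alternate between A (0) and B (1)
        l1 = [rng.randint(0, 1) for _ in range(N_L1_UNITS)]
        l2 = [rng.randint(0, 1) for _ in range(N_L2_UNITS)]
        pictures.append((bits, category, l1, l2))
    return pictures


def build_patterns(pictures, bilingual):
    """Return (input, target) pairs; L2 patterns are appended if bilingual."""
    patterns = []
    for bits, cat, l1, _ in pictures:
        x = bits + [0] * N_TAG_BITS + [cat] * N_CATEGORY_BITS   # hypothetical L1 tag
        patterns.append((x, l1 + [0] * N_L2_UNITS))
    if bilingual:
        for bits, cat, _, l2 in pictures:
            x = bits + [1] * N_TAG_BITS + [cat] * N_CATEGORY_BITS   # hypothetical L2 tag
            patterns.append((x, [0] * N_L1_UNITS + l2))
    return patterns


class Network:
    """Three-layer feedforward network trained with online backpropagation."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        # Starting weights are uniform on [0, 1); the extra column is a bias (assumption).
        self.w1 = [[rng.random() for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w2 = [[rng.random() for _ in range(n_hidden + 1)] for _ in range(n_out)]

    def forward(self, x, gain=1.0):
        xb = x + [1.0]
        hidden = [logistic(sum(w * v for w, v in zip(row, xb)), gain) for row in self.w1]
        hb = hidden + [1.0]
        output = [logistic(sum(w * v for w, v in zip(row, hb)), gain) for row in self.w2]
        return hidden, output

    def train_epoch(self, patterns, lr=0.1, gain=1.0):
        """One pass over the patterns; returns the summed squared error."""
        sse = 0.0
        for x, target in patterns:
            hidden, output = self.forward(x, gain)
            xb, hb = x + [1.0], hidden + [1.0]
            d_out = [(t - y) * gain * y * (1.0 - y) for t, y in zip(target, output)]
            d_hid = [gain * h * (1.0 - h) * sum(d * row[j] for d, row in zip(d_out, self.w2))
                     for j, h in enumerate(hidden)]
            for row, d in zip(self.w2, d_out):
                for j, v in enumerate(hb):
                    row[j] += lr * d * v
            for row, d in zip(self.w1, d_hid):
                for i, v in enumerate(xb):
                    row[i] += lr * d * v
            sse += sum((t - y) ** 2 for t, y in zip(target, output))
        return sse


if __name__ == "__main__":
    rng = random.Random(1)
    pictures = make_pictures(34, rng)
    for label, bilingual in (("monolingual", False), ("bilingual", True)):
        patterns = build_patterns(pictures, bilingual)
        net = Network(26, 10, 40, rng)
        errors = [net.train_epoch(patterns) for _ in range(50)]
        print(label, len(patterns), "patterns, final error", round(errors[-1], 2))
```

In this sketch the hidden activations returned by `forward` are the quantities whose category separation would be analysed, and lowering `gain` below 1 flattens the transfer function, which is how a reduction in the sensitivity of the artificial neurons can be imposed.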
Given that an asymptotic state was achieved around 200 epochs, it was decided that at 220 epochs the network was considered mature. It may be that there is a considerable ‘grace period’ during which the brain does not immediately decline after maturity. However, for the purposes of this study, aging can be said to begin upon reaching an asymptotic state, both for the purposes of analysis and interventions.

Analysis

Using a methodology similar to Thomas (1998), the difference between monolingual and bilingual models in terms of the storage of representations for both categories of L1 was investigated. To this end, a scatterplot was produced by carrying out multidimensional scaling on the Euclidean distances between the activation vectors in response to each picture input (see Figures 1 and 2). After multidimensional scaling was applied, the inputs were divided into the categories A and B and the three dimensions were plotted to illustrate any differences of semantic storage in representational space.

Figure 1: Scatterplots representing the distributions of representations of categories A and B within L1 of the monolingual network. Each graph refers to hidden layer sizes of five (A), ten (B), fifteen (C) and twenty (D) nodes. The blue dots relate to category A and the red dots relate to picture representations in category B.

Figure 2: Scatterplots representing the distributions of representations of categories A and B within L1 of the bilingual network. Each graph refers to hidden layer sizes of five (A), ten (B), fifteen (C) and twenty (D) nodes. The blue dots relate to category A and the red dots relate to picture representations in category B.

The main analysis within this study relied on the online calculation of the separation of categories A and B within L1. This was achieved through the calculation of a single centroid for L1. This provided the distances to an overall mean. Further, distances to the representations within each category from the overall centroid were calculated. This provided a measure of within and overall variance upon which to calculate an F value. This was calculated at each epoch for each simulant over the four different hidden layer sizes.

Results

The overall trend in categorical separation is driven by the feature differences between the two categories. Spreading of the categories can be seen progressing over the spectrum of hidden layer sizes for both models. However, the effect appears greater with the monolingual model. Conversely, clustering of representations within the hidden layer of the bilingual model is tighter. This was confirmed by cluster analysis of category A from L1 in both monolingual and bilingual networks carried out over the period of training (Figure 3). This demonstrates that overall, networks with larger hidden layers showed the greatest spacing between representations, with the two most dispersed categories belonging to the monolingual network. A 2 × 4 ANOVA was carried out on the distances from the individual scores of the simulants at maturity, with monolingual or bilingual as one factor and hidden layer size as the other. The analysis demonstrated main effects for both hidden layer (F(3,392) = 1539, p<.001) and type of network (F(1,392) = 516, p<.001) together with an interaction between the two (F(3,392) = 1953, p<.001). Therefore, it appears that such an effect may be due to the differences in space caused by the additional constraints of second language storage, offset only by a higher storage capacity.

Figure 3: Line graph demonstrating the projection of the sum of the distances from calculated centroid in category for monolingual and bilingual models over all hidden layer sizes (panels A to D; x-axis: epochs; y-axis: sum of distances from centroid). Lines represent mean score of 50 simulants.

In order to provide a more valid interpretation of biological change over lifespan, the model integrated a gradual decline in the slope of the log sigmoidal transfer function. This manipulation reflects an age-related reduction in dopamine, which in turn relates to cognitive decline and a generally greater susceptibility to neural noise as the network becomes less discerning (Bäckman et al., 2010; Li et al., 2001; Servan-Schreiber et al., 1990). Firstly, gain was set to decline gradually in steps of .0015 from the beginning of training. However, differentiation of either category could not be achieved. Therefore, a necessary and more valid representation of dopamine attenuation over lifespan was achieved by initiating the decline of gain after maturity, in this case 220 epochs (Figure 4).

Figure 4: Projections of F-values as a measure of the dedifferentiation between categories A and B in L1 for both models (x-axis: epochs; y-axis: F-value). The age-related decrease in sensitivity of the artificial neurons in the network is represented by the gain decline. This started at 220 epochs and decreased gradually until the end of training at 800 epochs.

Inline analysis of the separation of items from both categories demonstrated an increasing separation of semantic information up until maturity for all groups. However, characteristics of the different projections differ in response to gain decline. Dedifferentiation appears almost immediately for ten and fifteen node monolingual models. However, for the other monolingual hidden layer sizes and for all bilingual hidden layer sizes, the representational separation of categories continued to increase to a degree before dedifferentiation set in. Differences in the projections over time were investigated between the ten-node hidden layer networks for monolingual and bilingual networks. Firstly, a bounded line graph was produced (Figure 5). This demonstrated a greater variability in scores for the monolingual simulants.

Figure 5: Bounded line graph of monolingual (red) and bilingual (blue) mean F-value scores for ten unit hidden layer versions only (x-axis: epochs; y-axis: F-value). The shaded area around each line is one standard error of the mean.

Multilevel analysis was used in which individual scores were used as the dependent variable with epoch as the first-level predictor. The second level was grouped by whether the model was monolingual or bilingual. This was estimated as a random effect due to the differing projections between monolingual and bilingual.

$$Y_{ij} = \gamma_{00} + u_{0j} + r_{ij}$$

where $Y_{ij}$ represents the F-value at epoch $i$ for group $j$. A likelihood ratio test demonstrated a significantly better fit with the inclusion of the random component ‘group’ (p<.001).

Discussion

This study represents an initial attempt at exploring the way in which bilingualism influences how categories within languages are represented. The models presented in this paper represent the first attempt at modelling and analyzing multilingualism from a representational perspective. This study explored differences between monolingual and bilingual models, each with differing levels of hidden layer size, taken as a proxy of brain reserve capacity. Both models were trained over a number of epochs, representing a lifespan. Further, a change in the sigmoidal transfer function slope was included after asymptote. The results of this study provide commentary on the way in which picture representations separate according to input characteristics. Further, the results also provide an explanation of the behavioral observations of multilingual individuals.

The analysis carried out in this study suggests two main effects of bilingualism on semantic memory. Firstly, categories within a single representational space in a bilingual speaker are sensitive to space. Equivalence on hidden layer size or brain reserve capacity leads to a greater contraction and overlap of representations than in the monolingual equivalent. Secondly, gain change produces differing effects according to the constraining factors. Ten and fifteen node hidden layer monolingual networks declined in representational separation almost immediately. All other networks continued to separate to a degree before declining.

A trend toward poorer separation and greater clustering together of the bilingual versions could lead to the observed deficits in lexical access. The slower reaction times observed in verbal fluency tasks (Bialystok, 2008) may be related to the inability to separate the individual representations to the same degree of success as their monolingual counterparts. This provides a different perspective to the weaker link hypothesis (Gollan, Montoya, Cera, & Sandoval, 2008). The weaker link hypothesis suggests that links between representations are weaker due to the relative lack of use compared to monolingual individuals, who would use one language all of the time rather than share communication between two languages. Here, we suggest that links may be too close between representations. Further, recall errors in lexical decision making tasks may be due to the smaller distances and greater overlap observed in bilingual representational space.

What might be expected from tighter clustering of representations is an increased search speed. However, the speed-accuracy tradeoff may account for the lower than expected speeds observed in behavioral studies of lexical recall.

Given the executive nature of the positive contribution of multilingualism to cognitive reserve, it is difficult to relate the changes occurring in representational space to executive processes. However, what this study has indicated is that caution must be taken when interpreting behavioral results gained from individuals over discrete periods of time. This is due to both the within group variance and the between category differences in projections over differing amounts of passive reserve and cognitive reserve, the former represented by differing amounts of hidden layer units and the latter represented by multilingualism. Specifically, a result of note is the steeper trajectory of decline near the end of lifespan. This differed between mono and bilingual groups, with a steeper decline observed in monolingual models. Further research is required to relate the representational spacing to behavioral outcomes. Further, it is important to note that the variability, especially within monolingual simulants, demonstrates the importance that must be placed on monitoring the progression of clinical outcomes for a single individual. The difference in projections demonstrates that behavioral outcomes may vary according to the point in an individual’s lifespan at which testing has occurred as well as the individual differences within a group. However, in order to fully relate performance to the changes in representational clustering demonstrated in this study, further research is required.

A novel analysis of the representational space was carried out on simple three layer networks portraying monolingual and bilingual speakers. The results demonstrate that some of the negative effects of multilingualism may be due to space constraints rather than relative underuse of multiple languages. Further, this research raises questions as to the efficacy of testing at discrete time points over multiple individuals, where continuous assessment may provide a clearer picture of the contribution of multilingualism to offsetting cognitive decline.

References

Albert, M. S., Jones, K., Savage, C. R., Berkman, L., Seeman, T., Blazer, D., & Rowe, J. W. (1995). Predictors of cognitive change in older persons: MacArthur studies of successful aging. Psychology and Aging, 10(4), 578–579.

Alladi, S., Bak, T. H., Duggirala, V., Surampudi, B., Shailaja, M., Shukla, A. K., Kaul, S. (2013). Bilingualism delays age at onset of dementia, independent of education and immigration status. Neurology, 81(22), 1938–44.

Bäckman, L., Lindenberger, U., Li, S.-C., & Nyberg, L. (2010). Linking cognitive aging to alterations in dopamine neurotransmitter functioning: recent data and future avenues. Neuroscience and Biobehavioral Reviews, 34(5), 670–7.

Bak, T. H., Nissan, J. J., Allerhand, M. M., & Deary, I. J. (2014). Does bilingualism influence cognitive aging? Annals of Neurology, 75(6), 959–63.

Barnes, D. E., Tager, I. B., Satariano, W. A., & Yaffe, K. (2004). The Relationship Between Literacy and Cognition in Well-Educated Elders. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 59(4), 390–395.

Bialystok, E. (1988). Levels of bilingualism and levels of linguistic awareness. Developmental Psychology, 24, 560–567.

Bialystok, E. (2001). Bilingualism in Development: Language, Literacy, and Cognition (Vol. 8, p. 288). Cambridge University Press.

Bialystok, E. (2008). Bilingualism: The good, the bad, and the indifferent. Bilingualism: Language and Cognition, 12(01), 3–11.

Bialystok, E., Craik, F. I. M., & Freedman, M. (2007). Bilingualism as a protection against the onset of symptoms of dementia. Neuropsychologia, 45(2), 459–64.

Bialystok, E., Craik, F. I. M., Green, D. W., & Gollan, T. H. (2009). Bilingual Minds. Psychological Science in the Public Interest, 10(3), 89–129.

Chertkow, H., Whitehead, V., Phillips, N., Wolfson, C., Atherton, J., & Bergman, H. (2010). Multilingualism (but not always bilingualism) delays the onset of Alzheimer disease: evidence from a bilingual community. Alzheimer Disease and Associated Disorders, 24(2), 118–25.

Costa, A., Hernández, M., & Sebastián-Gallés, N. (2008). Bilingualism aids conflict resolution: evidence from the ANT task. Cognition, 106(1), 59–86.

Galambos, S. J., & Goldin-Meadow, S. (1990). The effects of learning two languages on levels of metalinguistic awareness. Cognition, 34(1), 1–56.

Gold, B. T. (2014). Lifelong bilingualism and neural reserve against Alzheimer’s disease: A review of findings and potential mechanisms. Behavioural Brain Research, 281C, 9–15.

Gollan, T. H., & Acenas, L. A. R. (2004). What is a TOT? Cognate and translation effects on tip-of-the-tongue states in Spanish-English and Tagalog-English bilinguals. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30(1), 246–69.

Gollan, T. H., Montoya, R. I., Cera, C., & Sandoval, T. C. (2008). More use almost always means a smaller frequency effect: Aging, bilingualism, and the weaker links hypothesis. Journal of Memory and Language, 58(3), 787–814.

Gollan, T. H., Montoya, R. I., Fennema-Notestine, C., & Morris, S. K. (2005). Bilingualism affects picture naming but not picture classification. Memory & Cognition, 33(7), 1220–1234.

Gollan, T. H., Montoya, R. I., & Werner, G. A. (2002). Semantic and letter fluency in Spanish-English bilinguals. Neuropsychology, 16(4), 562–76.

Grady, C. L., Luk, G., Craik, F. I. M., & Bialystok, E. (2015). Brain network activity in monolingual and bilingual older adults. Neuropsychologia, 66, 170–81.

Kavé, G., Eyal, N., Shorek, A., & Cohen-Mansfield, J. (2008). Multilingualism and cognitive state in the oldest old. Psychology and Aging, 23(1), 70–78.

Li, S. C., Lindenberger, U., & Sikström, S. (2001). Aging cognition: from neuromodulation to representation. Trends in Cognitive Sciences, 5(11), 479–486.

Luk, G., Bialystok, E., Craik, F. I. M., & Grady, C. L. (2011). Lifelong bilingualism maintains white matter integrity in older adults. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience, 31(46), 16808–16813.

Mahon, M., & Crutchley, A. (2006). Performance of typically-developing school-age children with English as an additional language on the British Picture Vocabulary Scales II. Child Language Teaching and Therapy, 22(3), 333–351.

Mortimer, J. A. (1997). Brain reserve and the clinical expression of Alzheimer’s disease. Geriatrics, 52 Suppl 2, S50–3.

Peal, E., & Lambert, W. E. (1962). The Relation Of Bilingualism To Intelligence. Psychological Monographs: General and Applied, 76(27), 1–23.

Portocarrero, J. S., Burright, R. G., & Donovick, P. J. (2007). Vocabulary and verbal fluency of bilingual and monolingual college students. Archives of Clinical Neuropsychology: The Official Journal of the National Academy of Neuropsychologists, 22(3), 415–22.

Poulin-Dubois, D., Bialystok, E., Blaye, A., Polonia, A., & Yott, J. (2013). Lexical access and vocabulary development in very young bilinguals. The International Journal of Bilingualism: Cross-Disciplinary, Cross-Linguistic Studies of Language Behavior, 17(1), 57–70.

Roberts, P. M., Garcia, L. J., Desrochers, A., & Hernandez, D. (2002). English performance of proficient bilingual adults on the Boston Naming Test. Aphasiology, 16(4-6), 635–645.

Rogers, C. L., Lister, J. J., Febo, D. M., Besing, J. M., & Abrams, H. B. (2006). Effects of bilingualism, noise, and reverberation on speech perception by listeners with normal hearing. Applied Psycholinguistics, 27(03), 465–485.

Rosselli, M., Ardila, A., Araujo, K., Weekes, V. A., Caracciolo, V., Padilla, M., & Ostrosky-Solís, F. (2000). Verbal fluency and repetition skills in healthy older Spanish-English bilinguals. Applied Neuropsychology, 7(1), 17–24.

Scarmeas, N., Albert, S. M., Manly, J. J., & Stern, Y. (2006). Education and rates of cognitive decline in incident Alzheimer’s disease. Journal of Neurology, Neurosurgery & Psychiatry, 77(3), 308–316.

Servan-Schreiber, D., Printz, H., & Cohen, J. (1990). A network model of catecholamine effects: gain, signal-to-noise ratio, and behavior. Science, 249(4971), 892–895.

Stern, Y. (2003). The concept of cognitive reserve: a catalyst for research. Journal of Clinical and Experimental Neuropsychology, 25(5), 589–593.

Stern, Y. (2009). Cognitive reserve. Neuropsychologia, 47(10), 2015–2028.

Thomas, M. S. (1998, January). Distributed representations and the bilingual lexicon: One store or two? In 4th Neural Computation and Psychology Workshop, London, 9–11 April 1997.

Thomas, M. S. C., & Karmiloff-Smith, A. (2003). Modeling language acquisition in atypical phenotypes. Psychological Review, 110(4), 647–82.

Valenzuela, M. J., & Sachdev, P. (2006). Brain reserve and dementia: a systematic review. Psychological Medicine, 36(04), 441–454.