Local associations and semantic ties in overt and masked semantic priming

Andrea Nadalini
International School for Advanced Studies, Trieste, Italy
anadalini@sissa.it

Marco Marelli
Bicocca University, Milan, Italy
marco.marelli@unimib.it

Roberto Bottini
Center for mind/brain sciences, Trento, Italy
roberto.bottini@unitn.it

Davide Crepaldi
International School for Advanced Studies, Trieste, Italy
dcrepaldi@sissa.it

Abstract

Distributional semantic models (DSM) are widely used in psycholinguistic research to automatically assess the degree of semantic relatedness between words. Model estimates strongly correlate with human similarity judgements and offer a tool to successfully predict a wide range of language-related phenomena. In the present study, we compare a state-of-the-art model with pointwise mutual information (PMI), a measure of local association between words based on their surface co-occurrence. In particular, we test how the two indexes perform on a dataset of semantic priming data, showing that PMI outperforms DSM in the fit to the behavioral data. According to our results, what has traditionally been thought of as semantic effects may mostly rely on local associations based on word co-occurrence.

1 Introduction

Over the past two decades, computational semantics has made considerable progress in developing techniques that provide human-like estimates of the semantic relatedness between lexical items. Distributional Semantic Models (DSM; Baroni and Lenci, 2010) assume that it is possible to represent lexical meaning based on statistical analyses of the way words are used in large text corpora. Words are modeled as vectors and populate a high-dimensional space where similar words tend to cluster together. Meaning relatedness between two words corresponds to the proximity of their vectors; for example, one can approximate relatedness as the cosine of the angle formed by two word vectors:

$$\cos\theta = \frac{\vec{a} \cdot \vec{b}}{\lVert \vec{a} \rVert \, \lVert \vec{b} \rVert}$$

DSMs have been proposed as psychologically plausible models of semantic memory, with particular emphasis on how meaning representations are achieved and structured (e.g. LSA, Landauer and Dumais, 1997; HAL, Lund and Burgess, 1996). They can therefore be pitted against human behavior, in search of psychological validation. For example, model estimates have been used to make reliable predictions about the processing time associated with the stimuli (Baroni et al., 2014; Mandera et al., 2017).

The technique most commonly used to explore semantic processing is the priming paradigm (McNamara, 2005), according to which the recognition of a given word (the target) is easier if it is preceded by a related word (the prime; e.g., cat-dog). Interestingly, facilitation can be observed both when the prime word is fully visible and when it is kept outside of participants' awareness through visual masking (Forster and Davis, 1984; de Wit and Kinoshita, 2015). In this technique, the prime stimulus is displayed briefly, embedded between a forward and a backward mask (Figure 1).

Figure 1: Example trial in a masked priming experiment. The prime stimulus is briefly presented (<= 50 ms) between the two masks, before the onset of the target stimulus.

Besides the distribution of words, one can be interested in the local association strength between lexical items, starting from the assumption that two words that are often used close to each other tend to become associated. Yet a given pair may be frequently attested only because its two components are themselves highly frequent. Therefore, raw frequency counts are often transformed into some kind of association measure that can determine whether the pair is attested above chance (Evert, 2008). A common method is to compute the pointwise mutual information (PMI) between two words, according to the formula:

$$\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)}$$

where p(w1,w2) corresponds to the probability of the word pair, while p(w1) and p(w2) correspond to the individual probabilities of the two components (Church and Hanks, 1990).

PMI has been used to model a wide range of psycholinguistic phenomena, from similarity judgements (Recchia and Jones, 2009) to reading speed (Ellis and Simpson-Vlach, 2009). Moreover, PMI has been shown to generalize successfully to non-linguistic fields such as epistemology and the psychology of reasoning (Tentori et al., 2016). On the other hand, PMI has the limitation of over-estimating the importance of rare items (Manning and Schütze, 1999).

Although many DSMs use measures of local association between words, such as PMI, to build contingency matrices, the information conveyed by two similar word vectors is different from the information conveyed by two words that frequently co-occur. Cosine similarity is based on "higher order" co-occurrences: two words are similar in the way they are used together with all the other words in the vocabulary. Local measures such as PMI instead rely only on the actual co-presence of two given words. Two synonyms like car and automobile are not likely to appear close to each other often in a given text; still, they denote the same referent and are therefore expected to be used in similar contexts.

Based on these considerations, PMI and DSMs can both be pitted against human behavior, in search of psychological validation. In particular, we tested how PMI and cosine proximity predict priming in a set of data encompassing different prime visibility conditions (masked vs unmasked) and prime durations (33, 50, 200, 1200 ms).

2 Our Study

2.1 Material

All the stimuli used in the current study were Italian words. 50 words referring to animals and 50 words referring to tools were used as target stimuli. Each word in this list was paired with three words from the same category, resulting in 300 unique prime-target pairs, which were divided into three rotations. We added to each rotation 100 filler trials, which were not included in the analysis. More precisely, we used abstract words as target stimuli, paired with animal and tool primes different from those presented in the experimental trials. In this way we ensured that the response to the target was not predictable from the presence of the prime.

Relatedness estimates were obtained by looking at the distribution of the stimuli across the itWaC corpus, a linguistic database of nearly 2 billion words built through web crawling (Baroni et al., 2009). We downloaded the lemmatized and part-of-speech annotated corpus, freely provided by the authors. All characters were set to lowercase, and special characters were removed together with a list of stop-words.
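As a minimal sketch of the two relatedness measures defined in the Introduction, the Python snippet below computes PMI from sliding-window co-occurrence counts and the cosine between two vectors. It is an illustration under stated assumptions, not the pipeline used in the study: the function names, the toy token list, and the way probabilities are normalized (pair counts over all counted pairs, word counts over corpus size) are hypothetical choices that the text does not specify.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, window=5):
    """Count unordered word pairs falling within the same sliding window."""
    pair_counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                pair_counts[frozenset((left, right))] += 1
    return pair_counts


def pmi(w1, w2, tokens, window=5):
    """PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) ).

    Assumption: p(w1, w2) is the pair count divided by the total number
    of counted pairs, and p(w) is the word count divided by corpus size.
    """
    word_counts = Counter(tokens)
    pair_counts = cooccurrence_counts(tokens, window)
    joint = pair_counts[frozenset((w1, w2))]
    if joint == 0:
        # The pair is never attested within the window.
        return float("-inf")
    p_joint = joint / sum(pair_counts.values())
    p_w1 = word_counts[w1] / len(tokens)
    p_w2 = word_counts[w2] / len(tokens)
    return math.log2(p_joint / (p_w1 * p_w2))


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy data, purely illustrative.
toy_tokens = ["cat", "dog", "cat", "dog", "tree", "stone"]
toy_pmi = pmi("cat", "dog", toy_tokens, window=2)
toy_cosine = cosine([1.0, 2.0], [2.0, 4.0])
```

In this toy corpus, cat and dog are adjacent more often than their individual frequencies would predict, so their PMI is positive, whereas a pair that never shares a window receives negative infinity. The cosine, by contrast, depends only on the direction of the two vectors and not on whether the words ever appear together.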
The order of words window along ItWac. Cosine proximity the two blocks was counterbalanced across sub- between word vectors was obtained training a jects. 10 practice and 2 warm-up trials were pre- word2vec model (Mikolov et al., 2013) on the sented before each block. Participants could take same corpus. Model’s parameters were set ac- a short break halfway through each block. cording to the WEISS model (Marelli, 2017). All Each trial began with a 750 ms fixation-cross words attested at least 100 times were included (+). Prime duration was varied across experi- in the model, which was trained using the con- ments: 33, 50, 200 and 1200 ms respectively. In tinuous-bag-of-word architecture, a 5-word win- the former two conditions, prime visibility was dow and 200 dimensions. The parameter k for prevented through forward and backward visual negative sampling was set to 10, and the sub- masks. Finally, the target word was left on the sampling parameter to 10-5. screen until a response was provided. Correlations between semantic and lexical varia- bles are shown in Table 1. Prime visibility task. In the experiments with the Target Target PMI cosine masked primes, participants were not informed length frequency about their presence. This was only revealed af- ter the relevant session, when participants were Target length 1 invited to take part into a prime visibility task requiring them to spot the presence of the letter Target -.211 1 “n” within the masked word. After the first two frequency examples, where prime duration was increased to PMI .091 -.205 1 150 ms to ensure visibility, 10 practice and 80 experimental trials were displayed. Prime visibil- cosine .147 -.059 .541 1 ity was quantified through a d–prime analysis carried out on each participant (Green and Swets Table 1: Correlations between lexical and semantic indexes in our stimulus set. ,1966). 
2.3 Results 2.2 Methods Participants: Overall, 246 volunteers were Response times (RT) were analyzed on accurate, recruited for the current study, and were assigned yes-response trials only. RT were inverse trans- to the different prime timing conditions. All sub- formed to approximate a normal distribution and jects were native Italian speakers, with normal or employed as a dependent variable in linear corrected-to-normal vision and no history of neu- mixed-effects regression models. This analysis rological or learning diseases. allows us to control for all the covariates that may have affected the performance, such as trial Apparatus: All stimuli were displayed on a 25’’ position in the randomized list, rotation, RT and monitor with a refresh rate of 120 Hz, using accuracy on the preceding trial, the response re- MatLab Psychtoolbox. The words and the masks quired in the preceding trial, frequency and were presented in Arial font 32, in white color length of the target. All these variables, together against a black background. with the two semantic indexes (PMI and cosine proximity), were entered in the model as fixed Procedure: Participants were engaged in a clas- effects, while participants and items were con- sic YES/NO task, requiring them to classify the sidered as random intercepts. Model selection stimuli as members of either the animal or the was implemented stepwise, progressively remov- tool category, according to the instructions. YES- ing those variables whose contribution to good- response were always provided with the domi- ness of fit was not significant. nant hand. Each unique prime-target pair was presented on- In the masked priming data, neither PMI nor co- ly once to each participant. Experimental ses- sine proximity were reliable predictors by them- sions included a total of 200 trials, which were selves (p=.298 and p=.206, respectively). How- divided into two blocks. 
In one block, subjects ever, both indexes interacted with prime visibil- were asked to press the yes-button if the target ity as tracked by participants’ d–prime word referred to an animal, while in the other (𝐹!"#∗!! (1, 9750)= 13.74, p<.001; 𝐹!"#∗!! (1, 9745)= 13.24, p<.001.). As illustrated in Figure Conclusion 1, the more each participant could see the prime word, the higher the priming effect she dis- Thanks to the help of computational methods, we played. provided new insights on the nature of the pro- cessing that supports semantic priming. Overall, effects seem to be primarily driven by local word associations as tracked by Pointwise Mutual In- formation—when semantic priming emerged, PMI effects were consistently stronger and more solid than those related to DSM estimates. This would be in line with previous literature suggest- ing that the behavior of the human cognitive sys- Figure 1. Interaction between d’ and prime–target associa- tem may be effectively described by Information tion. Both PMI (left) and cosine proximity (right) effects Theory principles. For example, Paperno and become stronger as prime visibility (d’) increases. Error colleagues (Paperno et al., 2014) showed that bars refer to 95% C.I. PMI is a significant predictor of human judge- ments of word co–occurrence. In the overt priming data, both PMI and cosine The results from masked priming offer another proximity yield a significant main effect (50ms important insight—some kind of prime visibility presentation time: 𝐹!"# (1,9769)= 10.36, p= .001; may be required for semantic/associative priming 𝐹!"# (1, 9769)= 8.602, p= .0058), but only PMI to emerge. Other studies have shown genuine significantly predicts priming when both indexes semantic effects with subliminally presented are entered into the model (𝐹!"# (1,9769)= 10.36, stimuli (Bottini et al., 2016). However, they typi- p= .001; 𝐹!"# (1,9769)=0.60, p=.489). 
Results cally used words from small/closed classes (e.g., were very consistent across conditions and showed spatial words, planet names). Conversely, we the same pattern when prime presentation time was drew stimuli across the lexicon, and sampled 200ms or 1200ms (see Figure 2). form very large category such as animals and tools; this may point to an effect of target pre- dictability. In general, our data cast some doubts on a wide–across–the–lexicon processing of se- mantic information outside of awareness. References Baroni M., S. Bernardini, A. Ferraresi and E. Zan- chetta. (2009). The WaCky Wide Web: A Collec- tion of Very Large Linguistically Processed Web- Crawled Corpora. Language Resources and Evalu- ation 43 (3): 209-226. Baroni, M., Dinu, G., and Kruszewski, G. (2014). Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd Annual Meet- ing of the Association for Computational Linguis- tics (Volume 1: Long Papers) (Vol. 1, pp. 238-247) Bottini, R., Bucur, M., and Crepaldi, D. (2016). The nature of semantic priming by subliminal spatial words: Embodied or disembodied?. Journal of Ex- Figure 2. Significant effect of PMI (right) and non- perimental Psychology: General, 145(9), 1160. significant effect of cosine proximity (right) across prime presentation times (50ms, 200ms, 1200ms on the first, se- de Wit, B., and Kinoshita, S. (2015). The masked se- cond and third row respectively). Error bars refer to 95% mantic priming effect is task dependent: Reconsid- C.I. ering the automatic spreading activation process. Journal of Experimental Psychology: Learning, Memory, and Cognition, 41(4), 1062. Ellis, N. C., and Simpson-Vlach, R. (2009). Formula- ic language in native speakers: Triangulating psy- cholinguistics, corpus linguistics, and education. More Accurate and Time Consistent? Cognitive Corpus Linguistics and Linguistic Theory, 5(1), 61- science, 40(3), 758-778. 78. Evert, S. 
(2008). Corpora and collocations. Corpus linguistics. An international handbook, 2, 223-233. Forster, K. I., and Davis, C. (1984). Repetition prim- ing and frequency attenuation in lexical access. Journal of experimental psychology: Learning, Memory, and Cognition, 10(4), 680. Green D.M. and Swets J.A. (1966). Signal detection theory and psychophysics. Wiley New York. Landauer, T. K., and Dumais, S. T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological review, 104(2), 211. Lund, K., and Burgess, C. (1996). Producing high- dimensional semantic spaces from lexical co- occurrence. Behavior research methods, instru- ments, and computers, 28(2), 203-208. Mandera, P., Keuleers, E., and Brysbaert, M. (2017). Explaining human performance in psycholinguistic tasks with models of semantic similarity based on prediction and counting: A review and empirical validation. Journal of Memory and Language, 92, 57-78. Manning, C. D., & Schütze, H. (1999). Foundations of statistical natural language processing. MIT press. Marelli, M. (2017). Word-Embeddings Italian Seman- tic Spaces: A semantic model for psycholinguistic research. Psihologija, 50(4), 503-520. McNamara, T. P. (2005). Semantic priming: Perspec- tives from memory and word recognition. Psychol- ogy Press. Mikolov, P., Chen, K., Corrado, G. S. Dean, J. (2013). Efficient estimation of word representations in vec- tor space. Available from ArXiv:1301.3781. Paperno, D., Marelli, M., Tentori, K., and Baroni, M. (2014). Corpus-based estimates of word associa- tion predict biases in judgment of word co- occurrence likelihood. Cognitive psychology, 74, 66-83. Recchia, G., and Jones, M. N. (2009). More data trumps smarter algorithms: Comparing pointwise mutual information with latent semantic analysis. Behavior research methods, 41(3), 647-656. Rohaut, B. and Naccache, L. 
(2018), What are the boundaries of unconscious semantic cognition? Eur J Neurosci. . doi:10.1111/ejn.13930. Tentori, K., Chater, N., and Crupi, V. (2016). Judging the Probability of Hypotheses Versus the Impact of Evidence: Which Form of Inductive Inference Is