=Paper=
{{Paper
|id=Vol-1419/paper0026
|storemode=property
|title=On Mental Imagery in Lexical Processing: Computational Modeling of the Visual Load Associated to Concepts
|pdfUrl=https://ceur-ws.org/Vol-1419/paper0026.pdf
|volume=Vol-1419
|dblpUrl=https://dblp.org/rec/conf/eapcogsci/RadicioniGCBLSM15
}}
==On Mental Imagery in Lexical Processing: Computational Modeling of the Visual Load Associated to Concepts==
Daniele P. Radicioni<sup>χ</sup>, Francesca Garbarini<sup>ψ</sup>, Fabrizio Calzavarini<sup>φ</sup>, Monica Biggio<sup>ψ</sup>, Antonio Lieto<sup>χ</sup>, Katiuscia Sacco<sup>ψ</sup>, Diego Marconi<sup>φ</sup> (FirstName.Surname@unito.it)

<sup>χ</sup> Department of Computer Science, Turin University, Turin, Italy; <sup>φ</sup> Department of Philosophy, Turin University, Turin, Italy; <sup>ψ</sup> Department of Psychology, Turin University, Turin, Italy

===Abstract===
This paper investigates the notion of visual load, an estimate of a lexical item's efficacy in activating mental images associated with the concept it refers to. We elaborate on the centrality of this notion, which is deeply and variously connected to lexical processing. We introduce a computational model of the visual load that builds on a few low-level features and on the dependency structure of sentences. The system implementing the proposed model has been experimentally assessed and shown to reasonably approximate human responses.

'''Keywords:''' Visual imagery; Computational modeling; Natural Language Processing.

===Introduction===
Ordinary experience suggests that lexical competence, i.e. the ability to use words, includes both the ability to relate words to the external world as accessed through perception (referential tasks) and the ability to relate words to other words in inferential tasks of several kinds (Marconi, 1997). There is evidence from both traditional neuropsychology and more recent neuroimaging research that the two aspects of lexical competence may be implemented by partly different brain processes. However, some very recent experiments appear to show that typically visual areas are also engaged by purely inferential tasks, which do not involve the visual perception of objects or pictures (Marconi et al., 2013). The present work can be considered a preliminary investigation aimed at verifying this main hypothesis by addressing the following issues: i) to what extent the visual load associated with concepts can be assessed, and what sort of agreement exists among humans about it; ii) which features underlie the visual load associated with concepts; and iii) whether the notion of visual load can be grasped and encapsulated into a computational model.

As is widely acknowledged, one main visual correlate of language is imageability, that is, the property of a particular word or sentence to produce an experience of imagery. In the following we focus on visual imagery (thus disregarding acoustic, olfactory and tactile imagery), which we denote as visual load. The visual load is related to the ease of producing visual imagery when an external linguistic stimulus is processed.

Intuitively, words like 'dog' or 'apple' refer to concrete entities and are associated with a high visual load, implying that these terms immediately generate a mental image. Conversely, words like 'algebra' or 'idempotence' are hardly accompanied by the production of vivid images. Although the construct of visual load is closely related to that of concreteness, the two can clearly dissociate: i) some words, such as certain concrete nouns, have been rated high in concreteness but low in visual load (Paivio, Yuille, & Madigan, 1968); and, conversely, ii) abstract words such as 'bisection' are associated with a high visual load.

The notion of visual load is relevant to many disciplines, in that it helps shed light on a wide variety of cognitive and linguistic tasks and explain a plethora of phenomena observed in both impaired and normal subjects. In the next Section we survey a multidisciplinary literature showing how mental imagery affects memory, learning and comprehension; we consider how imagery is characterized at the neural level; and we show how visual information is exploited in state-of-the-art Natural Language Processing research. In the subsequent Section we illustrate the proposed computational model for providing concepts with their visual load characterization. We then describe the experiments designed to assess the model through an implemented system, and report and discuss the obtained results. The Conclusion summarizes the work done and provides an outlook on future work.

===Related Work===
As regards linguistic competence, it is generally accepted that visual load facilitates cognitive performance (Bergen, Lindsay, Matlock, & Narayanan, 2007), leading to faster lexical decisions for visually loaded concepts than for non-visually loaded ones (Cortese & Khanna, 2007). For example, nouns with high visual load ratings are remembered better than those with low visual load ratings in long-term memory tests (Paivio et al., 1968). Moreover, visually loaded terms are easier to recognize for subjects with deep dyslexia, and individuals respond more quickly and accurately when making judgments about visually loaded sentences (Kiran & Tuchtenhagen, 2005). Neuropsychological research has shown that many aphasic patients perform better with linguistic items that more easily elicit visual imagery (Coltheart, 1980), although the opposite pattern has also been documented (Cipolotti & Warrington, 1995).

Visual imageability of concepts evoked by words and sentences is commonly known to affect brain activity. Visuosemantic processing regions, such as the left inferior temporal gyrus and the fusiform gyrus, show greater involvement during the comprehension of highly imageable words and sentences (Bookheimer et al., 1998; Mellet, Tzourio, Denis, & Mazoyer, 1998). Other semantic brain regions (i.e., the superior and middle temporal cortex) are selectively activated by low-imageable sentences (Mellet et al., 1998; Just, Newman, Keller, McEleney, & Carpenter, 2004). Furthermore, a growing number of studies suggest that words encoding different visual properties (such as color, shape and motion) are processed in cortical areas that overlap with some of the areas activated during the visual perception of those properties (Kemmerer, 2010).

Investigating the visual features associated with linguistic input can be useful to build semantic resources designed to deal with Natural Language Processing (NLP) problems. Examples include identifying verb subcategorization frames (Bergsma & Goebel, 2011). They also include enriching the traditional extraction of distributional semantics from text with a multimodal approach that integrates textual features with visual ones (Bruni, Tran, & Baroni, 2014). Finally, visual attributes are at the base of annotated corpora and resources that can be used to extend text-based distributional semantics by grounding word meanings on visual features as well (Silberer, Ferrari, & Lapata, 2013).

===Model===
Much work has been invested in different areas to investigate imageability in general and visual imagery in particular. To the best of our knowledge, however, no attempt has been made to formally characterize visual load, and no computational model has been devised to compute how visually loaded sentences, and the lexicalized concepts therein, are. We propose a model that relies on a simple hypothesis, additively combining a few low-level features, refined by exploiting syntactic information.

The notion of visual load is used by and large in the literature with different meanings, thus giving rise to different levels of ambiguity. We define visual load as the concept representing a direct indicator (a numeric value) of the efficacy of a lexical item in activating mental images associated with the concept referred to by that lexical item. We expect that visual load also represents an indirect measure of the probability of activation of brain areas devoted to visual processing.

We conjecture that the visual load is primarily associated with concepts, although lexical phenomena such as term availability can also affect it. Term availability implies that the most frequently used terms are easier to recognize than those seen less often (Tversky & Kahneman, 1973).

Based on the work by Kemmerer (2010), we explore the hypothesis that a limited number of primitive elements can be used to characterize and evaluate the visual load associated with concepts. Namely, Kemmerer's Simulation Framework makes it possible to capture information about a wide variety of concepts and properties used to denote objects, events and spatial relations. Three main visual semantic components have been identified that, in our opinion, are also suitable as different dimensions along which to characterize the concept of visual load: color properties, shape properties, and motion properties. The perception of these properties is expected to occur in an immediate way, such that "during our ordinary observation of the world, these three attributes of objects are tightly bound together in unified conscious images" (Kemmerer, 2010). We added a further perceptual component related to size. More precisely, we assume that information about the size of a given concept can also contribute to the computation of its visual load value, as an adjoint factor rather than a primitive one.

In this setting, we represent each concept/property as a vector of six elements. The first two are the lemma and its morphological information on POS (part of speech). The other four are boolean values encoding whether the considered concept/property conveys information about color, shape, motion and size.¹ For example, the entry

: <code>table,Noun,1,1,0,1</code> (1)

indicates that the concept ''table'' conveys information about color, shape and size, but not about motion. Here ''table'' is taken as a Noun, which differs, e.g., from the concept associated with the Verb. In the following, these are referred to as the visual features <math>\varphi_{\cdot}</math> associated with the given concept.

¹ We adopt here a simplification, since we are assuming that the pair ⟨lemma, POS⟩ is sufficient to identify a concept/property, and that in general we can access items by disregarding the word sense disambiguation problem, which is actually an open problem in the field of Natural Language Processing (Vidhu Bhala & Abirami, 2014).

Figure 1: The (simplified) dependency tree corresponding to the stimulus 'L'animale che mangia banane su un albero è la scimmia' ('The animal that eats bananas on a tree is the monkey').
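Entry (1) reads naturally as a comma-separated record. The following is a minimal sketch that assumes exactly the six-field layout shown: the lemma, the POS, then 0/1 flags for color, shape, motion and size. It parses one entry into a typed tuple. `VisualFeatures` and `parse_entry` are hypothetical names rather than part of the authors' system, and the actual on-disk format of their dictionary is not documented here.

```python
from typing import NamedTuple


class VisualFeatures(NamedTuple):
    """A dictionary entry: the <lemma, POS> pair plus four boolean visual features."""
    lemma: str
    pos: str
    col: bool  # conveys color information
    sha: bool  # conveys shape information
    mot: bool  # conveys motion information
    siz: bool  # conveys size information


def parse_entry(line: str) -> VisualFeatures:
    """Parse a record such as 'table,Noun,1,1,0,1' (hypothetical file format)."""
    lemma, pos, *flags = [field.strip() for field in line.split(",")]
    if len(flags) != 4 or any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"expected four 0/1 flags after lemma and POS, got {flags!r}")
    col, sha, mot, siz = (flag == "1" for flag in flags)
    return VisualFeatures(lemma, pos, col, sha, mot, siz)


entry = parse_entry("table,Noun,1,1,0,1")
print(entry)
# VisualFeatures(lemma='table', pos='Noun', col=True, sha=True, mot=False, siz=True)
```

Keying a Python dict on `(entry.lemma, entry.pos)` would mirror the ⟨lemma, POS⟩ simplification stated in footnote 1.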
We then built a dictionary by extracting it from a set of stimuli (illustrated hereafter) composed of simple sentences describing a concept. Next, we manually annotated the visual features associated with each concept. The feature scores were assigned by the authors on a purely introspective basis. The automatic annotation of visual properties associated with concepts is deferred to future work. It can be addressed either through a classical Information Extraction approach building on statistics, or in a more semantically principled way.

Different weighting schemes <math>\vec{w} = \{\alpha, \beta, \gamma\}</math> have been tested in order to determine the features' contribution to the visual load associated with a concept c. That visual load results from computing

:<math>VL(c, \vec{w}) = \alpha(\varphi_{col} + \varphi_{sha}) + \beta\,\varphi_{mot} + \gamma\,\varphi_{siz} \qquad (2)</math>

For the experimentation we set α to 1.35, β to 1.1 and γ to 0.9. These assignments reflect the fact that color and shape information is considered more important in the computation of VL.
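Equation 2 is a weighted sum of four 0/1 features. The sketch below evaluates it with the weights used in the experimentation (α = 1.35, β = 1.1, γ = 0.9). `Weights` and `concept_vl` are illustrative names. Passing the annotations as booleans is an assumption about how they are stored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    """Weighting scheme w = {alpha, beta, gamma} of Eq. 2, defaulting to the reported values."""
    alpha: float = 1.35  # shared weight of color and shape
    beta: float = 1.1    # weight of motion
    gamma: float = 0.9   # weight of size (an adjoint factor)


def concept_vl(col: bool, sha: bool, mot: bool, siz: bool, w: Weights = Weights()) -> float:
    """Eq. 2: VL(c, w) = alpha*(phi_col + phi_sha) + beta*phi_mot + gamma*phi_siz."""
    return w.alpha * (int(col) + int(sha)) + w.beta * int(mot) + w.gamma * int(siz)


# Entry (1), 'table': color, shape and size, but no motion.
print(round(concept_vl(True, True, False, True), 2))  # 3.6
```

For 'table' the score is 1.35 · (1 + 1) + 0.9 = 3.6. Keeping the weights in a separate frozen value means alternative weighting schemes can be tried without touching the formula.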
To combine the contribution of the concepts in a sentence s into the overall VL score for s, we adopted the following additive schema. It adapts the principle of compositionality² to the visual load domain: the visual load of a sentence is computed starting from the visual load of the concepts denoted by its lexical items,

:<math>VL(s) = \sum_{c \in s} VL(c).</math>

The computation of the VL score also accounts for the dependency structure of the input sentences. The syntactic structure of sentences is computed by the Turin University Parser (TUP) in the dependency format (Lesmo, 2007). Dependency formalisms represent syntactic relations by connecting a dominant word, the head, with a dominated word, the dependent. For example, in the sentence ''The eagle flies'' the verb 'fly' is the head and the noun 'eagle' is its dependent. The connection between these two words is usually represented by labeled directed edges (e.g., subject). The collection of all dependency relations of a sentence forms a tree rooted in the main verb (see the parse tree illustrated in Figure 1).

The dependency structure is relevant in our approach because we assume that a reinforcement effect may apply when both a word and its dependent(s) (or governor(s)) are associated with visual features. For example, a phrase such as 'with black stripes' is expected to evoke mental images more vividly than its elements taken in isolation (that is, 'black' and 'stripes'). Its visual load is expected to grow further if we add a coordinated term, as in 'with yellow and black stripes'. Moreover, the VL would grow recursively if we added a governor term, as in 'fur with yellow and black stripes'. We therefore introduced a parameter ξ to control the contribution of the aforementioned features when the corresponding terms are linked in the parse tree by a modifier/argument relation (denoted as mod and arg in Equation 3):

:<math>VL(c_i) = \begin{cases} \xi \cdot VL(c_i) & \text{if } \exists\, c_j \text{ s.t. } \mathit{mod}(c_i, c_j) \lor \mathit{arg}(c_i, c_j) \\ VL(c_i) & \text{otherwise} \end{cases} \qquad (3)</math>

In the experimentation, ξ was set to 1.2.

² This principle states that the meaning of an expression is a function of the meanings of its parts and of the way they are syntactically combined. To get the meaning of a sentence, we combine words to form phrases, then phrases to form clauses, and so on (Partee, 1995).
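What follows is a sketch of Equation 3 followed by the additive schema VL(s) = Σ VL(c). It rests on two assumptions, flagged here and in the comments. First, dependency edges are encoded as hypothetical `(label, c_i, c_j)` triples. Second, Eq. 3 is read literally, so only c_i, the first argument of a mod or arg relation, is scaled by ξ; the text does not say whether the other endpoint is boosted as well. Concepts are keyed by token, and the per-concept scores in the example are made up.

```python
from typing import Dict, Iterable, Mapping, Tuple

# A dependency edge as (label, c_i, c_j); this encoding is an assumption of the sketch.
Relation = Tuple[str, str, str]


def reinforce(base_vl: Mapping[str, float], relations: Iterable[Relation],
              xi: float = 1.2) -> Dict[str, float]:
    """Eq. 3 (literal reading): scale VL(c_i) by xi if mod(c_i, c_j) or arg(c_i, c_j) holds."""
    linked = {ci for label, ci, _cj in relations if label in ("mod", "arg")}
    return {c: xi * vl if c in linked else vl for c, vl in base_vl.items()}


def sentence_vl(base_vl: Mapping[str, float], relations: Iterable[Relation],
                xi: float = 1.2) -> float:
    """Additive schema: VL(s) is the sum of VL(c) over the concepts c in s, after Eq. 3."""
    return sum(reinforce(base_vl, relations, xi).values())


# Hypothetical scores (not from the paper): 'stripes' conveys color and shape,
# 'black' conveys color only; 'stripes' is modified by 'black'.
base = {"stripes": 2.7, "black": 1.35}
edges = [("mod", "stripes", "black")]
print(round(sentence_vl(base, edges), 2))  # 4.59
```

Relations with other labels (e.g., a subject edge) leave the scores untouched, so the sentence score then reduces to the plain sum.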
The stimuli in the dataset are pairs consisting of a definition d and a target T (<math>st = \langle d, T \rangle</math>), such as

: [''The big carnivore with yellow and black stripes is the'']<sub>definition d</sub> … [''tiger'']<sub>target T</sub>, the whole pair forming the stimulus st.

The visual load associated with the components of st, given the weighting scheme <math>\vec{w}</math>, is then computed as follows:

:<math>VL(d, \vec{w}) = \sum_{c \in d} VL(c) \qquad (4)</math>
:<math>VL(T, \vec{w}) = VL(T) \qquad (5)</math>
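Equations 4 and 5 score the two halves of a stimulus separately. The minimal sketch below assumes the per-concept scores have already been computed (e.g., via Eq. 2 and, where applicable, Eq. 3) and stored in a dictionary keyed on ⟨lemma, POS⟩. The definition's score is then the sum over its concepts, and the target's score is its own VL. `stimulus_vl` is a hypothetical name, and words without an entry are assumed to contribute 0. The numeric scores are invented for illustration and are not the authors' annotations.

```python
from typing import Dict, List, Tuple

Concept = Tuple[str, str]  # <lemma, POS> pair, following footnote 1


def stimulus_vl(definition: List[Concept], target: Concept,
                scores: Dict[Concept, float]) -> Tuple[float, float]:
    """Eq. 4: VL(d, w) = sum of VL(c) for c in d.  Eq. 5: VL(T, w) = VL(T).

    Words without a dictionary entry are assumed to contribute 0.
    """
    vl_definition = sum(scores.get(c, 0.0) for c in definition)
    vl_target = scores.get(target, 0.0)
    return vl_definition, vl_target


# Invented per-concept scores for the 'tiger' stimulus.
scores = {("big", "Adj"): 0.9, ("carnivore", "Noun"): 3.6, ("yellow", "Adj"): 1.35,
          ("black", "Adj"): 1.35, ("stripe", "Noun"): 2.7, ("tiger", "Noun"): 4.7}
definition = [("big", "Adj"), ("carnivore", "Noun"), ("yellow", "Adj"),
              ("black", "Adj"), ("stripe", "Noun")]
vl_d, vl_t = stimulus_vl(definition, ("tiger", "Noun"), scores)
print(round(vl_d, 2), vl_t)  # 9.9 4.7
```

Returning the two scores separately matches the experimental design, in which the visual load of definitions and of targets is manipulated and compared independently.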
The whole pipeline, from the parsing of the input to the computation of the VL for the considered stimulus, has been implemented as a computer program. Its main steps are the parsing of the stimulus, the extraction of the (lexicalized) concepts from the output of the morphological analysis, and the traversal of the dependency tree resulting from the parsing step. The morphological analyzer was first fed with the whole set of stimuli; its output was annotated with the visual features and stored in a dictionary. At run time, the dictionary is accessed based on morphological information and used to retrieve the values of the features associated with the concepts in the stimulus. The output of the proposed model has been compared with the results obtained in a behavioral experiment, as described below.

Figure 2: The pipeline to compute the VL score according to the proposed computational model.
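The traversal step of the pipeline can be sketched as a depth-first walk over an already-parsed dependency tree. The walk collects the annotated concepts and the labeled head-to-dependent edges that Eq. 3 needs. This is not TUP's output format. The `Node` structure, the edge labels (`subj`, `mod`, `det`), the toy tree for 'The big eagle flies' and the feature tuples in the dictionary are all assumptions for illustration. Parsing and morphological analysis are out of scope.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Tuple

Key = Tuple[str, str]  # (lemma, POS)


@dataclass
class Node:
    """Node of an already-parsed dependency tree (hypothetical structure)."""
    lemma: str
    pos: str
    children: List[Tuple[str, "Node"]] = field(default_factory=list)  # (edge label, dependent)


def extract(root: Node, dictionary: Mapping[Key, object]) -> Tuple[List[Key], List[Tuple[str, Key, Key]]]:
    """Pre-order depth-first traversal collecting annotated concepts and labeled edges."""
    concepts: List[Key] = []
    relations: List[Tuple[str, Key, Key]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        key = (node.lemma, node.pos)
        if key in dictionary:  # only concepts stored in the dictionary contribute
            concepts.append(key)
        for label, child in node.children:
            relations.append((label, key, (child.lemma, child.pos)))
        # Push children in reverse so they are visited left to right.
        stack.extend(child for _, child in reversed(node.children))
    return concepts, relations


tree = Node("fly", "Verb", [("subj", Node("eagle", "Noun", [
    ("mod", Node("big", "Adj")), ("det", Node("the", "Art"))]))])
dictionary = {("fly", "Verb"): (0, 1, 1, 0), ("eagle", "Noun"): (1, 1, 1, 1),
              ("big", "Adj"): (0, 0, 0, 1)}
concepts, relations = extract(tree, dictionary)
print(concepts)      # [('fly', 'Verb'), ('eagle', 'Noun'), ('big', 'Adj')]
print(relations[1])  # ('mod', ('eagle', 'Noun'), ('big', 'Adj'))
```

An explicit stack avoids recursion limits on deep trees. The extracted relation triples are in a form that an Eq. 3 step could consume directly.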
===Experimentation===
====Materials and Methods====
Thirty healthy volunteers, native Italian speakers (16 females and 14 males), 19–52 years of age (mean ± sd = 25.7 ± 5.1), were recruited for the experiment. None of the subjects had a history of psychiatric or neurological disorders. All participants gave their written informed consent before participating in the experimental procedure. The procedure was approved by the ethical committee of the University of Turin, in accordance with the Declaration of Helsinki (World Medical Association, 1991). Participants were all naïve to the experimental procedure and to the aims of the study.

The set of stimuli was devised by a multidisciplinary team of philosophers, neuropsychologists and computer scientists. This was done within a broader project aimed at investigating the role of visual load in concepts involved in inferential and referential tasks.

'''Experimental design and procedure.''' Participants were asked to perform an inferential task, "Naming from definition". During the task a sentence was pronounced, and the subjects were instructed to listen to the stimulus through headphones. They then had to overtly name the target word corresponding to the definition, as accurately and as fast as possible, using a microphone connected to a response box. Auditory stimuli were presented through the E-Prime software, which was also used to record data on accuracy and reaction times. At the end of the experimental session, the subjects were administered a questionnaire. In it, they rated on a 1–7 Likert scale the intensity of the visual load they perceived as related to each target and to each definition.

The factorial design of the study included two within-subjects factors, in which the visual load of both target and definition was manipulated. The resulting four experimental conditions were as follows:
* VV, Visual Target / Visual Definition (e.g., 'The bird of prey with great wings flying over the mountains is the … eagle');
* VNV, Visual Target / Non-Visual Definition (e.g., 'The hottest of the four elements of the ancients is … fire');
* NVV, Non-Visual Target / Visual Definition (e.g., 'The nose of Pinocchio stretched when he told a … lie');
* NVNV, Non-Visual Target / Non-Visual Definition (e.g., 'The quality of people who easily solve difficult problems is said … intelligence').

For each condition there were 48 sentences, 192 sentences overall. The experimental session lasted about 30 minutes. The number of words (nouns and adjectives), their balancing across stimuli, and the (syntactic dependency) structure of the sentences were uniform within conditions, so that the most relevant variables were controlled. The same set of stimuli used for the human experiment was given as input to the system implementing the computational model.

====Data analysis====
The participants' performance in the "Naming from definition" task was evaluated by recording, for each response, the reaction time RT, in milliseconds, and the accuracy AC, computed as the percentage of correct answers. An answer was considered correct if the target word plausibly matched the definition. Then, for each subject, RT and AC were combined in the Inverse Efficiency Score (IES), using the formula <math>IES = (RT / AC) \cdot 100</math>. IES is a metric commonly used to aggregate reaction time and accuracy into a single summary measure (Townsend & Ashby, 1978).
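This is a minimal sketch of the IES computation for one subject. The text defines IES = (RT/AC) · 100 with AC as a percentage, which equals the mean RT divided by the proportion of correct answers. The text does not say whether RT averages all trials or only correct ones. The function therefore takes a flag and defaults to correct trials only, a common convention that is an assumption here. `inverse_efficiency` is a hypothetical name, and the trial data are made up.

```python
from statistics import mean
from typing import Sequence


def inverse_efficiency(rts_ms: Sequence[float], correct: Sequence[bool],
                       correct_only: bool = True) -> float:
    """IES = (RT / AC) * 100, with RT in ms and AC the percentage of correct answers.

    Which trials enter RT is not specified in the text; averaging correct
    trials only (the default) is a common convention and an assumption here.
    """
    if not rts_ms or len(rts_ms) != len(correct):
        raise ValueError("need one correctness flag per reaction time")
    accuracy_pct = 100.0 * sum(correct) / len(correct)
    if accuracy_pct == 0:
        raise ValueError("IES is undefined when no answer is correct")
    rts = [rt for rt, ok in zip(rts_ms, correct) if ok] if correct_only else list(rts_ms)
    return mean(rts) / accuracy_pct * 100  # lower IES means more efficient performance


# Made-up trials for one subject: three correct answers out of four (AC = 75%).
print(inverse_efficiency([1100, 1300, 1200, 2000], [True, True, True, False]))  # 1600.0
```

Because IES divides by accuracy, a subject who is fast but makes many errors is penalized, which is why it is used as a single efficiency measure.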
Experimental design and procedure. Participants were asked to perform an inferential task, “Naming from definition”. During the task a sentence was pronounced, and the subjects were instructed to listen to the stimulus delivered through headphones and to overtly name, as accurately and as fast as possible, the target word corresponding to the definition, using a microphone connected to a response box. Auditory stimuli were presented through the E-Prime software, which was also used to record data on accuracy and reaction times. Furthermore, at the end of the experimental session, the subjects were administered a questionnaire: they had to rate on a 1–7 Likert scale the intensity of the visual load they perceived as related to each target and to each definition.

The factorial design of the study included two within-subjects factors, in which the visual load of both target and definition was manipulated. The resulting four experimental conditions were as follows:
VV Visual Target–Visual Definition (e.g., ‘The bird of prey with great wings flying over the mountains is the . . . eagle’);
VNV Visual Target–Non-Visual Definition (e.g., ‘The hottest of the four elements of the ancients is . . . fire’);
NVV Non-Visual Target–Visual Definition (e.g., ‘The nose of Pinocchio stretched when he told a . . . lie’);
NVNV Non-Visual Target–Non-Visual Definition (e.g., ‘The quality of people that easily solve difficult problems is said . . . intelligence’).

For each condition there were 48 sentences, 192 sentences overall. Each experimental session lasted about 30 minutes. The number of words (nouns and adjectives), their balancing across stimuli, and the (syntactic dependency) structure of the considered sentences were uniform within conditions, so that the most relevant variables were controlled. The same set of stimuli used for the human experiment was given as input to the system implementing the computational model.
Data analysis
The participants’ performance in the “Naming from definition” task was evaluated by recording, for each response, the reaction time RT, in milliseconds, and the accuracy AC, computed as the percentage of correct answers. An answer was considered correct if the target word plausibly matched the definition. Then, for each subject, RT and AC were combined in the Inverse Efficiency Score (IES) by using the formula IES = (RT/AC) · 100, so that lower values indicate better performance. IES is a metric commonly used to aggregate and summarize reaction time and accuracy (Townsend & Ashby, 1978). The mean IES value was used as the dependent variable and entered in a 2 × 2 repeated measures ANOVA with ‘target’ (two levels: ‘visual’ and ‘non-visual’) and ‘definition’ (two levels: ‘visual’ and ‘non-visual’) as within-subjects factors. Post hoc comparisons were performed by using the Duncan test.

The scores obtained by the participants in the visual load questionnaire were analyzed by using unpaired two-tailed t-tests. Two comparisons were performed: visual vs. non-visual targets, and visual vs. non-visual definitions. The computational model results were analyzed in the same way, with the same two comparisons.

Correlations between IES, computational model and visual load questionnaire. We also explored the existence of correlations between IES, the visual load questionnaire and the computational model output by using linear regressions. For both the IES values and the questionnaire scores, we computed for each item the mean of the 30 subjects’ responses. In a first model, we used the visual load questionnaire scores as independent variable to predict the participants’ performance (with IES as the dependent variable); in a second model, we used the computational data as independent variable to predict the participants’ visual load evaluation (with the questionnaire scores as the dependent variable). In order to verify the consistency of the correlation effects, we also performed linear regressions where we controlled for three covariates: the number of words, their balancing across stimuli and the syntactic dependency structure.
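The IES formula and the 2 × 2 within-subjects ANOVA can be sketched in a few lines of standard-library Python. This is not the authors’ analysis code, and the function names are hypothetical. The ANOVA part relies on a standard equivalence rather than on the paper: in a fully within-subjects 2 × 2 design every effect has one degree of freedom, so its F(1, n−1) equals the squared one-sample t statistic of a per-subject contrast score. The Duncan post hoc test is not covered.

```python
import math
from statistics import mean, stdev


def ies(mean_rt_ms, accuracy_pct):
    """Inverse Efficiency Score: IES = (RT / AC) * 100, with RT in ms and
    AC the percentage of correct answers (0-100]."""
    if accuracy_pct <= 0:
        raise ValueError("accuracy must be positive")
    return mean_rt_ms / accuracy_pct * 100


def one_df_f(contrast):
    """F(1, n-1) of a one-df within-subject effect: the squared one-sample
    t statistic of the per-subject contrast scores."""
    n = len(contrast)
    t = mean(contrast) / (stdev(contrast) / math.sqrt(n))
    return t * t


def anova_2x2_within(vv, vnv, nvv, nvnv):
    """Each argument holds one mean IES per subject, in the same subject order.
    Returns F(1, n-1) for 'target', 'definition' and their interaction."""
    cells = list(zip(vv, vnv, nvv, nvnv))
    # Main effect of target: visual targets (VV, VNV) vs non-visual (NVV, NVNV).
    target = [(a + b) / 2 - (c + d) / 2 for a, b, c, d in cells]
    # Main effect of definition: visual (VV, NVV) vs non-visual (VNV, NVNV).
    definition = [(a + c) / 2 - (b + d) / 2 for a, b, c, d in cells]
    # Interaction contrast: VV - VNV - NVV + NVNV.
    interaction = [a - b - c + d for a, b, c, d in cells]
    return {
        "target": one_df_f(target),
        "definition": one_df_f(definition),
        "target*definition": one_df_f(interaction),
    }
```

For instance, a mean RT of 800 ms with 80% correct answers gives `ies(800, 80) == 1000.0`. With 30 subjects, each returned F value has (1, 29) degrees of freedom, matching the form of the statistics reported in the Results.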
Results
The ANOVA showed a significant effect of the within-subject factors “target” (F(1,29) = 14.4; p < 0.001), suggesting that the IES values were significantly lower for visual than for non-visual targets, and “definition” (F(1,29) = 32.78; p < 0.001), suggesting that the IES values were significantly lower for visual than for non-visual definitions. This means that, for both the target and the definition, the participants’ performance was significantly faster and more accurate in the visual than in the non-visual condition. We also found a significant “target*definition” interaction (F(1,29) = 7.54; p = 0.01). Based on the Duncan post hoc comparison, we verified that this interaction was explained by the effect of the visual definitions of the visual targets (VV condition), in which the participants’ performance was significantly faster and more accurate than in all the other conditions (VNV, NVV, NVNV), as shown in Figure 3.

Figure 3: The graph shows, for each condition, the mean IES with standard error.

By comparing the questionnaire scores for visual (mean ± sd = 5.69 ± 0.55) and non-visual (mean ± sd = 4.73 ± 0.71) definitions, we found a significant difference (p < 0.001; unpaired t-test, two-tailed). By comparing the questionnaire scores for visual (mean ± sd = 6.32 ± 0.4) and non-visual (mean ± sd = 4.23 ± 0.9) targets, we found a significant difference (p < 0.001). This suggests that our arbitrary categorization of each sentence within the four conditions was supported by the general agreement of the subjects.

By comparing the computational model scores for visual (mean ± sd = 4.0 ± 2.4) and non-visual (mean ± sd = 2.9 ± 2.0) definitions, we found a significant difference (p < 0.001; unpaired t-test, two-tailed). By comparing the computational model scores for visual (mean ± sd = 2.53 ± 1.29) and non-visual (mean ± sd = 0.26 ± 0.64) targets, we found a significant difference (p < 0.001). This suggests that we were able to computationally model the visual load of both targets and definitions, describing it as a linear combination of different low-level features: color, shape, motion and dimension.
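The unpaired comparisons above can be reproduced in outline with the following sketch. The paper does not state which variant of the unpaired t-test was used, so the pooled-variance (Student) form here is an assumption, and the function name is hypothetical. Only the statistic and its degrees of freedom are computed; the two-tailed p-value requires the t distribution, for example via `scipy.stats.ttest_ind`.

```python
import math
from statistics import mean, variance


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance (an assumed
    variant). Returns (t, degrees_of_freedom)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance of the two groups.
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df
```

A positive t means that the first group (e.g., the visual items) scored higher than the second. With unequal group variances, as for the computational model scores, Welch’s variant would be a reasonable alternative.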
Correlation results. By using the visual load questionnaire scores as independent variable, we were able to significantly (R² = 0.4; p < 0.001) predict the participants’ performance (that is, their IES values), as illustrated in Figure 4. This means that the higher the participants’ visual score for a definition, the better the participants’ performance in giving the correct response (or, equivalently, the lower the IES value).

Figure 4: Linear regression “Inverse Efficiency Score (IES) by Visual Load Questionnaire”. The mean score in the Visual Load Questionnaire, reported on a 1–7 Likert scale, was used as an independent variable to predict the subjects’ performance, as quantified by the IES.

By using the computational data as independent variable, we were able to significantly (R² = 0.44; p < 0.001) predict the participants’ visual load evaluation (their questionnaire scores), as shown in Figure 5. This means that a correlation exists between the computational prediction about the visual load of the definitions and the participants’ visual load evaluation: the higher the computational model result, the higher the participants’ visual score in the questionnaire. We also found that these effects were still significant in the regression models where the number of words, their balancing across stimuli and the syntactic dependency structure were controlled for.

Figure 5: Linear regression “Visual Load Questionnaire by Computational Model”. The mean value obtained by the computational model was used as an independent variable to predict the subjects’ scores on the Visual Load Questionnaire, reported on a 1–7 Likert scale.
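The R² values reported above come from simple (single-predictor) linear regressions on per-item means. The following sketch shows how such a fit and its R² can be computed by ordinary least squares with the standard library. The function name is hypothetical, and the covariate-controlled models mentioned above would require multiple regression, which is not shown.

```python
from statistics import mean


def simple_ols(x, y):
    """Least-squares fit y = intercept + slope * x on per-item means.
    Returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot
```

In the first model, `x` would hold the per-item questionnaire means and `y` the per-item IES means. A negative slope corresponds to the reported pattern: higher visual scores go with lower IES values. With a single predictor, R² equals the squared Pearson correlation between `x` and `y`.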
Conclusions
In the near future we plan to extend the representation of the conceptual information by grounding it on a hybrid representation composed of conceptual spaces and ontologies (Lieto, Minieri, Piana, & Radicioni, 2015; Lieto, Radicioni, & Rho, 2015). Additionally, we plan to integrate the current model in the context of cognitive architectures.

Acknowledgments
This work has been partly supported by the Project The Role of the Visual Imagery in Lexical Processing, grant TO-call03-2012-0046, funded by Università degli Studi di Torino and Compagnia di San Paolo.
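References
Bergen, B. K., Lindsay, S., Matlock, T., & Narayanan, S. (2007). Spatial and linguistic aspects of visual imagery in sentence comprehension. Cognitive Sci, 31(5), 733–764.
Bergsma, S., & Goebel, R. (2011). Using visual information to predict lexical preference. In RANLP (pp. 399–405).
Bookheimer, S., Zeffiro, T., Blaxton, T., Gaillard, W., Malow, B., & Theodore, W. (1998). Regional cerebral blood flow during auditory responsive naming: evidence for cross-modality neural activation. Neuroreport, 9(10), 2409–2413.
Bruni, E., Tran, N.-K., & Baroni, M. (2014). Multimodal distributional semantics. J. Artif. Intell. Res., 49, 1–47.
Cipolotti, L., & Warrington, E. K. (1995). Semantic memory and reading abilities: A case report. J Int Neuropsych Soc, 1(01), 104–110.
Coltheart, M. (1980). Deep dyslexia: A right hemisphere hypothesis. Deep dyslexia, 326–380.
Cortese, M. J., & Khanna, M. M. (2007). Age of acquisition predicts naming and lexical-decision performance above and beyond 22 other predictor variables: An analysis of 2,342 words. Q J Exp Psychol A, 60(8), 1072–1082.
Just, M. A., Newman, S. D., Keller, T. A., McEleney, A., & Carpenter, P. A. (2004). Imagery in sentence comprehension: an fMRI study. Neuroimage, 21(1), 112–124.
Kemmerer, D. (2010). How words capture visual experience: The perspective from cognitive neuroscience. In B. Malt & P. Wolff (Eds.), Words and the Mind: How words capture human experience. Oxford Scholarship Online.
Kiran, S., & Tuchtenhagen, J. (2005). Imageability effects in normal Spanish–English bilingual adults and in aphasia: Evidence from naming to definition and semantic priming tasks. Aphasiology, 19(3-5), 315–327.
Lesmo, L. (2007, June). The Rule-Based Parser of the NLP Group of the University of Torino. Intelligenza Artificiale, 2(4), 46–47.
Lieto, A., Minieri, A., Piana, A., & Radicioni, D. P. (2015). A knowledge-based system for prototypical reasoning. Connection Science, 27(2), 137–152.
Lieto, A., Radicioni, D. P., & Rho, V. (2015, July). A Common-Sense Conceptual Categorization System Integrating Heterogeneous Proxytypes and the Dual Process of Reasoning. In Proc. of IJCAI 2015. Buenos Aires, Argentina: AAAI Press.
Marconi, D. (1997). Lexical competence. MIT Press.
Marconi, D., Manenti, R., Catricala, E., Della Rosa, P. A., Siri, S., & Cappa, S. F. (2013). The neural substrates of inferential and referential semantic processing. Cortex, 49(8), 2055–2066.
Mellet, E., Tzourio, N., Denis, M., & Mazoyer, B. (1998). Cortical anatomy of mental imagery of concrete nouns based on their dictionary definition. Neuroreport, 9(5), 803–808.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology, 76, 1.
Silberer, C., Ferrari, V., & Lapata, M. (2013). Models of semantic representation with visual attributes. In ACL 2013 proceedings (pp. 572–582).
Townsend, J. T., & Ashby, F. G. (1978). Methods of modeling capacity in simple processing systems. Cognitive Theory, 3, 200–239.
Tversky, A., & Kahneman, D. (1973). Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5(2), 207–232.
World Medical Association. (1991). Code of Ethics: Declaration of Helsinki. BMJ, 302, 1194.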