=Paper= {{Paper |id=Vol-1419/paper0026 |storemode=property |title=On Mental Imagery in Lexical Processing: Computational Modeling of the Visual Load Associated to Concepts |pdfUrl=https://ceur-ws.org/Vol-1419/paper0026.pdf |volume=Vol-1419 |dblpUrl=https://dblp.org/rec/conf/eapcogsci/RadicioniGCBLSM15 }} ==On Mental Imagery in Lexical Processing: Computational Modeling of the Visual Load Associated to Concepts== https://ceur-ws.org/Vol-1419/paper0026.pdf
             On Mental Imagery in Lexical Processing:
  Computational Modeling of the Visual Load Associated to Concepts
                Daniele P. Radicioniχ , Francesca Garbariniψ , Fabrizio Calzavariniφ
                Monica Biggioψ , Antonio Lietoχ , Katiuscia Saccoψ , Diego Marconiφ

                                              (FirstName.Surname@unito.it)
                χ Department of Computer Science, Turin University – Turin, Italy
                φ Department of Philosophy, Turin University – Turin, Italy
                ψ Department of Psychology, Turin University – Turin, Italy


Abstract

This paper investigates the notion of visual load, an estimate of a lexical item's efficacy in activating mental images associated with the concept it refers to. We elaborate on the centrality of this notion, which is deeply and variously connected to lexical processing. A computational model of the visual load is introduced that builds on a few low-level features and on the dependency structure of sentences. The system implementing the proposed model has been experimentally assessed and shown to reasonably approximate human response.

Keywords: Visual imagery; Computational modeling; Natural Language Processing.

Introduction

Ordinary experience suggests that lexical competence, i.e. the ability to use words, includes both the ability to relate words to the external world as accessed through perception (referential tasks) and the ability to relate words to other words in inferential tasks of several kinds (Marconi, 1997). There is evidence from both traditional neuropsychology and more recent neuroimaging research that the two aspects of lexical competence may be implemented by partly different brain processes. However, some very recent experiments appear to show that typically visual areas are also engaged by purely inferential tasks, not involving visual perception of objects or pictures (Marconi et al., 2013). The present work can be considered as a preliminary investigation aimed at verifying this main hypothesis, by investigating the following issues: i) to what extent the visual load associated with concepts can be assessed, and which sort of agreement exists among humans about the visual load associated with concepts; ii) which features underlie the visual load associated with concepts; and iii) whether the notion of visual load can be grasped and encapsulated into a computational model.

As is widely acknowledged, one main visual correlate of language is imageability, that is the property of a particular word or sentence to produce an experience of imagery: in the following, we focus on visual imagery (thus disregarding acoustic, olfactory and tactile imagery), which we denote as visual load. The visual load is related to the ease of producing visual imagery when an external linguistic stimulus is processed. Intuitively, words like 'dog' or 'apple' refer to concrete entities and are associated with a high visual load, implying that these terms immediately generate a mental image. Conversely, words like 'algebra' or 'idempotence' are hardly accompanied by the production of vivid images. Although the construct of visual load is closely related to that of concreteness, concreteness and visual load can clearly dissociate, in that i) some words have been rated high in concreteness but low in visual load, such as some concrete nouns (Paivio, Yuille, & Madigan, 1968); and, conversely, ii) abstract words such as 'bisection' are associated with a high visual load.

The notion of visual load is relevant to many disciplines, in that it helps shed light on a wide variety of cognitive and linguistic tasks and helps explain a plethora of phenomena observed in both impaired and normal subjects. In the next Section we survey a multidisciplinary literature showing how mental imagery affects memory, learning and comprehension; we consider how imagery is characterized at the neural level; and we show how visual information is exploited in state-of-the-art Natural Language Processing research. In the subsequent Section we illustrate the proposed computational model for providing concepts with their visual load characterization. We then describe the experiments designed to assess the model through an implemented system, and report and discuss the obtained results. The Conclusion summarizes the work done and provides an outlook on future work.

Related Work

As regards linguistic competence, it is generally accepted that visual load facilitates cognitive performance (Bergen, Lindsay, Matlock, & Narayanan, 2007), leading to faster lexical decisions than for non-visually loaded concepts (Cortese & Khanna, 2007). For example, nouns with high visual load ratings are remembered better than those with low visual load ratings in long-term memory tests (Paivio et al., 1968). Moreover, visually loaded terms are easier to recognize for subjects with deep dyslexia, and individuals respond
more quickly and accurately when making judgments about visually loaded sentences (Kiran & Tuchtenhagen, 2005). Neuropsychological research has shown that many aphasic patients perform better with linguistic items that more easily elicit visual imagery (Coltheart, 1980), although the opposite pattern has also been documented (Cipolotti & Warrington, 1995).

Visual imageability of concepts evoked by words and sentences is commonly known to affect brain activity. While visuosemantic processing regions, such as the left inferior temporal gyrus and the fusiform gyrus, revealed greater involvement during the comprehension of highly imageable words and sentences (Bookheimer et al., 1998; Mellet, Tzourio, Denis, & Mazoyer, 1998), other semantic brain regions (i.e., superior and middle temporal cortex) are selectively activated by low-imageable sentences (Mellet et al., 1998; Just, Newman, Keller, McEleney, & Carpenter, 2004). Furthermore, a growing number of studies suggests that words encoding different visual properties (such as color, shape, motion, etc.) are processed in cortical areas that overlap with some of the areas that are activated during visual perception of those properties (Kemmerer, 2010).

Investigating the visual features associated with linguistic input can be useful to build semantic resources designed to deal with Natural Language Processing (NLP) problems, such as identifying verb subcategorization frames (Bergsma & Goebel, 2011), or enriching the traditional extraction of distributional semantics from text with a multimodal approach that integrates textual features with visual ones (Bruni, Tran, & Baroni, 2014). Finally, visual attributes are at the base of the development of annotated corpora and resources that can be used to extend text-based distributional semantics by grounding word meanings on visual features as well (Silberer, Ferrari, & Lapata, 2013).

Model

Although much work has been invested in different areas for investigating imageability in general and visual imagery in particular, to the best of our knowledge no attempt has been carried out to formally characterize visual load, and no computational model has been devised to compute how visually loaded sentences and the lexicalized concepts therein are. We propose a model that relies on a simple hypothesis additively combining a few low-level features, refined by exploiting syntactic information.

The notion of visual load, in fact, is widely used in the literature with different meanings, thus giving rise to different levels of ambiguity. We define visual load as the concept representing a direct indicator (a numeric value) of the efficacy of a lexical item in activating mental images associated with the concept referred to by the lexical item. We expect that visual load also represents an indirect measure of the probability of activation of brain areas devoted to visual processing.

We conjecture that the visual load is primarily associated with concepts, although lexical phenomena like term availability (implying that the most frequently used terms are easier to recognize than those seen less often (Tversky & Kahneman, 1973)) can also affect it. Based on the work by Kemmerer (2010) we explore the hypothesis that a limited number of primitive elements can be used to characterize and evaluate the visual load associated with concepts. Namely, Kemmerer's Simulation Framework makes it possible to capture information about a wide variety of concepts and properties used to denote objects, events and spatial relations. Three main visual semantic components have been identified that, in our opinion, are also suitable to be used as different dimensions along which to characterize the concept of visual load. They are: color properties, shape properties, and motion properties. The perception of these properties is expected to occur in an immediate way, such that "during our ordinary observation of the world, these three attributes of objects are tightly bound together in unified conscious images" (Kemmerer, 2010). We added a further perceptual component related to size. More precisely, our assumption is that information about the size of a given concept can also contribute, as an adjoint factor and not as a primitive one, to the computation of a visual load value for the considered concept.

In this setting, we represent each concept/property as an entry consisting of its lemma, morphological information on its POS (part of speech), and a boolean-valued vector of four elements encoding whether the considered concept/property conveys information about color, shape, motion and size.¹ For example, this piece of information

                table,Noun,1,1,0,1                 (1)

can be used to indicate that the concept table (associated with a Noun, and differing, e.g., from that associated with a Verb) conveys information about color, shape and size, but not about motion.

Figure 1: The (simplified) dependency tree corresponding to the sentence 'L'animale che mangia banane su un albero è la scimmia' ('The animal that eats bananas on a tree is the monkey').

¹ We adopt here a simplification, since we are assuming that the pair ⟨lemma, POS⟩ is sufficient to identify a concept/property, and that in general we can access items by disregarding the word sense disambiguation problem, which is known as an open problem in the field of NLP.
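As an illustration only, the entry format shown in (1) can be sketched as a small record type. The names `VisualEntry`, `parse_entry` and `build_dictionary`, as well as the comma-separated text format, are assumptions made for this sketch and are not taken from the implemented system; entries are indexed by the ⟨lemma, POS⟩ pair, in line with the simplification stated in footnote 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualEntry:
    """One dictionary entry: <lemma, POS> plus four boolean visual features."""
    lemma: str
    pos: str
    color: bool
    shape: bool
    motion: bool
    size: bool


def parse_entry(line: str) -> VisualEntry:
    """Parse a record such as 'table,Noun,1,1,0,1' (hypothetical text format)."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 comma-separated fields, got {len(fields)}")
    lemma, pos, *flags = fields
    if any(flag not in ("0", "1") for flag in flags):
        raise ValueError(f"feature flags must be 0 or 1: {flags}")
    color, shape, motion, size = (flag == "1" for flag in flags)
    return VisualEntry(lemma, pos, color, shape, motion, size)


def build_dictionary(lines):
    """Index entries by the <lemma, POS> pair, ignoring word senses."""
    entries = (parse_entry(line) for line in lines)
    return {(entry.lemma, entry.pos): entry for entry in entries}


dictionary = build_dictionary(["table,Noun,1,1,0,1"])
entry = dictionary[("table", "Noun")]
print(entry.color, entry.shape, entry.motion, entry.size)  # True True False True
```

Keying the dictionary on the pair rather than on the lemma alone keeps a noun reading of a word distinct from a verb reading, which is the distinction the example in (1) draws.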
In the following these are referred to as the visual features $\vec{\phi}$ associated with the given concept.

We then built a dictionary by extracting it from a set of stimuli (illustrated hereafter) composed of simple sentences describing a concept, and manually annotated the visual features associated with each concept. The assignment of feature scores was conducted by the authors on a purely introspective basis.

[Figure: a stimulus st, 'The big carnivore with yellow and black stripes is the ... tiger', composed of a definition d and a target T; morphological information ⟨lemma, POS⟩ for each word; dictionary annotated with features (lemma, POS, col, sha, mot, siz).]

Different weighting schemes $\vec{w} = \{\alpha, \beta, \gamma\}$ have been tested in order to set the features' contribution to the visual load associated with a concept c, that results from com- [...] load of the concepts denoted by the lexical items in the sentence, that is $VL(s) = \sum_{c \in s} VL(c)$.

The calculation of the VL score also accounts for the dependency structure of the input sentences. The syntactic structure of sentences is computed through the Turin University Parser (TUP) in the dependency format (Lesmo, 2007). Dependency formalisms represent syntactic relations by connecting a dominant word, the head (e.g., the verb 'fly' in the sentence The eagle flies) and a dominated word, the dependent (e.g., the noun 'eagle' in the same sentence). The connection between these two words is usually represented by using labeled directed edges (e.g., subject): the collection of all dependency relations of a sentence forms a tree, rooted in the main verb (see the parse tree illustrated in Figure 1). The dependency structure is relevant in our approach, because we assume that some sort of reinforcement effect may apply in cases where both a word and its dependent(s) (or governor(s)) are associated with some visual feature. For example, a phrase like 'with black stripes' is expected to evoke mental images in a more vivid way than its elements taken in isolation (that is, 'black' and 'stripes'), and its visual load is expected to still grow
                                                                                                                                                                                                                      motassume
                                                                                                                                                                                                                  , mot
                                                                                                                                                                                                                may   apply, siz
                                                                                                                                                                                                                               in cases where both a word an
                                     puting                                                                signment
                                                                                                             if we addofa features
                                                                                                           authors      on  a
                                                                                                                                coordinated
                                                                                                                                purely
                                                                                                                                            scores     has like
                                                                                                                                                   term,
                                                                                                                                          introspective
                                                                                                                                                               beeninconducted
                                                                                                                                                                 basis.
                                                                                                                                                                                  .....
                                                                                                                                                                         ‘with yellow   by and the       pendent(s) (or governor(s)) are associated to som
                                                    X                                                        black stripes’. Yet, the VL would –recursively– grow if                                     feature. For example, a phrase like ‘with black
                                       VL(c, w)
                                              ~ =         i = ↵( col + sha )+       mot +      siz . (2)     weDi↵erent
                                                                                                                  added a weighting
                                                                                                                              governor term  schemes        w
                                                                                                                                                            ~ =with
                                                                                                                                                    (like ‘fur       {↵, yellow
                                                                                                                                                                           , } have         been
                                                                                                                                                                                    and black            is expected to evoke mental images in a more v
                                       TUP parser     i                                                    tested    in  order    to  set   features       contribution
                                                                                                             stripes’). We then introduced a parameter ⇠ to control           to   the    visual         than its elements taken in isolation (that is, ‘b
                                                                                                           load    associated      to   a   concept       c,   that
                                                                                                             the contribution of the aforementioned features in case   results    from      com-         ‘stripes’), and its visual load is expected to s
                                     For the experimentation we set ↵ to 1.35, to 1.1 and
                                                                                                           puting
                                                                                                             the corresponding terms are linked in the parse tree by                                     if we add a coordinated term, like in ‘with ye
                                        to .9.
                                        To the ends of combining the contribution of concepts                                 X
                                                                                                             a modifier/argument relation (denoted as mod and arg                                        black stripes’. Yet, the VL would –recursively
                                                                                                             VL(c,
                                                                                                             in      w)
                                                                                                                      ~ = 3). i = ↵( col + sha )+ mot + siz . (2)
                                                                                                                 Equation                                                                                we added a governor term (like ‘fur with yellow a
                                     in a sentence sDependency             VL score for s, we adapt
                                                        to the overallstructure
                                                                                                                                i
                                                                             2
                                     the principle of compositionality to the visual load do-                              (                                                                             stripes’). We then introduced a parameter ⇠ t
                                     main. In other words, we assume that the visual load of For                              ⇠ VL(ci ) if 9 cj s.t. mod(ci , cj ) _ arg(ci , cj ) the contribution of the aforementioned feature
                                                                                                             VL(c the  experimentation
                                                                                                                    i) =                          we set ↵ to 1.35, to 1.1 and
                                     a sentence can be computed by starting from the visual                   to .9.          VL(ci )         otherwise.                                                 the corresponding terms are linked in the pars
                                         1                                                                     To the ends of combining the contribution of concepts                            (3)      a modifier/argument relation (denoted as mod
                                           We adopt here a simplification, since we are assuming             In the experimentation ⇠ was set to 1.2.                                                    in Equation 3).
                                     that the pair hlemma, POSi is sufficient to identify a con- in a sentence s to the overall VL score for s, we adapt
                                     cept/property, and that in general we can access     weighting
                                                                                             items by scheme        w,
                                                                                                                    ~ is then
                                                                                                           the principle           computed as follows:
                                                                                                                                of compositionality           2
                                                                                                                                                                 to the visual load    NVNV    do- Non-Visual ( Target—Non-Visual Definition
                                     disregarding the word sense disambiguation problem, which main.            TheInstimuli      Pin thewedataset
                                                                                                                         other words,              assumeare    thatpairs     consisting
                                                                                                                                                                        the visual      load(e.g.,of The quality of⇠people
                                                                                                                                                                                                  of                        V L(ci )thatif 9   cj s.t.
                                                                                                                                                                                                                                            easily     mod(c
                                                                                                                                                                                                                                                     solve    i , cj ) _
                                                                                                                                                                                                                                                           difficult
                                     is actually an open problem in the field of Natural Language             VL(d,                                                                                      V L(ci ) =
                                     Processing (Vidhu Bhala & Abirami, 2014).                             aasentence
                                                                                                                       w)
                                                                                                                        ~ = d and
                                                                                                                 definition can be c2d     aVL(c)
                                                                                                                                      computed                 (st = hd,
                                                                                                                                              targetbyT starting           (4)
                                                                                                                                                                          fromT i),thesuch  problems
                                                                                                                                                                                          visual  as       is said . . . Vintelligence).
                                                                                                                                                                                                                            L(ci )      otherwise.
                                                                                                                                             definition  d                                target   T
                                         2
                                           This principle states that the meaning of an expression            VL(T,
                                                                                                                z      w)
                                                                                                                        ~ =           VL(T ). }|                           (5)         { z }| {
                                                                                                               1
                                     is a function of the meanings of its parts and of the way they              We big
                                                                                                                The    adopt    here awith
                                                                                                                           carnivore       simplification,
                                                                                                                                               yellow and black   sincestripes
                                                                                                                                                                          we are     assuming
                                                                                                                                                                                 is the  . . . tiger.    In the experimentation
                                                                                                                | the pair hlemma, POSi is{zsufficient to identifyFor                           each} condition,      there were 48 ⇠sentences,
                                                                                                                                                                                                                                            was set tofor1.2.overall
                                                                                              sentence that
                                     are syntactically combined: to get the meaning of a Aggiungere           figura     e  descrizione         di   alto
                                                                                                                                                   stimulus  livello
                                                                                                                                                             st          della            a con-
                                     we combine words to form phrases, then we combine phrases cept/property, and that in general we can access items                                  192 sentences.
                                                                                                                                                                                                 by            Each trial lasted about 30 minutes. The
                                     to form clauses, and so on (Partee, 1995).           pipeline.        disregarding
                                                                                                             The visual the   loadword               System
                                                                                                                                           sense disambiguation
                                                                                                                                     associated                       implementing
                                                                                                                                                       to st components,   problem, given  which
                                                                                                                                                                                       number       the
                                                                                                                                                                                                the of words The (nouns
                                                                                                                                                                                                                    stimuli and
                                                                                                                                                                                                                              in the    dataset and
                                                                                                                                                                                                                                   adjectives)       are pairs
                                                                                                                                                                                                                                                         the (syn-cons
                                                                                                           is actually an open problem in            computational
                                                                                                                                                        the field of Naturalmodel    Languageof VL       a definition      d andofathetarget             = hd, T i),
                                                                                                                                                                                                                                                  T (st sentences
                                                                                                           Processing      (Vidhu Bhala & Abirami, 2014).                              tactic dependency)            structure             considered
                                                                                                               2
                                                                                                                  Experimentation                                                      were homogeneous     z
                                                                                                                                                                                                                                     definition d
                                                                                                                                                                                                                     within conditions.   }|
                                                                                                                 This principle states that the meaning of an expression
                                                                                          Materials isand           Methods
                                                                                                              a function     of the meanings of its parts and of the wayThe                   they same The      big carnivore with yellow and black stripes is t
                                                                                                                                                                                                            |set of stimuli used for the        {z human experi-
                                                                                                           are syntactically combined: to get the meaning of a sentence
            Figure 2: The pipeline to compute the VL Forty-five                            score according
                                                                                                         healthy    volunteers
                                                                                                           we combine         to the
                                                                                                                            words   (23
                                                                                                                                     to form  proposed
                                                                                                                                           females
                                                                                                                                                 phrases,andthen        computational
                                                                                                                                                                 22 males),
                                                                                                                                                                       we combine ment  phrases    was given model.
                                                                                                                                                                                                                in input to the system     stimulus st
                                                                                                                                                                                                                                                implementing the
                                                                                          19 52 years      to of
                                                                                                               formageclauses,
                                                                                                                          (meanand  ±sd so on= (Partee,
                                                                                                                                                  25.7 ± 5.1)  1995)., were                              The visual load
                                                                                                                                                                                       proposed computational                   associated
                                                                                                                                                                                                                            model.             to st components,
                                                                                                                                                                                                                                      The system       was used to g
                                                                                          recruited for the experiment. One of them was excluded                                       compute the visual load score associated to (lexicalized)
                                                                                          because she was outlier with respect to the group. None                                      concepts according to Eq. 4 and 5, implementing the vi-
referred to as the visual features φ· associated withofthe                                    the subjects ‘eagle’
                                                                                                                 had a in         theof same
                                                                                                                            history          psychiatric    sentence).
                                                                                                                                                                  or neuro-            Theloadconnection
                                                                                                                                                                                       sual             model in Eq. 2,betweenwith the system’s parameters
                                                                                          logical disorders. All participants gave their written in-                                   set to the aforementioned values.
given concept.                                                                                             these         two       words
                                                                                          formed consent before taking part to the experimental     is    usually            represented                    by    using        labeled
   We have then built a dictionary by extracting it from                                  procedure, which directed was approvededgesby(e.g.,   the ethical subject):
                                                                                                                                                                    commit- the        Data          analysis of all depen-
                                                                                                                                                                                               collection
                                                                                          tee of the University of Turin, in accordance with the
a set of stimuli (illustrated hereafter) composed of sim-                                 Declaration dencyof Helsinkirelations
                                                                                                                             ( BMJ 1991;of         302:a 1194
                                                                                                                                                            sentence ). Par- forms                  a tree, rooted
                                                                                                                                                                                       The participants’            performance  in inthe the “naming by def-
ple sentences describing a concept; next, we have man-                                    ticipants were   mainall naı̈veverb to the (see         the parse
ually annotated the visual features associated with each concept. The automatic annotation of visual properties associated with concepts is deferred to future work: it can be addressed either through a classical Information Extraction approach building on statistics, or in a more semantically-principled way.

Different weighting schemes $\vec{w} = \{\alpha, \beta, \gamma\}$ have been tested in order to determine the features’ contribution to the visual load associated with a concept $c$, that results from computing

$$VL(c, \vec{w}) = \sum_i \phi_i = \alpha(\phi_{col} + \phi_{sha}) + \beta\,\phi_{mot} + \gamma\,\phi_{siz}. \qquad (2)$$

For the experimentation we set α to 1.35, β to 1.1 and γ to 0.9: these assignments reflect the fact that color and shape information is considered more important in the computation of VL.

To the ends of combining the contribution of concepts in a sentence $s$ to the overall VL score for $s$, we adopted the following additive schema: $VL(s) = \sum_{c \in s} VL(c)$.

The computation of the VL score also accounts for the dependency structure of the input sentences. The syntactic structure of sentences is computed by the Turin University Parser (TUP) in the dependency format (Lesmo, 2007). Dependency formalisms represent syntactic relations by connecting a dominant word, the head (e.g., the verb ‘fly’ in the sentence The eagle flies) and a dominated word, the dependent (e.g., the noun […] tree illustrated in Figure 1).

The dependency structure is relevant in our approach, because we assume that a reinforcement effect may apply in cases where both a word and its dependent(s) (or governor(s)) are associated with visual features. For example, a phrase such as ‘with black stripes’ is expected to evoke mental images in a more vivid way than its elements taken in isolation (that is, ‘black’ and ‘stripes’), and moreover its visual load is expected to further grow if we add a coordinated term, as in ‘with yellow and black stripes’. Moreover, the VL would, recursively, grow if we added a governor term (like ‘fur with yellow and black stripes’). We then introduced a parameter ξ to control the contribution of the aforementioned features in case the corresponding terms are linked in the parse tree by a modifier/argument relation (denoted as mod and arg in Equation 3).

$$VL(c_i) = \begin{cases} \xi\, VL(c_i) & \text{if } \exists\, c_j \text{ s.t. } mod(c_i, c_j) \lor arg(c_i, c_j) \\ VL(c_i) & \text{otherwise.} \end{cases} \qquad (3)$$

In the experimentation ξ was set to 1.2.

The stimuli in the dataset are pairs consisting of a definition $d$ and a target $T$ ($st = \langle d, T \rangle$), such as

$$\underbrace{\overbrace{\text{The big carnivore with yellow and black stripes is the \dots}}^{\text{definition } d}\ \overbrace{\text{tiger.}}^{\text{target } T}}_{\text{stimulus } st}$$

The visual load associated to st components, given the
weighting scheme $\vec{w}$, is then computed as follows:

$$VL(d, \vec{w}) = \sum_{c \in d} VL(c) \qquad (4)$$
$$VL(T, \vec{w}) = VL(T). \qquad (5)$$

The whole pipeline from the input parsing to computation of the VL for the considered stimulus has been implemented as a computer program; its main steps include the parsing of the stimulus, the extraction of the (lexicalized) concepts by exploiting the output of the morphological analysis, and the tree traversal of the dependency structure resulting from the parsing step. The morphological analyzer has been preliminarily fed with the whole set of stimuli, and its output has been annotated with the visual features and stored into a dictionary. At run time, the dictionary is accessed based on morphological information, then used to retrieve the values of the features associated with the concepts in the stimulus. The output obtained by the proposed model has been compared with the results obtained in a behavioral experimentation as described below.

                Experimentation
Materials and Methods
Thirty healthy volunteers, all native Italian speakers (16 females and 14 males), 19–52 years of age (mean ± sd = 25.7 ± 5.1), were recruited for the experiment. None of the subjects had a history of psychiatric or neurological disorders. All participants gave their written informed consent before participating in the experimental procedure, which was approved by the ethical committee of the University of Turin, in accordance with the Declaration of Helsinki (World Medical Association, 1991). Participants were all naïve to the experimental procedure and to the aims of the study.

Experimental design and procedure. Participants were asked to perform an inferential task, “Naming from definition”. During the task a sentence was pronounced and the subjects were instructed to listen to the stimulus given in the headphones and to overtly name, as accurately and as fast as possible, the target word corresponding to the definition, using a microphone connected to a response box. Auditory stimuli were presented through the E-Prime software, which was also used to record data on accuracy and reaction times. Furthermore, at the end of the experimental session, the subjects were administered a questionnaire: they had to rate on a 1–7 Likert scale the intensity of the visual load they perceived as related to each target and to each definition.

The factorial design of the study included two within-subjects factors, in which the visual load of both target and definition was manipulated. The resulting four experimental conditions were as follows:

VV Visual Target – Visual Definition (e.g., ‘The bird of prey with great wings flying over the mountains is the . . . eagle’);
VNV Visual Target – Non-Visual Definition (e.g., ‘The hottest of the four elements of the ancients is . . . fire’);
NVV Non-Visual Target – Visual Definition (e.g., ‘The nose of Pinocchio stretched when he told a . . . lie’);
NVNV Non-Visual Target – Non-Visual Definition (e.g., ‘The quality of people that easily solve difficult problems is said . . . intelligence’).

For each condition, there were 48 sentences, 192 sentences overall. Each trial lasted about 30 minutes. The number of words (nouns and adjectives), their balancing across stimuli, and the (syntactic dependency) structure of the considered sentences were uniform within conditions, so that the most relevant variables were controlled. The same set of stimuli used for the human experiment was given in input to the system implementing the computational model.

Data analysis
The participants’ performance in the “Naming from definition” task was evaluated by recording, for each response, the reaction time RT, in milliseconds, and the accuracy AC, computed as the percentage of correct answers. The answers were considered correct if the target word was plausibly matched with the definition. Then, for each subject, both RT and AC were combined in the Inverse Efficiency Score (IES), by using the formula IES = (RT/AC) · 100. IES is a metric commonly used to aggregate reaction time and accuracy into a single summary value (Townsend & Ashby, 1978). The mean IES value was used as the dependent variable and entered in a 2 × 2 repeated measures ANOVA with ‘target’ (two levels: ‘visual’ and ‘non-visual’) and ‘definition’ (two levels: ‘visual’ and ‘non-visual’) as within-subjects factors. Post hoc comparisons were performed by using the Duncan test.

The scores obtained by the participants in the visual load questionnaire were analyzed by using unpaired T-tests, two tailed. Two comparisons were performed: for visual and non-visual targets, and for visual and non-visual definitions. The computational model results were analyzed in the same way, by using unpaired T-tests, two tailed, with two comparisons: for visual and non-visual targets, and for visual and non-visual definitions.

Correlations between IES, computational model and visual load questionnaire. We also explored the existence of correlations between IES, the visual load questionnaire and the computational model output by using linear regressions. For both the IES values and the questionnaire scores, we computed for each item the mean of the 30 subjects’ responses. In a first model, we used the visual load questionnaire scores as independent variable to predict the participants’ performance (with
IES as the dependent variable); in a second model, we used the computational data as independent variable to predict the participants’ visual load evaluation (with the questionnaire scores as the dependent variable). In order to verify the consistency of the correlation effects, we also performed linear regressions where we controlled for three covariate variables: the number of words, their balancing across stimuli and the syntactic dependency structure.

Results
The ANOVA showed a significant effect of the within-subject factors “target” (F(1,29) = 14.4; p < 0.001), suggesting that the IES values were significantly lower in the visual than in the non-visual targets, and “definition” (F(1,29) = 32.78; p < 0.001), suggesting that the IES values were significantly lower in the visual than in the non-visual definitions. This means that, for both the target and the definition, the participants’ performance was significantly faster and more accurate in the visual than in the non-visual condition. We also found a significant interaction “target*definition” (F(1,29) = 7.54; p = 0.01). Based on the Duncan post hoc comparison, we verified that this interaction was explained by the effect of the visual definitions of the visual targets (VV condition), in which the participants’ performance was significantly faster and more accurate than in all the other conditions (VNV; NVV; NVNV), as shown in Figure 3.

Figure 3: The graph shows, for each condition, the mean IES with standard error.

By comparing the questionnaire scores for visual (mean ± sd = 5.69 ± 0.55) and non-visual (mean ± sd = 4.73 ± 0.71) definitions we found a significant difference (p < 0.001; unpaired T-test, two tailed). By comparing the questionnaire scores for visual (mean ± sd = 6.32 ± 0.4) and non-visual (mean ± sd = 4.23 ± 0.9) targets we found a significant difference (p < 0.001). This suggests that our arbitrary categorization of each sentence within the four conditions was supported by the general agreement of the subjects. By comparing the computational model scores for visual (mean ± sd = 4.0 ± 2.4) and non-visual (mean ± sd = 2.9 ± 2.0) definitions we found a significant difference (p < 0.001; unpaired T-test, two tailed). By comparing the computational model scores for visual (mean ± sd = 2.53 ± 1.29) and non-visual (mean ± sd = 0.26 ± 0.64) targets we found a significant difference (p < 0.001). This suggests that we were able to computationally model the visual load of both targets and definitions, describing it as a linear combination of different low-level features: color, shape, motion and dimension.

Results correlations. By using the visual load questionnaire scores as independent variable we were able to significantly (R² = 0.4; p < 0.001) predict the participants’ performance (that is, their IES values), as illustrated in Figure 4. This means that the higher the participants’ visual score for a definition, the better the participants’ performance in giving the correct response (or, alternatively, the lower the IES value).

Figure 4: Linear regression “Inverse Efficiency Score (IES) by Visual Load Questionnaire”. The mean score in the Visual Load Questionnaire, reported on a 1–7 Likert scale, was used as an independent variable to predict the subjects’ performance, as quantified by the IES.

By using the computational data as independent variable we were able to significantly (R² = 0.44; p < 0.001) predict the participants’ visual load evaluation (their questionnaire scores), as shown in Figure 5. This means that a correlation exists between the computational prediction about the visual load of the definitions and the participants’ visual load evaluation: the higher the computational model result, the higher the participants’ visual score in the questionnaire. We also found that these effects were still significant in the regression models where the number of words, their balancing across stimuli and the syntactic dependency structure were controlled for.
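To make the model whose scores are reported in this section more concrete, the following minimal Python sketch reproduces the arithmetic of Equations 2 and 3 and of the additive schema over a sentence. It is not the authors' program: the `Concept` class, the `linked` flag (standing in for the existence of a mod/arg relation in the parse tree) and the 0/1 feature values are assumptions made only for illustration. Only the weights (α = 1.35, β = 1.1, γ = 0.9) and ξ = 1.2 come from the text.

```python
from dataclasses import dataclass

# Weights and reinforcement factor as reported in the text.
ALPHA, BETA, GAMMA = 1.35, 1.1, 0.9   # Equation 2
XI = 1.2                               # Equation 3


@dataclass
class Concept:
    """Hypothetical container for one annotated concept.

    Feature values (col, sha, mot, siz) are assumed to be numeric
    annotations; `linked` is an assumed flag saying that the concept
    takes part in a mod/arg relation with another concept.
    """
    name: str
    col: float = 0.0
    sha: float = 0.0
    mot: float = 0.0
    siz: float = 0.0
    linked: bool = False


def vl_concept(c: Concept) -> float:
    """Weighted feature sum (Eq. 2), scaled by XI when linked (Eq. 3)."""
    base = ALPHA * (c.col + c.sha) + BETA * c.mot + GAMMA * c.siz
    return XI * base if c.linked else base


def vl_sentence(concepts) -> float:
    """Additive schema: the VL of a sentence is the sum over its concepts."""
    return sum(vl_concept(c) for c in concepts)


# Toy annotation of 'with yellow and black stripes' (illustrative values only).
phrase = [
    Concept("yellow", col=1.0, linked=True),
    Concept("black", col=1.0, linked=True),
    Concept("stripes", sha=1.0, linked=True),
]
print(round(vl_sentence(phrase), 2))
```

With these toy annotations every concept is reinforced by ξ, so the phrase scores higher than the same three concepts taken in isolation, which is the effect the reinforcement parameter is meant to capture.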

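The Inverse Efficiency Score and the simple linear regressions used in the analysis can be illustrated in the same spirit. The sketch below assumes that AC is expressed as a percentage, as stated in the Data analysis section, and fits an ordinary least squares line by hand. The function names and the toy numbers are hypothetical and are not the experimental data.

```python
def inverse_efficiency(rt_ms: float, accuracy_pct: float) -> float:
    """IES = (RT / AC) * 100, with AC given as a percentage of correct answers."""
    if accuracy_pct <= 0:
        raise ValueError("accuracy must be positive")
    return rt_ms / accuracy_pct * 100.0


def linear_fit(x, y):
    """Ordinary least squares for y = a + b * x.

    Returns (intercept, slope, r_squared) for a single predictor.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy ** 2) / (sxx * syy)
    return intercept, slope, r_squared


# Toy values only: a lower IES means faster and more accurate responses.
print(inverse_efficiency(800.0, 80.0))

# Toy per-item means: questionnaire score (predictor) against IES (outcome).
scores = [3.0, 4.0, 5.0, 6.0]
ies = [1400.0, 1250.0, 1150.0, 1000.0]
intercept, slope, r2 = linear_fit(scores, ies)
print(slope < 0, 0.0 <= r2 <= 1.0)
```

A negative slope in such a fit corresponds to the pattern described above: items rated as more visual are named more efficiently.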
Figure 5: Linear regression “Visual Load Questionnaire by Computational Model”. The mean value obtained by the computational model was used as an independent variable to predict the subjects’ scores on the Visual Load Questionnaire, reported on a 1–7 Likert scale.

Coltheart, M. (1980). Deep dyslexia: A right hemisphere hypothesis. Deep dyslexia, 326–380.
Cortese, M. J., & Khanna, M. M. (2007). Age of acquisition predicts naming and lexical-decision performance above and beyond 22 other predictor variables: An analysis of 2,342 words. Q J Exp Psychol A, 60 (8), 1072–1082.
Just, M. A., Newman, S. D., Keller, T. A., McEleney, A., & Carpenter, P. A. (2004). Imagery in sentence comprehension: an fMRI study. Neuroimage, 21 (1), 112–124.
Kemmerer, D. (2010). Words and the Mind: How words capture human experience. In B. Malt & P. Wolff (Eds.), (chap. How Words Capture Visual Experience - The Perspective from Cognitive Neuroscience). Oxford Scholarship Online.
Kiran, S., & Tuchtenhagen, J. (2005). Imageability effects in normal Spanish–English bilingual adults and in aphasia: Evidence from naming to definition and semantic priming tasks. Aphasiology, 19 (3-5), 315–327.
                                                                    Lesmo, L. (2007, June). The Rule-Based Parser of the
                                                                      NLP Group of the University of Torino. Intelligenza
                     Conclusions                                      Artificiale, 2 (4), 46–47.
In the next future we plan to extend the representation             Lieto, A., Minieri, A., Piana, A., & Radicioni, D. P.
of the conceptual information by grounding the concep-                (2015). A knowledge-based system for prototypical
tual representation on a hybrid representation composed               reasoning. Connection Science, 27 (2), 137–152.
of conceptual spaces and ontologies (Lieto, Minieri, Pi-            Lieto, A., Radicioni, D. P., & Rho, V. (2015, July).
ana, & Radicioni, 2015; Lieto, Radicioni, & Rho, 2015).               A Common-Sense Conceptual Categorization System
Additionally, we plan to integrate the current model in               Integrating Heterogeneous Proxytypes and the Dual
the context of cognitive architectures.                               Process of Reasoning. In Proc. of IJCAI 2015. Buenos
                                                                      Aires, Argentina: AAAI Press.
                Acknowledgments                                     Marconi, D. (1997). Lexical competence. MIT Press.
This work has been partly supported by the Project The              Marconi, D., Manenti, R., Catricala, E., Della Rosa,
Role of the Visual Imagery in Lexical Processing, grant               P. A., Siri, S., & Cappa, S. F. (2013). The neural
TO-call03-2012-0046, funded by Università degli Studi                substrates of inferential and referential semantic pro-
di Torino and Compagnia di San Paolo.                                 cessing. Cortex , 49 (8), 2055–2066.
                                                                    Mellet, E., Tzourio, N., Denis, M., & Mazoyer, B. (1998).
                     References                                       Cortical anatomy of mental imagery of concrete nouns
Bergen, B. K., Lindsay, S., Matlock, T., & Narayanan, S.              based on their dictionary definition. Neuroreport,
  (2007). Spatial and linguistic aspects of visual imagery            9 (5), 803–808.
  in sentence comprehension. Cognitive Sci , 31 (5), 733–           Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Con-
  764.                                                                creteness, imagery, and meaningfulness values for 925
Bergsma, S., & Goebel, R. (2011). Using visual infor-                 nouns. Journal of experimental psychology, 76 , 1.
  mation to predict lexical preference. In RANLP (pp.               Silberer, C., Ferrari, V., & Lapata, M. (2013). Models
  399–405).                                                           of semantic representation with visual attributes. In
Bookheimer, S., Zeffiro, T., Blaxton, T., Gaillard, W.,               Acl 2013 proceedings (pp. 572–582).
  Malow, B., & Theodore, W. (1998). Regional cerebral               Townsend, J. T., & Ashby, F. G. (1978). Methods of
  blood flow during auditory responsive naming: evi-                  modeling capacity in simple processing systems. Cog-
  dence for cross-modality neural activation. Neurore-                nitive theory, 3 , 200–239.
  port, 9 (10), 2409–2413.                                          Tversky, A., & Kahneman, D. (1973). Availability: A
Bruni, E., Tran, N.-K., & Baroni, M. (2014). Multimodal               heuristic for judging frequency and probability. Cog-
  distributional semantics. J. Artif. Intell. Res., 49 , 1–           nitive psychology, 5 (2), 207–232.
  47.                                                               World Medical Association. (1991). Code of Ethics:
Cipolotti, L., & Warrington, E. K. (1995). Semantic                   Declaration of Helsinki. BMJ , 302 , 1194.
  memory and reading abilities: A case report. J INT
  NEUROPSYCH SOC , 1 (01), 104–110.

