=Paper=
{{Paper
|id=Vol-2765/160
|storemode=property
|title=CONcreTEXT @ EVALITA2020: The Concreteness in Context Task
|pdfUrl=https://ceur-ws.org/Vol-2765/paper160.pdf
|volume=Vol-2765
|authors=Lorenzo Gregori,Maria Montefinese,Daniele P. Radicioni,Andrea Amelio Ravelli,Rossella Varvara
|dblpUrl=https://dblp.org/rec/conf/evalita/GregoriMRRV20
}}
==CONcreTEXT @ EVALITA2020: The Concreteness in Context Task==
* Lorenzo Gregori, University of Florence, lorenzo.gregori@unifi.it
* Maria Montefinese, University of Padua, maria.montefinese@unipd.it
* Daniele P. Radicioni, University of Turin, daniele.radicioni@unito.it
* Andrea Amelio Ravelli, Istituto di Linguistica Computazionale “Antonio Zampolli” (ILC–CNR) - ItaliaNLP Lab, andreaamelio.ravelli@ilc.cnr.it
* Rossella Varvara, University of Florence, rossella.varvara@unifi.it

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===Abstract===
The focus of the CONcreTEXT task is conceptual concreteness: systems were asked to compute a value expressing to what extent target concepts are concrete (i.e., more or less perceptually salient) within a given context of occurrence. To this end, we developed a new dataset, which was annotated with concreteness ratings and used as the gold standard in the evaluation of systems. Four teams participated in this first edition of the task, with a total of 15 runs submitted. Interestingly, these works extend the information on conceptual concreteness available in existing (non-contextual) norms derived from human judgments with new knowledge from recently developed neural architectures, in much the same multidisciplinary spirit in which the CONcreTEXT task was organized.

===1 Introduction===
Concept concreteness, that is, how directly a concept is related to sensorial experience (Brysbaert et al., 2014a), is a fundamental dimension of conceptual semantic representation that has attracted growing interest in psycholinguistics over the last decade. This dimension is usually assessed through participants' ratings on a Likert scale: concrete concepts lie at one end of the scale and refer to something that exists in reality and can be experienced immediately through the senses; abstract concepts lie at the opposite end of the scale and are grounded in internal sensory experience and linguistic information. While concrete concepts have direct sensory referents (Crutch and Warrington, 2005) and greater availability of contextual information (Connell et al., 2018; Kousta et al., 2011; Montefinese et al., 2020), abstract concepts tend to be more emotionally valenced (Kousta et al., 2011) and less imageable (Montefinese et al., 2020; Garbarini et al., 2020).

The CONcreTEXT task challenges participants to build NLP systems that automatically assign a concreteness value to words in context. It is aimed at investigating how concreteness information affects sense selection: unlike past research (Brysbaert et al., 2014b; Montefinese et al., 2014), we are interested in assessing the concreteness of concepts within the context of real sentences rather than in isolation. Additionally, the concreteness score is assumed to be a property of meanings rather than a property of word forms; thus, scoring the concreteness of a concept in context implicitly requires identifying its underlying sense, by handling lexical phenomena such as polysemy and homonymy.

Ordinary experience suggests that the concrete/abstract status of concepts can affect their semantic representation, as well as lexical access and processing: concrete meanings are acknowledged to be delivered more quickly and easily in human communication than abstract meanings (Bambini et al., 2014). Historically, it has been observed that concrete concepts are responded to more quickly than abstract concepts in lexical decision tasks (Bleasdale, 1987; Kroll and Merves, 1986), although more recent experiments have shown that abstract concepts might have an advantage when other variables are accounted for (Kousta et al., 2011). Concrete concepts are also easier to encode and retrieve than abstract concepts (Romani et al., 2008; Miller and Roodenrys, 2009), are easier to make associations with (de Groot, 1989), and are more thoroughly described in definition tasks (Sadoski et al., 1997). Moreover, it generally takes less time to comprehend a concrete sentence than an abstract one (Haberlandt and Graesser, 1985; Schwanenflugel and Shoben, 1983). Thus, it has been proposed that different organizational principles govern the semantic representations of concrete and abstract concepts: concrete concepts are predominantly organized by featural similarity measures, and abstract concepts by associative relations, co-occurrence patterns and syntactic information (Vigliocco et al., 2009).

All the surveyed features make the distinction between concreteness and abstractness a stimulating and challenging field for computational linguistics as well. Among the earliest attempts at grasping concreteness, we find works that investigated concreteness/abstractness information in its interplay with metaphor identification and figurative language more generally (Turney et al., 2011; and, more recently, Mensa et al., 2018b). Although concreteness information is acknowledged to be central to, e.g., word-sense induction and compositionality modeling (Hill et al., 2013), the contribution of concreteness/abstractness to semantic representations is not fully grasped and exploited in existing approaches and resources, with the notable exception of works aimed i) at learning multimodal embeddings, and at studying how abstract and concrete representations can be acquired by multi-modal models (Hill and Korhonen, 2014); and ii) at exploring how far concreteness information is represented in the distributional patterns in corpora (Hill et al., 2013). Moreover, some approaches have attempted to create lexical resources by also employing common-sense information (Mensa et al., 2018a; Colla et al., 2018).

Characterizing tokens within sentences by their concreteness requires integrating both word-specific and contextual information. In our view, the CONcreTEXT task entails dealing with a relaxed form of word sense disambiguation; our participants addressed these aspects by devising methods relying both on traditional knowledge-based approaches and on more recent language models and sequence-to-sequence models. Finally, as in many real-world cases, the provided trial data is rather scarce, on the order of a hundred sentences for Italian, and as many for English. This aspect forced our participants to face something similar to a ‘cold start’ problem. We hope that this edition of the CONcreTEXT task will be the first in a series for those who are interested in the issues that contextual conceptual concreteness poses to research on natural language semantics.

===2 Task Definition===
The CONcreTEXT task (so dubbed after CONcreteness in conTEXT) focuses on automatic concreteness (and conversely, abstractness) recognition. Given a sentence along with a target word, we asked participants to propose a system able to assess the concreteness of the concept expressed by the given word within the sentence, on a 7-point Likert-like scale where 1 stands for completely abstract (e.g., ‘freedom’) and 7 for completely concrete (e.g., ‘car’). For example, in the sentence “In summer, wheat fields are coloured in yellow” the noun ''field'' refers to an entity that can be smelled, touched, and pointed to. In this case, on a scale ranging from 1 to 7 its concreteness may be evaluated as 7, because it refers to an extremely concrete concept. In contrast, the same noun ''field'' in the sentence “Physics is Alice’s research field” refers to a scientific subject, i.e., something that cannot be perceived through the five senses, but that can be explained through a linguistic description. In this sentence, the noun ''field'' may be evaluated as 1, because it refers to an extremely abstract concept. Moreover, task targets can lie halfway between completely abstract and completely concrete, as in the case of “Magnetic field attracts iron”, where the noun ''field'' refers to something more abstract than “wheat fields” but more concrete than “research field”. As anticipated, the concreteness score assigned to the word should be evaluated in context: the word should not be considered in isolation, but as part of a given sentence.

Participants were invited to exploit all possible strategies to solve the task, including (but not limited to) knowledge bases, external training data, word embeddings, etc.

{| class="wikitable"
|+ Table 1: Basic statistics on the CONcreTEXT dataset used as gold standard.
! Statistic !! Italian !! English
|-
| Unique verb targets || 52 || 44
|-
| Unique noun targets || 96 || 73
|-
| Num. sentences || 550 || 534
|-
| Num. sentences with verb target || 189 || 210
|-
| Num. sentences with noun target || 361 || 324
|-
| Avg. sentence length || 14.43 || 14.33
|-
| Avg. sentence length (no punctuation) || 13.03 || 12.87
|-
| Avg. full words per sentence || 7.14 || 7.15
|-
| Num. annotators || 333 || 310
|-
| Human ratings (HR) || 18,726 || 16,522
|-
| Min. HR per sentence || 30 || 30
|}

Figure 1: Distribution of human ratings for the English and Italian datasets. Panels: (a) English dataset (GOLD-EN); (b) Italian dataset (GOLD-IT). The horizontal axis spans the ratings 1 to 7; the vertical axis ticks range from 0.00 to 0.30.
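The input and output of the task defined in Section 2 can be pictured with a minimal sketch. The names used here (<code>TaskItem</code>, <code>clip_to_scale</code>, <code>ITEMS</code>) are hypothetical and are not taken from the released data format; only the three ''field'' sentences and the scores 7 and 1 come from the task definition, while the intermediate example is deliberately left without a value because the text does not give one.

```python
from dataclasses import dataclass
from typing import Optional

# Scale described in the task definition:
# 1 = completely abstract, 7 = completely concrete.
SCALE_MIN, SCALE_MAX = 1.0, 7.0


@dataclass(frozen=True)
class TaskItem:
    """One task instance (hypothetical layout, not the official file format)."""
    sentence: str
    target: str                   # surface form of the target word
    gold: Optional[float] = None  # None when no explicit value is stated


def clip_to_scale(score: float) -> float:
    """Force a raw system output onto the 1-7 scale."""
    return max(SCALE_MIN, min(SCALE_MAX, score))


# The three 'field' examples from the task definition.
ITEMS = [
    TaskItem("In summer, wheat fields are coloured in yellow", "fields", 7.0),
    TaskItem("Physics is Alice's research field", "field", 1.0),
    # Described only as lying between the two cases above.
    TaskItem("Magnetic field attracts iron", "field"),
]

if __name__ == "__main__":
    for item in ITEMS:
        print(item.target, "|", item.sentence, "|", item.gold)
```

Keeping the gold value optional mirrors the trial/test distinction: test items are distributed without scores, so a system only sees the sentence and the target.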
===3 Dataset===
The dataset used for this task has been taken from the English-Italian parallel section of The Human Instruction Dataset (Chocron and Pareti, 2018), derived from WikiHow instructions.<sup>1</sup> All such documents had been anonymized beforehand, so that the downloaded data present no privacy or data sensitivity issues.

The dataset is composed of 1,084 sentences overall, arranged as follows: 550 Italian sentences plus 534 English sentences. Each sentence contains a target term (either a verb or a noun) with its associated concreteness score (1–7 scale). This score is the average of at least 30 human judgments from native Italian and English speakers about the concreteness of the target word in the given sentence (see Table 1 for the dataset numbers).

The reliability of the collected data within each language (Italian, English) was evaluated separately for the trial and test phases by applying split-half correlations corrected with the Spearman-Brown formula, after randomly dividing the participants into two subgroups of equal size. All the reliability indexes were calculated on 10,000 different randomizations of the participants. The mean correlations between the two groups are very high for both the trial and test phases, ranging from a minimum of r = 0.87 for English (at the test phase) to a maximum of r = 0.98 for Italian (at the trial phase), showing that the resulting ratings are highly reliable and can be used across the entire Italian-speaking and English-speaking populations.

The dataset has been split into trial and test data, with a 20–80 ratio. Trial data has been released with the concreteness scores, while the test data has been provided at the beginning of the evaluation window without any score.<sup>2</sup>

<sup>1</sup> The whole Human Instruction Dataset is freely available on Kaggle, https://www.kaggle.com/paolop/human-instructions-multilingual-wikihow

<sup>2</sup> The dataset employed in the CONcreTEXT task is available at the URL https://lablita.github.io/CONcreTEXT/.

===4 Evaluation Measures and Baselines===
We chose the Spearman correlation indices as our main evaluation measure; for the sake of completeness, we also report Pearson indices (substantially in accord with the former). We chose the Spearman measure because the collected ratings are not normally distributed, which makes the Spearman correlation better suited to the data. In fact, by running the Shapiro–Wilk test we obtained a p-value < 0.001. The non-normal distribution of the data is also confirmed by the plot of the gold standard ratings, as illustrated in Figure 1.

Two baselines have been designed for this task.

'''Baseline One.''' The first baseline for the Italian language is derived as follows. The fastText word embeddings have been acquired beforehand by training the model on the Italian dump of the WikiHow instructions. We chose fastText for its support for out-of-vocabulary (OOV) terms (Bojanowski et al., 2017), which is a crucial feature in the present setting. The cited norms by Montefinese et al. (2014) (referred to as ‘the norms’ hereafter) have been used herein. The average score of the terms in each input sentence <math>S = \{t_1, t_2, \ldots, t_K\}</math> has been computed by scrolling through the content words of the sentence. Each term ''t'' is searched in the norms: if the term is found, the associated concreteness score ''c''(''t'') is returned; otherwise, if the term is not present in the norms, the ranking of the ''l'' (''l'' = 20,000) elements most similar to ''t'' is generated through fastText. In this case, we scan the whole norms list and employ the concreteness score of the element in the norms closest to those in the fastText ranking. In either case we obtain a score for each and every term in the input sentence, so that the concreteness score of the target token <math>\hat{t}</math> is computed as the averaged score of the terms in the input sentence:

<math>c(\hat{t}) = \frac{1}{K} \cdot \sum_{i=1}^{K} c(t_i).</math>

The first baseline for the English language is analogous to the Italian one, except that the English tokens from the norms are accessed in this case. The same strategy governs the handling of the fastText resource, which in this case has been trained on the English dump of the Human Instruction Dataset.

'''Baseline Two.''' The second baseline for the Italian language implements a simple lookup function. More specifically, input sentences have been translated into English through the Google Translate ajax API implementation, and then the concreteness scores associated with the terms in the norms by Brysbaert et al. (2014b) are retrieved (in the unlikely case that a term is not found, it is dropped, thus not contributing to the final score). The concreteness score of the target term is thus set to the average concreteness of the terms in the given input sentence. Baseline Two for the English language likewise uses the norms by Brysbaert et al. (2014b) to retrieve the concreteness score of all terms in the input sentence, and finally assigns to the target token the average concreteness score of the whole sentence.

===5 Systems Descriptions===
In this Section we briefly describe the systems that participated in the competition. As a first edition, the CONcreTEXT task received good feedback from the community, with 4 teams, 7 participants overall and 15 submitted system runs. In the next Section we report the results obtained by all such systems, while anonymizing a withdrawn participant.

====5.1 ANDI====
The ANDI team (Rotaru, 2020) proposed a system based on multiple classes of concreteness score predictors. The first class of predictors has been derived from large datasets of behavioral norms, collected for a wide variety of psycholinguistic factors. Besides well-known concreteness norms, ANDI also takes into account semantic diversity, age of acquisition, emotional and sensori-motor dimensions, as well as frequency and contextual diversity counts. The vocabulary resulting from the merging of these word collections comprises more than 70K words, and it is the base vocabulary used to extract all the predictors. The second class of predictors has been derived from context-independent distributional models, namely Skip-gram, GloVe, and NumberBatch embeddings, as well as from the concatenation of the three. The third class of predictors has been derived from features obtained through recent transformer models, i.e., context-dependent representations. The models exploited are BERT, GPT-2, Bart, and ALBERT. The final rating has been computed through a ridge regression over the three classes.

====5.2 CAPISCO====
The CAPISCO team (Bondielli et al., 2020) submitted 3 systems for both Italian and English.

'''NON-CAPISCO.''' The first system computes a variation of Baseline Two; that is, the target concreteness is obtained by combining the concreteness value of the target term (taken in isolation) and the average concreteness of the whole sentence. The improvement over the baseline comes from weighting the concreteness of the target term differently from that of the context.

'''CAPISCO-CENTROIDS.''' This system is based on the assumption that nearby regions of the semantic space are characterized by similar concreteness scores. In this case the authors first build two centroids, one for concrete and one for abstract concepts, based on the norms by Brysbaert et al. (2014b) and Della Rosa et al. (2010), by employing fastText pre-trained embeddings. The concreteness score of a term is then computed by averaging the distance of the first 50 lexical substitutes of the target (identified through BERT) from the two polarized centroids. Introducing a list of target substitutes in a given context is thus the gist of this approach.

'''CAPISCO-TRANSFORMERS.''' In this variant, the CAPISCO team fine-tuned a pre-trained BERT model on the concreteness rating task, by complementing the CONcreTEXT training data with newly generated training data. The new data generation is twofold: for each original sentence, new sentences are generated by replacing the target term with the first lexical substitutes derived with the BERT target masking approach. Then, more sentences are borrowed from Italian and English reference corpora.

====5.3 KONKRETIKA====
The KONKRETIKA team (Badryzlova, 2020) presented a system that first assigns a concreteness and an abstractness score to the target lemma, and then adjusts these values based on the surrounding context. In the first step, the system computes the semantic similarity between the target vectors and a “seed list” consisting of abstract and concrete words (extracted from the MRC Psycholinguistic Database). In the second step, the values were adjusted to the sentential context by considering the mean concreteness index of the entire sentence. The team submitted 4 runs based on a heuristically selected coefficient.

{| class="wikitable"
|+ Table 2: Results for each run on the English test set.
! System run !! Spearman !! Pearson !! Euclidean distance
|-
| ANDI || 0.833 || 0.834 || 15.409
|-
| NON-CAPISCO || 0.785 || 0.787 || 35.663
|-
| KONKRETIKA 3 || 0.663 || 0.668 || 28.613
|-
| KONKRETIKA 1 || 0.651 || 0.667 || 29.933
|-
| Baseline 2 || 0.554 || 0.567 || 38.451
|-
| KONKRETIKA 4 || 0.542 || 0.545 || 29.836
|-
| CAPISCO CENTR || 0.542 || 0.538 || 48.864
|-
| KONKRETIKA 2 || 0.541 || 0.545 || 30.322
|-
| CAPISCO TRANS || 0.504 || 0.501 || 29.927
|-
| Baseline 1 || 0.382 || 0.377 || 31.738
|-
| withdrawn run3 || -0.013 || 0.067 || 41.109
|-
| withdrawn run1 || -0.124 || -0.123 || 44.068
|-
| withdrawn run2 || -0.127 || -0.129 || 43.890
|}

{| class="wikitable"
|+ Table 3: Results for each run on the Italian test set.
! System run !! Spearman !! Pearson !! Euclidean distance
|-
| ANDI || 0.749 || 0.749 || 19.950
|-
| CAPISCO TRANS || 0.625 || 0.617 || 24.367
|-
| CAPISCO CENTR || 0.615 || 0.609 || 28.608
|-
| NON-CAPISCO || 0.557 || 0.557 || 31.588
|-
| Baseline 2 || 0.534 || 0.522 || 40.114
|-
| Baseline 1 || 0.346 || 0.368 || 31.046
|}
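The three figures reported for each run in Tables 2 and 3 (Spearman correlation, Pearson correlation, Euclidean distance) can be illustrated with a small self-contained sketch. It assumes that each measure compares the vector of system scores with the vector of gold scores, which the text does not spell out for the Euclidean distance; the function names and the toy numbers are invented for illustration and are not the organizers' evaluation script.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation, computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def euclidean(x, y):
    """Euclidean distance between two equally long score vectors."""
    return math.dist(x, y)


if __name__ == "__main__":
    gold = [1.0, 2.5, 4.0, 5.5, 7.0]  # toy gold ratings, not task data
    pred = [1.5, 2.0, 4.5, 5.0, 6.5]  # toy system output
    print(spearman(gold, pred), pearson(gold, pred), euclidean(gold, pred))
```

Because Spearman only looks at ranks, a system can reach a perfect correlation while still being far from the gold values, which is why a distance measure is a useful complement.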
===6 Results===
Four teams participated in the CONcreTEXT competition: ANDI, CAPISCO, KONKRETIKA, and a withdrawn team. ANDI and CAPISCO developed a system for both languages (English and Italian), while KONKRETIKA participated in the English track only, as did the withdrawn participant. Each team was allowed to submit the output of up to 4 system runs; the final ranking has been compiled based on the results of the best run.

In Tables 2 and 3 we present the score of each run for the English and Italian language, respectively. Although, as mentioned, the Spearman indices were adopted as our main evaluation metric, we also report Pearson correlation indices and the Euclidean distance, which may be useful to complete the assessment of the results. The final ranking is provided in Tables 4 and 5.

We can observe a substantial agreement between Spearman and Pearson indices: the averaged delta between such figures amounts to 0.012 and to 0.008 on the English and Italian dataset, respectively. The Euclidean distance also seems to substantially confirm the results: on English (Table 2) it is minimal for the output of the ANDI system, and it tends to increase as the Spearman correlation values decrease. The same trend is also confirmed on the Italian results (Table 3).

Tables 6 and 7 report disaggregated Spearman correlations for verbs and nouns. This makes it possible to highlight whether, and to what extent, the participating systems obtained better results on either part of speech. ANDI obtained the best results on both verbs and nouns in both languages. This system (and NON-CAPISCO as well) obtained analogous results on verbs and nouns. On the whole, the rest of the systems obtained clearly better results on English verbs and slightly better results on Italian nouns. In particular, KONKRETIKA (English only) is strongly biased towards verbs: its performance on verbs is higher in all 4 runs. CAPISCO systems exhibit the most varied behavior.

{| class="wikitable"
|+ Table 4: Final ranking on the English test set.
! Team !! Spearman !! Pearson !! Euclidean distance
|-
| ANDI || 0.833 || 0.834 || 15.409
|-
| CAPISCO || 0.785 || 0.787 || 35.663
|-
| KONKRETIKA || 0.663 || 0.668 || 28.613
|-
| withdrawn || -0.013 || 0.067 || 41.109
|}

{| class="wikitable"
|+ Table 5: Final ranking on the Italian test set.
! Team !! Spearman !! Pearson !! Euclidean distance
|-
| ANDI || 0.749 || 0.749 || 19.950
|-
| CAPISCO || 0.625 || 0.617 || 24.367
|}

{| class="wikitable"
|+ Table 6: Spearman rank differences between nouns and verbs on the English test set.
! System run !! Spearman (nouns) !! Spearman (verbs) !! Difference
|-
| CAPISCO TRANS || 0.443 || 0.654 || 0.211
|-
| KONKRETIKA 4 || 0.502 || 0.701 || 0.199
|-
| KONKRETIKA 2 || 0.502 || 0.683 || 0.181
|-
| CAPISCO CENTR || 0.478 || 0.659 || 0.181
|-
| KONKRETIKA 3 || 0.629 || 0.762 || 0.133
|-
| KONKRETIKA 1 || 0.611 || 0.741 || 0.130
|-
| ANDI || 0.836 || 0.857 || 0.021
|-
| NON-CAPISCO || 0.779 || 0.782 || 0.003
|}

{| class="wikitable"
|+ Table 7: Spearman rank differences between nouns and verbs on the Italian test set.
! System run !! Spearman (nouns) !! Spearman (verbs) !! Difference
|-
| NON-CAPISCO || 0.579 || 0.507 || 0.072
|-
| CAPISCO TRANS || 0.607 || 0.667 || 0.060
|-
| CAPISCO CENTR || 0.625 || 0.591 || 0.034
|-
| ANDI || 0.762 || 0.749 || 0.013
|}

===7 Discussion===
The obtained results confirm transformers as a good device for computing concreteness scores for words in context. The virtues of transformers in grasping contextual information are widely known, but in the present setting we observe that their output can be further improved by integrating behavioral information (this seems to be one major difference between the systems ANDI and CAPISCO-TRANSFORMERS).

The most important outcome of this challenge is definitely the strong performance of the ANDI system, which proves to be robust and reliable for the considered task: the system obtains the best ranking in both languages, a low deviation from the gold standard and a substantial stability in processing both verbs and nouns. Moreover, the proposed system is ready to be applied in a multi-language environment, given that non-English sentences are automatically translated into English. The ANDI system exploits different kinds of available resources and works with local and contextual information. This shows that deriving the concreteness score of a word in context is a complex task, involving different semantic, cognitive and experiential levels.

The high correlation obtained by NON-CAPISCO in the English task is somewhat surprising, since this system makes use only of the mean concreteness of the sentence (computed from existing norms) as contextual information. This result is thus related to the availability of existing norms, but it shows that there is a link between the concreteness score of a target word in context and the concreteness scores of the words it occurs with. Further analyses are needed, but this suggests that concrete interpretations of a target word are associated with concrete context words. Of course, systems based exclusively on behavioral norms are strongly dependent on the coverage of the considered vocabulary. In fact, the NON-CAPISCO performance on Italian (obtained by exploiting a ~1.2K vocabulary) is lower than that of all the other systems, while on the English track it ranks second (using a ~70K vocabulary).

===8 Conclusions===
We presented the results of the CONcreTEXT task at EVALITA 2020 (Basile et al., 2020). The task challenges participants to build NLP systems that automatically assign a concreteness score to words in context, evaluating to what extent target concepts are concrete (i.e., more or less perceptually salient) within a given context of occurrence. A novel dataset was developed for this task as a multilingual comparable corpus composed of 550 Italian sentences and 534 English sentences, annotated with the concreteness/abstractness rating of target nouns and verbs. Three teams completed their participation in the task, obtaining the following ranking: ANDI (Rotaru, 2020), CAPISCO (Bondielli et al., 2020), and KONKRETIKA (Badryzlova, 2020).

Future work will address the following steps. First of all, we will improve our dataset by including further languages, also from different language families and under-resourced languages. The set of considered targets should also be expanded, to ensure a broader coverage of the dataset, and more significant results (thanks to the larger experimental base) for its future users as well.

===References===
* Yulia Badryzlova. 2020. KONKRETIKA @ CONcreTEXT: Computing concreteness indexes with sigmoid transformation and adjustment for context. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.
* Valentina Bambini, Donatella Resta, and Mirko Grimaldi. 2014. A dataset of metaphors from the Italian literature: Exploring psycholinguistic variables and the role of context. PloS one, 9(9):e105634.
* Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro. 2020. Evalita 2020: Overview of the 7th evaluation campaign of natural language processing and speech tools for Italian. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA 2020), Online. CEUR.org.
* Fraser A Bleasdale. 1987. Concreteness-dependent associative priming: Separate lexical organization for concrete and abstract words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 13(4):582.
* Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information.
* Alessandro Bondielli, Gianluca E. Lebani, Lucia C. Passaro, and Alessandro Lenci. 2020. CAPISCO @ CONcreTEXT: (Un)supervised Systems to Contextualize Concreteness with Norming Data. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.
* Marc Brysbaert, Michaël Stevens, Simon De Deyne, Wouter Voorspoels, and Gert Storms. 2014a. Norms of age of acquisition and concreteness for 30,000 Dutch words. Acta psychologica, 150:80–84.
* Marc Brysbaert, Amy Beth Warriner, and Victor Kuperman. 2014b. Concreteness ratings for 40 thousand generally known English word lemmas. Behavior research methods, 46(3):904–911.
* Paula Chocron and Paolo Pareti. 2018. Vocabulary alignment for collaborative agents: a study with real-world multilingual how-to instructions. In IJCAI, pages 159–165.
* D. Colla, E. Mensa, A. Porporato, and D.P. Radicioni. 2018. Conceptual Abstractness: From Nouns to Verbs. In Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), volume 2253. CEUR.
* Louise Connell, Dermot Lynott, and Briony Banks. 2018. Interoception: the forgotten modality in perceptual grounding of abstract and concrete concepts. Philosophical Transactions of the Royal Society B: Biological Sciences, 373(1752):20170143.
* Sebastian J Crutch and Elizabeth K Warrington. 2005. Abstract and concrete concepts have structurally different representational frameworks. Brain, 128(3):615–627.
* Annette M de Groot. 1989. Representational aspects of word imageability and word frequency as assessed through word association. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(5):824.
* Pasquale A Della Rosa, Eleonora Catricalà, Gabriella Vigliocco, and Stefano F Cappa. 2010. Beyond the abstract–concrete dichotomy: Mode of acquisition, concreteness, imageability, familiarity, age of acquisition, context availability, and abstractness norms for a set of 417 Italian words. Behavior research methods, 42(4):1042–1048.
* Francesca Garbarini, Fabrizio Calzavarini, Matteo Diano, Monica Biggio, Carola Barbero, Daniele P Radicioni, Giuliano Geminiani, Katiuscia Sacco, and Diego Marconi. 2020. Imageability effect on the functional brain activity during a naming to definition task. Neuropsychologia, 137:107275.
* Karl F Haberlandt and Arthur C Graesser. 1985. Component processes in text comprehension and some of their interactions. Journal of Experimental Psychology: General, 114(3):357.
* Felix Hill and Anna Korhonen. 2014. Learning abstract concept embeddings from multi-modal data: Since you probably can’t see what I mean. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 255–265.
* Felix Hill, Douwe Kiela, and Anna Korhonen. 2013. Concreteness and corpora: A theoretical and practical study. In Proceedings of the Fourth Annual Workshop on Cognitive Modeling and Computational Linguistics (CMCL), pages 75–83.
* Stavroula-Thaleia Kousta, Gabriella Vigliocco, David P Vinson, Mark Andrews, and Elena Del Campo. 2011. The representation of abstract words: why emotion matters. Journal of Experimental Psychology: General, 140(1):14.
* Judith F Kroll and Jill S Merves. 1986. Lexical access for concrete and abstract words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12(1):92.
* Enrico Mensa, Aureliano Porporato, and Daniele P. Radicioni. 2018a. Annotating concept abstractness by common-sense knowledge. In Chiara Ghidini, Bernardo Magnini, Andrea Passerini, and Paolo Traverso, editors, AI*IA 2018 – Advances in Artificial Intelligence, pages 415–428, Cham. Springer International Publishing.
* Enrico Mensa, Aureliano Porporato, and Daniele P. Radicioni. 2018b. Grasping metaphors: Lexical semantics in metaphor analysis. In Aldo Gangemi, Anna Lisa Gentile, Andrea Giovanni Nuzzolese, Sebastian Rudolph, Maria Maleshkova, Heiko Paulheim, Jeff Z Pan, and Mehwish Alam, editors, The Semantic Web: ESWC 2018 Satellite Events, pages 192–195, Cham. Springer International Publishing.
* Leonie M Miller and Steven Roodenrys. 2009. The interaction of word frequency and concreteness in immediate serial recall. Memory & Cognition, 37(6):850–865.
* Maria Montefinese, Ettore Ambrosini, Beth Fairfield, and Nicola Mammarella. 2014. The adaptation of the Affective Norms for English Words (ANEW) for Italian. Behavior research methods, 46(3):887–903.
* Maria Montefinese, Ettore Ambrosini, Antonino Visalli, and David Vinson. 2020. Catching the intangible: a role for emotion? Behavioral and Brain Sciences, 43.
* Cristina Romani, Sheila Mcalpine, and Randi C Martin. 2008. Concreteness effects in different tasks: Implications for models of short-term memory. Quarterly Journal of Experimental Psychology, 61(2):292–323.
* Armand Rotaru. 2020. ANDI @ CONcreTEXT: Predicting concreteness in context for English and Italian using distributional models and behavioural norms. In Valerio Basile, Danilo Croce, Maria Di Maro, and Lucia C. Passaro, editors, Proceedings of the 7th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA 2020), Online. CEUR.org.
* Mark Sadoski, William A Kealy, Ernest T Goetz, and Allan Paivio. 1997. Concreteness and imagery effects in the written composition of definitions. Journal of Educational Psychology, 89(3):518.
* Paula J Schwanenflugel and Edward J Shoben. 1983. Differential context effects in the comprehension of abstract and concrete verbal materials. Journal of Experimental Psychology: Learning, Memory, and Cognition, 9(1):82.
* Peter Turney, Yair Neuman, Dan Assaf, and Yohai Cohen. 2011. Literal and metaphorical sense identification through concrete and abstract context. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 680–690.
* Gabriella Vigliocco, Lotte Meteyard, Mark Andrews, and Stavroula Kousta. 2009. Toward a theory of semantic representation. Language and Cognition, 1(2):219–247.