Taming Sense Sparsity: a Common-Sense Approach

Antonio Lieto, Enrico Mensa and Daniele P. Radicioni
Dipartimento di Informatica, Università degli Studi di Torino
Corso Svizzera 185, 10149 Torino, Italy
{lieto,mensa,radicion}@di.unito.it

Abstract

English. We present a novel algorithm and a linguistic resource named CLOSEST after 'Common SEnse STrainer'. The resource contains a list of the main senses associated with a given term, and it was obtained by applying a simple set of pruning heuristics to the senses provided in the NASARI vectors for the set of 15K most frequent English terms. The preliminary experimentation provided encouraging results.

Italiano (translated). In this work we present an algorithm and a linguistic resource, CLOSEST, which contains the most relevant senses for the 15K most frequent terms of the English dictionary. The implemented algorithm uses an existing resource that encodes encyclopedic knowledge, and relies on the notion of common sense to filter the possible senses associated with each term. The preliminary evaluation provided encouraging results regarding the quality of the extracted senses.

1 Introduction

Many NLP tasks involve word sense disambiguation (WSD) and word sense induction (WSI), and require using lexical resources such as WordNet (Miller, 1995) and BabelNet (Navigli and Ponzetto, 2010) that provide a rich mapping of terms (or word forms) onto the corresponding senses (word meanings). These widely used resources in fact draw subtle distinctions between the possible senses of a term. It is largely acknowledged that while fine-grained sense distinctions are necessary for some precise tasks (such as machine translation), for other sorts of applications (such as text categorization and information extraction) coarse-grained sense inventories are preferable (Palmer et al., 2004). In these cases, fine-grained distinctions may be unnecessary and even detrimental to WSD and WSI, so that in the last few years many efforts have concentrated on clustering senses. Most works focused on producing coarser-grained sense inventories, with the aim of grouping together the closest (partially overlapping) senses of a word; to this end, various techniques have been proposed, which are briefly surveyed in Section 2.

Unlike existing approaches, we propose a simple yet effective method that relies on recently developed resources that are assumed to also grasp common-sense knowledge (Camacho-Collados et al., 2015; Lieto et al., 2016a), which is assumed to be both widely accessible and elementary knowledge (Minsky, 2000), and to reflect typicality traits encoded as prototypical knowledge (Rosch, 1975). The research question presently addressed is thus: to what extent can we identify a few principal (common-sense) senses for a term, and how closely is it possible to approximate human performance? Although it is known that even human annotators provide quite different responses when annotating text with senses (Palmer et al., 2004), we presently explore the hypothesis that wide-coverage resources are sufficient to identify the main senses associated with English terms.

2 Related Work

In order to attain coarse-grained senses, different approaches have been proposed, based on some sort of semantic underspecification (Buitelaar, 2000; Ng et al., 2003; Palmer et al., 2007), on existing dictionaries and on exploiting hand-crafted sense hierarchies (Navigli, 2006), on syntactic and semantic properties (such as selectional restrictions on verb arguments) (Artale et al., 1998; Palmer et al., 2004), on linguistically motivated heuristics (Mihalcea and Moldovan, 2001), or on distributional similarity among word senses (Agirre and De Lacalle, 2003). Further approaches have been proposed that rely on an adjustable nearest neighbour schema for clustering senses according to the sense granularity actually required by the application at hand (McCarthy, 2006). A popular testbed for experimenting with these and other approaches is represented by the sense-annotated corpora Senseval-2 and 3 (Edmonds and Cotton, 2001; Mihalcea and Edmonds, 2004).

The problem of annotating a term with the appropriate sense is a challenging one, to such an extent that by no means "two lexicographers working independently are guaranteed to derive the same set of distinctions for a given word" (Palmer et al., 2004). It has been argued that this issue can be overcome to some extent by adopting a more flexible annotation schema, where senses are described in a graded fashion: in this way, the applicability of a sense can be assessed on an ordinal scale, rather than in 'crisp' fashion. This sort of annotation would allow a better interpretation of human annotations, in particular for coarse-grained groups (Erk et al., 2013). A related and complementary issue is that of clusterability, which measures how far word meanings can be partitioned. In this setting, whereas highly clusterable lemmas can be grouped based on traditional clustering techniques, less clusterable lemmas require more sophisticated soft-clustering algorithms from computational systems, and more time and expertise from human annotators (McCarthy et al., 2016).

This work is framed in the context of a long-term project aimed at investigating conceptual categorization (Lieto et al., 2015; Lieto et al., 2016b) based on a hybrid strategy (Evans and Frankish, 2009) complementing formal ontologies with the geometrical framework of Conceptual Spaces (CS) (Gärdenfors, 2014). In particular, we are building a knowledge base to collect conceptual information encoded in a CS-based representational format to provide a uniform interface between the linguistic and the conceptual level, where CS representations are fully endowed with BabelNet identifiers (Lieto et al., 2016a).¹ This trait will make it possible to link the present work to existing initiatives like Senso Comune (Oltramari and Vetere, 2008; Chiari et al., 2010), which provides about 2000 fundamental Italian terms (De Mauro, 1999) with an ontological description.

3 The CLOSEST Algorithm

The rationale underlying the CLOSEST algorithm is that the main (most frequent) senses have gained more room than marginal senses in our lexical and conceptual system and, in general, in our utterances. This phenomenon determines the availability and saliency of words and phrases (Vossen and Fellbaum, 2009), which are arguably grasped by encyclopedic resources as well. There, more central senses typically feature richer (i.e., longer vectors) and less specific information, richer semantic connections with other concepts, and heavier feature weights. Although it may happen that some sense spans over (or even subsumes) another one, we are not primarily trying to cluster senses in agglomerative fashion, e.g., by resorting to some superclass of the considered concept; rather, we select the most relevant ones (a term is seldom associated with more than a few, say three or four, senses) and we discard the others.

The CLOSEST algorithm takes as input a term $t$ and provides a set of possibly related senses.² The algorithm first retrieves the set of senses $S = \{s_1, s_2, \ldots, s_n\}$ that are possibly associated with $t$: this set is obtained by directly querying NASARI. The output of the algorithm is a result set $S' \subseteq S$. In order to attain $S'$ we devised a process of incremental filtering, which is arranged into two main phases:

1. LS-Pruning. Pruning of less salient senses: senses with poor associated information are eliminated. The salience of a sense is determined both in absolute terms and in relation to the most salient sense.

2. OL-Pruning. Pruning of overlapping senses: if senses with significant overlap are found, the less salient sense is pruned.

Senses are represented as NASARI vectors, which are the vectorial counterpart of BabelNet synsets; concepts (basically, WordNet synsets and Wikipedia pages) are described through vector representations, whose features are synset IDs themselves. Feature weights are computed through the metrics of lexical specificity, by exploiting a semantics-based dimensionality reduction (Camacho-Collados et al., 2015). Each sense is associated with exactly one NASARI vector, so that pruning a sense amounts to pruning a vector.

LS-Pruning. To analyze the senses in $S$, we inspect each vector $\vec{v}_t^{\,s}$ related to sense $s$ for the term $t$. The first pruning occurs when not enough information is found, that is, when $\vec{v}_t^{\,s}$ contains fewer than a fixed number of elements (Table 1). Then, in order to determine the next vectors to be pruned, we compute the weight of each vector, $W(\vec{v}_t^{\,s})$, and the longest and the heaviest vector among those associated with $t$ ($L(\vec{v}_t)$ and $H(\vec{v}_t)$, respectively). The weight of a NASARI vector $W(\vec{v}_t^{\,s})$ is computed by averaging the weights of the features (i.e., the synsets) it contains. The definitions of these measures are given in Equations 1–3.

    L(\vec{v}_t) = \arg\max_{s \in S} \, \mathrm{len}(\vec{v}_t^{\,s})                          (1)

    W(\vec{v}_t^{\,s}) = \frac{1}{\mathrm{len}(\vec{v}_t^{\,s})} \cdot \sum_{j} w_j^{s}          (2)

    H(\vec{v}_t) = \arg\max_{s \in S} \, W(\vec{v}_t^{\,s})                                      (3)

The decision on whether or not to prune a vector is based on a simple criterion: $\vec{v}_t^{\,s} \in S$ is pruned if both its length is below a given fraction of the length of the longest one, $L(\vec{v}_t)$, and its weight is lower than a given fraction of the weight of the heaviest one, $H(\vec{v}_t)$. The parameter settings adopted in the present work are illustrated in Table 1.

OL-Pruning. The second phase of the algorithm aims at detecting overlapping senses. The overlap between vectors that survived the LS-Pruning is computed thanks to the information provided in NASARI. The heuristic used in this phase is as follows: the overlap between two vectors, $\mathrm{Ovl}(\vec{v}_t^{\,i}, \vec{v}_t^{\,j})$, is computed as a fraction of the length of the shorter of the two vectors considered, as indicated in Equation 4.

    \mathrm{Ovl}(\vec{v}_t^{\,i}, \vec{v}_t^{\,j}) = \frac{|\vec{v}_t^{\,i} \cap \vec{v}_t^{\,j}|}{\mathrm{len}(\mathrm{shortest}(\vec{v}_t^{\,i}, \vec{v}_t^{\,j}))}          (4)

The overlap is checked for every pair $\langle \vec{v}_i, \vec{v}_j \rangle$ (with $i \neq j$), and when the overlap detected is higher than a fixed threshold (see Table 1), the shorter of the two vectors is pruned. At the end of this phase, we have the set $S'$ where only the most salient vectors survived and where, among overlapping vectors, the most salient one has been retained.

3.1 Building the CLOSEST resource

Overall the system handled about 2.69M NASARI vectors. Some 207K vectors associated with Named Entities were discarded, as not directly related to common-sense concepts; the remaining vectors contained overall 6.9M unique words.

The top (most frequent) 15K nouns were extracted from the Corpus of Contemporary American English (COCA), which has been built from composite and balanced sources, including spoken, fiction, magazine, newspaper, and academic text.³ Over 6K terms were discarded, since they are associated in NASARI either with 1 sense (about 1K terms) or with no sense at all (over 5K terms), which reduced the input size to about 8.7K terms; overall 32.6K senses were retrieved for these input terms (on average, 3.7 senses per term).

The figures for the processing phases are reported in Table 1: over 4K senses were filtered in the first step of the LS-Pruning phase, based on the length of the vector $\vec{v}_t^{\,s}$, and 7.4K senses were further discarded in the second step. Finally, in the OL-Pruning phase, 5.6K vectors were removed on account of overlap, thus overall yielding 17.5K deleted and 15.1K surviving vectors.⁴ The polysemy rate was reduced from the 3.74 senses per term initially featured in NASARI down to 1.73 senses per term, which is in line with the degree of polysemy detected in the Collins English Dictionary for English nouns by the WordNet authors (Fellbaum, 1990).

    pruning phase   condition (prune $\vec{v}_t^{\,s}$ IF)                                                              threshold values        pruned senses
    LS-Pruning      $\mathrm{len}(\vec{v}_t^{\,s}) \le \alpha$                                                           $\alpha = 5$                    4,389
    LS-Pruning      $\mathrm{len}(\vec{v}_t^{\,s}) / L(\vec{v}_t) < \beta$ AND $W(\vec{v}_t^{\,s}) / W(H(\vec{v}_t)) < \gamma$   $\beta, \gamma = .40$           7,460
    OL-Pruning      $\mathrm{Ovl}(\vec{v}_t^{\,s}, \vec{v}_t^{\,u}) \ge \delta$                                          $\delta = .20$                  5,676
    filtered out senses                                                                                                                                 17,525
    retained senses                                                                                                                                     15,134

Table 1: Pruning of senses in the three steps, along with the number of senses pruned at each step.

4 Evaluation

A preliminary experimentation has been devised to assess the correctness and completeness of the extracted senses: that is, the question addressed was whether i) all senses extracted for the input term are salient (and actually judged as the main senses), and ii) all the relevant senses were preserved in CLOSEST. To this end, 15 volunteers were recruited and interviewed through an online questionnaire to evaluate, on a human common-sense judgement basis, the set of senses extracted by the system for 20 terms.

Stimuli. The list of 20 terms was algorithmically selected from the aforementioned COCA corpus (see footnote 3) by selecting the terms with index 1, 51, 101, and so forth. In this way we selected highly frequent terms that are expected to be part of common sense for those who participated in our experimentation.⁵

Experimental design and procedure. The participants were asked a) to assess each and every sense extracted by the system and associated with each input term, by indicating whether it was acceptable as one of the principal senses for the term at hand. Additionally, they were requested b) to indicate any further sense they deemed essential in order to complete the common-sense pool of senses for the given term.

Results. Overall 42 senses (corresponding to the 20 mentioned terms) were assessed through the experimentation: each sense was rated 15 times, thus resulting in 630 judgements. 24% of senses were not found appropriate, according to a common-sense judgement, thereby determining a 76% accuracy as regards question a). However, if we consider senses rejected by at least 10 participants, only 5 senses were rejected (12%), which actually correspond to very specific senses (e.g., the sense 'Net (textile)' for the term 'network'; 'Session (Presbyterianism)' and 'session house' for the term 'session').

As regards question b), results are more difficult to interpret, due to the sparsity of the answers: out of the 59 added senses, only in 8 cases was the added sense indicated by two or three participants (and never more). In such cases it emerged, for example, that the sense 'manners' was relevant (and missing in the CLOSEST resource) for the input term 'education'; the sense 'social network' is relevant for the term 'network'; and 'meeting' for 'session'.

However, although encouraging results emerged from the experimentation, further experiments are needed to assess the CLOSEST resource in a more extensive and principled way, also in consideration of the many factors that were presently neglected, such as age, education, and occupation of the participants, their native language, etc.

5 Conclusions

In this paper we have illustrated the CLOSEST algorithm to extract the most salient (under the common-sense perspective) senses associated with a given term; also, we have introduced the CLOSEST resource, which has been built by starting from the 15K top-frequency English terms. The resource currently provides senses in a flat manner, but, if required, senses can be organized in a sorted fashion by extending the metrics used for filtering. Our work relies on a recently developed resource, NASARI, that is multilingual in nature.⁶ Consequently, unlike most previous approaches, CLOSEST can be linked to various existing resources aimed at grasping common sense, to complete the ideal chain connecting lexicon, semantics and formal (ontological) description. The experimentation revealed a reasonable agreement with human responses, and pointed out some difficulties in fully assessing this sort of resource. These issues, along with improvements to the heuristics implemented by the algorithm and a different evaluation based on a shared NLP task, will be addressed in future work.

¹ The integration of different semantic models such as CSs and the distributional semantics underlying NASARI is still an open issue; we provided an initial solution to this problem in (Lieto et al., 2016a).
² The present investigation is restricted to nouns, but no theoretical limitation prevents us from extending the approach to verbs and adjectives.
³ http://corpus.byu.edu/full-text/.
⁴ CLOSEST is available at http://goo.gl/7B61Oz.
⁵ The full list of the considered terms includes: time, side, education, type, officer, ability, network, shoulder, threat, investigation, gold, claim, learning, session, aid, emergency, bowl, pepper, milk, resistance. The printed version of the online questionnaire is available at the URL http://goo.gl/w9TNQT.
⁶ An interesting question may be raised on this point, about the conceptual alignment in an inter-linguistic perspective, which is a well-known issue, e.g., for applications in the legal field (Ajani et al., 2010).

References

Eneko Agirre and Oier Lopez De Lacalle. 2003. Clustering WordNet Word Senses. In RANLP, volume 260, pages 121–130.

Gianmaria Ajani, Guido Boella, Leonardo Lesmo, Marco Martin, Alessandro Mazzei, Daniele P. Radicioni, and Piercarlo Rossi. 2010. Multilevel legal ontologies. In Semantic Processing of Legal Texts, pages 136–154. Springer.

Alessandro Artale, Anna Goy, Bernardo Magnini, Emanuele Pianta, and Carlo Strapparava. 1998. Coping with WordNet Sense Proliferation. In First International Conference on Language Resources & Evaluation.

Paul Buitelaar. 2000. Reducing Lexical Semantic Complexity with Systematic Polysemous Classes and Underspecification. In NAACL-ANLP 2000 Workshop: Syntactic and Semantic Complexity in Natural Language Processing Systems, pages 14–19. Association for Computational Linguistics.

José Camacho-Collados, Mohammad Taher Pilehvar, and Roberto Navigli. 2015. NASARI: a Novel Approach to a Semantically-Aware Representation of Items. In Proceedings of NAACL, pages 567–577.

Isabella Chiari, Alessandro Oltramari, and Guido Vetere. 2010. Di Cosa Parliamo quando Parliamo Fondamentale? Lessemi, Accezioni, Sensi e Ontologie. In Lessico e Lessicologia. Atti del Convegno della Società di Linguistica Italiana, pages 177–194, Roma, September. Bulzoni.

Tullio De Mauro. 1999. Grande Dizionario Italiano dell'Uso. UTET, Turin, Italy.

Philip Edmonds and Scott Cotton. 2001. SENSEVAL-2: Overview. In Proceedings of SENSEVAL-2 Second International Workshop on Evaluating Word Sense Disambiguation Systems, pages 1–5, Toulouse, France, July. Association for Computational Linguistics.

Katrin Erk, Diana McCarthy, and Nicholas Gaylord. 2013. Measuring word meaning in context. Computational Linguistics, 39(3):511–554.

Jonathan St BT Evans and Keith Ed Frankish. 2009. In Two Minds: Dual Processes and Beyond. Oxford University Press.

Christiane Fellbaum. 1990. English Verbs as a Semantic Net. International Journal of Lexicography, 3(4):278–301.

Peter Gärdenfors. 2014. The Geometry of Meaning: Semantics Based on Conceptual Spaces. MIT Press.

Antonio Lieto, Daniele P. Radicioni, and Valentina Rho. 2015. A Common-Sense Conceptual Categorization System Integrating Heterogeneous Proxytypes and the Dual Process of Reasoning. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), pages 875–881, Buenos Aires, July. AAAI Press.

Antonio Lieto, Enrico Mensa, and Daniele P. Radicioni. 2016a. A Resource-Driven Approach for Anchoring Linguistic Resources to Conceptual Spaces. In Proceedings of the 15th International Conference of the Italian Association for Artificial Intelligence, Genoa, Italy, December. Springer.

Antonio Lieto, Daniele P. Radicioni, and Valentina Rho. 2016b. Dual PECCS: a Cognitive System for Conceptual Representation and Categorization. Journal of Experimental & Theoretical Artificial Intelligence, pages 1–20.

Diana McCarthy, Marianna Apidianaki, and Katrin Erk. 2016. Word Sense Clustering and Clusterability. Computational Linguistics.

Diana McCarthy. 2006. Relating WordNet Senses for Word Sense Disambiguation. Making Sense of Sense: Bringing Psycholinguistics and Computational Linguistics Together, 17.

Rada Mihalcea and Phil Edmonds. 2004. SENSEVAL-3: Overview. In Proceedings Senseval-3 3rd International Workshop on Evaluating Word Sense Disambiguation Systems. ACL, Barcelona, Spain.

Rada Mihalcea and Dan I. Moldovan. 2001. Automatic Generation of a Coarse Grained WordNet. In Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources.

George A. Miller. 1995. WordNet: a Lexical Database for English. Communications of the ACM, 38(11):39–41.

Marvin Minsky. 2000. Commonsense-based interfaces. Communications of the ACM, 43(8):66–73.

Roberto Navigli and Simone Paolo Ponzetto. 2010. BabelNet: Building a Very Large Multilingual Semantic Network. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 216–225. Association for Computational Linguistics.

Roberto Navigli. 2006. Meaningful Clustering of Senses Helps Boost Word Sense Disambiguation Performance. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 105–112. Association for Computational Linguistics.

Hwee Tou Ng, Bin Wang, and Yee Seng Chan. 2003. Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pages 455–462. Association for Computational Linguistics.

Alessandro Oltramari and Guido Vetere. 2008. Lexicon and Ontology Interplay in Senso Comune. OntoLex 2008 Programme, page 24.

Martha Palmer, Olga Babko-Malaya, and Hoa Trang Dang. 2004. Different Sense Granularities for Different Applications. In Proceedings of Workshop on Scalable Natural Language Understanding.

Martha Palmer, Hoa Trang Dang, and Christiane Fellbaum. 2007. Making Fine-Grained and Coarse-Grained Sense Distinctions, both Manually and Automatically. Natural Language Engineering, 13(02):137–163.

Eleanor Rosch. 1975. Cognitive Representations of Semantic Categories. Journal of Experimental Psychology: General, 104(3):192–233.

Piek Vossen and Christiane Fellbaum. 2009. Multilingual FrameNets in Computational Lexicography: Methods and Applications, chapter Universals and idiosyncrasies in multilingual WordNets. Trends in linguistics / Studies and monographs. Mouton de Gruyter.
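
The two pruning phases described in Section 3 (Equations 1–4, with the thresholds listed in Table 1) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`weight`, `ls_prune`, `overlap`, `ol_prune`, `closest`) and the representation of a vector as a dictionary from synset ID to feature weight are hypothetical, and querying NASARI is not shown. Two further assumptions are made where the text leaves room for interpretation: the longest and heaviest vectors are taken over all senses of the term, and when two overlapping vectors have the same length the second one of the pair is pruned.

```python
from itertools import combinations

# Thresholds as reported in Table 1.
ALPHA, BETA, GAMMA, DELTA = 5, 0.40, 0.40, 0.20


def weight(vec):
    """Eq. 2: average feature weight (0.0 for an empty vector)."""
    return sum(vec.values()) / len(vec) if vec else 0.0


def ls_prune(senses):
    """LS-Pruning: drop vectors that carry too little information."""
    if not senses:
        return {}
    # Eq. 1 and Eq. 3, taken over all senses of the term (assumption).
    longest = max(len(v) for v in senses.values())
    heaviest = max(weight(v) for v in senses.values())
    kept = {}
    for sense, vec in senses.items():
        if len(vec) <= ALPHA:  # first step: absolute criterion
            continue
        # Second step: the ratios of Table 1, written without division.
        short = len(vec) < BETA * longest
        light = weight(vec) < GAMMA * heaviest
        if short and light:  # both conditions must hold to prune
            continue
        kept[sense] = vec
    return kept


def overlap(a, b):
    """Eq. 4: shared features over the length of the shorter vector."""
    shortest = min(len(a), len(b))
    return len(a.keys() & b.keys()) / shortest if shortest else 0.0


def ol_prune(senses):
    """OL-Pruning: in each overlapping pair, drop the shorter vector."""
    pruned = set()
    for (si, vi), (sj, vj) in combinations(senses.items(), 2):
        if overlap(vi, vj) >= DELTA:
            # Tie on length: the second vector is dropped (assumption).
            pruned.add(si if len(vi) < len(vj) else sj)
    return {s: v for s, v in senses.items() if s not in pruned}


def closest(senses):
    """Apply LS-Pruning, then OL-Pruning, to the senses of one term."""
    return ol_prune(ls_prune(senses))


# Toy data: sense label -> {synset ID: weight}; all values are invented.
toy = {
    "main": {f"f{i}": 1.0 for i in range(20)},
    "short": {f"g{i}": 1.0 for i in range(3)},    # too few features
    "weak": {f"h{i}": 0.1 for i in range(6)},     # short and light
    "dup": {f"f{i}": 0.9 for i in range(10)},     # overlaps with "main"
    "other": {f"k{i}": 0.8 for i in range(12)},
}
print(sorted(closest(toy)))  # ['main', 'other']
```

In the toy data, 'short' is removed by the first LS-Pruning step, 'weak' by the second (its length is 30% of the longest vector and its weight 10% of the heaviest), and 'dup' by OL-Pruning because all of its features are shared with the longer 'main' vector.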