<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Word Similarity Perception: an Explorative Analysis</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Alice</forename><surname>Ruggeri</surname></persName>
							<email>ruggeri@di.unito.it</email>
							<affiliation key="aff0">
								<orgName type="department">Centre for Cognitive Science</orgName>
								<orgName type="institution">University of Turin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Loredana</forename><surname>Cupi</surname></persName>
							<email>loredana.cupi@unito.it</email>
							<affiliation key="aff1">
								<orgName type="department">Department of Foreign Languages and Literatures and Modern Cultures</orgName>
								<orgName type="institution">University of Turin</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luigi</forename><forename type="middle">Di</forename><surname>Caro</surname></persName>
							<email>dicaro@di.unito.it</email>
							<affiliation key="aff2">
								<orgName type="department">Department of Computer Science</orgName>
								<orgName type="institution">University of Turin</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Word Similarity Perception: an Explorative Analysis</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">B449B736B982CDD7E5DF40150EC3BC3A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T19:19+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Natural language is a medium for expressing things belonging to conceptual and cognitive levels, made of words and grammar rules used to carry semantics. However, its natural ambiguity is the main critical issue that computational systems are generally asked to solve. In this paper, we propose to go beyond the current conceptualization of word similarity, i.e., the building block of disambiguation at the computational level. First, we analyze the origin of perceived similarity, studying how conceptual, functional, and syntactic aspects influence its strength. We report the results of a two-stage experiment showing clear similarity perception patterns. Then, based on the insights gained in the cognitive tests, we developed a computational system that automatically predicts word similarity, reaching high levels of accuracy.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Words are symbolic entities referring to something which fills a portion of an autopoietic space made of conceptual, cognitive and contextual information. These three aspects are fundamental to understanding the meaning ascribed to linguistic expressions.</p><p>One of the most important building blocks in almost all Computational Linguistics tasks is the computation of similarity scores between texts at different levels: words, sentences and discourses. Since manual numeric annotations of word-word similarity revealed low agreement between the annotators, cognitive studies can help improve computational systems by discovering what lies behind the perception of similarity between words and their referenced concepts. The concept of similarity has been extensively studied in the Cognitive Science community, since it is fundamental to human cognition. We tend to rely on similarity to generate inferences and categorize objects into kinds when we do not know exactly what properties are relevant, or when we cannot easily separate an object into separate properties. When specific knowledge is available, a generic assessment of similarity is less relevant (G. L. <ref type="bibr">Murphy &amp; Medin,1985)</ref>.</p><p>Since words co-occur in textual representations (mutually influencing one another), it is possible to run experiments on contextual information to analyze what influences the perception of similarity, and how. Let us consider two words as two mental representations. The intersection between them can be seen as the context that may also help define the correct similarity of the words in a text. 
For example, sugar and salt can easily be associated with the context kitchen, whereas salt and sea intersect in another part of the mental representations.</p><p>From a computational perspective, since words are ambiguous by nature, the disambiguation process (i.e., Word Sense Disambiguation) is one of the most studied tasks in Computational Linguistics. For example, the term count can mean many things, such as nobleman or sum. Using contextual information, it is often possible to make a choice. Again, this choice is made by means of comparisons among contexts, which are still made of words. In other terms, we may state that the computational part of almost all computational linguistics research is about the calculus of matching scores between linguistic items, i.e., word similarity. But what lies behind word similarity?</p><p>There exist many annotated datasets related to similarity and relatedness between words, like wordsim-353 <ref type="bibr" target="#b5">(Finkelstein et al.,2001)</ref> and SimLex-999 <ref type="bibr">(Hill, Reichart, &amp; Korhonen,2014)</ref>. A large part of the proposed computational systems aims at finding relatedness between words rather than similarity. Relatedness is more general than similarity, since it refers to a generic correlation (like cradle and baby, which are words representing dissimilar concepts that, however, share similar contexts).</p><p>One problem with similarity, as often faced in the literature or annotated in datasets, is that it cannot be a static value. Indeed, as the authors of these resources state in their works, the agreement between the annotators is usually not high (around 50-70%). The reason is simple, however: people can assign different degrees of importance to the specific characteristics of the concepts to compare. If we ask someone how similar dog is to cat, the right answer can only be "it depends". 
While we can all agree that the concept dog is quite similar to cat, we cannot say 0.7 rather than 0.9 (in the range [0,1]) with certainty. Different aspects can be taken into account: are we measuring the form of the animal, or its behaviour? In both cases, it depends on which part of the animal and which actions we are considering when making a choice. For instance, dogs tend to return thrown objects. From this point of view, dogs and cats are dissimilar.</p><p>In the light of this, our contribution provides the basis for understanding what lies behind the similarity between words and their referenced concepts. First, we analyze syntactic, conceptual and functional aspects of similarity perception; then, we develop a computational system which is able to predict similarity by leveraging contextual information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Cognitive Experiment</head><p>In this work, we present two tests analyzing how linguistic constructions are perceived by humans in terms of strength of semantic similarity, and whether there exists a functionality-based connection that influences its perception. The experiment was presented to 96 users of different ages and professions, none with any particular cognitive or linguistic disorder.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Test on single words</head><p>The first test of the experiment regards the perception of the similarity between single words<ref type="foot" target="#foot_0">1</ref> . In particular, the goal was to analyze how the users focus on the functional links between the words and, more importantly, whether such functionality-based similarity is a preferential perception channel compared to the conceptual-based one.</p><p>Words are ambiguous, and many resources have been released with the goal of defining all the possible senses of a word (e.g., WordNet). Word Sense Disambiguation <ref type="bibr">(Bhattacharyya &amp; Khapra,2012)</ref> is the task of resolving the ambiguity of a word in a given context. Notice that, in our experiment, we do not need any disambiguation of the words, since this process is embodied in human cognition: the users of the test autonomously represent the subjective sense they associate with the words under comparison.</p><p>Then, since we wanted to compare conceptual with functional preferences, we designed the test as a comparison between two word pairs, one involving conceptually-related words and one with words linked by direct functionalities. To generalize, let us consider the words a, b, and c with the conceptual word pair a-b and the functional word pair a-c. The user is asked to mark the most similar word (between b and c) to associate with a, and thus the most correlated word pair. The users were aware neither of the goal of the test nor of the difference between the word pairs. Since words and actions present high variability in terms of conceptual range (or their mental representation), we paid particular attention to the choice of the word pairs, according to the following principles:</p><p>Conceptual granularity If we think of the words object and thing, we probably do not have enough information to make significant comparisons due to their large and undefined conceptual boundaries. 
The same happens in cases where two words represent very specific concepts, such as lactose and amino acid. The word pairs of the proposed test have been selected by considering this constraint (and thus include words which are neither too specific nor too general).</p><p>Concreteness Words may have direct links with concrete objects such as "table" and "dog". In other cases, words such as "justice" and "thought" represent abstract concepts. Since it is not clear how this may affect the perception of similarity, we decided to keep concrete words only.</p><p>Semantic coherence Another criterion used for the selection of the words was the level of semantic similarity between the word pairs to compare. To better analyze whether the functional aspect plays a significant role in similarity perception, we extracted conceptual and functional pairs of words which had similar semantic closeness according to a standard semantic similarity calculation. In the light of this, we used a Latent Semantic Space calculated over almost 1 million documents coming from the collection of literary texts contained in the Project Gutenberg page<ref type="foot" target="#foot_1">2</ref> . The selected conceptual and functional word pairs had the property of having very close semantic similarity (the score differences were less than 0.01 in a [0,1] range).</p><p>The test was composed of three word pairs, to keep it simple and to avoid user fatigue. Then, instead of randomly selecting three different word pairs, we wanted to consider three cases in which the functional links between the words have distinct levels of importance. Our assumption was that the greater the importance of the functional link between two words in a pair, the stronger its perceived similarity (and thus the user preference with respect to the conceptual word pair). 
For this reason we added a final criterion:</p><p>Increasing relevance of the functional aspect To estimate the importance of the functional aspect relating two words, we analyzed the number of actions (or verbs) in which they are usually involved. In our test, the functional word pairs salt-water, nail-polish, and ring-finger have a functional link of 0.003<ref type="foot" target="#foot_2">3</ref> , 0.0101<ref type="foot" target="#foot_3">4</ref> , and 0.0625<ref type="foot" target="#foot_4">5</ref> respectively (see Table <ref type="table" target="#tab_0">1</ref>). These values are calculated in the following way: given the total number of existing verbs NV(rw) for the root word rw and the number of effective usages EU(sw) with the second word of the pair sw, we computed the functional link Fl(rw, sw) of the functional pair as EU(sw) / NV(rw). We then considered a backup test set with random word pairs matching the same above-mentioned criteria, collecting a total of 24 answers on 24 different cases of word pairs analogous to the main one in Table <ref type="table" target="#tab_0">1</ref>. This was done to prove the reliability of the test, checking whether the results and the analyses show a similar trend independently of the selection of the words. The results of the whole test are described in the final part of this section.</p></div>
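The functional-link computation Fl(rw, sw) = EU(sw) / NV(rw) can be sketched as follows. The verb counts below are hypothetical placeholders, chosen only so that the ratios reproduce the Fl values reported in the text; they are not the actual counts used in the study.

```python
def functional_link(nv_root: int, eu_pair: int) -> float:
    """Fl(rw, sw) = EU(sw) / NV(rw): the fraction of the root word's
    verbs that are effectively used together with the second word."""
    if nv_root <= 0:
        raise ValueError("the root word must occur with at least one verb")
    return eu_pair / nv_root

# Hypothetical (NV(rw), EU(sw)) counts chosen to reproduce the reported ratios.
pairs = {
    "salt-water":  (1000, 3),   # weak functional link
    "nail-polish": (990, 10),   # medium functional link
    "ring-finger": (800, 50),   # strong functional link
}
for name, (nv, eu) in pairs.items():
    print(f"{name}: Fl = {functional_link(nv, eu):.4f}")
```

The higher the ratio, the more exclusive the verb usage shared by the two words, matching the ordering salt-water &lt; nail-polish &lt; ring-finger in Table 1.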
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Test on Phrases</head><p>The second test of the experiment concerns the perception of the similarity between phrases, or multi-word linguistic constructions <ref type="foot" target="#foot_5">6</ref> . The goal was to analyze how the syntactic context of a target word influences the perception of similarity among entire phrases. In more detail, we wanted to discover possible differences in such perception across different syntactic roles. We considered a simple syntactic structure of the type subject-verb-object.</p><p>Given a root sentence such as "Mario sings the song", we created three variations by changing the subject, the verb, and the direct object. For example, by changing "Mario" with "The bird" we obtained "The bird sings the song". The complete set of replacements is shown in Table <ref type="table" target="#tab_2">3</ref>.</p><p>We presented to the 96 users a total of 4 sentences (see Table 2), which with their variations produce a total of 16 pairs of sentences to be analyzed by the users in terms of perceived similarity, as in the first test. For each sentence, the users had to indicate the degree of similarity of the original sentence to one of its variations using a value in the range [0,10]<ref type="foot" target="#foot_6">7</ref> , where 0 means no semantic similarity between the two phrases and 10 means total equality. The grammatical changes made to the original sentences were chosen so as to maintain semantic validity (i.e., all the sentences represent valid mental representations). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Interpretation of the results</head><p>In this section, we give a preliminary interpretation of the results on the collected answers.</p><p>In the first test on single words, we can state that, generally, conceptual and functional word pairs are perceived differently according to the importance of the functional link in the functional word pair. This shows that words and their referenced concepts are mainly compared in terms of conceptual similarity, but when there exist important functionalities between them, this shifts the users' preference towards the functional word pair.</p><p>For example, the similarity of the sugar-salt pair turns out to be stronger than that of the water-salt one, since the action of adding/putting salt in water is "a needle in a haystack" with respect to all the actions related to water and salt independently. This means that there is no exclusive action between water and salt (i.e., there are many actions that involve water). An opposite example is represented by the word pair ring-finger, since the action of putting/wearing the ring on the finger is much more exclusive than in the previous case. Such a preference could be explained by stating that all word pairs, especially those with words that underlie actions, have a strong visual representation that makes them quickly perceivable. This result is also in line with what is stated by <ref type="bibr" target="#b4">(Cohen et al.,2002)</ref>, i.e., words that have a functionality-based relationship can have a more complex visual component that makes such a correlation weaker.</p><p>In Figure <ref type="figure" target="#fig_0">1</ref>, we show the results for the second test. In the case of verb replacement (VC), we can notice a strong meaning change in terms of similarity perception (similarity values close to 0), suggesting that the verb represents the real root of the mental representations. 
The case of the subject change (SC) shows a less pronounced decrease in similarity perception, while the object change (OC) turned out to be the least relevant syntactic role influencing the meaning of the whole phrase.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>The Computational Analysis</head><p>In the previous section we studied the role of context (at different levels) within the process of word similarity perception. Since the results indicated that both functional aspects and syntactic roles have an impact on how people perceive similarity, we experimented with a computational approach for the automatic estimation of similarity based on functional and syntax-aware contextual information.</p><p>In particular, we used the large and freely-available semantic resource ConceptNet<ref type="foot" target="#foot_7">8</ref> . A partial overview of the semantic knowledge contained in ConceptNet is illustrated in Table <ref type="table" target="#tab_5">6</ref>. ConceptNet is a resource based on common sense rather than linguistic knowledge, since it contains much more function-based information (e.g., all the actions an object can or cannot do), even within complex syntactic structures. The idea is also to exploit users' perception of reality (the actual origin of ConceptNet) instead of the result of the top-down expert building of ontologies (e.g., WordNet). ConceptNet presents important semantic problems related to coverage, utility of semantic information and coherence, but we used it as a black box due to its size and common-sense nature. A deep analysis of this resource is out of the scope of this paper.</p><p>The experiment started from the transformation of a word-word-score similarity dataset into a context-based dataset in which the words are replaced by sets of semantic information taken from ConceptNet. The aim was to figure out which semantic facts make the similarity between two words perceivable.</p><p>We used the dataset SimLex-999 <ref type="bibr" target="#b7">(Hill et al.,2014)</ref>, which contains one thousand word pairs manually annotated with similarity scores. The inter-annotation agreement is 0.67 (Spearman correlation). 
We leveraged ConceptNet to retrieve the semantic information associated with the words of each pair, then keeping the intersection. For example, considering the pair rice-bean, ConceptNet returns the following set of semantic information for the term rice:</p><p>[hasproperty-edible, isa-starch, memberof-oryza, atlocation-refrigerator, usedfor-survival, atlocation-atgrocerystore, isa-food, isa-domesticateplant, relatedto-grain, madeof-sake, isa-grain, receivesaction-cook, atlocation-pantry, atlocation-ricecrisp, atlocation-supermarket, ...] Then, the semantic information for the word bean is:</p><p>[usedfor-fillbeanbagchair, atlocation-infield, atlocation-can, usedfor-nutrition, usedfor-cook, atlocation-atgrocerystore, usedfor-grow, atlocation-foodstore, isa-legume, usedfor-count, isa-domesticateplant, atlocation-cookpot, atlocation-beansoup, atlocation-soup, isa-vegetable, ...] Finally, the intersection produces the following set:</p><p>[atlocation-atgrocerystore, isa-domesticateplant, atlocation-pantry] At this point, for each non-empty intersection, we created one instance of the type: &lt;semantic information&gt;, &lt;similarity score&gt; and computed a standard term-document matrix, where the term is a semantic term within the set of semantic information retrieved from ConceptNet and the document dimension represents the word pairs of the original dataset. After this preprocessing phase, the score attribute is discretized into two bins:</p><p>• non-similar class -range in the dataset [0, 5]</p><p>• similar class -range in the dataset [5.1, 10]</p><p>The splitting of the data into two classes allowed us to experiment with a classic supervised classification system, where a Machine Learning tool (a Support Vector Machine, in our case) has been used to learn a binary model for automatically classifying similar and non-similar word pairs. The result of the experiment is shown in Table <ref type="table" target="#tab_6">7</ref>. 
Noticeably, the classifier was able to reach fairly good accuracy (65.38% of correctly classified word pairs), considering that the inter-annotation agreement of the original data is only 0.67 (Spearman correlation). Notice that similar word pairs are generally easier to identify than non-similar ones.</p></div>
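The pipeline described above (fact intersection, term-document matrix, binary discretization, SVM) can be sketched roughly as follows. This is a simplified sketch, not the actual experimental code: the ConceptNet-style facts and the similarity scores are invented for illustration, and scikit-learn is assumed to be available.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Toy ConceptNet-style facts (illustrative only, not real ConceptNet output).
facts = {
    "rice": {"isa-grain", "isa-food", "atlocation-pantry", "receivesaction-cook"},
    "bean": {"isa-legume", "isa-food", "atlocation-pantry", "usedfor-cook"},
    "car":  {"isa-vehicle", "usedfor-drive", "atlocation-garage"},
    "book": {"isa-object", "usedfor-read", "atlocation-library"},
}

def pair_features(w1: str, w2: str) -> str:
    """Keep only the semantic facts shared by the two words."""
    return " ".join(sorted(facts[w1] & facts[w2]))

# (word pair, human similarity score in [0, 10]); scores invented for the sketch.
data = [(("rice", "bean"), 7.0), (("car", "book"), 1.0),
        (("rice", "car"), 0.5), (("bean", "book"), 1.5)]

docs = [pair_features(a, b) for (a, b), _ in data]
labels = [1 if score > 5 else 0 for _, score in data]  # similar vs non-similar

# Term-document matrix over the shared semantic facts, then a linear SVM.
X = CountVectorizer(token_pattern=r"\S+").fit_transform(docs)
clf = SVC(kernel="linear").fit(X, labels)
print(clf.predict(X))
```

In this toy setting only rice-bean retains shared facts after the intersection, so the classifier learns that a non-empty, food-related intersection signals similarity, mirroring the intuition of the experiment at a much smaller scale.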
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Related Work</head><p>This paper presents an idea which combines linguistic, cognitive and computational perspectives. In this section, we mention those theoretical and empirical methods that inspired our motivational basis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Linguistic Background</head><p>The difficulty of defining the meaning of meaning has to do with some tricky issues like lexical ambiguity and polysemy, vagueness, contextual variability of word meaning, etc. As a matter of fact, words are organized in the lexicon as a complex network of semantic relations which are basically subsumed within Saussure's paradigmatic (the axis of choice) and syntagmatic (the axis of combination) axes <ref type="bibr">(Saussure,1983)</ref>. Some authors <ref type="bibr">(Chaffin &amp; Herrmann,1984)</ref> have already suggested theoretical and empirical taxonomies of semantic relations consisting of some main families of relations (such as contrast, similars, class inclusion, part-whole, etc.). As Murphy points out (M. L. <ref type="bibr">Murphy,2003)</ref>, the lexicon has become more central in linguistic theories and, even if there is no widely accepted theory of its internal semantic structure or of how lexical information is represented in it, the semantic relations among words are considered in the scholarly literature as relevant to the structure of both lexical and conceptual information, and it is generally believed that relations among words determine meaning.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Cognitive Background</head><p>Although word perception may seem immediate, the input we perceive is recognized and transformed by mediating background and contextual information, within a dynamic and cooperative process. The well-known semiotic triangle <ref type="bibr">(Ogden, Richards, Malinowski, &amp; Crookshank,1946)</ref> introduced by different authors over time represents a first reference for our study. People use symbols (our words) to communicate meanings (the effective content). Meaning is something intangible, which can be thought of even without any concrete presence. The last vertex is then the physical referent, i.e., the object in reality<ref type="foot" target="#foot_8">9</ref> . Note that there is no direct connection between symbols and referents, since only imagined meanings allow the two to be linked.</p><p>Interaction is another important aspect that has been investigated in the literature. Indeed, actions change the type of perception of an object, which models itself to fit the context of use. Then, Gestalt theory <ref type="bibr">(Köhler,1929)</ref> contains different notions about the perception of meaning according to interaction and context. In particular, the core of the model is the complementarity between the figure and the ground. In our case, a word is the figure and the ground is the context that lets its specific sense emerge. Finally, James Gibson introduced the concept of affordances as the cognitive cues that an object exposes to the external world, indicating ways of use <ref type="bibr">(Gibson,1977)</ref>. In cognitive and computational linguistics, this theory can be adopted to model words as objects and contexts as their interaction with the world.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Computational Background</head><p>In this section, we review the main works that are related to our contribution from a computational perspective. Natural Language Processing represents an active research community whose focus is letting machines communicate by understanding the semantics within linguistic expressions. Ontology Learning <ref type="bibr">(Cimiano,2006)</ref> is the task of automatically extracting structured semantic knowledge from texts, and it fits the scope of this paper well. Nevertheless, Word Sense Disambiguation (WSD) <ref type="bibr">(Stevenson &amp; Wilks,2003)</ref> is perhaps the most closely related NLP task, whose aim is to capture the correct meaning of a word in a context. Generally speaking, many other tasks face the problem of comparing linguistic items in order to make choices in passing from syntax to semantics. Named Entity Recognition (NER) <ref type="bibr">(Nadeau &amp; Sekine,2007;</ref><ref type="bibr">Marrero, Urbano, Sánchez-Cuadrado, Morato, &amp; Gómez-Berbís,2013)</ref> is the task of identifying entities like people, organizations and locations in texts. This is often done by comparing words in context to some learned patterns. In general, many other NLP tasks are based on the evaluation of similarity scores <ref type="bibr">(Manning &amp; Schütze,1999)</ref>.</p><p>Nowadays, there exists a large set of available semantic resources that can be used in Natural Language Processing techniques in order to understand the hidden meaning of the perceived similarity between two words or concepts. For example, ConceptNet contains semantic information that is usually associated with common terms (even if not correctly disambiguated). 
By analyzing the relationship between annotated similarity scores and semantic information, it is possible to create predictive models which automatically deduce word similarity by dynamically weighting word features based on their mutual interaction.</p><p>If we consider the objects / agents / actions to be terms in text sentences, we can try to extract their meaning and semantic constraints by using the idea of affordances. For instance, let us think of the sentence "The squirrel climbs the tree". In this case, we need to know what kind of subject "squirrel" is to figure out (and visually imagine) how the action will be performed. Accordingly, no particular issues arise from reading this sentence. Let us now consider the sentence "The elephant climbs the tree". Even if the grammatical structure of the sentence is the same as before, the agent of the action is different, and this obviously creates some semantic problems. In fact, from this case, some constraints arise; in order to climb a tree, the subject needs to fit our mental model of something that can climb a tree. In addition, this also depends on the mental model of "tree". Moreover, different agents can both be correct subjects of an action while producing different meanings in terms of how the action is mentally performed. Consider the sentences "The cat opens the door" and "The man opens the door". In both cases, some implicit knowledge suggests the manner in which the action is done: while in the first case we may imagine the cat opening the door by leaning against it, in the case of the man we probably imagine the use of a door handle. A study of these language dynamics can be of help for many NLP tasks like Part-Of-Speech tagging as well as more complex operations like dependency parsing and semantic relation extraction. Some of these concepts are latently studied in different disciplines related to statistics. 
Distributional Semantics (DS) <ref type="bibr">(Baroni &amp; Lenci,2010)</ref> represents a class of statistical and linguistic analyses of text corpora that try to estimate the validity of connections between subjects, verbs, and objects by means of statistical sources of significance.</p></div>
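As a minimal illustration of the distributional idea (words occurring in similar contexts receive similar representations), the following sketch builds bag-of-co-occurrence vectors from an invented toy corpus and compares them with cosine similarity:

```python
from collections import Counter
from math import sqrt

# Invented toy corpus; each sentence acts as one context window.
corpus = [
    "the dog chases the ball",
    "the cat chases the mouse",
    "the dog eats the food",
    "the cat eats the food",
    "the car needs the fuel",
]

def vector(word: str) -> Counter:
    """Bag-of-co-occurring-words representation of `word`."""
    ctx = Counter()
    for sent in corpus:
        toks = sent.split()
        if word in toks:
            ctx.update(t for t in toks if t != word)
    return ctx

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u)
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

dog, cat, car = vector("dog"), vector("cat"), vector("car")
# dog and cat share the contexts of chasing and eating; car does not.
print(cosine(dog, cat), cosine(dog, car))
```

On this corpus dog ends up distributionally closer to cat than to car, the same kind of context-driven signal that large-scale DS models extract from real corpora.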
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Conclusions and Future Work</head><p>In this paper, we proposed a combined analysis of linguistic, cognitive and computational aspects to assess the nature of word similarity. First, we studied how word similarity perception is influenced in terms of conceptual, functional and syntactic roles. In future work, our aim is to extend the sample of users to more specific cases. Moreover, by varying the language of the users, we can obtain results that take the cultural background into account, understanding whether and how word similarity depends on it. On the other hand, we stressed the importance of a computational understanding of similarity to improve Computational Linguistics tasks which are based on it, usually without any analysis of contextual information. In particular, we used the large semantic knowledge contained in ConceptNet to create a Support Vector Machine classifier that predicts word similarity based on an annotated dataset. In future work, we will extend our experimental analysis to validate existing similarity datasets and to produce predictive models for the automatic identification of human-readable similarity scores.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Results of the second test, showing the change scores in terms of word similarity perception after subject, verb and object replacements. SC stands for subject change, VC for verb change, and OC for object change.</figDesc><graphic coords="4,62.15,75.64,226.48,136.37" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>The chosen word pairs in the first test.</figDesc><table><row><cell cols="3">Root word Conceptual Pair Functional Pair [F. link]</cell></row><row><cell>rw</cell><cell>rw -sw</cell><cell>rw -sw [Fl(rw,sw)]</cell></row><row><cell>salt</cell><cell>salt -sugar</cell><cell>salt -water [0.003]</cell></row><row><cell>nail</cell><cell>nail -finger</cell><cell>nail -polish [0.0101]</cell></row><row><cell>ring</cell><cell>ring -necklace</cell><cell>ring -finger [0.0625]</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>The chosen phrases in the second test.</figDesc><table><row><cell>Phrase ID</cell><cell>Phrase</cell></row><row><cell>(a)</cell><cell>Mario sings the song</cell></row><row><cell>(b)</cell><cell>Alan drives the car</cell></row><row><cell>(c)</cell><cell>Alice writes the book</cell></row><row><cell>(d)</cell><cell>Marco does the homeworks</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3 :</head><label>3</label><figDesc>The word replacements for subjects (SC), verbs (VC) and direct objects (OC).</figDesc><table><row><cell>Replacement</cell><cell>(a)</cell><cell>(b)</cell><cell>(c)</cell><cell>(d)</cell></row><row><cell>SC</cell><cell>bird</cell><cell cols="3">robot computer software</cell></row><row><cell>VC</cell><cell cols="2">writes cleans</cell><cell>cleans</cell><cell>gives</cell></row><row><cell>OC</cell><cell>verse</cell><cell>band</cell><cell>sheet</cell><cell>pasta</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 4 :</head><label>4</label><figDesc>Results showing the percentage of preferences in the choice of the most (perceived) correlated word pairs of the first test.</figDesc><table><row><cell>Case</cell><cell>Word pairs</cell><cell>N. of preferences</cell><cell>%</cell></row><row><cell>1a.</cell><cell>salt -sugar</cell><cell>75</cell><cell>78%</cell></row><row><cell>1b.</cell><cell>salt -water</cell><cell>21</cell><cell>22%</cell></row><row><cell>2a.</cell><cell>nail -finger</cell><cell>44</cell><cell>46%</cell></row><row><cell>2b.</cell><cell>nail -polish</cell><cell>52</cell><cell>54%</cell></row><row><cell>3a.</cell><cell>ring -necklace</cell><cell>16</cell><cell>17%</cell></row><row><cell>3b.</cell><cell>ring -finger</cell><cell>80</cell><cell>83%</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 5 :</head><label>5</label><figDesc>Results on the backup of the first test (with 24 different cases including 8 low-FL cases, 8 medium-FL cases and 8 high-FL cases). The results are in line with the ones of the main test shown in Table 4.</figDesc><table><row><cell>Funct. Pair</cell><cell>Pref. w.r.t conceptual pair</cell></row><row><cell>Funct. Pair (low FL)</cell><cell>1 out of 8 (12.5%)</cell></row><row><cell>Funct. Pair (medium FL)</cell><cell>5 out of 8 (62.5%)</cell></row><row><cell>Funct. Pair (high FL)</cell><cell>7 out of 8 (87.5%)</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 6 :</head><label>6</label><figDesc>Some of the existing relations in ConceptNet, with example sentences in English.</figDesc><table><row><cell>Relation</cell><cell>Example sentence</cell></row><row><cell>IsA</cell><cell>NP is a kind of NP.</cell></row><row><cell>LocatedNear</cell><cell>You are likely to find NP near NP.</cell></row><row><cell>UsedFor</cell><cell>NP is used for VP.</cell></row><row><cell>DefinedAs</cell><cell>NP is defined as NP.</cell></row><row><cell>HasA</cell><cell>NP has NP.</cell></row><row><cell>HasProperty</cell><cell>NP is AP.</cell></row><row><cell>CapableOf</cell><cell>NP can VP.</cell></row><row><cell>ReceivesAction</cell><cell>NP can be VP.</cell></row><row><cell>HasPrerequisite</cell><cell>NP-VP requires NP-VP.</cell></row><row><cell>MotivatedByGoal</cell><cell>You would VP because you want VP.</cell></row><row><cell>MadeOf</cell><cell>NP is made of NP.</cell></row><row><cell>...</cell><cell>...</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 7 :</head><label>7</label><figDesc>Classification results in terms of Precision, Recall, and F-measure. The total accuracy is 65.38%.</figDesc><table><row><cell>Precision</cell><cell>Recall</cell><cell>F-Measure</cell><cell>Class</cell></row><row><cell>0.697</cell><cell>0.475</cell><cell>0.565</cell><cell>non-similar</cell></row><row><cell>0.633</cell><cell>0.815</cell><cell>0.713</cell><cell>similar</cell></row><row><cell>0.664</cell><cell>0.654</cell><cell>0.643</cell><cell>weighted total</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">Notice that "similarity between words" is intended as the similarity between the concepts they bring to mind.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">http://promo.net/pg/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">For the verbs "to put", "to add" and "to get".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">For the verbs "to apply" and "to use".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">For the verbs "to put" and "to wear".</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">Even in this case, "similarity" is intended as the similarity between the concepts related to the phrases.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="7" xml:id="foot_6">We used a [0,10] range instead of a [0,1] range as in the previous test because it represents a more human-understandable and intuitive rating scale.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="8" xml:id="foot_7">http://conceptnet5.media.mit.edu/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="9" xml:id="foot_8">The existing terminology varies considerably: symbol-thought/reference-referent (Aristotle); object-representation-interpretant (Peirce); signified-sign-referent (De Saussure)</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Distributional memory: A general framework for corpus-based semantics</title>
		<author>
			<persName><forename type="first">M</forename><surname>Baroni</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lenci</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computational Linguistics</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="issue">4</biblScope>
			<biblScope unit="page" from="673" to="721" />
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Word sense disambiguation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Bhattacharyya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Khapra</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Emerging Applications of Natural Language Processing: Concepts and New Research</title>
				<imprint>
			<date type="published" when="2012">2012</date>
			<biblScope unit="volume">22</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">The similarity and diversity of semantic relations</title>
		<author>
			<persName><forename type="first">R</forename><surname>Chaffin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">J</forename><surname>Herrmann</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Memory &amp; Cognition</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="134" to="141" />
			<date type="published" when="1984">1984</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Ontology learning from text</title>
		<author>
			<persName><forename type="first">P</forename><surname>Cimiano</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2006">2006</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Language-specific tuning of visual cortex? functional properties of the visual word form area</title>
		<author>
			<persName><forename type="first">L</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lehéricy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Chochon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lemer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rivaud</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Dehaene</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Brain</title>
		<imprint>
			<biblScope unit="volume">125</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="1054" to="1069" />
			<date type="published" when="2002">2002</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Placing search in context: The concept revisited</title>
		<author>
			<persName><forename type="first">L</forename><surname>Finkelstein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Gabrilovich</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Rivlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Solan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Wolfman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th international conference on world wide web</title>
				<meeting>the 10th international conference on world wide web</meeting>
		<imprint>
			<date type="published" when="2001">2001</date>
			<biblScope unit="page" from="406" to="414" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">The Theory of Affordances</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Gibson</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Perceiving, acting, and knowing: Toward an ecological psychology</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Shaw</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">J</forename><surname>Bransford</surname></persName>
		</editor>
		<imprint>
			<publisher>Lawrence Erlbaum</publisher>
			<date type="published" when="1977">1977</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<title level="m" type="main">Simlex-999: Evaluating semantic models with (genuine) similarity estimation</title>
		<author>
			<persName><forename type="first">F</forename><surname>Hill</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Reichart</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Korhonen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1408.3456</idno>
		<imprint>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<monogr>
		<title level="m" type="main">Gestalt psychology</title>
		<author>
			<persName><forename type="first">W</forename><surname>Köhler</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1929">1929</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<title level="m" type="main">Foundations of statistical natural language processing</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">D</forename><surname>Manning</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Schütze</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1999">1999</date>
			<publisher>MIT press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Named entity recognition: Fallacies, challenges and opportunities</title>
		<author>
			<persName><forename type="first">M</forename><surname>Marrero</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Urbano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sánchez-Cuadrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Morato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Gómez-Berbís</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Computer Standards &amp; Interfaces</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">5</biblScope>
			<biblScope unit="page" from="482" to="489" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">The role of theories in conceptual coherence</title>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">L</forename><surname>Murphy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">L</forename><surname>Medin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Semantic relations and the lexicon</title>
				<editor>
			<persName><forename type="first">M</forename><forename type="middle">L</forename><surname>Murphy</surname></persName>
		</editor>
		<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="1985">1985. 2003</date>
			<biblScope unit="volume">92</biblScope>
			<biblScope unit="page">289</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A survey of named entity recognition and classification</title>
		<author>
			<persName><forename type="first">D</forename><surname>Nadeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sekine</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Lingvisticae Investigationes</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="3" to="26" />
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">The meaning of meaning</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Ogden</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">A</forename><surname>Richards</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Malinowski</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">G</forename><surname>Crookshank</surname></persName>
		</author>
		<imprint>
			<publisher>Harcourt, Brace &amp; World</publisher>
			<pubPlace>New York</pubPlace>
			<date type="published" when="1946">1946</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Embodied perception and the economy of action</title>
		<author>
			<persName><forename type="first">D</forename><surname>Proffitt</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Perspectives on psychological science</title>
		<imprint>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="110" to="122" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">D</forename><surname>Saussure</surname></persName>
		</author>
		<title level="m">Course in general linguistics, trans. R. Harris</title>
				<meeting><address><addrLine>London</addrLine></address></meeting>
		<imprint>
			<publisher>Duckworth</publisher>
			<date type="published" when="1983">1983</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">Word-sense disambiguation</title>
		<author>
			<persName><forename type="first">M</forename><surname>Stevenson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wilks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">The Oxford Handbook of Comp. Linguistics</title>
				<imprint>
			<date type="published" when="2003">2003</date>
			<biblScope unit="page" from="249" to="265" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
