=Paper=
{{Paper
|id=Vol-1347/paper34
|storemode=property
|title=On the use of antonyms and synonyms from a domain perspective
|pdfUrl=https://ceur-ws.org/Vol-1347/paper34.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/TesfayeP15
}}
==On the use of antonyms and synonyms from a domain perspective==
Debela Tesfaye, IT PhD Program, Addis Ababa University, Addis Ababa, Ethiopia, dabookoo@gmail.com

Carita Paradis, Centre for Languages and Literature, Lund University, Lund, Sweden, carita.paradis@englund.lu.se

Abstract

This corpus study addresses the question of the nature and the structure of antonymy and synonymy in language use, using automatic methods to identify their behavioral patterns in texts. We examine the conceptual closeness/distance of synonyms and antonyms through the lens of their DOMAIN instantiations.

1 Introduction

Using data from Wikipedia, this corpus study addresses the question of the nature and the structure of antonymy and synonymy in language use. While quite a lot of empirical research using different observational techniques has been carried out on antonymy (e.g. Roehm et al. 2007, Lobanova 2012, Paradis et al. 2009, Jones et al. 2012), not as much has been devoted to synonymy (e.g. Divjak 2010), and very little has been carried out on both of them using the same methodologies (Gries & Otani 2010). The goal of this study is to bring antonyms and synonyms together, using the same automatic methods to identify their behavioral patterns in texts. We examine the conceptual closeness/distance of synonyms and antonyms through the lens of their domain instantiations: for instance, strong used in the context of wind or taste (of tea) as compared to light and weak respectively, and light as compared to heavy when talking about rain or weight.

The basic assumption underlying this study is that the strength of co-occurrence of antonyms and synonyms is dependent on the domain in which they are instantiated and co-occur. In order to test this hypothesis we mine the co-occurrence information of the antonyms and the synonyms relative to the domains using a dependency grammar method (Stanford parser, http://nlp.stanford.edu/software/lexparser.shtml). The rationale is that the dependency parsing produces the relational information among the constituent words of a given sentence, which allows us to (i) extract co-occurrences specific to a given domain/context, and (ii) capture long-distance co-occurrences between the word pairs. Consider (1).

(1) Winters are cold and dry, summers are cool in the hills and quite hot in the plains.

In (1), the antonyms cold: hot modify winters and summers respectively. Those forms express the lexical concepts winter and summer in the domain temperature. The antonyms cold: hot co-occur, but at a distance in the sentence. Thanks to the dependency information, it is possible to extract such long-distance co-occurrences together with the concepts modified.
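For illustration, the following minimal sketch (not part of the original study; it uses spaCy and its small English model rather than the Stanford parser named above) shows the kind of dependency information that makes the long-distance extraction in (1) possible: cold and hot are predicated of winters and summers through the copulas, so each member of the pair is linked to its concept even though the pair itself co-occurs at a distance.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Winters are cold and dry, summers are cool in the hills "
          "and quite hot in the plains.")

# Print head <-relation- dependent triples for the words of interest.
for tok in doc:
    if tok.text.lower() in {"cold", "hot", "winters", "summers"}:
        print(f"{tok.head.text} <-{tok.dep_}- {tok.text}")

# Typically (model-dependent):
#   are <-nsubj- Winters
#   are <-acomp- cold
#   are <-nsubj- summers
#   cool <-conj- hot
```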
The article is organized as follows. In section 2, we describe the procedure and the two methods used: co-occurrence extraction of lexical items in the same sentence, and a variant domain-dependent co-occurrence extraction method. The latter method extracts patterns of co-occurrence information of the synonyms and antonyms in different sentences. In section 3 we present the results and discussion, followed by a comparison of our results with related previous work in section 4. The conclusions are presented in section 5.

2 Procedure

Using an algorithm similar to the one proposed by Tesfaye & Zock (2012) and Zock & Tesfaye (2012), we extracted the co-occurrence information of the pairs in different domains separately, measuring the strength of their relation in the different domains with the aim of (i) making principled comparisons between antonyms and synonyms from a domain perspective, and (ii) determining the structure of antonymy and synonymy as categories in language and cognition. Our algorithm is similar to standard n-gram co-occurrence extraction algorithms, but instead of using the linear ordering of the words in the text, it generates co-occurrence frequencies along paths in the dependency tree of the sentence, as presented in sections 2.2–2.5.

2.1 Training and testing data

The antonyms and synonyms employed for training and testing were extracted from the data used by Paradis et al. (2009), where the antonyms are presented according to their underlying dimensions and synonyms were provided for all the individual antonyms (for a description of the principles see Paradis et al. 2009). That set of antonyms and synonyms was used to extract their co-occurrence patterns from the Wikipedia texts in this study.

Dimensions | Antonyms | Associated synonyms of the antonyms
Size       | Large    | huge, vast, massive, big, bulky, giant, gross, heavy, significant, wide
           | Small    | little, low, minor, minute, petite, slim, tiny
Speed      | Fast     | quick, hurried, prompt, accelerating, rapid
           | Slow     | sudden, dull, gradual, lazy
Strength   | Strong   | forceful, hard, heavy, muscular, powerful, substantial, tough
           | Weak     | light, soft, thin, wimpy
Merit      | Bad      | crappy, defective, evil, harmful, poor, shitty, spoiled, unhappy
           | Good     | awful, genuine, great, honorable, hot, neat, nice, reputable, right, safe, well

Table 1. The antonym pairs in their meaning dimensions and the associated synonyms.

2.2 Extracting the co-occurrences of the antonyms and synonyms in the respective domains

In order to extract the co-occurrences of the antonyms/synonyms in the respective domains, we produced the relational information among the constituent words of a given sentence. To this end, we extracted the patterns linking the synonyms/antonyms and the concepts they modify, and used these same patterns to extract more lexical concepts. The procedure was as follows (a sketch of this loop is given after the list):

- Start with the selected set of synonym/antonym pairs.
- Extract sentences containing the pairs.
- Identify the dependency information of the sentences.
- Mine the dependency patterns linking the pairs with the concepts they modify.
- Use these learned patterns to extract further relations (synonym/antonym pairs and the associated concepts).
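A hedged sketch of this mining loop follows. It is not the authors' implementation: spaCy stands in for the Stanford parser, the dependency-path encoding and all function names are assumptions, and the training and test sentences are toy input; the point is only to show how paths linking a seed pair to nouns can be learned and then reused on new text.

```python
from collections import Counter
import spacy

nlp = spacy.load("en_core_web_sm")

def dep_path(a, b):
    """Dependency-relation path between two tokens of one sentence,
    e.g. 'acomp|nsubj' linking cold to Winters in 'Winters are cold'."""
    def chain(tok):
        out = [tok]                          # token plus its ancestors up to the root
        while out[-1].head.i != out[-1].i:
            out.append(out[-1].head)
        return out
    up, down = chain(a), chain(b)
    down_pos = {t.i: k for k, t in enumerate(down)}
    top = next(t for t in up if t.i in down_pos)   # lowest shared ancestor
    ups = [t.dep_ for t in up[:up.index(top)]]
    downs = [t.dep_ for t in down[:down_pos[top.i]]]
    return "^".join(ups) + "|" + "^".join(reversed(downs))

def learn_patterns(sentences, seed_words):
    """Count dependency paths linking the seed adjectives to nouns."""
    counts = Counter()
    for doc in nlp.pipe(sentences):
        for adj in (t for t in doc if t.lemma_.lower() in seed_words):
            for noun in (t for t in doc if t.pos_ == "NOUN"):
                counts[dep_path(adj, noun)] += 1
    return counts

def harvest(sentences, target_words, patterns):
    """Reuse the learned paths to extract further (word, concept) relations."""
    hits = []
    for doc in nlp.pipe(sentences):
        for adj in (t for t in doc if t.lemma_.lower() in target_words):
            for noun in (t for t in doc if t.pos_ == "NOUN"):
                if dep_path(adj, noun) in patterns:
                    hits.append((adj.text, noun.text))
    return hits

# A shortened version of (1) as toy training data.
train = ["Winters are cold and dry, summers are quite hot in the plains."]
patterns = {p for p, _ in learn_patterns(train, {"cold", "hot"}).most_common(1)}

new = ["The winds were strong and the rainfall was heavy."]
print(harvest(new, {"strong", "heavy"}, patterns))
# With this model, something like [('strong', 'winds'), ('heavy', 'rainfall')]
```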
2.3 Extracting the domains

We created a matrix of antonym and synonym pairs, matching every antonym and synonym from the list in Table 1. Using the patterns learned in section 2.2, we identified as many domains as possible for the pairs of synonyms and antonyms and calculated their frequency of co-occurrence in the respective domains.

When the lexical concepts were considered too specific, we referred them to more inclusive, superordinate domains. Frequency of occurrence was used as a criterion for conflation of concepts into superordinate ones, as follows:

- Extract term co-occurrence frequencies within a window of sentences containing both the antonyms/synonyms and the potential domain concepts. For instance:
  - Antonyms: cold: hot, domain concepts: winter, summer
  - Synonyms: strong: heavy, domain concepts: wind, rain
- Create a matrix of the potential domain concepts and the co-occurring terms with their frequencies.
- Cluster them using the k-means algorithm.
- Take the term with the maximal frequency (the centroid) in each cluster and consider it the domain term.
- Test the result using expert judgment, running the algorithm on the test set.

Antonym/Synonym | Words co-occurring with possible domain concepts | Potential domain concept | Frequency
hot: cold       | summer, winter                                   | temperature              | 50
                |                                                  | climate                  | 43
                |                                                  | wind                     | 30
strong: heavy   | wind, rain                                       | wind rain                | 86
                | winds, snowfall                                  | winds snowfall           | 3
                | winds, rainfall                                  | winds rainfall           | 34
                | waves, rainfall                                  | waves rainfall           | 4

Table 2. The matrix of the frequencies of terms co-occurring with sample antonyms and the associated potential domain concepts.

2.4 Extracting co-occurrence frequencies specific to a given domain/context

The algorithm calculated the co-occurrence frequency of the antonyms/synonyms with the different concepts they refer to (or modify), as presented in Table 3, by combining the information obtained in section 2.3 and section 2.4.

Antonyms      | Concept 1 | Concept 2 | Frequency | Domain
hot: cold     | summer    | winter    | 10, 5     | temperature
strong: heavy | wind      | rain      | 11        | winds rain
              | winds     | snowfall  | 2         |
              | winds     | rainfall  |           |
              | waves     | rainfall  |           |

Table 3. The frequencies of sample antonyms specific to the underlying domains.

2.5 Variant Domain-Dependent Co-occurrence Extraction

In the previous algorithm, the co-occurrence information was extracted from the same sentence. However, unlike the antonyms, synonyms rarely occurred together in the same context (the same sentence and domain). It is natural to assume that in most cases synonyms are used in different contexts, since they evoke similar but not identical meanings. This is, however, not the case for antonyms, which were always used to evoke properties of the same meanings when these antonymic words were used to express opposition (Paradis & Willners 2011), and in fact also when they are not used to express opposition (Paradis et al. 2015). Because of this we devised a variant domain-dependent co-occurrence algorithm for the synonyms and antonyms, which instead extracts patterns of co-occurrence information of the synonyms and antonyms in different sentences, because we expected synonyms to be applicable to different, rather than the same, contexts, since complete overlap of the meanings of words is rare or even non-existent. In this way we were able to gain information indirectly about their use, by extracting their co-occurrence when they appear separately in different sentences while still being instantiated in the same domain. We mined the co-occurrence information of the synonym/antonym pairs separately in all possible domains and checked whether they co-occurred in the same sorts of domains:

X(y, f)
Z(y, f)

where X and Z are the two members of a given antonym/synonym pair, y is the domain within which the pair co-occurs, and f is the frequency of the x–y or z–y co-occurrence. The frequency of one member of the pair in the domain y was counted, and the same applies to the other member. This made it possible to measure the degree of co-occurrence of the antonym/synonym pairs from the domain perspective indirectly.
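As an illustration of this indirect measure, here is a small sketch (not from the paper; the function names and the toy counts are assumptions) that builds the domain–frequency profile X(y, f) for each member of a pair and reads off the domains the two profiles share.

```python
from collections import Counter

def domain_profile(observations):
    """Build X(y, f): how often a word is instantiated in each domain,
    from (word, domain) observations extracted as in sections 2.2-2.4."""
    return Counter(domain for _, domain in observations)

def shared_domains(x_profile, z_profile):
    """Domains in which both members of the pair occur, with each
    member's frequency in that domain."""
    return {y: (x_profile[y], z_profile[y])
            for y in x_profile.keys() & z_profile.keys()}

# Toy profiles for the synonym pair gradual: slow (illustrative counts only).
gradual = domain_profile([("gradual", "process"), ("gradual", "process"),
                          ("gradual", "change"), ("gradual", "transition")])
slow = domain_profile([("slow", "process"), ("slow", "change"),
                       ("slow", "speed"), ("slow", "motion")])

print(shared_domains(gradual, slow))
# e.g. {'process': (2, 1), 'change': (1, 1)}
```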
3 Results and discussion

3.1 Co-occurrences in the same sentence

Based on the results of the experiment, the strength of the antonyms/synonyms varies in relation to the domains of instantiation. Hence, the strength of the co-occurrence of antonyms and synonyms is a function of the domains. For instance, the antonyms slow: fast, slow: quick and slow: rapid were used in completely different domains with little or no overlap. Slow: fast is used in the domains of motion, movement, speed; slow: quick is used for the time, march, steps domains. The synonyms powerful: strong are used in the domains of voices, links, meaning; strong: muscular in the domains of legs, neck; strong: heavy in the domains of wind rain, waves rainfall, winds snow respectively; intense: strong in the domains of battle resistance, radiation gravity, updrafts clouds respectively.

We observed some distinct patterns among the antonyms and synonyms, as described below.

The antonyms:

- Co-occurred frequently in the same domain in the same sentence.
- The strength of the co-occurrence depends on the domain: slow: fast in the domains of growth, lines, motion, movement, speed, trains, music, pitch; slow: quick in the domains of time, march, steps; slow: gradual in the domains of process, change, transition; small: big in the domains of screen, band; small: large in the domains of intestine, companies, businesses; weak: strong in the domains of force, interaction, team, ties, points, sides, wind.

The synonyms:

- Co-occurred in the same sentence but mainly in different domains, for instance fast: quick, strong: heavy. There were few co-occurrences in the same sentences in the same domains, as exhibited by the pair gradual: slow in the domains of process, change, development.
- The strength of the synonym co-occurrence depends on the domains. For instance, the synonyms strong: heavy in the wind and rain domains respectively to express intensity; the synonyms large: wide in the population and distribution domains respectively; gradual: slow in the domains of process, change, development; small: low in the domains of size cost, range, size weight, area, size price, amount density; micro: small in the domains of enterprises, businesses, entrepreneurs.

3.2 The variant domain-dependent co-occurrence method

As mentioned before, the variant domain-dependent co-occurrence extraction algorithm mines the patterns of co-occurrence information of the synonyms and antonyms in different sentences. The results from the variant co-occurrence experiment showed hardly any differences in the domains with which the synonyms and antonyms are associated: strong in the domains of influence, force, wind, interactions, evidence, ties; heavy in the domains of loss, rain, industry, traffic; gradual: slow in the domains of process, change, transition. However, we observed that the frequency of co-occurrence differed significantly. For instance, the frequency of the pair gradual: slow was 76 in the same-sentence experiment but 1436 in the variant co-occurrence experiment.

4 Comparison with related works

Previous research has shown that there are antonyms that are strongly opposing (canonical antonyms) (Paradis et al. 2009, Jones et al. 2012). Such antonyms are very frequent in terms of co-occurrence as compared to other antonyms: small: large as compared with small: big. In this experiment we found that the canonical antonyms are those antonyms for which the domains in which they function are numerous and productive. For instance, the number of domains for small: large (11704) is by far greater than for small: big (120). However, this does not make the antonym pair small: large more felicitous in all the domains. Small: big is the most felicitous antonym pair for domains such as screen and band, as compared to small: large.

Measuring the strength of antonyms without taking domains into account provided higher values for the canonical antonyms, as they tended to be used in several domains. If domains were taken into account, as we did in this experiment, all the antonyms were strong in their specific domains. The antonym pair small: large had a higher value without taking domains into account, yet had a value of 0.29 in the domain of screen, where small: big has a much higher value (0.71). The values were calculated by taking the frequency of co-occurrence of the domain term (screen in this case) with each antonym pair and dividing it by the sum of the frequencies of co-occurrence of the domain term (again screen in this case) with both antonym pairs (small: big and small: large).
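As a concrete illustration of this calculation, the sketch below reproduces the reported 0.71 vs 0.29 split for the domain term screen. The raw counts are invented for the example (the paper reports only the normalized values), and the helper function is an assumption, not the authors' code.

```python
def domain_value(freq_pair_with_domain, freq_all_pairs_with_domain):
    """Share of a domain term's co-occurrences captured by one antonym pair."""
    return freq_pair_with_domain / freq_all_pairs_with_domain

# Invented counts chosen so that the shares match the reported values.
screen_with_small_big = 71    # co-occurrences of 'screen' with small: big
screen_with_small_large = 29  # co-occurrences of 'screen' with small: large
total = screen_with_small_big + screen_with_small_large

print(domain_value(screen_with_small_big, total))    # 0.71 for small: big
print(domain_value(screen_with_small_large, total))  # 0.29 for small: large
```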
5 Conclusion

The strength of the antonyms/synonyms varied in relation to the domains of instantiation. The use of antonyms and synonyms was very consistent, with few overlaps across the domains. Similar results were observed in both experiments from the domain perspective, although with significant differences in frequency. Antonyms frequently co-occurred in the same domains in the same sentences, and synonyms co-occurred in different domains in the same sentences (with lower frequency) and more frequently in different sentences in the same domains.

Acknowledgments

We acknowledge the European Science Foundation (ESF) for providing us with the funding to undertake this work.

References

Dagmar Divjak. 2010. Structuring the lexicon: a clustered model for near-synonymy. Berlin: de Gruyter.

Stefan Th. Gries & N. Otani. 2010. Behavioral profiles: a corpus-based perspective on synonymy and antonymy. ICAME Journal, 34:121–150.

Steven Jones, M.L. Murphy, Carita Paradis & Caroline Willners. 2012. Antonyms in English: Construals, constructions and canonicity. Cambridge University Press, Cambridge, UK.

Anna Lobanova. 2012. The Anatomy of Antonymy: A Corpus-Driven Approach. Dissertation, University of Groningen.

Carita Paradis. 2005. Ontologies and construals in lexical semantics. Axiomathes, 15:541–573.

Carita Paradis, Caroline Willners & Steven Jones. 2009. Good and bad opposites: using textual and psycholinguistic techniques to measure antonym canonicity. The Mental Lexicon, 4(3):380–429.

Carita Paradis, Simon Löhndorf, Joost van de Weijer & Caroline Willners. 2015. Semantic profiles of antonymic adjectives in discourse. Linguistics, 53(1):153–191.

D. Roehm, I. Bornkessel-Schlesewsky, F. Rösler & M. Schlesewsky. 2007. To predict or not to predict: Influences of task and strategy on the processing of semantic relations. Journal of Cognitive Neuroscience, 19(8):1259–1274.

Debela Tesfaye & Michael Zock. 2012. Automatic Extraction of Part-whole Relations. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science.
Michael Zock & Debela Tesfaye. 2012. Automatic index creation to support navigation in lexical graphs encoding part of relations. In Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), COLING 2012.