=Paper= {{Paper |id=Vol-1347/paper34 |storemode=property |title=On the use of antonyms and synonyms from a domain perspective |pdfUrl=https://ceur-ws.org/Vol-1347/paper34.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/TesfayeP15 }} ==On the use of antonyms and synonyms from a domain perspective== https://ceur-ws.org/Vol-1347/paper34.pdf
On the use of antonyms and synonyms from a domain perspective

Debela Tesfaye                          Carita Paradis
IT PhD Program                          Centre for Languages and Literature
Addis Ababa University                  Lund University
Addis Ababa, Ethiopia                   Lund, Sweden
dabookoo@gmail.com                      carita.paradis@englund.lu.se

Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

Abstract

This corpus study addresses the question of the nature and the structure of antonymy and synonymy in language use, using automatic methods to identify their behavioral patterns in texts. We examine the conceptual closeness/distance of synonyms and antonyms through the lens of their DOMAIN instantiations.

1 Introduction

Using data from Wikipedia, this corpus study addresses the question of the nature and the structure of antonymy and synonymy in language use. While quite a lot of empirical research using different observational techniques has been carried out on antonymy (e.g. Roehm et al. 2007, Lobanova 2013, Paradis et al. 2009, Jones et al. 2012), not as much has been devoted to synonymy (e.g. Divjak 2010) and very little has been carried out on both of them using the same methodologies (Gries & Otani 2010). The goal of this study is to bring antonyms and synonyms together, using the same automatic methods to identify their behavioral patterns in texts. We examine the conceptual closeness/distance of synonyms and antonyms through the lens of their domain instantiations. For instance, strong is used in the context of wind or taste (of tea), where it contrasts with light and weak respectively, and light contrasts with heavy when talking about rain or weight.

The basic assumption underlying this study is that the strength of co-occurrence of antonyms and synonyms is dependent on the domain in which they are instantiated and co-occur. In order to test the hypothesis we mine the co-occurrence information of the antonyms and the synonyms relative to the domains using a dependency grammar method.[1] The rationale is that dependency parsing produces the relational information among the constituent words of a given sentence, which allows us to (i) extract co-occurrences specific to a given domain/context, and (ii) capture long distance co-occurrences between the word pairs. Consider (1).

   (1) Winters are cold and dry, summers are cool in the hills and quite hot in the plains.

In (1), the antonyms cold: hot modify winters and summers respectively. Those forms express the lexical concepts winter and summer in the domain temperature. The antonyms cold: hot co-occur, but at a distance in the sentence. Thanks to the dependency information, it is possible to extract such long distance co-occurrences together with the concepts modified.

The article is organized as follows. In section 2, we describe the procedure and the two methods used: co-occurrence extraction of lexical items in the same sentence and a variant domain dependent co-occurrence extraction method. The latter method extracts patterns of co-occurrence information of the synonyms and antonyms in different sentences. In section 3 we present and discuss the results, followed in section 4 by a comparison of our results with related previous work. The conclusions are presented in section 5.

[1] http://nlp.stanford.edu/software/lexparser.shtml

2 Procedure

Using an algorithm similar to the one proposed by Tesfaye & Zock (2012) and Zock & Tesfaye (2012), we extracted the co-occurrence information of the pairs in different domains separately, measuring the strength of their relation in the different domains with the aim of (i) making principled comparisons between antonyms and synonyms from a domain perspective, and (ii) determining the structure of antonymy and synonymy as categories in language and cognition.

Our algorithm is similar to the standard n-gram co-occurrence extraction algorithms, but
instead of using the linear ordering of the words in the text, it generates co-occurrence frequencies along paths in the dependency tree of the sentence, as presented in sections 2.2–2.5.

2.1 Training and testing data

The antonyms and synonyms employed for training and testing were extracted from the data used by Paradis et al. (2009), where the antonyms are presented according to their underlying dimensions and synonyms are provided for all the individual antonyms (for a description of the principles see Paradis et al. 2009). That set of antonyms and synonyms was used to extract their co-occurrence patterns from the Wikipedia texts in this study.

Dimensions  Antonyms  The associated synonyms of the antonyms
Size        Large     huge, vast, massive, big, bulky, giant, gross, heavy, significant, wide
            Small     little, low, minor, minute, petite, slim, tiny
Speed       Fast      quick, hurried, prompt, accelerating, rapid
            Slow      sudden, dull, gradual, lazy
Strength    Strong    forceful, hard, heavy, muscular, powerful, substantial, tough
            Weak      light, soft, thin, wimpy
Merit       Bad       crappy, defective, evil, harmful, poor, shitty, spoiled, unhappy
            Good      awful, genuine, great, honorable, hot, neat, nice, reputable, right, safe, well

Table 1. The antonym pairs in their meaning dimensions and the associated synonyms.

2.2 Extracting the co-occurrences of the antonyms and synonyms in the respective domains

In order to extract the co-occurrences of the antonyms/synonyms in the respective domains we produced the relational information among the constituent words of a given sentence. To this end, we extracted the patterns linking the synonyms/antonyms and the concepts they modify and used these same patterns to extract more lexical concepts. The procedure was as follows.

   • Start with the selected set of synonym/antonym pairs
   • Extract sentences containing the pairs
   • Identify the dependency information of the sentences
   • Mine the dependency patterns linking the pairs with the concepts they modify
   • Use these learned patterns to extract further relations (synonym/antonym pairs and the associated concepts)

2.3 Extracting the domains

We created a matrix of antonym and synonym pairs matching every antonym and synonym from the list in Table 1. Using the patterns learned in section 2.2 we identified as many domains as possible for the pairs of synonyms and antonyms and calculated their frequency of co-occurrence in the respective domains.

When the lexical concepts were considered too specific, we referred them to more inclusive, superordinate domains. Frequency of occurrence was used as a criterion for conflation of concepts into superordinate ones as follows.

   • Extract term co-occurrence frequencies within a window of sentences containing both the antonyms/synonyms and the potential domain concepts. For instance:
        o Antonyms: cold: hot, domain concepts: winter, summer
        o Synonyms: strong: heavy, domain concepts: wind, rain
   • Create a matrix of the potential domain concepts and the co-occurring terms with their frequencies
   • Cluster them using the k-means algorithm
   • Take the term with the maximal frequency (centroid) in each cluster and consider it the domain term
   • Test the result using expert judgment, running the algorithm on the test set.

Antonym/Synonym  Potential domain concept  Words co-occurring with possible domain concepts  Frequency
hot              summer winter             temperature                                       50
cold                                       climate                                           43
                                           wind                                              30
strong           wind rain                 wind rain                                         86
heavy            winds snowfall            winds snowfall                                    3
                 winds rainfall            winds rainfall                                    34
                 waves rainfall            waves rainfall                                    4

Table 2. The matrix of the frequencies of terms co-occurring with sample antonyms and the associated potential domain concepts

2.4 Extracting co-occurrence frequencies specific to a given domain/context

The algorithm calculated the co-occurrence frequency of the antonyms/synonyms with the different concepts they refer to (or modify), as presented in Table 3, by combining the information obtained in sections 2.2 and 2.3.

Antonyms  Concept 1  Concept 2  Frequency  Domain
hot       summer     winter     10         temperature
cold                            5
strong    wind       rain       11         winds
heavy     winds      snowfall   2          rain
          winds      rainfall
          waves      rainfall

Table 3. The frequency of sample antonyms specific to the underlying domains

2.5 Variant domain dependent co-occurrence extraction

In the previous algorithm, the co-occurrence information was extracted from the same sentence. However, unlike the antonyms, synonyms rarely occurred together in the same context (the same sentence and domain). It is natural to assume that in most cases synonyms are used in different contexts since they evoke similar but not identical meanings. This is however not the case for antonyms, which were always used to evoke properties of the same meanings when these antonymic words were used to express opposition (Paradis & Willners 2011), and in fact also when they are not used to express opposition (Paradis et al. 2015). Because of this we decided to use a variant domain dependent co-occurrence algorithm for the synonyms and antonyms, which instead extracts patterns of co-occurrence information of the synonyms and antonyms in different sentences, because we expected synonyms to be applicable to different, rather than the same, contexts, since complete overlap of word meanings is rare or even non-existent. This way we were able to gain information indirectly about their use by extracting their co-occurrence when they appear separately in different sentences while still being instantiated in the same domain. We mined the co-occurrence information of the members of each synonym/antonym pair separately in all possible domains and checked whether they occurred in the same sorts of domains:

      X(Y, f)
      Z(Y, f)

where X and Z are the two members of a given antonym/synonym pair, Y is the domain within which the members of the pair occur, and f is the frequency of the X-Y or Z-Y co-occurrence.

The frequency of one member of the antonym/synonym pair in the domain Y was counted, and the same was done for the other member. This made it possible to measure the degree of co-occurrence of the antonym/synonym pairs from the domain perspective indirectly.

3 Results and discussion

3.1 Co-occurrences in the same sentence

The results of the experiment show that the strength of the antonyms/synonyms varies in relation to the domains of instantiation. Hence, the strength of the co-occurrence of antonyms and synonyms is a function of the domains. For instance, the antonyms slow: fast, slow: quick and slow: rapid were used in different domains with little or no overlap. Slow: fast is used in the domains of motion, movement, speed; slow: quick is used in the domains of time, march, steps. The synonyms powerful: strong are used in the domains of voices, links, meaning; strong: muscular in the domains of legs, neck; strong: heavy in the domains of wind and rain, waves and rainfall, winds and snow respectively; intense: strong in the domains of battle and resistance, radiation and gravity, updrafts and clouds respectively.

We observed some unique patterns among the antonyms and synonyms as described below:

The antonyms:
   • Co-occurred frequently in the same domain in the same sentence.
   • The strength of the co-occurrence depends on the domain: slow: fast in the domains of growth, lines, motion, movement, speed, trains, music, pitch; slow: quick in the domains of time, march, steps; slow: gradual in the domains of process, change, transition; small: big in the domains of screen, band; small: large in the domains of intestine, companies, businesses; weak: strong in the domains of force, interaction, team, ties, points, sides, wind.

The synonyms:
   • Co-occurred in the same sentence but mainly in different domains, for instance fast: quick and strong: heavy. There were few co-occurrences in the same sentences in the same domains, as exhibited by the pair gradual: slow in the domains of process, change, development.
   • The strength of the synonym co-occurrence depends on the domains. For instance, the synonyms strong: heavy occur in the wind and rain domains respectively to express intensity; the synonyms large: wide in the domains of population and distribution respectively; gradual: slow in the domains of process, change, development; small: low in the domains of size cost, range, size weight, area, size price, amount density; micro: small in the domains of enterprises, businesses, entrepreneurs.

3.2 The variant domain dependent co-occurrence method

As mentioned before, the variant domain dependent co-occurrence extraction algorithm mines the patterns of co-occurrence information of the synonyms and antonyms in different sentences. The result from the variant co-occurrence experiment showed hardly any differences in the domains with which the synonyms and antonyms are associated. For example, strong occurred in the domains of influence, force, wind, interactions, evidence, ties; heavy in the domains of loss, rain, industry, traffic; and gradual: slow in the domains of process, change, transition. However, we observed that the frequency of co-occurrence differed significantly. For instance, the frequency of the pair gradual: slow was 76 in the same-sentence experiment but 1436 in the variant co-occurrence experiment.

4 Comparison with related works

Previous research has shown that there are antonyms that are strongly opposing (canonical antonyms) (Paradis et al. 2009, Jones et al. 2012). Such antonyms are very frequent in terms of co-occurrence as compared to other antonyms: small: large as compared with small: big. In this experiment we found that the canonical antonyms are those that function in numerous and productive domains. For instance, the number of domains for small: large (11704) is by far greater than for small: big (120). However, this does not make the antonym pair small: large more felicitous in all the domains. Small: big is more felicitous than small: large in domains such as screen and band.

Measuring the strength of antonyms without taking domains into account provided higher values for the canonical antonyms, as they tended to be used in several domains. If domains were taken into account, as we did in this experiment, all the antonyms were strong in their specific domains. The antonym pair small: large had a higher value when domains were not taken into account, yet had a value of 0.29 in the domain of screen, where small: big has a much higher value (0.71). The values were calculated by taking the frequency of co-occurrence of the domain term (screen in this case) with each antonym pair and dividing it by the sum of the frequencies of co-occurrence of the domain term (again screen in this case) with both antonym pairs (small: big and small: large).

5 Conclusion

The strength of the antonyms/synonyms varied in relation to the domains of instantiation. The use of antonyms and synonyms was very consistent, with few overlaps across the domains. Similar results were observed in both experiments from the domain perspective, although with significant differences in frequency. Antonyms frequently co-occurred in the same domains in the same sentences, whereas synonyms co-occurred in different domains in the same sentences (with lower frequency) and more frequently in different sentences in the same domains.

Acknowledgments

We thank the European Science Foundation (ESF) for providing the funding to undertake this work.
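
As an illustration of the extraction steps in sections 2.2, 2.4 and 2.5, the following is a minimal Python sketch, not the authors' implementation. It assumes that sentences have already been parsed into (head, relation, dependent) triples. The triples are hand-written rather than parser output, and the relation labels amod, nsubj and conj, the second to fourth example sentences, and all function names are assumptions made for this example. Only the first sentence is example (1) from section 1.

```python
from collections import Counter


def direct_concepts(edges, adjective):
    """Nouns linked to `adjective` by a single amod or nsubj edge."""
    concepts = []
    for head, rel, dep in edges:
        if rel == "amod" and dep == adjective:
            concepts.append(head)   # attributive use: "heavy rain"
        elif rel == "nsubj" and head == adjective:
            concepts.append(dep)    # predicative use: "winters are cold"
    return concepts


def modified_concepts(edges, adjective):
    """Direct concepts, or those of a coordinated adjective one conj hop away."""
    found = direct_concepts(edges, adjective)
    if not found:
        for head, rel, dep in edges:
            if rel == "conj" and dep == adjective:
                found.extend(direct_concepts(edges, head))
    return found


def same_sentence_counts(parsed, x, z):
    """Count concept pairs modified by x and z within the same sentence."""
    counts = Counter()
    for edges in parsed:
        for cx in modified_concepts(edges, x):
            for cz in modified_concepts(edges, z):
                counts[(cx, cz)] += 1
    return counts


def variant_counts(parsed, word):
    """Count the concepts one word modifies across all sentences: X(Y, f)."""
    counts = Counter()
    for edges in parsed:
        counts.update(modified_concepts(edges, word))
    return counts


# Hypothetical, hand-written dependency triples (assumption, not parser output).
parsed = [
    # (1) Winters are cold and dry, summers are cool ... and quite hot ...
    [("cold", "nsubj", "winters"), ("cold", "conj", "dry"),
     ("cool", "nsubj", "summers"), ("cool", "conj", "hot")],
    # Invented: "Strong winds and heavy rain hit the coast."
    [("winds", "amod", "strong"), ("rain", "amod", "heavy")],
    # Invented: "The change was gradual."
    [("gradual", "nsubj", "change")],
    # Invented: "It was a slow change."
    [("change", "amod", "slow")],
]

cold_hot = same_sentence_counts(parsed, "cold", "hot")
strong_heavy = same_sentence_counts(parsed, "strong", "heavy")

# Variant method: each member is counted on its own, then domains are compared.
x_domains = variant_counts(parsed, "gradual")
z_domains = variant_counts(parsed, "slow")
shared = {d: (x_domains[d], z_domains[d]) for d in x_domains if d in z_domains}
```

In this toy data gradual and slow never share a sentence, yet both are counted with the concept change, which is the kind of indirect evidence the variant method in section 2.5 relies on.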




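
The normalisation described in section 4 can be sketched in Python as follows. The raw co-occurrence counts are not reported in the paper; the counts 71 and 29 below are invented solely so that the ratios reproduce the reported values of 0.71 and 0.29, and the function name is hypothetical.

```python
def domain_strength(freq_by_pair):
    """Share of a domain term's co-occurrences that falls to each antonym pair."""
    total = sum(freq_by_pair.values())
    if total == 0:
        # No co-occurrences observed for this domain term.
        return {pair: 0.0 for pair in freq_by_pair}
    return {pair: freq / total for pair, freq in freq_by_pair.items()}


# Invented counts for the domain term "screen" (assumption, not the paper's data).
screen = {("small", "big"): 71, ("small", "large"): 29}
strengths = domain_strength(screen)
```

Because the values for one domain term sum to 1, a pair that is frequent overall can still score low in a domain where a competing pair is preferred.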
References
Dagmar Divjak. 2010. Structuring the lexicon: a clustered model for near-synonymy. de Gruyter, Berlin.
Stefan Th. Gries & N. Otani. 2010. Behavioral profiles: a corpus-based perspective on synonymy and antonymy. ICAME Journal, 34:121–150.
Steven Jones, M.L. Murphy, Carita Paradis & Caroline Willners. 2012. Antonyms in English: Construals, constructions and canonicity. Cambridge University Press, Cambridge, UK.
Anna Lobanova. 2012. The Anatomy of Antonymy: A Corpus-Driven Approach. Dissertation, University of Groningen.
Carita Paradis. 2005. Ontologies and construals in lexical semantics. Axiomathes, 15:541–573.
Carita Paradis, Caroline Willners & Steven Jones. 2009. Good and bad opposites: using textual and psycholinguistic techniques to measure antonym canonicity. The Mental Lexicon, 4(3):380–429.
Carita Paradis, Simon Löhndorf, Joost van de Weijer & Caroline Willners. 2015. Semantic profiles of antonymic adjectives in discourse. Linguistics, 53(1):153–191.
D. Roehm, I. Bornkessel-Schlesewsky, F. Rösler & M. Schlesewsky. 2007. To predict or not to predict: Influences of task and strategy on the processing of semantic relations. Journal of Cognitive Neuroscience, 19(8):1259–1274.
Debela Tesfaye & Michael Zock. 2012. Automatic Extraction of Part-whole Relations. In Proceedings of the 9th International Workshop on Natural Language Processing and Cognitive Science.
Michael Zock & Debela Tesfaye. 2012. Automatic index creation to support navigation in lexical graphs encoding part of relations. In Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), COLING 2012.



