=Paper=
{{Paper
|id=Vol-1347/paper10
|storemode=property
|title=What can distributional semantic models tell us about part-of relations?
|pdfUrl=https://ceur-ws.org/Vol-1347/paper10.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/Morlane-Hondere15
}}
==What can distributional semantic models tell us about part-of relations?==
François Morlane-Hondère, LIMSI-CNRS, Orsay, France, francois.morlane-hondere@limsi.fr

===1 Introduction===
The term distributional semantic models (DSMs) refers to a family of unsupervised corpus-based approaches to semantic similarity computation. These models rely on the distributional hypothesis (Harris, 1954), which states that semantically related words tend to share many of their contexts. So, by collecting information about the contexts in which words are used in a corpus, DSMs are able to measure the distributional similarity of two words, which theoretically translates into a semantic one.

In recent years, these models have become very popular in a wide range of NLP tasks (Weeds, 2003; Baroni and Lenci, 2010), mainly because of the ever-increasing availability of textual data. Regardless of their use in NLP applications, distributional data provide precious information about words' behaviour and their tendency to appear in the same contexts. Yet, linguists have shown little interest in DSMs (Sahlgren, 2008). We believe that this kind of information can be relied on to empirically assess the validity of linguistic theories. Conversely, by shedding light on underlying linguistic factors that influence distributional behaviours, linguistic studies can contribute to improving our understanding of the results provided by DSMs.

This paper illustrates such a qualitative linguistic approach by investigating the presence of part-of relations among distributionally similar French words. We compare distributional data and a set of part-of relations provided by humans in a lexical network. In order to assess the nature of the part-of word pairs which can (or cannot) be found in DSMs, these words were sense-tagged using WordNet supersenses. Our results show considerable discrepancies between the representation of part-of sense pairs in distributional data.

===2 Part-of relation and DSMs===
As its name suggests, the part-of relation (or meronymy<sup>1</sup>) holds between a part (the meronym) and its whole (the holonym), as in bed/pillow, armor/steel or ostrich/feather. It is one of the central relations used in knowledge representation.

Automatic extraction of part-of relations has been addressed using many approaches, most of which are pattern-based (Berland and Charniak, 1999; Girju et al., 2006; Pantel and Pennacchiotti, 2006). However, the unsupervised nature of the distributional approach makes it an attractive alternative.

Studies were conducted to assess the nature of the semantic relations extracted by distributional models, using human judges (Kuroda et al., 2010), thesauri (Morlane-Hondère, 2013; Ferret, 2015) or ad hoc datasets (Baroni and Lenci, 2011). They showed that part-of relations are present in varying proportions among distributionally similar words. This very presence is interesting in that, unlike synonymy, hypernymy or co-hyponymy, meronymy is not a similarity relation (Resnik, 1993; Budanitsky and Hirst, 2006): an ostrich is not the same kind of thing as a feather, nor is armor the same kind of thing as steel. Following the distributional hypothesis, it is not expected that these kinds of meronyms share a lot of contexts.

It appears, though, that a certain proportion of them tend to do so. For example, in the DSM of Baroni and Lenci (2010), player, pianist and musician are among the ten most distributionally similar words of orchestra. In the rest of this study, we compare the semantic properties of the meronyms which can be extracted using a distributional approach and the properties of the meronyms which cannot.

<sup>1</sup> Some authors make a distinction between part-of relation and meronymy (Cruse and Croft, 2004).

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage.
Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

===3 Methodology and data===
====3.1 The part-of dataset====
The first step consists in gathering a set of meronyms. Although efforts are made to provide expert-built lexical semantic resources for French (Fišer and Sagot, 2008; Pradet et al., 2014), there is currently no freely available French equivalent, in terms of quality and coverage, to WordNet (Fellbaum, 1998) or the Moby thesaurus (Ward, 2002). So, we use the JeuxDeMots (JDM) lexical network (Lafourcade, 2007), which is a GWAP (Game With A Purpose) in which players are asked to provide words which can be in a given relation with a given word<sup>2</sup>.

Although collaboratively built lexical semantic resources have been shown to be valuable (Gurevych and Wolf, 2010) and although a relation in JDM must be provided by two different players to be added to the network, a certain proportion of part-of relations in JDM are actually hypernyms (sucette/bonbon ‘lollipop/candy’), synonyms (chef/patron ‘chief/boss’) or thematic associations (océanographie/eau ‘oceanography/water’). Two possible explanations for these confusions are the lack of linguistic expertise of the players or a misunderstanding of the instruction. Erroneous relations were manually removed from the set.

One interesting characteristic of JDM part-of relations is that a considerable number of them do not fit into traditional typologies of meronymy relations. For example, topological inclusions (cell/prisoner), attachment relations (ear/earring) or ownership (millionaire/money) are very common among JDM part-of pairs although they are considered to be non-meronymic relations (Winston et al., 1987).

After filtering the pairs whose members do not appear in our DSM and removing most of the erroneous relations, there were 24,089 part-of pairs left in our dataset.

<sup>2</sup> http://www.jeuxdemots.org/

====3.2 Sense tagging====
In a previous study (Morlane-Hondère and Fabre, 2012), we manually annotated the different meronymic sub-relations, following the typology of Winston et al. (1987), in a dataset like the one described above. The idea was to test whether there is a correlation between the nature of the relation between two words and their probability of being extracted in a DSM. However, the typology proved to be inadequate, so we chose to annotate the words instead of their relation. This is also what we do in this study. This approach is inspired by the idea that the difference between the meronymic sub-relations is due to the semantic nature of the words involved (Murphy, 2003).

The above-mentioned lack of freely available thesauri for French led us to use WordNet to perform this task. Words of our dataset were 1) translated to English, 2) mapped to WordNet synsets and 3) linked to their translation's supersense(s). Supersenses, or lexicographer classes, are a set of 44 coarse semantic categories used to classify WordNet's noun, verb and adjective entries<sup>3</sup>. Examples of the 25 noun supersenses are GROUP, LOCATION or FOOD. Supersenses were then manually disambiguated (drawer can belong to both the PERSON and ARTIFACT supersenses, but only the latter fits in the pair cabinet/drawer).

<sup>3</sup> http://wordnet.princeton.edu/man/lexnames.5WN.html

====3.3 The distributional model====
We use a DSM<sup>4</sup> generated from the frWaC corpus (Baroni et al., 2009), a 1.6-billion-word corpus of French web pages.

Words in the DSM appear at least 20 times in the corpus and in at least 5 different contexts.

Syntactic dependencies, obtained with the Talismane parser (Urieli, 2013), were used as contexts. Relations taken into account in the context vectors are the subject, object and modifier relations. Prepositions and coordinating conjunctions are also included as relations (the label of the relation being the preposition or the coordinating conjunction).

The contexts were weighted using pointwise mutual information, and the cosine measure was used to compute the similarity between the context vectors. The minimum similarity threshold was set to 0.02. The total number of word pairs in the DSM is 3,674,254.

<sup>4</sup> Provided by Franck Sajous from the CLLE-ERSS laboratory.

===4 Results and discussion===
We then measure the proportion of semantically annotated part-of pairs (sense pairs) in our set which are present in the DSM. Sense pairs which occur fewer than 100 times in the dataset are discarded. Table 1 provides the list of the 22 remaining sense pairs and, for each one, the ratio of part-of pairs present in the DSM. In this section, we describe the homogeneous sense pairs, whose semantic classes are identical, and the heterogeneous ones, then we provide a detailed analysis of some of the PERSON/BODY meronyms which have been extracted by the DSM.

{| class="wikitable"
|+ Table 1: Part-of sense pairs and their presence in the DSM.
! holonym/meronym !! % !! holonym/meronym !! %
|-
| TIME/TIME || 84 || ARTIFACT/PERSON || 32.6
|-
| LOC./LOC. || 78.3 || ARTIFACT/ARTIFACT || 31.4
|-
| SUBST./SUBST. || 62.4 || ARTIFACT/LOC. || 24.8
|-
| OBJECT/OBJECT || 61 || ARTIFACT/PLANT || 22.8
|-
| COMM./COMM. || 53.8 || ARTIFACT/SUBST. || 20.4
|-
| GROUP/PERSON || 52.8 || OBJECT/ANIMAL || 19.8
|-
| LOC./ARTIFACT || 46.8 || PLANT/PLANT || 19.7
|-
| BODY/BODY || 40.5 || GROUP/ANIMAL || 17.1
|-
| ANIMAL/ANIMAL || 41 || PERSON/ARTIFACT || 16.5
|-
| ARTIFACT/COMM. || 39.9 || ANIMAL/BODY || 9.4
|-
| ACT/ARTIFACT || 35.8 || PERSON/BODY || 5.5
|}

====4.1 Homogeneous sense pairs====
As expected, part-of relations composed of two words of the same class are the most represented in the DSM. 84% of the TIME/TIME part-of pairs were extracted by the DSM.
This can be explained by the fact that the members of pairs like mois/jour ‘month/day’ both appear in contexts involving temporal prepositions like venir IL Y A ‘to come SINCE’, se dérouler DURANT ‘to take place DURING’ or scrutin AVANT ‘election BEFORE’.

Likewise, the spatial dimension plays a crucial role in the extraction of meronyms (78.3% of LOCATION/LOCATION pairs are extracted). This is due to the fact that, as for time, spatial information can be conveyed by specific prepositions. Thus, the contexts shared by LOCATION/LOCATION meronyms massively involve the DANS ‘IN’ relation.

SUBSTANCE pairs are the third best-extracted kind of pairs. The reason why 37.6% of them have not been extracted can be illustrated by the comparison of acier ‘steel’ and two of its meronyms, namely fer ‘iron’, which was extracted in the DSM, and carbone ‘carbon’, which was not:

# acier and fer both appear in contexts like grille EN ‘grille COMP’, forgé MOD ‘forged MOD’ or lame DE ‘blade COMP’. Thus, they appear as materials and, moreover, as materials which are used to build the same kind of things;
# although it is a material as well, carbone does not appear as such in the corpus. Rather, its contexts are chemical compounds like monoxyde DE ‘monoxide COMP’. It is also modified by adjectives like inorganique MOD ‘inorganic MOD’, which describe chemical properties of carbone. These two kinds of contexts are not found among those of acier.

So, we can see that there is a discrepancy between the contexts in which acier appears in the corpus and the ones in which carbone appears: whereas acier, like fer, is used as a material, the representation of carbone that emerges from the corpus is that of a chemical element.

====4.2 Heterogeneous sense pairs====
At the other end of the scale, part-of relations composed of two words of different classes are, also logically, the least represented in the DSM.

Part-of pairs composed of words that refer to human beings or to animals and their body parts are barely present in the DSM (although they are the most frequent sense pairs in our dataset). In frWaC, PERSON words appear as subjects of action verbs (prendre ‘to take’, dire ‘to say’) or cognitive verbs (vouloir ‘to want’, savoir ‘to know’). They are frequently modified by nationality adjectives. Body parts do not appear in such contexts. The class of body parts was actually found to be quite heterogeneous, in that the distributions of body parts in the corpus differ from those of persons, but not all in the same way:

* organ nouns mostly appear in noun compounds to indicate the location of medical interventions (radiographie DE ‘x-ray MOD’) or conditions (cancer DE ‘cancer COMP’ or lésion DE ‘injury COMP’);
* limb nouns are modified by adjectives related to location and are objects of verbs like lever ‘to raise’ or étendre ‘to stretch’.

All these contexts are obviously incompatible with PERSON words.

A similar distributional discrepancy can be observed with the ANIMAL/BODY sense pair, except that animal nouns tend to appear in contexts like élevage DE ‘farming COMP’ or espèce DE ‘species COMP’. They are also modified by size adjectives. It is interesting to note that many animal body parts like tête DE ‘head COMP’, peau DE ‘skin COMP’ or queue DE ‘tail COMP’ do appear among the closest contexts of animal nouns. This means that the meronymic relation between nouns referring to animals and their body parts is a syntagmatic relation rather than a paradigmatic one: the part occurs as a context of the whole instead of sharing contexts with it. Thus, it is reasonable to say that, in order to extract this particular relation, the use of syntagmatic patterns would be a better strategy than the use of a paradigmatic DSM.

The sense pair GROUP/PERSON also presents an interesting situation. Of all the heterogeneous sense pairs, meronymic relations belonging to this one are the most likely to be extracted by the distributional method. This can be explained by a tendency to use the GROUP entities in a metonymic way: although an army is not the same kind of thing as a soldier, both words share contexts like tirer SUJ ‘to shoot SUBJ’ or tué PAR ‘killed BY’. Another reason is the transitivity of properties like nationality: armée ‘army’ and soldat ‘soldier’ are both modified by nationality adjectives because, usually, members of the armed forces of a nation have to be citizens of this nation.

In Section 2, we mentioned the fact that three meronyms of orchestra were present among its ten most distributionally similar words in the DSM of Baroni and Lenci (2010). In our data, the meronyms orchestre/musicien have also been extracted: as for army and soldier, these words share semantic features. They are related to the kind of music a musician and an orchestra can play (classique MOD ‘classical MOD’, traditionnel MOD ‘traditional MOD’ or jazz DE ‘jazz MOD’), the kind of actions they perform (interprété PAR ‘performed BY’, accompagné PAR ‘accompanied BY’) or their nationality.

====4.3 Focus on the PERSON/BODY sense pair====
In the previous subsection, we saw that meronyms belonging to the PERSON/BODY sense pair are the least likely to be extracted with the distributional approach. In this subsection, we provide further insight into this result by examining the nature of the few PERSON/BODY meronymic pairs that were successfully extracted.

The examination of the 5.5% of PERSON/BODY meronymic pairs that were successfully extracted is disappointing: the vast majority of the contexts shared by the meronym and the holonym are quite random. For example, the meronyms homme/main ‘man/hand’ share contexts like nu MOD ‘bare MOD’ or dos DE ‘back COMP’, which are not very informative about their relation. On the other hand (!), some shared contexts like doigt DE ‘finger COMP’ and saisir SUJ ‘to grab SUBJ’ are more informative. The fact that these specific features are shared by the meronyms indicates some kind of similarity between them: when a man grabs a rock, it is actually his hand that completes the action of grabbing, just as a man's fingers are also his hand's fingers.

The meronyms enfant/oeil ‘child/eye’ also share some interesting contexts: both the meronym and the holonym are subjects of verbs of visual perception like regarder ‘to look’, percevoir ‘to perceive’ or observer ‘to observe’. The metonymic interpretation is quite straightforward: although the eye is the part of the child that allows him to look, perceive or observe, this ability is extended to the whole child.

This phenomenon partially explains why such meronyms share semantic (and thus distributional) features and are more likely to be extracted with a DSM.

===5 Conclusion===
The main goal of this study is to shed light on the linguistic phenomena at work in DSMs. By comparing a set of sense-tagged part-of relations and a distributional model, we show that the semantic class of the meronyms has a dramatic influence on their probability of being extracted by a DSM. We also highlight the positive influence of metonymy on the extraction of heterogeneous meronyms.

These results show that the part-of relation is not a monolithic entity but a collection of different kinds of relations between different kinds of words which may or may not be distributionally similar.

===Acknowledgments===
This work was partially supported by the ANSM (French National Agency for Medicines and Health Products Safety) through the Vigi4MED project under grant #2013S060.

===References===
* Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009.
* Marco Baroni and Alessandro Lenci. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721, 2010.
* Marco Baroni and Alessandro Lenci. How we BLESSed distributional semantic evaluation. GEMS 2011, pages 1–10, 2011.
* Matthew Berland and Eugene Charniak. Finding parts in very large corpora. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics on Computational Linguistics, ACL '99, pages 57–64, Stroudsburg, PA, USA, 1999. Association for Computational Linguistics.
* Alexander Budanitsky and Graeme Hirst. Evaluating WordNet-based Measures of Lexical Semantic Relatedness. Computational Linguistics, 32(1):13–47, March 2006.
* D. Alan Cruse and William Croft. Cognitive linguistics. Cambridge: Cambridge University Press, 2004.
* Christiane Fellbaum, editor. WordNet An Electronic Lexical Database. The MIT Press, Cambridge, MA; London, May 1998.
* Olivier Ferret. Typing relations in distributional thesauri. In Núria Gala, Reinhard Rapp, and Gemma Bel-Enguix, editors, Language Production, Cognition, and the Lexicon, volume 48 of Text, Speech and Language Technology, pages 113–134. Springer International Publishing, 2015.
* Darja Fišer and Benoît Sagot. Combining multiple resources to build reliable wordnets. In TSD 2008 - Text Speech and Dialogue, Brno, Czech Republic, 2008.
* Roxana Girju, Adriana Badulescu, and Dan Moldovan. Automatic discovery of part-whole relations. Comput. Linguist., 32(1):83–135, March 2006.
* Iryna Gurevych and Elisabeth Wolf. Expert-Built and Collaboratively Constructed Lexical Semantic Resources. Language and Linguistics Compass, 11(4):1074–1090, 2010.
* Zellig Harris. Distributional structure. Word, 10(23):146–162, 1954.
* Kow Kuroda, Jun'ichi Kazama, and Kentaro Torisawa. A look inside the distributionally similar terms. In Proceedings of the Second Workshop on NLP Challenges in the Information Explosion Era (NLPIX 2010), pages 40–49, Beijing, China, August 2010. Coling 2010 Organizing Committee.
* Mathieu Lafourcade. Making people play for Lexical Acquisition with the JeuxDeMots prototype. In SNLP'07: 7th International Symposium on Natural Language Processing, page 7, Pattaya, Chonburi, Thailand, December 2007.
* François Morlane-Hondère. Une approche linguistique de l'évaluation des ressources extraites par analyse distributionnelle automatique. PhD thesis, Université de Toulouse II le Mirail, 2013.
* François Morlane-Hondère and Cécile Fabre. Étude des manifestations de la relation de méronymie dans une ressource distributionnelle. In Proceedings of TALN 2012, Grenoble, France, June 2012.
* Lynne Murphy. Semantic Relations and the Lexicon. Cambridge University Press, New York, 2003.
* Patrick Pantel and Marco Pennacchiotti. Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44, pages 113–120, Stroudsburg, PA, USA, 2006. Association for Computational Linguistics.
* Quentin Pradet, Gaël de Chalendar, and Jeanne Baguenier Desormeaux. WoNeF, an improved, expanded and evaluated automatic French translation of WordNet. In GWC 2014, Tartu, Estonia, 2014.
* Philip Resnik. Selection and Information: a Class-Based Approach to Lexical Relationships. PhD thesis, The Institute For Research In Cognitive Science, University of Pennsylvania, 1993.
* Magnus Sahlgren. The distributional hypothesis. Rivista di Linguistica, 20(1):33–53, 2008.
* Assaf Urieli. Robust French syntax analysis: reconciling statistical methods and linguistic knowledge in the Talismane toolkit. PhD thesis, Université de Toulouse II le Mirail, 2013.
* Grady Ward. Moby Thesaurus List (English). 2002.
* Julie Weeds. Measures and Applications of Lexical Distributional Similarity. PhD thesis, Department of Informatics, University of Sussex, 2003.
* M. E. Winston, R. Chaffin, and D. Herrmann. A taxonomy of part-whole relations. Cognitive Science, 11(4):417–444, December 1987.
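
'''Illustrative sketch (not part of the original paper).''' The weighting and similarity steps described in Section 3.3 (pointwise mutual information over syntactic contexts, cosine similarity, minimum threshold of 0.02) can be approximated as follows. This is a minimal sketch under stated assumptions, not the implementation used to build the DSM: the function names, the toy counts and the decision to keep only positive PMI values are all hypothetical, and the frequency filters (at least 20 occurrences, at least 5 contexts) are omitted.

```python
import math
from collections import Counter, defaultdict


def pmi_vectors(observations):
    """Build PMI-weighted context vectors from (word, context) observations.

    Assumption: only positive PMI values are kept; the paper only says
    that pointwise mutual information was used for weighting.
    """
    pair_counts = Counter(observations)
    word_counts = Counter()
    context_counts = Counter()
    for (word, context), n in pair_counts.items():
        word_counts[word] += n
        context_counts[context] += n
    total = sum(pair_counts.values())

    vectors = defaultdict(dict)
    for (word, context), n in pair_counts.items():
        pmi = math.log2(n * total / (word_counts[word] * context_counts[context]))
        if pmi > 0:
            vectors[word][context] = pmi
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[ctx] for ctx, weight in u.items() if ctx in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(vectors, threshold=0.02):
    """Return word pairs whose cosine similarity reaches the threshold."""
    words = sorted(vectors)
    pairs = {}
    for i, first in enumerate(words):
        for second in words[i + 1:]:
            score = cosine(vectors[first], vectors[second])
            if score >= threshold:
                pairs[(first, second)] = score
    return pairs


# Toy counts invented for illustration; the context labels mimic the
# "lemma + relation" contexts quoted in Section 4.1.
toy_observations = (
    [("acier", "grille_EN")] * 2
    + [("acier", "forgé_MOD"), ("acier", "lame_DE")]
    + [("fer", "grille_EN"), ("fer", "lame_DE")]
    + [("fer", "forgé_MOD")] * 2
    + [("carbone", "monoxyde_DE")] * 3
    + [("carbone", "inorganique_MOD")]
)

if __name__ == "__main__":
    toy_vectors = pmi_vectors(toy_observations)
    for pair, score in similar_pairs(toy_vectors).items():
        print(pair, round(score, 3))
```

On these toy counts, acier and fer share a weighted context and are retained as a pair, whereas carbone shares no context with either word and falls below the threshold, which mirrors the steel/iron/carbon contrast discussed in Section 4.1.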