=Paper=
{{Paper
|id=Vol-1347/paper39
|storemode=property
|title=Using network clustering to uncover the taxonomic and thematic structure of the mental lexicon
|pdfUrl=https://ceur-ws.org/Vol-1347/paper39.pdf
|volume=Vol-1347
|dblpUrl=https://dblp.org/rec/conf/networds/DeyneV15
}}
==Using network clustering to uncover the taxonomic and thematic structure of the mental lexicon==
Simon De Deyne, University of Adelaide, School of Psychology, 5005 Adelaide, Australia, simon.dedeyne@adelaide.edu.au

Steven Verheyen, University of Leuven, Department of Psychology, Tiensestraat 102, 3000 Leuven, Belgium, steven.verheyen@ppw.kuleuven.be

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

While still influential, the view that concepts are organized as a hierarchical taxonomy as proposed by Rosch (1973) has been challenged on several occasions. For example, some studies have attributed a larger role to thematic relations (Gentner and Kurtz, 2005; Lin and Murphy, 2001), whereas others have stressed the role of affect in structuring word meaning (Niedenthal et al., 1999). A comprehensive account of how these different principles shape and structure meaning in the lexicon is missing, and most studies continue to be biased towards concrete noun categories that fit into hierarchical taxonomies (Medin and Rips, 2005). To capture mental or psychological properties that organize the lexicon for a wide range of concepts and semantic relations, we propose a large-scale semantic network derived from word associations as the basis to uncover what the structural principles are.

===1 Network Clustering===

Since this is one of the first times the mental lexicon is mapped in its entirety using an extremely extensive word association corpus, an exploratory approach is warranted. To achieve this, network clustering was used as a way to study how the mental lexicon can be structured at different scales and what type of semantic relations dominate its structure. At the basis lies a semantic network derived from a large-scale word association corpus including over 12,000 cues and 3.77 million responses (De Deyne et al., 2013). For the purpose of this study, non-dominant word forms were removed (e.g., apples was removed if apple was also present), resulting in a network of 11,000 words. Next, the recent Order Statistics Local Optimization Method (OSLOM) was applied to identify statistically reliable clusters in a directed weighted word association network (Lancichinetti et al., 2011). This method includes words in the final cluster solution on the basis of statistical criteria and allows for overlapping clusters. Similar to taxonomic theories of knowledge representation, words are grouped in progressively larger clusters, which allows us to evaluate structural properties of the lexicon at different scales. This hierarchical structure is also derived from the data by using a statistical criterion that involves a comparison with an appropriate null-model for the weighted directed graph.

Applying OSLOM to the semantic network resulted in a solution with five hierarchical levels. An overview of this solution is shown in Table 1. The number of clusters varied widely across the five hierarchical levels, ranging between 2 and 506 clusters. On average, the p-value of the extracted clusters was low across all levels, indicating that the obtained clusters were unlikely to arise in a comparable random network.<sup>1</sup> There were few homeless nodes at any level, indicating that most words were reliably attributed to a specific cluster. There was also a considerable degree of overlap at all levels relative to the size of the clusters; clusters were more distinct at the more precise levels, where more clusters were obtained. For instance, at the lowest level 1,676 words appeared in multiple clusters, compared to 5,943 at the highest level.

<sup>1</sup> Default parameters were used in the OSLOM algorithm, except for the p cut-off value. Setting this value depends on the task as it affects the size of the clusters (Lancichinetti et al., 2011). In this application, the cutoff was set at 0.25, because the few clusters in the final solution with high p-values were easy to interpret. Other values of p did not alter the general pattern of results we report here.

{| class="wikitable"
|+ Table 1: Overview of the hierarchical cluster structure showing five levels (Level 1 is broadest, Level 5 is most precise). The statistics include total number of clusters N, average cluster size ⟨N<sub>c</sub>⟩ and its standard deviation, number of homeless nodes N<sub>homeless</sub>, number of nodes that are members of multiple clusters N<sub>overlapping</sub>, and the average p-value ⟨p⟩.
! Statistic !! Level 1 !! Level 2 !! Level 3 !! Level 4 !! Level 5
|-
| N || 2 || 7 || 37 || 161 || 506
|-
| ⟨N<sub>c</sub>⟩ || 8588 || 3049 || 515 || 112 || 25
|-
| sd(N<sub>c</sub>) || 2112 || 973 || 364 || 66 || 12
|-
| N<sub>homeless</sub> || 18 || 18 || 39 || 86 || 380
|-
| N<sub>overlapping</sub> || 5943 || 6956 || 5263 || 4717 || 1676
|-
| ⟨p⟩ || 0 || 0.062 || 0.04 || 0.035 || 0.051
|}

Figure 1: Hierarchical tree visualization of clusters in the lexicon with five most central members in terms of cluster in-strength.

Figure 1 illustrates the obtained clusters with the most prototypical examples of each cluster at various levels. At the most general level, Figure 1 shows two distinct clusters, with one of them containing highly central words with a negative connotation. In order to verify whether this interpretation is supported statistically, we used the valence judgments reported by Moors et al. (2012), which are applicable to 3,642 non-overlapping words in our clusters. The valence judgments differed significantly between our two clusters according to an independent t-test (t(3640) = 7.367, CI = [0.190, 0.327]). This post-hoc test confirmed our interpretation of a valence difference between the clusters, which brings further support to studies that indicated valence is the most important dimension in semantic space (De Deyne et al., 2014; Samsonovic and Ascoli, 2010) and empirical findings highlighting affect-based category structure (Niedenthal et al., 1999).

At Levels 2 to 4, the meaning clusters become increasingly more concrete. For instance, Level 2 shows that the "negative" cluster in Level 1 includes clusters with abstract words or words related to human culture (school, money, religion, time, ...), which are now differentiated from a purely negative cluster with central members like negative, sad, and crossed. The subdivisions of the "positive" cluster involve the central nodes nature, music, sports, and food, which might be interpreted as covering sensorial information and natural kinds.

At the lowest level, 506 clusters were identified, with an average size of 25 words. A total of 1,676 words occurred in multiple clusters, at least some of them because of homonymy (e.g., bank) or polysemy (e.g., language, assigned to clusters about nationality, speech, language education, and communication). Most importantly, inspection of the content of all clusters revealed a widespread thematic structure: the clusters were often composed of nouns (racket), adjectives (loud), and verbs (to sound) together, which does not reflect a pure taxonomy of entities, but also includes properties and actions.

===2 Evaluating Taxonomic Structure===

To test whether the clusters provide evidence for a hierarchical taxonomic view along the lines of Rosch and colleagues (Rosch, 1973) or support an alternative view based on the thematic relations identified in the previous section, data from an exemplar generation task from Ruts et al. (2004) were used. In this task, 100 participants generated as many exemplars as they could think of for six artifact categories (CLOTHING, KITCHEN UTENSILS, MUSICAL INSTRUMENTS, TOOLS, VEHICLES, and WEAPONS) and seven natural kinds categories (FRUIT, VEGETABLES, BIRDS, INSECTS, FISH, MAMMALS, and REPTILES). If the clusters in the word association network group together different types of birds, vehicles, fruits, and so on, this would indicate a taxonomic organization of semantic memory. For each category, we investigated the size of the best matching cluster and calculated precision and recall in terms of the F-measure for clustering performance.

A taxonomic-like organization would be evident in clusters with high precision and recall, resulting from many true positives and few false positives and false negatives. For instance, if the cluster corresponding to the category BIRDS contained robin (a true positive) and did not contain spoon (a true negative), that would increase the F-score. Conversely, if it contained guitar (a false positive) or did not contain ostrich (a false negative), that would decrease the F-score. This way, high F-scores should reflect categories that are not overly specific (many false negatives) or general (many false positives).

On average, the best matching clusters were found at Level 5. The results for each category are shown in Table 2. The number of members in the exemplar generation task was on average 41 for the seven natural kinds categories, which is in the same range as the average best matching cluster size of 42. For artifacts the generated categories included on average 55 members, which was somewhat larger than the obtained average cluster size of 37.

{| class="wikitable"
|+ Table 2: F-values and cluster sizes for items generated for 13 concrete noun categories. N<sub>human</sub> is the category size based on the exemplar generation task; N<sub>c</sub> is the size of the best-matching cluster; F captures precision and recall according to the human categories for the full network. F′ is calculated from a network that excluded potential thematic information. F-values are fairly low, indicating lack of correspondence between the clusters and the taxonomic categories. Excluding thematic information results in F′ values that do capture taxonomic information.
! Category !! N<sub>human</sub> !! N<sub>c</sub> !! F !! F′
|-
| FRUIT || 40 || 50 || 0.47 || 0.84
|-
| VEGETABLES || 35 || 58 || 0.50 || 0.90
|-
| BIRDS || 53 || 63 || 0.53 || 0.90
|-
| INSECTS || 40 || 34 || 0.46 || 0.68
|-
| FISH || 37 || 48 || 0.57 || 0.91
|-
| MAMMALS || 61 || 21 || 0.20 || 0.76
|-
| REPTILES || 21 || 22 || 0.65 || 0.51
|-
| Mean (natural kinds) || 41 || 42 || 0.48 || 0.79
|-
| CLOTHING || 46 || 70 || 0.35 || 0.80
|-
| KITCHEN UT. || 71 || 18 || 0.20 || 0.66
|-
| MUSIC INSTR. || 46 || 24 || 0.37 || 0.89
|-
| TOOLS || 73 || 56 || 0.25 || 0.76
|-
| VEHICLES || 46 || 28 || 0.16 || 0.73
|-
| WEAPONS || 46 || 25 || 0.37 || 0.88
|-
| Mean (artifacts) || 55 || 37 || 0.28 || 0.79
|}

The resulting F-values were on average 0.48 for the natural categories and 0.28 for the artifacts, indicating only limited support for the presence of taxonomic categories. The highest values were obtained for FISH (F = .57) and REPTILES (F = .65), where most items in the clusters were true category members.

Inspecting the false positives for each of the clusters in Table 3 confirms the validity of the approach, as in the majority of the cases the superordinate label (e.g., fruit, tools, etc.) was the most central member of each cluster. The remaining intrusions were thematic in nature (e.g., FRUIT: pick, BIRDS: nest), thus confirming our earlier exploratory findings.

{| class="wikitable"
|+ Table 3: Top 5 false positives ordered by cluster in-strength per category. Most of the false positives are thematic in nature. For instance, false positives for BIRDS include beak, egg, nest, and whistle.
! Category !! 1 !! 2 !! 3 !! 4 !! 5
|-
| FRUIT || fruit || juicy || pit || pick || summer
|-
| VEGETABLES || vegetable || healthy || puree || sausage || hotchpotch
|-
| BIRDS || bird || beak || nest || whistle || egg
|-
| INSECTS || insect || vermin || beast || crawl || animal
|-
| FISH || fish || fishing || rod || slippery || water
|-
| MAMMALS || rodent || gnaw || tail || pen || marten
|-
| REPTILES || reptile || scales || animal || tail || amphibian
|-
| CLOTHING || clothing || fashion || blouse || collar || zipper
|-
| KITCHEN UT. || cooking || kitchen || stove || cooker hood || burning
|-
| MUSICAL INSTR. || wind instrument || to blow || fanfare || orchestra || harmony
|-
| TOOLS || tools || carpenter || carpentry || wood || drill
|-
| VEHICLES || speed || drive || vehicle || motor || circuit
|-
| WEAPONS || sharp || stab || blade || point || stake
|}

One potential response to the previous analyses relates to the nature of the data upon which they are based. Perhaps the word association task simply fails to capture taxonomic information, and if so, the results of these analyses are simply an artifact of the choice of task. Alternatively, perhaps the "failure" arises because the word association task is more general than the tasks typically used to study taxonomic categories.

There is some evidence that a different choice of task would produce different results. For instance, much of the work on taxonomic organization relies on tasks in which participants are asked to list features of entities (McRae et al., 2005; Ruts et al., 2004). One could argue that feature generation is a constrained version of the word association task, and the key difference is the number of thematic responses one gets in both procedures. Similarly, feature generation stimuli are usually restricted to concrete nouns, which places restrictions on what words can be grouped together. In other words, the tendency to find taxonomic categories may be a result of restricting the task.

To test this idea, we used the word association data to construct a network that included only those 588 words that belonged to one of the taxonomic categories. Moreover, in order to approximate the "shared features" measure that is more typical of feature generation tasks, we computed the cosine similarity between pairs of words. That is, words that have the same associates are deemed more similar, and this similarity was used to weight the edges in the restricted network.<sup>2</sup> We then applied the clustering procedure to this restricted network and repeated the analysis from the previous section. The F-statistics from this analysis are reported as the F′-values in Table 2. This time, the results of the clustering show a high degree of agreement with the taxonomic organization, with an average F-value of 0.79. The only exception was REPTILES, which upon inspection appears to reflect a failure to distinguish REPTILES from INSECTS.

<sup>2</sup> Note that one could also derive such a similarity-based network for the complete lexicon, which would reflect the similarity between cues rather than their weighted associative strength. We did in fact do this. It produced similar results to the original analysis.

The success of this analysis suggests two things. First, the word association task does encode taxonomic information, as evidenced by the fact that we are able to reconstruct taxonomic categories. However, the fact that the only way to do so is to mimic all the restrictive characteristics of a feature generation task (e.g., limited word set) is revealing. Taxonomic information is not the primary means by which the mental lexicon is organized: if it were, we should not have to resort to such drastic restrictions in order to uncover taxonomic categories.

In summary, even at the most detailed level of the hierarchy, only limited evidence for a taxonomic view along the lines of Rosch was found, even for typical taxonomic domains like animals. These results suggest that in much of the previous work the pervasive contribution of affective and thematic or relational knowledge structuring might have been overlooked because of a selection bias in terms of the concepts (nouns, mostly concrete) and semantic relations (predominantly taxonomic). This finding is in line with previous results indicating that network-derived similarity estimates account better for human thematic relatedness judgments than for taxonomic relatedness judgments (De Deyne et al., in press). In priming studies, the dominance of thematic over taxonomic structure can also explain facilitation when thematic but not coordinate prime-target pairs are used (Hutchison, 2003). Finally, our findings converge with recent evidence that highlights the role of thematic representations even in domains such as animals (Gentner and Kurtz, 2006; Lin and Murphy, 2001; Wisniewski and Bassok, 1999), whereas previous reports that have stressed taxonomic organization might be more exceptional as they are heavily culturally defined (Lopez et al., 1997), a consequence of formal education (Sharp et al., 1979), or reflect different levels of expertise (Medin et al., 1997).

===Acknowledgments===

This research has been supported by an ARC grant DE140101749 awarded to SDD. SV is a postdoctoral fellow at the Research Foundation - Flanders. A longer version of this work was also submitted to the 37th Annual Meeting of the Cognitive Science Society, Pasadena, 2015. We wish to express our gratitude to Dan Navarro and Amy Perfors, who contributed to the longer version of this work.

===References===

[De Deyne et al., 2013] Simon De Deyne, Daniel J. Navarro, and Gert Storms. 2013. Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45:480–498.

[De Deyne et al., 2014] Simon De Deyne, Wouter Voorspoels, Steven Verheyen, Daniel J. Navarro, and Gert Storms. 2014. Accounting for graded structure in adjective categories with valence-based opposition relationships. Language and Cognitive Processes, 29(5):568–583.

[De Deyne et al., in press] Simon De Deyne, Steven Verheyen, and Gert Storms. in press. The role of corpus-size and syntax in deriving lexico-semantic representations for a wide range of concepts. Quarterly Journal of Experimental Psychology.

[Gentner and Kurtz, 2005] Dedre Gentner and Kenneth J. Kurtz. 2005. Relational categories. In W. K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, and P. W. Wolff, editors, Categorization inside and outside the lab, pages 151–175. American Psychological Association.

[Gentner and Kurtz, 2006] Dedre Gentner and Kenneth J. Kurtz. 2006. Relations, objects, and the composition of analogies. Cognitive Science, 30:609–642.

[Hutchison, 2003] Keith A. Hutchison. 2003. Is semantic priming due to association strength or feature overlap? Psychonomic Bulletin and Review, 10:785–813.

[Lancichinetti et al., 2011] Andrea Lancichinetti, Filippo Radicchi, José J. Ramasco, and Santo Fortunato. 2011. Finding statistically significant communities in networks. PLoS ONE, 6(4):e18961.

[Lin and Murphy, 2001] Emilie L. Lin and Gregory L. Murphy. 2001. Thematic relations in adults' concepts. Journal of Experimental Psychology: General, 1:3–28.

[Lopez et al., 1997] Alejandro Lopez, Scott Atran, John D. Coley, Douglas L. Medin, and Edward E. Smith. 1997. The tree of life: Universal and cultural features of folkbiological taxonomies and inductions. Cognitive Psychology, 32(3):251–295.

[McRae et al., 2005] Ken McRae, George S. Cree, Mark S. Seidenberg, and Chris McNorgan. 2005. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37:547–559.

[Medin and Rips, 2005] Douglas L. Medin and Lance J. Rips. 2005. Concepts and categories: memory, meaning, and metaphysics. In K. Holyoak and R. Morrison, editors, The Cambridge Handbook of Thinking and Reasoning, pages 37–72. Cambridge University Press, Cambridge, UK.

[Medin et al., 1997] Douglas L. Medin, Elizabeth B. Lynch, John D. Coley, and Scott Atran. 1997. Categorization and reasoning among tree experts: Do all roads lead to Rome? Cognitive Psychology, 32(1):49–96.

[Moors et al., 2012] Agnes Moors, Jan De Houwer, Dirk Hermans, Sabine Wanmaker, Kevin van Schie, Anne-Laura Van Harmelen, Maarten De Schryver, Jeffrey De Winne, and Marc Brysbaert. 2012. Norms of valence, arousal, dominance, and age of acquisition for 4,300 Dutch words. Behavior Research Methods, pages 1–9.

[Niedenthal et al., 1999] Paula M. Niedenthal, Jamin B. Halberstadt, and Åse H. Innes-Ker. 1999. Emotional response categorization. Psychological Review, 106(2):337.

[Rosch, 1973] Eleanor Rosch. 1973. Natural categories. Cognitive Psychology, 4:328–350.

[Ruts et al., 2004] Wim Ruts, Simon De Deyne, Eef Ameel, Wolf Vanpaemel, Timothy Verbeemen, and Gert Storms. 2004. Dutch norm data for 13 semantic categories and 338 exemplars. Behaviour Research Methods, Instruments, and Computers, 36:506–515.

[Samsonovic and Ascoli, 2010] Alexei V. Samsonovic and Giorgio A. Ascoli. 2010. Principal semantic components of language and the measurement of meaning. PLoS ONE, 5(6):e10921.

[Sharp et al., 1979] Donald Sharp, Michael Cole, Charles Lave, Herbert P. Ginsburg, Ann L. Brown, and Lucia A. French. 1979. Education and cognitive development: The evidence from experimental research. Monographs of the Society for Research in Child Development, pages 1–112.

[Wisniewski and Bassok, 1999] Edward J. Wisniewski and M. Bassok. 1999. What makes a man similar to a tie? Cognitive Psychology, 39:208–238.