=Paper= {{Paper |id=Vol-1347/paper39 |storemode=property |title=Using network clustering to uncover the taxonomic and thematic structure of the mental lexicon |pdfUrl=https://ceur-ws.org/Vol-1347/paper39.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/DeyneV15 }} ==Using network clustering to uncover the taxonomic and thematic structure of the mental lexicon== https://ceur-ws.org/Vol-1347/paper39.pdf
       Using network clustering to uncover the taxonomic and thematic
                       structure of the mental lexicon
                    Simon De Deyne                                      Steven Verheyen
                  University of Adelaide                               University of Leuven
                  School of Psychology                             Department of Psychology
                 5005 Adelaide, Australia                    Tiensestraat 102, 3000 Leuven, Belgium
            simon.dedeyne@adelaide.edu.au                         steven.verheyen@ppw.kuleuven.be



Copyright © by the paper’s authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

While still influential, the view that concepts are organized as a hierarchical taxonomy as proposed by Rosch (1973) has been challenged on several occasions. For example, some studies have attributed a larger role to thematic relations (Gentner and Kurtz, 2005; Lin and Murphy, 2001), whereas others have stressed the role of affect in structuring word meaning (Niedenthal et al., 1999). A comprehensive account of how these different principles shape and structure meaning in the lexicon is missing, and most studies continue to be biased towards concrete noun categories that fit into hierarchical taxonomies (Medin and Rips, 2005). To capture mental or psychological properties that organize the lexicon for a wide range of concepts and semantic relations, we propose a large-scale semantic network derived from word associations as the basis for uncovering these structural principles.

1   Network Clustering

Since this is one of the first times the mental lexicon is mapped in its entirety using an extremely extensive word association corpus, an exploratory approach is warranted. To achieve this, network clustering was used as a way to study how the mental lexicon can be structured at different scales and which types of semantic relations dominate its structure. At the basis lies a semantic network derived from a large-scale word association corpus including over 12,000 cues and 3.77 million responses (De Deyne et al., 2013). For the purpose of this study, non-dominant word forms were removed (e.g., apples was removed if apple was also present), resulting in a network of 11,000 words. Next, the recent Order Statistics Local Optimization Method (OSLOM) was applied to identify statistically reliable clusters in a directed weighted word association network (Lancichinetti et al., 2011). This method includes words in the final cluster solution on the basis of statistical criteria and allows for overlapping clusters. Similar to taxonomic theories of knowledge representation, words are grouped in progressively larger clusters, which allows us to evaluate structural properties of the lexicon at different scales. This hierarchical structure is also derived from the data by using a statistical criterion that involves a comparison with an appropriate null-model for the weighted directed graph.

Applying OSLOM to the semantic network resulted in a solution with five hierarchical levels. An overview of this solution is shown in Table 1. There was a large degree of variability in the number of clusters across the five hierarchical levels, ranging between 2 and 506 clusters. On average, the p-value of the extracted clusters was low across all levels, indicating that the obtained clusters were unlikely to arise in a comparable random network.¹ There were few homeless nodes at any level, indicating that most words were reliably attributed to a specific cluster. There was also a considerable degree of overlap at all levels relative to the size of the clusters; clusters were more distinct at the more precise levels, where more clusters were obtained. For instance, at the lowest level 1,676 words appeared in multiple clusters, compared to 5,943 at the highest level.

¹ Default parameters were used in the OSLOM algorithm, except for the p cut-off value. Setting this value depends on the task as it affects the size of the clusters (Lancichinetti et al., 2011). In this application, the cutoff was set at 0.25, because the few clusters in the final solution with high p-values were easy to interpret. Other values of p did not alter the general pattern of results we report here.

Table 1: Overview of the hierarchical cluster structure showing five levels (Level 1 is broadest, Level 5 is most precise). The statistics include the total number of clusters N, the average cluster size ⟨N_c⟩ and its standard deviation sd(N_c), the number of homeless nodes N_homeless, the number of nodes that are members of multiple clusters N_overlapping, and the average p-value ⟨p⟩.

  Level               1        2        3        4        5
  N                   2        7       37      161      506
  ⟨N_c⟩            8588     3049      515      112       25
  sd(N_c)          2112      973      364       66       12
  N_homeless         18       18       39       86      380
  N_overlapping    5943     6956     5263     4717     1676
  ⟨p⟩                 0    0.062     0.04    0.035    0.051

Figure 1 illustrates the obtained clusters with the most prototypical examples of each cluster at various levels. At the most general level, Figure 1 shows two distinct clusters, with one of them containing highly central words with a negative connotation. In order to verify whether this interpretation is supported statistically, we used the valence judgments reported by Moors et al. (2012), which are applicable to 3,642 non-overlapping words in our clusters. The valence judgments differed significantly between our two clusters according to an independent t-test (t(3640) = 7.367, CI = [0.190, 0.327]). This post-hoc test confirmed our interpretation of a valence difference between the clusters, which brings further support to studies that indicated valence is the most important dimension in semantic space (De Deyne et al., 2014; Samsonovic and Ascoli, 2010) and empirical findings highlighting affect-based category structure (Niedenthal et al., 1999).

Figure 1: Hierarchical tree visualization of clusters in the lexicon with the five most central members in terms of cluster in-strength.

At Levels 2 to 4, the meaning clusters become increasingly concrete. For instance, Level 2 shows that the “negative” cluster in Level 1 includes clusters with abstract words or words related to human culture (school, money, religion, time, ...), which are now differentiated from a purely negative cluster with central members like negative, sad, and crossed. The subdivisions of the “positive” cluster involve the central nodes nature, music, sports, and food, which might be interpreted as covering sensorial information and natural kinds.

At the lowest level, 506 clusters were identified, with an average size of 25 words. A total of 1,676 words occurred in multiple clusters, at least some of them because of homonymy (e.g., bank) or polysemy (e.g., language, assigned to clusters about nationality, speech, language education, and communication). Most importantly, inspection of the content of all clusters revealed a widespread thematic structure: the clusters were often composed of nouns (racket), adjectives (loud), and verbs (to sound), which does not reflect a pure taxonomy of entities, but also includes properties and actions.
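The ranking of cluster members by "cluster in-strength" mentioned in the caption of Figure 1 can be illustrated with a minimal Python sketch. It builds a directed weighted network from cue-response counts and sums, for each word, the weights of incoming edges from other members of the same cluster. The toy counts, the function names, the normalization of counts to response proportions per cue, and the restriction to within-cluster edges are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import defaultdict

def build_network(counts):
    """Turn {(cue, response): count} into {cue: {response: weight}}.
    Assumption: edge weights are response proportions per cue."""
    totals = defaultdict(int)
    for (cue, _), n in counts.items():
        totals[cue] += n
    net = defaultdict(dict)
    for (cue, resp), n in counts.items():
        net[cue][resp] = n / totals[cue]
    return net

def cluster_in_strength(net, cluster):
    """Sum of incoming edge weights, counting only edges whose source
    and target both belong to the cluster (an illustrative assumption)."""
    strength = {word: 0.0 for word in cluster}
    for cue in cluster:
        for resp, weight in net.get(cue, {}).items():
            if resp in strength and resp != cue:
                strength[resp] += weight
    return strength

def most_central(net, cluster, k=5):
    """Return the k members with the highest in-strength (ties by name)."""
    s = cluster_in_strength(net, cluster)
    return sorted(s, key=lambda w: (-s[w], w))[:k]

# Hypothetical cue-response counts, invented for this example.
toy_counts = {
    ("robin", "bird"): 6, ("robin", "nest"): 3, ("robin", "red"): 1,
    ("nest", "bird"): 5, ("nest", "egg"): 5,
    ("egg", "bird"): 2, ("egg", "nest"): 2, ("egg", "breakfast"): 6,
    ("bird", "nest"): 4, ("bird", "fly"): 6,
}
net = build_network(toy_counts)
cluster = {"robin", "nest", "egg", "bird"}
print(most_central(net, cluster, k=2))  # ['bird', 'nest']
```

In this toy cluster the superordinate label bird receives the most incoming weight, which mirrors the kind of central member shown in Figure 1.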




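The degrees of freedom reported for the valence comparison, t(3640), are consistent with a pooled-variance independent-samples t-test on 3,642 words, since df = n1 + n2 - 2. The following minimal sketch computes that statistic in plain Python. The ratings are made-up toy values, not the Moors et al. (2012) norms, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Student's independent-samples t statistic with pooled variance.
    Returns (t, degrees of freedom)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance of both groups.
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, df

# Invented valence ratings for words in two top-level clusters.
positive_cluster = [4.8, 5.1, 5.5, 4.9, 5.2]
negative_cluster = [3.9, 4.2, 4.0, 4.4, 3.5]
t, df = pooled_t(positive_cluster, negative_cluster)
print(f"t({df}) = {t:.3f}")
```

With 3,642 rated words split over two clusters, the same formula yields the 3,640 degrees of freedom reported above.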
2   Evaluating Taxonomic Structure

To test whether the clusters provide evidence for a hierarchical taxonomic view along the lines of Rosch and colleagues (Rosch, 1973) or support an alternative view based on the thematic relations identified in the previous section, data from an exemplar generation task from Ruts et al. (2004) were used. In this task, 100 participants generated as many exemplars as they could think of for six artifact categories (CLOTHING, KITCHEN UTENSILS, MUSICAL INSTRUMENTS, TOOLS, VEHICLES, and WEAPONS) and seven natural kind categories (FRUIT, VEGETABLES, BIRDS, INSECTS, FISH, MAMMALS, and REPTILES). If the clusters in the word association network group together different types of birds, vehicles, fruits, and so on, this would indicate a taxonomic organization of semantic memory. For each category, we investigated the size of the best matching cluster and calculated precision and recall in terms of the F-measure for clustering performance.

A taxonomic-like organization would be evident in clusters with high precision and recall, resulting from many true positives and few false positives and false negatives. For instance, if the cluster corresponding to the category BIRDS contained robin (a true positive) and did not contain spoon (a true negative), that would increase the F-score. Conversely, if it contained guitar (a false positive) or did not contain ostrich (a false negative), that would decrease the F-score. This way, high F-scores should reflect categories that are not overly specific (many false negatives) or general (many false positives).

On average, the best matching clusters were found at Level 5. The results for each category are shown in Table 2. The exemplar generation task yielded on average 41 members for the seven natural kind categories, which is in the same range as the average best matching cluster size of 42. For artifacts, the generated categories included on average 55 members, which was somewhat larger than the obtained average cluster size of 37.

Table 2: F-values and cluster sizes for items generated for 13 concrete noun categories. N_human is the category size based on the exemplar generation task; N_c is the size of the best-matching cluster; F captures precision and recall according to the human categories for the full network. F′ is calculated from a network that excluded potential thematic information. F-values are fairly low, indicating lack of correspondence between the clusters and the taxonomic categories. Excluding thematic information results in F′ values that do capture taxonomic information.

  Category          N_human     N_c       F       F′
  FRUIT                  40      50    0.47     0.84
  VEGETABLES             35      58    0.50     0.90
  BIRDS                  53      63    0.53     0.90
  INSECTS                40      34    0.46     0.68
  FISH                   37      48    0.57     0.91
  MAMMALS                61      21    0.20     0.76
  REPTILES               21      22    0.65     0.51
  Mean                   41      42    0.48     0.79

  CLOTHING               46      70    0.35     0.80
  KITCHEN UT.            71      18    0.20     0.66
  MUSIC INSTR.           46      24    0.37     0.89
  TOOLS                  73      56    0.25     0.76
  VEHICLES               46      28    0.16     0.73
  WEAPONS                46      25    0.37     0.88
  Mean                   55      37    0.28     0.79

The resulting F-values were on average 0.48 for the natural categories and 0.28 for the artifacts, indicating only limited support for the presence of taxonomic categories. The highest values were obtained for FISH (F = .57) and REPTILES (F = .65), where most items in the clusters were true category members.

Inspecting the false positives for each of the clusters in Table 3 confirms the validity of the approach, as in the majority of the cases the superordinate label (e.g., fruit, tools, etc.) was the most central member of each cluster. The remaining intrusions were thematic in nature (e.g., FRUIT: pick, BIRDS: nest), thus confirming our earlier exploratory findings.

Table 3: Top 5 false positives ordered by cluster in-strength per category. Most of the false positives are thematic in nature. For instance, false positives for BIRDS include beak, egg, nest, and whistle.

  Category          1                  2            3            4              5
  FRUIT             fruit              juicy        pit          pick           summer
  VEGETABLES        vegetable          healthy      puree        sausage        hotchpotch
  BIRDS             bird               beak         nest         whistle        egg
  INSECTS           insect             vermin       beast        crawl          animal
  FISH              fish               fishing      rod          slippery       water
  MAMMALS           rodent             gnaw         tail         pen            marten
  REPTILES          reptile            scales       animal       tail           amphibian
  CLOTHING          clothing           fashion      blouse       collar         zipper
  KITCHEN UT.       cooking            kitchen      stove        cooker hood    burning
  MUSICAL INSTR.    wind instrument    to blow      fanfare      orchestra      harmony
  TOOLS             tools              carpenter    carpentry    wood           drill
  VEHICLES          speed              drive        vehicle      motor          circuit
  WEAPONS           sharp              stab         blade        point          stake

One potential response to the previous analyses relates to the nature of the data upon which they are based. Perhaps the word association task simply fails to capture taxonomic information, and if so, the results of these analyses are simply an artifact of the choice of task. Alternatively, perhaps the “failure” arises because the word association task is more general than the tasks typically used to study taxonomic categories.

There is some evidence that a different choice of task would produce different results. For instance, much of the work on taxonomic organization relies on tasks in which participants are asked to list features of entities (McRae et al., 2005; Ruts et al., 2004). One could argue that feature generation is a constrained version of the word association task, and the key difference is the number of thematic responses one gets in the two procedures. Similarly, feature generation stimuli are usually restricted to concrete nouns, which places restrictions on what words can be grouped together. In other words, the tendency to find taxonomic categories may be a result of restricting the task.

To test this idea, we used the word association data to construct a network that included only those 588 words that belonged to one of the taxonomic categories. Moreover, in order to approximate the “shared features” measure that is more typical of feature generation tasks, we computed the cosine similarity between pairs of words. That is, words that have the same associates are deemed more similar, and this similarity was used to weight the edges in the restricted network.² We then applied the clustering procedure to this restricted network and repeated the analysis from the previous section. The F-statistics from this analysis are reported as the F′-values in Table 2. This time, the results of the clustering show a high degree of agreement with the taxonomic organization, with an average F-value of 0.79. The only exception was REPTILES, which upon inspection appears to reflect a failure to distinguish REPTILES from INSECTS.

² Note that one could also derive such a similarity-based network for the complete lexicon, which would reflect the similarity between cues rather than their weighted associative strength. We did in fact do this. It produced similar results to the original analysis.

The success of this analysis suggests two things. First, the word association task does encode taxonomic information, as evidenced by the fact that we are able to reconstruct taxonomic categories. Second, however, the fact that the only way to do so is to mimic all the restrictive characteristics of a feature generation task (e.g., limited word set) is revealing. Taxonomic information is not the primary means by which the mental lexicon is organized: if it were, we should not have to resort to such drastic restrictions in order to uncover taxonomic categories.

In summary, even at the most detailed level of the hierarchy, only limited evidence for a taxonomic view along the lines of Rosch was found, even for typical taxonomic domains like animals. These results suggest that in much of the previous work the pervasive contribution of affective and thematic or relational knowledge structuring might have been overlooked because of a selection bias in terms of the concepts (nouns, mostly concrete) and semantic relations (predominantly taxonomic). This finding is in line with previous results indicating that network-derived similarity estimates account better for human thematic relatedness judgments than for taxonomic relatedness judgments (De Deyne et al., in press). In priming studies, the dominance of thematic over taxonomic structure can also explain facilitation when thematic but not coordinate prime-target pairs are used (Hutchison, 2003). Finally, our findings converge with recent evidence that highlights the role of thematic representations even in domains such as animals (Gentner and Kurtz, 2006; Lin and Murphy, 2001; Wisniewski and Bassok, 1999), whereas previous reports that have stressed taxonomic organization might be more exceptional as they are heavily culturally defined (Lopez et al., 1997), a consequence of formal education (Sharp et al., 1979), or reflect different levels of expertise (Medin et al., 1997).
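The comparison between clusters and human categories described in Section 2 can be sketched as follows. For a given human category, precision and recall are computed against every cluster, and the cluster with the highest F-score is taken as the best match. This is a minimal illustration that assumes the balanced (F1) variant of the F-measure, since the weighting is not stated in the text; the word sets and function names are invented for the example.

```python
def f_measure(cluster, category):
    """Balanced F-score (F1) of a cluster against a human category.
    Both arguments are sets of words."""
    true_positives = len(cluster & category)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(cluster)   # penalizes false positives
    recall = true_positives / len(category)     # penalizes false negatives
    return 2 * precision * recall / (precision + recall)

def best_matching_cluster(clusters, category):
    """Return (name, members) of the cluster with the highest F-score."""
    return max(clusters.items(), key=lambda kv: f_measure(kv[1], category))

# Hypothetical human category and clusters.
birds = {"robin", "sparrow", "ostrich", "eagle"}
clusters = {
    "c1": {"robin", "sparrow", "eagle", "nest", "beak"},
    "c2": {"guitar", "piano", "orchestra"},
}
name, members = best_matching_cluster(clusters, birds)
print(name, round(f_measure(members, birds), 2))  # c1 0.67
```

Here the thematic intrusions nest and beak lower precision and the missing ostrich lowers recall, which is the pattern of false positives and false negatives discussed for Tables 2 and 3.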




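The similarity weighting used for the restricted network in Section 2 can be illustrated with a small sketch: each word is represented by its vector of association strengths, and two words are similar to the extent that they share associates. The sparse dictionary representation, the toy strengths, and the function name are assumptions for illustration only.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors given as
    {associate: strength} dictionaries."""
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Invented association strengths for three cue words.
assoc = {
    "robin":   {"bird": 0.6, "nest": 0.3, "red": 0.1},
    "sparrow": {"bird": 0.7, "nest": 0.2, "small": 0.1},
    "guitar":  {"music": 0.8, "string": 0.2},
}
sim_birds = cosine_similarity(assoc["robin"], assoc["sparrow"])
sim_mixed = cosine_similarity(assoc["robin"], assoc["guitar"])
print(round(sim_birds, 2), sim_mixed)
```

Words with overlapping associates (robin, sparrow) obtain a high similarity, whereas words with no shared associates (robin, guitar) obtain zero, so edges weighted this way favor grouping by shared properties rather than by direct association.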
Acknowledgments

This research has been supported by an ARC grant DE140101749 awarded to SDD. SV is a postdoctoral fellow at the Research Foundation - Flanders. A longer version of this work was also submitted to the 37th Annual Meeting of the Cognitive Science Society, Pasadena, 2015. We wish to express our gratitude to Dan Navarro and Amy Perfors, who contributed to the longer version of this work.

References

[De Deyne et al., 2013] Simon De Deyne, Daniel J. Navarro, and Gert Storms. 2013. Better explanations of lexical and semantic cognition using networks derived from continued rather than single word associations. Behavior Research Methods, 45:480–498.

[De Deyne et al., 2014] Simon De Deyne, Wouter Voorspoels, Steven Verheyen, Daniel J. Navarro, and Gert Storms. 2014. Accounting for graded structure in adjective categories with valence-based opposition relationships. Language and Cognitive Processes, 29(5):568–583.

[De Deyne et al., in press] Simon De Deyne, Steven Verheyen, and Gert Storms. in press. The role of corpus-size and syntax in deriving lexico-semantic representations for a wide range of concepts. Quarterly Journal of Experimental Psychology.

[Gentner and Kurtz, 2005] Dedre Gentner and Kenneth J. Kurtz. 2005. Relational categories. In W. K. Ahn, R. L. Goldstone, B. C. Love, A. B. Markman, and P. W. Wolff, editors, Categorization inside and outside the lab, pages 151–175. American Psychological Association.

[Gentner and Kurtz, 2006] Dedre Gentner and Kenneth J. Kurtz. 2006. Relations, objects, and the composition of analogies. Cognitive Science, 30:609–642.

[Hutchison, 2003] Keith A. Hutchison. 2003. Is semantic priming due to association strength or feature overlap? Psychonomic Bulletin and Review, 10:785–813.

[Lancichinetti et al., 2011] Andrea Lancichinetti, Filippo Radicchi, José J. Ramasco, and Santo Fortunato. 2011. Finding statistically significant communities in networks. PLoS ONE, 6(4):e18961.

[Lin and Murphy, 2001] Emilie L. Lin and Gregory L. Murphy. 2001. Thematic relations in adults’ concepts. Journal of Experimental Psychology: General, 1:3–28.

[Lopez et al., 1997] Alejandro Lopez, Scott Atran, John D. Coley, Douglas L. Medin, and Edward E. Smith. 1997. The tree of life: Universal and cultural features of folkbiological taxonomies and inductions. Cognitive Psychology, 32(3):251–295.

[McRae et al., 2005] Ken McRae, George S. Cree, Mark S. Seidenberg, and Chris McNorgan. 2005. Semantic feature production norms for a large set of living and nonliving things. Behavior Research Methods, 37:547–559.

[Medin and Rips, 2005] Douglas L. Medin and Lance J. Rips. 2005. Concepts and categories: memory, meaning, and metaphysics. In K. Holyoak and R. Morrison, editors, The Cambridge Handbook of Thinking and Reasoning, pages 37–72. Cambridge University Press, Cambridge, UK.

[Medin et al., 1997] Douglas L. Medin, Elizabeth B. Lynch, John D. Coley, and Scott Atran. 1997. Categorization and reasoning among tree experts: Do all roads lead to Rome? Cognitive Psychology, 32(1):49–96.

[Moors et al., 2012] Agnes Moors, Jan De Houwer, Dirk Hermans, Sabine Wanmaker, Kevin van Schie, Anne-Laura Van Harmelen, Maarten De Schryver, Jeffrey De Winne, and Marc Brysbaert. 2012. Norms of valence, arousal, dominance, and age of acquisition for 4,300 Dutch words. Behavior Research Methods, pages 1–9.

[Niedenthal et al., 1999] Paula M. Niedenthal, Jamin B. Halberstadt, and Åse H. Innes-Ker. 1999. Emotional response categorization. Psychological Review, 106(2):337.

[Rosch, 1973] Eleanor Rosch. 1973. Natural categories. Cognitive Psychology, 4:328–350.

[Ruts et al., 2004] Wim Ruts, Simon De Deyne, Eef Ameel, Wolf Vanpaemel, Timothy Verbeemen, and Gert Storms. 2004. Dutch norm data for 13 semantic categories and 338 exemplars. Behaviour Research Methods, Instruments, and Computers, 36:506–515.

[Samsonovic and Ascoli, 2010] Alexei V. Samsonovic and Giorgio A. Ascoli. 2010. Principal semantic components of language and the measurement of meaning. PLoS ONE, 5(6):e10921.

[Sharp et al., 1979] Donald Sharp, Michael Cole, Charles Lave, Herbert P. Ginsburg, Ann L. Brown, and Lucia A. French. 1979. Education and cognitive development: The evidence from experimental research. Monographs of the Society for Research in Child Development, pages 1–112.

[Wisniewski and Bassok, 1999] Edward J. Wisniewski and M. Bassok. 1999. What makes a man similar to a tie? Cognitive Psychology, 39:208–238.



