Deriving Word Association Networks from Text Corpora

David Galea (dp.galea@student.qut.edu.au) and Peter Bruza (p.bruza@qut.edu.au)
Information Systems School, Queensland University of Technology
2 George Street, Brisbane, QLD 4000, Australia

Abstract

This article presents and evaluates a model to automatically derive word association networks from text corpora. Two aspects were evaluated: to what degree can corpus-based word association networks (CANs) approximate human word association networks with respect to (1) their ability to quantitatively predict word associations and (2) their structural network characteristics? Word association networks are the basis of the human mental lexicon. However, extracting such networks from human subjects is laborious and time consuming, and thus necessarily limited in relation to the breadth of human vocabulary. Automatic derivation of word associations from text corpora would address these limitations. In both evaluations corpus-based processing provided vector representations for words. These representations were then employed to derive CANs using two measures: (1) the well known cosine metric, which is a symmetric measure, and (2) a new asymmetric measure computed from orthogonal vector projections. For both evaluations, the full set of 4068 free association networks (FANs) from the University of South Florida word association norms was used as baseline human data. Two corpus-based models were benchmarked for comparison: a latent topic model and latent semantic analysis (LSA). We observed that CANs constructed using the asymmetric measure were slightly less effective than the topic model in quantitatively predicting free associates, and slightly better than LSA. The structural network analysis revealed that CANs do approximate the FANs to an encouraging degree.

Keywords: semantic networks; free association networks; corpus-based semantic representation

Introduction

The mental lexicon is a mental dictionary of words, but its structure is founded on the associative links that bind these words together. Such links are acquired through experience, and the vast and semi-random nature of this experience ensures that words within the lexicon are highly interconnected, both directly and indirectly through other words. For example, during childhood development and the associated acquisition of English, the word planet becomes associated with earth, space, moon, and so on. Even within this set, moon can itself become linked to earth, star, etc. Words are so associatively interconnected with each other that they meet the qualifications of a "small world" network, wherein it takes only a few steps to move from any one word to any other in the lexicon (Steyvers & Tenenbaum, 2005). Because of such connectivity, individual words are not represented in long-term memory as isolated entities but as part of a network of related words. One approach to extracting such a network is to employ a target as a cue and collect free associations from human subjects (Nelson, McEvoy, & Schreiber, 2004; Simon, Navarro, & Storms, 2013). For example, Figure 1 depicts such a network where t is the target word and the a_i's denote associates. An arrow, e.g., t → a1, represents that associate a1 was produced in a free association experiment in respect to target t. Table 1 shows the corresponding adjacency matrix for this example network. When collected over a subject pool, the edges can be weighted, e.g., by the probability that a given associate is produced in relation to a cue. Such networks are referred to as free association networks (FANs). FANs have formed the basis of human memory models such as Spreading Activation (Collins & Loftus, 1975) and Processing Implicit and Explicit Representations (PIER) (Nelson, Schreiber, & McEvoy, 1992; Nelson, Kitto, Galea, McEvoy, & Bruza, 2013).

Figure 1: Example of a Free Association Network

Table 1: Example adjacency matrix of the FAN depicted in Figure 1

        t     a1    a2
t       0     0.2   0.1
a1      0     0     0.6
a2      0.7   0     0

FANs have the following structural characteristics:

R1 The edges are directed, hence allowing for asymmetric associations between words.
R2 The target word has an edge with each associate in the network.
R3 The edges are weighted.

FANs are derived manually, which is time consuming and labor intensive. They are therefore restricted in relation to the breadth of vocabulary in human language, and challenging to keep up-to-date as language and associations evolve. The aim of this paper is to investigate to what degree corpus-based semantic methods can be used to approximate FANs in relation to both their structural network characteristics and their ability to quantitatively predict human word associations. We shall refer to such networks as Corpus Based Association Networks (CANs).
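To make the data structure concrete, the Figure 1 network and its Table 1 adjacency matrix can be held as a nested mapping. The sketch below is illustrative only (the representation and names are ours, not the authors'); it shows the directed, weighted form that characteristics R1–R3 describe.

```python
# A FAN as a dict-of-dicts adjacency structure: fan[u][v] is the
# weight of the directed edge u -> v (absent keys mean weight 0).
# The values reproduce Table 1 for the Figure 1 example network.
fan = {
    "t":  {"a1": 0.2, "a2": 0.1},   # forward associations from the target (R2)
    "a1": {"a2": 0.6},
    "a2": {"t": 0.7},               # backward association to the target
}

def strength(network, u, v):
    """Directed association strength from u to v (0 if no edge)."""
    return network.get(u, {}).get(v, 0.0)

# Asymmetry (R1): the strength t -> a2 differs from a2 -> t.
assert strength(fan, "t", "a2") == 0.1
assert strength(fan, "a2", "t") == 0.7
```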
Corpus Based Association Networks

A CAN comprises nodes, which correspond to words, and directed, weighted edges, which model the associations between words. We begin by describing how the nodes of a CAN are constructed.

Vector Representations of Words

Each word u (i.e., a node) has a vector-based representation u, where the vector has been computed from an underlying corpus. There are a variety of strategies to produce such vectors (Bullinaria & Levy, 2007), which are sometimes referred to as "semantic vectors" due to their ability to replicate human semantic association norm data (Dumais, 2004; Lund & Burgess, 1996; Turney & Pantel, 2010).

We used a Positive Pointwise Mutual Information (PPMI) vector representation because of its robust performance across a variety of linguistic and semantic tasks (Bullinaria & Levy, 2007). PPMI vectors are derived from discrete probability distributions built from word co-occurrence statistics. In our case, these discrete probability distributions are built from a modified version of a standard word co-occurrence matrix where the rows correspond to a set of pre-defined target words. The co-occurrence frequencies of a given target word with other words are computed using a sliding window of fixed size (denoted w) across the corpus, where sentence and paragraph boundaries are ignored. Context words are those words surrounding the target word when it is centered in the window. The frequency of each context word is accumulated as the window slides across the corpus. In this process, stop words are ignored. The frequencies are subsequently normalized to produce a probability distribution for the given target word. As a consequence, all vector elements are positive real values, and thus exist in the first orthant of Euclidean space. This property has important consequences for the bounds of the word association measures discussed in the next section. For this analysis, both target and context words were treated as single tokens. Furthermore, the window size was not explored as part of this analysis.
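The paper gives no code for this construction, but the description translates roughly as follows. This is a minimal sketch under our reading of it: the tokeniser, stop list, and window size w are placeholders, and PPMI is taken in the usual way as the positive part of log p(c|t)/p(c).

```python
import math
from collections import Counter, defaultdict

def ppmi_vectors(tokens, targets, w=3, stop_words=frozenset()):
    """Sketch of the PPMI construction described above (our reading,
    not the authors' code). `tokens` is the corpus as one token list;
    sentence/paragraph boundaries are deliberately ignored."""
    counts = defaultdict(Counter)      # counts[t][c] = co-occurrence frequency
    context_totals = Counter()         # corpus-wide context frequencies
    for i, t in enumerate(tokens):
        if t in stop_words:
            continue
        window = tokens[max(0, i - w):i] + tokens[i + 1:i + 1 + w]
        for c in window:
            if c in stop_words:
                continue
            context_totals[c] += 1
            if t in targets:           # rows are the pre-defined targets
                counts[t][c] += 1
    n = sum(context_totals.values())
    vectors = {}
    for t, ctr in counts.items():
        total = sum(ctr.values())
        vec = {}
        for c, f in ctr.items():
            # PPMI: positive part of log2( p(c|t) / p(c) )
            pmi = math.log2((f / total) / (context_totals[c] / n))
            if pmi > 0:
                vec[c] = pmi
        vectors[t] = vec               # sparse vector in the first orthant
    return vectors
```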
Measures of Association S(u, v)

The preceding section described how the nodes of a CAN are represented via corpus-based vectors. These vectors are used to compute weighted associations between words, thus providing the means to derive edges for CANs. For this paper we have utilized one well known metric, the cosine metric, as well as introducing a new measure of association called the GP measure.

The cosine metric was chosen as a baseline as it is often used to compute vector-based associations, e.g., in the Latent Semantic Analysis model, where it has shown consistently good performance in computing associations between words across a number of studies and text corpora (Landauer, Foltz, & Laham, 1998):

\cos(u, v) = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}    (1)

As pointed out previously, PPMI vector representations exist in the first orthant. Consequently, the standard boundaries of the cosine metric, [−1, 1], are transformed to [0, 1], and cosine can be interpreted as a normalized measure of strength, where 0 represents no relationship between words u and v and 1 represents a perfect synonymous relationship. In having a normalized measure, requirement R3 is satisfied. Unfortunately, as cosine is a metric, its associations are necessarily symmetric, meaning cos(u, v) = cos(v, u). This violates characteristic R1 specified above. In order to satisfy R1, a measure is required that permits asymmetric associations between words. The topic model (Griffiths, Steyvers, & Tenenbaum, 2007) used conditional probabilities to achieve this. For example, the strength of association from word u to v is computed as Pr(u|v), and the strength of the reverse relation is computed as Pr(v|u). Note that these probabilities need not be the same, which thus allows for asymmetry in the associations between these two words. In this paper, however, we will build on a word association measure based on projection (Pothos, Busemeyer, & Trueblood, 2013). Initially, a simple orthogonal vector projection was considered:

P(u, v) = \frac{\langle u, v \rangle}{\|v\|}    (2)

Exploration of this measure shows that it is bound between [0, \|u\|], where 0 represents no relationship and \|u\| represents a perfect synonymous relationship. Although not normalized, this does preserve rank when comparing multiple v's to u. Unfortunately, when comparing multiple v's to different u's, say u_1 and u_2, we arrive at two sets of bounds, [0, \|u_1\|] and [0, \|u_2\|], which destroys rank equivalence (unless \|u_1\| = \|u_2\|). To overcome this undesirable property, the GP measure was developed, in which the relative difference between v and the length of the projection of u onto v is taken into account:

GP(u, v) = \begin{cases} \dfrac{P(u, v)}{\|v\|} & \text{if } P(u, v) < \|v\| \\[4pt] 1 + \dfrac{\|v\|}{\|u\|} - \cos(u, v) & \text{if } P(u, v) \ge \|v\| \end{cases}

From a technical point of view, GP is not a metric but a pre-metric. As was the case with cosine, GP is also bound to [0, 1] and can be interpreted as a normalised measure of strength (thus satisfying R3). Furthermore, it permits asymmetric associations between words, meaning GP(u, v) is not necessarily equal to GP(v, u), thus satisfying R1.
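Equations 1 and 2 and the GP cases translate directly into vector code. The sketch below follows our reconstruction of the piecewise definition; note that the two branches meet continuously, with value 1, at the boundary P(u, v) = \|v\|.

```python
import numpy as np

def cosine(u, v):
    """Equation 1: symmetric, in [0, 1] for first-orthant vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def proj(u, v):
    """Equation 2: length of the orthogonal projection of u onto v."""
    return float(np.dot(u, v) / np.linalg.norm(v))

def gp(u, v):
    """The GP pre-metric as we reconstruct it: the normalised projection
    while P(u, v) falls short of ||v||, otherwise 1 + ||v||/||u|| - cos(u, v).
    Both branches lie in [0, 1]."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    p = proj(u, v)
    if p < nv:
        return p / nv
    return 1.0 + nv / nu - cosine(u, v)

u = np.array([3.0, 1.0])
v = np.array([1.0, 1.0])
print(gp(u, v), gp(v, u))   # ~0.553 vs ~0.4: GP is asymmetric
```

The final line is the point of the exercise: swapping the arguments changes the value, which is exactly the asymmetry that R1 requires and that cosine cannot supply.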
Constructing Corpus Based Association Networks

This section describes an abstract algorithm to compute a CAN, using the notation shown in Table 3. A CAN is based around a target word t.

Table 3: Notation

u       A word
u       The vector representation for u
uA      The set of associates for u
mna     The maximum number of associates permitted in uA
S(u,v)  Method to measure the strength between u and v
Sτ      Minimum threshold value for S(u, v)
uN      Word Association Network for u
uM      Adjacency matrix used to represent uN
t       A target word
T       Set of target words, T ⊂ V
V       Vocabulary of words

The first step is to compute the list of associates tA based on t. In order to compute this list, the vector representation t is compared to the vector representation of every other word v (v ∈ V) using a measure of association S(u, v), which can be either cosine or GP. For an associate to be added to the list, the strength of association must be greater than or equal to a threshold value: S(u, v) ≥ Sτ. This ensures the target has an association with all associates in tA, thus satisfying requirement R2. The threshold is a parameter which is empirically set per measure (cosine or GP).

A word t's network tN is constructed by taking t's associate list tA, computing the strengths between each directed pair (u, v), u ≠ v, and including those strengths for which S(u, v) ≥ Sτ. The results are stored in tM so that tM(u, v) = S(u, v). This process is formalized by Algorithm 0.1.

Algorithm 0.1: CAN(t, tA)
  tA := tA ∪ {t}
  for each u ∈ tA
    for each v ∈ tA, v ≠ u
      if S(u, v) ≥ Sτ then tM(u, v) := S(u, v)

Consider the following example with a target word t and the associate list tA = {a1, a2}, and assume the following two associations among the associates are above the threshold: S(a2, a1) = S2,1 ≥ Sτ and S(a2, t) = S2,t ≥ Sτ, with all other associations S(a, b) = 0. Applying Algorithm 0.1, the first step is to add the target t as a default element to its associate list, i.e., tA = {t, a1, a2}. The next step is to consider the associations that each member of tA has with one another and keep those for which S(a, b) ≥ Sτ:

u = t:  v = a1: S(t, a1) = St,1 ≥ Sτ → tM(t, a1) = St,1
        v = a2: S(t, a2) = St,2 ≥ Sτ → tM(t, a2) = St,2
u = a1: v = t:  S(a1, t) = 0 → tM(a1, t) = 0
        v = a2: S(a1, a2) = 0 → tM(a1, a2) = 0
u = a2: v = t:  S(a2, t) = S2,t ≥ Sτ → tM(a2, t) = S2,t
        v = a1: S(a2, a1) = S2,1 ≥ Sτ → tM(a2, a1) = S2,1

(Here S(t, a1) and S(t, a2) exceed the threshold by construction, since a1 and a2 are members of tA.) The matrix returned by the algorithm, tM, is shown in Table 2.

Table 2: Adjacency matrix (tM) for t

        t      a1     a2
t       0      St,1   St,2
a1      0      0      0
a2      S2,t   S2,1   0
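Algorithm 0.1 is straightforward to render as runnable code. The sketch below is ours: `strength_fn` stands in for S (cosine or GP over the PPMI vectors), and the associate list is assumed to have been computed and thresholded beforehand, as described above.

```python
from itertools import permutations

def build_can(t, associates, strength_fn, s_tau):
    """Algorithm 0.1: given target t and its associate list tA,
    weight every directed pair whose strength clears the threshold."""
    nodes = [t] + [a for a in associates if a != t]    # tA := tA ∪ {t}
    t_m = {u: {v: 0.0 for v in nodes} for u in nodes}  # adjacency matrix tM
    for u, v in permutations(nodes, 2):                # ordered pairs, u != v
        s = strength_fn(u, v)
        if s >= s_tau:
            t_m[u][v] = s
    return t_m

# Reproducing the worked example with stipulated strengths:
S = {("t", "a1"): 0.2, ("t", "a2"): 0.1, ("a2", "t"): 0.7, ("a2", "a1"): 0.6}
tm = build_can("t", ["a1", "a2"], lambda u, v: S.get((u, v), 0.0), 0.05)
```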
Empirical Evaluation

The evaluation aims to address two questions: to what degree do CANs approximate FANs with respect to (1) their ability to quantitatively predict human word associations and (2) their structural network characteristics?

Quantitative Prediction of Word Associations

In order to evaluate the quality of associations in CANs, we analyzed the degree to which free associates from the USF norms appear in the associate list tA for all targets t. To this end we adopt the approach and corpus used to evaluate the Topic Model (Griffiths et al., 2007).

Materials: In generating the vector representations, the Touchstone Applied Science Associates (TASA) corpus was used with a standard stop word list. This corpus comprises 916060 documents. The set of target words T comprised the full 4068 target words present in the University of South Florida (USF) word association norms (Nelson et al., 2004). The baseline models for comparison are the Topic Model (Griffiths et al., 2007) and Latent Semantic Analysis (LSA) (Dumais, 2004). The Topic Model is a corpus-based approach to semantic representation which ascribes probabilities to words with respect to latent contexts called "topics". The model allows asymmetric word associations to be computed and has been evaluated on the USF word association norms. The LSA model was chosen as it is a common corpus-based benchmark that uses the cosine metric.

Procedure: The procedure involves taking each of the 4068 target words and computing the PPMI vector representation using the method described in the section "Vector Representations of Words". The size of the resulting vocabulary V was 47059 words, which is the dimensionality of the vector representations. The vocabulary was constructed by taking all words in the TASA corpus (not including stop words) and only considering those with a term frequency greater than 10 (as used with the Topic Model). Thereafter, the association strength between the target and all words of the vocabulary is computed. This list is then sorted (in descending order) by association strength, and the rank/position of the target word's first associate is found. The first associate is the associate of the target word (from the USF data) that has the strongest forward relationship. For example, in Figure 1, a1 has the strongest forward relationship to t, being S(t, a1) = 0.2, and thus would be the first associate for t. The probability of finding the first associate within the top m associates is computed as Pr(m) = n_m / n_T, where n_m is the number of first associates produced whose rank is ≤ m and n_T is the number of target words.

The cosine metric and the GP pre-metric were evaluated in this way for six different values of m (m ∈ M = {1, 5, 10, 25, 50, 100}), and the results were compared with the published results of LSA and the Topic Model documented in Griffiths et al. (2007). In order to determine the best performance, a simple method was introduced which sums the probabilities across the different values of m: P = \sum_{m \in M} Pr(m). Best performing results for CAN (cosine) are reported with window size w = 3. For CAN (GP) the best performing results were achieved with w = 6.
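The evaluation procedure just described amounts to a ranking exercise. Below is a compact sketch of it (ours; the argument names are hypothetical, and each target's USF first associate is assumed to be in-vocabulary).

```python
def first_associate_probabilities(targets, vocab, strength_fn,
                                  first_associate, ms=(1, 5, 10, 25, 50, 100)):
    """Pr(m) = n_m / n_T: the fraction of targets whose USF first associate
    appears within the top m words when the vocabulary is sorted by
    descending association strength to the target (a sketch)."""
    ranks = []
    for t in targets:
        ordered = sorted((w for w in vocab if w != t),
                         key=lambda w: strength_fn(t, w), reverse=True)
        ranks.append(ordered.index(first_associate[t]) + 1)
    pr = {m: sum(r <= m for r in ranks) / len(targets) for m in ms}
    pr["P"] = sum(pr[m] for m in ms)   # summary score P = sum over m of Pr(m)
    return pr
```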
Figure 2: Probabilities for producing the first USF associate modulo the size of the associate list m

Results: The results are presented in Figure 2. The P values for each of the four methods are: P_CAN-COS = 2.7155, P_LSA-COS = 2.4568, P_Topic-Model = 2.7818, and P_CAN-GP = 2.5932.

Of the four, the Topic Model produces the best results, followed closely by CAN (cosine). In comparing the baseline methods, CAN (cosine) outperforms LSA. In comparing the asymmetric measures, the Topic Model is slightly more effective than CAN (GP). Given that we are primarily interested in the asymmetric measures of association, we observe that the performance of the Topic Model on first associates for lower m values is considerably better than CAN (GP); however, this behavior does not continue for larger m values, where CAN (GP) approaches and then slightly supersedes the effectiveness of the Topic Model.

Comparison of CANs vs FANs using structural network characteristics

Materials: The corpus used for testing was Wikipedia 2008, which comprises 61998051 documents. Wikipedia was chosen as it allows the CAN algorithm to be tested on a very large corpus of text. The set of target words T used was the 4068 target words present in the University of South Florida (USF) word association norms (Nelson et al., 2004). Each word has a corresponding PPMI vector representation computed using the method described in the section "Vector Representations of Words". The baseline for comparison is the 4068 FANs in the USF norms.

Procedure: A PPMI vector representation for each target word was computed using the method described in the section "Vector Representations of Words". The size of the resulting vocabulary V was 255460 words, which is the dimensionality of the vector representations. The procedure involved generating a CAN for each target word using Algorithm 0.1 with GP as the measure used to compute the associations. (CANs were not constructed with cosine, as this measure is symmetric.) The CANs were generated with mna ≤ 50, where mna refers to the maximum number of associates a target can have in its CAN. This value was chosen because it is the maximum number of associates encountered across all target words in the USF word association norms.

The structural network characteristics (see Table 4) used for evaluation are derived from the CAN's adjacency matrix (tM). These characteristics are well known in network analysis and have been used to analyze the USF word association norms (Steyvers & Tenenbaum, 2005). The mean, median, and standard deviation (sample size = 4068) are calculated for each of these network characteristics. The standard deviation is used to assess the stability of the mean and median.

Table 4: Structural Characteristics

n     The number of nodes in the network.
d     The network density.
L     The average minimum distance between nodes.
<k>   The average number of connections for each node.
C     The clustering coefficient for the network.
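For reference, the Table 4 characteristics can be computed from a CAN's adjacency matrix along the following lines. This is a sketch using networkx; the paper does not state its implementation, so choices such as computing L over reachable ordered pairs only are our assumptions.

```python
import networkx as nx

def structural_characteristics(t_m):
    """Table 4 characteristics from a CAN adjacency matrix tM
    (dict-of-dicts as returned by build_can; zero entries mean no edge).
    A sketch only -- the paper does not spell out its implementation."""
    g = nx.DiGraph()
    g.add_weighted_edges_from((u, v, s) for u, row in t_m.items()
                              for v, s in row.items() if s > 0)
    n = g.number_of_nodes()
    d = nx.density(g)                          # realised fraction of possible edges
    k = sum(deg for _, deg in g.degree()) / n  # mean connections per node
    c = nx.average_clustering(g)               # clustering coefficient C
    # L: average minimum distance, taken over reachable ordered pairs,
    # since an individual CAN or FAN need not be strongly connected.
    dists = [dd for src, lengths in nx.all_pairs_shortest_path_length(g)
             for dst, dd in lengths.items() if dst != src]
    l = sum(dists) / len(dists)
    return {"n": n, "d": d, "L": l, "<k>": k, "C": c}
```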
Results: Table 5 shows that the GP measure has strengths and weaknesses in replicating the network dimension n of the FANs. Whilst the CANs over-fit the mean, they produce a perfect median value. There is a quite large standard deviation, which may be due to the fact that it is much easier to establish associations in corpus-based processing than humans are able to in free association experiments. We can conclude that whilst the CANs' ability to replicate the FANs is quite good, there is a larger spread in the number of nodes.

Table 5: Network Dimension (n)

         USF    GP
Mean     14     16.23
Median   14     14
St Dev   4.7    10.89

Table 6 shows that the mean and median network density d of the FANs is closely matched by the CANs. Not only is the GP measure a good predictor of the mean and median, its standard deviation is relatively small, indicating stability.

Table 6: Network Density (d)

         USF    GP
Mean     0.23   0.2
Median   0.21   0.15
St Dev   0.11   0.14

Table 7 reveals that the mean and median average minimum distance between nodes in the FANs is under-fitted by the CANs, but the result is stable. This is to be expected given the structure of the USF FANs. These FANs are generally quite sparse except in two respects: firstly, all associates have a forward association to the target (as per R2), and secondly, it is a common theme that backward relationships (to the target) also exist (though these can be of very low weight). Consequently, the majority of associates in a USF FAN are connected to the target by both a forward and a backward connection, which allows easier traversal between any two nodes in the FAN, resulting in a low L value. The pattern of forward connections is replicated by the CANs (R2) and is strongly desired when replicating FANs (small world behavior). The lower L value for the GP-generated CANs indicates that traversal between nodes in a CAN is easier than in a FAN. Given that the densities for FANs and CANs are almost identical (as illustrated in Table 6), and that both have forced forward connections to the target, the difference in structure probably lies in the non-target nodes being, on average, more interconnected in the CANs than in the FANs. This higher degree of interconnectedness provides more opportunities for traversal through the network and thus a lower L value.

Table 7: Average Minimum Distance Between Nodes (L)

         USF    GP
Mean     1.79   1.19
Median   1.76   1.05
St Dev   0.36   0.32

Table 8 shows that the mean and median average number of connections in the FANs is over-fitted by the CANs and is quite unstable. On average, the number of associate-to-associate relationships is greater for CANs than for FANs, which is consistent with our preceding conjecture that the non-target nodes of CANs are more interconnected than those of FANs. Again, a possible explanation is that in corpus-based techniques it is generally much easier to establish associations between words. Whether this is a result of the PPMI representation, the large size of the corpus, and/or a consequence of the GP pre-metric is currently under investigation.

Table 8: Average Number of Connections (<k>)

         USF    GP
Mean     1.12   2.34
Median   1.14   1.81
St Dev   0.15   1.94

Table 9 shows that the mean and median clustering coefficient C of the FANs are under-fitted by the CANs. The clustering coefficient measures the average density of the localized sub-network around each node. Although we have observed that words appear to be more connected in CANs than in FANs (as seen in Tables 7 and 8), and there are therefore likely to be, on average, more sub-networks in CANs, the density of these sub-networks around a node is smaller than in FANs. The direct cause of this is unknown at this stage.

Table 9: Clustering Coefficient (C)

         USF    GP
Mean     0.44   0.31
Median   0.43   0.32
St Dev   0.10   0.16
Discussion

The first component of the analysis evaluated the degree to which CANs can quantitatively predict human word associations. Two models were used as baselines for comparison: the Topic Model and LSA. The results revealed the following findings.

CANs extracted using both the cosine metric and the GP pre-metric outperform LSA, though the differences are small. The Topic Model outperforms CAN (GP pre-metric) and CAN (cosine) at higher levels of precision. At lower levels of precision, CAN (cosine) outperforms the Topic Model. That being said, all models are poor at generating the FANs' first associate at maximal precision (i.e., when m = 1). The cosine metric in conjunction with corpus-based vectors like PPMI has been shown in many studies to have a predisposition to compute semantic associations (e.g., Lund & Burgess, 1996; Dumais, 2004). As there are many cases where the first associate is not semantically associated with the target, it is therefore challenging for such associates to be ranked first based on a PPMI representation. Clearly, the asymmetry of the GP pre-metric could not mitigate the predisposition of the PPMI vector representations to compute associations of a semantic nature. Conversely, the Topic Model is better at predicting first associates, perhaps because the conditional probabilities pick up associations which are broader in nature than semantic associations.

Currently the CAN method creates vector representations for words in Euclidean space. In doing so, established metrics of Euclidean space (e.g., the cosine metric) can be used to compute word associations. These metrics must satisfy four axioms: (1) d(a, b) = d(b, a), (2) d(a, a) = 0, (3) d(a, b) ≥ 0, and (4) d(a, b) ≤ d(a, c) + d(c, b), where d(a, b) denotes the distance between points a and b in the space. Tversky challenged this assumption and found empirical evidence that symmetry (1) and the triangle inequality (4) are violated. Tversky argued that these violations implied that words do not act like points in Euclidean space (Tversky & Gati, 1982). Although the vectors for the CANs are in Euclidean space, the GP pre-metric does not base the degree of association on the distance between points in the space, but rather on the degree of projection between the respective vectors.

The second component of the analysis assessed the structural similarities of the FANs and the CANs. A set of well known network characteristics was employed to measure performance. It was found that the CANs built using the GP pre-metric performed encouragingly well at replicating the structural features of the FANs; however, issues of stability and of under- and over-fitting the network characteristics need to be investigated in more detail. Structural analysis of the USF norms has been performed previously (Griffiths et al., 2007); however, instead of analyzing the individual networks (as done in this analysis), the networks were aggregated into a single global network which was then subjected to network analysis. The focus of this study was different: we were interested in how well FANs based on individual target words can be structurally replicated. For this reason, the small world network characteristic γ (used in P(k) = k^{-γ}) was not investigated, because this characteristic is more meaningfully applied to a global network than to small individual networks.

The brute-force style strategy employed to isolate the optimal parameters for the structural analysis could be improved. Whilst it does converge to the optimal set of solutions, it is computationally inefficient and does not explore the stability of each set of solutions, nor does it assign weightings to individual parameters. Lastly, the USF norms were collected over three decades and were primarily sourced from students who attended the University of South Florida. As a consequence, the norms suffer from temporal and geographical bias. To overcome this bias, a new collection of FANs built by the University of Leuven could be used as a more comprehensive and contemporary baseline of human word association data (Simon et al., 2013).
Conclusion

The aim of this paper was to investigate to what degree corpus-based semantic methods can be used to derive weighted networks of words which approximate human free association networks (FANs), in relation to both structural network characteristics and the ability to quantitatively predict human word associations. We conclude that corpus-based methods can approximate the structural characteristics of FANs to an encouraging degree when a thresholded asymmetric measure based on vector projection is used to construct the network. The degree to which corpus-based procedures can replicate human word associations is still questionable. When benchmarked against two corpus-based models, CANs produced similar effectiveness. At this stage we conclude that when term co-occurrence statistics are used to provide vector representations, the performance of the symmetric cosine metric cannot be differentiated from that of an asymmetric measure based on vector projection. The difference in performance between CANs and the benchmark models is small, from which we conclude that CANs (cosine and GP) do show promise for further development.

References

Bullinaria, J. A., & Levy, J. P. (2007). Extracting semantic representations from word co-occurrence statistics: A computational study. Behavior Research Methods, 39, 510-526.

Collins, A., & Loftus, E. (1975). A spreading-activation theory of semantic processing. Psychological Review, 82(6), 407-428.

Dumais, S. (2004). Latent Semantic Analysis. Annual Review of Information Science and Technology, 38, 189-200.

Griffiths, T., Steyvers, M., & Tenenbaum, J. (2007). Topics in semantic representation. Psychological Review, 114(2), 211-244.

Landauer, T., Foltz, P., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2&3), 259-284.

Lund, K., & Burgess, C. (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments & Computers, 28(2), 203-208.

Nelson, D., Kitto, K., Galea, D., McEvoy, C., & Bruza, P. (2013). How activation, entanglement, and searching a semantic network contribute to event memory. Memory & Cognition, 41(6), 797-819.

Nelson, D., McEvoy, C., & Schreiber, T. (2004). The University of South Florida free association, rhyme and word fragment norms. Behavior Research Methods, Instruments & Computers, 36, 408-420.

Nelson, D., Schreiber, T., & McEvoy, C. (1992). Processing implicit and explicit representations. Psychological Review, 99(2), 322-348.

Pothos, E., Busemeyer, J., & Trueblood, J. (2013). A quantum geometric model of similarity. Psychological Review, 120(3).

Simon, D., Navarro, D., & Storms, G. (2013). Better explanations of lexical and semantic cognition using networks derived from continued rather than single-word associations. Behavior Research Methods, 45, 480-498.

Steyvers, M., & Tenenbaum, J. (2005). The large scale structure of semantic networks: Statistical analyses and a model of semantic growth. Cognitive Science, 29(1), 41-78.

Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141-188.

Tversky, A., & Gati, I. (1982). Similarity, separability and the triangle inequality. Psychological Review, 89, 123-154.