=Paper= {{Paper |id=Vol-2314/paper4 |storemode=property |title=Taking into account semantic similarities in correspondence analysis |pdfUrl=https://ceur-ws.org/Vol-2314/paper4.pdf |volume=Vol-2314 |authors=Mattia Egloff,François Bavaud |dblpUrl=https://dblp.org/rec/conf/comhum/EgloffB18 }} ==Taking into account semantic similarities in correspondence analysis== https://ceur-ws.org/Vol-2314/paper4.pdf
     Taking into account semantic similarities in correspondence analysis

                                  Mattia Egloff1 , François Bavaud1, 2
    1 Department of Language and Information Sciences, University of Lausanne, Switzerland
         2 Institute of Geography and Sustainability, University of Lausanne, Switzerland

                                     {megloff1, fbavaud}@unil.ch




Abstract

Term-document matrices feed most distributional approaches to quantitative textual studies, without consideration for the semantic similarities between terms, whose presence arguably reduces content variety. This contribution presents a formalism remedying this omission, and makes explicit use of the semantic similarities as extracted from WordNet. A case study in similarity-reduced correspondence analysis illustrates the proposal.

1    Introduction

The term-document matrix $N = (n_{ik})$ counts the occurrences of $n$ terms in $p$ documents, and constitutes the privileged input of most distributional studies in quantitative textual linguistics: chi2 dissimilarities between terms or documents, distance-based clustering of terms or documents, multidimensional scaling (MDS) on terms or documents; and, also, latent clustering by non-negative matrix factorization (e.g., Lee and Seung, 1999) or topic modeling (e.g., Blei, 2012); as well as nonlinear variants resulting from transformations of the independence quotients, as in the Hellinger dissimilarities, or transformations of the chi2 dissimilarities themselves (e.g., Bavaud, 2011).

When using the term-document matrix, the semantic link between words is only indirectly addressed through the celebrated “distributional hypothesis,” postulating an association between distributional similarity (the neighbourhood or closeness of words in a text) and meaning similarity (the closeness of concepts) (Harris, 1954) (see also, e.g., Sahlgren, 2008; McGillivray et al., 2008). Although largely accepted and much documented, the distributional hypothesis seems hardly ever to be studied in an explicit way, typically by computing and comparing the average semantic similarity within documents or contexts to the average semantic similarity between documents or contexts, which supposes the recourse to some hand-crafted semantics, fairly unavailable at the time of Harris’ writings.

The present short study distinguishes both kinds of similarities and constitutes at this stage a proof of concept oriented towards the formalism and the conceptualization rather than large-scale applications, in the general spirit of the COMHUM 2018 conference. It yields a new measure of textual variety taking explicitly into account the semantic similarities between terms, and makes it possible to weigh the usage of the semantic similarity when analyzing the term-document matrix.

2    Data

After manually extracting the paragraphs of each of the $p = 11$ chapters of Book I of “An Inquiry into the Nature and Causes of the Wealth of Nations” by Adam Smith (Smith, 1776) (a somewhat arbitrary choice among myriads of other possibilities), we tagged the part of speech and lemma of each word of the corpus using the nlp4j tagger (Choi, 2016). Subsequently we created a lemma-chapter matrix, retaining only the type of words serving a specific task, such as verbs. Terms $i$, $j$ present in the chapters were then associated with their first conceptual senses $c_i$, $c_j$, that is with their first WordNet synsets (Miller, 1995). We inspected several similarity matrices $\hat{s}_{ij} = \hat{s}(c_i, c_j)$ between pairs of concepts $c_i$ and $c_j$.

3    Semantic similarities

A few approaches for computing similarities between words have been proposed in the literature (see, e.g., Gomaa and Fahmy, 2013). Recent measures use word embeddings (Kusner et al., 2015),

Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)


and though these approaches are successful at resolving other NLP tasks, they suffer from some drawbacks in computing semantic similarity (Faruqui et al., 2016). Also, the latter methods are directly based on the distributional hypothesis, and hence precisely ill-suited to distinguishing between distributional and semantic dissimilarities.

By contrast, the present paper uses WordNet, a human-constructed ontology. The classical similarities $\hat{s}(c_i, c_j)$ between two concepts $c_i$ and $c_j$ computed on WordNet take on different forms. The conceptually easiest is the path similarity, defined from the number $\ell(c_i, c_j) \ge 0$ of edges of the shortest path (in the WordNet hierarchy) between $c_i$ and $c_j$ as follows:

\[ \hat{s}_{\mathrm{path}}(c_i, c_j) = \frac{1}{1 + \ell(c_i, c_j)} \tag{1} \]

The Leacock-Chodorow similarity (Leacock and Chodorow, 1998) is based on the same principle but also considers the maximum depth $D = \max_i \ell(c_i, 0)$ of the concepts in the WordNet taxonomy (where 0 represents the root of the hierarchy, occupied by the concept subsuming all the others):

\[ \hat{s}_{\mathrm{lch}}(c_i, c_j) = -\log \frac{\ell(c_i, c_j)}{2D} \]

The Wu-Palmer similarity (Wu and Palmer, 1994) is based on the notion of lowest common subsumer $c_i \vee c_j$, that is the least general concept in the hierarchy that is a hypernym or ancestor of both $c_i$ and $c_j$:

\[ \hat{s}_{\mathrm{wup}}(c_i, c_j) = \frac{2\,\ell(c_i \vee c_j, 0)}{\ell(c_i, 0) + \ell(c_j, 0)} \]

The following similarities are further based on the concept of Information Content, proposed by Resnik (Resnik, 1993b,a). The Information Content of a concept $c$ is defined as $-\log p(c)$, where $p(c)$ is the probability of encountering the concept $c$ in a reference corpus. The Resnik similarity (Resnik, 1995) is defined as:

\[ \hat{s}_{\mathrm{res}}(c_i, c_j) = -\log p(c_i \vee c_j) \]

The Lin similarity (Lin, 1998) is defined as:

\[ \hat{s}_{\mathrm{lin}}(c_i, c_j) = \frac{2 \log p(c_i \vee c_j)}{\log p(c_i) + \log p(c_j)} \]

Finally, the Jiang-Conrath similarity (Jiang and Conrath, 1997) is defined as:

\[ \hat{s}_{\mathrm{jch}}(c_i, c_j) = \frac{1}{-\log p(c_i) - \log p(c_j) + 2 \log p(c_i \vee c_j)} \]

and obeys $\hat{s}_{\mathrm{jch}}(c_i, c_i) = \infty$.

Among the above similarities, the path, Wu-Palmer and Lin similarities obey the conditions

\[ \hat{s}_{ij} = \hat{s}_{ji} \ge 0 \quad\text{and}\quad \hat{s}_{ii} = 1 . \tag{2} \]

In what follows, we shall use the path similarities when required.

4    A similarity-reduced measure of textual variety

Let $f_i \ge 0$ be the relative frequency of term $i$, normalized to $\sum_{i=1}^{n} f_i = 1$. Shannon entropy $H = -\sum_i f_i \ln f_i$ constitutes a measure of relative textual variety, ranging from 0 (a single term repeats itself) to $\ln n$ (all terms are different). Yet, the entropy does not take into account the possible similarity between the terms, in contrast to the reduced entropy $R$ (our nomenclature) defined as

\[ R = -\sum_{i=1}^{n} f_i \ln b_i \quad\text{where}\quad b_i = \sum_{j=1}^{n} \hat{s}_{ij} f_j . \tag{3} \]

In Ecology, $b_i$ is the banality of species $i$, measuring its average similarity to other species (Marcon, 2016), proposed by Leinster and Cobbold (2012), as well as by Ricotta and Szeidl (2006). By construction, $f_i \le b_i \le 1$ and thus $R \le H$: the larger the similarities, the lower the textual variety as measured by the reduced entropy, as desired.

Returning to the case study, we have, out of the 643 verb lemmas initially present in the corpus, retained the $n = 234$ verb lemmas occurring at least 5 times (“be” and “have” excluded). Overall term weights $f_i$, chapter weights $\rho_k$ and term weights $f_{ik}$ within a chapter are obtained from the $n \times p = 234 \times 11$ term-document matrix $N = (n_{ik})$ as

\[ f_i = \frac{n_{i\bullet}}{n_{\bullet\bullet}} \qquad \rho_k = \frac{n_{\bullet k}}{n_{\bullet\bullet}} \qquad f_{ik} = \frac{n_{ik}}{n_{\bullet k}} \tag{4} \]

The corresponding entropies and reduced entropies read $H = 4.98 > R = 1.60$. For each chapter, the corresponding quantities are depicted in figure 1. One can observe the so-called concavity property $H > \sum_k \rho_k H_k$ (always verified) and $R > \sum_k \rho_k R_k$ (verified here), which says that the variety of the whole is larger than the average variety of its constituents.
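The path similarity (1) and the reduced entropy (3) can be illustrated with a minimal sketch. The taxonomy, the term list and the counts below are made-up toy values (they are not WordNet data and not the corpus of the case study), and the function names are hypothetical; the sketch only assumes a tree-shaped hierarchy, in which the shortest path between two concepts passes through their lowest common ancestor.

```python
import math

# Hypothetical toy taxonomy (child -> parent); it only mimics a hierarchy
# and is NOT taken from WordNet.
PARENT = {
    "act": None,
    "move": "act", "change": "act",
    "run": "move", "walk": "move", "grow": "change",
}


def ancestors(concept):
    """Return the chain [concept, parent, ..., root]."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def path_length(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    chain_b = ancestors(b)
    for steps_from_a, node in enumerate(ancestors(a)):
        if node in chain_b:
            return steps_from_a + chain_b.index(node)
    raise ValueError("concepts do not share a root")


def path_similarity(a, b):
    """Path similarity of equation (1): 1 / (1 + path length)."""
    return 1.0 / (1.0 + path_length(a, b))


def shannon_entropy(f):
    """H = -sum_i f_i ln f_i (terms with zero weight contribute nothing)."""
    return -sum(fi * math.log(fi) for fi in f if fi > 0)


def banalities(f, s):
    """b_i = sum_j s_ij f_j, the average similarity of term i to all terms."""
    return [sum(sij * fj for sij, fj in zip(row, f)) for row in s]


def reduced_entropy(f, s):
    """R = -sum_i f_i ln b_i, equation (3)."""
    return -sum(fi * math.log(bi)
                for fi, bi in zip(f, banalities(f, s)) if fi > 0)


terms = ["run", "walk", "grow", "move"]
counts = [4, 3, 2, 1]                      # made-up term counts
f = [c / sum(counts) for c in counts]      # relative frequencies, sum to 1
S = [[path_similarity(a, b) for b in terms] for a in terms]

H = shannon_entropy(f)
R = reduced_entropy(f, S)
print(f"H = {H:.3f}, R = {R:.3f}")
print(f"exp(H) = {math.exp(H):.2f}, exp(R) = {math.exp(R):.2f}")
```

Two limiting cases serve as a sanity check: with the identity matrix as similarity (completely dissimilar terms) the banalities reduce to $b_i = f_i$ and $R = H$, whereas with all similarities equal to 1 one gets $b_i = 1$ and $R = 0$.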




Shannon variety $N_{\mathrm{Shannon}} = \exp(H) \le n$ represents the equivalent number of distinct types in a uniformly constituted corpus of the same richness or diversity (in the entropy sense) as the currently examined corpus. Likewise, the reduced variety $N_{\mathrm{reduced}} = \exp(R) \ll N_{\mathrm{Shannon}}$ measures the equivalent number of types if the latter were uniformly distributed and completely dissimilar (that is $s_{ij} = 0$ for $i \ne j$): see figure 2.

Figure 1: Entropies $H_k$ and reduced entropies $R_k$ for each chapter $k$; dashed lines depict $H$ and $R$. [Plot omitted; horizontal axis: chapter (1 to 11); vertical axis from 0 to 5; legend: Shannon entropy $H_k$, reduced entropy $R_k$.]

Figure 2: Shannon varieties $\exp(H_k)$ and reduced varieties $\exp(R_k)$ for each chapter $k$; dashed lines depict $\exp(H)$ and $\exp(R)$. [Plot omitted; horizontal axis: chapter (1 to 11); vertical axis from 0 to 140; legend: Shannon variety $\exp(H)$, reduced variety $\exp(R)$.]

5    Ordinary correspondence analysis (recall)

Correspondence analysis (CA) permits a simultaneous representation of terms and documents in the so-called biplot (figure 3). CA results from weighted multidimensional scaling (MDS) applied to the chi2 dissimilarities $D^{\chi}_{kl}$ between documents $k$ and $l$

\[ D^{\chi}_{kl} = \sum_{i=1}^{n} f_i (q_{ik} - q_{il})^2 \quad\text{where}\quad q_{ik} = \frac{n_{ik}\, n_{\bullet\bullet}}{n_{i\bullet}\, n_{\bullet k}} \tag{5} \]

or equivalently, from MDS applied to the chi2 dissimilarities between terms. Note that the $q_{ik}$ in (5) are the independence quotients, that is the ratios of the observed counts to their expected values under independence. Figure 3 constitutes the two-dimensional projection of a weighted Euclidean configuration of $\min(234 - 1, 11 - 1) = 10$ dimensions, expressing a maximal proportion of $0.17 + 0.15 = 32\%$ of dispersion or inertia $\Delta = \frac{1}{2} \sum_{kl} \rho_k \rho_l D^{\chi}_{kl}$.
[Biplot (plot omitted); vertical axis: dimension 2, proportion of inertia = 0.15; the points are labelled with verb lemmas.]

Similarity-reduced correspondence analysis

In the case where documents k and l, differing by the presence of distinct terms, contain semantically similar terms, the “naive” chi2 dissimilarity (5),
                                                                                                                                                       inform
                                                                                                                                               acknowledge
                                                                                                                                        diminish
                                                                                                                                                                     which implicitly assumes distinct terms to be com-
                                                                 educate
                                                                       serve    enter sail     grant          send  require
                                                                                                                     dividecome
                                                                                                                           occasion
                                                               earn
                                                               hire
                                                            attempt
                                                                             restrain
                                                                           learn
                                                                              bind               leavesuffer  procure
                                                                                                            draw  exceed
                                                                                                                 carry   depend
                                                                                                                             fix
                                                                                                                              seem     supply
                                                                                                                                      suppose
                                                                                                                                     owego sink
                                                                                                                                  provide
                                                                                                                                    observe
                                                                                                                               become
                                                                                                                                                 concern
                                                                                                                                                        fit
                                                    incorporate
                                                       teach         obstruct
                                                                   remove                support
                                                                                      combine    understand
                                                                                               follow
                                                                                                   expose
                                                                                                    oblige
                                                                                                    choose
                                                                                                                             reducehappenaccord
                                                                                                                                         cost   judge                pletely dissimilar, arguably overestimates their dif-
                                               0




                                                                          gain              enact hinder know agree       allow
                                                                                                                         compute
                                                                                                                        reward
                                                                                                                           import
                                                                                                                     imagine   findfall
                                                                                                                              regulate
                                                                                                                         enable                grow  get
                                                                 exercise                  obtain          consider      give            sell
                                                                                                                                     begin
                                                                                                                                     continue
                                                                                                                                      keep          tell
                                                                                                                          say
                                                                                                                   endeavour   determine
                                                                                                                           render
                                                                      establish              let mention       introduce cover     compare
                                                                                                                              appear
                                                                               attend                 receive
                                                                                                     value
                                                                                                      imposearise vary
                                                                                                                   believereturn wear
                                                                                                                                                                     ference. The latter should be downsized accord-
                                                                                                                       mean              purchase
                                                                                                                                      exchange   estimate
                                                                                                        intend
                                                                                                                     possess
                                                                                                                                        want
                                                                                                                                                                     ingly, in a way both reflecting the amount of shared
                                               -1




                                                                                                                                       stand
                                                                                                                                   contain                           similarity between k and l, and still retaining the
                                                                                                                                     buyweigh
                                                                                                                                        preserve
                                                                                                                                      exportdeface
                                                                                                                     express
                                                                                                                                 reserve
                                                                                                                                                                     squared Euclidean nature of their dissimilarity – a
                                                                                                                                rate
                                                                                                                                                                     crucial requirement for the validity of MDS. This
                                               -2




                                                                                                                                        coin                         simple idea leads us to propose the following re-
                                                           -2.0             -1.5           -1.0               -0.5             0.0             0.5
                                                                                                                                                                     duced squared Euclidean distance D̂kl between doc-
                                                     dimension 1 : proportion of inertia = 0.17                                                                      uments, taking into account both the distributional
                                                                                                                                                                     and semantic differences between the documents,
Figure 3: Biplot of the 234 × 11 term-document                                                                                                                       namely
matrix. Circles depict terms and triangles depict
documents.                                                                                                                                                                       D̃kl = ∑ t̃i j (qik − qil )(q jk − q jl )    (6)
                                                                                                                                                                                           ij
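Equation (6) is a quadratic form in the difference between two document profiles, and can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `reduced_sq_distance` is hypothetical, the quantities q_ik and the weights t̃_ij are taken as given inputs (their definitions are not restated here), and the toy numbers are arbitrary rather than taken from the corpus.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def reduced_sq_distance(t_tilde: Matrix, q: Matrix, k: int, l: int) -> float:
    """Evaluate sum_ij t~_ij (q_ik - q_il)(q_jk - q_jl) for documents k and l.

    q[i][k]       : value q_ik for term i and document k (assumed given)
    t_tilde[i][j] : weight t~_ij for the term pair (i, j) (assumed given,
                    symmetric and positive semi-definite so that the result
                    is a squared Euclidean dissimilarity)
    """
    n_terms = len(q)
    # Difference between the two document profiles, term by term.
    diff = [q[i][k] - q[i][l] for i in range(n_terms)]
    return sum(
        t_tilde[i][j] * diff[i] * diff[j]
        for i in range(n_terms)
        for j in range(n_terms)
    )


# Arbitrary toy data: 3 terms, 2 documents.
q = [[1.0, 2.0],
     [3.0, 1.0],
     [0.5, 0.5]]

# Diagonal weights: distinct terms are treated as completely dissimilar.
t_diag = [[0.5, 0.0, 0.0],
          [0.0, 0.3, 0.0],
          [0.0, 0.0, 0.2]]

# Same diagonal plus an illustrative off-diagonal weight between terms 0 and 1.
t_sim = [[0.5, 0.1, 0.0],
         [0.1, 0.3, 0.0],
         [0.0, 0.0, 0.2]]

d_plain = reduced_sq_distance(t_diag, q, 0, 1)
d_sim = reduced_sq_distance(t_sim, q, 0, 1)
print(d_plain, d_sim)
```

With diagonal weights the expression collapses to a weighted sum of squared differences. In this toy case the off-diagonal weight couples two terms whose differences have opposite signs, so the second value is smaller than the first.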


Proceedings of the Workshop on Computational Methods in the Humanities 2018 (COMHUM 2018)


                                                                                                                                                                                                                                 settle
                                                                                                                                                                                                                  perform
                                                                                                                                                                                                                determinemislead
                                                                                                                                                                                                                   concern
                                                                                                                                                                                                                         destroy
                                                                                                                                                                                                                      find
                                                                                                                                                                                                                  assure
                                                                                                                                                                                                                 maintain
                                                                                                                                                                                                                  acquire    send
                                                                                                                                                                                                                              turn
                                                                                                                                                                                                                             lower
                                                                                                                                                                                                                        combine
                                                                                                                                                                                                                     open           think
                                                                                                                                                                                                                            facilitate
                                                                                                                                                                                                                                   inform
                                                                                                                                                                                                                             imagine
                                                                                                                                                                                                                                  reserve
                                                                                                                                                                                                                                   run
                                                                                                                                                                                                                                 expect
                                                                                                                                                                                                                                 replace
                                                                                                                                                                                                                          endeavour
                                                                                                                                                                                                                              take
                                                                                                                                                                                                                        preserve
                                                                                                                                                                                                                               live
                                                                                                                                                                                                                               impose
                                                                                                                                                                                                                              repose
                                                                                                                                                                                                                                divide
                                                                                                                                                                                                                              affect
                                                                                                                                                                                                                              augment rear
                                                                                                                                                                                                                                      reward
                                                                                                                                                                                                                                 collect
                                                                                                                                                                                                                                introduce
                                                                                                                                                                                                                                estimate
                                                                                                                                                                                                                                 render
                                                                                                                                                                                                                            prepare
                                                                                                                                                                                                                    agree
                                                                                                                                                                                                                  use
                                                                                                                                                                                                                understandattempt
                                                                                                                                                                                                                            enable
                                                                                                                                                                                                                            import
                                                                                                                                                                                                                             compensate
                                                                                                                                                                                                                      exercise
                                                                                                                                                                                                                    arrive      restrain
                                                                                                                                                                                                                          inhabit
                                                                                                                                                                                                                            follow
                                                                                                                                                                                                                              grow
                                                                                                                                                                                                                         obstruct
                                                                                                                                                                                                                              draw
                                                                                                                                                                                                                        advance
                                                                                                                                                                                                                     givereceive   suppose
                                                                                                                                                                                                                                      bestow
                                                                                                                                                                                                                                      fix
                                                                                                                                                                                                                                       enact
                                                                                                                                                                                                                                     rate
                                                                                                                                                                                                                                 starve
                                                                                                                                                                                                                  separate
                                                                                                                                                                                                                    wear
                                                                                                                                                                                                                  connect
                                                                                                                                                                                                                  amount
                                                                                                                                                                                                                 applyapproach
                                                                                                                                                                                                                     enjoy
                                                                                                                                                                                                                    want
                                                                                                                                                                                                                       compose
                                                                                                                                                                                                                    allow
                                                                                                                                                                                                                   exceed
                                                                                                                                                                                                                     owe        carrygrant
                                                                                                                                                                                                                                  write
                                                                                                                                                                                                                                resolve
                                                                                                                                                                                                                              occasion
                                                                                                                                                                                                                             bear
                                                                                                                                                                                                                              inclose
                                                                                                                                                                                                                                put    add
                                                                                                                                                                                                                                   weigh
                                                                                                                                                                                                                                diminish
                                                                                                                                                                                                                                     die
                                                                                                                                                                                                                              discourage
                                                                                                                                                                                                                         possess
                                                                                                                                                                                                                         publish
                                                                                                                                                                                                                      lose       gainbind
                                                                                                                                                                                                                              discover
                                                                                                                                                                                                                               educate mean
                                                                                                                                                                                                                                 multiply
                                                                                                                                                                                                                                 extend
                                                                                                                                                                                                                             arise
 dimension 2 : proportion of inertia = 0.18




                                                                                                                            dimension 2 : proportion of inertia = 0.04
                                                                                                                                                                                                                    begin
                                                                                                                                                                                                                    keep  hinder
                                                                                                                                                                                                                        transport
                                                                                                                                                                                                                        manage   exhaust
                                                                                                                                                                                                                             return
                                                                                                                                                                                                                          export
                                                                                                                                                                                                                               buy
                                                                                                                                                                                                                              procure
                                               0.15




                                                                                                                                                                                                                       get
                                                                                                                                                                                                                     vary
                                                                                                                                                                                                                  precede
                                                                                                                                                                                                                    work
                                                                                                                                                                                                                     enter
                                                                                                                                                                                                                   remove
                                                                                                                                                                                                                  happen  attend
                                                                                                                                                                                                                         execute
                                                                                                                                                                                                                    demonstrate
                                                                                                                                                                                                                   support
                                                                                                                                                                                                                       let




                                                                                                                                                                         0.0
                                                                                                                                                                                                                    learn
                                                                                                                                                                                                                 constitute
                                                                                                                                                                                                                 complain    account
                                                                                                                                                                                 do                                         confound
                                                                                                                                                                                                                                 tend
                                                                                                                                                                                make                                          dependstand
                                                                                                                                                                                                                                  cost
                                                                                                                                                                                                                               remain
                                               0.10




                                                                                                                                                                         -0.1
                                                                                                                                                                                                                            belong
                                                                                                               1
                                               0.05




                                                                                      39




                                                                                                                                                                         -0.2
                                                               11
                                               0.00




                                                                                                                                                                         -0.3
                                                                                8
                                                                                      2            10
                                               -0.05




                                                                                                                                                                         -0.4
                                               -0.10




                                                                       5




                                                                                                                                                                         -0.5
                                                                                                                                                                                                                                   seem
                                                                                                                                                                                                                                  appear
                                                                                4
                                               -0.15




                                                                                                                                                                         -0.6
                                                               -0.05           0.00        0.05   0.10         0.15                                                              -0.5    -0.4   -0.3    -0.2   -0.1       0.0         0.1

                                                           dimension 1 : proportion of inertia = 0.19                                                                               dimension 1 : proportion of inertia = 0.05


Figure 4: Weighted MDS of the document reduced dissimilarities $\tilde{D}$ (6), displaying the optimal two-dimensional projection of the reduced inertia $\tilde{\Delta} = \frac{1}{2} \sum_{kl} \rho_k \rho_l \tilde{D}_{kl} = 0.025$, which is roughly 50 times smaller than the ordinary inertia $\Delta = \frac{1}{2} \sum_{kl} \rho_k \rho_l D^{\chi}_{kl} = 1.156$ of usual CA (figure 3).

Figure 5: Weighted MDS on the term semantic dissimilarities (7) for the 234 retained verbs. The first dimension opposes do and make (whose similarity is 1) to the other verbs. The second dimension opposes appear and seem (with similarity 1) to the other verbs.
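
The inertia formula quoted in the caption of figure 4, and the "proportion of inertia" shown on the axes, can be illustrated with a minimal sketch in Python with NumPy. It assumes the standard weighted classical scaling construction (weighted centering, then spectral decomposition of the weighted kernel) and squared Euclidean dissimilarities. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def weighted_inertia(rho, D):
    """Return 1/2 * sum_kl rho_k rho_l D_kl for weights rho and dissimilarities D."""
    rho = np.asarray(rho, dtype=float)
    D = np.asarray(D, dtype=float)
    return 0.5 * float(rho @ D @ rho)


def weighted_mds(rho, D):
    """Weighted classical MDS (assumed construction, not the authors' code).

    Returns the coordinates, the eigenvalues in decreasing order, and the
    proportion of inertia carried by each dimension.
    """
    rho = np.asarray(rho, dtype=float)
    D = np.asarray(D, dtype=float)
    n = len(rho)
    H = np.eye(n) - np.outer(np.ones(n), rho)    # weighted centering H = I - 1 rho'
    B = -0.5 * H @ D @ H.T                        # scalar products around the weighted centroid
    root = np.sqrt(rho)
    K = B * np.outer(root, root)                  # kernel K_kl = sqrt(rho_k rho_l) B_kl
    eigvals, eigvecs = np.linalg.eigh(K)          # K is symmetric
    order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = np.clip(eigvals, 0.0, None)             # discard tiny negative round-off
    coords = eigvecs * np.sqrt(pos) / root[:, None]
    proportions = pos / pos.sum()
    return coords, eigvals, proportions


# Toy data (hypothetical): three objects on a line at positions 0, 1 and 3.
positions = np.array([0.0, 1.0, 3.0])
rho = np.array([0.5, 0.25, 0.25])                     # weights summing to 1
D = (positions[:, None] - positions[None, :]) ** 2    # squared Euclidean dissimilarities

inertia = weighted_inertia(rho, D)
coords, eigvals, proportions = weighted_mds(rho, D)
# Squared distances recomputed from the MDS coordinates
D_back = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2)
```

Under these assumptions the eigenvalues sum to the inertia, so each eigenvalue divided by that sum gives the share of inertia expressed by the corresponding axis.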


where $(q_{ik} - q_{il})(q_{jk} - q_{jl})$ captures the distributional contribution, and

$$\tilde{t}_{ij} = \frac{f_i f_j \hat{s}_{ij}}{\sqrt{b_i b_j}} \qquad \text{where } b = \hat{S} f \text{ is the banality}$$

captures the semantic contribution. Matrix $\tilde{T} = (\tilde{t}_{ij})$ has been designed so that

• $\tilde{T} = \mathrm{diag}(f)$ for “naive” similarities $\hat{S} = I$ (where $I$ is the identity matrix), in which case $\tilde{D}$ is the usual chi2-dissimilarity

• $\tilde{T} = f f'$ for “confounded types” $\hat{S} = J$ (where $J$ is the unit matrix filled with ones), in which case $\tilde{D}$ is identically zero.

Also, one can prove $\tilde{D}$ in (6) to be a squared Euclidean dissimilarity iff $S$ is positive semi-definite, that is iff all its eigenvalues are non-negative, a condition which is satisfied by path similarities (see the Appendix). Figure 4 depicts the corresponding MDS.

Semantic MDS on terms. Positive semi-definite semantic similarities $\hat{S}$ of the form (2), such as the path similarities, generate squared Euclidean dissimilarities as

$$\hat{d}_{ij} = 1 - \hat{s}_{ij} \qquad (7)$$

(see the Appendix), and this circumstance allows a weighted MDS on semantic dissimilarities between terms, aimed at depicting an optimal low-dimensional representation of the semantic inertia

$$\hat{\Delta} = \frac{1}{2} \sum_{ij} f_i f_j \hat{d}_{ij} , \qquad (8)$$

irrespective of the distributional term-document structure (figures 5 and 6).

A family of similarities interpolating between totally distinct types and confounded types

The exact form of the similarities $\hat{S}$ between terms fully governs the similarity-reduction mechanism investigated so far. Yet little systematic investigation seems to have been devoted to the formal properties of similarities (by contrast to the study of the families of dissimilarities found, for example, in Critchley and Fichet (1994) or Deza and Laurent (2009)), which may obey much more specific properties than (2). In particular, $\hat{s}_{ij}^{\alpha}$ satisfies (2) for $\alpha \geq 0$ if $\hat{s}_{ij}$ does, and varying $\alpha$ makes it possible to interpolate between the extreme cases of “naive” similarities $\hat{S} = I$ and “confounded types” $\hat{S} = J$.

Lists of synonyms [1] yield binary similarity matrices $s_{ij} = 0$ or $1$. More generally, $S$ can be defined

[1] For example: http://www.crisco.unicaen.fr/des/
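As a quick numerical check of the two limiting cases stated for the matrix T̃, the following minimal sketch computes the banality b = Ŝ f and the weights t̃_ij = f_i f_j ŝ_ij / sqrt(b_i b_j) in plain Python. The frequency vector and the function names (`banality`, `semantic_weights`) are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
from math import sqrt

def banality(S, f):
    """Return b = S f, the banality of each term."""
    n = len(f)
    return [sum(S[i][j] * f[j] for j in range(n)) for i in range(n)]

def semantic_weights(S, f):
    """Return the matrix with entries f_i f_j s_ij / sqrt(b_i b_j)."""
    b = banality(S, f)
    n = len(f)
    return [[f[i] * f[j] * S[i][j] / sqrt(b[i] * b[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical relative term frequencies (assumed to sum to 1).
f = [0.5, 0.3, 0.2]
n = len(f)

# "Naive" similarities: identity matrix I.
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
# "Confounded types": matrix J filled with ones.
ones = [[1.0] * n for _ in range(n)]

T_naive = semantic_weights(identity, f)        # expected to equal diag(f)
T_confounded = semantic_weights(ones, f)       # expected to equal f f'
```

With Ŝ = I the banality reduces to b = f, so the diagonal entries become f_i² / f_i = f_i. With Ŝ = J every banality equals the sum of the frequencies, that is 1, so the entries become f_i f_j.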

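The semantic inertia (8) can be illustrated in the same way. The sketch below applies the dissimilarity (7), d̂_ij = 1 − ŝ_ij, to a small hypothetical similarity matrix (symmetric, with unit diagonal, which is assumed here to be what form (2) requires) and sums the weighted dissimilarities. The weighted MDS itself needs an eigendecomposition and is not shown.

```python
def semantic_inertia(S, f):
    """Return 0.5 * sum_ij f_i f_j (1 - s_ij), following (7) and (8)."""
    n = len(f)
    return 0.5 * sum(f[i] * f[j] * (1.0 - S[i][j])
                     for i in range(n) for j in range(n))

# Hypothetical relative term frequencies and similarities (illustration only).
f = [0.5, 0.3, 0.2]
S = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.25],
     [0.2, 0.25, 1.0]]

identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
ones = [[1.0] * 3 for _ in range(3)]

inertia = semantic_inertia(S, f)                 # intermediate case
inertia_naive = semantic_inertia(identity, f)    # totally distinct types
inertia_confounded = semantic_inertia(ones, f)   # confounded types
```

For these toy values the inertia lies between the two extremes: it is largest for Ŝ = I, where it equals ½(1 − Σ f_i²), and vanishes for Ŝ = J.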

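The interpolation obtained by varying α can also be sketched on a toy example. The code below raises each similarity to the power α entrywise, which is how ŝ_ij^α is read here. The matrix and the function name `power_similarity` are hypothetical. Whether positive semi-definiteness is preserved for a given α is not checked.

```python
def power_similarity(S, alpha):
    """Return the entrywise power s_ij ** alpha of a similarity matrix."""
    return [[s ** alpha for s in row] for row in S]

# Hypothetical similarity matrix with unit diagonal (illustration only).
S = [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.25],
     [0.2, 0.25, 1.0]]

# alpha = 0: every entry becomes 1, i.e. the "confounded types" matrix J.
S_confounded = power_similarity(S, 0)

# Large alpha: off-diagonal entries below 1 shrink towards 0, so the
# matrix approaches the "naive" identity matrix I.
S_sharp = power_similarity(S, 200)
```

Under this reading, α = 0 yields Ŝ = J exactly, while Ŝ = I is approached only in the limit of large α and only for pairs whose similarity is strictly below 1. Pairs with similarity 1, such as do and make in Figure 5, remain confounded for every α.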



[Figure residue, plots not reproduced. First plot: verb labels (have, be, stock, store, do, make, prove, and the other retained verbs) positioned along a vertical axis labelled "dimension 2 : proportion of inertia = 0.05". Second plot: vertical axis ticked from 0.2 to 1.0, with the labels, as extracted, "R(β) H (dashed line)" and "∆̃(β) ∆ (solid line)".]
                                                            approach
                                                            occasion
                                                             break
                                                            complete
                                                             clothe
                                                            divide
                                                            abridge
                                                            die
                                                            influence
                                                            enrich
                                                            sink
                                                            subject
                                                            relieve
                                                            figure
                                                            remark
                                                            discover
                                                            observe
                                                            reject
                                                            compute
                                                            ascribe
                                                            impute
                                                            derive
                                                            infer
                                                            fix
                                                            mend
                                                            repair
                                                            rate
                                                            adjudge
                                                            enlarge
                                                            educate
                                                            accept
                                                            resolve
                                                            decline
                                                            call
                                                            declare
                                                            render
                                                            reduce
                                                            expect
                                                            augment
                                                            distribute
                                                            enumerate
                                                            better
                                                            escape
                                                           extend
                                                            oversee
                                                            infest
                                                            overrun
                                                            tax
                                                            deposit
                                                            tie
                                                            guard
                                                            visit
                                                            forfeit
                                                            operate
                                                            exclude
                                                            grant
                                                           bind
                                                           permit
                                                           animate
                                                           decrease
                                                           diminish
                                                           lessen
                                                           undertake
                                                           restore
                                                           fancy
                                                           pretend
                                                           abuse
                                                           submit
                                                           think
                                                           clip
                                                           manure
                                                           adulterate
                                                           comply
                                                           remedy
                                                           weave
                                                           aim
                                                           sell
                                                           sacrifice
                                                           bake
                                                           replace
                                                           mention
                                                           refer
                                                           melt
                                                           regulate
                                                           accord
                                                           restrain
                                                           address
                                                           communicate
                                                           crop
                                                           till
                                                           speak
                                                           become
                                                           suffer
                                                           revive
                                                           people
                                                           rear
                                                           satisfy
                                                           parcel
                                                           ask
                                                           enquire
                                                           inquire
                                                           deface
                                                           restrict
                                                           digest
                                                           distress
                                                           raise
                                                           interrupt
                                                           persuade
                                                           content
                                                           exhaust
                                                           incorporate
                                                           deduct
                                                           degenerate
                                                           dismiss  resist glitter seem
                                                                           mortgage
                                                                           retail
                                                                           last
                                                                           starve
                                                                           gravitate
                                                                           weigh
                                                                           stand     appear
                                                           multiply
                                                          double
                                                          quadruple
                                                          triple
                                                          promise
                                                          expose
                                                          propose
                                                           accommodate
                                                           fit
                                                           suit
                                                          judge
                                                           hazard
                                                          discourage
                                                          inform
                                                           furnish
                                                           provide
                                                           supply
                                                          estimate
                                                          balance
                                                          presume
                                                          care
                                                          drive
                                                          manufacture
                                                          suspend
                                                          share
                                                          acknowledge
                                                          admit
                                                          undersell
                                                          believe
                                                          endure
                                                          forge
                                                          appoint
                                                          proportion
                                                          dwell
                                                          dictate
                                                          victual
                                                          consult
                                                          deliver
                                                          dispose
                                                          copy
                                                          reckon
                                                          proceed
                                                          offer
                                                          yield
                                                          widen
                                                          commend
                                                          extol
                                                          doubt
                                                          except
                                                         report
                                                          increase
                                                         convict
                                                         contract
                                                         recommend
                                                          feed
                                                         bring
                                                         suspect
                                                         explain
                                                         teach
                                                         add
                                                         discharge
                                                         trust
                                                         acquaint
                                                         introduce
                                                         connive
                                                         license
                                                         pasture
                                                         deceive
                                                         cheat
                                                         overstock
                                                         understock
                                                         conquer
                                                         compensate
                                                         counterbalance
                                                         accelerate
                                                         quicken
                                                         transcribe
                                                         petition
                                                         solicit
                                                         enhance
                                                         stipulate
                                                         bribe
                                                         defray
                                                         recompense
                                                         enact
                                                         ordain
                                                         rent
                                                         repeat
                                                         flourish
                                                         thrive
                                                         indicate
                                                         attest
                                                         certify
                                                         demand
                                                         serve
                                                         suppose
                                                        domineer
                                                        defraud
                                                        wonder
                                                        consider
                                                        regard
                                                        deliberate
                                                        oppose
                                                        exact
                                                        confine
                                                        limit
                                                        destine
                                                        authenticate
                                                        encumber
                                                        trifle
                                                        short
                                                        pay
                                                        certificate
                                                        bestow
                                                        beg
                                                        deal
                                                        illustrate
                                                        convey
                                                        order
                                                        tempt
                                                        reward
                                                        describe
                                                        direct
                                                        express
                                                        mean
                                                        prohibit
                                                        afford




                                                                                                                                                0.0
                                                                                                                         Δ
                                                -0.2




                                                              -0.2          0.0          0.2   0.4                                                    1e-03       1e-01          1e+01      1e+03

                                                          dimension 1 : proportion of inertia = 0.15                                                          β       bandwidth parameter


Figure 6: Weighted MDS on the term semantic                                                                          Figure 7: The larger the bandwidth parameter β ,
dissimilarities (7) for the 643 verbs initially present                                                              the less similar are the terms, and hence the greater
in the corpus, emphasizing the particular position                                                                                             ˜ ) as well as the reduced
                                                                                                                     are the reduced inertia ∆(β
of be and have                                                                                                       entropy R̃(β ) (3)
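The dependence on β summarized in the caption of Figure 7 can be sketched numerically with the similarity family of equation (9), s_ij = exp(−β d̂_ij / Δ̂). The following is a minimal illustration only, not the authors' implementation: the toy dissimilarities `d_hat`, the relative frequencies `f` and the function names are hypothetical, and the semantic inertia is assumed to take the usual form ½ ∑_ij f_i f_j d̂_ij, since equation (8) is not restated here.

```python
import math

def semantic_inertia(d_hat, f):
    # ASSUMPTION: the inertia of equation (8) is taken to be
    # 1/2 * sum_ij f_i * f_j * d_hat_ij (not restated in this section).
    n = len(f)
    return 0.5 * sum(f[i] * f[j] * d_hat[i][j]
                     for i in range(n) for j in range(n))

def similarity_family(d_hat, f, beta):
    """Matrix S(beta) with entries exp(-beta * d_hat_ij / inertia), as in (9)."""
    inertia = semantic_inertia(d_hat, f)
    return [[math.exp(-beta * d / inertia) for d in row] for row in d_hat]

# Hypothetical toy data: four terms, where terms 0 and 1 are synonyms
# (zero dissimilarity), mimicking a clique of size 2.
d_hat = [
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 1.0, 2.0],
    [1.0, 1.0, 0.0, 1.5],
    [2.0, 2.0, 1.5, 0.0],
]
f = [0.4, 0.1, 0.3, 0.2]  # relative term frequencies, summing to 1

for beta in (1e-3, 1.0, 1e3):
    S = similarity_family(d_hat, f, beta)
    print(f"beta = {beta:g}")
    for row in S:
        print("  ", [round(s, 3) for s in row])
```

In this sketch a small β pushes every similarity towards 1, so all terms look alike, whereas a large β drives the similarities between distinct terms towards 0. Synonyms at zero dissimilarity keep a similarity of 1 for every β, so S(β) then approaches a binary matrix whose blocks are the cliques of synonyms.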


as a convex combination of binary synonymy relations, ensuring its non-negativity, symmetry and positive definiteness, with s_ii = 1 for all terms i. A family of such semantic similarities, indexed by the bandwidth parameter β > 0, is obtained as

    s_{ij} = \exp(-\beta \, \hat{d}_{ij} / \hat{\Delta})        (9)

where d̂_ij is the semantic dissimilarity (7) and Δ̂ the associated semantic inertia (8).

It can be shown that a binary S makes the similarity-reduced document dissimilarity D̃_kl (6) identical to the chi-square dissimilarity (5), except that the sum now runs over cliques of synonyms rather than over terms. Also, the limit β → 0 in (9) makes D̃_kl → 0, with a reduced inertia Δ̃(β) = ½ ∑_kl ρ_k ρ_l D̃_kl tending to zero. In the opposite direction, β → ∞ makes D̃_kl → D^χ_kl provided d̂_ij > 0 for i ≠ j, a condition violated in the case study, where the n = 234 verbs display, according to their first sense in WordNet, 15 cliques of size 2 (among which do-make and appear-seem, already encountered in figure 5) and 3 cliques of size 3 (namely employ-apply-use, set-lay-put and supply-furnish-provide). In any case, the relative reduced inertia Δ̃(β)/Δ is increasing in β (figure 7).

Performing the similarity-reduced correspondence analysis on the reduced dissimilarities (6) between the 11 documents, with similarity matrices S(β) (instead of Ŝ as in figure 4), demonstrates the collapse of the cloud of document coordinates as β decreases (figure 8). Indeed, the bandwidth parameter β controls the paradigmatic sensitivity of the linguistic subject: the larger β, the larger the semantic distances between the documents, and the larger the spread of the factorial cloud as measured by the reduced inertia Δ̃(β) (figure 7). In the other direction, a low β can model an illiterate person, sadly unable to discriminate between documents, which all look alike.

6   Conclusion and further issues

Despite the technicality of its exposition, the idea of this contribution is straightforward, namely to propose a way to take semantic similarity explicitly into account within the classical distributional similarity framework provided by correspondence analysis. Alternative approaches and variants are obvious: the analysis should be extended to non-verbs; other definitions of D̃ are worth investigating; other choices of S are possible (in particular the original Ŝ extracted from WordNet). Also, alternatives to WordNet path similarities (e.g., for languages for which no WordNet is available) are required.

On the document side, and despite its numerous achievements, the term-document matrix still relies on a rudimentary approach to textual context, modelled as p documents each consisting of a bag of words. Much finer syntagmatic descriptions are possible,



[Figure 8 (plot residue): factorial maps of the 11 documents, numbered 1 to 11, on dimensions 1 and 2. Panels: standard CA (proportions of inertia 0.17 and 0.15); β=100 (0.17 and 0.15); β=5 (dimension 2: 0.15); β=0.5 (dimension 2: 0.18).]

References

Bavaud, François (2011). On the Schoenberg transformations in data analysis: Theory and illustrations. Journal of Classification, 28(3):297–314. doi:10.1007/s00357-011-9092-x.

Bavaud, François, Christelle Cocco, and Aris Xanthos (2015). Textual navigation and autocorrelation. In G. Mikros and J. Macutek, eds., Sequences in Language and Text, pages 35–56. De Gruyter Mouton.

Blei, David M (2012). Probabilistic topic models. Communications of the ACM, 55(4):77–84.
                                                                   10
                                                                                                         11
                                                                                                                                                                                                              1 10
                                                                                                                                                                                                                 382 5
                                                                                                                                                                                                              6 911                            doi:10.1145/2133806.2133826.
                                              0.0




                                                                                                                                                                     0.0




                                                                                       3                                                                                                                         7
                                                                                                     8 7
                                                                                                     9
                                                                                   1

                                                                                                                                                                                                                                               Choi, Jinho D. (2016). Dynamic Feature Induction:
                                              -0.5




                                                                                                                                                                     -0.5




                                                                                            6
                                                                                                                                                                                                                                               The Last Gist to the State-of-the-Art. In Proceed-
                                                                                                                                                                                                                                               ings of the 15th Annual Conference of the North Amer-
                                              -1.0




                                                                                                                                                                     -1.0




                                                            -1.0        -0.5               0.0                    0.5                                                              -1.0        -0.5              0.0                0.5        ican Chapter of the Association for Computational
                                                     dimension 1 : proportion of inertia = 0.17                                                                             dimension 1 : proportion of inertia = 0.21
                                                                                                                                                                                                                                               Linguistics, NAACL’16, pages 271–281. San Diego,
                                                                                                                                                                                                                                               CA. URL https://aclweb.org/anthology/N/
 Figure 8: In the limit β → 0, both diagonal and                                                                                                                                                                                               N16/N16-1031.pdf.
 off-diagonal similarities si j (β ) tend to one, making
                                                                                                                                                                                                                                               Critchley, Frank and Bernard Fichet (1994). The par-
 all terms semantically identical, thus provoking the
                                                                                                                                                                                                                                               tial order by inclusion of the principal classes of dissim-
 collapse of the cloud of document coordinates.                                                                                                                                                                                                ilarity on a finite set, and some of their basic properties.
                                                                                                                                                                                                                                               In Bernard Van Cutsem, ed., Classification and Dissim-
                                                                                                                                                                                                                                               ilarity Analysis, pages 5–65. New York, NY: Springer.
 captured by the general concept of exchange ma-                                                                                                                                                                                               doi:10.1007/978-1-4612-2686-4_2.
 trix E, giving the joint probability to select a pair                                                                                                                                                                                         Deza, Michel and Monique Laurent (2009). Ge-
 of textual positions through textual navigation (by                                                                                                                                                                                           ometry of cuts and metrics, vol. 15 of Algorithms
 reading, hyperlinks or bibliographic zapping, etc.).                                                                                                                                                                                          and Combinatorics.     Berlin/Heidelberg: Springer.
 E defines a weighted network whose nodes are the                                                                                                                                                                                              doi:10.1007/978-3-642-04295-9.
 textual positions occupied by terms (Bavaud et al.,                                                                                                                                                                                           Egloff, Mattia and Raphaël Ceré (2018). Soft textual
 2015).                                                                                                                                                                                                                                        cartography based on topic modeling and clustering of
    The parallel with spatial issues (quantitative                                                                                                                                                                                             irregular, multivariate marked networks. In Chantal
                                                                                                                                                                                                                                               Cherifi, Hocine Cherifi, Márton Karsai, and Mirco Mu-
 geography, image analysis), where E defines the
                                                                                                                                                                                                                                               solesi, eds., Complex Networks & Their Applications
“where”, and the features dissimilarities between                                                                                                                                                                                              VI, vol. 689 of Studies in Computational Intelligence,
 positions D defines the “what”, is immediate (see,                                                                                                                                                                                            pages 731–743. Cham: Springer. doi:10.1007/978-3-
 e.g., Egloff and Ceré, 2018). In all likelihood, de-                                                                                                                                                                                          319-72150-7_59.
 veloping both axes, that is taking into account se-                                                                                                                                                                                           Faruqui, Manaal, Yulia Tsvetkov, Pushpendre Rastogi,
 mantic similarities on generalized textual networks,                                                                                                                                                                                          and Chris Dyer (2016). Problems with evaluation of
 could provide a fruitful extension and renewal of                                                                                                                                                                                             word embeddings using word similarity tasks. In Pro-
 the venerable term-document matrix paradigm, and                                                                                                                                                                                              ceedings of the 1st Workshop on Evaluating Vector
                                                                                                                                                                                                                                               Space Representations for NLP, pages 30–35. Asso-
 provide a renewed look to the distributional hypoth-                                                                                                                                                                                          ciation for Computational Linguistics. URL https:
 esis, which can be reframed as a spatial autocorre-                                                                                                                                                                                           //aclweb.org/anthology/W/W16/W16-2506.pdf.
 lation hypothesis.
                                                                                                                                                                                                                                               Gomaa, Wael H. and Aly A Fahmy (2013). A
 Acknowledgments                                                                                                                                                                                                                               survey of text similarity approaches. International
                                                                                                                                                                                                                                               Journal of Computer Applications, 68(13):13–18.
The guidelines and organisation of M. Piotrowski,                                                                                                                                                                                              doi:10.5120/11638-7118.
chair of COMHUM 2018, as well as the sugges-                                                                                                                                                                                                   Harris, Zellig S. (1954). Distributional structure. Word,
tions of two anonymous reviewers are gratefully                                                                                                                                                                                                10(2-3):146–162.
acknowledged.
                                                                                                                                                                                                                                               Jiang, Jay J. and David W. Conrath (1997). Semantic
                                                                                                                                                                                                                                               similarity based on corpus statistics and lexical taxon-
                                                                                                                                                                                                                                               omy. In Proceedings of International Conference on
                                                                                                                                                                                                                                               Research in Computational Linguistics (ROCLING X),
                                                                                                                                                                                                                                               pages 19–33.





Kusner, Matt, Yu Sun, Nicholas Kolkin, and Kilian Weinberger (2015). From word embeddings to document distances. In Francis Bach and David Blei, eds., Proceedings of the 32nd International Conference on Machine Learning, vol. 37 of Proceedings of Machine Learning Research, pages 957–966. PMLR. URL http://proceedings.mlr.press/v37/kusnerb15.html.

Leacock, Claudia and Martin Chodorow (1998). Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum and George A. Miller, eds., WordNet: An Electronic Lexical Database, chap. 11, pages 265–284. Cambridge, MA: MIT Press.

Lee, Daniel D. and H. Sebastian Seung (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791. doi:10.1038/44565.

Leinster, Tom and Christina A. Cobbold (2012). Measuring diversity: the importance of species similarity. Ecology, 93(3):477–489. doi:10.1890/10-2402.1.

Lin, Dekang (1998). An information-theoretic definition of similarity. In Proceedings of the Fifteenth International Conference on Machine Learning, ICML '98, pages 296–304. San Francisco, CA, USA: Morgan Kaufmann.

Marcon, Eric (2016). Mesurer la Biodiversité et la Structuration Spatiale. Thèse d'habilitation, Université de Guyane. URL https://hal-agroparistech.archives-ouvertes.fr/tel-01502970.

McGillivray, Barbara, Christer Johansson, and Daniel Apollon (2008). Semantic structure from correspondence analysis. In Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing, pages 49–52. Association for Computational Linguistics. URL https://aclweb.org/anthology/W/W08/W08-2007.pdf.

Miller, George A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41. doi:10.1145/219717.219748.

Resnik, Philip (1993a). Selection and information: a class-based approach to lexical relationships. Tech. Rep. IRCS-93-42, University of Pennsylvania Institute for Research in Cognitive Science. URL http://repository.upenn.edu/ircs_reports/200.

Resnik, Philip (1993b). Semantic classes and syntactic ambiguity. In Proceedings of the Workshop on Human Language Technology (HLT '93), pages 278–283. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/H93-1054.pdf.

Resnik, Philip (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the International Joint Conference for Artificial Intelligence (IJCAI-95), pages 448–453.

Ricotta, Carlo and Laszlo Szeidl (2006). Towards a unifying approach to diversity measures: bridging the gap between the Shannon entropy and Rao's quadratic index. Theoretical Population Biology, 70(3):237–243. doi:10.1016/j.tpb.2006.06.003.

Sahlgren, Magnus (2008). The distributional hypothesis. Rivista di Linguistica, 20(1):33–53. URL http://linguistica.sns.it/RdL/20.1/Sahlgren.pdf.

Smith, Adam (1776). An Inquiry into the Nature and Causes of the Wealth of Nations; Book I. Urbana, Illinois: Project Gutenberg. Also known as: Wealth of Nations. URL http://www.gutenberg.org/ebooks/3300.

Wu, Zhibiao and Martha Palmer (1994). Verbs semantics and lexical selection. In Proceedings of the 32nd annual meeting of the Association for Computational Linguistics, pages 133–138. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/P94-1019.pdf.

Appendix: Proof of the squared Euclidean nature of D in (7).

The number $\ell_{ij}$ of edges in the shortest path (in the WordNet hierarchical tree) linking the concepts associated with i and j is a tree dissimilarity [2], and hence a squared Euclidean dissimilarity (see, e.g., Critchley and Fichet, 1994). Hence, (1) and (7) entail

$$\hat{d}_{ij} = 1 - \hat{s}_{ij} = 1 - \frac{1}{1 + \ell_{ij}} = \frac{\ell_{ij}}{1 + \ell_{ij}}$$

that is, $\hat{d}_{ij} = \varphi(\ell_{ij})$, where $\varphi(x) = x/(1 + x)$. The function $\varphi(x)$ is non-negative, increasing and concave, with $\varphi(0) = 0$. For $r \ge 1$, its even derivatives $\varphi^{(2r)}(x)$ are non-positive, and its odd derivatives $\varphi^{(2r-1)}(x)$ are non-negative, since $\varphi^{(n)}(x) = (-1)^{n-1}\, n! / (1 + x)^{n+1}$ for $x \ge 0$. That is, $\varphi(x)$ is a Schoenberg transformation, transforming a squared Euclidean dissimilarity into a squared Euclidean dissimilarity (see, e.g., Bavaud, 2011), thus establishing the squared Euclidean nature of D in (7) (and, by related arguments, the p.s.d. nature of S).

[2] Provided no term possesses two direct hypernyms, which seems to hold for the verbs considered here.
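The appendix argument can be checked numerically on a small example. The sketch below is only illustrative: it assumes a hypothetical six-term toy hierarchy (not extracted from WordNet), uniform weights in the centring step, and helper names (`PARENT`, `edge_count`, `min_gram_eigenvalue`) that do not come from the paper. It relies on the classical criterion that a dissimilarity matrix D is squared Euclidean exactly when the centred matrix B = -(1/2) H D H is positive semi-definite, and tests this for the edge counts and for their transform x/(1 + x).

```python
import numpy as np

# Hypothetical toy hierarchy (child -> parent); each term has a single parent,
# as required by footnote 2. It is NOT taken from WordNet.
PARENT = {"move": None, "travel": "move", "walk": "travel",
          "run": "travel", "act": "move", "play": "act"}


def path_to_root(node):
    """List of nodes from `node` up to the root, both included."""
    path = [node]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def edge_count(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = len(set(pa) & set(pb))  # nodes shared by both root paths
    return (len(pa) - common) + (len(pb) - common)


def min_gram_eigenvalue(D):
    """Smallest eigenvalue of B = -1/2 H D H (uniform-weight centring)."""
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * H @ D @ H
    return np.linalg.eigvalsh(B).min()


terms = list(PARENT)
# Matrix of edge counts l_ij, then the transformed dissimilarity l / (1 + l).
L = np.array([[edge_count(a, b) for b in terms] for a in terms], dtype=float)
D_hat = L / (1.0 + L)

TOL = 1e-10  # tolerance for rounding errors around a zero eigenvalue
print("edge counts squared Euclidean:", min_gram_eigenvalue(L) >= -TOL)
print("transformed dissimilarity squared Euclidean:",
      min_gram_eigenvalue(D_hat) >= -TOL)
```

Both checks are expected to succeed for any hierarchy in which each term has a single parent; a term with two direct hypernyms would break the tree assumption of footnote 2, and the first check would no longer be guaranteed.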



