=Paper=
{{Paper
|id=Vol-2314/paper4
|storemode=property
|title=Taking into account semantic similarities in correspondence analysis
|pdfUrl=https://ceur-ws.org/Vol-2314/paper4.pdf
|volume=Vol-2314
|authors=Mattia Egloff,François Bavaud
|dblpUrl=https://dblp.org/rec/conf/comhum/EgloffB18
}}
==Taking into account semantic similarities in correspondence analysis==
Mattia Egloff1, François Bavaud1,2
1 Department of Language and Information Sciences, University of Lausanne, Switzerland
2 Institute of Geography and Sustainability, University of Lausanne, Switzerland
{megloff1, fbavaud}@unil.ch
Abstract

Term-document matrices feed most distributional approaches to quantitative textual studies, without consideration for the semantic similarities between terms, whose presence arguably reduces content variety. This contribution presents a formalism remedying this omission, making explicit use of the semantic similarities extracted from WordNet. A case study in similarity-reduced correspondence analysis illustrates the proposal.

1 Introduction

The term-document matrix N = (n_ik) counts the occurrences of n terms in p documents, and constitutes the privileged input of most distributional studies in quantitative textual linguistics: chi2 dissimilarities between terms or documents, distance-based clustering of terms or documents, multidimensional scaling (MDS) on terms or documents; and, also, latent clustering by non-negative matrix factorization (e.g., Lee and Seung, 1999) or topic modeling (e.g., Blei, 2012); as well as nonlinear variants resulting from transformations of the independence quotients, as in the Hellinger dissimilarities, or transformations of the chi2 dissimilarities themselves (e.g., Bavaud, 2011).

When using the term-document matrix, the semantic link between words is only indirectly addressed, through the celebrated "distributional hypothesis," which postulates an association between distributional similarity (the neighbourhood or closeness of words in a text) and meaning similarity (the closeness of concepts) (Harris, 1954) (see also, e.g., Sahlgren, 2008; McGillivray et al., 2008). Although largely accepted and much documented, the distributional hypothesis seems hardly ever studied in an explicit way, which would typically involve computing and comparing the average semantic similarity within documents or contexts to the average semantic similarity between documents or contexts – a procedure requiring recourse to some hand-crafted semantics, largely unavailable at the time of Harris' writings.

The present short study distinguishes both kinds of similarities and constitutes, at this stage, a proof of concept oriented towards formalism and conceptualization rather than large-scale applications – in the general spirit of the COMHUM 2018 conference. It yields a new measure of textual variety taking the semantic similarities between terms explicitly into account, and makes it possible to weight the contribution of semantic similarity when analyzing the term-document matrix.

2 Data

After manually extracting the paragraphs of each of the p = 11 chapters of Book I of "An Inquiry into the Nature and Causes of the Wealth of Nations" by Adam Smith (Smith, 1776) (a somewhat arbitrary choice among myriads of other possibilities), we tagged the part of speech and lemma of each word of the corpus using the nlp4j tagger (Choi, 2016). Subsequently, we created a lemma-chapter matrix, retaining only words of a specific type, such as verbs. Terms i, j present in the chapters were then associated to their first conceptual senses c_i, c_j, that is, to their first WordNet synsets (Miller, 1995). We inspected several similarity matrices ŝ_ij = ŝ(c_i, c_j) between pairs of concepts c_i and c_j.
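For concreteness, here is a minimal sketch of this preprocessing step, assuming chapters already reduced to lists of verb lemmas (the actual tagging was done with nlp4j, which is not reproduced here) and using NLTK's WordNet interface to retrieve first synsets:

```python
# Minimal sketch (not the authors' nlp4j pipeline): given chapters as lists
# of verb lemmas, build the lemma-chapter count matrix N and map each lemma
# to its first WordNet synset. Requires NLTK with the WordNet data installed.
from collections import Counter

import numpy as np
from nltk.corpus import wordnet as wn

def lemma_chapter_matrix(chapters):
    """Return (vocabulary, N) where N[i, k] counts lemma i in chapter k."""
    vocab = sorted({lemma for chapter in chapters for lemma in chapter})
    index = {lemma: i for i, lemma in enumerate(vocab)}
    N = np.zeros((len(vocab), len(chapters)), dtype=int)
    for k, chapter in enumerate(chapters):
        for lemma, count in Counter(chapter).items():
            N[index[lemma], k] = count
    return vocab, N

def first_verb_synset(lemma):
    """First conceptual sense c_i: the first WordNet verb synset, if any."""
    synsets = wn.synsets(lemma, pos=wn.VERB)
    return synsets[0] if synsets else None
```

The list of first synsets then indexes the similarity matrices ŝ_ij of the next section.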
3 Semantic similarities

A few approaches for computing similarities between words have been proposed in the literature (see, e.g., Gomaa and Fahmy, 2013). Recent measures use word embeddings (Kusner et al., 2015), and though these approaches are successful at resolving other NLP tasks, they suffer from drawbacks when it comes to computing semantic similarity (Faruqui et al., 2016). Moreover, the latter methods are directly based on the distributional hypothesis, and hence are precisely unsuited to distinguishing between distributional and semantic dissimilarities.

By contrast, the present paper uses WordNet, that is, a humanly constructed ontology. The classical WordNet similarities ŝ(c_i, c_j) between two concepts c_i and c_j take on different forms. The conceptually easiest is the path similarity, defined from the number ℓ(c_i, c_j) ≥ 0 of edges of the shortest path (in the WordNet hierarchy) between c_i and c_j as follows:

$$\hat{s}_{\text{path}}(c_i, c_j) = \frac{1}{1 + \ell(c_i, c_j)} \qquad (1)$$

The Leacock-Chodorow similarity (Leacock and Chodorow, 1998) is based on the same principle, but also considers the maximum depth D = max_i ℓ(c_i, 0) of the concepts in the WordNet taxonomy (where 0 represents the root of the hierarchy, occupied by the concept subsuming all the others):

$$\hat{s}_{\text{lch}}(c_i, c_j) = -\log \frac{\ell(c_i, c_j)}{2D}$$

The Wu-Palmer similarity (Wu and Palmer, 1994) is based on the notion of lowest common subsumer c_i ∨ c_j, that is, the least general concept in the hierarchy that is a hypernym or ancestor of both c_i and c_j:

$$\hat{s}_{\text{wup}}(c_i, c_j) = \frac{2\,\ell(c_i \vee c_j, 0)}{\ell(c_i, 0) + \ell(c_j, 0)}$$

The following similarities are further based on the concept of Information Content, proposed by Resnik (1993b,a). The Information Content of a concept c is defined as −log p(c), where p(c) is the probability of encountering the concept c in a reference corpus. The Resnik similarity (Resnik, 1995) is defined as:

$$\hat{s}_{\text{res}}(c_i, c_j) = -\log p(c_i \vee c_j)$$

The Lin similarity (Lin, 1998) is defined as:

$$\hat{s}_{\text{lin}}(c_i, c_j) = \frac{2 \log p(c_i \vee c_j)}{\log p(c_i) + \log p(c_j)}$$

Finally, the Jiang-Conrath similarity (Jiang and Conrath, 1997) is defined as:

$$\hat{s}_{\text{jch}}(c_i, c_j) = \frac{1}{-\log p(c_i) - \log p(c_j) + 2 \log p(c_i \vee c_j)}$$

and obeys ŝ_jch(c_i, c_i) = ∞.

Among the above similarities, the path, Wu-Palmer and Lin similarities obey the conditions

$$\hat{s}_{ij} = \hat{s}_{ji} \ge 0 \quad\text{and}\quad \hat{s}_{ii} = 1. \qquad (2)$$

In what follows, we shall use the path similarities when required.
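A sketch of how such a similarity matrix can be assembled, assuming the first synsets obtained in the previous section; note that NLTK's Synset.path_similarity already implements (1), returning 1/(1 + ℓ):

```python
# Sketch of the path-similarity matrix (1) with NLTK, whose
# Synset.path_similarity returns 1 / (1 + shortest-path length).
import numpy as np

def path_similarity_matrix(synsets):
    """S[i, j] = path similarity between first senses c_i and c_j."""
    n = len(synsets)
    S = np.eye(n)  # s_ii = 1 by construction, see condition (2)
    for i in range(n):
        for j in range(i + 1, n):
            s = synsets[i].path_similarity(synsets[j])
            S[i, j] = S[j, i] = s if s is not None else 0.0
    return S
```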
4 A similarity-reduced measure of textual variety

Let f_i ≥ 0 be the relative frequency of term i, normalized so that ∑_{i=1}^n f_i = 1. The Shannon entropy H = −∑_i f_i ln f_i constitutes a measure of relative textual variety, ranging from 0 (a single term repeats itself) to ln n (all terms are different). Yet the entropy does not take into account the possible similarity between the terms, in contrast to the reduced entropy R (our nomenclature) defined as

$$R = -\sum_{i=1}^{n} f_i \ln b_i \quad\text{where}\quad b_i = \sum_{j=1}^{n} \hat{s}_{ij}\, f_j. \qquad (3)$$

In ecology, b_i is the banality of species i, measuring its average similarity to the other species (Marcon, 2016), as proposed by Leinster and Cobbold (2012), as well as by Ricotta and Szeidl (2006). By construction, f_i ≤ b_i ≤ 1 and thus R ≤ H: the larger the similarities, the lower the textual variety as measured by the reduced entropy, as required.

Returning to the case study, we have retained, out of the 643 verb lemmas initially present in the corpus, the n = 234 verb lemmas occurring at least 5 times ("be" and "have" excluded). The overall term weights f_i, the chapter weights ρ_k and the within-chapter term weights f_ik obtain from the n × p = 234 × 11 term-document matrix N = (n_ik) as

$$f_i = \frac{n_{i\bullet}}{n_{\bullet\bullet}} \qquad \rho_k = \frac{n_{\bullet k}}{n_{\bullet\bullet}} \qquad f_{ik} = \frac{n_{ik}}{n_{\bullet k}} \qquad (4)$$

The corresponding entropies and reduced entropies read H = 4.98 > R = 1.60. For each chapter, the corresponding quantities are depicted in figure 1. One can observe the so-called concavity property H > ∑_k ρ_k H_k (always verified) and R > ∑_k ρ_k R_k (verified here), which says that the variety of the whole is larger than the average variety of its constituents.
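A minimal sketch of (3) and (4), assuming the count matrix N and a similarity matrix S built as above (terms with f_i = 0 are supposed to have been removed):

```python
# Sketch of the weights (4), Shannon entropy H and reduced entropy R (3),
# from the count matrix N and a term similarity matrix S as built above.
import numpy as np

def entropies(N, S):
    f = N.sum(axis=1) / N.sum()      # term weights f_i = n_i. / n..
    rho = N.sum(axis=0) / N.sum()    # chapter weights rho_k = n.k / n..
    b = S @ f                        # banalities b_i = sum_j s_ij f_j
    H = -np.sum(f * np.log(f))       # Shannon entropy (assumes f_i > 0)
    R = -np.sum(f * np.log(b))       # reduced entropy, R <= H
    return f, rho, H, R
```

The chapter-wise entropies H_k and R_k of figure 1 obtain from the same formulas applied to the within-chapter weights f_ik = n_ik / n_•k.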
[Figure 1: Entropies H_k and reduced entropies R_k for each chapter k; dashed lines depict H and R.]

The Shannon variety N_Shannon = exp(H) ≤ n represents the equivalent number of distinct types in a uniformly constituted corpus of the same richness or diversity (in the entropy sense) as the corpus under examination. Likewise, the reduced variety N_reduced = exp(R) ≤ N_Shannon measures the equivalent number of types if the latter were uniformly distributed and completely dissimilar (that is, s_ij = 0 for i ≠ j): see figure 2.

[Figure 2: Shannon varieties exp(H_k) and reduced varieties exp(R_k) for each chapter k; dashed lines depict exp(H) and exp(R).]

5 Ordinary correspondence analysis

Recall that correspondence analysis (CA) permits a simultaneous representation of terms and documents in the so-called biplot (figure 3). CA results from weighted multidimensional scaling (MDS) applied to the chi2 dissimilarities D^χ_kl between documents k and l,

$$D^{\chi}_{kl} = \sum_{i=1}^{n} f_i\,(q_{ik} - q_{il})^2 \quad\text{where}\quad q_{ik} = \frac{n_{ik}\, n_{\bullet\bullet}}{n_{i\bullet}\, n_{\bullet k}} \qquad (5)$$

or, equivalently, from MDS applied to the chi2 dissimilarities between terms. Note that the q_ik in (5) constitute the independence quotients, that is, the ratios of the observed counts to their expected values under independence. Figure 3 constitutes the two-dimensional projection of a weighted Euclidean configuration of min(234 − 1, 11 − 1) = 10 dimensions, expressing a maximal proportion of 0.17 + 0.15 = 32% of the dispersion or inertia ∆ = ½ ∑_kl ρ_k ρ_l D^χ_kl.
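A sketch of the independence quotients and of (5), under the same assumptions as above; CA itself then results from weighted MDS on D^χ (a weighted MDS sketch is given in the next section):

```python
# Sketch of the independence quotients q_ik and the chi2 dissimilarities (5)
# between documents (chapters), from the count matrix N.
import numpy as np

def chi2_distances(N):
    f = N.sum(axis=1) / N.sum()   # term weights f_i
    # q_ik = n_ik * n.. / (n_i. * n.k), the independence quotients
    Q = N * N.sum() / np.outer(N.sum(axis=1), N.sum(axis=0))
    p = N.shape[1]
    D = np.zeros((p, p))
    for k in range(p):
        for l in range(p):
            D[k, l] = np.sum(f * (Q[:, k] - Q[:, l]) ** 2)
    return Q, D
```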
[Figure 3: Biplot of the 234 × 11 term-document matrix. Circles depict terms and triangles depict documents.]

Similarity-reduced correspondence analysis

In the case where documents k and l, while differing by the presence of distinct terms, contain semantically similar terms, the "naive" chi2 dissimilarity (5), which implicitly assumes distinct terms to be completely dissimilar, arguably overestimates their difference. The latter should be downsized accordingly, in a way that both reflects the amount of similarity shared between k and l and retains the squared Euclidean nature of their dissimilarity – a crucial requirement for the validity of MDS. This simple idea leads us to propose the following reduced squared Euclidean distance D̃_kl between documents, taking into account both the distributional and the semantic differences between the documents, namely

$$\tilde{D}_{kl} = \sum_{ij} \tilde{t}_{ij}\,(q_{ik} - q_{il})(q_{jk} - q_{jl}) \qquad (6)$$
where (q_ik − q_il)(q_jk − q_jl) captures the distributional contribution, and

$$\tilde{t}_{ij} = \frac{f_i\, f_j\, \hat{s}_{ij}}{\sqrt{b_i\, b_j}}, \quad\text{where } b = \hat{S} f \text{ is the vector of banalities,}$$

captures the semantic contribution. The matrix T̃ = (t̃_ij) has been designed so that

• T̃ = diag(f) for "naive" similarities Ŝ = I (where I is the identity matrix), in which case D̃ is the usual chi2 dissimilarity;

• T̃ = f f′ for "confounded types" Ŝ = J (where J is the unit matrix filled with ones), in which case D̃ is identically zero.

Also, one can prove D̃ in (6) to be a squared Euclidean dissimilarity iff S is positive semi-definite, that is, iff all its eigenvalues are non-negative, a condition verified for the path similarities (see the Appendix). Figure 4 depicts the corresponding MDS.

[Figure 4: Weighted MDS of the reduced document dissimilarities D̃ (6), displaying the optimal two-dimensional projection of the reduced inertia ∆̃ = ½ ∑_kl ρ_k ρ_l D̃_kl = 0.025, which is roughly 50 times smaller than the ordinary inertia ∆ = ½ ∑_kl ρ_k ρ_l D^χ_kl = 1.156 of usual CA (figure 3).]
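A sketch of (6), reusing the quotient matrix Q and the weights f of the previous sketch; setting S = np.eye(n) recovers the chi2 distances, and S = np.ones((n, n)) yields the zero matrix, matching the two limiting cases above:

```python
# Sketch of the similarity-reduced document dissimilarities (6): the matrix
# T with t_ij = f_i f_j s_ij / sqrt(b_i b_j) weights the products of
# quotient differences.
import numpy as np

def reduced_distances(Q, f, S):
    b = S @ f                                        # banalities b = S f
    T = S * np.outer(f, f) / np.sqrt(np.outer(b, b))
    p = Q.shape[1]
    D = np.zeros((p, p))
    for k in range(p):
        for l in range(p):
            diff = Q[:, k] - Q[:, l]
            D[k, l] = diff @ T @ diff                # equation (6)
    return D
```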
Semantic MDS on terms

Positive semi-definite semantic similarities Ŝ of the form (2), such as the path similarities, generate squared Euclidean dissimilarities as

$$\hat{d}_{ij} = 1 - \hat{s}_{ij} \qquad (7)$$

(see the Appendix), and this circumstance allows a weighted MDS on the semantic dissimilarities between terms, aimed at depicting an optimal low-dimensional representation of the semantic inertia

$$\hat{\Delta} = \frac{1}{2} \sum_{ij} f_i\, f_j\, \hat{d}_{ij}, \qquad (8)$$

irrespective of the distributional term-document structure (figures 5 and 6).

[Figure 5: Weighted MDS on the term semantic dissimilarities (7) for the 234 retained verbs. The first dimension opposes do and make (whose similarity is 1) to the other verbs. The second dimension opposes appear and seem (with similarity 1) to the other verbs.]
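The weighted MDS used throughout can be sketched as follows; this is one standard formulation (not necessarily the authors' exact normalization), applicable to d̂ = 1 − Ŝ with term weights f, or to D̃ with chapter weights ρ:

```python
# Sketch of weighted MDS on a squared Euclidean dissimilarity matrix D with
# weight vector w (summing to 1): eigen-decompose the weighted, doubly
# centred kernel and read off factor coordinates.
import numpy as np

def weighted_mds(D, w, dim=2):
    n = len(w)
    H = np.eye(n) - np.outer(np.ones(n), w)      # weighted centring matrix
    B = -0.5 * H @ D @ H.T                       # matrix of scalar products
    K = np.sqrt(np.outer(w, w)) * B              # weighted kernel
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1]               # decreasing eigenvalues
    vals, vecs = vals[order], vecs[:, order]
    X = vecs[:, :dim] * np.sqrt(np.maximum(vals[:dim], 0.0))
    X /= np.sqrt(w)[:, None]                     # factor coordinates
    return X, vals   # vals / vals.sum() gives the proportions of inertia
```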
A family of similarities interpolating between totally distinct types and confounded types

The exact form of the similarities Ŝ between terms fully governs the similarity-reduction mechanism investigated so far. Yet little systematic investigation seems to have been devoted to the formal properties of similarities (by contrast with the study of the families of dissimilarities found, for example, in Critchley and Fichet (1994) or Deza and Laurent (2009)), which may obey much more specific properties than (2). In particular, ŝ^α_ij satisfies (2) for α ≥ 0 if ŝ_ij does, and varying α permits interpolation between the extreme cases of "naive" similarities Ŝ = I and "confounded types" Ŝ = J. Lists of synonyms (for example, http://www.crisco.unicaen.fr/des/) yield binary similarity matrices s_ij = 0 or 1. More generally, S can be defined as a convex combination of binary synonymy relations, insuring its non-negativity, symmetry and positive definiteness, with s_ii = 1 for all terms i. A family of such semantic similarities indexed by the bandwidth parameter β > 0 obtains as

$$s_{ij} = \exp(-\beta\, \hat{d}_{ij} / \hat{\Delta}) \qquad (9)$$

where d̂_ij is the semantic dissimilarity (7) and ∆̂ the associated semantic inertia (8).
[Figure 6: Weighted MDS on the term semantic dissimilarities (7) for the 643 verbs initially present in the corpus, emphasizing the particular position of be and have.]

[Figure 7: The larger the bandwidth parameter β, the less similar are the terms, and hence the greater are the reduced inertia ∆̃(β) (solid line) as well as the reduced entropy R̃(β) (3) (dashed line: H).]
As a matter of fact, it can be shown that a binary S makes the similarity-reduced document dissimilarity D̃_kl (6) identical to the chi2 dissimilarity (5), with the exception that the sum now runs over cliques of synonyms rather than over terms. Also, the limit β → 0 in (9) makes D̃_kl → 0, with a reduced inertia ∆̃(β) = ½ ∑_kl ρ_k ρ_l D̃_kl tending to zero. In the opposite direction, β → ∞ makes D̃_kl → D^χ_kl, provided d̂_ij > 0 for i ≠ j, a circumstance violated in the case study, where the n = 234 verbs display, according to their first sense in WordNet, 15 cliques of size 2 (among which do-make and appear-seem, already encountered in figure 5) and 3 cliques of size 3 (namely, employ-apply-use, set-lay-put and supply-furnish-provide). In any case, the relative reduced inertia ∆̃(β)/∆ is increasing in β (figure 7).

Performing the similarity-reduced correspondence analysis on the reduced dissimilarities (6) between the 11 documents, with similarity matrices S(β) (instead of Ŝ as in figure 4), demonstrates the collapse of the cloud of document coordinates (figure 8). As a matter of fact, the bandwidth parameter β controls the paradigmatic sensitivity of the linguistic subject: the larger β, the larger the semantic distances between the documents, and the larger the spread of the factorial cloud, as measured by the reduced inertia ∆̃(β) (figure 7). In the other direction, a low β can model an illiterate person, sadly unable to discriminate between documents, which all look alike.

[Figure 8: In the limit β → 0, both the diagonal and the off-diagonal similarities s_ij(β) tend to one, making all terms semantically identical and thus provoking the collapse of the cloud of document coordinates.]
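A sketch of the family (9); feeding the resulting S(β) to the reduced distances of the previous sketches reproduces, for small β, the collapse of figure 8:

```python
# Sketch of the bandwidth family (9). beta -> 0 yields S -> J (confounded
# types); beta -> infinity yields S -> I whenever all off-diagonal d_ij > 0.
import numpy as np

def similarity_family(d_hat, f, beta):
    delta_hat = 0.5 * f @ d_hat @ f      # semantic inertia (8)
    return np.exp(-beta * d_hat / delta_hat)
```

Since d̂_ii = 0, the diagonal s_ii(β) = 1 holds for every β, as required by (2).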
6 Conclusion and further issues

Despite the technicality of its exposition, the idea of this contribution is straightforward, namely to propose a way to take semantic similarity explicitly into account within the classical distributional similarity framework provided by correspondence analysis. Alternative approaches and variants are obvious: further analyses on non-verbs should be investigated; other definitions of D̃ are worth investigating; other choices of S are possible (in particular the original Ŝ extracted from WordNet). Also, alternatives to the WordNet path similarities (e.g., for languages in which WordNet is not defined) are required.

On the document side, and despite its numerous achievements, the term-document matrix still relies on a rudimentary approach to textual context, modelled as p documents consisting of bags of words. Much finer syntagmatic descriptions are possible,
captured by the general concept of the exchange matrix E, giving the joint probability of selecting a pair of textual positions through textual navigation (by reading, hyperlinks, bibliographic zapping, etc.). E defines a weighted network whose nodes are the textual positions occupied by terms (Bavaud et al., 2015).

The parallel with spatial issues (quantitative geography, image analysis), where E defines the "where" and the feature dissimilarities D between positions define the "what", is immediate (see, e.g., Egloff and Ceré, 2018). In all likelihood, developing both axes, that is, taking into account semantic similarities on generalized textual networks, could provide a fruitful extension and renewal of the venerable term-document matrix paradigm, and a renewed look at the distributional hypothesis, which can be reframed as a spatial autocorrelation hypothesis.

Acknowledgments

The guidelines and organisation of M. Piotrowski, chair of COMHUM 2018, as well as the suggestions of two anonymous reviewers, are gratefully acknowledged.

References

Bavaud, François (2011). On the Schoenberg transformations in data analysis: Theory and illustrations. Journal of Classification, 28(3):297–314. doi:10.1007/s00357-011-9092-x.

Bavaud, François, Christelle Cocco, and Aris Xanthos (2015). Textual navigation and autocorrelation. In G. Mikros and J. Macutek, eds., Sequences in Language and Text, pages 35–56. De Gruyter Mouton.

Blei, David M. (2012). Probabilistic topic models. Communications of the ACM, 55(4):77–84. doi:10.1145/2133806.2133826.

Choi, Jinho D. (2016). Dynamic feature induction: The last gist to the state-of-the-art. In Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL'16, pages 271–281. San Diego, CA. URL https://aclweb.org/anthology/N/N16/N16-1031.pdf.

Critchley, Frank and Bernard Fichet (1994). The partial order by inclusion of the principal classes of dissimilarity on a finite set, and some of their basic properties. In Bernard Van Cutsem, ed., Classification and Dissimilarity Analysis, pages 5–65. New York, NY: Springer. doi:10.1007/978-1-4612-2686-4_2.

Deza, Michel and Monique Laurent (2009). Geometry of Cuts and Metrics, vol. 15 of Algorithms and Combinatorics. Berlin/Heidelberg: Springer. doi:10.1007/978-3-642-04295-9.

Egloff, Mattia and Raphaël Ceré (2018). Soft textual cartography based on topic modeling and clustering of irregular, multivariate marked networks. In Chantal Cherifi, Hocine Cherifi, Márton Karsai, and Mirco Musolesi, eds., Complex Networks & Their Applications VI, vol. 689 of Studies in Computational Intelligence, pages 731–743. Cham: Springer. doi:10.1007/978-3-319-72150-7_59.

Faruqui, Manaal, Yulia Tsvetkov, Pushpendre Rastogi, and Chris Dyer (2016). Problems with evaluation of word embeddings using word similarity tasks. In Proceedings of the 1st Workshop on Evaluating Vector Space Representations for NLP, pages 30–35. Association for Computational Linguistics. URL https://aclweb.org/anthology/W/W16/W16-2506.pdf.

Gomaa, Wael H. and Aly A. Fahmy (2013). A survey of text similarity approaches. International Journal of Computer Applications, 68(13):13–18. doi:10.5120/11638-7118.

Harris, Zellig S. (1954). Distributional structure. Word, 10(2-3):146–162.

Jiang, Jay J. and David W. Conrath (1997). Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), pages 19–33.

Kusner, Matt, Yu Sun, Nicholas Kolkin, and Kilian Weinberger (2015). From word embeddings to document distances. In Francis Bach and David Blei, eds., Proceedings of the 32nd International Conference on Machine Learning, vol. 37 of Proceedings of Machine Learning Research, pages 957–966. PMLR. URL http://proceedings.mlr.press/v37/kusnerb15.html.

Leacock, Claudia and Martin Chodorow (1998). Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum and George A. Miller, eds., WordNet: An Electronic Lexical Database, chap. 11, pages 265–284. Cambridge, MA: MIT Press.

Lee, Daniel D. and H. Sebastian Seung (1999). Learning the parts of objects by non-negative matrix factorization. Nature, 401:788–791. doi:10.1038/44565.

Leinster, Tom and Christina A. Cobbold (2012). Measuring diversity: the importance of species similarity. Ecology, 93(3):477–489. doi:10.1890/10-2402.1.

Lin, Dekang (1998). An information-theoretic definition of similarity. In Proceedings of the Fifteenth International Conference on Machine Learning, ICML '98, pages 296–304. San Francisco, CA, USA: Morgan Kaufmann.

Marcon, Eric (2016). Mesurer la Biodiversité et la Structuration Spatiale. Thèse d'habilitation, Université de Guyane. URL https://hal-agroparistech.archives-ouvertes.fr/tel-01502970.

McGillivray, Barbara, Christer Johansson, and Daniel Apollon (2008). Semantic structure from correspondence analysis. In Proceedings of the 3rd TextGraphs Workshop on Graph-Based Algorithms for Natural Language Processing, pages 49–52. Association for Computational Linguistics. URL https://aclweb.org/anthology/W/W08/W08-2007.pdf.

Miller, George A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11):39–41. doi:10.1145/219717.219748.

Resnik, Philip (1993a). Selection and information: a class-based approach to lexical relationships. Tech. Rep. IRCS-93-42, University of Pennsylvania Institute for Research in Cognitive Science. URL http://repository.upenn.edu/ircs_reports/200.

Resnik, Philip (1993b). Semantic classes and syntactic ambiguity. In Proceedings of the Workshop on Human Language Technology (HLT '93), pages 278–283. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/H93-1054.pdf.

Resnik, Philip (1995). Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI-95), pages 448–453.

Ricotta, Carlo and Laszlo Szeidl (2006). Towards a unifying approach to diversity measures: bridging the gap between the Shannon entropy and Rao's quadratic index. Theoretical Population Biology, 70(3):237–243. doi:10.1016/j.tpb.2006.06.003.

Sahlgren, Magnus (2008). The distributional hypothesis. Rivista di Linguistica, 20(1):33–53. URL http://linguistica.sns.it/RdL/20.1/Sahlgren.pdf.

Smith, Adam (1776). An Inquiry into the Nature and Causes of the Wealth of Nations; Book I. Urbana, Illinois: Project Gutenberg. Also known as: Wealth of Nations. URL http://www.gutenberg.org/ebooks/3300.

Wu, Zhibiao and Martha Palmer (1994). Verbs semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/P94-1019.pdf.

Appendix: Proof of the squared Euclidean nature of d̂ in (7)

The number ℓ_ij of edges of the shortest path (in the WordNet hierarchical tree) linking the concepts associated to i and j is a tree dissimilarity (provided no term possesses two direct hypernyms, which seems to be verified for the verbs considered here), and hence a squared Euclidean dissimilarity (see, e.g., Critchley and Fichet, 1994). Hence, (1) and (7) entail

$$\hat{d}_{ij} = 1 - \hat{s}_{ij} = 1 - \frac{1}{1 + \ell_{ij}} = \frac{\ell_{ij}}{1 + \ell_{ij}},$$

that is, d̂_ij = φ(ℓ_ij), where φ(x) = x/(1 + x). The function φ(x) is non-negative, increasing and concave, with φ(0) = 0. For r ≥ 1, its even derivatives φ^(2r)(x) are non-positive, and its odd derivatives φ^(2r−1)(x) are non-negative. That is, φ(x) is a Schoenberg transformation, transforming a squared Euclidean dissimilarity into a squared Euclidean dissimilarity (see, e.g., Bavaud, 2011), thus establishing the squared Euclidean nature of d̂ in (7) (and, by related arguments, the positive semi-definite nature of S).
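The positive semi-definiteness invoked above can also be checked numerically on any concrete similarity matrix; a small sketch:

```python
# Numerical sanity check of the Appendix claim: the path-similarity matrix
# S = 1 - phi(l_ij), with phi(x) = x / (1 + x), should be positive
# semi-definite, so that the reduced distances (6) stay squared Euclidean.
import numpy as np

def is_positive_semidefinite(S, tol=1e-10):
    eigenvalues = np.linalg.eigvalsh(S)   # S is symmetric
    return bool(eigenvalues.min() >= -tol)
```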