=Paper= {{Paper |id=Vol-1347/paper13 |storemode=property |title=Modelling semantic transparency in English compound nouns |pdfUrl=https://ceur-ws.org/Vol-1347/paper13.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/BellS15 }} ==Modelling semantic transparency in English compound nouns== https://ceur-ws.org/Vol-1347/paper13.pdf

Modelling semantic transparency in English compound nouns
Melanie J. Bell, Anglia Ruskin University, Cambridge, U.K., melanie.bell@anglia.ac.uk
Martin Schäfer, Friedrich Schiller University, Jena, Germany, post@martinschaefer.info

1 Introduction

Semantic transparency is known to play an important role in the storage and processing of
complex words (e.g. Marslen-Wilson et al. 1994), and human raters of transparency achieve
high levels of agreement (e.g. Frisson et al. 2008, Munro et al. 2010). In the case of
noun-noun compounds, overall transparency is largely determined by the transparency of the
individual constituents. For example, Reddy et al. (2011) showed that the perceived
transparency of a compound is highly correlated with both the sum and the product of the
perceived transparencies of its constituents. Furthermore, many psycholinguistic studies
find significant effects for semantic transparency using a four-way distinction based on
perceived constituent transparency: transparent-transparent (e.g. carwash),
transparent-opaque (e.g. jailbird), opaque-transparent (e.g. strawberry) and opaque-opaque
(e.g. hogwash) (Libben et al. 2003). Bell and Schäfer (2013) modelled the transparency of
individual compound constituents and showed that shifted word senses reduce perceived
transparency, while certain semantic relations between constituents increase it. However,
this finding is problematic in at least two ways. Firstly, it is not clear whether there is
a solid basis for establishing whether a specific word sense is shifted or not. For example,
card in credit card is clearly shifted if viewed etymologically, but may not synchronically
be perceived as shifted due to its frequent use. Secondly, work on conceptual combination by
Gagné and collaborators has shown that relational information in compounds is accessed via
the concepts associated with individual modifiers and heads, rather than independently of
them (e.g. Spalding et al. 2010 for an overview). This leads to the hypothesis that it is
not whether a specific word sense is etymologically shifted, nor whether a specific semantic
relation is used per se, that makes a compound constituent more or less transparent; rather,
it is the degree of expectedness of a particular word sense and a particular relation for a
given constituent. In this paper, we provide evidence in support of this hypothesis: the
more expected the word sense and relation for a constituent, the more transparent it is
perceived to be.

2 Method

We used the publicly available dataset described in Reddy et al. (2011), which gives human
transparency ratings for a set of 90 compound types and their constituents (N1 and N2), and
comprises a total of 7717 ratings. To model the expectedness of word senses and semantic
relations for a given compound constituent, we used the constituent families of the
compounds, which we extracted in a two step process. We took all strings of exactly two
nouns that follow an article in the British National Corpus and which also occur four times
or more in the USENET corpus (Shaoul and Westbury 2010). From this set, we extracted the
positional constituent families for all constituent nouns in the Reddy et al. dataset,
giving a total of 4553 compounds for the N1 families and 9226 for the N2 families. Each of
these compound types was coded for the semantic relation between the constituents (after
Levi 1978), and for the WordNet sense of the constituent under consideration (Princeton
2010). We then calculated the proportion of compound types in each constituent family with
each semantic relation (relation proportion), and each WordNet sense of the constituent in
question (synset proportion). We take these two measures to reflect the expectedness of the
respective relations and WordNet senses of the constituents: if a relation or sense occurs
in a high proportion of the constituent family, it is more expected. These variables were
used, along with other quantitative measures, as predictors in ordinary least squares
regression models of perceived constituent transparency. The final model for the
transparency of N1 is given in Table 1:
Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final
Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

Predictor                                          Coef      S.E.     t       Pr(>|t|)
Intercept                                         -4.6413    0.6593   -7.04   <0.0001
relation proportion in N1 family                  -0.2187    0.6013   -0.36    0.7161
log family size of N1                             -0.0189    0.0931   -0.20    0.8395
synset proportion in N1 family                    -0.2426    0.6152   -0.39    0.6934
log synset count of N1                            -0.7939    0.2469   -3.22    0.0013
compound proportion in N1 family (token-based)     3.0130    0.6788    4.44   <0.0001
log frequency of N1                                0.8728    0.0569   15.34   <0.0001
relation proportion * log family size              0.3311    0.1305    2.54    0.0113
synset proportion * log synset count               0.6855    0.3161    2.17    0.0303
compound proportion * log frequency N1            -0.2804    0.0816   -3.44    0.0006

Table 1: Final model for the transparency of N1, adjusted R² = 0.334
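
The relation proportion and synset proportion predictors described in the Method section can be illustrated with a minimal Python sketch. The data rows, the relation labels and the sense identifiers below are hypothetical placeholders chosen for illustration, not values from the authors' coded dataset, and the function name `family_proportions` is our own.

```python
from collections import Counter, defaultdict

# Hypothetical coded compound types: (compound, N1, N2, relation, sense of N1).
# All labels are illustrative assumptions, not the paper's actual coding.
CODED_COMPOUNDS = [
    ("credit card", "credit", "card", "FOR", "credit.n.01"),
    ("credit union", "credit", "union", "FOR", "credit.n.01"),
    ("credit limit", "credit", "limit", "HAVE", "credit.n.02"),
    ("credit rating", "credit", "rating", "ABOUT", "credit.n.01"),
]


def family_proportions(rows, position, key_index):
    """Map each constituent to {label: share of compound types in its family}.

    position  -- tuple index of the constituent defining the family (1 = N1, 2 = N2)
    key_index -- tuple index of the coded label (3 = relation, 4 = sense)
    """
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[position]][row[key_index]] += 1
    return {
        constituent: {
            label: n / sum(label_counts.values())
            for label, n in label_counts.items()
        }
        for constituent, label_counts in counts.items()
    }


# Type-based proportions within the N1 family of each first constituent.
relation_prop = family_proportions(CODED_COMPOUNDS, position=1, key_index=3)
synset_prop = family_proportions(CODED_COMPOUNDS, position=1, key_index=4)
```

Under this sketch, a compound's relation proportion is looked up as `relation_prop[n1][relation]`, so a relation shared by many members of the family receives a high value and counts as more expected.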

[Figure 1: three contour plots; contour lines show perceived transparency of N1.
First panel: relation proportion in N1 family (x-axis) by log family size of N1 (y-axis).
Second panel: synset proportion in N1 family (x-axis) by log synset count of N1 (y-axis).
Third panel: compound proportion in N1 family, token-based (x-axis) by log frequency of N1 (y-axis).]

Figure 1. Interaction plots for N1 transparency
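
As a reading aid for the interactions in Table 1, the following sketch computes the slope that each proportion predictor has at a fixed value of its interacting variable. It assumes the standard linear form with product interaction terms (slope = main coefficient + interaction coefficient × moderator value). The coefficients are copied from Table 1; the function names are our own, and the scale of the log-transformed variables is not stated in the text, so the threshold values are only meaningful on the model's own scale.

```python
# Coefficients copied from Table 1 (final model for N1 transparency).
COEF = {
    "relation_prop": -0.2187,
    "relation_prop:log_family_size": 0.3311,
    "synset_prop": -0.2426,
    "synset_prop:log_synset_count": 0.6855,
    "compound_prop": 3.0130,
    "compound_prop:log_freq": -0.2804,
}


def marginal_slope(main, interaction, moderator_value):
    """Slope of predicted transparency w.r.t. `main` at a fixed moderator value.

    Assumes a linear model with a product interaction term.
    """
    return COEF[main] + COEF[interaction] * moderator_value


def sign_change_point(main, interaction):
    """Moderator value at which the marginal slope of `main` equals zero."""
    return -COEF[main] / COEF[interaction]


# Moderator values at which each proportion's slope changes sign.
relation_threshold = sign_change_point(
    "relation_prop", "relation_prop:log_family_size")
synset_threshold = sign_change_point(
    "synset_prop", "synset_prop:log_synset_count")
compound_threshold = sign_change_point(
    "compound_prop", "compound_prop:log_freq")
```

On these assumptions, the slope of relation proportion is negative for the smallest families and becomes positive as log family size grows, while the slope of compound proportion shrinks towards zero as log frequency of N1 increases, which matches the direction of the effects described in the Results.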

3 Results

All predictors in our model enter into significant interactions, and these are shown
graphically in Figure 1, where the contour lines on the plots represent perceived
transparency of the first constituent (N1). The first plot shows an interaction between
relation proportion and overall (log) family size: for small families, relation proportion
plays little role, whereas for larger families, in accordance with our hypothesis, the
transparency of N1 increases with the proportion of the corresponding relation in the
family. The second plot shows the interaction between the synset proportion and the total
number of a constituent’s senses (as listed in WordNet): only if there is a sufficient
number of different senses in the family is their proportion a reliable predictor of
semantic transparency. There is also a small but significant interaction between the log
frequency of a constituent and the proportion of the constituent family (in terms of tokens)
represented by the compound in question: this shows that transparency increases with
frequency, but only in the lower frequency ranges does the proportion in the family play a
role.

4 Conclusion

Overall, the model provides clear evidence for our hypothesis. N1 is rated as most
transparent when it is a frequent word, with a large family, occurring with its preferred
semantic relation and most frequent sense, and with few other senses to compete. We
interpret the results as indicating that compound constituents are perceived as more
transparent when they are more expected (both generally and with a specific sense) and when
they occur in their most expected semantic environments. In information theory, the less
expected an event, the greater its information content: in so far as perceived transparency
is a reflection of expectedness, it can therefore also be seen as the inverse of
informativity.

Acknowledgements

This work was made possible by three short visit grants from the European Science Foundation
through NETWORDS - The European Network on Word Structure (grants 4677, 6520 and 7027), for
which the authors are extremely grateful.

References
Bell, Melanie J. and Martin Schäfer. 2013. Semantic
transparency: challenges for distributional
semantics. In Aurelie Herbelot, Roberto
Zamparelli and Gemma Boleda eds., Proceedings
of the IWCS 2013 workshop: Towards a formal
distributional semantics, 1–10. Potsdam:
Association for Computational Linguistics.
Frisson, Steven, Elizabeth Niswander-Klement and
Alexander Pollatsek. 2008. The role of semantic
transparency in the processing of English compound
words. British Journal of Psychology 99(1),
87–107.
Levi, Judith N. 1978. The syntax and semantics of
complex nominals. New York: Academic Press.
Marslen-Wilson, William, Lorraine K. Tyler,
Rachelle Waksler and Lianne Older. 1994.
Morphology and meaning in the English mental
lexicon. Psychological Review 101, 1: 3-33.
Munro, Robert, Steven Bethard, Victor Kuperman,
Vicky Tzuyin Lai, Robin Melnick, Christopher
Potts, Tyler Schnoebelen and Harry Tily. 2010.
Crowdsourcing and language studies: the new
generation of linguistic data. In Proceedings of the
NAACL HLT 2010 Workshop on Creating Speech
and Language Data with Amazon's Mechanical
Turk, pp. 122-130. Association for Computational
Linguistics.
Princeton University. 2010. WordNet.

Reddy, Siva, Diana McCarthy and Suresh Manandhar.
2011. An empirical study on compositionality in
compound nouns. In Proceedings of The 5th
International Joint Conference on Natural
Language Processing (IJCNLP 2011), Chiang
Mai, Thailand.
Shaoul, Cyrus and Chris Westbury. 2010. An
anonymized multi-billion word USENET corpus
2005-2010.
http://www.psych.ualberta.ca/~westburylab/downloads/usenet.download.html
Spalding, Thomas L., Christina L. Gagné, Allison C.
Mullaly and Hongbo Ji. 2010. Relation-based
interpretation of noun-noun phrases: A new
theoretical approach. Linguistische Berichte Sonderheft
17, 283-315.
Wurm, Lee H. 1997. Auditory processing of prefixed
English words is both continuous and
decompositional. Journal of Memory and
Language, 37, 438–461.