=Paper= {{Paper |id=Vol-1347/paper19 |storemode=property |title=Same same but different: type and typicality in a distributional model of complement coercion |pdfUrl=https://ceur-ws.org/Vol-1347/paper19.pdf |volume=Vol-1347 |dblpUrl=https://dblp.org/rec/conf/networds/ZarconePL15 }} ==Same same but different: type and typicality in a distributional model of complement coercion== https://ceur-ws.org/Vol-1347/paper19.pdf

Same same but different:
Type and typicality in a distributional model of complement coercion
Alessandra Zarcone (1), Sebastian Padó (2), Alessandro Lenci (3)
(1) Universität des Saarlandes, Saarbrücken, Germany, zarcone@coli.uni-saarland.de
(2) Universität Stuttgart, Stuttgart, Germany, pado@ims.uni-stuttgart.de
(3) Università degli Studi di Pisa, Pisa, Italy, alessandro.lenci@unipi.it

Copyright © by the paper's authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

Abstract

We aim to model the results from a self-paced reading experiment, which tested the effect of semantic type clash and typicality on the processing of German complement coercion. We present two distributional semantic models to test if they can model the effect of both type and typicality in the psycholinguistic study. We show that one of the models, without explicitly representing type information, can account for the effects of both type and typicality in complement coercion.

1 Introduction: Complement Coercion

Complement coercion (The author began the book → reading the book) has been shown to cause an increase in processing cost (Pylkkänen and McElree, 2006; Katsika et al., 2012), which has been ascribed to a type clash between an event-selecting verb (begin) and an entity-denoting object (book). The increase in processing cost is found in comparison with a baseline condition, where the same verb is combined with an event-denoting object (journey), which does not trigger a type clash.

A second influence on processing cost is the thematic fit or typicality of the fillers of the verb's argument slots (Bicknell et al., 2010; Matsuki et al., 2011): high-typicality combinations are processed more quickly than low-typicality ones (the mechanic checked the brakes / the spelling).

Distributional semantic models (DSMs) can successfully model a range of psycholinguistic phenomena, including the effect of typicality on complement coercion (Zarcone et al., 2012). However, they generally do not include a notion of type. Can a DSM account for effects of both type and typicality?

In this paper, we consider experimental results from a study on complement coercion in German that manipulates both type and typicality. We discuss the performance of existing DSMs and a novel DSM combination. We also discuss how type information can emerge from distributional information.

2 Manipulating Type and Typicality

In a self-paced reading study on German complement coercion (Zarcone et al., in preparation), we manipulated both type and typicality. The dataset consists of 20 pairs of subjects (S) and aspectual verbs (V). Each pair is combined with four nominal objects (O) in SOV order:

[S Das Geburtstagskind] hat [O mit den Geschenken / der Feier / der Suppe / der Schicht] [V angefangen].
[S The birthday boy] has [O with the presents / party / soup / work shift] [V begun].

The objects are: a high-typicality entity (presents); a high-typicality event (party); a low-typicality entity (soup); and a low-typicality event (work shift). The low-typicality objects are drawn from the high-typicality objects of other S-V pairs.

The self-paced reading study yielded the following significant effects: (1) an effect of typicality on reading times (t = 2.28, p = .02) at the object region (indicating subject-object integration), (2) an effect of object type on reading times (t = −2.5, p = .01) at the verb region (the region of the type clash), (3) an interaction of type and thematic fit at the verb region (t = 2.04, p = .04). Mean reading times per condition are reported in Table 1. In sum, the study shows that complement coercion involves both type and typicality. Thus, computational models of complement coercion need to account for both.

3 Modeling the Experimental Results

Distributional semantic models (DSMs) represent word meaning as high-dimensional vectors recording co-occurrences with elements of their
usage contexts. Semantic similarity is defined in terms of a vector similarity metric such as cosine.

Distributional Memory (DM, Baroni and Lenci (2010)) is a DSM that integrates syntactic knowledge into the word representations. More concretely, the TypeDM version of DM records word-relation-word tuples ⟨w1 r w2⟩. The tuples are weighted by Local Mutual Information (Evert, 2005), which can be employed to model predicate-argument typicality. For example, the weight of ⟨book obj read⟩ is higher than that of ⟨label obj read⟩, which in turn is higher than that of ⟨elephant obj read⟩. TypeDM has been shown to be versatile and effective in several semantic tasks, including predicting verb-argument plausibility.

                   Object region          Verb region
                   mit den Geschenken     angefangen
                   (with the presents)    (began)
high-fit entity    642                    819
high-fit event     655                    736
low-fit entity     667                    802
low-fit event      710                    806

Table 1: Mean reading times per condition (in ms) in the self-paced reading study.

3.1 Complement Coercion and DSMs

DM has been extended into the Expectation Composition and Update model (ECU, Lenci (2011)), a family of procedures that can be used to predict the typicality of one sentence part given other sentence parts. E.g., to model the typicality at the verb region in a German sentence with SOV word order (e.g. Das Geburtstagskind hat mit dem Geschenk angefangen / The birthday boy has with the present begun), ECU determines the thematic fit for the verb given subject and object:

• compute an expectation for the verb given the subject s, as the distribution over verbs v defined by the weights of the tuples ⟨s subj v⟩;
• compute an expectation for the verb given the object o, as the distribution over verbs v defined by the weights of the tuples ⟨o obj v⟩.

To combine the subject and object expectations, we combine the two distributions component by component, typically either by sum or by product. This distribution is then represented in a vector space by computing the centroid or prototype of the vectors of the 20 most expected verbs. Finally, the thematic fit for a verb v given the subject s and the object o is its cosine similarity to the centroid.

Figure 1: ECU vs. Joint Expectations for the verb. [Diagram not reproduced. Top panel (ECU): the links SV and OV over the nodes Subject, Object, Verb. Bottom panel (JE): the links SO and OV over the same nodes.]

ECU. We call the models following the ECU procedure SOV+ and SOV*, depending on their combination operation (sum and product, respectively). Simpler models only consider the influence of subject or object on the verb (SV and OV respectively), just by leaving out the combination step. These models can successfully account for reading time results on a dataset of complement coercion in German that manipulates typicality but not type (Zarcone et al., 2012).

In order to test ECU on a dataset which manipulates both type and typicality, we evaluate the following ECU models on the complement coercion data in (Zarcone et al., in preparation): SO to model effects at the object given the subject; SOV+, SOV* and OV to model effects at the verb. We expect these models to account for the typicality effect at the object (1), but not for the type effects at the verb (2,3).

The results are summarized in Table 2 (left and middle). In accordance with our prediction, SO correctly yields the typicality effect at the object (F = 7.38, p < 0.01). None of SOV+, SOV* or OV can model the type-typicality interaction at the verb (3). Surprisingly, though, SOV* and OV yield (2), an effect of type at the verb (F = 5.3228, p < 0.05 and F = 20.388, p < 0.001, respectively).

Joint Expectations. The reading time study found that the subject-object typicality effects linger at the verb, interacting with type. The main shortcoming of ECU is its inability to model the typicality effects at the verb. This is due to the architecture of the SOV models (cf. Fig. 1, top): they compute the expectations for the verb first from the subject (SV) and update them with the object's expectations (OV). They ignore the interaction between subject and object (SO), which is the source of typicality effects (1,3); this corresponds to the assumption that this interaction should only matter at the object. In order to account for this, we draw an analogy to the concept of joint probability:

    P(S, O, V)
which, by the chain rule and under the simplifying assumption that the verb depends on the subject only through the object, can be written as

    P(S) P(O|S) P(V|O)

Treating the first term as a constant prior, we obtain

    P(O|S) P(V|O)

which we can interpret distributionally as motivation to reweight the typicality of the verb given the object with the typicality of the object given the subject, thus re-introducing the subject-object interaction into the verb prediction (cf. Figure 1, bottom).

In the Joint Expectation (JE) model, the thematic fit score assigned to the target verb is influenced both by the verb's thematic fit with the object (the verb's initial thematic fit score, equivalent to the ECU weight for the ⟨object obj verb⟩ tuple) and by the object's thematic fit with the subject (equivalent to the ECU weight for the ⟨subject verb object⟩ tuple), which in turn is used to reweight the verb's score.

Similar to ECU, there is a choice of combination operations in JE (sum or product). Since JE can be formulated as a simple wrapper around ECU, ECU can be used to compute the individual components (e.g. SO, OV, or more complex ones) and these then just need to be combined additively (SO+OV) or multiplicatively (SO*OV).

The right-hand side of Table 2 shows the results for JE. SO+OV yields an effect of typicality (F = 6.777, p < 0.05) but no effect of type (2) or interaction (3). SO*OV yields two main effects of (2) type (F = 7.2359, p < 0.05) and typicality (F = 7.2359, p < 0.01), although no interaction (3).

Comparing the two models, we see that ECU SO accounts for the results obtained at the object (1), but the SOV models cannot explain the interaction with typicality on the verb (2,3). JE (SO * OV) models the effects of both type (2) and typicality at the verb, but does not (yet) account for their interaction (3).

                                                                  non-compos.    ECU            JE
                                                                  SO     OV      SOV+   SOV*    SO+OV  SO*OV
(1) effect of typicality at the object region (SO interaction)    ✓      ×       ×      ×       ✓      ✓
(2) effect of type at the verb region (type clash)                ×      ✓       ×      ✓       ×      ✓
(3) type x thematic fit interaction at the verb region            ×      ×       ×      ×       ×      ×

Table 2: Overview of the results of the different DSMs: non-compositional, ECU and JE (✓ = effect found, × = effect not found).

4 Discussion: Type and Typicality

We found that the SO model successfully accounts for the effect of typicality at the object. This is not surprising: one of the most typical tasks successfully performed by distributional models such as ECU is predicting verb-argument plausibility, and ECU had already been successful in modeling effects of typicality on reading times in German complement coercion (Zarcone et al., 2012).

On the other hand, the ECU SOV models were not able to account for the type-typicality interaction at the verb. The JE model (SO * OV), which we presented as an alternative to the ECU model to better account for the typicality effects at the verb, yielded effects of both type and typicality at the verb, but did not account for their interaction.

Our most surprising result is that the OV, SOV*, and SO*OV models explain the effect of type. As DSMs do not represent this concept explicitly, a possible interpretation suggested by our results is that type and typicality are not distinct categories, but capture properties of predicate-argument combinations at different granularity levels.

Distributional models can account for types because they emerge from the observed corpus distributions. Specifically, for the aspectual verbs used in the present data set, the distribution over their objects (namely that event nouns occur much more frequently than object nouns; Zarcone et al., 2013) corresponds more naturally to an interpretation in terms of types than of typicality. A compositional distributional model where semantic types emerge as patterns of behavior has the advantage of relying on minimal assumptions regarding the granularity of the type ontology, which is intriguing, as pattern recognition is a key aspect of human cognition (Rumelhart and McClelland, 1987; Saffran et al., 1996; Tomasello, 2009).

In conclusion, the picture that emerges from our experiments is one where (1) expectations for predicate-argument combinations have a hierarchical structure, with types as a high-level distinction and typicality as a low-level distinction, (2) both levels are different, but interact early during
processing, influencing reading times, and (3) both type and typicality can emerge from the "same same" distributional model.

Acknowledgments

This research was funded by the German Research Foundation (DFG) as part of SFB 732 "Incremental Specification in Context" and SFB 1102 "Information Density and Linguistic Encoding".

References

Marco Baroni and Alessandro Lenci. 2010. Distributional Memory: a general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Klinton Bicknell, Jeffrey L Elman, Mary Hare, Ken McRae, and Marta Kutas. 2010. Effects of event knowledge in processing verbal arguments. Journal of Memory and Language, 63:489–505.

Stefan Evert. 2005. The statistics of word cooccurrences. Ph.D. thesis, Universität Stuttgart.

Argyro Katsika, David Braze, Ashwini Deo, and Maria Mercedes Piñango. 2012. Complement coercion: Distinguishing between type-shifting and pragmatic inferencing. The Mental Lexicon, 7(1):58–76.

Alessandro Lenci. 2011. Composing and updating verb argument expectations: A distributional semantic model. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, pages 58–66, Portland, OR.

Kazunaga Matsuki, Tracy Chow, Mary Hare, Jeffrey L Elman, Christoph Scheepers, and Ken McRae. 2011. Event-based plausibility immediately influences on-line language comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(4):913–934.

Liina Pylkkänen and Brian McElree. 2006. The syntax-semantics interface: On-line composition of sentence meaning. In M. Traxler and M. A. Gernsbacher, editors, Handbook of Psycholinguistics, pages 539–579. Elsevier, Amsterdam, The Netherlands, 2nd edition.

David E. Rumelhart and James L. McClelland. 1987. Learning the past tenses of English verbs. Implicit rules or parallel distributed processing. In Mechanisms of language acquisition, pages 249–308. Lawrence Erlbaum Associates, Hillsdale, NJ.

Jenny R Saffran, Richard N Aslin, and Elissa L Newport. 1996. Statistical learning by 8-month-old infants. Science, 274(5294):1926–1928.

Michael Tomasello. 2009. Constructing a language: A usage-based theory of language acquisition. Harvard University Press, Cambridge, MA.

Alessandra Zarcone, Jason Utt, and Sebastian Padó. 2012. Modeling covert event retrieval in logical metonymy: probabilistic and distributional accounts. In Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics, pages 70–79, Montréal, Canada.

Alessandra Zarcone, Alessandro Lenci, Sebastian Padó, and Jason Utt. 2013. Fitting, not clashing! A distributional semantic model of logical metonymy. In Proceedings of the 10th International Conference on Computational Semantics, Potsdam, Germany.

Alessandra Zarcone, Alessandro Lenci, Ken McRae, and Sebastian Padó. in preparation. Type and thematic fit in logical metonymy resolution.
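
Illustrative sketch. The ECU procedure of Section 3.1 (subject and object expectations over verbs, combination by sum or product, centroid of the most expected verbs, cosine similarity) and the JE combination of an SO score with an OV score can be sketched in Python roughly as follows. This is not the authors' implementation: the tuple weights, the verb vectors, all function names and the small value of k are hypothetical toy choices (the paper uses TypeDM weights and the 20 most expected verbs), and dropping zero-weight verbs before taking the top k is an additional assumption.

```python
import math

# Hypothetical toy tuple weights <word, relation, verb>; NOT real TypeDM values.
TUPLES = {
    ("child", "subj", "open"): 5.0,
    ("child", "subj", "eat"): 3.0,
    ("child", "subj", "repair"): 0.5,
    ("present", "obj", "open"): 6.0,
    ("present", "obj", "wrap"): 4.0,
    ("present", "obj", "eat"): 0.2,
}

# Hypothetical toy verb vectors (real DSM vectors are high-dimensional).
VECTORS = {
    "open": [1.0, 0.0, 0.0],
    "eat": [0.0, 1.0, 0.0],
    "wrap": [1.0, 0.0, 1.0],
    "repair": [0.0, 0.0, 1.0],
}


def expectation(word, rel, tuples=TUPLES):
    """Expectation over verbs v given the weights of the tuples <word rel v>."""
    return {v: w for (w1, r, v), w in tuples.items() if w1 == word and r == rel}


def combine(e1, e2, op):
    """Combine two expectations component by component (sum or product)."""
    verbs = set(e1) | set(e2)
    if op == "sum":
        return {v: e1.get(v, 0.0) + e2.get(v, 0.0) for v in verbs}
    if op == "product":
        return {v: e1.get(v, 0.0) * e2.get(v, 0.0) for v in verbs}
    raise ValueError("op must be 'sum' or 'product'")


def centroid(expect, vectors=VECTORS, k=2):
    """Centroid of the vectors of the k most expected verbs.

    Assumption: verbs with zero weight are dropped before ranking.
    """
    ranked = sorted(
        ((v, w) for v, w in expect.items() if w > 0 and v in vectors),
        key=lambda vw: (-vw[1], vw[0]),
    )[:k]
    if not ranked:
        raise ValueError("no expected verbs with a vector")
    dim = len(vectors[ranked[0][0]])
    return [sum(vectors[v][i] for v, _ in ranked) / len(ranked) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ecu_fit(verb, subj, obj, op, k=2):
    """SOV+ (op='sum') or SOV* (op='product') thematic fit of verb given subj, obj."""
    combined = combine(expectation(subj, "subj"), expectation(obj, "obj"), op)
    return cosine(VECTORS[verb], centroid(combined, k=k))


def joint_expectation(so_score, ov_score, op):
    """JE wrapper: combine an SO score and an OV score (SO+OV or SO*OV)."""
    if op == "sum":
        return so_score + ov_score
    if op == "product":
        return so_score * ov_score
    raise ValueError("op must be 'sum' or 'product'")
```

In this sketch, `ecu_fit("open", "child", "present", "product")` would play the role of an SOV* score, while `joint_expectation(so, ov, "product")` corresponds to SO*OV once `so` and `ov` have been computed by whatever SO and OV models are in use. Keeping JE as a thin wrapper mirrors the paper's remark that JE can be formulated around ECU components.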