Same same but different: Type and typicality in a distributional model of complement coercion

Alessandra Zarcone (1), Sebastian Padó (2), Alessandro Lenci (3)
(1) Universität des Saarlandes, Saarbrücken, Germany, zarcone@coli.uni-saarland.de
(2) Universität Stuttgart, Stuttgart, Germany, pado@ims.uni-stuttgart.de
(3) Università degli Studi di Pisa, Pisa, Italy, alessandro.lenci@unipi.it

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org

Abstract

We aim to model the results from a self-paced reading experiment, which tested the effect of semantic type clash and typicality on the processing of German complement coercion. We present two distributional semantic models to test if they can model the effect of both type and typicality in the psycholinguistic study. We show that one of the models, without explicitly representing type information, can account for the effects of both type and typicality in complement coercion.

1 Introduction: Complement Coercion

Complement coercion (The author began the book → reading the book) has been shown to cause an increase in processing cost (Pylkkänen and McElree, 2006; Katsika et al., 2012), which has been ascribed to a type clash between an event-selecting verb (begin) and an entity-denoting object (book). The increase in processing cost is found in comparison with a baseline condition, where the same verb is combined with an event-denoting object (journey), which does not trigger a type clash.

A second influence on processing cost is the thematic fit or typicality of the fillers of the verb's argument slots (Bicknell et al., 2010; Matsuki et al., 2011): high-typicality combinations are processed more quickly than low-typicality ones (the mechanic checked the brakes / the spelling).

Distributional semantic models (DSMs) can successfully model a range of psycholinguistic phenomena, including the effect of typicality on complement coercion (Zarcone et al., 2012). However, they generally do not include a notion of type. Can a DSM account for effects of both type and typicality?

In this paper, we consider experimental results from a study on complement coercion in German that manipulates both type and typicality. We discuss the performance of existing DSMs and a novel DSM combination. We also discuss how type information can emerge from distributional information.

2 Manipulating Type and Typicality

In a self-paced reading study on German complement coercion (Zarcone et al., in preparation), we have manipulated both type and typicality. The dataset consists of 20 pairs of subjects (S) and aspectual verbs (V). Each pair is combined with four nominal objects (O) in SOV order:

  [S Das Geburtstagskind] hat [O mit den Geschenken / der Feier / der Suppe / der Schicht] [V angefangen].
  [S The birthday boy] has [O with the presents / party / soup / work shift] [V begun].

The objects are: a high-typicality entity (presents); a high-typicality event (party); a low-typicality entity (soup); and a low-typicality event (work shift). The low-typicality objects are drawn from the high-typicality objects of other S-V pairs.

The self-paced reading study yielded the following significant effects: (1) an effect of typicality on reading times (t = 2.28, p = .02) at the object region (indicating subject-object integration), (2) an effect of object type on reading times (t = −2.5, p = .01) at the verb region (the region of the type clash), (3) an interaction of type and thematic fit at the verb region (t = 2.04, p = .04). Mean reading times per condition are reported in Table 1. In sum, the study shows that complement coercion involves both type and typicality. Thus, computational models of complement coercion need to account for both.

                     Object region           Verb region
                     mit den Geschenken      angefangen
                     (with the presents)     (began)
  high-fit entity    642                     819
  high-fit event     655                     736
  low-fit entity     667                     802
  low-fit event      710                     806

Table 1: Mean reading times per condition (in ms) in the self-paced reading study.

3 Modeling the Experimental Results

Distributional semantic models (DSMs) represent word meaning as high-dimensional vectors recording co-occurrences with elements of their usage contexts. Semantic similarity is defined in terms of a vector similarity metric such as cosine.

Distributional Memory (DM, Baroni and Lenci (2010)) is a DSM that incorporates syntactic knowledge into the word representations. More concretely, the TypeDM version of DM records word-relation-word tuples ⟨w1, r, w2⟩. The tuples are weighted by Local Mutual Information (Evert, 2005), which can be employed to model predicate-argument typicality. For example, the weight of ⟨book, obj, read⟩ is higher than that of ⟨label, obj, read⟩, which in turn is higher than that of ⟨elephant, obj, read⟩. TypeDM has been shown to be versatile and effective in several semantic tasks, including predicting verb-argument plausibility.

3.1 Complement Coercion and DSMs.

DM has been extended into the Expectation Composition and Update model (ECU, Lenci (2011)), a family of procedures that can be used to predict the typicality of one sentence part given other sentence parts. E.g., to model the typicality at the verb region in a German sentence with SOV word order (e.g. Das Geburtstagskind hat mit dem Geschenk angefangen / The birthday boy has with the present begun), ECU determines the thematic fit for the verb given subject and object:

• compute an expectation for the verb given the subject s, as the distribution over verbs v defined by the weights of the tuples ⟨s, subj, v⟩;

• compute an expectation for the verb given the object o, as the distribution over verbs v defined by the weights of the tuples ⟨o, obj, v⟩.

To combine the subject and object expectations, we combine the two distributions component by component, typically either by sum or by product. This distribution is then represented in a vector space by computing the centroid or prototype of the vectors of the 20 most expected verbs. Finally, the thematic fit for a verb v given the subject s and the object o is its cosine similarity to the centroid.

[Figure 1: ECU vs. Joint Expectations for the verb. Top (ECU): expectations for the verb from the subject (SV) and from the object (OV). Bottom (JE): subject-object (SO) and object-verb (OV) expectations.]

We call the models following the ECU procedure SOV+ and SOV*, depending on their combination operation (sum and product, respectively). Simpler models only consider the influence of subject or object on the verb (SV and OV respectively), just by leaving out the combination step. These models can successfully account for reading time results on a dataset of complement coercion in German that manipulates typicality but not type (Zarcone et al., 2012).

In order to test ECU on a dataset which manipulates both type and typicality, we evaluate the following ECU models on the complement coercion data in (Zarcone et al., in preparation): SO to model effects at the object given the subject; SOV+, SOV* and OV to model effects at the verb. We expect these models to account for the typicality effect at the object (1), but not for the type effects at the verb (2,3).

The results are summarized in Table 2 (left and middle). In accordance with our prediction, SO correctly yields the typicality effect at the object (F = 7.38, p < 0.01). Neither SOV+, SOV*, nor OV can model the type-typicality interaction at the verb (3). Surprisingly, though, SOV* and OV yield (2), an effect of type at the verb (F = 5.3228, p < 0.05 and F = 20.388, p < 0.001, respectively).

                                                         non-compos.    ECU            JE
                                                         SO     OV      SOV+   SOV*    SO+OV  SO*OV
  (1) effect of typicality at the object region
      (SO interaction)                                   ✓      ×       ×      ×       ✓      ✓
  (2) effect of type at the verb region (type clash)     ×      ✓       ×      ✓       ×      ✓
  (3) type x thematic fit interaction at the verb region ×      ×       ×      ×       ×      ×

Table 2: Overview of the results of the different DSMs: non-compositional, ECU and JE.

Joint Expectations. The reading time study found that the subject-object typicality effects linger at the verb, interacting with type. The main shortcoming of ECU is its inability to model the typicality effects at the verb. This is due to the architecture of the SOV models (cf. Fig. 1, top): they compute the expectations for the verb first from the subject (SV) and update them with the object's expectations (OV). They ignore the interaction between subject and object (SO), which is the source of the typicality effects (1,3); this corresponds to the assumption that this interaction should only matter at the object. In order to account for this, we draw an analogy to the concept of joint probability:

  P(S, O, V)

which, by the chain rule and under the simplifying assumption that the verb depends on the subject only through the object (so that P(V|S,O) is replaced by P(V|O)), is equivalent to

  P(S) P(O|S) P(V|O)

Treating the first term as a constant prior, we obtain

  P(O|S) P(V|O)

which we can interpret distributionally as motivation to reweight the typicality of the verb given the object with the typicality of the object given the subject, thus re-introducing the subject-object interaction into the verb prediction (cf. Figure 1, bottom).

In the Joint Expectation (JE) model, the thematic fit score assigned to the target verb is influenced both by the verb's thematic fit with the object (the verb's initial thematic fit score, equivalent to the ECU weight for the ⟨object, obj, verb⟩ tuple) and by the object's thematic fit with the subject (equivalent to the ECU weight for the ⟨subject, verb, object⟩ tuple), which in turn is used to reweight the verb's score.

Similar to ECU, there is a choice of combination operations in JE (sum or product). Since JE can be formulated as a simple wrapper around ECU, ECU can be used to compute the individual components (e.g. SO, OV, or more complex ones), and these then just need to be combined additively (SO+OV) or multiplicatively (SO*OV).

The right-hand side of Table 2 shows the results for JE. SO+OV yields an effect of typicality (F = 6.777, p < 0.05) but no effect of type (2) or interaction (3). SO*OV yields two main effects, of type (2) (F = 7.2359, p < 0.05) and of typicality (F = 7.2359, p < 0.01), although no interaction (3).

Comparing the two models, we see that ECU SO accounts for the results obtained at the object (1), but the SOV models cannot explain the interaction with typicality at the verb (2,3). JE (SO*OV) models the effects of both type (2) and typicality at the verb, but does not (yet) account for their interaction (3).

4 Discussion: Type and Typicality

We found that the SO model successfully accounts for the effect of typicality at the object. This is not surprising: one of the most typical tasks successfully performed by distributional models such as ECU is predicting verb-argument plausibility, and ECU had already been successful in modeling effects of typicality on reading times in German complement coercion (Zarcone et al., 2012).

On the other hand, the ECU SOV models were not able to account for the type-typicality interaction at the verb. The JE model (SO*OV), which we presented as an alternative to the ECU model to better account for the typicality effects at the verb, yielded effects of both type and typicality at the verb, but did not account for their interaction.

Our most surprising result is that the OV, SOV*, and SO*OV models explain the effect of type. As DSMs do not represent this concept explicitly, a possible interpretation suggested by our results is that type and typicality are not distinct categories, but capture properties of predicate-argument combinations at different granularity levels.

Distributional models can account for types because they emerge from the observed corpus distributions. Specifically, for the aspectual verbs used in the present data set, the distribution over their objects, namely that event nouns occur much more frequently than object nouns (Zarcone et al., 2013), corresponds more naturally to an interpretation in terms of types than of typicality. A compositional distributional model where semantic types emerge as patterns of behavior has the advantage of relying on minimal assumptions regarding the granularity of the type ontology, which is intriguing, as pattern recognition is a key aspect of human cognition (Rumelhart and McClelland, 1987; Saffran et al., 1996; Tomasello, 2009).

In conclusion, the picture that emerges from our experiments is one where (1) expectations for predicate-argument combinations have a hierarchical structure, with types as a high-level distinction and typicality as a low-level distinction, (2) the two levels are different, but interact early during processing, influencing reading times, and (3) both type and typicality can emerge from the "same same" distributional model.

Acknowledgments

This research was funded by the German Research Foundation (DFG) as part of SFB 732 "Incremental Specification in Context" and SFB 1102 "Information Density and Linguistic Encoding".

References

Marco Baroni and Alessandro Lenci. 2010. Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Klinton Bicknell, Jeffrey L. Elman, Mary Hare, Ken McRae, and Marta Kutas. 2010. Effects of event knowledge in processing verbal arguments. Journal of Memory and Language, 63:489–505.

Stefan Evert. 2005. The statistics of word cooccurrences. Ph.D. thesis, Universität Stuttgart.

Argyro Katsika, David Braze, Ashwini Deo, and Maria Mercedes Piñango. 2012. Complement coercion: Distinguishing between type-shifting and pragmatic inferencing. The Mental Lexicon, 7(1):58–76.

Alessandro Lenci. 2011. Composing and updating verb argument expectations: A distributional semantic model. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, pages 58–66, Portland, OR.

Kazunaga Matsuki, Tracy Chow, Mary Hare, Jeffrey L. Elman, Christoph Scheepers, and Ken McRae. 2011. Event-based plausibility immediately influences on-line language comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(4):913–934.

Liina Pylkkänen and Brian McElree. 2006. The syntax-semantics interface: On-line composition of sentence meaning. In M. Traxler and M. A. Gernsbacher, editors, Handbook of Psycholinguistics, pages 539–579. Elsevier, Amsterdam, The Netherlands, 2nd edition.

David E. Rumelhart and James L. McClelland. 1987. Learning the past tenses of English verbs. Implicit rules or parallel distributed processing. In Mechanisms of language acquisition, pages 249–308. Lawrence Erlbaum Associates, Hillsdale, NJ.

Jenny R. Saffran, Richard N. Aslin, and Elissa L. Newport. 1996. Statistical learning by 8-month-old infants. Science, 274(5294):1926–1928.

Michael Tomasello. 2009. Constructing a language: A usage-based theory of language acquisition. Harvard University Press, Cambridge, MA.

Alessandra Zarcone, Jason Utt, and Sebastian Padó. 2012. Modeling covert event retrieval in logical metonymy: Probabilistic and distributional accounts. In Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics, pages 70–79, Montréal, Canada.

Alessandra Zarcone, Alessandro Lenci, Sebastian Padó, and Jason Utt. 2013. Fitting, not clashing! A distributional semantic model of logical metonymy. In Proceedings of the 10th International Conference on Computational Semantics, Potsdam, Germany.

Alessandra Zarcone, Alessandro Lenci, Ken McRae, and Sebastian Padó. In preparation. Type and thematic fit in logical metonymy resolution.
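
Illustrative sketch

The ECU scoring steps of Section 3.1 (verb expectations from subject and object, component-wise combination, centroid of the most expected verbs, cosine similarity) and the JE combination of SO and OV scores can be sketched roughly as follows. This is only a toy illustration and not the authors' implementation: the dictionaries TOY_WEIGHTS and TOY_VECTORS stand in for TypeDM tuple weights and verb vectors, and all function names, the parameter k, and the numeric values are hypothetical.

```python
import math

# Hypothetical toy stand-ins for TypeDM: (word, relation, verb) -> weight.
TOY_WEIGHTS = {
    ("child", "subj", "begin"): 5.0,
    ("child", "subj", "eat"): 3.0,
    ("child", "subj", "unwrap"): 4.0,
    ("present", "obj", "begin"): 1.0,
    ("present", "obj", "unwrap"): 6.0,
}
# Hypothetical toy verb vectors (real DSM vectors are high-dimensional).
TOY_VECTORS = {"begin": [1.0, 0.0], "eat": [0.0, 1.0], "unwrap": [0.9, 0.1]}


def expectation(weights, word, rel):
    """Verb expectation: weights of all tuples (word, rel, verb)."""
    return {v: w for (w1, r, v), w in weights.items() if w1 == word and r == rel}


def combine(exp_a, exp_b, op):
    """Combine two expectations component by component (sum or product)."""
    verbs = set(exp_a) | set(exp_b)
    if op == "sum":
        return {v: exp_a.get(v, 0.0) + exp_b.get(v, 0.0) for v in verbs}
    if op == "product":
        return {v: exp_a.get(v, 0.0) * exp_b.get(v, 0.0) for v in verbs}
    raise ValueError("op must be 'sum' or 'product'")


def centroid(exp, vectors, k=20):
    """Average the vectors of the k most expected verbs (ties broken by name)."""
    ranked = sorted(exp, key=lambda v: (-exp[v], v))
    top = [v for v in ranked if v in vectors][:k]
    if not top:
        raise ValueError("no expected verb has a vector")
    dim = len(vectors[top[0]])
    return [sum(vectors[v][i] for v in top) / len(top) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ecu_fit(weights, vectors, subj, obj, verb, op="product", k=20):
    """SOV-style thematic fit of a verb given subject and object."""
    exp = combine(expectation(weights, subj, "subj"),
                  expectation(weights, obj, "obj"), op)
    return cosine(vectors[verb], centroid(exp, vectors, k))


def je_fit(so_score, ov_score, op="product"):
    """JE wrapper: reweight the OV score with the SO score (SO+OV or SO*OV)."""
    if op == "sum":
        return so_score + ov_score
    if op == "product":
        return so_score * ov_score
    raise ValueError("op must be 'sum' or 'product'")
```

In this sketch, je_fit takes precomputed SO and OV scores as plain numbers, reflecting the description of JE as a wrapper around components that ECU can compute; how those component scores are obtained from a real DSM is left open here.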