Same same but different:
    Type and typicality in a distributional model of complement coercion
Alessandra Zarcone (1), Sebastian Padó (2), Alessandro Lenci (3)
(1) Universität des Saarlandes, Saarbrücken, Germany, zarcone@coli.uni-saarland.de
(2) Universität Stuttgart, Stuttgart, Germany, pado@ims.uni-stuttgart.de
(3) Università degli Studi di Pisa, Pisa, Italy, alessandro.lenci@unipi.it


Abstract

We aim to model the results from a self-paced reading experiment, which tested the effect of semantic type clash and typicality on the processing of German complement coercion. We present two distributional semantic models to test whether they can model the effect of both type and typicality in the psycholinguistic study. We show that one of the models, without explicitly representing type information, can account for the effects of both type and typicality in complement coercion.

1 Introduction: Complement Coercion

Complement coercion (The author began the book → reading the book) has been shown to cause an increase in processing cost (Pylkkänen and McElree, 2006; Katsika et al., 2012), which has been ascribed to a type clash between an event-selecting verb (begin) and an entity-denoting object (book). The increase in processing cost is found in comparison with a baseline condition, where the same verb is combined with an event-denoting object (journey), which does not trigger a type clash.

A second influence on processing cost is the thematic fit or typicality of the fillers of the verb’s argument slots (Bicknell et al., 2010; Matsuki et al., 2011): high-typicality combinations are processed more quickly than low-typicality ones (the mechanic checked the brakes / the spelling).

Distributional semantic models (DSMs) can successfully model a range of psycholinguistic phenomena, including the effect of typicality on complement coercion (Zarcone et al., 2012). However, they generally do not include a notion of type. Can a DSM account for effects both of type and typicality?

In this paper, we consider experimental results from a study on complement coercion in German that manipulates both type and typicality. We discuss the performance of existing DSMs and a novel DSM combination. We also discuss how type information can emerge from distributional information.

2 Manipulating Type and Typicality

In a self-paced reading study on German complement coercion (Zarcone et al., in preparation), we manipulated both type and typicality. The dataset consists of 20 pairs of subjects (S) and aspectual verbs (V). Each pair is combined with four nominal objects (O) in SOV order:

    [S Das Geburtstagskind] hat [O mit den Geschenken / der Feier / der Suppe / der Schicht] [V angefangen].
    [S The birthday boy] has [O with the presents / party / soup / work shift] [V begun].

The objects are: a high-typicality entity (presents); a high-typicality event (party); a low-typicality entity (soup); and a low-typicality event (work shift). The low-typicality objects are drawn from the high-typicality objects of other S-V pairs.

The self-paced reading study yielded the following significant effects: (1) an effect of typicality on reading times (t = 2.28, p = .02) at the object region (indicating subject-object integration), (2) an effect of object type on reading times (t = −2.5, p = .01) at the verb region (the region of the type clash), (3) an interaction of type and thematic fit at the verb region (t = 2.04, p = .04). Mean reading times per condition are reported in Table 1. In sum, the study shows that complement coercion involves both type and typicality. Thus, computational models of complement coercion need to account for both.

3 Modeling the Experimental Results

Distributional semantic models (DSMs) represent word meaning as high-dimensional vectors recording co-occurrences with elements of their

Copyright © by the paper’s authors. Copying permitted for private and academic purposes.
In Vito Pirrelli, Claudia Marzi, Marcello Ferro (eds.): Word Structure and Word Usage. Proceedings of the NetWordS Final Conference, Pisa, March 30-April 1, 2015, published at http://ceur-ws.org
[Figure 1 (diagram not reproduced): nodes Subject, Object and Verb. Top panel (ECU): links SV and OV into the verb. Bottom panel (JE): links SO and OV.]
Figure 1: ECU vs. Joint Expectations for the verb

                     Object region           Verb region
                     mit den Geschenken      angefangen
                     (with the presents)     (began)
   high-fit entity   642                     819
   high-fit event    655                     736
   low-fit entity    667                     802
   low-fit event     710                     806

Table 1: Mean reading times per condition (in ms) in the self-paced reading study.
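
As a reading aid, the direction of the three effects can be read off the condition means in Table 1. The following Python sketch only does that arithmetic: it averages the four condition means along each factor. It is a descriptive illustration under the assumption that simple differences of means are informative; the significance tests reported for the study cannot be recomputed from these means, and all names in the code (`MEAN_RT`, `mean_rt`, and so on) are introduced here for illustration.

```python
# Mean reading times in ms, copied from Table 1.
# Keys: (thematic fit, object type); values: (object region, verb region).
MEAN_RT = {
    ("high", "entity"): (642, 819),
    ("high", "event"): (655, 736),
    ("low", "entity"): (667, 802),
    ("low", "event"): (710, 806),
}
OBJECT_REGION, VERB_REGION = 0, 1


def mean_rt(region, fit=None, obj_type=None):
    """Average the condition means that match the given fit and/or type."""
    values = [
        rts[region]
        for (f, t), rts in MEAN_RT.items()
        if fit in (None, f) and obj_type in (None, t)
    ]
    return sum(values) / len(values)


# (1) Typicality at the object region: low-fit minus high-fit objects.
typicality_at_object = (
    mean_rt(OBJECT_REGION, fit="low") - mean_rt(OBJECT_REGION, fit="high")
)

# (2) Type at the verb region: entity minus event objects.
type_at_verb = (
    mean_rt(VERB_REGION, obj_type="entity") - mean_rt(VERB_REGION, obj_type="event")
)

# (3) Interaction at the verb region: the entity-event difference for
# high-fit objects minus the same difference for low-fit objects.
type_cost_high = (
    mean_rt(VERB_REGION, "high", "entity") - mean_rt(VERB_REGION, "high", "event")
)
type_cost_low = (
    mean_rt(VERB_REGION, "low", "entity") - mean_rt(VERB_REGION, "low", "event")
)
interaction_at_verb = type_cost_high - type_cost_low

print(typicality_at_object, type_at_verb, type_cost_high, type_cost_low)
```

On these means, low-fit objects are read 40 ms more slowly at the object region, entity objects 39.5 ms more slowly at the verb region, and the cost of an entity object at the verb is 83 ms for high-fit objects but -4 ms for low-fit ones, which is the pattern behind effects (1) to (3).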
usage contexts. Semantic similarity is defined in terms of a vector similarity metric such as cosine.

Distributional Memory (DM, Baroni and Lenci (2010)) is a DSM that incorporates syntactic knowledge into the word representations. More concretely, the TypeDM version of DM records word-relation-word tuples ⟨w1 r w2⟩. The tuples are weighted by Local Mutual Information (Evert, 2005), which can be employed to model predicate-argument typicality. For example, the weight of ⟨book obj read⟩ is higher than that of ⟨label obj read⟩, which in turn is higher than that of ⟨elephant obj read⟩. TypeDM has been shown to be versatile and effective in several semantic tasks, including predicting verb-argument plausibility.

3.1 Complement Coercion and DSMs.

DM has been extended into the Expectation Composition and Update model (ECU, Lenci (2011)), a family of procedures that can be used to predict the typicality of one sentence part given other sentence parts. E.g., to model the typicality at the verb region in a German sentence with SOV word order (e.g. Das Geburtstagskind hat mit dem Geschenk angefangen / The birthday boy has with the present begun), ECU determines the thematic fit for the verb given subject and object:

  • compute an expectation for the verb given the subject s, as the distribution over verbs v defined by the weights of the tuples ⟨s subj v⟩

  • compute an expectation for the verb given the object o, as the distribution over verbs v defined by the weights of the tuples ⟨o obj v⟩.

To combine the subject and object expectations, we combine the two distributions component by component, typically either by sum or by product. This distribution is then represented in a vector space by computing the centroid or prototype of the vectors of the 20 most expected verbs. Finally, the thematic fit for a verb v given the subject s and the object o is its cosine similarity to the centroid.

ECU. We call the models following the ECU procedure SOV+ and SOV*, depending on their combination operation (sum and product, respectively). Simpler models only consider the influence of subject or object on the verb (SV and OV respectively), just by leaving out the combination step. These models can successfully account for reading time results on a dataset of complement coercion in German that manipulates typicality but not type (Zarcone et al., 2012).

In order to test ECU on a dataset which manipulates both type and typicality, we evaluate the following ECU models on the complement coercion data of Zarcone et al. (in preparation): SO to model effects at the object given the subject; SOV+, SOV* and OV to model effects at the verb. We expect these models to account for the typicality effect at the object (1), but not for the type effects at the verb (2,3).

The results are summarized in Table 2 (left and middle). In accordance with our prediction, SO correctly yields the typicality effect at the object (F = 7.38, p < 0.01). Neither SOV+, SOV*, nor OV can model the type-typicality interaction at the verb (3). Surprisingly, though, SOV* and OV yield (2), an effect of type at the verb (F = 5.3228, p < 0.05 and F = 20.388, p < 0.001, respectively).

Joint Expectations. The reading time study found that the subject-object typicality effects linger at the verb, interacting with type. The main shortcoming of ECU is its inability to model the typicality effects at the verb. This is due to the architecture of the SOV models (cf. Fig. 1, top): they compute the expectations for the verb first from the subject (SV) and update them with the object’s expectations (OV). They ignore the interaction between subject and object (SO), which is the source of typicality effects (1,3); this corresponds to the assumption that this interaction should only matter at the object. In order to account for this, we draw an analogy to the concept of joint probability:

                 P(S, O, V)


                                                                     non-compos.      ECU              JE
                                                                     SO      OV       SOV+    SOV*     SO+OV   SO*OV
   (1) effect of typicality at the object region (SO interaction)    ✓       ×        ×       ×        ✓       ✓
   (2) effect of type at the verb region (type clash)                ×       ✓        ×       ✓        ×       ✓
   (3) type x thematic fit interaction at the verb region            ×       ×        ×       ×        ×       ×

Table 2: Overview of the results of the different DSMs: non-compositional, ECU and JE (✓ = effect obtained, × = effect not obtained).
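
To make the model names in Table 2 concrete, the following Python sketch walks through the scoring steps on toy data: verb expectations from subject and object, their component-wise combination (SOV+ or SOV*), the centroid of the most expected verbs, and the cosine-based thematic fit, plus the JE combination of an SO and an OV score by sum or product. It is a minimal illustration under stated assumptions, not the implementation used for the reported results: the tuple weights and verb vectors are invented stand-ins for TypeDM's LMI-weighted tuples, the function names are hypothetical, the centroid uses the top 2 verbs instead of 20, and the SO and OV scores passed to `je_fit` are treated as given scalars.

```python
import math

# Hypothetical toy weights standing in for LMI-weighted TypeDM tuples:
# SUBJ_VERB[s][v] plays the role of <s subj v>, OBJ_VERB[o][v] of <o obj v>.
SUBJ_VERB = {"child": {"unwrap": 8.0, "eat": 5.0, "begin": 2.0}}
OBJ_VERB = {"present": {"unwrap": 9.0, "buy": 6.0, "begin": 1.0}}

# Hypothetical 3-dimensional verb vectors (real DSM vectors are high-dimensional).
VERB_VECTORS = {
    "unwrap": [1.0, 0.2, 0.0],
    "eat": [0.1, 1.0, 0.0],
    "buy": [0.6, 0.1, 0.5],
    "begin": [0.5, 0.5, 0.5],
}


def combine(exp_a, exp_b, op):
    """Combine two verb expectations component by component."""
    verbs = set(exp_a) | set(exp_b)
    if op == "sum":
        return {v: exp_a.get(v, 0.0) + exp_b.get(v, 0.0) for v in verbs}
    if op == "product":
        return {v: exp_a.get(v, 0.0) * exp_b.get(v, 0.0) for v in verbs}
    raise ValueError(f"unknown combination operation: {op}")


def centroid(expectation, k):
    """Average the vectors of the k most expected verbs (ties broken by name)."""
    top = sorted(expectation, key=lambda v: (-expectation[v], v))[:k]
    dims = len(VERB_VECTORS[top[0]])
    return [sum(VERB_VECTORS[v][d] for v in top) / len(top) for d in range(dims)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def thematic_fit(verb, expectation, k=2):
    """Cosine similarity of the verb vector to the expectation centroid."""
    return cosine(VERB_VECTORS[verb], centroid(expectation, k))


def ov_fit(verb, obj, k=2):
    """OV: expectation from the object only, no combination step."""
    return thematic_fit(verb, OBJ_VERB[obj], k)


def sov_fit(verb, subject, obj, op, k=2):
    """SOV+ (op='sum') or SOV* (op='product')."""
    combined = combine(SUBJ_VERB[subject], OBJ_VERB[obj], op)
    return thematic_fit(verb, combined, k)


def je_fit(so_score, ov_score, op):
    """JE: combine an SO score and an OV score (SO+OV or SO*OV)."""
    if op == "sum":
        return so_score + ov_score
    if op == "product":
        return so_score * ov_score
    raise ValueError(f"unknown combination operation: {op}")


print(sov_fit("unwrap", "child", "present", "product"))
print(je_fit(0.5, ov_fit("begin", "present"), "product"))
```

In this toy setting, a verb that both the subject and the object expect ("unwrap") ends up closer to the centroid than a verb expected by the subject alone ("eat"). The JE wrapper leaves the ECU components untouched and only changes how their scores are combined.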


which, by the chain rule and with the verb conditioned on the object alone (a simplification of P(V|S,O)), can be written as

                 P(S) P(O|S) P(V|O)

Treating the first term as a constant prior, we obtain

                 P(O|S) P(V|O)

which we can interpret distributionally as motivation to reweight the typicality of the verb given the object with the typicality of the object given the subject, thus re-introducing the subject-object interaction into the verb prediction (cf. Figure 1, bottom).

In the Joint Expectation (JE) model, the thematic fit score assigned to the target verb is influenced both by the verb’s thematic fit with the object (the verb’s initial thematic fit score, equivalent to the ECU weight for the ⟨object obj verb⟩ tuple) and by the object’s thematic fit with the subject (equivalent to the ECU weight for the ⟨subject verb object⟩ tuple), which in turn is used to reweight the verb’s score.

Similar to ECU, there is a choice of combination operations in JE (sum or product). Since JE can be formulated as a simple wrapper around ECU, ECU can be used to compute the individual components (e.g. SO, OV, or more complex ones) and these then just need to be combined additively (SO+OV) or multiplicatively (SO*OV).

The right-hand side of Table 2 shows the results for JE. SO+OV yields an effect of typicality (F = 6.777, p < 0.05) but no effect of type (2) or interaction (3). SO*OV yields two main effects of (2) type (F = 7.2359, p < 0.05) and typicality (F = 7.2359, p < 0.01), although no interaction (3).

Comparing the two models, we see that ECU SO accounts for the results obtained at the object (1), but the SOV models cannot explain the interaction with typicality on the verb (2,3). JE (SO*OV) models the effects of both type (2) and typicality at the verb, but does not (yet) account for their interaction (3).

4 Discussion: Type and Typicality

We found that the SO model successfully accounts for the effect of typicality at the object. This is not surprising: one of the most typical tasks successfully performed by distributional models such as ECU is predicting verb-argument plausibility, and ECU had already been successful in modeling effects of typicality on reading times in German complement coercion (Zarcone et al., 2012).

On the other hand, the ECU SOV models were not able to account for the type-typicality interaction at the verb. The JE model (SO*OV), which we presented as an alternative to the ECU model to better account for the typicality effects at the verb, yielded effects of both type and typicality at the verb, but did not account for their interaction.

Our most surprising result is that the OV, SOV*, and SO*OV models explain the effect of type. As DSMs do not represent this concept explicitly, a possible interpretation suggested by our results is that type and typicality are not distinct categories, but capture properties of predicate-argument combinations at different granularity levels.

Distributional models can account for types because types emerge from the observed corpus distributions. Specifically, for the aspectual verbs used in the present data set, the distribution over their objects (namely, that event nouns occur much more frequently than object nouns; Zarcone et al., 2013) corresponds more naturally to an interpretation in terms of types than of typicality. A compositional distributional model where semantic types emerge as patterns of behavior has the advantage of relying on minimal assumptions regarding the granularity of the type ontology, which is intriguing, as pattern recognition is a key aspect of human cognition (Rumelhart and McClelland, 1987; Saffran et al., 1996; Tomasello, 2009).

In conclusion, the picture that emerges from our experiments is one where (1) expectations for predicate-argument combinations have a hierarchical structure, with types as a high-level distinction and typicality as a low-level distinction, (2) both levels are different, but interact early during


processing, influencing reading times, and (3) both type and typicality can emerge from the “same same” distributional model.

Acknowledgments

This research was funded by the German Research Foundation (DFG) as part of SFB 732 “Incremental Specification in Context” and SFB 1102 “Information Density and Linguistic Encoding”.

References

Marco Baroni and Alessandro Lenci. 2010. Distributional Memory: a general framework for corpus-based semantics. Computational Linguistics, 36(4):673–721.

Klinton Bicknell, Jeffrey L Elman, Mary Hare, Ken McRae, and Marta Kutas. 2010. Effects of event knowledge in processing verbal arguments. Journal of Memory and Language, 63:489–505.

Stefan Evert. 2005. The statistics of word cooccurrences. Ph.D. thesis, Universität Stuttgart.

Argyro Katsika, David Braze, Ashwini Deo, and Maria Mercedes Piñango. 2012. Complement coercion: Distinguishing between type-shifting and pragmatic inferencing. The Mental Lexicon, 7(1):58–76.

Alessandro Lenci. 2011. Composing and updating verb argument expectations: A distributional semantic model. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, pages 58–66, Portland, OR.

Kazunaga Matsuki, Tracy Chow, Mary Hare, Jeffrey L Elman, Christoph Scheepers, and Ken McRae. 2011. Event-based plausibility immediately influences on-line language comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(4):913–934.

Liina Pylkkänen and Brian McElree. 2006. The syntax-semantics interface: On-line composition of sentence meaning. In M. Traxler and M. A. Gernsbacher, editors, Handbook of Psycholinguistics, pages 539–579. Elsevier, Amsterdam, The Netherlands, 2nd edition.

David E. Rumelhart and James L. McClelland. 1987. Learning the past tenses of English verbs. Implicit rules or parallel distributed processing. In Mechanisms of language acquisition, pages 249–308. Lawrence Erlbaum Associates, Hillsdale, NJ.

Jenny R Saffran, Richard N Aslin, and Elissa L Newport. 1996. Statistical learning by 8-month-old infants. Science, 274(5294):1926–1928.

Michael Tomasello. 2009. Constructing a language: A usage-based theory of language acquisition. Harvard University Press, Cambridge, MA.

Alessandra Zarcone, Jason Utt, and Sebastian Padó. 2012. Modeling covert event retrieval in logical metonymy: probabilistic and distributional accounts. In Proceedings of the 3rd Workshop on Cognitive Modeling and Computational Linguistics, pages 70–79, Montréal, Canada.

Alessandra Zarcone, Alessandro Lenci, Sebastian Padó, and Jason Utt. 2013. Fitting, not clashing! a distributional semantic model of logical metonymy. In Proceedings of the 10th International Conference on Computational Semantics, Potsdam, Germany.

Alessandra Zarcone, Alessandro Lenci, Ken McRae, and Sebastian Padó. in preparation. Type and thematic fit in logical metonymy resolution.

