Grounding the Lexical Sets of Causative-Inchoative Verbs with Word
                            Embedding


 Edoardo Maria Ponti                    Elisabetta Jezek                  Bernardo Magnini
University of Cambridge           Università degli Studi di Pavia      Fondazione Bruno Kessler
 ep490@cam.ac.uk                      jezek@unipv.it                     magnini@fbk.eu


Abstract

English. Lexical sets contain the words filling the argument positions of a verb in one of its senses. They can be extracted from corpora automatically. The purpose of this paper is to demonstrate that their vector representation based on word embedding provides insights into many linguistic phenomena, such as causative-inchoative verbs. A first experiment aims at investigating the internal structure of the sets, which are known to be cognitively radial and continuous categories. A second experiment shows that the distance between the intransitive subject set and the transitive object set is correlated with the spontaneity of the event expressed by the verb, defined according to morphological coding and frequency.

Italiano. I set lessicali contengono le parole che occupano le posizioni argomentali di un verbo in una delle sue accezioni, e possono essere estratti in modo automatico dai corpora. L’obiettivo di questo articolo è dimostrare che la loro rappresentazione vettoriale illumina alcuni fenomeni linguistici, come i verbi ad alternanza causativo-incoativa. Un esperimento investiga la struttura interna degli insiemi, che a livello cognitivo sono ritenuti categorie radiali e continue. Inoltre, un secondo esperimento mostra che la distanza fra l’insieme dei soggetti intransitivi e l’insieme degli oggetti transitivi è correlata alla spontaneità dell’evento espresso dal verbo, definita secondo la marca morfologica e la frequenza.

1   Introduction

Lexicographic attempts to cope with verb sense disambiguation often rely on “lexical sets” (Hanks, 1996), which represent the lists of corpus-derived words that appear as arguments for each distinct verb sense. The arguments are the “slots” that have to be filled to satisfy the valency of a verb (subject, object, etc.). For example, {gun, bullet, shot, projectile, rifle...} is the lexical set of the object for the sense ‘to shoot’ of to fire. In previous works, e.g. Montemagni et al. (1995), lexical sets were collected manually and were compared through set analysis. The measure of similarity between two sets was proportional to the extent of their intersection. We believe that possible improvements may stem from deriving the lexical sets automatically and from fully exploiting the semantic information of the fillers. In this work, we devise an extraction method from a large corpus and use a distributional semantics approach to perform our analyses. More specifically, we represent fillers as word vectors and compare them with spatial distance measures. In order to test the relevance of this approach for linguistic theory, we focus on a case study, namely the properties of verbs undergoing the causative-inchoative alternation. Section 1.1 outlines a framework for word embeddings and Section 1.2 introduces the case study. Section 2 reviews previous work, Section 3 presents the data and the method, and Section 4 reports the results of two experiments.

1.1   Word Embedding

The full exploitation of the semantic information inherent to argument fillers for verbs can take advantage of some recent developments in distributional semantics. Recently, efficient algorithms have been devised that map each word of a vocabulary into a corresponding vector of n real numbers, which can be thought of as a sequence of coordinates in an n-dimensional space (Mikolov et al., 2013). This mapping is yielded by unsupervised machine learning, based on the assumption that the meaning of a word can be inferred from its context, i.e. its neighbouring words in texts. This model has some relevant properties: the geometric closeness of two vectors corresponds to the similarity in meaning of the corresponding words. Moreover, its dimensions possibly have a semantic interpretation.

1.2   Causative-Inchoative Alternation

A possible testbed for the usefulness of representing the argument fillers as vectors is the class of verbs showing the so-called causative-inchoative alternation. These verbs appear either as transitive or intransitive. In the first case, an agent brings about a change of state; in the second, the change of a patient is presented as spontaneous (e.g. to break, as in “Mary broke the key” vs. “the key broke”).

The two alternative forms of these verbs can be morphologically asymmetrical: in this case, one has a derivative affix and the other does not. The first is labelled here as “marked”, the second as “basic”. Italian verbs with an asymmetrical alternation derive from the phenomenon of anticausativization. The intransitive form is marked since it is sometimes preceded by the clitic si (Cennamo and Jezek, 2011). Haspelmath (1993) maintains that verbs that show a preference for a marked causative form (and a basic inchoative form) cross-linguistically denote a more “spontaneous” situation. Spontaneity is intended by the author as the likelihood of the occurrence of the event without the intervention of an agent. This work is non-committal with respect to whether spontaneity is an actual semantic factor. Rather, it is considered a notion useful for labelling the observed variations in morphology and frequency.

In this way, a correlation between the form and the meaning of these verbs was demonstrated. Moreover, Samardzic and Merlo (2012) and Haspelmath et al. (2014) argue that verbs that appear more frequently (intra- and cross-linguistically) in the inchoative form tend to morphologically derive the causative form, too. This time, the correlation holds between form and frequency. Conversely, situations entailing agentive participation prefer to mark the inchoative form and occur more frequently in the causative form.

2   Previous Work

In the literature, many methods are available for the automatic detection of verb classes, such as causative-inchoative verbs. They exploit features based on argument alternations, such as subcategorization frames (Joanis et al., 2008). The identification of verb classes displaying a diathesis alternation was also performed through the analysis of selectional preferences. Most notably, the lexical items were compared via distributional semantics (McCarthy, 2000).

These features were usually induced from automatic parses of heterogeneous and wide corpora (Schulte Im Walde, 2000). In particular, the extraction of subcategorization frames was refined by including e.g. noise filters based on frequency (Korhonen et al., 2000). Our work is inspired by these attempts to automatically induce lexical information regarding verbs, but its direction of research is reversed. Indeed, rather than identifying verb classes given this information, it analyses this information given a verb class in order to shed light on its properties from the perspective of linguistic theory.

3   Data and Method

The data are sourced from a sample of ItWac, a wide Italian corpus gathered through web crawling (Baroni et al., 2009). This sample was further enriched with morpho-syntactic information through the MATE-tools parser (Bohnet, 2010)¹ and filtered by sentence length (< 100). In the end, the sample amounted to 2,029,454 sentences. A target group of 20 causative-inchoative verbs was taken from Haspelmath et al. (2014): they are listed here based on the reported transitive/intransitive frequency ratio, from the highest to the lowest.

    close > open > improve > break > fill > gather > connect > split > stop > go out > rise > rock > burn > freeze > turn > dry > wake > melt > boil > sink

¹ LAS scores for the relevant dependency relations: 0.751 with dobj (direct object), 0.719 with nsubj (subject), 0.691 with nsubjpass (subject of a passive verb).

The extraction step consisted in identifying their argument fillers inside the sentences in the sample. In particular, the arguments considered were the subjects of intransitives (S) and objects
(O) (Dixon, 1994).² These arguments are relevant because they are deemed to share the same fillers (Pustejovsky, 1995).

These operations resulted in a database where each verb lemma had a single entry and was associated with a list of fillers, divided by argument type. With this procedure, lexical sets were extracted automatically, although they were not divided by verb sense. Afterwards, each of the argument fillers was mapped to a vector relying on a space model pre-trained through Word2Vec (Dinu et al., 2015).³

² Subjects of forms with si were treated as intransitive subjects. Subjects of passive verbs were treated as objects.
³ It was generated by a CBOW algorithm with negative sampling, 300 dimensions, a context window of 10 tokens, pruning of infrequent words and sub-sampling.

4   Experiments

In order to bring to light the linguistic information concealed in the automatically extracted lexical sets, we devised two experiments. One investigates the internal structure of lexical sets. In fact, previous works based on set theory treated them as categorical sets, of which a filler is either a member or not. Research in psychology, however, has long since demonstrated that the members of a linguistic set are found in a radial continuum where the most central one is the prototype for its category, and those at the periphery are less representative (Rosch, 1973; Lakoff, 1987).⁴ Word vectors make it possible to capture this spatial continuum.

⁴ For previous work on lexical sets considering prototypicality in the context of the notion of shimmering, see Jezek and Hanks (2010).

Once the fillers have been mapped to their respective vectors, a lexical set appears as a group of points in a multi-dimensional model. The centre of this group is the Euclidean mean among the vectors, which is a vector itself and is called centroid. In the first experiment, we calculated the coordinates of the centroid of the lexical sets S and O for every selected verb.⁵ Then we evaluated the cosine distance of every vector member of the sets from its centroid. The value of this metric goes from 0 (overlap) to 1 (maximum distance) and is useful to evaluate how far a filler is from its prototype. We obtained two sets of cosine distance values for each verb: these can be plotted as boxes and whiskers, as in Figure 1. The example represents those of dividere ‘to split’. The rectangles stand for the values in the second and third quartiles, whereas the horizontal line stands for the median.⁶ From all these distance values, we picked the median value for each lexical set. The plot of these medians for the S set and the O set of each verb, ordered according to Haspelmath’s ranking, is shown in Figure 2.

⁵ Every filler was weighted proportionally to its absolute frequency.
⁶ The median is the value separating the higher half of the ordered values from the lower half.

Figure 1: Distance of vectors from their centroid.

Two main results can be observed from these plots: the S lexical set lies in a more compact range of distances, whereas O is more scattered. On the other hand, the vectors of S tend to lie farther from the centroid. This is demonstrated by the ranges where their distance values fall. Moreover, the averages of medians for the ten verbs on the left part of the scale (frequently transitive) and for the ten verbs on the right (frequently intransitive) were compared. The average median in S was 0.696567 for the former and 0.585263 for the latter. The average median in O was 0.556878 for the former and 0.522418 for the latter. The difference in O is small (about 0.03), so the variation there may well be random. The difference in S is larger (about 0.11): the median of the distances in S is normally lower for verbs that lie in the bottom half of Haspelmath’s scale.

The second experiment consisted in estimating the cosine distance between the centroid of S and the centroid of O for each verb. This operation was aimed at finding to what extent the lexical sets of S and O overlap. In fact, Montemagni et al. (1995) and McCarthy (2000) found in corpus data some asymmetries between these lexical sets, which in principle should share all their members.
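The centroid-based measurements of the first experiment can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the three-dimensional toy vectors and the frequencies are all invented for illustration, standing in for the 300-dimensional Word2Vec vectors. Cosine distance is taken here as 1 minus cosine similarity (which in principle can exceed 1 for vectors pointing in opposite directions), the centroid is weighted by filler frequency, and the median is computed without weights, which is an assumption.

```python
import math
from statistics import median


def weighted_centroid(vectors, weights):
    """Frequency-weighted mean of equal-length vectors."""
    total = sum(weights)
    dims = len(vectors[0])
    return [
        sum(w * v[i] for v, w in zip(vectors, weights)) / total
        for i in range(dims)
    ]


def cosine_distance(u, v):
    """1 - cosine similarity: 0 when the vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def median_distance_from_centroid(vectors, weights):
    """Median cosine distance of the fillers from their weighted centroid."""
    centroid = weighted_centroid(vectors, weights)
    return median(cosine_distance(v, centroid) for v in vectors)


# Toy lexical set (hypothetical): three filler vectors with corpus frequencies.
toy_vectors = [[1.0, 0.0, 0.0], [0.8, 0.6, 0.0], [0.0, 1.0, 0.0]]
toy_freqs = [5, 3, 2]
toy_centroid = weighted_centroid(toy_vectors, toy_freqs)
toy_median = median_distance_from_centroid(toy_vectors, toy_freqs)
```

Running the same function on the S set and on the O set of a verb would give the two medians that are plotted per verb.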
Figure 2: Medians of S (left) and O (right) distances for verbs ranked by position in Haspelmath’s scale.
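The second experiment, which measures the cosine distance between the S centroid and the O centroid of each verb and compares the resulting ranking with a reference ranking, can be sketched in the same hedged way. Everything below is an assumption made for illustration: the two-dimensional vectors, the three-verb sample and the reference order are invented, the centroid is unweighted for brevity, and Spearman's ρ is computed with the textbook formula that assumes no tied ranks.

```python
import math


def centroid(vectors):
    """Unweighted mean of equal-length vectors (weights omitted for brevity)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def spearman_rho(order_a, order_b):
    """Spearman's rho for two orderings of the same items, assuming no ties."""
    n = len(order_a)
    position_b = {item: rank for rank, item in enumerate(order_b)}
    d_squared = sum(
        (rank - position_b[item]) ** 2 for rank, item in enumerate(order_a)
    )
    return 1.0 - 6.0 * d_squared / (n * (n * n - 1))


# Hypothetical data: verb -> (vectors of S fillers, vectors of O fillers).
toy_sets = {
    "close": ([[1.0, 0.0]], [[1.0, 0.1]]),
    "melt": ([[1.0, 0.0]], [[0.0, 1.0]]),
    "break": ([[1.0, 0.0]], [[1.0, 1.0]]),
}

# Distance between the S and O centroids of each verb.
so_distance = {
    verb: cosine_distance(centroid(s), centroid(o))
    for verb, (s, o) in toy_sets.items()
}

# Verbs from the smallest to the largest S-O distance.
by_distance = sorted(so_distance, key=so_distance.get)

# Invented reference order, from most to least frequently transitive.
reference_order = ["close", "break", "melt"]
rho = spearman_rho(reference_order, by_distance)
```

A positive ρ in this setup means that verbs with more distant S and O centroids sit lower on the reference scale, i.e. are more frequently intransitive. A significance test would need a statistics library and is left out of the sketch.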


   Inspecting our results, the distance between S and O seems to behave as a measure of spontaneity, intended as cross-linguistic frequency and morphological markedness of a verb: the more the centroids tend to be set apart, the more the verb tends to have a morphologically unmarked and more frequent intransitive form. In fact, we compared the ranking of 20 alternating verbs according to the ratio of their cross-linguistic frequency of transitive and intransitive forms (Haspelmath et al., 2014) and a ranking based on the centroid distances of the same verbs. Both these rankings are plotted in Figure 3: every verb is associated with its position in the two scales.

Figure 3: Ranking based on cross-linguistic form frequencies (green triangles) against ranking based on distance of the centroids of S and O in Italian (blue squares).

   Both scales display a common tendency. In particular, a Spearman’s rank correlation test was performed over them, yielding a moderate positive correlation of ρ = 0.56391, which is statistically significant (p < 0.01).⁷

⁷ An alternative measure was considered for the ranking: the cardinality of the S-O intersection weighted by the set union. In this case, Spearman correlation was ρ = 0.42255, but it was not significant because p ≈ 0.06.

5   Discussion

The representation of lexical sets of Italian causative-inchoative verbs as vectors was demonstrated to provide insights into their internal structure and their relation with spontaneity defined according to morphological coding and frequency. The distances of the objects appeared to be distributed more uniformly, whereas those of the intransitive subjects were distributed more densely and farther from the centroid. This difference cannot stem from the frequency of anaphoric fillers (contrary to transitive subjects), since both these argument positions share the discourse function of introducing new referents, and are hence occupied by fully referential fillers (Du Bois, 1985).

   Moreover, the medians of the distances of the subject fillers from their centroid were shown to vary. An interpretation is that they are sensitive to the frequency scale: this implies that frequently transitive (hence, non-spontaneous) verbs have semantically less homogeneous sets of referents, since they are farther from the prototype. Possibly this finding can be related to the fact that non-spontaneous verbs impose fewer selectional restrictions on subjects (McKoon and Macfarland, 2000).

   The lack of a perfect correlation between these vector distance and frequency measures may be due to errors in the automatic extraction and data sparseness for the former, or an insufficient sample
of languages in the typological survey of Haspelmath et al. (2014) for the latter. A possible interpretation of the correlation is that the entities capable of bringing about a change of state and those that undergo it are indiscernible only for non-spontaneous verbs. Studies on causer entities related them not only with the feature of agentivity, but also in general with the so-called ‘teleological capability’ (Higginbotham, 1997).

6   Conclusion

Our work provided evidence that lexical sets of Italian causative-inchoative verbs are continuous and radial categories, whose distribution around the prototype varies to a great extent. It is sensitive to the grammatical role and sometimes to the position of the verb in the so-called spontaneity scale. Moreover, a correlation was discovered between the distance between transitive object and intransitive subject lexical sets of a given verb and its cross-linguistic tendency to appear more frequently as intransitive or as transitive. Figure 4 is a synopsis of this result in the context of the correlations established in previous works.

[Figure 4 diagram: “Spontaneous” is linked by a dotted line marked “?” to “Frequently Intransitive”, which is linked by solid lines to “Unmarked Intransitive” (τ = 0.65) and to “Distant S and O centres” (ρ = 0.56).]

Figure 4: Synopsis of correlations among features of causative-inchoative verbs. The measures are based on Kendall Tau test (τ) and Spearman’s ranking test (ρ).

   In Figure 4, solid lines stand for correlations proven based on cross-linguistic evidence (frequency-form) and evidence from the Italian language (frequency-lexical sets). The dotted line, on the other hand, suggests the existence of an underlying motivation for the correlations, which nonetheless remains unproven and undetermined in its nature. Its possible validation is left to future research.

   Future work should also try different pre-trained vector models, in order to replicate these results. In particular, the new vector models could be optimized for similarity through semantic lexica (Faruqui et al., 2015) or based on syntactic dependencies (Ó Séaghdha, 2010). The experiments in this work may be extended to other languages, either individually or through a multi-lingual word embedding (Faruqui and Dyer, 2014).

References

Marco Baroni, Silvia Bernardini, Adriano Ferraresi,
  and Eros Zanchetta. 2009. The wacky wide web: a
  collection of very large linguistically processed
  web-crawled corpora. Language resources and
  evaluation, 43(3):209–226.
Bernd Bohnet. 2010. Very high accuracy and fast
  dependency parsing is not a contradiction. In
  Proceedings of the 23rd International Conference on
  Computational Linguistics, pages 89–97. Association
  for Computational Linguistics.
Michela Cennamo and Elisabetta Jezek. 2011. The
  anticausative alternation in Italian. I luoghi della
  traduzione, pages 809–823.
Georgiana Dinu, Angeliki Lazaridou, and Marco Baroni.
  2015. Improving zero-shot learning by mitigating the
  hubness problem. Workshop contribution at ICLR 2015.
Robert MW Dixon. 1994. Ergativity. Cambridge
  University Press.
John W Du Bois. 1985. Competing motivations.
  Iconicity in syntax, pages 343–365.
Manaal Faruqui and Chris Dyer. 2014. Improving
  vector space word representations using multilingual
  correlation.
Manaal Faruqui, Jesse Dodge, Sujay K. Jauhar, Chris
  Dyer, Eduard Hovy, and Noah A. Smith. 2015.
  Retrofitting word vectors to semantic lexicons. In
  Proceedings of NAACL.
Patrick Hanks. 1996. Contextual dependency and
  lexical sets. International Journal of Corpus
  Linguistics, 1(1):75–98.
Martin Haspelmath, Andreea Calude, Michael Spagnol,
  Heiko Narrog, and Elif Bamyaci. 2014. Coding
  causal–noncausal verb alternations: A form–frequency
  correspondence explanation. Journal of Linguistics,
  50(03):587–625.
Martin Haspelmath. 1993. More on the typology of
  inchoative/causative verb alternations. Causatives
  and transitivity, 23:87.
James Higginbotham. 1997. Location and causation.
  Ms., University of Oxford.
Elisabetta Jezek and Patrick Hanks. 2010. What lex-
   ical sets tell us about conceptual categories. Lexis,
   4(7):22.
Eric Joanis, Suzanne Stevenson, and David James.
   2008. A general feature space for automatic
   verb classification. Natural Language Engineering,
   14(03):337–367.
Anna Korhonen, Genevieve Gorrell, and Diana Mc-
  Carthy. 2000. Statistical filtering and subcatego-
  rization frame acquisition. In Proceedings of the
  2000 Joint SIGDAT conference on Empirical meth-
  ods in natural language processing and very large
  corpora, pages 199–206. Association for Computa-
  tional Linguistics.
George Lakoff. 1987. Women, fire, and dangerous
  things: What categories reveal about the mind.
  University of Chicago Press.
Diana McCarthy. 2000. Using semantic preferences to
  identify verbal participation in role switching alter-
  nations. In Proceedings of the 1st North American
  chapter of the Association for Computational Lin-
  guistics conference, pages 256–263.
Gail McKoon and Talke Macfarland. 2000. Externally
  and internally caused change of state verbs. Lan-
  guage, pages 833–858.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey
  Dean. 2013. Efficient estimation of word represen-
  tations in vector space. In Workshop at ICLR.
Simonetta Montemagni, Nilda Ruimy, and Vito Pirrelli.
  1995. Ringing things which nobody can ring. A
  corpus-based study of the causative-inchoative
  alternation in Italian. Textus, 8(2):1000–1020.
James Pustejovsky. 1995. The generative lexicon. The
  MIT Press.
Eleanor H Rosch. 1973. Natural categories. Cognitive
  psychology, 4(3):328–350.
Tanja Samardzic and Paola Merlo. 2012. The mean-
  ing of lexical causatives in cross-linguistic variation.
  Linguistic Issues in Language Technology, 7(12):1–
  14.
Sabine Schulte Im Walde. 2000. Clustering verbs se-
  mantically according to their alternation behaviour.
  In Proceedings of the 18th conference on Computa-
  tional linguistics-Volume 2, pages 747–753.
Diarmuid Ó Séaghdha. 2010. Latent variable models
  of selectional preference. In Proceedings of the
  48th Annual Meeting of the Association for Compu-
  tational Linguistics, pages 435–444.