Modelling Italian construction flexibility with distributional semantics: Are constructions enough?

Lucia Busso, Ludovica Pannitto, Alessandro Lenci
CoLing Lab, University of Pisa
{lucia.busso90,ellepannitto}@gmail.com, alessandro.lenci@unipi.it

Abstract

English. The present study combines psycholinguistic evidence on Italian valency coercion and a distributional analysis. The paper suggests that distributional properties can provide useful insights on how general abstract constructions influence the resolution of coercion effects. However, a complete understanding of the processing and recognition of coercion requires taking into consideration the complex intertwining of lexical verb and abstract constructions.

Italiano. Il lavoro unisce uno studio psicolinguistico sul fenomeno della coercion valenziale in Italiano con un’analisi distribuzionale. L’articolo suggerisce che le proprietà distribuzionali forniscano un utile passaggio per capire l’influenza delle costruzioni alla risoluzione di effetti di coercion. Tuttavia, una piena comprensione del fenomeno richiede di prendere in considerazione la complessa relazione tra verbo e costruzione argomentale.

1 Introduction

In Construction Grammar (Goldberg, 2006), the basic units of linguistic analysis are called constructions (Cxns), form-meaning pairings associated with autonomous, non-compositional abstract meanings, independently from the lexical items occurring in them. Examples of Cxns range from morphemes (e.g., pre-, -ing), to filled or partially-filled complex words (e.g., daredevil), to idioms (e.g., give the devil his due), to more abstract patterns like the Ditransitive [Subj V Obj1 Obj2] (e.g., he gave her a book) (Goldberg, 2006).

Cxns appear at any level of linguistic analysis, but the level at which the notion of constructional meaning represents a radical departure from other theories of grammar is argument structure. These Cxns, such as the English Ditransitive, are claimed to be associated with an abstract semantic content. In this case, constructional meaning can be paraphrased as X CAUSES Y TO RECEIVE Z. One of the main supporting arguments in favour of constructions as independent and primitive objects of grammar is the flexibility with which argument Cxns and verbs interact with each other, as in example (1), in which the original intransitive sense of “to sneeze” is overridden by the Caused Motion Cxn, and the verb thus takes a transitive sense of “making something move by sneezing”.

(1) John sneezed the napkin off the table

This flexibility in combining Cxns and verbs is known as valency coercion (Michaelis, 2004; Boas, 2011; Lauwers and Willems, 2011; Perek and Hilpert, 2014).

This phenomenon, although extensively studied for English, has not yet received a systematic investigation in other languages. For notable exceptions, see Boas and Gonzálvez-García (2014). In particular, to the best of our knowledge, no previous attempt to carry out an empirical investigation of valency coercion exists for Italian. However, even a simple corpus query reveals that the phenomenon is present in Italian, though it is not as pervasive as in English:

(2) Tossì una risata leggera tra i suoi capelli
    (He coughed a light laugh in her hair) [ItWac]

This paper presents an analysis of Italian constructional flexibility that combines psycholinguistic and computational evidence: first, we present the results of a behavioral experiment on valency coercion. Then, we model Cxns with distributional semantics to investigate whether the semantic shape of Italian argument Cxns can affect the interpretation and processing of coerced sentences.

2 Studying valency coercion: an acceptability rating task

MATERIALS AND SUBJECTS: The offline psycholinguistic experiment targets 9 Italian Cxns (see Table 1) that were selected using existing resources: LexIt (Lenci et al., 2012) and ValPal (Cennamo and Fabrizio, 2013). The resultant Cxns are of varying abstractness and schematicity levels (Barðdal, 2008).

Cxn                                          Frame
CAUSED MOTION (CM)                           NPj-V-NP -PPlocation
CAUSED MOTION + via (CMvia)                  NPs-V-NPobj
DATIVE (DT)                                  NPs-V-NPj-PPrecipient
INTRANSITIVE MOTION (IM)                     NPs-V-PPlocation
PASSIVE (PASS)                               NPs-V-PP
PREDICATIVE (PRED)                           NPs-V-AdjPpredicate
VERBA DICENDI explicit (sentential) (VDE)    NPs-V-cheVP
VERBA DICENDI implicit (sentential) (VDI)    NP-V-diVP

Table 1: Constructions used in the test.

For each Cxn, we built 21 sentences, which were subdivided into 3 experimental conditions: GRAMMATICAL (3a), COERCION (3b), IMPOSSIBLE (3c) (7 sentences per condition). The total number of stimuli amounts to 189 sentences. The structure of the test was inspired by Perek and Hilpert (2014). Between conditions, sentences differ only in their main verb, to have as little variation as possible.

(3) a. Gianni ha detto che verrà domani
       (Gianni said that he will come tomorrow)
    b. Gianni ha fischiettato che verrà domani
       (Gianni whistled that he will come tomorrow)
    c. Gianni ha cucinato che verrà domani
       (Gianni cooked that he will come tomorrow)

The coercion condition consists of verbs that display a partial semantic incompatibility with the constructional environment they are embedded in. They were selected by means of both native intuition and corpus query, selecting and refining cases that were either hapax or rare occurrences in the Italian corpus ItWac (Baroni et al., 2009).

120 Italian native speakers were tested: 39 adolescents (12-14 years old), 40 young adults (18-35 years old), and 41 adults (over 40). We tested subjects of different ages following extensive sociolinguistic literature that has shown that language use changes with age (Eckert, 2017; Labov, 2001; Wagner, 2012). Thus, it could be the case that grammaticality judgments on creative, non-standard sentences are also affected by age. Including different age groups in our analysis allows us to investigate a more representative sample of the population. To control for the possible influencing factor of education level, we only tested adult speakers either in possession of (at least) a bachelor's degree or enrolled in a university course. Table 2 summarizes number, age groups and distribution of tested subjects.

Age group      Age range   Age distribution         Gender                       Tot
Adolescents    12-14       mean: 12.9, sd: 0.63     24 m (61.5%), 15 f (38.4%)   39
Young Adults   18-39       mean: 27.3, sd: 2.94     15 m (37.5%), 25 f (62.5%)   40
Adults         Over 40     mean: 56.7, sd: 9.48     18 m (43.9%), 23 f (56.1%)   41

Table 2: Data about tested subjects.

A within-subject design was used, in which each subject sees all stimuli. Participants were asked to judge the acceptability of the (randomized) stimuli on a Likert scale from 1 (“completely unnatural”) to 7 (“perfectly natural”). The mode of presentation varied across age groups: adolescents were given the test directly in their class. Young adults’ judgments were collected through the online platform Figure Eight. Older adults, instead, were presented with a simple Microsoft Word document, in order to include participants who were not familiar with online data gathering.

RESULTS: We assessed statistical significance via linear mixed effect modelling, with by-subject and by-item intercepts.[1] Results show that coercion sentences (purple boxplot in Figure 1) are recognized as an intermediate condition between complete grammaticality and total ungrammaticality.[2] We consider this result to support the claim that coercion effects include a degree of semantic incompatibility that is nonetheless resolved in the interpretation process. Consistently with the main tenets of Construction Grammar, we argue that the resolution of such incompatibility is driven by a dynamic interaction between the main verb and the constructional context (Kemmer, 2008; Kemmer and Yoon, 2013; Yoon, 2016).

[1] Model selection was performed automatically via LRT with the R package afex. Models were fitted with the R package lmerTest, and R2 values were calculated with the MuMIn package (Singmann et al., 2016; Kuznetsova et al., 2017; Bartoń, 2013).
[2] p < 0.0001, R2c 0.61

Figure 1: Distribution of judgments in the 3 conditions.

In a second analysis, we wanted to assess the effect of Cxn types on acceptability ratings. We used linear mixed effect modelling, adding an interaction between Cxn type and experimental condition.[3] Results indicate high variability in Cxn ‘coercibility’ (see Figure 2 and Table 3). That is, some Cxns in our dataset were consistently judged as more natural by speakers in the coercion condition.

[3] p < 0.0001, R2c 0.76

Figure 2: Line plot of judgments.

         Estimate    Std. Error   t value   p value
coer      3.64***    0.10          37.45    <0.0001
gramm     2.66***    0.02         110.87    <0.0001
imp      -1.79***    0.02         -74.84    <0.0001
CM       -0.14       0.16          -0.91    0.36
CMvia    -0.24       0.16          -1.53    0.13
CO       -0.26.      0.13          -1.95    0.05
DT       -1.34***    0.17          -7.98    <0.0001
IM        1.02***    0.16           6.40    <0.0001
PASS     -0.73**     0.26          -2.75    0.009
PRED     -0.07       0.26          -0.27    0.79
VDE       1.06***    0.16           6.67    <0.0001
VDI       0.70***    0.15           4.57    <0.0001

Table 3: Fixed effects estimates of the coercion condition.

In particular, it appears that the IM, VDE and VDI Cxns are judged more natural, while DT, PASS and (marginally) CO are perceived as the least natural ones in coercion sentences. Since coercion effects are said to be resolved by the general Cxn semantics overriding the lexical meaning of the verb, we hypothesize that the different flexibility degrees of the Cxns in the first experiment could be at least partially explained by distributional properties, such as type and token frequency, and semantic density of the Cxns in our dataset, the latter again estimated with distributional semantics.

Different degrees of flexibility could derive either from cognitive processes that reflect on language use, or emerge from repeated exposure and thus entrench in speakers’ grammar. Both possible directions of this causal circle, however, ultimately allow us to fruitfully investigate construction flexibility using distributional semantics models. In other words, the higher ‘coercibility’ of novel instances of some Cxns could be due to speakers’ sensitivity to distributional semantic features of the constructions (Barddal, 2006; Bybee, 2013; Zeschel, 2012; Perek and Goldberg, 2017).

3 A Distributional Semantic Model for argument constructions

PROCEDURE: Perek (2016) has shown that distributional semantics (Lenci, 2018) can be fruitfully used to model the semantic space covered by a Cxn. It has been argued in the literature that constructional meanings for argument Cxns arise from the meaning of high frequency verbs that co-occur with them (Goldberg, 1999; Casenhiser and Goldberg, 2005; Barak and Goldberg, 2017). Therefore, we modelled the semantic content of Cxns with the semantics of their most typical verbs, each represented as a distributional vector.

We used the UDLex Pipeline[4] (Rambelli et al., 2017) to obtain a mapping between the Cxns of our dataset and the most frequent verbs that occur in them (these were selected considering verbs that appear at least 5 times in the relevant subcategorization frames).

[4] The UDLex Italian dataset consists of 409,127 tokens.
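The verb selection step described above (keeping verbs attested at least 5 times in a frame) and the type and token frequencies named as candidate predictors can be sketched as follows. This is only a minimal illustration and not the UDLex pipeline itself: the function names, the input format (a list of (verb, Cxn) pairs), the toy Italian verbs, and the choice to count token frequency over the retained verbs only are all assumptions made for the example.

```python
from collections import Counter

MIN_FREQ = 5  # threshold stated in the text: at least 5 occurrences in the frame


def select_verbs(observations, min_freq=MIN_FREQ):
    """Map each Cxn to the verbs seen at least `min_freq` times in it.

    `observations` is an iterable of (verb, cxn) pairs (hypothetical format).
    Returns {cxn: {verb: count}}.
    """
    counts = Counter(observations)
    selected = {}
    for (verb, cxn), n in counts.items():
        if n >= min_freq:
            selected.setdefault(cxn, {})[verb] = n
    return selected


def frequency_profile(selected):
    """Type frequency = number of different verbs; token frequency = number of items.

    Assumption: tokens are counted only over the verbs that passed the threshold.
    """
    return {
        cxn: {"type_freq": len(verbs), "token_freq": sum(verbs.values())}
        for cxn, verbs in selected.items()
    }


# Toy observations, invented for illustration only.
toy = (
    [("dare", "DT")] * 7
    + [("mandare", "DT")] * 5
    + [("fischiettare", "DT")] * 2  # below the threshold, so it is dropped
    + [("andare", "IM")] * 6
)
selected = select_verbs(toy)
profile = frequency_profile(selected)
print(profile)
```

On the toy data, the rare verb is discarded for DT, so that Cxn ends up with two verb types and twelve tokens.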
Table 4 summarizes the number of verbs considered for each of the eight Cxns.[5] Then, we built a Distributional Semantic Model (DSM) from the Italian corpus itWac (Baroni et al., 2009) in order to represent the meaning of the verbs obtained with UDLex. The 300-dimensional vectors (i.e., the embeddings) were created with the SGNS algorithm (Mikolov et al., 2013), using the most frequent 30,000 words as context, with a minimum frequency of 100.

[5] The Cxn CMvia was excluded due to the absence of corresponding subcategorization frames.

Cxn     type freq (different verbs)   token freq (number of items)
CM      103                           1538
CO        5                             43
DT       90                           1659
IM       51                           1097
PASS      8                             49
PRED     19                            359
VDE      12                            116
VDI      15                            199

Table 4: Number of selected verbs per Cxn.

Following Lebani and Lenci (2017), we represented each Cxn as the weighted centroid vector of its typical verbs, as follows:

\overrightarrow{CXN} = \frac{1}{|V|} \sum_{v \in V} f_{rel}(v, Cxn) \cdot \vec{v}    (1)

where V is the set of the top-associated verbs v with Cxn and f_{rel}(v, Cxn) is the co-occurrence frequency of a verb in a Cxn.

We measured the pairwise cosine similarity among the weighted Cxn vectors: as shown in Figure 3, some Cxns in our dataset show similar distributional behaviour.

Figure 3: Construction semantic similarity.

Following Perek (2016), the semantic density of a Cxn is computed as the mean value of pairwise cosines between the verbs occurring in the Cxn. Figure 4 plots the semantic densities of our Cxns.

Figure 4: Construction semantic density.

Finally, to assess the effect of distributional properties on Cxn flexibility, we used semantic density, type frequency and token frequency (cf. Table 4) as predictors in linear mixed effect modelling. As dependent variables, we used the differences gramm − coer and coer − imp. We performed two separate analyses for type and token frequencies without interactions to avoid multicollinearity effects. Predictor values were centered.

RESULTS: The estimates are reported in Tables 5 and 6 below. In the first two models, frequency does not yield any effect. In the second two models, instead, frequency appears to have an effect on the data. Hence, it appears that type and token frequency help distinguish impossible from coercion instances of a Cxn, whereas only semantic density affects the higher naturalness of coercion phenomena. The more a Cxn is observed with semantically similar verbs (i.e., verbs that belong to the same classes or subclasses, which therefore increase the Cxn semantic density), the more easily the constructional meaning is coerced into novel instances.

(Gramm − coer) ~ sem. dens. + type freq.
               estimate   st. error   t value   p value
(Intercept)     2.71***   0.11        25.02     <0.0001
Sem. density   -0.34.     0.16        -2.217    0.007
Type freq.     -0.13      0.16        -0.848    0.44

(Gramm − coer) ~ sem. dens. + tok. freq.
               estimate   st. error   t value   p value
(Intercept)     2.71***   0.11        25.02     <0.0001
Sem. density   -0.35.     0.16        -2.23     <0.1
Token freq.    -0.13      0.16        -0.89     0.42

Table 5: Fixed effects table for the first two models.

(Coer − imp) ~ sem. dens. + type freq.
               estimate   st. error   t value   p value
(Intercept)     1.69***   0.15        10.87     <0.0001
Sem. density    0.86*     0.22         3.38     <0.01
Type freq.      0.47.     0.22         2.1      <0.1

(Coer − imp) ~ sem. dens. + tok. freq.
               estimate   st. error   t value   p value
(Intercept)     1.69***   0.14        33.33     <0.0001
Sem. density    0.91*     0.2          4.59     <0.001
Token freq.     0.54*     0.2          2.71     <0.01

Table 6: Fixed effects table for the second two models.

4 Discussion

These findings support our claim that coercion effects are resolved by a dynamic interrelation between verb and Cxn (Kemmer, 2008; Kemmer and Yoon, 2013). Even though frequency effects are shown to affect Cxn extensibility to new items (Bybee, 2006), our results suggest that type and token frequency only facilitate the distinction between semantically incompatible and partially compatible formulations, whereas higher coercibility is only affected by semantic density.

We interpret this finding in light of the upward strengthening hypothesis (Hilpert, 2015), according to which a novel occurrence of a linguistic unit strengthens a superior node (i.e., the abstract Cxn) only if the former is categorized ‘as an instance of a more abstract Cxn. If this categorization is not performed, or only superficially so, no upward strengthening will take place’ (Hilpert, 2015, p.38). Higher coercibility is hence not affected by the frequency of the Cxn because of the ‘intermediate’ grammaticality level of coercion, which does not allow unambiguous categorization. Coercion sentences are judged more natural if the target Cxn is observed with verbs belonging to similar semantic classes or subclasses, which increases Cxn semantic density. We could therefore assume that coercion effects in Italian elicit a partial categorization. The effect of semantic density, however, only explains part of the data. In fact, visual inspection of the relation between semantic density and the estimates of Table 3 reveals that this effect does not explain the high coercibility of IM, or the low values of CO Cxns (see Figure 5).

Figure 5: Relation between semantic density and estimates.

All things considered, the semantic properties of Cxns (modelled with distributional vectors), such as their density, are only one of the factors influencing speakers’ processing and recognition of coercion effects. In fact, it has been argued that Romance languages are more valency driven than English (and Germanic languages in general) (Perek and Hilpert, 2014). The results of both experiments provide substantial evidence for an integrated account of Italian coercion effects, which should consider not only the properties of the general abstract Cxn, but also the interaction of the mismatching verb with Cxn meaning.

These results also have interesting implications for understanding the cognitive mechanisms underlying Cxn flexibility and productivity. In fact, these findings support the idea that Cxn meaning is abstracted from the semantics of prototypically occurring verbs. As we saw, several studies have argued in favour of this hypothesis for English, but the fact that we were able to adapt it to Italian suggests that the factors driving the acquisition of Cxns are, at least partially, not language-specific but rather general cognitive processes.

Acknowledgments

The authors thank Lucia Passaro and Florent Perek for their help and valuable suggestions.

References

Libby Barak and Adele E. Goldberg. 2017. Modeling the Partial Productivity of Constructions.

Jóhanna Barðdal. 2008. Productivity: Evidence from Case and Argument Structure in Icelandic. 12.

Jóhanna Barddal. 2006. Predicting the Productivity of Argument Structure Constructions. Annual Meeting of the Berkeley Linguistics Society, 32(1):467, October.

Marco Baroni, Silvia Bernardini, Adriano Ferraresi, and Eros Zanchetta. 2009. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language resources and evaluation, 43(3):209–226.

Kamil Bartoń. 2013. MuMIn: multi-model inference, R package version 1.9.13. CRAN http://CRAN.R-project.org/package=MuMIn.

Hans Christian Boas and Francisco Gonzálvez-García, editors. 2014. Romance perspectives on construction grammar. Number volume 15 in Constructional approaches to language. John Benjamins Publishing Company, Amsterdam; Philadelphia.

Hans C. Boas. 2011. Coercion and leaking argument structures in Construction Grammar. Linguistics, 49(6), January.

J. Bybee. 2006. Frequency of Use and the Organization of Language. Oxford University Press.

Joan L Bybee. 2013. Usage-based theory and exemplar representations of constructions. In The Oxford handbook of construction grammar.

Devin Casenhiser and Adele E Goldberg. 2005. Fast mapping between a phrasal form and meaning. Developmental Science, 8(6):500–508.

Michela Cennamo and Claudia Fabrizio. 2013. Italian Valency Patterns. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Penelope Eckert. 2017. Age as a Sociolinguistic Variable. In The Handbook of Sociolinguistics, pages 151–167. Wiley-Blackwell.

Adele E. Goldberg. 1999. The emergence of the semantics of argument structure constructions. In The emergence of language, pages 215–230. Psychology Press.

Adele E. Goldberg. 2006. Constructions at work: the nature of generalization in language. Oxford linguistics. Oxford University Press, Oxford; New York.

Martin Hilpert. 2015. From hand-carved to computer-based: Noun-participle compounding and the upward strengthening hypothesis. Cognitive Linguistics, 26(1), January.

Suzanne Kemmer and Soyeon Yoon. 2013. Rethinking coercion as a cognitive phenomenon: Data from processing, frequency, and acceptability judgments.

Suzanne Kemmer. 2008. September. new dimensions of dimensions: Frequency, productivity, domains and coercion. In meeting of Cognitive Linguistics Between Universality and Variation, Dubrovnik, Croatia.

Alexandra Kuznetsova, Per B. Brockhoff, and Rune H. B. Christensen. 2017. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13):1–26.

W. Labov. 2001. Principles of Linguistic Change, Social Factors. Principles of Linguistic Change. Wiley.

Peter Lauwers and Dominique Willems. 2011. Coercion: Definition and challenges, current approaches, and new trends. Linguistics, 49(6), January.

Gianluca E. Lebani and Alessandro Lenci. 2017. Modelling the Meaning of Argument Constructions with Distributional Semantics. In Proceedings of the AAAI 2017 Spring Symposium on Computational Construction Grammar and Natural Language Understanding, pages 197–204.

Alessandro Lenci, Gabriella Lapesa, and Giulia Bonansinga. 2012. Lexit: A computational resource on italian argument structure. In LREC.

Alessandro Lenci. 2018. Distributional Models of Word Meaning. Annual Review of Linguistics, 4:151–171.

Laura A. Michaelis. 2004. Type shifting in construction grammar: An integrated approach to aspectual coercion. Cognitive Linguistics, 15(1):1–67, January.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, pages 3111–3119.

Florent Perek and Adele E. Goldberg. 2017. Linguistic generalization on the basis of function and constraints on the basis of statistical preemption. Cognition, 168:276–293.

Florent Perek and Martin Hilpert. 2014. Constructional tolerance: Cross-linguistic differences in the acceptability of non-conventional uses of constructions. Constructions and Frames, 6(2):266–304.

Florent Perek. 2016. Using distributional semantics to study syntactic productivity in diachrony: A case study. Linguistics, 54(1):149–188.

Giulia Rambelli, Alessandro Lenci, and Thierry Poibeau. 2017. UDLex: Towards Cross-language Subcategorization Lexicons. In Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy, pages 207–217. Linköping University Electronic Press.

Henrik Singmann, Ben Bolker, Jake Westfall, and Frederik Aust. 2016. afex: Analysis of Factorial Experiments. R package version 0.16-1.

Suzanne Evans Wagner. 2012. Age Grading in Sociolinguistic Theory. Language and Linguistics Compass, 6(6):371–382, June.

Soyeon Yoon. 2016. Gradable nature of semantic compatibility and coercion: A usage-based approach. Linguistic Research, 33(1):95–134, March.

Arne Zeschel. 2012. Incipient productivity: a construction-based approach to linguistic creativity. Number 49 in Cognitive linguistics research. De Gruyter Mouton, Berlin; Boston.
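As a supplementary illustration of Equation (1) and of the semantic density measure used in Section 3, the two quantities can be sketched in a few lines. This is a hedged sketch rather than the authors' implementation: the function names, the two-dimensional toy vectors and the toy weights are invented for the example (the paper uses 300-dimensional SGNS embeddings), and the weights passed as `freqs` simply stand in for f_rel(v, Cxn), whose exact normalization is not specified in the text.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_centroid(vectors, freqs):
    """Equation (1): (1 / |V|) * sum over v in V of f_rel(v, Cxn) * vec(v).

    `vectors` maps each verb to its embedding; `freqs` maps each verb to its
    co-occurrence weight with the Cxn (assumed to be given).
    """
    verbs = list(vectors)
    dim = len(vectors[verbs[0]])
    centroid = [0.0] * dim
    for verb in verbs:
        for i, x in enumerate(vectors[verb]):
            centroid[i] += freqs[verb] * x
    return [c / len(verbs) for c in centroid]


def semantic_density(vectors):
    """Mean pairwise cosine between the verbs of a Cxn (needs at least 2 verbs)."""
    pairs = list(combinations(vectors.values(), 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


# Toy 2-dimensional embeddings and weights, invented for illustration only.
vectors = {"dare": [1.0, 0.0], "mandare": [0.0, 1.0], "offrire": [1.0, 1.0]}
freqs = {"dare": 3, "mandare": 1, "offrire": 2}

centroid = weighted_centroid(vectors, freqs)
density = semantic_density(vectors)
print(centroid, density)
```

A Cxn whose verbs point in similar directions yields a density close to 1, while semantically scattered verbs pull the mean pairwise cosine down; the same `cosine` function can also be applied to two centroids to compare Cxns with each other.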