=Paper=
{{Paper
|id=Vol-2481/paper72
|storemode=property
|title=Reflexives, Impersonals and Their Kin: a Classification Problem
|pdfUrl=https://ceur-ws.org/Vol-2481/paper72.pdf
|volume=Vol-2481
|authors=Kledia Topciu,Cristiano Chesi
|dblpUrl=https://dblp.org/rec/conf/clic-it/TopciuC19
}}
==Reflexives, Impersonals and Their Kin: a Classification Problem==
Kledia Topciu, Università degli Studi di Siena, Via Roma 56, I-53100 Siena (Italy), kledia.topciu@student.unisi.it

Cristiano Chesi, NETS - IUSS, P.zza Vittoria 15, I-27100 Pavia (Italy), cristiano.chesi@iusspavia.it

===Abstract===
Despite the fact that true reflexives always require a local antecedent, attempting an automatic referential resolution is often far from trivial: in many languages, reflexives are morphologically indistinguishable from impersonals, and both particles are sensitive to the syntactic structure in a non-trivial sense. Focusing on Italian, we annotated part of the Repubblica Corpus to attempt an automatic classification of the reflexive and impersonal si constructions. In this preliminary study we show that the accuracy of automatic classification methods that do not use any relevant structural information is rather modest. A thoughtful discussion of the structural analysis required to distinguish among different contexts is provided, in the end suggesting that these structural configurations are not easily recoverable using a purely distributional approach.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

===1. Introduction===
The non-triviality of reflexive/impersonal constructions in Italian is exemplified in (1):

(1) a. Ada<sub>i</sub> si<sub>i</sub> presentò.
A.<sub>i</sub> SI<sub>i</sub> introduced<sub>3-SG-PAST</sub>
‘A. introduced herself.’

b. Si<sub>i/*j</sub> presentò Ada<sub>i</sub>.
SI<sub>i/*j</sub> introduced<sub>3-SG-PAST</sub> A.<sub>i</sub>
‘A. introduced herself.’

c. Si<sub>*i/j</sub> presentò ad Ada<sub>i</sub>.
SI<sub>*i/j</sub> introduced<sub>3-SG-PAST</sub> to A.<sub>i</sub>
‘S/He introduced him/herself to A.’

d. pro<sub>i</sub> si<sub>i/*j</sub> tolse la giacca<sub>i</sub>.
pro<sub>i</sub> SI<sub>i/*j</sub> took<sub>3-SG-PAST</sub> off the jacket
‘S/He took off the jacket.’

e. Il compagno<sub>j</sub> di Ada<sub>i</sub> si<sub>*i/j</sub> presentò.
the friend<sub>j</sub> of A.<sub>i</sub> SI<sub>*i/j</sub> introduced<sub>3-SG-PAST</sub>
‘A.’s friend introduced her/him-self.’

f. Riconosciuto il compagno<sub>j</sub> di Ada<sub>i</sub>, pro<sub>k</sub> si<sub>*i/*j/k</sub> presentò.
recognized<sub>3-SG-P.PART</sub> the friend<sub>j</sub> of A.<sub>i</sub>, pro<sub>k</sub> SI<sub>*i/*j/k</sub> introduced<sub>3-SG-PAST</sub>
‘Once s/he recognized A.’s friend, s/he introduced her/him-self.’

g. Si<sub>generic</sub> pensa sempre a salvarsi la pelle.
SI<sub>generic</sub> thinks always to save<sub>INF</sub>-REFL the skin
‘We always think about saving our own skin.’

Expecting the co-referential DP to always be “immediately to the left” of the reflexive form quickly leads to wrong predictions: while this generalization might seem sufficient for (1a), it is plainly wrong for (1b), where we need to assume an empty referent (pro, Rizzi 1986) before the reflexive (see §1.1). Moreover, we should accept that the coreferential DP can sometimes be placed to the right of the predicate (structurally speaking, the pro and post-verbal subject options are related, Belletti 2002); in this case, the (focalized/dislocated) post-verbal subject is a good candidate, (1b). Being “the closest DP” is, however, not a sufficient condition, as the examples (1c-d) show. Hence, the null subject hypothesis, as well as a structural analysis unravelling the role of each DP surrounding the predicate, is required for the identification of the correct local binding domain (1e-f). Last but not least, a proper classification of the predicate admitting a reflexive or an impersonal pronoun is needed (1g). Under this perspective, we decided to run a small experiment to verify the consistency of a “usage-based” approach (Tomasello 2003) in this specific context, and to consider whether the “structural analysis” (Chomsky 1995; 2008) can be proved to be an outdated approach for the classification of the distinct kinds of si.

In the remaining part of this introduction we present the (possibly outdated) structural analyses proposed for reflexive (§1.1) and impersonal (§1.2) clitic si. We then present our experiment, consisting of the annotation of a small fragment of the Repubblica Corpus (Baroni et al. 2004) that we used to train and test a set of Machine Learning classification algorithms (§2). The presentation of the results (§3) and their discussion (§4) follow.

====1.1 The reflexivization configuration====
A popular structural analysis of reflexives is the unaccusative one: under this perspective, the subject of reflexives is an underlying object (just like the subject of unaccusatives) which has to raise to the subject position for Case reasons (reflexive morphology absorbs its Case). Two main variants of this approach are discussed in the literature: a lexical and a syntactic one. The lexical version predicts that the external argument is absorbed in the lexicon (Marantz 1984 and Grimshaw 1990), while the syntactic one proposes that the external argument is present in syntax via the reflexive clitic se (Kayne 1988, Pesetsky 1995, Sportiche 1998).

A different analysis is proposed by Reinhart & Siloni (1999, 2005): reflexives should be unergative entries, since unaccusativity tests (e.g. ne cliticization, (2b)) fail with reflexive constructions:

(2) a. Ne sono arrivati tre.
of+them<sub>cl</sub> are arrived three
‘Three of them arrived.’

b. *Se ne sono vestiti tre.
SI of+them<sub>cl</sub> are dressed three
‘Three of them got dressed.’

Since only the internal argument can be cliticized and the reflexive verb fails the ne test, we conclude that the subject of reflexives is an external argument, unlike the subject of unaccusatives.

Another test that helps to tease apart external from internal argument structures is modification by reduced relatives: when the modification is implemented via a past participle, it is not allowed for predicates with an external argument. The reduced relative in (3a) contains a reflexive predicate, while the one in (3b) is an impossible cliticization of a transitive reflexive past participle.

(3) a. Il bicchiere rottosi ieri apparteneva a mio nonno.
the glass broken-him/herself yesterday belonged to my grandfather

b. *L'uomo lavatosi ieri è mio nonno.
the man washed-him/herself yesterday is my grandfather

Robust evidence supports the idea that the subject of reflexive verbs patterns with the subject of unergatives, hence confirming its external argument nature (but see Pescarini 2015:42ff). Kayne (1975) observes that reflexives occur in environments where transitive verbs are disallowed, e.g. in French causative constructions: when the verb embedded under the causative verb faire ‘make’ is transitive (4a), its subject must be introduced by the preposition à ‘to’; when the lower verb is intransitive or reflexive, its subject cannot be introduced by à (4b/c).

(4) a. Je ferai laver Jean *(à) Luc.
I make<sub>FUT</sub> wash Jean to Luc
‘I will make Luc wash Jean.’

b. Je ferai courir (*à) Jean.
I make<sub>FUT</sub> run Jean
‘I will make Jean run.’

c. Je ferai se laver (*à) Jean.
I make<sub>FUT</sub> SE wash Jean
‘I will make Jean wash himself.’

When the lower verb is reflexive, its subject appears without the preposition, exactly like the subject of unergative verbs. Therefore, reflexive verbs are not transitive entries either.

Reinhart & Siloni (2005) suggest that these reflexive constructions are unergative entries derived from their transitive alternate by a reduction operation targeting the internal argument (identified with the external one). They take verbal reflexivization even further and propose a lexicon-syntax parameter: arity operations (on θ-roles) can apply either in the syntax or in the lexicon. Reflexivization is essentially the same phenomenon cross-linguistically: two available θ-roles are assigned to the same syntactic argument or, better said, the operation of reflexivization takes two θ-roles and forms one complex θ-role.

The distinctions follow from two different modes of operation: a lexical mode and a syntactic one. Languages such as Hebrew, English, Russian and Dutch have the parameter set to “lexicon”, while in Romance languages, Greek and German the parameter is set to “syntax”. In the syntactic option (which is relevant here), what is to become a reflexive verb leaves the lexicon with the same number of θ-roles as the basic verbal entry, and these roles need to be assigned. Since the clitic itself cannot be viewed as an argument (the lack of Case blocks its merge), the “extra” θ-role has to be explained by an arity reduction operation.

In conclusion, an automatic classification algorithm attempting to identify the typology of the reflexive pronoun si should necessarily have access to the verbal subcategorization frame and postulate an arity reduction, as suggested by Reinhart & Siloni (2005). If this information is not available as a lexical resource, we might try to rely on structural cues to infer the correct argument structure (as in Merlo & Stevenson 2001, Basili et al. 1997 or Ienco et al. 2008). On the other hand, if statistical cues were available, annotating them overtly would be unnecessary.
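The arity-reduction idea summarized above can be made concrete with a toy data structure. The sketch below is a hypothetical illustration, not the authors' implementation and not Reinhart & Siloni's own formalism: a verb entry is assumed to be a lemma plus an ordered tuple of θ-roles (the names `VerbEntry`, `reflexivize` and the role labels are our assumptions), and reflexivization bundles the external and internal role into one complex role, leaving a one-place entry. It also shows why the operation presupposes access to the verb's argument structure: it is undefined for a one-place (e.g. unaccusative) entry.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class VerbEntry:
    """Toy lexical entry: a lemma and its theta-roles, external role first."""
    lemma: str
    theta_roles: Tuple[str, ...]


def reflexivize(entry: VerbEntry) -> VerbEntry:
    """Bundle the two theta-roles of a two-place entry into one complex role.

    Toy version of arity reduction: the output is a one-place entry, so the
    operation only makes sense once the verb's argument structure is known.
    """
    if len(entry.theta_roles) != 2:
        raise ValueError(f"{entry.lemma!r} does not have exactly two theta-roles")
    external, internal = entry.theta_roles
    return VerbEntry(entry.lemma, (f"{external}-{internal}",))


if __name__ == "__main__":
    lavare = VerbEntry("lavare", ("Agent", "Theme"))  # transitive 'to wash'
    print(reflexivize(lavare))  # VerbEntry(lemma='lavare', theta_roles=('Agent-Theme',))
    arrivare = VerbEntry("arrivare", ("Theme",))      # unaccusative 'to arrive'
    try:
        reflexivize(arrivare)
    except ValueError as err:
        print(err)
```

Restricting the toy to exactly two roles deliberately leaves ditransitives and inherent si predicates out of scope.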
A further complication, however, is associated with the existence of a class of “reflexive” predicates (e.g. alzarsi, ‘to stand up’) which are bona fide unaccusatives (inherent/lexical si constructions, Pescarini 2015). In this case, the overlap between the bare verbal root and a transitive form of some inherent si predicates does not help in the automatic classification task: e.g. in “si lava la mano” (‘he/she washes his/her hand’), due to the transitive nature of lavare ‘to wash’, the post-verbal DP “la mano” could be analyzed either as a direct object or as a post-verbal subject.

====1.2 Impersonal si constructions====
The reflexive reading is not the only available option when the si pronoun is present: an impersonal reading is also possible. Impersonal si constructions are used to introduce a generic, unspecified subject and to make general statements about groups of people (Cinque 1988, Dobrovie-Sorin 1998, 1999 a.o.). In Italian, si constructions are exemplified in (5a). The subject is unspecified and the sentence has a generic reading because of si; without it, the sentence would have a specific subject (5b), Italian being a pro-drop language (Rizzi 1986).

(5) a. In Italia si mangia troppo.
In Italy SI eats<sub>3rd-SG</sub> too much
‘In Italy, people eat too much.’

b. In Italia pro mangia troppo.
In Italy pro eats<sub>3rd-SG</sub> too much
‘In Italy, he/she eats too much.’

Notice that the adverbial modal modification “troppo” is coherent with the generic reading, while a punctual temporal adverbial modification would be inconsistent (“#In Italia si mangia domani” vs. “In Italia si mangia sempre”).

As for the argumental status of si, there is considerable disagreement in the linguistic community. Cinque (1988) proposes the existence of two different si items: the presence of si is usually restricted to finite clauses; however, it is also permitted in certain untensed clauses, namely in Aux-to-Comp (6) and Raising structures (7) with transitive and unergative verbs.

(6) Non essendosi ancora scoperto il colpevole…
not being<sub>GERUND</sub>-SI yet discovered<sub>P-PART-SG-MASC</sub> the culprit<sub>SG-MASC</sub>
‘Not having yet discovered the culprit...’

(7) Sembra non essersi ancora scoperto il colpevole…
seems<sub>3RD-SG</sub> not being-SI yet discovered<sub>P-PART-SG-MASC</sub> the culprit<sub>SG-MASC</sub>
‘It seems that the culprit hasn’t been discovered yet.’

Cinque considers these instances of si as argumental ones (+arg), which can in general be present only with verbs that project an external θ-role. The other si is a non-argumental one (-arg), which can be present with any verb class (therefore, also with verbs that do not assign an external θ-role).

Dobrovie-Sorin (1998, 1999) argues that it is not necessary to postulate this: according to her, what Cinque calls a +arg si is actually a middle passive Accusative si. The only Nominative si is Cinque’s -arg si. She argues that si is not licensed in non-finite clauses because it is a Nominative clitic and, in Italian, Nominative clitics are not allowed in non-finite clauses. Only transitive and unergative Aux-to-Comp and Raising structures allow si as Accusative. Dobrovie-Sorin tries to unify all the uses of SE in Romance languages and assumes that si is not a special lexical item that absorbs a θ-role or Case. Her analysis accounts for special cases, such as Romanian, which has si constructions but no Nominative clitics. Italian si constructions, on the other hand, rely either on Nominative (8) or on Accusative si (which also includes reflexive configurations) (9).

(8) Non si<sub>i</sub> e<sub>i</sub> è mai contenti.
not SI is<sub>3RD-SG</sub> ever satisfied
‘One is never satisfied.’

(9) Il greco<sub>i</sub> si<sub>i</sub> traduce e<sub>i</sub> facilmente.
the Greek SI translates<sub>3RD-SG</sub> easily
‘Greek translates easily.’

In (8), si is an anaphor and, if we assume a restricted theory of binding, the anaphoric status of the clitic is transferred to its trace. The indexing configuration corresponds to a single argument, the Theme. On the other hand, the si in (9) is not an anaphor and therefore imposes no relation between the subject and object positions; it binds an empty category in the subject A-position.
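Cinque's two-si distinction can be read as a licensing check, which makes explicit why verb-class information matters for any classifier. The function below is a hypothetical sketch of our reading of the summary above, not a formalization proposed in the paper: `licensed_si` and its boolean encoding are assumptions, `finite=False` stands only for the untensed Aux-to-Comp and Raising contexts of (6) and (7) (other untensed clauses are not modelled), and Dobrovie-Sorin's reanalysis is ignored.

```python
from typing import FrozenSet


def licensed_si(external_theta: bool, finite: bool) -> FrozenSet[str]:
    """si types licensed under Cinque's (1988) split, as summarized in 1.2.

    +arg si: only with verbs projecting an external theta-role; it is the type
             found in the untensed Aux-to-Comp and Raising contexts.
    -arg si: with any verb class, but only in finite clauses.
    """
    licensed = set()
    if external_theta:
        licensed.add("+arg")
    if finite:
        licensed.add("-arg")
    return frozenset(licensed)


if __name__ == "__main__":
    # scoprire 'to discover' (transitive) vs. arrivare 'to arrive' (unaccusative)
    for verb, external in [("scoprire", True), ("arrivare", False)]:
        for finite in (True, False):
            clause = "finite" if finite else "untensed"
            print(verb, clause, sorted(licensed_si(external, finite)))
```

Under this reading, an unaccusative verb in an untensed Raising context licenses no si at all, while in finite clauses the two types overlap and cannot be told apart by clause type alone.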
A reformulation of Dobrovie-Sorin’s proposal is offered by Salvi (2018), who argues that in modern Italian there are two reflexive si constructions: a passive one and an impersonal one (the reader should refer to Pescarini 2015 for a more detailed discussion of a richer classification). The first one, exemplified in (10b), is characterized by the cancellation of the subject (10a) and the transformation of the direct object into the grammatical subject (triggering agreement); the derived grammatical subject can also occur in the canonical preverbal position (10c):

(10) a. Il preside ha consegnato i diplomi.
the dean has awarded the diplomas

b. Si sono consegnati i diplomi.
SI<sub>generic</sub> are awarded the diplomas
‘Diplomas got awarded.’

c. I diplomi si consegnano (agli studenti).
the diplomas SI<sub>generic</sub> awarded (to the students)
‘Diplomas are getting awarded (to the students).’

This construction is only possible with (di)transitive predicates, since the promotion of the object to the grammatical subject role requires a direct object.

On the other hand, the impersonal version of si does not induce the promotion of the internal argument to the grammatical subject role, and in fact this construction is available without any verbal class restriction:

(11) a. Si guarda la partita.
SI<sub>generic</sub> watches the game
‘We watch the game.’

b. Si dorme.
SI<sub>generic</sub> sleeps
‘We sleep.’

c. Si cade.
SI<sub>generic</sub> falls
‘We fall.’

In sum, among the non-reflexive si constructions, the verbal subcategorization frame (i.e. the verbal argument structure) could help in isolating the passive si construction, but not the impersonal one. As for reflexive si, the full argument structure must be identified, and then either the passive strategy (deletion and promotion) or the impersonal one (simple deletion) must be considered. As a consequence of the null subject option in Italian, the difference between impersonal and passive si is often blurred.

===2. Materials and methods===
From the Repubblica Corpus (Baroni et al. 2004), we extracted all contexts in which the lemma “si” was present: the simple query returned 2,737,558 contexts, each including a left and a right context of at most 8 words around the si + predicate cluster. Each left and right context was cut at full stops, colons, semicolons, exclamation marks and question marks whenever those were found within the 8-token window. The tagset used in the Repubblica Corpus distinguishes neither among reflexive and the various types of impersonal forms (“CLI/si” is the generic tag used) nor among different verbal classes with respect to their argument structure (only VB for “be”, VH for “have”, and VV for other verbs are included).

We then decided to manually annotate the first 2,000 contexts returned by our query (0.07% of the total) using the following scheme, much simplified with respect to the structural asymmetries revealed by the discussion in §1: I (impersonal); L (local: the DP immediately preceding “si” is the correct co-referent); PV (post-verbal: the first DP after the predicate following “si” is the correct co-referent); LM (the DP immediately preceding, in the hierarchical sense, the reflexive “si” is the correct one, but such DP is “modified” by a PP or a relative clause); and A (the referent is not present/retrievable in the extracted context; these are in the great majority pro-drop cases, and in just two cases the referent was lexically realized outside the isolated context).

Both authors annotated the corpus independently and discussed the disagreement cases (less than 1% of the sample) in order to reach an agreement on the annotation. Table 1 indicates the distribution of the classes across the annotated corpus fragment, while Table 2 exemplifies the classification. Due to the simplicity of this classification (which essentially focuses on the identification of the reflexive antecedent, if present/necessary), we would expect a better performance compared to any richer classification, which is apparently necessary according to the structural analysis previously discussed.

{| class="wikitable"
|+ Table 1. Distribution of the annotated categories across the sample.
! annotation !! # of contexts !! %
|-
| I || 332 || 16.6
|-
| L || 994 || 49.7
|-
| LM || 417 || 20.8
|-
| PV || 183 || 9.15
|-
| A || 74 || 3.7
|}

{| class="wikitable"
|+ Table 2. Sample annotation using 5 categories.
! annotation !! example !! English gloss
|-
| I || si è deciso di ridurre il deficit || we decided to reduce the deficit
|-
| L || [i fedeli]<sub>i</sub> si<sub>i</sub> sono tuttavia sciolti || the faithful, nevertheless, split up
|-
| LM || [il vertice di Dublino]<sub>i</sub> si<sub>i</sub> è dimostrato || the Dublin summit proved to be …
|-
| PV || nel cortile si<sub>i</sub> stendono [le stuoie]<sub>i</sub> || in the courtyard the mats unfolded
|-
| A || per 16 anni si<sub>i</sub> è occupato dei processi || for 16 years [he] took care of the trials
|}
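The context extraction described in §2 can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the query actually run on the corpus: the text is assumed to be already tokenized, the si + predicate cluster is approximated as "si" plus the next token, and the function name `extract_context` is hypothetical. Each side keeps at most 8 tokens and is cut at the first full stop, colon, semicolon, exclamation mark or question mark found within that window.

```python
from typing import List, Tuple

WINDOW = 8                              # maximum tokens kept on each side
BOUNDARIES = {".", ":", ";", "!", "?"}  # marks that cut a context


def extract_context(tokens: List[str], si_index: int,
                    window: int = WINDOW) -> Tuple[List[str], List[str], List[str]]:
    """Split a tokenized sentence into (left context, si + predicate, right context).

    Simplifying assumption: the si + predicate cluster is "si" plus the next token.
    """
    cluster_end = min(si_index + 2, len(tokens))
    cluster = tokens[si_index:cluster_end]

    # Left context: walk backwards from "si" and stop at the first boundary mark.
    left: List[str] = []
    for tok in reversed(tokens[max(0, si_index - window):si_index]):
        if tok in BOUNDARIES:
            break
        left.append(tok)
    left.reverse()

    # Right context: walk forwards from the end of the cluster.
    right: List[str] = []
    for tok in tokens[cluster_end:cluster_end + window]:
        if tok in BOUNDARIES:
            break
        right.append(tok)

    return left, cluster, right


if __name__ == "__main__":
    tokens = "Riconosciuto il compagno di Ada , si presentò .".split()
    left, cluster, right = extract_context(tokens, tokens.index("si"))
    print(left, cluster, right)
```

Commas are deliberately not treated as boundaries, matching the list of cut points given in the text; in the example, the whole fronted participial clause therefore ends up in the left context.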
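The 49.7% majority-class baseline used for the classifiers follows directly from the Table 1 counts: always predicting L is correct for 994 of the 2,000 annotated contexts. The snippet below is a minimal sketch that recomputes the class proportions and that baseline; the dictionary simply transcribes Table 1, and the function name `majority_baseline` is hypothetical.

```python
from typing import Dict, Tuple

# Class counts transcribed from Table 1 (2,000 annotated contexts).
COUNTS: Dict[str, int] = {"I": 332, "L": 994, "LM": 417, "PV": 183, "A": 74}


def majority_baseline(counts: Dict[str, int]) -> Tuple[str, float]:
    """Return the most frequent class and the accuracy of always predicting it."""
    total = sum(counts.values())
    label = max(counts, key=lambda c: counts[c])
    return label, counts[label] / total


if __name__ == "__main__":
    total = sum(COUNTS.values())
    for label, n in COUNTS.items():
        print(f"{label:>2}: {n:4d}  {100 * n / total:5.2f}%")
    label, acc = majority_baseline(COUNTS)
    print(f"baseline: always predict {label} -> {acc:.1%}")  # baseline: always predict L -> 49.7%
```

Because the classes are this unbalanced, any classifier must clearly exceed 49.7% before it can be said to have learned something beyond the class prior.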
====2.1 Classifiers descriptions====
Under the “usage-based” approach, the disambiguation (i.e. the interpretation of the correct referent, if necessary) of the distinct si constructions should be possible on the basis of the purely statistical distribution of the (implicit) features across the corpus (Tomasello 2003 and related works). To test this hypothesis we created a set of classifiers using the Weka environment (Frank et al. 2016). Four classifier configurations are used, all based on the originally extracted context of at most 8 words before and after the si + predicate cluster (Table 3). A pure Bag-of-Words (BoW) approach was used for the first two configurations, one with only the left context included, the other with both left and right context; we then manipulated the left-context configuration, substituting the words with their POS (C3-POS-L) and with a coarser set of POS tags (C4-CPOS-L). POS and CPOS annotation were obtained using a free online tool (ItaliaNLP REST API, Cimino & Dell’Orletta 2016).

{| class="wikitable"
|+ Table 3. Classifier description
! Class. ID !! Approach !! Context
|-
| C1-BOW-L || BoW || Left context
|-
| C2-BOW-LR || BoW || Left & Right context
|-
| C3-POS-L || POS || Left context
|-
| C4-CPOS-L || CPOS || Left context
|}

====2.2 Classification algorithms====
Given the baseline accuracy of 49.7%, obtained by always choosing the reflexive local class (L), we compared Naive Bayes algorithms (NaiveBayes, n.bayes in Table 4, and NaiveBayesMultinomial, n.bayes.mul. in Table 4) with a decision-tree-based algorithm (J48), and then with both a 3-layer convolutional network (with an LSTM layer; conv.net in Table 4) and a simple recurrent neural network, using the Weka wrappers for Deeplearning4j 1.5.13 (srnn.net in Table 4), for a total of 5 classifiers. We ran our experiments within the Weka 3.8.3 environment with NVIDIA GPU support (CUDA 10.1). Word embeddings were built using a larger fragment of left and right contexts (±10 words at most, breaking the left/right context at full stops) extracted from the Repubblica corpus and including the “si” seed (the first 1,000,000 sentences returned by the publicly available Sketch Engine search interface).

===3. Results===
The results of the classification tests are reported in Table 4. Accuracy is the rate of correct classifications, averaged over 10 runs of cross-validation, with the standard deviation in parentheses. Significance was assessed with respect to the baseline by pairwise comparison using the corrected resampled t-test (Witten & Frank 2005).

{| class="wikitable"
|+ Table 4. Classification accuracy results
! Class. ID !! Algorithm !! Accuracy (SD)
|-
| colspan="2" | baseline || 49.70%
|-
| rowspan="5" | C1-BOW-L || n.bayes || 56.95% (2.79)
|-
| n.bayes.mul. || 54.28% (2.03)
|-
| J48 || 58.34% (2.48)
|-
| conv.net || 51.88% (1.44)
|-
| srnn.net || 39.63% (11.79)
|-
| rowspan="5" | C2-BOW-LR || n.bayes || 49.21% (3.40)
|-
| n.bayes.mul. || 51.61% (1.17)
|-
| J48 || 48.66% (2.53)
|-
| conv.net || 49.77% (0.41)
|-
| srnn.net || 39.05% (12.77)
|-
| rowspan="5" | C3-POS-L || n.bayes || 54.49% (2.35)
|-
| n.bayes.mul. || 53.26% (1.99)
|-
| J48 || 60.76% (2.97)
|-
| conv.net || 57.58% (1.98)
|-
| srnn.net || 43.52% (7.17)
|-
| rowspan="5" | C4-CPOS-L || n.bayes || 59.96% (2.85)
|-
| n.bayes.mul. || 50.89% (1.03)
|-
| J48 || 61.49% (3.08)
|-
| conv.net || 49.70% (0.25)
|-
| srnn.net || 44.20% (6.17)
|}

In both the left and the left-right context classifiers, the BoW approach (C1-BOW-L and C2-BOW-LR) is clearly not sufficient to solve the classification problem; the introduction of a right context (C2-BOW-LR) significantly reduces the performance of the classifier. Notice that in almost 10% of the cases the referent is post-verbal (PV classification). Decision trees (J48) perform better overall (M=58.34%, SD=2.48), but this performance represents a significant improvement only with the C1-BOW-L and C4-CPOS-L classifiers. None of the deep learning approaches (conv.net and srnn.net) is significantly better than decision trees (in some cases SRNs perform significantly worse). The best absolute performance is obtained by substituting words with coarse POS (C4-CPOS-L): in this case J48 obtains the best accuracy (M=61.49%, SD=3.08).
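The four configurations of Table 3 differ only in what is fed to the classifier. The sketch below shows one plausible way to turn a left/right context into bag-of-items features for each configuration; it is an assumption-laden illustration, not the Weka pipeline used in the experiments. The function name `features` is hypothetical, prefixing items with their side ("L:"/"R:") is a design choice of this sketch, and `left_pos`/`left_cpos` stand in for fine and coarse POS tags such as those returned by the ItaliaNLP tool (the tag values in the example are made up).

```python
from collections import Counter
from typing import List, Optional


def features(config: str, left: List[str], right: List[str],
             left_pos: Optional[List[str]] = None,
             left_cpos: Optional[List[str]] = None) -> Counter:
    """Bag-of-items feature vector for one configuration of Table 3.

    Items are prefixed with their side ("L:" / "R:"), a design choice of this sketch.
    """
    if config == "C1-BOW-L":
        items = [f"L:{w.lower()}" for w in left]
    elif config == "C2-BOW-LR":
        items = [f"L:{w.lower()}" for w in left] + [f"R:{w.lower()}" for w in right]
    elif config == "C3-POS-L":
        items = [f"L:{t}" for t in (left_pos or [])]
    elif config == "C4-CPOS-L":
        items = [f"L:{t}" for t in (left_cpos or [])]
    else:
        raise ValueError(f"unknown configuration: {config}")
    return Counter(items)


if __name__ == "__main__":
    # Context from the L row of Table 2; the coarse POS tags are hypothetical.
    left, right = ["i", "fedeli"], ["tuttavia", "sciolti"]
    print(features("C2-BOW-LR", left, right))
    print(features("C4-CPOS-L", left, right, left_cpos=["R", "S"]))
```

Note that none of these representations encodes the verb class or the hierarchical position of the candidate DP, which is exactly the structural information argued in §1 to be necessary.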
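Significance against the baseline relies on the corrected resampled t-test (Witten & Frank 2005), which is also available as a corrected paired tester in Weka's Experimenter. As a sketch, assuming k paired accuracy differences from repeated 10-fold cross-validation (so the test/train size ratio is 1/9), the statistic inflates the variance term to account for overlapping training sets: t = mean(d) / sqrt((1/k + n_test/n_train) · s²), where s² is the sample variance of the differences, compared against a t distribution with k − 1 degrees of freedom. The function name is hypothetical, the example differences are made up, and only the statistic is computed (a p-value would need a Student-t CDF, e.g. from SciPy).

```python
import math
from statistics import mean, variance
from typing import Sequence


def corrected_resampled_t(diffs: Sequence[float], test_train_ratio: float) -> float:
    """Corrected resampled t statistic (Nadeau & Bengio variance correction).

    diffs: paired accuracy differences (classifier minus baseline), one per split.
    test_train_ratio: n_test / n_train for each split (1/9 for 10-fold CV).
    """
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two paired differences")
    s2 = variance(diffs)  # sample variance, k - 1 denominator
    if s2 == 0:
        raise ValueError("zero variance: statistic undefined")
    return mean(diffs) / math.sqrt((1 / k + test_train_ratio) * s2)


if __name__ == "__main__":
    made_up_diffs = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values only
    print(round(corrected_resampled_t(made_up_diffs, 1 / 9), 4))
```

With `test_train_ratio=0` the formula reduces to the ordinary paired t statistic; the correction term makes the test more conservative, which matters when accuracies differ by only a few points, as in Table 4.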
===4. Discussion===
In this paper, we discussed the nature of some si constructions in Italian, suggesting that, despite their apparent simplicity, their structural intricacies require a deep syntactic analysis to correctly identify the typology of the clitic in various contexts and to retrieve, when necessary, a proper referent. Even using a simplified set of five classes (I = impersonal; L = local, immediately preceding coreferential DP; PV = local, immediately post-verbal coreferential DP; LM = local preceding coreferential DP, but with prepositional phrase or relative clause modification; A = absent referent), we showed that, on an annotated sample of the Repubblica corpus, no classifier exceeded an accuracy of 61.49%. This is well below any reasonable human performance (as suggested by the 99% agreement in classification between annotators). These results, even though still based on a small fragment of the Repubblica Corpus, extend the original considerations of Chesi & Moro (2018) using a wider dataset and more advanced ML algorithms.

These results show that neither the algorithms used nor the extension of the context (both left and right) helped to correctly classify the instances of “si” when the referent had to be retrieved non-locally, or in impersonal “si” cases. Replacing the words with their POS mildly improved the performance of some classifiers (especially with the coarse tagset), with the decision tree classifier (J48) obtaining the best performance (on average) across the tests.

Given the poor performance of the classifiers tested, we conclude that the “usage-based” intuition is not sufficient here to account for the acquisition of the discriminative capabilities that any Italian native speaker possesses, which enable her/him to correctly identify the relevant referent both pre- and post-verbally, even in the case of complex subjects (referent DPs modified by prepositional phrases or relative clauses), as well as to recognize when no referent is needed (in generic/impersonal readings) or to recover it in case of pro-drop. We might then expect that a richer syntactic annotation could help to boost the automatic classification results, in accordance with the structural analysis summarized in §1.1 and §1.2: first, a verbal subcategorization specification properly describing the predicate argument structure could be useful; then a correct analysis of the subject phrase structure, including agreement cues, should be used, as well as a richer classification of temporal/modal adverbials/modifiers.

As suggested by an anonymous reviewer, information structure, which is largely obliterated in written texts, is expected to disambiguate between reflexive and impersonal constructions: for instance, non-dislocated preverbal subjects (L(M) in our classification) should be ruled out in impersonal constructions (see Raposo & Uriagereka 1996); moreover, non-focalized (or right-dislocated) postverbal subjects (PV in our classification) should be ruled out in reflexive constructions. Therefore, although prosody/information structure cannot be assessed within a corpus-based study, we might expect the classifiers' performance to improve when some relevant features associated with these configurations are considered: e.g. post-verbal subject annotation in connection with the verbal class, and the placement of adverbials between the subject and the verb, indicating a dislocated subject.

A follow-up of this study should test these predictions and, possibly, extend the study to the whole Repubblica corpus, confirming (or disconfirming) our preliminary results, which suggest that we cannot avoid a deep structural analysis of these constructions to classify (and interpret) them correctly.

===References===
Baroni, Marco, Silvia Bernardini, Federica Comastri, Lorenzo Piccioni, Alessandra Volpi, Guy Aston, and Marco Mazzoleni. 2004. Introducing the La Repubblica Corpus: A Large, Annotated, TEI (XML)-compliant Corpus of Newspaper Italian. In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC 2004).

Basili, Roberto, Maria Teresa Pazienza, and Michele Vindigni. 1997. Corpus-driven unsupervised learning of verb subcategorization frames. Congress of the Italian Association for Artificial Intelligence. Springer, Berlin, Heidelberg.

Belletti, Adriana. 2002. Aspects of the low IP area. Forthcoming in L. Rizzi (ed.), The structure of IP and CP. The Cartography of Syntactic Structures, vol. 2. New York: Oxford University Press.

Burzio, Luigi. 1992. On the morphology of reflexives and impersonals. Theoretical analyses in Romance linguistics. Amsterdam: Benjamins, 399-414.

Chesi, Cristiano, and Andrea Moro. 2018. Il divario (apparente) tra gerarchia e tempo. Sistemi intelligenti, 30(1), 11-32.

Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.

Cimino, Andrea, and Felice Dell’Orletta. 2016. Building the state-of-the-art in POS tagging of Italian Tweets. In Proceedings of EVALITA ’16, Evaluation of NLP and Speech Tools for Italian, 7 December, Napoli, Italy.

Cinque, Guglielmo. 1988. On si constructions and the theory of arb. Linguistic Inquiry, 19(4), 521-581.

Dillon, Brian, Alan Mishler, Shayne Sloggett, and Colin Phillips. 2013. Contrasting intrusion profiles for agreement and anaphora: experimental and modeling evidence. Journal of Memory and Language, 69, 85-103.

Dobrovie-Sorin, Carmen. 1998. Impersonal se constructions in Romance and the passivization of unergatives. Linguistic Inquiry, 29(3), 399-437.

Frank, Eibe, Mark A. Hall, and Ian H. Witten. 2016. The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Fourth Edition. Morgan Kaufmann.

Grimshaw, Jane. 1990. Argument Structure. Cambridge, MA: MIT Press.

Ienco, Dino, Serena Villata, and Cristina Bosco. 2008. Automatic extraction of sub-categorization frames for Italian. In Proceedings of LREC 2008, 2094-2100. European Language Resources Association (ELRA).

Marantz, Alec. 1984. On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.

Merlo, Paola, and Suzanne Stevenson. 2001. Automatic verb classification based on statistical distributions of argument structure. Computational Linguistics, 27(3), 373-408.

Pescarini, Diego. 2015. Le costruzioni con si. Italiano, dialetti, lingue romanze. Roma: Carocci.

Pesetsky, David. 1995. Zero Syntax. Cambridge, MA: MIT Press.

Raposo, Eduardo, and Juan Uriagereka. 1996. Indefinite SE. Natural Language and Linguistic Theory, 14, 749-810.

Reinhart, Tanya, and Tal Siloni. 2005. The lexicon-syntax parameter: Reflexivization and other arity operations. Linguistic Inquiry, 36(3), 389-436.

Rizzi, Luigi. 1986. Null objects in Italian and the theory of 'pro'. Linguistic Inquiry, 17(3), 501-558.

Salvi, Giampaolo. 2018. La formazione della costruzione impersonale in italiano. Linguística: Revista de Estudos Linguísticos da Universidade do Porto, 3, 13-37.

Sportiche, Dominique. 1998. Partitions and atoms of clause structure: Subjects, agreement, Case and clitics. New York: Routledge.

Tomasello, Michael. 2003. Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.

Witten, Ian H., and Eibe Frank. 2005. Data Mining: Practical machine learning tools and techniques. 2nd edition. San Francisco: Morgan Kaufmann.