=Paper= {{Paper |id=Vol-2481/paper72 |storemode=property |title=Reflexives, Impersonals and Their Kin: a Classification Problem |pdfUrl=https://ceur-ws.org/Vol-2481/paper72.pdf |volume=Vol-2481 |authors=Kledia Topciu,Cristiano Chesi |dblpUrl=https://dblp.org/rec/conf/clic-it/TopciuC19 }} ==Reflexives, Impersonals and Their Kin: a Classification Problem== https://ceur-ws.org/Vol-2481/paper72.pdf
          Reflexives, Impersonals and Their Kin: a Classification Problem

                 Kledia Topciu                                                    Cristiano Chesi
          Università degli Studi Di Siena                                           NETS - IUSS
                  Via Roma 56                                                     P.zza Vittoria 15
              I-53100 Siena (Italy)                                             I-27100 Pavia (Italy)
     kledia.topciu@student.unisi.it                                      cristiano.chesi@iusspavia.it


                                                                            d. proi Sii/*j tolse                   la giaccai.
                                                                                proi SIi/*j took3-SG-PAST off the jacket
                           Abstract                                            ‘S/He took off the jacket.’
                                                                            e. Il compagnoj di Adai si*i/j presentò.
     Despite the fact that true reflexives always                               The friendj of A.i SI*i/j introduced3-SG-PAST
     require a local antecedent, attempting an                                 ‘A.’s friend introduced her/him-self.’
     automatic referential resolution is often far                          f. Riconosciuto il compagnoj di Adai,
     from trivial: in many languages, reflexives                                                    prok si*i/*j/k presentò.
     are morphologically indistinguishable from                                 Recognized3-SG-P.PART the friendj of A.i,
     impersonals and both particles are sensitive                                               prok SI*i/*j/k introduced3-SG-PAST.
     to the syntactic structure in a non-trivial                               ‘Once s/he recognized A.’s friend,
     sense. Focusing on Italian, we annotated                                                    s/he introduced her/him-self.’
     part of the Repubblica Corpus to attempt an                            g. Sigeneric pensa sempre a salvarsi la pelle.
     automatic classification of the reflexive                                  SIgeneric thinks always to saveINF-REFLthe skin
     and impersonal si constructions. In this                                  ‘We always think about saving our own skin.’
     preliminary study we show that the                                  Expecting the co-referential DP to be always
     accuracy of the automatic classification                            “immediately to the left” of the reflexive form
     methods that do not use any relevant                                quickly leads to wrong predictions: while this
     structural information is rather modest. A                          generalization might seem sufficient in (1a), it is
     thoughtful discussion of the structural                             bluntly wrong in (1b), where we need to assume an
     analysis required to distinguish among                              empty referent (pro, Rizzi 1986) before the
     different contexts is provided, in the end                          reflexive (see §1.1). Moreover, we should accept
     suggesting       that    these     structural                       that the coreferential DP can be placed sometimes
     configurations are not easily recoverable                           to the right of the predicate (structurally speaking,
     using a purely distributional approach.                             pro and post-verbal subject options are related,
                                                                         Belletti     2002);      in     this    case,     the
1. Introduction
                                                                         (focalized/dislocated) post-verbal subject is a good
The non-triviality of reflexive/impersonal                               candidate, (1b). Being “the closest DP” is however
constructions in Italian is exemplified in (1):                          not a sufficient condition as suggested by the
                                                                         examples (1c-d). Hence, the null subject hypothesis
(1) a. Adai        sii    presentò.                                      as well as a structural analysis unravelling the role
       A.i        SIi    introduced3-SG-PAST                             of each DP surrounding the predicate is required,
      ‘A. introduced herself.’                                           for the identification of the correct local binding
   b. Sii/*j    presentò            Adai.                                domain (1e-f). Last but not least, a proper
       SIi/*j introduced3-SG-PAST A.i                                    classification of the predicate admitting a reflexive
     ‘A. introduced herself.’                                            or an impersonal pronoun is needed (1g). Under
   c. Si*i/j presentò             ad Adai.                               this perspective, we decided to run a little
       SI*i/j introduced3-SG-PAST to A.i                                 experiment to verify the consistency of a “usage-
      ‘S/He introduced him/herself to A.’                                based” approach (Tomasello 2003) in this specific
                                                                         context and consider whether the “structural

   Copyright © 2019 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
analysis” (Chomsky 1995; 2008) can be proved to                b. *L'uomo lavatosi ieri è mio nonno.
be an outdated approach for the classification of the              the man washed-him/herself yesterday is
distinct kinds of si. In the remaining part of this                my grandfather
introduction we will present the (possibly outdated)
                                                           Robust evidence supports the idea that the subject
structural analyses proposed for reflexive (§1.1)
                                                           of reflexive verbs patterns with the subject of
and impersonal (§1.2) clitic si. We will then present
                                                           unergatives, hence confirming its external
our experiment consisting of the annotation of a
                                                           argument nature (but see Pescarini 2015:42ff).
small fragment of the Repubblica Corpus (Baroni
                                                               Kayne (1975) observes that reflexives occur in
et al. 2004) that we used to train and test a set of
                                                           environments where transitive verbs are
Machine Learning classification algorithms (§2).
                                                           disallowed, e.g. in French causative constructions:
Results presentation (§3) and their discussion (§4)
                                                           when the verb embedded under the causative verb
will follow.
                                                           faire ‘make’ is a transitive verb (4a), its subject
1.1 The reflexivization configuration                      must be introduced by the preposition à ‘to’; when
                                                           the lower verb is intransitive or reflexive, its
A popular structural analysis of reflexives is the         subject cannot be introduced by à (4b/c).
unaccusative one: under this perspective, the
subject of reflexives is an underlying object (just        (4) a. Je ferai laver Jean *(à) Luc.
like the subject of unaccusatives) which has to raise             I makeFUT wash Jean to Luc.
to the subject position for Case reasons (reflexive              ‘I will make Luc wash Jean.’
morphology absorbs its Case). Two main variants                b. Je ferai courir (*à) Jean.
of this approach are discussed in the literature: a                I makeFUT run Jean.
lexical and a syntactic one. The lexical version                  ‘I will make Jean run.’
predicts that the external argument is absorbed in             c. Je ferai se laver (*à) Jean.
the lexicon (Marantz 1984 and Grimshaw 1990),                      I makeFUT SE wash Jean.
while the syntactic one proposes that the external                ‘I will make Jean wash himself.’
argument is present in syntax via the reflexive clitic     When the lower verb is reflexive, its subject
se (Kayne 1988, Pesetsky 1995, Sportiche 1998).            appears without the preposition, exactly like the
    A different analysis is proposed by Reinhart &         subject of unergative verbs. Therefore, reflexive
Siloni (1999, 2005): reflexives should be                  verbs are not transitive entries either.
unergative entries since unaccusativity tests (e.g.            Reinhart & Siloni (2005) suggest that these
ne cliticization, (2b)) fail with reflexive                reflexive constructions are unergative entries
constructions:                                             derived from their transitive alternate by a
(2) a. Ne sono arrivati tre.                               reduction operation targeting the internal argument
       of+themcl are arrived three                         (identified with the external one). They take verbal
      ‘Three of them arrived.’                             reflexivization even further and propose a lexicon-
    b. *Se ne sono vestiti tre.                            syntax parameter: arity operations (on θ-roles) can
        SI of+themcl are dressed three                     apply either to the syntax or to the lexicon.
       ‘Three of them got dressed.’                        Reflexivization is essentially           the same
                                                           phenomenon cross-linguistically, that is, two
Since only the internal argument can be cliticized         available θ-roles are assigned to the same syntactic
and the reflexive verb fails the ne test, we conclude      argument, or, better said, the operation of
that the subject of the reflexives is an external          reflexivization takes two θ-roles and forms one
argument, unlike the subject of unaccusatives.             complex θ-role.
Another test helping us to tease apart external from           The distinctions follow from two different
internal argument structures is reduced relatives          modes of operation: a lexical mode and a syntactic
modification: when the modification is                     one. Languages such as Hebrew, English, Russian
implemented via past participle, this does not allow       and Dutch have the parameter set to “lexicon”,
for predicates with an external argument. The              while in Romance languages, Greek and German
reduced relative in (3a) contains a reflexive              the “syntax” value of the parameter is set. In the
predicate, while the one in (3b) is an impossible          syntactic option (which is relevant here), what is to
cliticization of a transitive reflexive past participle.   become a reflexive verb leaves the lexicon with the
(3) a. Il bicchiere rottosi ieri apparteneva a mio         same number of θ-roles, which need to be assigned,
       nonno.                                              as the basic verbal entry. Since the clitic itself
       the glass broken-him/herself yesterday              cannot be viewed as an argument (the lack of Case
       belonged to my grandfather                          blocks its merge), the “extra” θ-role has to be
                                                           explained by an arity reduction operation.
    In conclusion, an automatic classification          Aux-to-Comp (6) and Raising structures (7) with
algorithm, attempting to identify the typology of       transitive and unergative verbs.
the si reflexive pronoun, should necessarily have
                                                        (6) Non essendosi ancora scoperto il colpevole…
access to the subcategorization verbal frame and
                                                            not beingGERUND-SI yet discoveredP-PART-SG-MASC
postulate an arity-reduction as suggested by
                                                                                        the culpritSG-MASC
(Reinhart & Siloni 2005). If this information is not
                                                           ‘Not having yet discovered the culprit...’
available as lexical resource, we might try to rely
on structural cues to infer the correct argument        (7) Sembra non essersi ancora scoperto il
structure (as in Merlo & Stevenson 2001, Basili et          colpevole …
al 1997 or Ienco et al. 2008). On the other hand, if         seems3RD-SG not being-SI yet
statistical cues were available, annotating them             discovered P-PART-SG-MASC the culpritSG-MASC
overtly would be unnecessary.                               ‘It seems it hasn’t yet been discovered
    A further complication, however, is associated                                        the culprit.’
to the existence of a class of “reflexive” predicates   Cinque considers these instances of si as
(e.g. alzarsi, ‘to stand up’) which are bona fide       argumental ones (+arg), which can be present in
unaccusatives (inherent/lexical si constructions        general only with verbs that project an external θ-
Pescarini 2015). In this case, the overlapping          role. The other si is a non-argumental one (-arg),
between the bare verbal root and a transitive form      which can be present with any verb class
of some inherent si predicates does not help in         (therefore, also with verbs that do not assign an
the automatic classification task (e.g. in “si lava la  external θ-role).
mano”, ‘he/she washes his/her hand’, due to the             Dobrovie-Sorin (1998, 1999) argues that it is
transitive nature of lavare/to wash, the post-verbal    not necessary to postulate this: according to her,
DP “la mano” could be analyzed both as direct           what Cinque calls a +arg si is actually a middle
object or post-verbal subject).                         passive Accusative si. The only Nominative si is
1.2 Impersonal si constructions                         Cinque’s -arg si. She argues that si is not licensed
                                                        in non-finite clauses because it is a Nominative clitic
The reflexive reading is not the only available         and, in Italian, Nominative clitics are not allowed
option when the si pronoun is present: an               in non-finite clauses. Only transitive and
impersonal reading is also possible. Impersonal si      unergative Aux-to-Comp and Raising structures
constructions are used to introduce a generic,          allow si as Accusative. Dobrovie-Sorin tries to
unspecified subject and to make general statements      unify all the uses of SE in Romance languages and
about groups of people (Cinque 1988, Dobrovie-          assumes that si is not a special lexical item that
Sorin, C. 1998, 1999 a.o.). In Italian, si              absorbs a θ-role or Case. Her analysis accounts for
constructions are exemplified in (5a). The subject      special cases, such as Romanian, which has si
is unspecified and the sentence has a generic           constructions but doesn’t have Nominative clitics.
reading because of si, otherwise its absence would      Italian si constructions, on the other hand, rely
result in a sentence with a specific subject (5b)       either on Nominative (8) or Accusative (which also
Italian being a pro-drop language (Rizzi 1986).         includes reflexive configurations) (9).
(5) a. In Italia si mangia troppo.                      (8) Non sii ei è mai contenti.
        In Italy si eats3rdSG too much                       not SI is3RD-SG ever satisfied
       ‘In Italy, people eat too much.’                     'One is never satisfied.'
    b. In Italia pro mangia troppo.
        In Italy pro eats3rd-SG too much                (9) Il grecoi sii traduce ei facilmente.
        ‘In Italy he/she eats too much’                      the Greek SI translates3RD-SG easily
                                                            ‘Greek translates easily.’
Notice that the adverbial modal modification
“troppo” is coherent with the generic reading,          In (9), si is an anaphor and if we assume a restricted
while a punctual temporal adverbial modification        theory of binding, the anaphoric status of the clitic
would be inconsistent (“#In Italia si mangia            is transferred to its trace. The indexing
domani” vs. “In Italia si mangia sempre”).              configuration corresponds to a single argument, the
    As for the argumental status of si, there is a      Theme. On the other hand, the si in (8) is not an
large disagreement in the linguistic community:         anaphor and therefore imposes no relation between
Cinque (1988) proposes the existence of two             the subject and object positions; it binds an empty
different si items: the presence of si is usually       category in the subject A-position.
restricted to finite clauses, however, it is also           A rephrasing of Dobrovie-Sorin’s proposal is
permitted in certain untensed clauses, namely in        formulated by Salvi (2018), who argues that in
modern Italian there are two reflexive si constructions: a passive one and an
impersonal one (the reader should refer to Pescarini 2015 for a more detailed
discussion of a richer classification). The first one, exemplified in (10b), is
characterized by the cancelation of the subject (10a) and the transformation of
the direct object into the grammatical subject (triggering agreement); the
derived grammatical subject can also occur in the canonical preverbal position
(10c):

(10) a. Il preside ha consegnato i diplomi.
        The dean has awarded the diplomas
     b. Si sono consegnati i diplomi.
        SIgeneric are awarded the diplomas
       ‘Diplomas got awarded’
     c. I diplomi si consegnano (agli studenti).
        the diplomas SIgeneric awarded (to the students)
       ‘Diplomas are getting awarded (to the students)’
This construction is only possible with (di)transitive predicates, since the
promotion of the object to the grammatical subject role is only available when
a direct object is present.
    On the other hand, the impersonal version of si does not induce the
promotion of the internal argument to the grammatical subject role and in fact
this construction is available without any verbal class restriction:

    (11) a. Si guarda la partita
            SIgeneric watches the game
            ‘We watch the game’
         b. Si dorme
            SIgeneric sleeps
            ‘We sleep’
         c. Si cade
            SIgeneric falls
            ‘We fall’
In sum, with the impersonal si construction, the subcategorization verbal frame
(i.e. the verbal argumental structure) could help in isolating the passive si
construction, but not the impersonal one. As for reflexive si, the full
argument structure must be identified and then either the passive strategy
(deletion and promotion) or the impersonal one (simple deletion) considered. As
a consequence of the null subject option in Italian, the difference between
impersonal and passive si is often blurred.

2. Materials and methods

From the Repubblica Corpus (Baroni et al. 2004), we extracted all contexts in
which the “si” lemma was present: the simple query returned 2,737,558 contexts,
each including a left and right context of at most 8 words around the si +
predicate cluster; each left and right context was cut at full stops, colons,
semicolons, exclamation marks and question marks, whenever those were found
within the 8-token context. The tagset used in the Repubblica Corpus
distinguishes neither between reflexive and the various types of impersonal
forms (“CLI/si” is the generic tag used) nor among different verbal classes
with respect to their argument structure (only VB for “be”, VH for “have”, and
VV for other verbs are included). We then decided to manually annotate the
first 2,000 contexts returned by our query (0.07% of the total) using the
following scheme, much simplified with respect to the structural asymmetries
revealed by the discussion in §1: I (impersonal), L (local: the DP immediately
preceding “si” is the correct one), PV (post-verbal: the first DP after the
predicate following “si” is the correct co-referent), LM (the DP immediately
preceding, in the hierarchical sense, the reflexive “si” is the correct one,
but this DP is “modified” by a PP or a relative clause) and A (the referent is
not present/retrievable in the extracted context; the great majority of these
are pro-drop cases, and in just two cases the referent was lexically realized
outside the isolated context). Both authors annotated the corpus independently
and discussed the cases of disagreement (less than 1% of the sample) in order
to reach an agreed annotation. Table 1 gives the distribution of the classes
across the annotated corpus fragment, while Table 2 exemplifies the
classification. Given the simplicity of this classification (which essentially
focuses on identifying the reflexive antecedent, if present/necessary), we
would expect better performance than with any richer classification, which is
apparently necessary according to the structural analysis previously discussed.

   annotation       # of contexts            %
       I                  332               16.6
       L                  994               49.7
       LM                 417               20.85
       PV                 183                9.15
       A                   74                3.7
Table 1. Distribution of the annotated categories across the sample.
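The context extraction described in this section (at most 8 tokens on each side of the si + predicate cluster, cut at strong punctuation) can be illustrated with a minimal Python sketch. It is not the query actually run on the corpus: the function name, the whitespace tokenization and the fixed two-token cluster length are assumptions made only for illustration.

```python
# Punctuation that closes a context window: full stop, colon, semicolon,
# exclamation mark and question mark, as listed in the text.
BOUNDARIES = {".", ":", ";", "!", "?"}
MAX_TOKENS = 8  # maximum number of tokens kept on each side


def extract_context(tokens, si_index, cluster_len=2, max_tokens=MAX_TOKENS):
    """Return (left, cluster, right) around the si + predicate cluster.

    tokens      -- list of word/punctuation tokens (hypothetical tokenization)
    si_index    -- position of "si" in tokens
    cluster_len -- number of tokens in the si + predicate cluster (assumed: 2)
    """
    cluster_end = si_index + cluster_len

    # Walk leftwards from "si", stopping at the first boundary mark.
    left = []
    for tok in reversed(tokens[max(0, si_index - max_tokens):si_index]):
        if tok in BOUNDARIES:
            break
        left.append(tok)
    left.reverse()

    # Walk rightwards from the end of the cluster, same stopping rule.
    right = []
    for tok in tokens[cluster_end:cluster_end + max_tokens]:
        if tok in BOUNDARIES:
            break
        right.append(tok)

    return left, tokens[si_index:cluster_end], right
```

For instance, calling `extract_context` on the tokens of "Era tardi . Nel cortile si stendono le stuoie ; poi basta" with `si_index=5` keeps only "Nel cortile" on the left and "le stuoie" on the right, since both windows are cut at the nearest boundary mark.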
annotation   example
    I        si è deciso di ridurre il deficit
             we decided to reduce the deficit
    L        [i fedeli]i sii sono tuttavia sciolti
             the faithful, nevertheless, split up
    LM       [il vertice di Dublino]i sii è dimostrato
             the Dublin summit proved to be …
    PV       nel cortile sii stendono [le stuoie]i
             in the courtyard the mats are unfolded
    A        per 16 anni sii è occupato dei processi
             for 16 years [he] took care of the trials
Table 2. Sample annotation using 5 categories.

2.1 Classifiers descriptions

Under the “usage-based” approach the disambiguation (i.e. the interpretation of
the correct referent, if necessary) of the distinct si constructions should be
possible on the basis of the purely statistical distribution of the (implicit)
features across the corpus (Tomasello 2003 and related works). To test this
hypothesis we created a set of classifiers using the Weka environment (Frank et
al. 2016). Four different classifiers were built from the originally extracted
context of at most 8 words before and after the clitic si + predicate cluster
(Table 3): a pure Bag-of-Words (BoW) approach was used for the first two
classifiers, one including only the left context, the other both the left and
right context; we then manipulated the left-context classifier, substituting
the words with their POS tags (classifier C3-POS-L) and with a coarser set of
POS tags (C4-CPOS-L). POS and CPOS annotations were obtained using a free
online tool (ItaliaNLP REST API, Cimino & Dell’Orletta 2016).

   Class. ID     Approach     Context
   C1-BOW-L      BoW          Left context
   C2-BOW-LR     BoW          Left & Right context
   C3-POS-L      POS          Left context
   C4-CPOS-L     CPOS         Left context
Table 3. Classifier description

2.2 Classification algorithms

Given a baseline accuracy of 49.7%, obtained by always choosing the reflexive
local class (L classification), we compared Naïve Bayesian algorithms (i.e.
NaïveBayes, n.bayes in Table 4, and NaïveBayesMultinomial, n.bayes.mul. in
Table 4) with a decision tree-based algorithm (i.e. J48) and then with both a
3-layer convolutional network (with an LSTM layer; conv.net in Table 4) and a
simple recurrent neural network (srnn.net in Table 4), using the Weka wrappers
for Deeplearning4j 1.5.13, for a total of 5 classification algorithms. We ran
our experiments within the Weka 3.8.3 environment with NVIDIA GPU support
(CUDA 10.1). Word embeddings were built using a larger fragment of left and
right contexts (+/-10 words at most, breaking the left/right context at full
stops) extracted from the Repubblica corpus including the “si” seed (first
1,000,000 sentences returned using the publicly available Sketch Engine search
interface).

3. Results

The results of the classification tests are reported in Table 4. Accuracy is
the rate of correct classifications, averaged over 10 experiments with
cross-fold validation (the standard deviation is given in parentheses).
Significance is expressed with respect to the baseline: “better” indicates that
the accuracy is significantly better than the baseline, “worse” that it is
significantly worse, and no sign means no significant difference (pair-wise
comparison using the corrected resampled T-Test, Witten & Frank 2005).

   Class. ID     Algorithm       Accuracy (SD)     Sign.
   baseline                      49.70%
   C1-BOW-L      n.bayes         56.95% (2.79)     better
                 n.bayes.mul.    54.28% (2.03)     better
                 J48             58.34% (2.48)     better
                 conv.net        51.88% (1.44)     better
                 srnn.net        39.63% (11.79)    worse
   C2-BOW-LR     n.bayes         49.21% (3.40)
                 n.bayes.mul.    51.61% (1.17)     better
                 J48             48.66% (2.53)
                 conv.net        49.77% (0.41)
                 srnn.net        39.05% (12.77)    worse
   C3-POS-L      n.bayes         54.49% (2.35)     better
                 n.bayes.mul.    53.26% (1.99)     better
                 J48             60.76% (2.97)     better
                 conv.net        57.58% (1.98)     better
                 srnn.net        43.52% (7.17)     worse
   C4-CPOS-L     n.bayes         59.96% (2.85)     better
                 n.bayes.mul.    50.89% (1.03)     better
                 J48             61.49% (3.08)     better
                 conv.net        49.70% (0.25)
                 srnn.net        44.20% (6.17)     worse
Table 4. Classification accuracy results
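The significance column of Table 4 relies on the corrected resampled T-Test. As a rough illustration only, the statistic can be sketched in Python as below, assuming the usual variance correction for overlapping training sets (the term n_test/n_train added to 1/k); the function name and the sample differences in the usage note are hypothetical and do not reproduce the Weka implementation or the experimental data.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, test_train_ratio):
    """Corrected resampled t statistic for paired accuracy differences.

    diffs            -- per-split accuracy differences (classifier - baseline)
    test_train_ratio -- n_test / n_train of each split (1/9 for 10-fold CV)
    Returns (t, degrees_of_freedom).
    """
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two paired differences")
    var = variance(diffs)  # sample variance, k - 1 in the denominator
    if var == 0:
        raise ValueError("zero variance: t statistic undefined")
    # The plain paired t-test uses var / k; the correction inflates the
    # variance because training sets of different splits overlap.
    t = mean(diffs) / math.sqrt((1.0 / k + test_train_ratio) * var)
    return t, k - 1
```

With 10-fold cross-validation the ratio is 1/9, so a call such as `corrected_resampled_t([1.0, 2.0, 3.0, 4.0, 5.0], 1 / 9)` returns a t value that is then compared with the critical value of a Student distribution with k - 1 degrees of freedom.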
In both left and left-right context classifiers, BoW      Italian native speaker owns and that enable her/him
approach (C1-BOW-L and C2-BOW-LR) is clearly              to identify correctly the relevant referent both pre-
not sufficient to solve the classification problem;       and post-verbally, even in the case of complex
the introduction of a right context (C2-BOW-LR)           subjects (referent DPs modified by prepositional
significantly reduces the performance of the              phrases or relative clauses), as well as its
classifier. Notice that in almost 10% of the cases        unnecessity (in generic/impersonal readings) or its
the availability of the referent is post-verbal (PV       recovery in case of pro-drop. We might expect then
classification). Decision trees (J48), overall,           that a richer syntactic annotation could help to
perform better (M=58.34% SD=2.48) but this                boost the automatic classification results in
performance represents a significant improvement          accordance with the structural analysis
only with C1-BOW-L and C4-CPOS-L classifiers.             summarized in §1.1 and §1.2: first, a verbal
None of the deep learning approaches (conv.net            subcategorization specification properly describing
and srnn.net) are significantly better than decision      the predicate argument structure could be useful,
trees (in some cases SRNs perform significantly           then a correct analysis of the subject phrase
worse). The best absolute performance is obtained         structure, including agreement cues should be used,
substituting words with coarse POS (C4-CPOS-L).           as well as a richer classification of temporal/modal
In this case J48 obtains the best accuracy                adverbials/modifiers.
(M=61.49% SD=3.08).                                          As suggested by an anonymous reviewer,
                                                          information structure, which is largely obliterated
4. Discussion                                             in written texts, is expected to disambiguate
    In this paper, we discussed the nature of some si     between reflexive and impersonal constructions:
constructions in Italian, suggesting that, despite        for instance, non-dislocated preverbal subjects
their apparent simplicity, their structural intricacies   (L(M) in our classification) should be ruled out in
require a deep syntactic analysis for identifying         impersonal constructions (see Raposo &
correctly the typology of the clitic in various           Uriagereka 1996); moreover, non-focalized (or
contexts and retrieve, when necessary, a proper           right-dislocated) postverbal subjects (PV in our
referent. Even using a simplified set of five classes     classification) should be ruled out in reflexive
(I = impersonal; L = local immediately preceding          constructions. Then, despite the fact that
coreferential DP; PV = local, immediately post-           prosody/information structure cannot be assessed
verbal coreferential DP; LM = local preceding             within a corpus-based study, we might expect an
coreferential DP but with prepositional phrase or         improvement of the classifiers performance
relative clause modification; A = absent referent),       considering some relevant features associated to
we demonstrated that, using an annotated sample           these configurations: e.g. post-verbal subject
of the Repubblica corpus, no classifier has               annotation in connection with the verbal class and
exceeded the performance of 61.49% of accuracy.           adverbials placement between the subject and verb
This is well below any human reasonable                   indicating a dislocated subject.
performance (as suggested by the 99% agreement               A follow up of this study should test these
in classification between annotators). These              predictions and, possibly, extend the study to the
results, even though still based on a small fragment      whole Repubblica corpus, confirming (or
of the Repubblica Corpus, extend Chesi & Moro             disconfirming) our preliminary results that suggest
(2018) original considerations using a wider              we cannot avoid a deep structural analysis of these
dataset and more advanced ML algorithms.                  constructions to classify (and interpret) them
These results showed that neither the algorithms          correctly.
used nor the extension of the context (both left and
right) helped in classifying correctly the instances      References
of “si” when the referent had to be retrieved non-
locally or in impersonal “si” cases. Replacing the        Baroni, Marco, Silvia Bernardini, Federica
words with their POS mildly helped in improving              Comastri, Lorenzo Piccioni, Alessandra Volpi,
the performance of some classifiers (especially              Guy Aston, and Marco Mazzoleni. 2004.
using the coarse tagset), with decision tree                 Introducing the La Repubblica Corpus: A
classifier (J48) obtaining the best performance (on          Large, Annotated, TEI (XML)-compliant
average) across the tests.                                   Corpus of Newspaper Italian. In Proceedings of
    Given the poor performance of the classifiers            the Fourth International Conference on
tested, we concluded that the “usage-based”                  Language Resources and Evaluation (LREC
intuition is not sufficient here to account for the          2004).
acquisition of the discriminative capabilities any
Basili, Roberto, Maria Teresa Pazienza, and Michele Vindigni. 1997.
   Corpus-driven unsupervised learning of verb subcategorization
   frames. Congress of the Italian Association for Artificial
   Intelligence. Springer, Berlin, Heidelberg.
Belletti, Adriana. 2002. Aspects of the low IP area. Forthcoming in The
   structure of IP and CP. The Cartography of Syntactic Structures,
   vol. 2, L. Rizzi (ed.). New York: Oxford University Press.
Burzio, Luigi. 1992. On the morphology of reflexives and impersonals.
   Theoretical analyses in Romance linguistics. Amsterdam: Benjamins,
   399-414.
Chesi, Cristiano, and Andrea Moro. 2018. Il divario (apparente) tra
   gerarchia e tempo. Sistemi intelligenti, 30(1), 11-32.
Chomsky, Noam. 1995. The minimalist program. Cambridge, MA: MIT Press.
Cimino, Andrea, and Felice Dell’Orletta. 2016. Building the
   state-of-the-art in POS tagging of Italian Tweets. In Proceedings of
   EVALITA ’16, Evaluation of NLP and Speech Tools for Italian,
   7 December, Napoli, Italy.
Cinque, Guglielmo. 1988. On si constructions and the theory of arb.
   Linguistic Inquiry, 19(4), 521-581.
Dillon, Brian, Alan Mishler, Shayne Sloggett, and Colin Phillips. 2013.
   Contrasting intrusion profiles for agreement and anaphora:
   experimental and modeling evidence. J. Mem. Lang. 69, 85–103.
Dobrovie-Sorin, Carmen. 1998. Impersonal se constructions in Romance
   and the passivization of unergatives. Linguistic Inquiry, 29(3),
   399-437.
Frank, Eibe, Mark A. Hall, and Ian H. Witten. 2016. The WEKA Workbench.
   Online Appendix for "Data Mining: Practical Machine Learning Tools
   and Techniques", Morgan Kaufmann, Fourth Edition, 2016.
Grimshaw, Jane. 1990. Argument Structure. MIT Press, Cambridge, MA.
Ienco, Dino, Serena Villata, and Cristina Bosco. 2008. Automatic
   extraction of sub-categorization frames for Italian. In LREC08,
   pp. 2094-2100. European Language Resources Association (ELRA).
Marantz, Alec. 1984. On the Nature of Grammatical Relations. MIT Press,
   Cambridge.
Merlo, Paola, and Suzanne Stevenson. 2001. Automatic verb
   classification based on statistical distributions of argument
   structure. Computational Linguistics, 27(3), pp. 373-408.
Pescarini, Diego. 2015. Le costruzioni con si. Italiano, dialetti,
   lingue romanze. Roma: Carocci.
Pesetsky, David. 1995. Zero Syntax. MIT Press, Cambridge, MA.
Raposo, Eduardo, and Juan Uriagereka. 1996. Indefinite SE. Natural
   Language and Linguistic Theory 14: 749-810.
Reinhart, Tania, and Tal Siloni. 2005. The lexicon-syntax parameter:
   Reflexivization and other arity operations. Linguistic Inquiry,
   36(3), 389-436.
Rizzi, Luigi. 1986. Null objects in Italian and the theory of 'pro'.
   Linguistic Inquiry, 17(3), 501-558.
Salvi, Giampaolo. 2018. La formazione della costruzione impersonale in
   italiano. Linguística: Revista de Estudos Linguísticos da
   Universidade do Porto, 3, 13-37.
Sportiche, Dominique. 1998. Partitions and atoms of clause structure:
   Subjects, agreement, Case and clitics. New York: Routledge.
Tomasello, Michael. 2003. Constructing a language: A usage-based theory
   of language acquisition. Cambridge, MA: Harvard University Press.
Witten, Ian H., and Eibe Frank. 2005. Data Mining: Practical machine
   learning tools and techniques. 2nd edition. Morgan Kaufmann, San
   Francisco.