<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Role of Textual Connectives in Sentence Comprehension: A New Dataset for Italian</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgia Albertin</string-name>
          <email>giorgia.albertin.2@studenti.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessio Miaschi</string-name>
          <email>alessio.miaschi@phd.unipi.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dominique Brunato</string-name>
          <email>dominique.brunato@ilc.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution></institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present a new evaluation resource for Italian aimed at assessing the role of textual connectives in the comprehension of the meaning of a sentence. The resource is arranged in two sections (acceptability assessment and cloze test), each corresponding to a distinct challenge task designed to test how subtle modifications involving connectives in real-usage sentences influence the perceived acceptability of the sentence for native speakers and Neural Language Models (NLMs). Although the main focus is the presentation of the dataset, we also provide some preliminary data comparing human judgments and NLMs' performance in the two tasks.<sup>1</sup></p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The outstanding performance reached by recent
Neural Language Models (NLMs) across a
variety of NLP tasks that require extensive linguistic
skills has stimulated growing interest in the
theoretical and computational linguistics communities
in better understanding their inner
mechanisms. In particular, the debate focuses on
what kind of linguistic knowledge
these models are able to induce from the raw data
they are exposed to, and to what extent this
knowledge resembles human-like generalization patterns
        <xref ref-type="bibr" rid="ref10 ref9">(Linzen and Baroni, 2021; Manning, 2015)</xref>
        . To
pursue this investigation, the availability of challenging test sets,
also called ‘diagnostic’ or ‘stress’ tests, built to
probe the sensitivity of a model to specific
language phenomena, has become of pivotal
importance.
      </p>
      <p>Copyright © 2021 for this paper by its authors. Use
permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0).</p>
      <p>1 The resource is available at: http://www.italianlp.it/resources/.</p>
      <p>
        So far, most efforts have focused on
assessing the syntactic abilities encoded by NLMs
by exploiting human-curated benchmarks, which
are usually proposed in the form of minimal
sentence pairs, i.e. minimally different sentences
exemplifying a wide array of linguistic contrasts. A
well-known example is BLiMP (Benchmark of
Linguistic Minimal Pairs)
        <xref ref-type="bibr" rid="ref18">(Warstadt et al., 2020)</xref>
        , which
contains pairs that contrast in syntactic
acceptability, isolating fine-grained phenomena in specific
domains of English grammar, such as subject–verb
agreement, island effects, ellipsis and negative
polarity items.
      </p>
      <p>Compared to syntactic well-formedness, the sensitivity
of these models to deeper linguistic dimensions involving
semantics and discourse, such as textual cohesion, which are
critical to language understanding, remains less explored.
In this respect, one of the explicit devices that natural
languages use to convey textual cohesion is function
words. As observed by Kim et al. (2019), although
these words play a key role in compositional
meaning, as they introduce discourse referents or make
explicit the relations between them, they are still
underinvestigated in the literature on representation
learning. To this end, the authors released a suite of nine
challenge tasks for English aimed at testing NLMs’
understanding of specific types of function words,
e.g. coordinating conjunctions, quantifiers and definite
articles. Focusing on reasoning about conjuncts in conjunctive
sentences, Saha et al. (2020) introduced
CONJNLI, a stress test for Natural
Language Inference (NLI) over conjunctive sentences,
where the premise differs from the hypothesis by
conjuncts being removed, added, or replaced.</p>
      <p>Taking inspiration from this work, in this paper
we focus on the role of textual
connectives in the comprehension of a sentence and
introduce a new evaluation resource for Italian
which, to our knowledge, is the first of its kind for this
language. The resource is articulated into two sections
(acceptability assessment and cloze test), each
corresponding to a distinct task aimed at probing,
in a different format, to what extent current NLMs
are able to properly encode the role of connectives
in a sentence. A peculiarity of the dataset is that
it contains sentences that were extracted from existing
corpora and minimally modified, so as to test
the comprehension of connectives in real language
use.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Corpus Collection</title>
      <p>This section is divided into two parts. In the first
one, we discuss the methodology adopted
for selecting the connectives and
extracting the sentences. We then provide
an overview of the two tasks defined to test the
correct comprehension of connectives.</p>
      <sec id="sec-2-1">
        <title>2.1 Selecting Connectives and Extracting Sentences</title>
        <p>
          As a first step, we defined the linguistic criteria for
the selection of the connectives to include in the corpus.
By connective we mean a word whose function is to establish a
relation between two or more
clauses
          <xref ref-type="bibr" rid="ref14 ref6">(Sanders and Noordman, 2000; Graesser
and McNamara, 2011)</xref>
          . To identify candidate connectives, two resources
were employed: the INVALSI reading
comprehension and language reflection tests designed by the
National Institute for the Evaluation of the
Education System, and the Nuovo Vocabolario di Base
of Italian (NVdB)
          <xref ref-type="bibr" rid="ref4">(De Mauro and Chiari, 2016)</xref>
          .
Starting from the INVALSI tests
administered over the last six years to different school grades,
we extracted all words that were explicitly called
‘connectives’ in the tests or that were involved in defining
a logical relationship between two sentences. We
thus obtained a first list of 46 elements belonging
to diverse morpho-syntactic categories (i.e.
prepositions, conjunctions, adverbs), which was then
supplemented with 19 further connectives extracted from
the NVdB. We then checked the distribution of the
selected items in existing Italian treebanks and
extracted the sentences in which these words were
unambiguously used as sentence connectives. Three
different Italian treebanks of Universal
Dependencies (Zeman et al., 2020) were
used: ISDT
          <xref ref-type="bibr" rid="ref1">(Bosco et al., 2013)</xref>
          , PoSTWITA
          <xref ref-type="bibr" rid="ref15">(Sanguinetti et al., 2018)</xref>
          and TWITTIRÒ
          <xref ref-type="bibr" rid="ref2">(Cignarella et
al., 2019)</xref>
          <sup>2</sup>; the first is representative of standard
language, while the latter two collect Italian tweets.
We employed PML TreeQuery<sup>3</sup> to query the
treebanks and filter the sentences containing the
connectives of interest. In particular, to
exclude occurrences that do not act as
phrasal connectives (e.g. the conjunction e joining
two nouns), only sentences in which the
connective was headed by a verb or a copula were taken
into account. We observed that the frequency
rankings of the selected connectives in
the three corpora mostly
overlap, although their occurrences in PoSTWITA and
TWITTIRÒ (jointly considered as a sample of Italian
social media language) were fewer than in ISDT,
partly because of the different corpus sizes (i.e. 289,343
words in ISDT vs. 154,050 words in PoSTWITA
and TWITTIRÒ). Given the only partial overlap of
the frequency data and the potentially non-standard
use of connectives in treebanks of
social media texts, also due to genre-specific
features (e.g. hashtags, emoticons), we decided
to consider only the 21 most frequent
connectives occurring in ISDT: since this is the first Italian resource
for the comprehension of textual connectives, we
preferred to focus on sentences as close as possible to
standard Italian. Further considerations
on the connectives’ distributions led us to discard
per, così and ancora because of their ambiguous
behavior as textual connectives (e.g. we noticed that
the majority of the occurrences of per involve
an infinitive verb, a distribution that differs markedly
from that of the other connectives). The following 18
connectives were finally considered: e, se, quando,
come, ma, dove, o, anche, perché, poi, mentre,
infatti, prima, però, invece, inoltre, tuttavia, quindi.
The distribution of the selected connectives
in ISDT and in PoSTWITA and TWITTIRÒ
is reported in Appendix A.
        </p>
        <p>2 https://universaldependencies.org/treebanks/it-comparison.html.</p>
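        <p>The selection procedure above can be approximated in a few lines of Python. This is a minimal sketch, not the PML-TQ query actually used: it assumes the treebanks are read in CoNLL-U format, and it interprets "headed by a verb or a copula" as "the connective's syntactic head has UPOS VERB, or the head governs a dependent with relation cop" (in UD the copula depends on the nominal predicate). The candidate set, the helper names and the three toy sentences are hypothetical; only the cut-off of 21 items and the three discarded forms come from the text.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical candidate list (lowercased forms); the study itself started
# from 65 candidates (46 from INVALSI tests plus 19 from the NVdB).
CANDIDATES = {"e", "se", "ma", "quando", "per"}
# Forms discarded after inspection because of their ambiguous behavior.
EXCLUDED = {"per", "così", "ancora"}


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comments (sent_id, text, ...)
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # multiword-token ranges and empty nodes
        current.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                        "head": int(cols[6]), "deprel": cols[7]})
    if current:
        sentences.append(current)
    return sentences


def is_clausal_connective(token, sentence):
    """Assumed approximation of 'headed by a verb or a copula'."""
    by_id = {t["id"]: t for t in sentence}
    head = by_id.get(token["head"])  # None when the token is the root
    if head is None:
        return False
    if head["upos"] == "VERB":
        return True
    # In UD the copula depends on the nominal predicate: look for a cop child.
    return any(t["head"] == head["id"] and t["deprel"] == "cop" for t in sentence)


def rank_connectives(sentences, candidates, top_k=21):
    """Count clausal uses of each candidate and keep the top_k most frequent."""
    counts = Counter()
    for sentence in sentences:
        for token in sentence:
            form = token["form"].lower()
            if form in candidates and is_clausal_connective(token, sentence):
                counts[form] += 1
    return counts.most_common(top_k)


def select_connectives(sentences, candidates, top_k=21, excluded=EXCLUDED):
    """Most frequent clausal connectives minus the manually excluded forms."""
    return [form for form, _ in rank_connectives(sentences, candidates, top_k)
            if form not in excluded]


def _row(idx, form, upos, head, deprel):
    """Build one 10-column CoNLL-U line (unused columns set to '_')."""
    return "\t".join([str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"])


# Toy treebank: clausal 'e', nominal 'e' (filtered out), 'se' in a copular clause.
TOY = "\n".join([
    _row(1, "Piove", "VERB", 0, "root"), _row(2, "e", "CCONJ", 4, "cc"),
    _row(3, "noi", "PRON", 4, "nsubj"), _row(4, "usciamo", "VERB", 1, "conj"), "",
    _row(1, "Pane", "NOUN", 0, "root"), _row(2, "e", "CCONJ", 3, "cc"),
    _row(3, "burro", "NOUN", 1, "conj"), "",
    _row(1, "Se", "SCONJ", 3, "mark"), _row(2, "è", "AUX", 3, "cop"),
    _row(3, "stanco", "ADJ", 4, "advcl"), _row(4, "dorme", "VERB", 0, "root"), "",
])

print(select_connectives(parse_conllu(TOY), CANDIDATES))
```
        </preformat>
        <p>Requiring a verbal or copular head is what keeps the coordination of two nouns ("Pane e burro") out of the counts, while the clausal uses of e and se are retained.</p>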
        <p>Once the final list was established, the sentences
we considered most suitable for
our tasks were manually extracted from ISDT and,
where needed, modified following some patterns to
guarantee sentence comprehension. For example,
in some cases two sentences occurring consecutively in the
treebank and clearly
extracted from the same text were joined
into a single sentence through the insertion
of the appropriate punctuation. This happened e.g.
when the connective appeared at the beginning of
the second sentence, linking it to the first one,
which serves as the antecedent needed to comprehend the
logical relationship. We tried to include in the
dataset sentences with different degrees of
syntactic and lexical complexity, using the number
of subordinate clauses and the variety of the
lexicon as proxies. All the original sentences,
later arranged into the acceptability assessment and
the cloze test tasks, are drawn from ISDT.</p>
        <p>3 https://ufal.mff.cuni.cz/pmltq.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.2 Definition of the Tasks</title>
        <p>The collected sentences were grouped into two
sections aimed at testing the comprehension
of connectives in different formats, i.e. through an
acceptability assessment task and a cloze test task.
Table 1 provides example sentences and sentence
pairs for each task.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.2.1 Acceptability Assessment Section</title>
        <p>
          To design the acceptability assessment task, we
selected 15 sentences per connective from the whole
dataset. For each sentence, an unacceptable
counterpart was created by replacing the original
connective with another one from the list. The replacement
strategy aimed to obtain unacceptable
sentences with a contradictory or nonsensical meaning
that nevertheless preserve their grammaticality. Such
sentences should indeed be the most challenging ones for
NLMs, which have been shown to be capable of
detecting sentence grammaticality
          <xref ref-type="bibr" rid="ref7">(Jawahar et al.,
2019)</xref>
          , but still struggle to detect unacceptable
meanings and contradictions. However, we
were not always able to satisfy this constraint,
as in some specific contexts none of the available
connectives could be substituted without affecting
the grammaticality of the resulting sentence. This happened in 98
cases, which we decided to keep in the dataset but
marked with the label ‘no’ in the
‘grammaticality’ field, as in:
        </p>
        <p>Nei campi si sopravvive anche intorno tutto
muore.</p>
        <p>Although the assessment of grammaticality is
not the main focus of this work, since
grammaticality was unavoidably violated in the cases
reported above, we felt compelled to provide a separate
analysis for the group of ungrammatical sentences.
A few sentences were also deleted due to ambiguity.
The final section contains 259 sentence pairs, i.e.
518 sentences (259 acceptable and 259 unacceptable).</p>
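        <p>The construction of the acceptability pairs can be outlined as a small record type plus a generator of candidate replacements. The sketch below is illustrative only: apart from the 'grammaticality' field and the 18 connectives, the field names, the toy sentence and the chosen replacement are hypothetical, and picking a contradictory or nonsensical variant (and judging whether it stays grammatical) was a manual step in the actual resource.</p>
        <preformat>
```python
from dataclasses import dataclass

# The 18 connectives retained in the resource.
CONNECTIVES = ["e", "se", "quando", "come", "ma", "dove", "o", "anche", "perché",
               "poi", "mentre", "infatti", "prima", "però", "invece", "inoltre",
               "tuttavia", "quindi"]


@dataclass
class AcceptabilityPair:
    connective: str      # original connective
    acceptable: str      # original ISDT sentence
    unacceptable: str    # same sentence with the connective replaced
    grammaticality: str  # "yes", or "no" when the swap also broke grammaticality


def candidate_substitutions(tokens, connective):
    """Yield the sentence with the first occurrence of `connective` swapped for
    each other connective in the list; selecting the contradictory or
    nonsensical variant is left to a human annotator."""
    idx = tokens.index(connective)  # ValueError if the connective is absent
    for other in CONNECTIVES:
        if other != connective:
            yield " ".join(tokens[:idx] + [other] + tokens[idx + 1:])


def n_sentences(pairs):
    """Each pair contributes one acceptable and one unacceptable sentence."""
    return 2 * len(pairs)


# Hypothetical example sentence, tokenized on whitespace.
tokens = "Piove ma usciamo lo stesso .".split()
variants = list(candidate_substitutions(tokens, "ma"))
chosen = next(v for v in variants if " perché " in v)
pair = AcceptabilityPair("ma", " ".join(tokens), chosen, "yes")
print(pair)
```
        </preformat>
        <p>Counting sentences rather than pairs makes the arithmetic of the final section explicit: 259 pairs correspond to 518 sentences.</p>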
      </sec>
      <sec id="sec-2-5">
        <title>2.2.2 Cloze Test Section</title>
        <p>The second section was designed as a cloze test
task and contains 270 sentences, 15 per
connective. For every sentence, the original connective
was replaced by a blank and 5 alternatives
were proposed for completion: the target, a
plausible alternative and three implausible options. By
‘plausible alternative’ we mean another connective
of the list that could occupy the same linguistic
context as the target, yielding an identical
meaning or a different, yet fully plausible, reading.
As in the acceptability task, for
some connectives (e.g. prima) it turned out to be very
challenging, if not impossible, to propose such a plausible
connective. In those cases, which are only a
minority, we proposed an alternative that at
least guarantees grammaticality.</p>
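        <p>A cloze item as described above could be represented as follows. This is a hypothetical Python sketch: the class, the example sentence and its options are invented for illustration, shuffling the five candidates is an assumption about how they were presented, and grouping the extra "none of the previous options is suitable" answer with the implausible options is likewise an assumption rather than something stated in the text.</p>
        <preformat>
```python
import random
from dataclasses import dataclass

NONE_OPTION = "none of the previous options is suitable"


@dataclass
class ClozeItem:
    text: str          # sentence with "___" in place of the connective
    target: str        # original connective
    plausible: str     # alternative yielding a plausible reading
    implausible: list  # three implausible connectives

    def options(self, seed=0):
        """Return the five candidates in a reproducible shuffled order,
        followed by the extra 'none' option offered to annotators."""
        opts = [self.target, self.plausible, *self.implausible]
        if len(set(opts)) != 5:
            raise ValueError("expected five distinct candidates")
        random.Random(seed).shuffle(opts)
        return opts + [NONE_OPTION]

    def category(self, choice):
        """Map a choice onto the categories used to analyse the answers."""
        if choice == self.target:
            return "Target"
        if choice == self.plausible:
            return "Plausible alt."
        return "Implausible alt."  # assumption: 'none' is grouped here too


# Hypothetical item.
item = ClozeItem("Piove, ___ usciamo lo stesso.", "ma", "però",
                 ["se", "dove", "come"])
print(item.options())
```
        </preformat>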
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Corpus Annotation</title>
      <p>The two sections of the dataset were split into
9 surveys (5 for the acceptability assessment task
and 4 for the cloze test task) and submitted for human
evaluation to Italian native speakers of
different ages, recruited through the Prolific platform.<sup>4</sup></p>
      <p>
        In the acceptability assessment task,
participants were asked to judge the acceptability of each
sentence on a 5-point Likert scale (from 1=‘totally
unacceptable’ to 5=‘totally acceptable’). Although
this makes the dataset more challenging, we
adopted it because we assume that acceptability is a gradual rather than
binary notion, as it is affected by many factors
        <xref ref-type="bibr" rid="ref16 ref17">(Sorace
and Keller, 2005; Sprouse, 2007)</xref>
        . To disambiguate
the interpretation of sentence acceptability and
guide annotators in giving their judgments, the survey
guidelines encouraged them to consider whether they found
the sentence natural in Italian and whether they would
use it in a real conversation or in any other
communicative context.
      </p>
      <p>For the cloze test task, participants were
required to supply the missing element by choosing
among the proposed options plus the option “none
of the previous options is suitable”.</p>
      <p>Each survey was completed by 20 annotators on
average. The number of annotations per sentence
ranges from 16 to 21 in the acceptability task and
from 18 to 21 in the cloze task. To improve data
quality, we discarded annotators who took less than
10 minutes to complete a survey, a threshold set by considering the
average completion time of each survey. This led
us to reject only 5 annotators, all in the acceptability
task.</p>
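      <p>The annotation filtering and aggregation described above can be sketched as follows. The 10-minute threshold and the 1-5 scale come from the text; everything else is an assumption, in particular that the averages and standard deviations reported in Table 2 are computed over per-sentence mean scores, which the text does not specify. Identifiers and data are invented.</p>
      <preformat>
```python
from statistics import mean, stdev

MIN_MINUTES = 10  # completion-time threshold mentioned in the text


def kept_annotators(durations):
    """durations: {annotator_id: minutes}. Keep those at or above the threshold."""
    return {a for a, minutes in durations.items() if minutes >= MIN_MINUTES}


def sentence_means(judgments, kept):
    """judgments: iterable of (annotator_id, sentence_id, score on the 1-5 scale)."""
    scores = {}
    for annotator, sentence_id, score in judgments:
        if annotator in kept:
            scores.setdefault(sentence_id, []).append(score)
    return {sid: mean(vals) for sid, vals in scores.items()}


def group_summary(means, sentence_ids):
    """Average and standard deviation of per-sentence means for one group
    (stdev needs at least two sentences)."""
    values = [means[sid] for sid in sentence_ids]
    return mean(values), stdev(values)


durations = {"a1": 12, "a2": 7, "a3": 15}
judgments = [("a1", "s1", 5), ("a2", "s1", 1), ("a3", "s1", 4),
             ("a1", "s2", 2), ("a3", "s2", 1)]
kept = kept_annotators(durations)        # a2 is discarded
means = sentence_means(judgments, kept)  # s1 -> 4.5, s2 -> 1.5
print(group_summary(means, ["s1", "s2"]))
```
      </preformat>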
      <p>Table 2 reports the average human score and
standard deviation obtained by the acceptable and
unacceptable sentences. For the latter, we
separately computed these scores for the subset of
sentences that were also labeled as ungrammatical
(see Section 2.2.1). As can be seen, humans
perform very well on the task, assigning considerably higher
scores to the acceptable sentences than to
the unacceptable ones, with little variability.
Within the unacceptable subset, the slightly lower
average score received by ungrammatical
sentences provides further evidence that humans are
sensitive to this distinction.</p>
      <p>4 https://prolific.co.</p>
      <p>Table 1: Example items for each task. Acceptability assessment: e 11A, e 11NA, ma 64A, ma 64NA. Cloze test: “Che cosa possiamo fare in estate ... vogliamo partire per le vacanze e abbiamo un cane o un gatto? [se quando perché dove come]”; “Nelle botteghe artigianali della produzione di piastrelle la smaltatura è ancora tradizionale, ... i forni, come è naturale, oggi funzionano a gas. [mentre invece come dove perché]”.</p>
      <p>The human evaluation confirms the validity of the resource
for the cloze test task as well. Indeed, as
shown in Table 3, in most cases the target connective was
chosen by the majority of annotators as the most
adequate option, although for ∼20% of the sentences
humans preferred the plausible candidate or the two
options received half of the annotations each. The percentage
of sentences for which the majority label was given
to an implausible option is negligible.</p>
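      <p>The majority-label analysis summarized in Table 3 could be computed along these lines. This Python sketch is hypothetical: the text does not state how ties other than target versus plausible alternative were resolved, nor how "none" answers were counted, so here the most-chosen other option competes with the target and the plausible alternative, and ties with it are resolved in their favour. The example annotations are invented.</p>
      <preformat>
```python
from collections import Counter


def majority_category(choices, target, plausible):
    """Majority label for one cloze sentence.
    choices: the options selected by the annotators of that sentence."""
    counts = Counter(choices)
    n_target, n_plausible = counts[target], counts[plausible]
    # The most-chosen remaining option (implausible ones or 'none') competes
    # with target and plausible; ties with it favour them (assumption).
    n_other = max((c for o, c in counts.items() if o not in (target, plausible)),
                  default=0)
    best = max(n_target, n_plausible, n_other)
    if n_target == best and n_plausible == best:
        return "Target=Plausible alt."
    if n_target == best:
        return "Target"
    if n_plausible == best:
        return "Plausible alt."
    return "Implausible alt."


def distribution(items):
    """items: iterable of (choices, target, plausible); returns % per category."""
    cats = Counter(majority_category(*item) for item in items)
    total = sum(cats.values())
    return {cat: 100 * n / total for cat, n in cats.items()}


items = [
    (["ma"] * 12 + ["però"] * 6 + ["se"] * 2, "ma", "però"),
    (["ma"] * 9 + ["però"] * 9, "ma", "però"),
    (["se"] * 10 + ["ma"] * 8, "ma", "però"),
]
print(distribution(items))
```
      </preformat>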
    </sec>
    <sec id="sec-4">
      <title>4 Testing the Sensitivity of Neural Language Models to Connectives</title>
      <p>Table 3: Majority choice of the annotators in the cloze test task (rows: Target, Plausible alt., Implausible alt., Target=Plausible alt.; column: N. Items).</p>
      <p>We conclude by presenting some preliminary
findings on the performance of NLMs in
the two tasks. Specifically, we performed two
distinct evaluations. For the acceptability assessment
task, we computed the perplexity (PPL) score
assigned by the GePpeTto model
        <xref ref-type="bibr" rid="ref3 ref4">(De Mattei et al.,
2020)</xref>
        to all sentences of the corresponding section.
We relied on perplexity as it is a standard
evaluation measure of the quality of a language model,
yielding a good approximation of how plausible a model
considers an unseen piece of text. Accordingly, we
assumed that unacceptable sentences should receive higher PPL
scores than their original versions.
GePpeTto was chosen as it is a traditional
unidirectional model built using the GPT-2 architecture
        <xref ref-type="bibr" rid="ref12">(Radford et al., 2019)</xref>
        and, unlike a
bidirectional model such as BERT
        <xref ref-type="bibr" rid="ref5">(Devlin et al., 2019)</xref>
        ,
allows computing a well-formed probability
distribution over sentences. The sentence-level PPL was
calculated using the formula reported in Miaschi et
al. (2020).
      </p>
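      <p>For reference, the perplexity of a sentence under a left-to-right model is the exponential of the average negative log-probability of its tokens. The minimal sketch below implements this standard definition on precomputed per-token log-probabilities; it does not claim to reproduce the exact formula of Miaschi et al. (2020), and obtaining the log-probabilities from GePpeTto (e.g. with a causal language model loaded through Hugging Face Transformers) is deliberately left out. The input values are invented.</p>
      <preformat>
```python
import math


def sentence_perplexity(token_logprobs):
    """Perplexity from natural-log token probabilities log p(w_i | w_1..w_{i-1}):
    PPL = exp(-(1/N) * sum_i log p(w_i | w_1..w_{i-1}))."""
    if not token_logprobs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# Invented numbers: a model assigning probability 0.5 to every token of a
# 4-token sentence yields perplexity 2; a single unlikely token raises it.
likely = [math.log(0.5)] * 4
unlikely = [math.log(0.5)] * 3 + [math.log(0.01)]
print(sentence_perplexity(likely), sentence_perplexity(unlikely))
```
      </preformat>
      <p>Because the score is averaged per token, one implausible connective is enough to push a sentence's perplexity up, which is the effect the acceptability comparison relies on.</p>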
      <p>By inspecting the results in Table 4, we observed
that the average PPL score assigned to the
acceptable sentences is considerably lower than the one assigned
to the unacceptable ones (i.e. 42.512 vs 78.280).</p>
      <p>As expected, within the unacceptable
subset, perplexity was on average higher for the
sentences marked as ungrammatical (98.992), reflecting
the model’s ability to encode syntactic
phenomena. Interestingly, among the unacceptable
sentences, those obtaining lower PPL scores were
perfectly well-formed but had an implausible
meaning, as in the following case:</p>
      <p>Il film ‘Le chiavi di casa’ ha partecipato al
Festival del Cinema di Venezia di quest’anno,
perché non ha vinto nessun premio (PPL =
13.892).</p>
      <p>To compare human and model performance,
we also computed Spearman’s rank correlation
(ρ) between the average acceptability score given
by annotators and the PPL score assigned by the
model to the same sentences. Although limited to
this analysis, the resulting very weak correlation
(i.e. ρ = −0.120, p-value &lt; 0.01) suggests
that connectives affect the ability of
humans and models to assess the plausibility of a
sentence differently.</p>
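      <p>Spearman's rho is the Pearson correlation computed on ranks, with tied values sharing their average rank. The dependency-free sketch below illustrates this definition on invented scores; in practice a library routine such as scipy.stats.spearmanr, which also returns a p-value, would normally be used instead.</p>
      <preformat>
```python
from itertools import groupby


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    position = 1
    for _, group in groupby(order, key=lambda i: values[i]):
        members = list(group)
        shared = position + (len(members) - 1) / 2
        for i in members:
            ranks[i] = shared
        position += len(members)
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (ZeroDivisionError) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: mean human acceptability (1-5) and model PPL per sentence.
human = [4.6, 1.8, 4.1, 2.2, 3.9]
ppl = [35.0, 120.4, 41.7, 88.9, 30.2]
print(spearman_rho(human, ppl))
```
      </preformat>
      <p>A negative rho is the expected direction here, since more acceptable sentences should receive lower perplexity; the magnitude is what indicates how closely model and humans agree.</p>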
      <p>
        As for the cloze test task, we relied on the
pretrained Italian version of the BERT model
developed by the MDZ Digital Library Team and
available through Hugging Face’s Transformers library
        <xref ref-type="bibr" rid="ref19">(Wolf et al., 2020)</xref>
        .<sup>5</sup> We extracted the first ten
completions provided by the model through the Masked
Language Modeling (MLM) task for each sentence,
along with their probabilities. This allowed us to
inspect whether, and in how many cases, the
target connective or the plausible alternative appears
among the top-ranked predictions.
      </p>
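      <p>The top-10 analysis can be expressed as a small classification function over the model's ranked completions. The sketch assumes predictions in the list-of-dictionaries shape returned by the Transformers fill-mask pipeline (each entry has, among others, "token_str" and "score" keys), e.g. as obtained with pipeline("fill-mask", model="dbmdz/bert-base-italian-xxl-cased", top_k=10); that model call is not executed here, and the function names and predictions are hypothetical. The category labels follow Table 5.</p>
      <preformat>
```python
def _norm(token):
    """Normalize a predicted token for comparison with the connective list."""
    return token.strip().lower()


def top_k_category(predictions, target, plausible, k=10):
    """Classify one sentence by whether the target and/or the plausible
    alternative occur among the k highest-scoring predictions."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)[:k]
    top = {_norm(p["token_str"]) for p in ranked}
    has_target, has_plausible = target in top, plausible in top
    if has_target and has_plausible:
        return "Target+Pl. alt."
    if has_target:
        return "Target"
    if has_plausible:
        return "Pl. alt."
    return "Other"


def first_prediction(predictions):
    """The single completion with the highest probability."""
    return _norm(max(predictions, key=lambda p: p["score"])["token_str"])


# Invented predictions in the shape produced by a fill-mask pipeline.
preds = [{"token_str": "ma", "score": 0.41}, {"token_str": "e", "score": 0.22},
         {"token_str": "però", "score": 0.09}, {"token_str": "quindi", "score": 0.03}]
print(top_k_category(preds, "ma", "però"), first_prediction(preds))
```
      </preformat>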
      <p>As shown in Table 5, in the large majority of
cases BERT is able to infer, within its first 10
predictions, that the sentence should be completed with
a correct connective. This happens for the target in 86.29% of
the sentences, resulting from the sum
of the cases where only the target occurs among the
completions (31.48%) and the cases in which both
the target and the plausible alternative were
predicted (54.81%), and for the plausible
alternative in 59.25% of the sentences (i.e. 4.44% plus 54.81%). Focusing
instead on the first completion for each sentence,
we observe that in almost half of the sentences
BERT assigns the highest probability to the
original connective (41.11%) or to the plausible one
(8.52%).</p>
      <p>5 https://huggingface.co/dbmdz/bert-base-italian-xxl-cased</p>
      <p>Table 5: Number of sentences for which the target, the plausible alternative, both, or neither (Other) appear among BERT’s predictions (columns: 10 match, 1st match): Target (85), Pl. alt. (12), Target+Pl. alt. (148), Other (25).</p>
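      <p>The percentages reported above follow directly from the counts in Table 5. The short computation below reproduces them; summing the unrounded shares gives 86.30% for the target and 59.26% for the plausible alternative, whereas 86.29% and 59.25% in the text are sums of the already rounded percentages.</p>
      <preformat>
```python
# Counts from Table 5 (top-10 predictions, 270 cloze sentences).
counts = {"Target": 85, "Pl. alt.": 12, "Target+Pl. alt.": 148, "Other": 25}
total = sum(counts.values())
share = {name: 100 * n / total for name, n in counts.items()}

# Sentences whose top-10 list contains the target (alone or with the plausible
# alternative), and likewise for the plausible alternative.
target_any = share["Target"] + share["Target+Pl. alt."]
plausible_any = share["Pl. alt."] + share["Target+Pl. alt."]

for name, value in share.items():
    print(f"{name}: {value:.2f}%")
print(f"target in top 10: {target_any:.2f}%")
print(f"plausible alternative in top 10: {plausible_any:.2f}%")
```
      </preformat>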
      <p>We are currently performing a more qualitative
analysis to better investigate the cases in which the
correct connective did not receive a high
probability score, as well as those in which neither of the
two options appeared at all (i.e. the Other cases in
Table 5), in order to understand whether the other
completions can still be considered plausible.
Preliminary findings show that, among
the Other cases, about 56 of the completions
provided by BERT are unacceptable and 34 of them
are of dubious acceptability, i.e. not clearly
recognizable as acceptable<sup>6</sup>, as in the case of the following
sentence<sup>7</sup>:</p>
      <p>Secondo gli esperti, in Italia i giovani leggono
meno i giornali rispetto ai giovani di altri
Paesi europei, ... rispetto agli anni passati i
giovani tra i 14 e i 19 anni leggono più spesso
i giornali. [perché anche però].</p>
      <p>Nevertheless, the majority of the Other
completions can be considered acceptable. In
fact, BERT predicted a word leading to the same
(or at least a very similar) meaning as the original
sentence in more than 60 cases. Moreover, in most
cases (i.e. 92) the completions provided are
plausible, although in some of them the sentences
acquire different meanings.</p>
      <p>6 Note that, to assign the acceptability label of each
completion, we refer to a usage of the Italian language that is as
standard as possible.</p>
      <p>7 The unacceptable completion is marked in bold, the
one of dubious acceptability is reported in block and the original
connective is indicated in italics.</p>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusion</title>
      <p>In the context of studies devoted to assessing the
linguistic knowledge implicitly encoded by Neural
Language Models, we introduced a new evaluation
dataset for Italian designed to test the
understanding of textual connectives in real-usage sentences.
First, we verified the significance of a set of
selected connectives through a frequency analysis on
existing Italian gold corpora. Then, we
manually selected only those sentences in which
a genuine connective occurs. Finally, we grouped
the sentences into two different tasks, differing in
the format used to elicit sentence comprehension
in humans and current state-of-the-art NLMs:
acceptability assessment and cloze test. Human
evaluation was provided for both sections to
verify the robustness of the dataset, which was indeed
confirmed by the judgments collected.</p>
      <p>Preliminary findings on NLMs’ behaviour with
textual connectives showed that in several cases the
models are capable of distinguishing between
acceptable and unacceptable sentences, thus
suggesting that they encode sentence meaning to some extent
within their internal mechanisms. However, it remains
unclear to what extent these models rely on semantic
acceptability features, since we observed cases in
which they fail to recognize the implausible meaning
of perfectly grammatical sentences.</p>
      <p>We are currently extending the dataset with a
new section designed in the form
of the traditional Natural Language Inference task,
for which understanding a given connective
will be fundamental to infer the correct entailment
relation between a premise and a hypothesis. We
also believe that expanding the dataset to further
connectives and including sentences representative
of non-standard Italian language usage, i.e.
social media language, would improve the
robustness of the resource.
</p>
      <p>Appendix A: distribution of the selected connectives (e, se, quando, come, ma, dove, o, anche, perché, poi, mentre, infatti, prima, però, invece, inoltre, tuttavia, quindi) in ISDT.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Cristina</given-names>
            <surname>Bosco</surname>
          </string-name>
          , Simonetta Montemagni, and
          <string-name>
            <given-names>Maria</given-names>
            <surname>Simi</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Converting Italian treebanks: Towards an Italian Stanford dependency treebank</article-title>
          .
          <source>In Proceedings of the ACL Linguistic Annotation Workshop &amp; Interoperability with Discourse.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Alessandra Teresa</given-names>
            <surname>Cignarella</surname>
          </string-name>
          , Cristina Bosco, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Presenting TWITTIRÒ-UD: An Italian Twitter treebank in Universal Dependencies</article-title>
          .
          <source>In Proceedings of the Fifth International Conference on Dependency Linguistics (Depling, SyntaxFest 2019)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
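          <string-name>
            <given-names>Lorenzo</given-names>
            <surname>De Mattei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Michele</given-names>
            <surname>Cafagna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Felice</given-names>
            <surname>Dell'Orletta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Malvina</given-names>
            <surname>Nissim</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Guerini</surname>
          </string-name>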
          .
          <year>2020</year>
          .
          <article-title>GePpeTto carves Italian into a language model</article-title>
          . In CLiC-it.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Tullio</given-names>
            <surname>De Mauro</surname>
          </string-name>
          and
          <string-name>
            <given-names>I</given-names>
            <surname>Chiari</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Il nuovo vocabolario di base della lingua italiana</article-title>
          .
          <source>Internazionale</source>
          . [28/11/2020]. https://www.internazionale.it/opinione/tullio-de-mauro/2016/12/23/il-nuovo-vocabolario-di-base-della-lingua-italiana.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          (Long and Short Papers), pages
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
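          <string-name>
            <given-names>Arthur C.</given-names>
            <surname>Graesser</surname>
          </string-name>
          and
          <string-name>
            <given-names>Danielle S.</given-names>
            <surname>McNamara</surname>
          </string-name>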
          .
          <year>2011</year>
          .
          <article-title>Computational analyses of multilevel discourse comprehension</article-title>
          .
          <source>Topics in Cognitive Science</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>371</fpage>
          -
          <lpage>398</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Ganesh</given-names>
            <surname>Jawahar</surname>
          </string-name>
          , Benoît Sagot, and Djamé Seddah.
          <year>2019</year>
          .
          <article-title>What does BERT learn about the structure of language?</article-title>
          In
          <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , pages
          <fpage>3651</fpage>
          -
          <lpage>3657</lpage>
          , Florence, Italy, July. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Najoung</given-names>
            <surname>Kim</surname>
          </string-name>
          , Roma Patel, Adam Poliak, Alex Wang, Patrick Xia,
          <string-name>
            <given-names>R. Thomas</given-names>
            <surname>McCoy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ian</given-names>
            <surname>Tenney</surname>
          </string-name>
          , Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, and
          <string-name>
            <given-names>Ellie</given-names>
            <surname>Pavlick</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Probing what different NLP tasks teach machines about function word comprehension</article-title>
          . In
          <source>Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Tal</given-names>
            <surname>Linzen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>Syntactic structure from deep learning</article-title>
          .
          <source>Annual Review of Linguistics</source>
          ,
          <volume>7</volume>
          :
          <fpage>195</fpage>
          -
          <lpage>212</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Computational Linguistics and Deep Learning</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>41</volume>
          (
          <issue>4</issue>
          ):
          <fpage>701</fpage>
          -
          <lpage>707</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Alessio</given-names>
            <surname>Miaschi</surname>
          </string-name>
          , Chiara Alzetta, Dominique Brunato, Felice Dell'Orletta, and
          <string-name>
            <given-names>Giulia</given-names>
            <surname>Venturi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Is neural language model perplexity related to readability?</article-title>
          In
          <source>CLiC-it</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Alec</given-names>
            <surname>Radford</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jeff</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Child</surname>
          </string-name>
          , David Luan,
          <string-name>
            <given-names>Dario</given-names>
            <surname>Amodei</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ilya</given-names>
            <surname>Sutskever</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Language models are unsupervised multitask learners</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Swarnadeep</given-names>
            <surname>Saha</surname>
          </string-name>
          , Yixin Nie, and
          <string-name>
            <given-names>Mohit</given-names>
            <surname>Bansal</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>ConjNLI: Natural language inference over conjunctive sentences</article-title>
          . In
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          , pages
          <fpage>8240</fpage>
          -
          <lpage>8252</lpage>
          , Online, November. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
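          <string-name>
            <given-names>Ted J. M.</given-names>
            <surname>Sanders</surname>
          </string-name>
          and
          <string-name>
            <given-names>Leo G. M.</given-names>
            <surname>Noordman</surname>
          </string-name>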
          .
          <year>2000</year>
          .
          <article-title>The role of coherence relations and their linguistic markers in text processing</article-title>
          .
          <source>Discourse Processes</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Manuela</given-names>
            <surname>Sanguinetti</surname>
          </string-name>
          , Cristina Bosco, Alberto Lavelli, Alessandro Mazzei, and
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Tamburini</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>PoSTWITA-UD: an Italian Twitter Treebank in Universal Dependencies</article-title>
          . In
          <source>Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Antonella</given-names>
            <surname>Sorace</surname>
          </string-name>
          and Frank Keller.
          <year>2005</year>
          .
          <article-title>Gradience in linguistic data</article-title>
          .
          <source>Lingua</source>
          ,
          <volume>115</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1497</fpage>
          -
          <lpage>1524</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Jon</given-names>
            <surname>Sprouse</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Continuous acceptability, categorical grammaticality, and experimental syntax</article-title>
          .
          <source>Biolinguistics</source>
          ,
          <volume>1</volume>
          :
          <fpage>123</fpage>
          -
          <lpage>134</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Alex</given-names>
            <surname>Warstadt</surname>
          </string-name>
          , Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng,
          <string-name>
            <given-names>Sheng-Fu</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Samuel R.</given-names>
            <surname>Bowman</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>BLiMP: The benchmark of linguistic minimal pairs for English</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          ,
          <volume>8</volume>
          :
          <fpage>377</fpage>
          -
          <lpage>392</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Thomas</given-names>
            <surname>Wolf</surname>
          </string-name>
          , Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Rush</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Transformers: State-of-the-art natural language processing</article-title>
          . In
          <source>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>
          , pages
          <fpage>38</fpage>
          -
          <lpage>45</lpage>
          , Online, October. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>