<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Self-Contained Italian Negation Test (SCIN)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viola Gullace</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Kletz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thierry Poibeau</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Lenci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Amsili</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CoLing Lab, Dipartimento di Filologia, Letteratura e Linguistica, Università di Pisa</institution>
          ,
          <addr-line>Via Santa Maria, Pisa, 56126</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>LLF, CNRS &amp; Université Paris Cité</institution>
          ,
          <addr-line>8 Rue Albert Einstein 75013 Paris</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Lattice, CNRS &amp; ENS-PSL &amp; U. Sorbonne-Nouvelle</institution>
          ,
          <addr-line>1 rue Maurice Arnoux F-92120 Montrouge</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Scuola Normale Superiore</institution>
          ,
          <addr-line>Piazza dei Cavalieri 7, Pisa, 56126</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Recent research has focused extensively on state-of-the-art pretrained language models, particularly those based on Transformer architectures, and on how well they account for negation and other linguistic phenomena in various tasks. This study evaluates the understanding of negation in Italian BERT- and RoBERTa-based models, in contrast with the predominantly English-focused prior research. We develop the SCIN Set, an Italian dataset designed to model the influence of polarity constraints on models in a masked-prediction task. Applying the SCIN Set reveals that these models do not adjust their behaviour based on sentence polarity, even when the resulting sentence is contradictory. We conclude that the tested models lack a clear understanding of how negation alters sentence meaning.</p>
      </abstract>
      <kwd-group>
        <kwd>negation</kwd>
        <kwd>Italian PLMs</kwd>
        <kwd>testing</kwd>
        <kwd>self-contained</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        In this study we test the following Italian PLMs:
        • bert-base for Italian, both in its basic and its XXL versions (bert-base-italian-cased, bert-base-italian-xxl-cased)¹ [<xref ref-type="bibr" rid="ref3">3</xref>],
        • m-bert (multilingual bert)² [<xref ref-type="bibr" rid="ref4">4</xref>],
        • alb3rt0³ [<xref ref-type="bibr" rid="ref5">5</xref>], and
        • UmBERTo⁴ [<xref ref-type="bibr" rid="ref6">6</xref>].
      </p>
      <p>Section 5 will discuss the results, followed by a final section containing our general conclusions and ideas for further research.</p>
      <sec id="sec-1-1">
        <title>2.2. The Self-Contained Neg Test</title>
        <p>The Self-Contained Neg Test, developed by Kletz et al. [<xref ref-type="bibr" rid="ref2">2</xref>], is a set of pairs of sentences consisting of a context (C) and a target (T) sentence, either positive (p) or negative (n). The target sentence contains a masked position, syntactically constrained to be filled by a verb, as in (2).</p>
        <p>(2) Jessica is an architect who likes to dance. She isn’t happy to [MASK].</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <sec id="sec-2-1">
        <title>2.1. Effect of negation on the model’s prediction</title>
        <p>Although negation plays an essential role in human communication, it appears to present challenges for PLMs. In recent years, much research has focused on this topic. Kassner and Schütze [<xref ref-type="bibr" rid="ref7">7</xref>] and Ettinger [<xref ref-type="bibr" rid="ref8">8</xref>] analyzed to what extent Transformer-based language models’ predictions are sensitive to the presence or absence of negation in sentences involving factual knowledge, such as (1):</p>
        <p>(1) a. Birds can [MASK]. b. Birds cannot [MASK].</p>
        <p>The instances are designed in such a way that a model that predicts (in the masked position of T) the last verb of C will produce a semantically well-formed paragraph only if C and T have the same polarity. For instance, in (2), the context is positive (Cp), the target is negative (Tn), and as a consequence a model predicting dance in the masked position produces an ill-formed paragraph:</p>
        <p>(3) #Jessica is an architect who likes to dance. She isn’t happy to dance.</p>
        <p>In contrast, a CnTn version of (3) would accept the verb dance in the same position:</p>
        <p>(4) Jessica is an architect who doesn’t like to dance. She isn’t happy to dance.</p>
        <sec id="sec-2-1-2">
          <p>They found that in such pairs the top-1 predictions are unchanged most of the time: models do not seem to take into account the polarity of the environment (presence or absence of a negation in the surrounding sentence) to adapt their predictions. They concluded that models do not deal correctly with negation.</p>
          <p>Gubelmann and Handschuh [<xref ref-type="bibr" rid="ref9">9</xref>] criticized such studies, noting in particular that the pragmatic component was overlooked in Ettinger’s experiments. They noted that a statement containing a negation may state a false fact (for example, by falsely denying a true fact, as in Birds cannot fly). In fact, a vast number of words can follow Birds could, many of them with little association with the rest of the sentence. This makes it challenging for any single word to become the top prediction in the negative case.</p>
          <p>Gubelmann and Handschuh [<xref ref-type="bibr" rid="ref9">9</xref>] developed a more pragmatically informed test set, in which each instance is (in [<xref ref-type="bibr" rid="ref2">2</xref>]’s terms) self-contained. This means that each item in the set includes some context information, allowing direct evaluation of the model’s completion. Building on this work, [<xref ref-type="bibr" rid="ref2">2</xref>] developed the Self-Contained Neg Test, which aimed to address some issues in the test set from [<xref ref-type="bibr" rid="ref9">9</xref>] and more accurately determine the model’s handling of negation without interference of world knowledge.</p>
          <p>To produce the sentences of the set, the pattern (5) is taken as a starting point, where NAME and PRON are substituted with a proper noun and a compatible third person pronoun, PROF is substituted with a profession name, and ACT is substituted with an action verb.</p>
          <p>(5) NAME is a PROF who likes/doesn’t like to ACT. PRON is/isn’t happy to [MASK].</p>
          <p>A large number of triplets (NAME, PROF, ACT) are tested with each model, and a triplet is retained only when the model’s top prediction for the CpTp instance repeats the ACT verb, as in (6). Hence, the triplet (Jessica, architect, dance) would be retained while the triplet (Luke, janitor, swim) would not.</p>
          <p>(6) a. Jessica is an architect who likes to dance. She is happy to dance. b. Luke is a janitor who likes to swim. He is happy to ski.</p>
          <p>Once triplets have been selected (the set of all triplets such that the ACT verb is repeated in CpTp instances), CpTn and CnTp instances can be formed, and the expectation is that a model that “understands” negation should not predict the ACT verb in those cases, since it would lead to contradictory instances. As a control, two additional configurations are considered: CnTn, where it is expected that the repetition of ACT is possible (though not required), and CpTv, in which an adverb (very) is inserted in the positive target, which should not change the preferred prediction of ACT since both sentences are positive. The different configurations are illustrated below.</p>
          <p>(7)
CpTp: Jessica is an architect who likes to dance. She is happy to [MASK].
CpTn: Jessica is an architect who likes to dance. She isn’t happy to [MASK].
CnTp: Jessica is an architect who doesn’t like to dance. She is happy to [MASK].
CnTn: Jessica is an architect who doesn’t like to dance. She isn’t happy to [MASK].
CpTv: Jessica is an architect who likes to dance. She is very happy to [MASK].</p>
          <p>1: https://huggingface.co/dbmdz/bert-base-italian-xxl-cased
2: https://huggingface.co/bert-base-multilingual-cased
3: https://github.com/marcopoli/AlBERTo-it
4: https://github.com/musixmatchresearch/umberto</p>
        </sec>
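        <p>The five configurations in (7) can be generated mechanically from a (NAME, PROF, ACT) triplet. The following Python sketch is our own illustration of pattern (5), not the authors’ released code; the article helper (a naive a/an choice) and the function names are ours.</p>
        <preformat>
```python
# Our own reconstruction of pattern (5): build the five C/T polarity
# configurations in (7) for one (NAME, PROF, ACT) triplet.

def article(noun):
    # naive a/an choice, added only for this illustration
    return "an" if noun[0].lower() in "aeiou" else "a"

def build_configs(name, prof, act, pron):
    """Return a dict mapping configuration labels to test sentences."""
    c_pos = f"{name} is {article(prof)} {prof} who likes to {act}."
    c_neg = f"{name} is {article(prof)} {prof} who doesn't like to {act}."
    t_pos = f"{pron} is happy to [MASK]."
    t_neg = f"{pron} isn't happy to [MASK]."
    t_adv = f"{pron} is very happy to [MASK]."  # CpTv control: adverb, no negation
    return {
        "CpTp": f"{c_pos} {t_pos}",
        "CpTn": f"{c_pos} {t_neg}",
        "CnTp": f"{c_neg} {t_pos}",
        "CnTn": f"{c_neg} {t_neg}",
        "CpTv": f"{c_pos} {t_adv}",
    }
```
        </preformat>
        <p>For instance, build_configs("Jessica", "architect", "dance", "She") reproduces the five example sentences of (7).</p>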
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. SCIN construction</title>
      <sec id="sec-3-1">
        <p>In Italian, negation is most commonly expressed by the invariable negative proclitic non (not) [<xref ref-type="bibr" rid="ref10">10</xref>].</p>
        <p>It is this expression of negation that we use for the Italian adaptation of the Self-Contained Neg Test that we present in this section: the SCIN set.</p>
      </sec>
      <sec id="sec-3-2">
        <p>We choose instead to rely on the pair (9), involving a semantic inference relation.</p>
        <p>(9) ha l’abitudine di / molto spesso (is used to / very often)</p>
      </sec>
      <sec id="sec-3-3">
        <p>The final form of the SCIN set is available in Table 1. The shape of the contexts is given in row 1, that of the targets in row 2, and the test target Tv is added in row 3.</p>
        <p>Our assumption is that, if the model repeats the ACT token in the CpTp configuration, it is proof that the model has resolved the ha l’abitudine di / molto spesso inference. When confronted with the CpTn or CnTp configuration, the model should have the addition of the negation as the only element that can explain the modification of its predictions. Finally, the CpTv control allows us to check the extent to which the addition of a different, non-negative adverb in the sequence modifies the model’s predictions; we can assume that any modification of greater magnitude than that associated with CpTv is due to the influence of negation.</p>
        <p>The complete list of new patterns is available in Table 1.</p>
        <sec id="sec-3-3-1">
          <title>3.1. Italian patterns</title>
          <p>Following the preparation of the Self-Contained Neg Test, we collect a list of Italian verbs, professions and names that will be used to create the triplets to be tested. The verbs are taken from the Dizionario Italiano Sabatini Coletti 2022 (online version); only the intransitive verbs (3138) are retained; among these, for each of the tested models we further exclude the verbs that are not tokenized as a single token. The selected names are the 100 most popular in Italy in 2024⁵. Lastly, the professions are taken from a site specializing in job searches in Italy⁶; of those present on the site, only those consisting of a single word have been selected.</p>
          <p>The patterns cannot simply be a direct translation of the English patterns into Italian. In fact, for the test to be adequate for evaluating models, we need the masked position to be syntactically constrained to be a verb. This would not be the case if we used a direct translation of the original sentences: for example, the sequence (8) can be completed with the token “questo” (= PRON is happy to do this).</p>
        </sec>
        <sec id="sec-3-3-2">
          <title>3.2. Pattern selection</title>
          <p>The triplets (name, profession, verb) used for testing are selected by testing them on the CpTp configuration: only triplets leading to a repetition of the ACT token are retained (see Table 2). This ensures that only patterns for which the model is already biased towards repetition are tested, and the model has to understand the influence of negation on sentence semantics to reverse this tendency.</p>
          <p>All available triplets are tested, i.e. all combinations of the verbs monotokenized by the model and the first names and occupations selected in subsection 3.1. As tokenization is model-dependent, the number of verbs tested is not the same for each model: details are available in the first row of Table 3.</p>
          <p>The results of this test are available in Table 3. They are highly model-dependent: while the bert-base-italian-cased model predicts the ACT token in almost 25% of cases, this is the case in only 0.03% of cases for alb3rt0.</p>
        </sec>
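        <p>The selection step illustrated in Table 2 can be sketched as follows. The stub predictor and the helper names are ours (a real run would query a PLM’s fill-mask head instead), and matching is a strict string comparison here, whereas Table 2 also counts the conjugated form fuma as a repetition of fumare.</p>
        <preformat>
```python
# Illustrative sketch of the triplet-selection step (cf. Table 2): a triplet
# is retained only if the model's top-1 prediction for its CpTp instance
# repeats the ACT verb. The predictor below is a stand-in stub.

def cptp_instance(name, prof, act):
    # CpTp pattern from Table 1, simplified to the feminine article "una"
    # used in the worked example (Jessica / ballerina).
    return f"{name} è una {prof} che ha l'abitudine di {act}. Lei [MASK] molto spesso."

def stub_top1(text):
    # Stand-in for a fill-mask model: echoes "fumare" when the context
    # mentions it, otherwise predicts an unrelated verb.
    return "fumare" if "fumare" in text else "ballare"

def retained(triplets, top1):
    """Keep the triplets whose top-1 CpTp prediction repeats the ACT verb."""
    kept = []
    for name, prof, act in triplets:
        if top1(cptp_instance(name, prof, act)) == act:
            kept.append((name, prof, act))
    return kept
```
        </preformat>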
      </sec>
      <sec id="sec-3-4">
        <p>(8) NAME è un PROF che ama ACT. È felice di MASK. (NAME is a PROF who loves to ACT. (PRON) is happy to MASK.)</p>
      </sec>
      <sec id="sec-3-5">
        <title>Notes</title>
        <p>5: https://www.nostrofiglio.it/gravidanza/nomi-per-bambini/i-100-nomi-per-bambini-piu-amati-dai-genitori-di-nostrofiglio-it</p>
        <p>6: https://www.wecanjob.it/pagina9_elenco-professioni.html</p>
      </sec>
      <sec id="sec-3-6">
        <title>4. Testing</title>
        <p>4.1. Setup</p>
        <p>Tests are performed as in Kletz et al. [<xref ref-type="bibr" rid="ref11">11</xref>]. Contexts (C) and targets (T) are combined to create two test patterns, CpTn and CnTp; in addition to these two, the test includes two control patterns, CnTn and CpTv, where the repetition of the ACT verb is not contradictory.</p>
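        <p>A saturated pattern can be queried as in the following sketch, which assumes the Hugging Face fill-mask pipeline; the checkpoint name is the one from footnote 1, and the helper act_repeated is our own convenience function, not part of the paper’s released code.</p>
        <preformat>
```python
# Sketch of querying one saturated SCIN pattern with a fill-mask pipeline.
# Only the top-1 token at the masked position is needed for the test.

def act_repeated(top_tokens, act):
    """True if the first (top-1) predicted token repeats the ACT verb."""
    return bool(top_tokens) and top_tokens[0] == act

if __name__ == "__main__":
    # The model download happens only when run as a script.
    from transformers import pipeline  # pip install transformers torch
    fill = pipeline("fill-mask", model="dbmdz/bert-base-italian-cased")
    preds = fill("Jessica è una ballerina che ha l'abitudine di fumare. "
                 "Lei [MASK] molto spesso.")
    top_tokens = [p["token_str"].strip() for p in preds]
    print(top_tokens, act_repeated(top_tokens, "fumare"))
```
        </preformat>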
        <p>All selected triplets are then used to saturate the patterns, and the resulting patterns are provided as inputs to the models. Predictions at masked positions are collected.</p>
        <p>Table 1. The patterns of the SCIN set.
Row 1, C(ontext), polarity p: NAME è un(a) PROF che ha l’abitudine di ACT. (NAME is a PROF who is used to ACT-ing.)
Row 1, C(ontext), polarity n: NAME è un(a) PROF che non ha l’abitudine di ACT. (NAME is a PROF who is not used to ACT-ing.)
Row 2, T(arget), polarity p: PRON [MASK] molto spesso. (PRON [MASK] often.)
Row 2, T(arget), polarity n: PRON non [MASK] molto spesso. (PRON doesn’t [MASK] often.)
Row 3, T(arget), polarity v: PRON [MASK] davvero molto spesso. (PRON [MASK] really often.)</p>
        <p>Table 2. An example of selecting a triplet for testing. A NAME/PROF/VERB triplet is used to saturate the CpTp pattern of SCIN. The sequence contains a mask and is used as input to a PLM. If the model prediction is the ACT token, the triplet is retained (indicated by the ✓ symbol). In the names of the models given as examples, “b-b” means bert-base, “it” stands for italian and “c” for cased.
Instantiated NAME/PROF: Jessica / Ballerina (Dancer). Tested verb: Fumare (To smoke). Tested example: Jessica è una ballerina che ha l’abitudine di fumare. Lei [MASK] spesso.
Model / Top-1 pred. / Retained?
b-b-italian-c / fuma / ✓
b-b-italian-xxl-c / fuma / ✓
m-bert / balla / no
alb3rt0 / parla / no</p>
        <p>We use drop as a measure of the models’ performance: for each pattern, given the rate r of repetitions of the ACT token in the predictions, the drop is defined as 100 - r. The higher the drop for the CpTn and CnTp patterns, and the lower for the CnTn and CpTv controls, the better the model has understood the negation.</p>
        <sec id="sec-3-6-1">
          <title>4.2. Results and Discussion</title>
          <p>Results are shown in Table 4. In contrast with the observations made by [<xref ref-type="bibr" rid="ref8">8</xref>] and [<xref ref-type="bibr" rid="ref7">7</xref>], the models are not insensitive to the presence of negation in a sentence: all the models show a drop in both configurations CpTn and CnTp, showing an adaptation of their predictions to the presence of a negation cue.</p>
          <p>This observation is confirmed by the fact that the drops in the CpTv control are always lower than those observed in CpTn or CnTp. This shows that simply adding an adverb is not sufficient to change the model’s predictions. While we cannot definitively attribute this to its logical function, the negation marker does exert a distinct influence.</p>
          <p>Nevertheless, it is important to emphasize the very clear limitations of these results. Firstly, the drops never exceed 25%, meaning that 75% of the time the model predicts a semantically prohibited token. On the other hand, with the exception of m-bert, all the models have a higher drop for the CnTn control than for the CnTp configuration, thus indicating that even though the models have acquired a certain understanding of negation, this remains superficial and does not, for example, clearly include an understanding of the positive value of a double negation.</p>
          <p>A broader examination of the results reveals that while the drops in the CpTn and CnTp configurations increase together, the CnTn controls also show a corresponding increase.</p>
          <p>Finally, the training corpus of the models seems to have an influence on their performance. For example, the alb3rt0 model obtains the results least in line with our expectations, while bert-base-italian-xxl-cased and bert-base-italian-cased had better drop values, with the former performing better than the latter. However, these three models have identical numbers of layers, attention heads and hidden sizes, the difference between them consisting only in their training data. The alb3rt0 model was trained exclusively on tweets, which likely limits the diversity of its data, particularly with respect to negation. In contrast, the bert-base-italian-cased and bert-base-italian-xxl-cased models were trained on more varied corpora, with the latter featuring a larger dataset.</p>
          <p>In the future, this should lead us to study the correlation between the performance of the models and the fine-grained distribution of negative and affirmative contexts in their training corpus.</p>
        </sec>
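        <p>The drop measure can be transcribed directly from its definition (given the repetition rate r in percent, drop = 100 - r); the function names below are ours.</p>
        <preformat>
```python
# Drop metric from the setup section: for a pattern, given the percentage r
# of instances whose top-1 prediction repeats the ACT token, drop = 100 - r.

def repetition_rate(pairs):
    """pairs: list of (top1_prediction, act_verb); returns r in percent."""
    hits = sum(1 for pred, act in pairs if pred == act)
    return 100.0 * hits / len(pairs)

def drop(pairs):
    return 100.0 - repetition_rate(pairs)
```
        </preformat>
        <p>A high drop on CpTn and CnTp together with a low drop on CnTn and CpTv is the profile expected of a model that handles negation well.</p>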
      </sec>
      <sec id="sec-3-7">
        <title>5. Comparison with English</title>
        <p>In this section we compare the results obtained with the SCIN Set with those observed by [<xref ref-type="bibr" rid="ref2">2</xref>] in English.</p>
        <p>[Tables 3 and 4: number of tested contexts, repetitions (%), and number of retained contexts per model; drops per pattern (CpTn, CnTp, CnTn, CpTv) per model.]</p>
        <p>The scale of the drops in the two articles is notably very different: the maximum drop observed in Italian is 23% (CpTn, m-bert), while in English it is 82.8%. Similarly, the CpTv drops of the Italian models hardly exceed 15%, while those of the English models are never less than 25%.</p>
        <p>On the other hand, model architecture and type of training do not seem to have a major influence: UmBERTo has the same architecture as roberta-base, but while the latter is the best performing model in [<xref ref-type="bibr" rid="ref2">2</xref>], the former’s drops are the lowest for all configurations of the SCIN Set. Conversely, the other Italian models are built with the same architecture as bert-base-cased, i.e. the worst performing model for English; however, even the worst performing Italian model, namely alb3rt0, features higher drops than bert-base-cased. This confirms the observation from the previous section: while architecture is indeed a limiting criterion, training data probably plays a significant role.</p>
        <p>In general, we note that none of these models, neither for Italian nor for English, shows definitive drops compatible with a full understanding of the semantic constraints of negation.</p>
      </sec>
      <sec id="sec-3-8">
        <title>6. Conclusion</title>
        <p>In this paper, we investigated the ability of several Italian PLMs to take negation into account in their predictions. To do this, we adapted to Italian the Self-Contained Neg Test proposed by Kletz et al. [<xref ref-type="bibr" rid="ref2">2</xref>], which is based on minimal pairs of aligned sentences.</p>
        <p>Applying this test to six models enabled us to show that negation modifies their predictions, but that this does not happen consistently or in a way that is always coherent with the semantic effect that we expect negation to have on sentences. These results suggest a strong need to adapt these models to make them more sensitive to negation and its semantic consequences.</p>
        <p>Nevertheless, we also noted a fairly marked difference in performance from one model to another, correlated with the different corpora used to train them. We thus suggest that a lexical and statistical study of these corpora could shed further light on the behavior of the models. Lastly, it would be interesting to compare these results with the performance of generative models, in order to study the relative importance of the number of model parameters in relation to their architecture.</p>
      </sec>
      <sec id="sec-3-9">
        <title>Acknowledgments</title>
        <p>We would like to express our gratitude to Marie Candito for her valuable assistance and guidance throughout the course of this study.</p>
        <p>This work was funded in part by the French government under management of Agence Nationale de la Recherche as part of the “Investissements d’avenir” program, reference ANR-19-P3IA0001 (PRAIRIE 3IA Institute). This research was also partially funded by the Labex EFL (ANR-10-LABX-0083) and by PNRR–M4C2–Investimento 1.3, Partenariato Esteso PE00000013–“FAIR—Future Artificial Intelligence Research”–Spoke 1 “Human-centered AI,” funded by the European Commission under the NextGeneration EU programme.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>A. Verb statistics by PLM</title>
      <p>Details of the number of monotokenised intransitive verbs available for each PLM tested are given in Table 5.</p>
      <p>Table 5. Number of Italian intransitive verbs tokenised as a single token for each of the models tested.
Model / monotokenized verbs:
bert-base-italian-cased / 294
bert-base-italian-xxl-cased / 294
m-bert / 39
alb3rt0 / 940
UmBERTo / 14</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] L. R. Horn, H. Wansing, Negation, in: E. N. Zalta, U. Nodelman (Eds.), The Stanford Encyclopedia of Philosophy, Winter 2022 ed., Metaphysics Research Lab, Stanford University, 2022.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] D. Kletz, P. Amsili, M. Candito, The self-contained negation test set, in: Y. Belinkov, S. Hao, J. Jumelet, N. Kim, A. McCarthy, H. Mohebbi (Eds.), Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, Singapore, 2023, pp. 212-221. URL: https://aclanthology.org/2023.blackboxnlp-1.16. doi:10.18653/v1/2023.blackboxnlp-1.16.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] S. Schweter, Italian BERT and ELECTRA models, 2020. URL: https://doi.org/10.5281/zenodo.4263142. doi:10.5281/zenodo.4263142.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: Pre-training of deep bidirectional transformers for language understanding, CoRR abs/1810.04805 (2018). URL: http://arxiv.org/abs/1810.04805. arXiv:1810.04805.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] M. Polignano, V. Basile, P. Basile, M. de Gemmis, G. Semeraro, AlBERTo: Modeling Italian social media language with BERT, IJCoL, pp. 11-31. URL: https://doi.org/10.4000/ijcol.472.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] L. Parisi, S. Francia, P. Magnani, UmBERTo: An Italian language model trained with whole word masking, https://github.com/musixmatchresearch/umberto, 2020.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] N. Kassner, H. Schütze, Negated and misprimed probes for pretrained language models: Birds can talk, but cannot fly (2020). URL: https://aclanthology.org/2020.acl-main.698.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] A. Ettinger, What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models, Transactions of the Association for Computational Linguistics 8 (2019) 34-48. URL: https://doi.org/10.1162/tacl_a_00298.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] R. Gubelmann, S. Handschuh, Context matters: A pragmatic study of PLMs' negation understanding, in: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, Dublin, Ireland, 2022, pp. 4602-4621. URL: https://aclanthology.org/2022.acl-long.315.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] L. Renzi, G. Salvi, A. Cardinaletti, Grande grammatica italiana di consultazione, volume 2, Il Mulino, Bologna, 2001.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] D. Kletz, M. Candito, P. Amsili, Probing structural constraints of negation in pretrained language models, in: The 24th Nordic Conference on Computational Linguistics, 2023. URL: https://openreview.net/forum?id=_7VPETQwnPX.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>