<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards an Italian Corpus for Implicit Object Completion</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Agnese Daffara</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elisabetta Jezek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Pavia</institution>
          ,
          <addr-line>Corso Strada Nuova, 65, Pavia</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>This study centers on the creation of an Italian corpus designed for the task of Implicit Object Completion. In this corpus, every sentence contains a token [MASK] denoting the position of the Object's head, along with the annotation of a Gold Standard filler word. The completion of the Object is conceived as a masked word task, theoretically executable by a BERT-based transformer model. In the next phase of the project, this task will be applied to a range of Italian language models, and their performance will be assessed. Overall, this project seeks to offer insights into the capabilities and constraints of such models in successfully completing Implicit Objects within various contexts.</p>
      </abstract>
      <kwd-group>
        <kwd>BERT</kwd>
        <kwd>Implicit Object</kwd>
        <kwd>masked word</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>When coming across the verb-argument structure of a
sentence, individuals have the cognitive ability to
comprehend its meaning by forming a semantic
representation of the situation in their minds. Even in
cases where one argument is implicit, they are still
capable of understanding the overall sense, thanks to
the verb's inherent lexical meaning and the
neighbouring words. The Distributional Hypothesis,
as proposed by Harris (1954) and Firth (1951),
suggests that it is possible to infer the meaning of a
word purely on the basis of the context.</p>
      <p>In the field of Natural Language Understanding,
Artificial Intelligence must replicate this ability in
order to reconstruct the scenario of the event,
specifically identifying its semantic participants.
Given the requirement for a computational model to
fill in the missing information, we propose that this
task can be conceived and construed as a masked
word completion task, for which transformer-based
technologies such as BERT (Devlin et al., 2019) have
proven to be the most suitable.</p>
      <p>This paper focuses on building an Italian corpus
for this specific purpose, while also outlining the
forthcoming evaluation work.</p>
      <p>The corpus centers on verbs that exhibit an
Optional Object, i.e. an Object that can be Implicit or
Explicit. The ontological set of verbs on which the
corpus is constructed is presented in section 3.
Since these verbs can either express or imply
the Argument, the corpus is
divided into two datasets: on one side, an IMPLICIT
dataset of sentences with Implicit Objects; on the
other side, with a contrastive role, an EXPLICIT
dataset of sentences containing Explicit Objects.</p>
      <p>Our decision to create two different datasets is
motivated by the idea of observing the differences in
the performance of the models: do they perform
better when the original Object is Implicit? This issue
is grounded in the findings of a prior guide
experiment conducted by Ye et al. (2020); according
to their results, the model's performance would be
notably improved when fine-tuned on an IMPLICIT
dataset, because of the greater richness of contextual
information available. We want to investigate if such
observations can be generalized to our experiment.</p>
      <p>Regarding the annotation of the masked word, the
two datasets are treated differently. In the IMPLICIT
dataset, we inserted two [MASK] tokens right after
the verb or the adverb, and allowed the model to
generate either a single Noun, as in Table 1, sentence
1., or, if not found, a Noun Phrase (NP) consisting of a
Determiner plus a Noun. Further explanation of this
possibility can be found in section 5. Furthermore, a
Gold Standard (GS) Noun representing the optimal
completion of the Object’s head position was
annotated alongside each sentence, together with the type
of omission (see section 2 for theoretical references).</p>
      <p>In the EXPLICIT dataset, on the other hand, we
removed the Explicit Object’s nominal head,
consisting of one word, and we annotated it as a GS.
Two examples of annotation for sentences 1. and 2.,
belonging respectively to the IMPLICIT and the
EXPLICIT dataset, are provided in Table 1.</p>
      <p>1. Da quel 26 dicembre non vuole più bere
[MASK][MASK] né lavarsi. ‘Since that
December 26, they no longer want to drink
[MASK][MASK], nor wash themselves.’</p>
      <p>2. Infilo la fetta velocemente nel sacchetto
delle fragole e tiro un sospiro, bevo un
[MASK] di caffè. ‘I slide the slice quickly into
the strawberry bag and let out a sigh, I drink
a [MASK] of coffee.’</p>
      <p>As noted above, the corpus will allow us to
select a set of Italian BERT models and
systematically evaluate their performance on the task
of Implicit Object Completion, which we define in
section 6.</p>
      <p>We firmly believe that an annotated Italian
dataset containing masked Optional Objects, their
categorisation and their corresponding Gold Standard
completions, as well as the subsequent experiment
and evaluation, will greatly contribute to research
endeavors in the field of NLP.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        Previous computational works have primarily
focused on the task of Implicit Argument Detection
rather than on the mere completion of a masked
Object. SemEval 2010 task 10 (Ruppenhofer et al.,
2009) introduced a variety of approaches aimed at
detecting the semantic participants in the event,
specifically identifying the Null Instantiations of the
Arguments. The term Null Instantiation was
introduced by
        <xref ref-type="bibr" rid="ref5">Fillmore (1982)</xref>
        within the theory of
Frame Semantics. In fact, most of the proposals relied
on this theoretical background, adopting as a starting
point the FrameNet dataset and annotation.
      </p>
      <p>While the rise of transformer-based models has
brought significant improvement for this task, as
shown, for example, by Zhang et al. (2020), it still
remains an interesting and challenging issue.</p>
      <p>As far as Italian is concerned, we
identified a potential gap in the literature on the
computational detection and processing of Implicit
Arguments, probably due to the lack of annotated
corpora designed for this task. It is therefore
important to investigate this topic and create
new, computationally suitable resources.</p>
      <p>
        Significant progress has been made in the training
of BERT-based Italian models, including AlBERTo
(Polignano et al., 2019), UmBERTo
        <xref ref-type="bibr" rid="ref12">(Parisi et al.,
2020)</xref>
        , GilBERTo (Ravasio and Di Perna, 2020), and
the distilled Italian version of DistilBERT called
BERTino
        <xref ref-type="bibr" rid="ref11">(Muffo and Bertino, 2020)</xref>
        . Thanks to the
availability of this generation of open-source BERT
models, the masked word task has been applied to a
variety of different linguistic and cognitive topics,
such as the study of Agentivity and Telicity (Lombardi
and Lenci, 2021) or connectives
        <xref ref-type="bibr" rid="ref1">(Albertin et al.,
2021)</xref>
        . In particular, our study builds directly
upon the prior application of the masked word task
to the semantic topic of Logical Metonymy by Ye et al.
(2020).
      </p>
      <p>
        The existing linguistic literature has extensively
explored the concept of Implicit Argument and the
phenomenon of Argument omission in Italian.
Notably, Cennamo (2017) proposed a meticulous
comparative analysis of the parameters involved in
this process. In our study, we adopt the notion of
“Defaulting”, first introduced by Pustejovsky (1995)
and further refined by
        <xref ref-type="bibr" rid="ref9">Jezek (2018)</xref>
        . Following
Fillmore’s distinction between Definite and Indefinite
Null Instantiation, we delineate Pragmatic Defaulting
(PD) as the omission of the Object based on
contextual cues and Lexical Defaulting (LD) as the
omission of the Object licensed by the core meaning
of the verb. Overall, both the
contextual cues and the semantics encoded in the
verb contribute to the possibility of implying and
reconstructing an Argument, and we believe it is
necessary to consider this difference when studying
Implicit Objects.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Data Preparation</title>
      <p>
        As a first step towards the corpus preparation, we
established a set of 30 verbs that allow for their
Object to remain implied. We refer to this set with the
term ‘ontology’ since it contains the basic verbal
structures of reference for the building of the corpus.
Our selection of such verbal structures draws upon
the resource T-PAS
        <xref ref-type="bibr" rid="ref10">(Jezek et al., 2014)</xref>
        , a repository of
Typed Predicate-Argument Structures (T-PAS) which
was developed at the University of Pavia in
collaboration with the Bruno Kessler Foundation in
Trento (I) and the Masaryk University in Brno (CZ) by
adopting a corpus-driven methodology.
      </p>
      <p>Each pattern in T-PAS corresponds to a distinct
contextual meaning of the verb (Predicate), plus the
list of all the possible semantic participants to the
event associated with that specific meaning
(Arguments). Notably, T-PAS not only captures
information concerning the syntactic structure but
also provides insights into the semantic types of the
Arguments. This resource is a valuable foundation for
our data collection, as it also annotates (in round
brackets) the potential for exhibiting an Implicit
Argument for each structure. An example of three
patterns displayed on the online T-PAS website for
the verb ‘bere’ (‘drink’) (including a metonymic use)
is given in Figure 1.</p>
      <p>
        From the comprehensive dataset of patterns
available, we first identified the ones containing one
or more Optional Arguments, using a simple RegEx
match search to detect round brackets. Afterwards,
we isolated ‘fundamental’ verbs (verbi fondamentali)
according to the Nuovo Vocabolario di Base della
Lingua Italiana (NVdB)
        <xref ref-type="bibr" rid="ref4">(De Mauro, 2016)</xref>
        . These
particular verbs were chosen due to their presence in
90% of Italian texts, making them a suitable
representative set for constructing an ontology.
      </p>
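      <p>The bracket-based filtering step described above can be illustrated with a minimal Python sketch. The regular expression and the helper name optional_arguments are assumptions made for illustration, not the RegEx actually used in the study; the sample patterns are taken from Appendix A.</p>
      <preformat>
```python
import re

# Hypothetical pattern: any non-nested span enclosed in round brackets,
# which is how T-PAS marks an Optional Argument.
OPTIONAL_ARG = re.compile(r"\(([^()]*)\)")


def optional_arguments(pattern: str) -> list[str]:
    """Return the content of every round-bracketed slot in a T-PAS pattern."""
    return [slot.strip() for slot in OPTIONAL_ARG.findall(pattern)]


patterns = [
    "[Human] bere ([Alcool])",
    "[Human] cantare",
    "[Human] | [Institution] attendere ([Event]) | (che [Event])",
]

# Keep only the patterns that contain at least one Optional Argument.
with_optional = [p for p in patterns if optional_arguments(p)]
```
      </preformat>
      <p>In this sketch, patterns without round brackets, such as the intransitive use of ‘cantare’, are simply discarded.</p>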
      <p>We then conducted a cleansing process that
excluded causatives, passives, and idiomatic
expressions, as well as other multiword expressions
or subpatterns with relatively infrequent occurrences
and less common meanings. We constantly consulted
the online version of the NVdB to decide whether a
structure was fundamental or not. This cleansing
process finally yielded a comprehensive list of 324
patterns with an Optional Argument, spanning across
213 distinct verb types.</p>
      <p>Finally, we narrowed down our
focus, isolating the structures with an Optional Object.
The final ontology comprises 30 different verbs,
corresponding to 60 T-PAS patterns and over 50
possible Semantic Types of the Object, which represents a
considerable variety. The detailed list is provided in
Appendix A, whereas a summary of the quantities of
verbs and patterns contained in T-PAS can be found
in Table 2.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Data Collection</title>
      <p>
        After the assessment of the ontology, our attention
turned to collecting the sentences. The resource of
reference is the T-PAS dataset, comprising 252,943
Manually Annotated Corpus Instances. All the
sentences were selected from the It-Wac reduced
corpus
        <xref ref-type="bibr" rid="ref2">(Baroni and Kilgarriff, 2006)</xref>
        and annotated
with the corresponding T-PAS number, denoting the
specific semantic pattern being used in that sentence.
For example, as shown in Figure 1, when the verb
‘bere’ (‘drink’) was used with a metonymic Object,
like ‘sorso’ (‘sip’), it was tagged with T-PAS number
1m, while, when it was found without an Object and
implying an alcoholic drink, it was tagged with T-PAS
number 2.
      </p>
      <p>After isolating the T-PAS structures contained in
our ontology, we proceeded to manually select the
sentences for the corpus. We removed those with a
Noun as an Object and preferred those with a linear
order (the Object following the verb). In our pursuit
of a more extensive and diverse dataset, especially
concerning the variety of Objects, we also conducted
searches in the whole It-Wac reduced corpus through
the Sketch Engine online platform (Kilgarriff et al.,
2014). Eventually, 40 sentences were selected for
each verb. The resulting 1,200 sentences are divided
into the two datasets, each containing 600 sentences,
as illustrated in Table 3.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Annotation</title>
      <p>The annotation process was handled differently for
the IMPLICIT and the EXPLICIT dataset.</p>
      <p>In the IMPLICIT dataset, the token [MASK] was
manually inserted after the verb or the adverbial
modifier to signal the position to be filled by
the model. However, we observed that when the
model encounters only one masked word, it tends to
generate either mass Nouns or plural Nouns due to
the lack of a Determiner. This is a significant
limitation for the evaluation process. We therefore
adopted the following steps: (1) annotate two [MASK]
tokens to indicate the positions of the Determiner and
the Noun; (2) instruct the model to first look for a
Noun for the second position; (3) if a Noun is not
generated, search for a Determiner; (4) generate a new
sentence containing that Determiner, whose Noun
position will later be filled.</p>
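      <p>The steps above can be sketched in Python as follows. This is only one possible reading of the procedure: the predictor interface, the part-of-speech labels and the toy predictions are all hypothetical stand-ins for a real masked language model combined with a tagger, and none of them come from the study itself.</p>
      <preformat>
```python
from typing import Callable, Optional

MASK = "[MASK]"
Prediction = tuple[str, str]  # (token, part-of-speech tag)
# Hypothetical interface: ranked predictions for the mask at `mask_index`.
Predictor = Callable[[str, int], list[Prediction]]


def first_with_pos(predictions: list[Prediction], pos: str) -> Optional[str]:
    """Return the highest-ranked token carrying the requested POS tag."""
    return next((tok for tok, tag in predictions if tag == pos), None)


def complete_object(sentence: str, predict: Predictor) -> Optional[str]:
    """Fill a '[MASK][MASK]' slot with a bare Noun or a Determiner + Noun."""
    # Step 2: look for a Noun in the second masked position.
    noun = first_with_pos(predict(sentence, 1), "NOUN")
    if noun is not None:
        return sentence.replace(MASK + MASK, noun, 1)
    # Step 3: no Noun found, so look for a Determiner in the first position.
    det = first_with_pos(predict(sentence, 0), "DET")
    if det is None:
        return None
    # Step 4: new sentence with the Determiner and a single mask for the Noun.
    with_det = sentence.replace(MASK + MASK, f"{det} {MASK}", 1)
    noun = first_with_pos(predict(with_det, 0), "NOUN")
    return None if noun is None else with_det.replace(MASK, noun, 1)


def toy_predict(sentence: str, mask_index: int) -> list[Prediction]:
    """Stand-in for a real model: canned, invented predictions."""
    if MASK + MASK in sentence:
        return [("il", "DET")] if mask_index == 0 else [(",", "PUNCT")]
    return [("vino", "NOUN")]


result = complete_object("Non vuole bere [MASK][MASK].", toy_predict)
```
      </preformat>
      <p>With the toy predictor, no Noun is offered for the double mask, so the sketch falls back to the Determiner and then fills the remaining single mask.</p>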
      <p>As the next step in the annotation of the
IMPLICIT dataset, the GS word constituting the
optimal Object’s head was manually inserted on the
basis of the pragmatic context and the strength of the
possible collocations. Collocation strength, measured
with the LogDice metric, can be obtained by querying
Sketch Engine on the ItTenTen20 Italian corpus<sup>1</sup>, as
shown in Figure 2.</p>
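      <p>For reference, the LogDice score of a word pair is commonly defined as 14 + log2(2 f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the pair and f_x, f_y are the frequencies of the two words. The sketch below computes it in Python; the function name and the frequency counts are illustrative assumptions, not values taken from ItTenTen20.</p>
      <preformat>
```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y))."""
    if not (f_xy > 0 and f_x > 0 and f_y > 0):
        raise ValueError("all frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented counts: a verb and a noun that each occur 100 times
# and co-occur 50 times.
score = log_dice(50, 100, 100)
```
      </preformat>
      <p>The theoretical maximum is 14, reached when the two words only ever occur together; each halving of the Dice coefficient lowers the score by one point.</p>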
      <p>
        The last step of the IMPLICIT dataset’s annotation
regards the type of omission. As already mentioned,
we adopted the classification between Pragmatic and
Lexical Defaulting proposed by
        <xref ref-type="bibr" rid="ref9">Jezek (2018)</xref>
        , following Pustejovsky (1995). This categorization
serves as a valuable tool during the final evaluation of
the model, enabling an assessment of its performance
across different kinds of omission.
      </p>
      <p>1 https://www.sketchengine.eu/ittenten-italiancorpus/</p>
      <p>As for the EXPLICIT dataset, the Object’s
head was manually detected, removed and replaced
by the token [MASK]. Subsequently, it was annotated
alongside the sentence. Note that by removing just a single
word, these sentences retain their rich syntactic
context, displaying the modifiers of the removed
word. Such cues may improve the models' ability to
detect the original filler. As the EXPLICIT dataset
primarily has a contrastive function, we anticipate
that comparing results from both datasets will help
determine whether the model's output is closer to the
original when it receives a significant amount of
syntactic information or, conversely, when the
context is semantically richer, as seen in Implicit
Object sentences.</p>
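      <p>In the study the Object’s head was identified and masked manually. Purely as an illustration of the resulting data format, the following Python sketch masks a head whose token position is already known; the function name and the whitespace tokenization are simplifying assumptions.</p>
      <preformat>
```python
def mask_head(tokens: list[str], head_index: int) -> tuple[str, str]:
    """Replace the token at `head_index` with [MASK]; return (sentence, GS)."""
    gold = tokens[head_index]
    masked = tokens[:head_index] + ["[MASK]"] + tokens[head_index + 1:]
    return " ".join(masked), gold


# Fragment of sentence 2; index 2 is the manually identified head 'sorso'.
masked_sentence, gold_standard = mask_head("bevo un sorso di caffè".split(), 2)
```
      </preformat>
      <p>The modifiers of the removed word (‘un’, ‘di caffè’) stay in place, which is the syntactic context the paragraph above refers to.</p>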
    </sec>
    <sec id="sec-6">
      <title>6. Task Definition</title>
      <p>We define Implicit Object Completion as the task of
substituting the masked Object in a sentence,
previously marked with the token [MASK], with the
most appropriate word or filler. When tested on each
sentence, the transformer model is expected to
produce the word that best fits the context of the
sentence.</p>
      <p>However, alternative outputs are possible,
potentially encompassing other Parts of Speech. As an
example, we employed the online demo of
bert-base-italian-cased, made accessible by the MDZ Digital
Library team (dbmdz) at the Bavarian State Library
on Hugging Face<sup>2</sup>. The model generated the most
probable candidates for sentence 1. Predictably, the
first output was the punctuation sign "," and the
expected Nouns were found in lower positions, as
depicted in Figure 3. To mitigate this issue
and ensure more accurate results, when querying the
model we applied a two-step filter that
isolates Nouns. In particular, we exclusively
considered the Noun with the highest probability
score.</p>
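      <p>A filter of this kind could look like the Python sketch below. The two steps shown (discarding non-alphabetic tokens, then keeping only tokens tagged as Nouns) are an assumption about what such a filter might do, since the text does not spell them out; the candidate scores and the toy Noun lexicon are invented for the example.</p>
      <preformat>
```python
from typing import Callable, Optional

Candidate = tuple[str, float]  # (token, probability score)


def best_noun(candidates: list[Candidate],
              is_noun: Callable[[str], bool]) -> Optional[Candidate]:
    """Return the highest-scoring candidate that is tagged as a Noun."""
    # Step 1 (assumed): discard punctuation and word-piece fragments.
    words = [c for c in candidates if c[0].isalpha()]
    # Step 2 (assumed): keep only the tokens that the tagger labels as Nouns.
    nouns = [c for c in words if is_noun(c[0])]
    return max(nouns, key=lambda c: c[1], default=None)


# Invented candidate list in which punctuation outranks the Nouns.
candidates = [(",", 0.31), ("acqua", 0.12), ("più", 0.08), ("vino", 0.05)]
toy_nouns = {"acqua", "vino"}  # stand-in for a real POS tagger
top = best_noun(candidates, toy_nouns.__contains__)
```
      </preformat>
      <p>Here the comma is ranked first by the model but is removed in the first step, so the Noun ‘acqua’ is returned.</p>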
    </sec>
    <sec id="sec-7">
      <title>7. Evaluation</title>
      <p>An issue in the design of the task is the possibility of
getting a synonym or a word that is only partially
correct or doesn’t perfectly align with the Gold
Standard.</p>
      <p>
        The Theory of Prototypes, first proposed by
Rosch (1973), posits that within a semantic category,
certain members are more representative of the
category's core meaning. In contrast, less central
members demonstrate greater variability and may
deviate further from the core concept. By taking into
consideration both the Theory of Prototypes and the
Distributional Hypothesis (cited in section 1), during
the evaluation phase, we will systematically calculate
the similarity score (sim) between the output word
and the Gold Standard completion, corresponding to
the cosine between the two word vectors. This value
will be obtained by running the Python library spaCy
        <xref ref-type="bibr" rid="ref3 ref8">(Honnibal and Montani, 2017)</xref>
        on the Italian model
it_core_news_lg, the large Italian pipeline, with a size of
541 MB<sup>3</sup>.
      </p>
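      <p>The similarity score is the standard cosine between two vectors. As a minimal, library-free Python sketch, the function below computes it for two plain lists of numbers; the example vectors are invented and are not real word embeddings. In practice, spaCy exposes a comparable computation through the similarity method of its token and document objects.</p>
      <preformat>
```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # A zero vector has no direction; report no similarity in that case.
    return dot / norm if norm else 0.0


# Invented two-dimensional vectors standing in for word embeddings.
sim = cosine([3.0, 4.0], [4.0, 3.0])
```
      </preformat>
      <p>A score of 1 means that the output word and the GS word point in the same direction in the vector space, while values near 0 indicate unrelated words.</p>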
      <p>An example of output of the model
bert-base-italian-xxl-cased (bert-base-xxl), the larger version of
bert-base-italian-cased from dbmdz, and its
annotation for sentences 1. and 2. is shown in Table 4.
With these annotation parameters, we aim to extend
our linguistic analysis beyond the model’s ability to
complete the cloze test by providing the right word:
we will also investigate the model’s capability
to effectively cluster words within the same domain.</p>
    </sec>
    <sec>
      <title>8. Discussion and Results</title>
      <p>This paper discusses the ongoing construction of a
corpus specifically tailored for the task of Implicit
Object Completion. This resource contains sentences
exhibiting both Implicit and Explicit Objects, thus
enabling the assessment of two distinct datasets that
will be treated separately.</p>
      <p>In the IMPLICIT dataset, the position for the
Noun/NP is signaled by manually inserting two
[MASK] tokens right after the verb or the adverb. The
GS Object’s head is manually added, considering both
the context of the sentence and the general strength
of the verb-Object collocation, which can be
quantified through the typicality score on the
ItTenTen20 corpus (Jakubíček et al., 2013) using
Sketch Engine. Additionally, in the case of Implicit
Objects, we provide information about the type of
omission, which may depend either on the contextual
cues (Pragmatic Defaulting) or on the lexical verbal root
(Lexical Defaulting). On the other hand, the
annotations within the EXPLICIT dataset include the
manual identification of the Object’s nominal head,
which is substituted with the token [MASK] and
annotated alongside the sentence.</p>
      <p>The forthcoming second phase of our project
involves an in-depth analysis of the outputs generated
by a selection of the primary BERT Italian models. As
a metric for the evaluation, we will adopt cosine
similarity. This value measures the similarity between
the output word provided by the model and the GS
word, thus measuring the ability of the model to
generate a filler which is semantically close to the
original. As an example of a comparison between two
models, consider the results of bert-base-xxl and
umberto-commoncrawl-cased-v1 (UmBERTo), on
sentence 2., which are reported in Table 5.</p>
      <p>2 https://huggingface.co/dbmdz/bert-base-italian-cased</p>
      <p>3 https://github.com/explosion/spacy-models/releases/tag/it_core_news_lg-3.7.0</p>
      <table-wrap>
        <label>Table 5</label>
        <table>
          <thead>
            <tr><th>GS_obj_head</th><th>bert-base-xxl</th><th>sim</th><th>UmBERTo</th><th>sim</th></tr>
          </thead>
          <tbody>
            <tr><td>sorso ‘sip’</td><td>bicchiere ‘cup’</td><td>0.62</td><td>cucchiaino ‘teaspoon’</td><td>0.4</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Bert-base-xxl returns a higher score (0.62 versus 0.4), as the
vectors of ‘bicchiere’ (‘cup’) and ‘sorso’ (‘sip’) have a
higher cosine similarity than those of ‘cucchiaino’
(‘teaspoon’) and ‘sorso’ (‘sip’). Although both
models fail to retrieve the exact word and
categorize the filler as a [CONTAINER] rather than a
[QUANTITY], both results are satisfactory and
plausible. More results for the EXPLICIT dataset can
be found in Appendix B.</p>
      <p>In conclusion, we expect the results to raise a
number of theoretical questions and possible
investigations. By conducting these analyses, we will
compare the models' performance on a novel topic
and investigate their ability to identify the semantic
category of the Objects, while effectively clustering
words within the same domain. In addition, the
annotation of the type of omission will allow further
insights into the importance of the context in
reconstructing Implicit Objects.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors gratefully acknowledge the contributions
of the anonymous reviewer for CLiC-it 2023. Their
insights and feedback have significantly enhanced the
quality of this paper.</p>
      <p>Upon completion of the corpus, the complete
dataset will be hosted on a public GitHub repository
in accordance with the FAIR principles.</p>
      <p>G. Ravasio, L. Di Perna, GilBERTo: An Italian
pretrained language model based on RoBERTa. URL:
https://github.com/idb-ita/GilBERTo</p>
      <p>E. Rosch, Cognitive representations of semantic
categories, Journal of Experimental Psychology:
General, 104,3 (1975) 192-233.</p>
      <p>Appendix A: Ontology of selected verbs and patterns from T-PAS</p>
      <table-wrap>
        <table>
          <thead>
            <tr><th>verb</th><th>T-PAS pattern</th></tr>
          </thead>
          <tbody>
            <tr><td>ascoltare</td><td></td></tr>
            <tr><td>attendere</td><td>[Human] | [Institution] attendere ([Event]) | (che [Event])</td></tr>
            <tr><td>attendere</td><td>[Human] | [Institution] attendere ([Human2] | [Vehicle] | [Time Point {data}] | [Document {visto | passaporto}])</td></tr>
            <tr><td>bere</td><td>[Animate] bere ([Beverage {birra | caffè | tè | bibita | bevanda | aperitivo | cocktail | liquore | vino | acqua | latte | grappino | birretta | spritz | mojito | birrozza | tisana | cappuccino | cioccolata | whisky | vodka | rum | rhum | cognac | pozione | elisir | sangue | liquido | acqua}])</td></tr>
            <tr><td>bere</td><td>[Human] bere ([Container {bicchiere | bottiglia}] | [Business Enterprise = Producer] | [Quantity {sorso | goccio}])</td></tr>
            <tr><td>bere</td><td>[Human] bere ([Alcool])</td></tr>
            <tr><td>cantare</td><td>[Human] cantare</td></tr>
            <tr><td>cantare</td><td>[Human] cantare</td></tr>
            <tr><td>cantare</td><td>[Human] cantare ([Musical Composition {canzone | canto | inno | brano | testo | salmo}])</td></tr>
            <tr><td>chiamare</td><td>[Human1] chiamare ([Human2] | [Institution {polizia}])</td></tr>
            <tr><td>chiamare</td><td>[Human] chiamare ([Number] | [Device {telefono}] | [Location {call center}] | [Vehicle {ambulanza}])</td></tr>
            <tr><td>combattere</td><td>[Human] combattere ([War {guerra | battaglia}])</td></tr>
            <tr><td>combattere</td><td>[Human1] | [Human Group1] combattere ([War]) (con|contro [Human2] | con|contro [Human Group2])</td></tr>
            <tr><td>condurre</td><td>[Human] condurre ([TV Program])</td></tr>
            <tr><td>consumare</td><td>[Human] | [Human Group] | [Machine] | [Device] consumare ([Energy] | [Gas] | [Inanimate])</td></tr>
            <tr><td>correre</td><td>[Human = Runner | Pilot] correre ([Competition {maratona | palio | rally}])</td></tr>
            <tr><td>cucinare</td><td>[Human] cucinare ([Food] | [Meal {pranzo | cena}])</td></tr>
            <tr><td>cucinare</td><td>[Human] cucinare</td></tr>
            <tr><td>dirigere</td><td>[Human] dirigere ([Musical Performance {concerto}])</td></tr>
            <tr><td>dirigere</td><td>[Human = Director] dirigere ([Movie])</td></tr>
            <tr><td>disegnare</td><td>[Human] disegnare ([Image] | [Physical Entity])</td></tr>
            <tr><td>disegnare</td><td>[Human] disegnare ([Inanimate])</td></tr>
            <tr><td>disegnare</td><td>[Human] disegnare ([Document {fumetto | comics | copertina}])</td></tr>
            <tr><td>disegnare</td><td>[Human] disegnare</td></tr>
            <tr><td>fumare</td><td>[Human] fumare ([Drug {sigaretta | pipa | sigaro | marijuana}])</td></tr>
            <tr><td>fumare</td><td>[Human] fumare</td></tr>
            <tr><td>giocare</td><td>[Human] giocare</td></tr>
            <tr><td>giocare</td><td>[Human] | [Human Group = Team] giocare ([Competition {partita}] | {mano | set | stagione | tempo})</td></tr>
            <tr><td>guadagnare</td><td>[Human] guadagnare ([Money])</td></tr>
            <tr><td>guidare</td><td>[Human] guidare ([Road Vehicle])</td></tr>
            <tr><td>leggere</td><td>[Human] leggere</td></tr>
            <tr><td>leggere</td><td>[Human] leggere ([Document])</td></tr>
            <tr><td>leggere</td><td>[Human1] leggere ([Document]) (a [Human2])</td></tr>
            <tr><td>mangiare</td><td>[Human] mangiare ([Food {cibo | carne | pane | uovo | pizza | panino | gelato | biscotto | torta | bistecca | hamburger | salsiccia | salame | polpetta | frutta | mela | verdura | banana | riso | patata | carota | formaggio | minestra | insalata | polenta | zuppa | antipasto | spaghetto | pasta | patatina | panettone | brioche | piadina | cornetto | focaccia | pasticcino | pappa | pasto | biada}])</td></tr>
            <tr><td>mangiare</td><td>[Human] mangiare ([Food] {cibo})</td></tr>
            <tr><td>mangiare</td><td>[Human] mangiare ([Meal])</td></tr>
            <tr><td>ordinare</td><td>[Human] ordinare ([Artifact])</td></tr>
            <tr><td>ordinare</td><td>[Human] ordinare ([Food] | [Beverage] | [Meal])</td></tr>
            <tr><td>pagare</td><td>[Human] pagare ([Abstract Entity {conseguenza | debito | errore}])</td></tr>
            <tr><td>perdere</td><td>[Human] | [Human Group] perdere ([Competition])</td></tr>
            <tr><td>pregare</td><td>[Human] pregare ([Deity])</td></tr>
            <tr><td>pregare</td><td>[Human1] | [Institution] pregare ([Deity]) (per [Human2])</td></tr>
            <tr><td>pregare</td><td>[Human1] pregare ([Human2]) di [Activity]</td></tr>
            <tr><td>preoccupare</td><td>[Anything] preoccupare ([Human])</td></tr>
            <tr><td>provare</td><td>[Human = Artist] | [Human Group = Artist] provare ([Artwork])</td></tr>
            <tr><td>respirare</td><td>[Animate] respirare ([Vapor])</td></tr>
            <tr><td>scrivere</td><td>[Human] scrivere</td></tr>
            <tr><td>scrivere</td><td>[Human] scrivere ([Part of Language])</td></tr>
            <tr><td>scrivere</td><td>[Human] scrivere ([Document])</td></tr>
            <tr><td>scrivere</td><td>[Human] scrivere ([Document]) (a|per [Human2])</td></tr>
            <tr><td>scrivere</td><td>[Human = Writer] scrivere</td></tr>
            <tr><td>servire</td><td>[Human] servire ([Food] | [Meal]) (Manner)</td></tr>
            <tr><td>servire</td><td>[Human1 = Waiter] servire ([Food] | [Meal]) (a [Human2 = Customer])</td></tr>
            <tr><td>suonare</td><td>[Human] suonare ([Musical Instrument])</td></tr>
            <tr><td>suonare</td><td>[Human] suonare</td></tr>
            <tr><td>suonare</td><td>[Human = Artist] suonare ([Musical Composition {canzone | brano | pezzo | concerto}] | {musica})</td></tr>
            <tr><td>suonare</td><td>[Human] suonare ({il campanello} | {il citofono}) | (alla {porta})</td></tr>
            <tr><td>tirare</td><td>[Human = Football Player] tirare ([Ball])</td></tr>
            <tr><td>vincere</td><td>[Human] | [Human Group] vincere ([Activity {gara | competizione | festival | elezioni}] | [War])</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Tutto meno che restare a guardare la televisione a bere [MASK] e
divorare patatine. [birra]</p>
      <p>Giganti americani del valore di Theodore Dreiser, Ernest Hemingway e
Thornton Wilder, quando erano stanchi della routine andavano nei
locali a bere [MASK] di whisky e a sentire la grande musica per
ricercare la giusta ispirazione. [litri]</p>
      <p>Grazie anche per avermi fatto bere l' [MASK] di barbabietola e per
avermi fatto svegliare tutte le mattine alle 6 per prendere la pappa
reale...</p>
      <p>Bere due [MASK] di latte di soia o mangiare una tazza di tofu è
responsabile di livelli ematici di Isoflavoni che possono essere 500 o
1000 volte più elevati dei normali livelli di Estrogeni nelle donne.</p>
      <p>Da circa 3 settimane Federico ha cominciato a bere [MASK]
parzialmente scremato alta qualità e sin dai primi giorni ha mostrato di
gradire il nuovo alimento.</p>
      <p>La disposizione potrebbe essere utile anche nelle altre stagioni
dell'anno, specie se si vieta ai minori di bere [MASK].</p>
      <p>Bevi ogni giorno [MASK] in abbondanza è infatti la quinta regola della
sana alimentazione, che invita a bere, ma rigorosamente acqua e non
altre bevande, frequentemente e in piccole quantità.</p>
      <p>Per il pubblico in generale e per i giovani studenti c'è come al solito il
padiglione 9 aperto gratuitamente in cui si può fare shopping, bere del
[MASK] e rilassarsi oppure informarsi su ciò che accade nella fiera.</p>
      <p>Pruneddu, che forse aveva bevuto qualche [MASK] di troppo prima di
affacciarsi sulla porta del bar, non si è accorto che il proprietario e le
altre persone presenti avevano organizzato una castagnata per stare
insieme a bere un po'di vino. [bicchiere]</p>
      <p>Ma io non bevo [MASK] e gioco a freccette mentre dico parolacce.</p>
      <p>Davanti al Castello c'è il Ritz, dove Mordecai e Florence spesso
andavano a bere un [MASK].</p>
      <p>E poi pensa un po’ che GiPo ormai non può più dir niente perchè ha
bevuto la [MASK] ed è morto.</p>
      <p>Dopo aver recuperato un maglione e bevuto un buon [MASK] al
cardamomo, rientriamo in chiesa per l'ora del silenzio.</p>
      <p>Continuai a bere in silenzio il [MASK], mentre il sole che tramontava
tingeva di rosso il cielo.</p>
      <p>Gli uruguayani, come gli argentini, bevono moltissimo [MASK], un the
fatto con le foglie secche della pianta omonima, sorseggiato da una
piccola zucca attraverso una cannuccia di metallo, la bombilla. [mate]</p>
      <p>Appendix B: Example of results for the EXPLICIT dataset
The following table reports an example of the outputs of two models, italian-BERT-xxl-cased and UmBERTo. The models
were tested on the 20 sentences from the EXPLICIT dataset with the verb ‘bere’ (‘drink’). The column ‘sent’ displays the
sentence with the masked word, corresponding to the head of the Object’s NP. The removed word is shown in the column
‘GS_obj_head’. The columns ‘bert-base-xxl’ and ‘UmBERTo’ report the outputs of the models. The similarity score between
the output word and the GS word is shown in the columns ‘sim’.
sent
La vita umana andrebbe rispettata, ma non sentirti mai in colpa di
[sangue]
[tazze]
[latte]
[alcool]
[acqua]
[vino]
[alcolici]
[drink]
[cicuta]
[caffè]
[the]
1
0,14
1
1
1
1
0,54
0,61
0,33
0,77
0,16
0,32
[estratto]
acqua
0,36 acqua</p>
      <p>0,36
bicchieri
0,64 bicchieri</p>
      <p>0,64
0,69 alcolici</p>
      <p>0,69
bicchiere</p>
      <p>bicchiere
birra
litri
latte
alcolici
acqua
vino
alcolici
caffè
birra
caffè
vino
tè
cibo
1
1
1
1
1
1
1
1
birra
fiumi
latte
acqua
caffè
alcolici
0,61 caffè
0,33 birra
tè
vino
0,16
0,34 caffè
bere [MASK] umano: è una cosa naturale.</p>
      <p>Infilo la fetta velocemente nel sacchetto delle fragole e tiro un sospiro, [sorso]
bevo un [MASK] di caffè.</p>
      <p>Questi giovani, ci scommetterei, han bevuto [MASK] ed ascoltato
musica rock come gli altri coetanei, cosa é scattato ad un certo punto
nel loro animo per un totale stravolgimento e per abbracciare
un'ideologia perversa?
Un turista che chiede un caffè in tazza, molto lungo e con latte - dice il [cappuccino] caffè
barman Umberto - dissimula la voglia di bere un [MASK] e pagarlo
come caffè ristretto.
0,69 caffè</p>
      <p>0,69
bicchiere</p>
      <p>0,62 cucchiaino 0,4
[coca cola]
birra
0,6
birra
0,6
Il cubetto grande è molto richiesto soprattutto sul mercato spagnolo;
nei locali il consumatore vuole bere un [MASK] in un bicchiere grande
(generalmente un tumbler alto) e gradisce che lo stesso gli venga
presentato colmo di distillato.
cocktail
0,54</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Albertin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Miaschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brunato</surname>
          </string-name>
          ,
          <article-title>On the Role of Textual Connectives in Sentence Comprehension: A New Dataset for Italian</article-title>
          ,
          <source>in: Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it</source>
          <year>2021</year>
          , Milano,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Baroni</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Kilgarriff</surname>
          </string-name>
          ,
          <article-title>Large Linguistically-Processed Web Corpora for Multiple Languages</article-title>
          ,
          <source>Demonstrations</source>
          (
          <year>2006</year>
          )
          <fpage>87</fpage>
          -
          <lpage>90</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Cennamo</surname>
          </string-name>
          ,
          <article-title>Object omission and the semantics of predicates in Italian in a comparative perspective</article-title>
          , In:
          <string-name>
            <given-names>L.</given-names>
            <surname>Hellan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.L.</given-names>
            <surname>Malchukov</surname>
          </string-name>
          , M. Cennamo (eds.),
          <source>Contrastive Studies in Verbal Valency</source>
          , Benjamins, Amsterdam,
          <year>2017</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>273</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>T.</given-names>
            <surname>De Mauro</surname>
          </string-name>
          (ed.),
          <article-title>Il Nuovo vocabolario di base della lingua italiana</article-title>
          ,
          <source>Internazionale</source>
          ,
          <year>2016</year>
          . URL: https://dizionario.internazionale.it/
          <string-name>
            <given-names>J.</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Toutanova</surname>
          </string-name>
          ,
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          ,
          <source>in: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          , Volume
          <volume>1</volume>
          , Association for Computational Linguistics, Minneapolis, Minnesota,
          <year>2019</year>
          , pp.
          <fpage>4171</fpage>
          -
          <lpage>4186</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Fillmore</surname>
          </string-name>
          ,
          <article-title>Frame semantics</article-title>
          ,
          <source>in: Linguistics in the Morning Calm</source>
          , Hanshin Publishing Co, Seoul,
          <year>1982</year>
          , pp.
          <fpage>111</fpage>
          -
          <lpage>137</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>J.R.</given-names>
            <surname>Firth</surname>
          </string-name>
          ,
          <article-title>Modes of Meaning</article-title>
          ,
          <source>Papers in Linguistics 1934-1951</source>
          (
          <year>1951</year>
          )
          <fpage>190</fpage>
          -
          <lpage>215</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Z.S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <article-title>Distributional Structure</article-title>
          ,
          <source>Word</source>
          <volume>10</volume>
          ,
          <issue>2-3</issue>
          (
          <year>1954</year>
          )
          <fpage>146</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Honnibal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Montani</surname>
          </string-name>
          ,
          <article-title>spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing</article-title>
          ,
          <year>2017</year>
          . URL: https://spacy.io/
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakubíček</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kilgarriff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovář</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rychlý</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Suchomel</surname>
          </string-name>
          ,
          <article-title>The TenTen Corpus Family</article-title>
          ,
          <year>2013</year>
          . URL: https://www.sketchengine.eu/ittenten-italiancorpus/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Jezek</surname>
          </string-name>
          ,
          <article-title>Partecipanti impliciti nella struttura argomentale dei verbi</article-title>
          , In: S. Dallabrida, P. Cordin (eds.),
          <source>La Grammatica delle Valenze</source>
          , Franco Cesati, Firenze,
          <year>2018</year>
          , pp.
          <fpage>55</fpage>
          -
          <lpage>71</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>E.</given-names>
            <surname>Jezek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Magnini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Feltracco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bianchini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Popescu</surname>
          </string-name>
          ,
          <article-title>T-PAS: A resource of Typed Predicate Argument Structures for linguistic analysis and semantic processing</article-title>
          ,
          <source>in: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)</source>
          ,
          <publisher-name>European Language Resources Association (ELRA)</publisher-name>
          , Reykjavik, Iceland,
          <year>2014</year>
          , pp.
          <fpage>890</fpage>
          -
          <lpage>895</lpage>
          . URL: https://tpas.sketchengine.eu/
          <string-name>
            <given-names>A.</given-names>
            <surname>Kilgarriff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Baisa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bušta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jakubíček</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Kovář</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Michelfeit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Rychlý</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Suchomel</surname>
          </string-name>
          ,
          <article-title>The Sketch Engine: ten years on</article-title>
          ,
          <source>Lexicography</source>
          ,
          <volume>1</volume>
          (
          <year>2014</year>
          )
          <fpage>7</fpage>
          -
          <lpage>36</lpage>
          . URL: https://www.sketchengine.eu/
          <string-name>
            <given-names>A.</given-names>
            <surname>Lombardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lenci</surname>
          </string-name>
          ,
          <article-title>Agentività e telicità in GilBERTo: implicazioni cognitive</article-title>
          ,
          <source>in: Proceedings of the Eighth Italian Conference on Computational Linguistics CLiC-it</source>
          <year>2021</year>
          , Milano,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>M.</given-names>
            <surname>Muffo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Bertino</surname>
          </string-name>
          ,
          <article-title>BERTino: an Italian DistilBERT model</article-title>
          ,
          <source>in: Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it</source>
          <year>2020</year>
          , Bologna,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>L.</given-names>
            <surname>Parisi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Francia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Magnani</surname>
          </string-name>
          ,
          <article-title>UmBERTo: An Italian language model trained with whole word masking</article-title>
          ,
          <year>2020</year>
          . URL: https://github.com/musixmatchresearch/umberto
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>