=Paper= {{Paper |id=Vol-2263/paper044 |storemode=property |title=Exploiting Multiword Expressions to Solve “La Ghigliottina” |pdfUrl=https://ceur-ws.org/Vol-2263/paper044.pdf |volume=Vol-2263 |authors=Federico Sangati,Antonio Pascucci,Johanna Monti |dblpUrl=https://dblp.org/rec/conf/evalita/SangatiPM18 }} ==Exploiting Multiword Expressions to Solve “La Ghigliottina”== https://ceur-ws.org/Vol-2263/paper044.pdf

Exploiting Multiword Expressions to solve “La Ghigliottina”

Federico Sangati Antonio Pascucci Johanna Monti
University L’Orientale University L’Orientale University L’Orientale
Naples, Italy Naples, Italy Naples, Italy
fsangati@unior.it apascucci@unior.it jmonti@unior.it

Abstract the different steps we took in order to prepare and
tune the U NIOR 4NLP system. In Section 4 we de-
English. The paper describes scribe our system and its functioning, while results
U NIOR 4NLP a system developed to are presented in Section 5, where we also focus on
solve “La Ghigliottina” game which took error analysis concerning both the data-set of the
part in the NLP4FUN task of the Evalita NLP4FUN task and our system. Finally, conclu-
2018 evaluation campaign. The system is sions and future work are presented in Section 6.
the best performing one in the competition
and achieves better results than human
players.
2 Related work

Italiano. Il contributo descrive il sistema From the very beginning of Artificial Intelligence
U NIOR 4NLP, sviluppato per risolvere il (AI) games represented an interesting playground
gioco “La Ghigliottina”, che ha parteci- to test the results of research in this field (Yan-
pato alla sfida NLP4FUN della campagna nakakis and Togelius, 2018). NLP plays an es-
di valutazione Evalita 2018. Il sistema sential role in solving language related games and
risulta il migliore della competizione e ha recent examples, such as the IBM Watson system
prestazioni più elevate rispetto agli umani. in Jeopardy!TM (Ferrucci et al., 2013), have proven
that its use can result in groundbreaking technol-
ogy. An interesting test-bed for this type of ap-
1 Introduction proach is represented by language games, such as
In this paper we describe U NIOR 4NLP, a sys- the Wheel of Fortune, Who Wants to be a Million-
tem which took part in the NLP4FUN task of the aire? and “La Ghigliottina”.
Evalita 2018 evaluation campaign (Basile et al., The game “La Ghigliottina” is particularly chal-
2018). The goal of this task is to design a solver lenging because its solution is based on modelling
for “La Ghigliottina”, the final game of the pop- how words are connected to each other. A first
ular Italian TV quiz show “L’Eredità”. The game artificial player of the game, OTTHO (Semeraro
involves a single player, who is given a set of five et al., 2009; Basile et al., 2016) exploits i) re-
words (clues), each one linked with an unknown sources from the web such as Wikipedia to build
sixth word that represents the solution to the game. a lexicon and a knowledge repository and ii) a
For example, given the set of clues [ fighting, gun, knowledge base modeling represented by an as-
roof, eater, set ] the solution is fire, because: the sociation matrix which stores the degree of corre-
roof is on fire is a title of a famous song, while fire lation between any two terms in the lexicon. Word
fighting, fire a gun, fire-eater, and set something correlations are detected by connecting i) lemmas
on fire are fixed word constructions. to the terms in its dictionary definition, pair of
U NIOR 4NLP relies on the assumption that words occurring in a proverb, movie or song title,
Multiword Expressions (MWEs) play an impor- and iii) pair of similar words by exploiting Vector
tant role in solving the game: given a set of clues, Space Models (Salton et al., 1975).
the system outputs the solution word which forms In our approach, we make use of similar re-
the strongest connections with all of the clues. sources but we only rely on a very limited set
The paper is organized as follows: in Section 2 of syntactic constructions (patterns) to correlate
we present related work. In Section 3 we describe words and build our association matrix.
3 Solving the Ghigliottina game tion resulted in the definition of six patterns that
identify valid MWEs connecting clue and solution
Building an automatic solver for the Ghigliottina pairs. We list them below with some examples
game requires a number of preliminary steps: i) from our data-set (solution words are underlined):
the analysis of real game instances, ii) the analysis
of patterns that could help the system in solving A B: diario segreto (‘diary secret’ → secret di-
the game, iii) the collection of the linguistic re- ary), brutta caduta (‘ugly fall’ → bad fall),
sources necessary to tune the system for the task. permesso premio (‘permit price’ → good be-
haviour license), dare gas (‘give gas’ → ac-
3.1 Analysis of real game instances celerate).
We have analyzed a sample of 100 game instances A det B: dare il permesso (‘give the permit’ →
that we personally collected from the last five edi- authorize).
tions of the TV show. We found out that in most
cases each clue word is connected to the solu- A prep B: colpo di coda (‘flick of tail’ → last
tion because they form a Multiword Expression ditch effort).
(MWE). We have used this key observation in de- A conj B: stima e affetto (esteem and affection).
signing our system. We started working on our
system before the announcement of the NLP4FUN A prepart B or A prep det B: virtù dei forti,
task. Since our system is not supervised, the extra part of the famous Italian proverb La calma
data-set is not adding any advantage to our sys- è la virtù dei forti (patience is the virtue of
tem. After the official data-set was released, we the strong).
found out that a good number of game instances A+B: compounds such as radio + attività = ra-
was confirming our initial finding. However we dioattività (radio + activity = radioactivity).
also observed a number of unusual cases which
will discuss in more depth in Section 5.2. 3.3 Linguistic Resources
A MWE can be defined as a sequence of words On the basis of the linguistic analysis described
that presents some characteristic behaviour (at the above, we collected the linguistic resources which
lexical, syntactic, semantic, pragmatic or statis- we deemed necessary for the task. To this end we
tical level) and whose interpretation crosses the used the following freely available corpora:
boundaries between words (Sag et al., 2002). Paisà: 225 M words corpus automatically anno-
MWEs have to be considered as lexical items tated (Lyding et al., 2014).
which convey a single meaning different from the
meanings of the constituents of the MWE, such as itWaC: 1.5 B words corpus automatically anno-
in the idiomatic expression kick the bucket where tated (Baroni et al., 2009)
the simple addition of the meanings of kick and Wiki-IT-Titles: Wikipedia-IT titles down-
bucket does not convey the meaning of to die. loaded via WikiExtractor (Attardi, 2016).
We have different classes of MWEs, such as id-
ioms (break a leg), verb particle constructions (to Proverbs: 1955 proverbs from Wikiquote
call off ), light verbs constructions (to provoke a (2016) and 371 from an online collection
reaction). For a detailed overview of MWEs in (Dige, 2016).
NLP applications we refer the reader to Constant In addition, we have constructed the following
et al. (2017). For the purpose of the current task lexical resources:
we considered only those MWEs characterized by
DeMauro-Ext: words extracted from “Il Nuovo
fixed syntactic patterns described in the following
vocabolario di base della lingua italiana”(De
section.
Mauro, 2016b), extended with morphological
3.2 Pattern Analysis variations obtained by changing last vowel of
the word and checking if the resulting word
A first analysis of the tuples from the sample men-
has frequency ≥ 1000 in Paisà.
tioned above revealed that words in the clues are
typically nouns, verbs, or adjectives, while the DeMauro-MWEs: MWEs extracted from the
ones in the solutions are typically nouns or ad- “De Mauro online dictionary” (De Mauro,
jectives (never verbs). A more detailed investiga- 2016a) composed of 30,633 entries.
4 System description
In order to build our system, we started process- X
w
cs = max Mpmi (ws , wc ) (5)
ing the selected corpora via standard tokeniza- ws ∈SLEX
wc ∈G
tion (only single word tokens) and removal of
punctuation marks and non-word patterns. We that is, we choose the word in SLEX which maxi-
next constructed two lexical sets: CLEX to cover mizes the score obtained by summing the pmi be-
the clue words, and SLEX to cover the solu- tween each clue word and the candidate word. If
tion words. SLEX (composed of 7,942 nouns two words are never seen co-occurring together in
and adjectives in DeMauro-Ext) is smaller than a pattern in the training corpora, we assign to them
CLEX (composed of 19,414 words from the full the lowest pmi value in Mpmi .
DeMauro-Ext and DeMauro-MWEs) because The system has been implemented in Python
solution words are almost always nouns or adjec- and the code is open source.1 After the matrix has
tives as described in Section 3.2. been loaded into memory the response time on an
Secondly, we built a co-occurrence matrix Mc average laptop is around 1-2 seconds.
which stores the counts ci,j for every pair of words
wi ∈ SLEX and wj ∈ CLEX such that wi co- 5 Results
occurs with wj in the resources according to pat-
terns described in Section 3.2. Co-occurrence According to Basile et al. (2018), U NIOR 4NLP
patterns were extracted from Paisà and itWaC is the best performing system in the Evalita
with weight w = 1, from DeMauro-MWE with NLP4FUN task. Table 1 provides the detailed
w = 200, from Proverbs with w = 100, and results, including split results on TV and Board
from Wiki-IT-Titles with w = 50. The Game (BG) subsets. The system achieved a very
weight were chosen manually taking into account high performance: in more than half of the games
the likelihood that a pattern in a given corpus rep- (64/105) it is able to guess the correct word.
resented a valid MWE. Compound patterns (A+B) In the attempt to compare the performance of
were extracted from CLEX : for every word w in our AI system with that of a top player we ana-
CLEX if w = ab, a and b are both in CLEX , lyzed the games played by Andrea Saccone, who
and a and b have at least 4 characters, the count has been the biggest champion of the Ghigliot-
for the pair (a, b) is incremented by 1 in the co- tina game so far: he was champion for 13 days
occurrence matrix. (3-15 March 2018), and he managed to find the
Thirdly, for every pair of words wi and wj correct solution three times.2 In comparison,
in Mc , we populate the association-score matrix U NIOR 4NLP was able to win the same game in-
Mpmi via the Pointwise Mutual Information mea- stances 9 times.
sure:
SET SIZE MRR R@100
p(wi , wj ) TEST ALL 105 0,64 0.82
Mpmi (wi , wj ) = log (1)
p(wi ) · p(wj )
TEST TV 66 0.67 0.88
where TEST BG 39 0.60 0.72
DEV ALL 315 0.56 0.80
X
p(wi ) = Mc (wi , wj ) (2)
wj ∈CLEX DEV TV 204 0.61 0.85
DEV BG 111 0.48 0.71
X
p(wj ) = Mc (wi , wj ) (3)
wi ∈SLEX
Table 1: Results on the TEST and DEV set. Eval-
Mc (wi , wj ) uations are the MRR (Mean Reciprocal Rank) and
p(wi , wj ) = P (4)
x∈SLEX Mc (x, y) R@100 (recall at 100).
y∈CLEX
1
Finally, for a given game instance with the 5 https://gitlab.com/kercos/ghigliottina
2
The players who reach the “Ghigliottina” game (the
clue words G = (wc1 , wc2 , wc3 , wc4 , wc5 ), we champion) continue to participate in the subsequent episodes
choose the solution word wcs ∈ SLEX such that: even if they do not guess the solution word.
The plot in Figure 1 shows the distributions of the TV set date back to the very first editions of the
the scores for the correct and missed solutions of TV game (when correlation criteria where proba-
our system on the full set of games in the devel- bly not yet well defined).
opment and test set (420 in total). This allows us
to set a number of confidence values for our sys- 5.2 System error analysis
tem: if the system returns a solution of a game In this section we analyze some types of errors that
with a score S ≥ 10 we can be reasonably cer- our system makes, and we provide some sugges-
tain (68/69 = 99%) that the system has guessed tion for possible improvement.
the correct solution, if 5 ≤ S < 10 we are above
chance level (50/70 = 71%), if 0 ≤ S < 5 we are Word similarity Although quite rare, few of the
at chance level (56/112 = 50%) and if S is below clue-solution links can be explained by the sim-
0, we are below chance level (48/169 = 28%). ilarity relation. An example is the clue-solution
pair sincero-franco (sincere-frank). Those are not
easily captured by patterns of the types described
in Section 3.2, but could be included by means
of automatic detection of word similarity via Vec-
tor Space Models (Salton et al., 1975) as done in
Basile et al. (2016).

Missing words As explained in Section 3.2, we
restricted the set of words in the solution set. This
choice, while helping the system to restrict the
Figure 1: Distribution of the correct and missed search space, leads to some coverage issues. For
solutions with respect to their score. instance, pennello (brush) is one of the solutions
of the games in the test data not present in our so-
lution set. In the future we would like to experi-
5.1 Data-set analysis
ment increasing the size of the solution set while
In the development data-set we found several avoiding performance and memory problems.
cases which fall outside the patterns we observed
in out data-set. For instance, we noticed the pres- Wrong PoS Our system analyzes words in their
ence of digits in some clues or solution words surface form, so it cannot distinguish cases where
(1973, 33), game instances with a clue being also the same word-form can have multiple Part of
the solution (‘sostanza’, ‘fuori’), and words being Speech (PoS) (with different meaning). To avoid
spelled in different ways (‘tenère’, ‘tenere’). this problem we could envision a system which
Moreover, we also observed a number of ‘clue takes PoS and word-sense disambiguation into
- solution’ pairs which are very difficult to relate. consideration.
We list below some examples with some possible
explanation: Multiword clues Although the great majority of
the clues are constituted by a single word, there are
• g - orecchio: (g - ear) the letter ‘g’ has the a few exceptions (typically names of saints). The
shape of a ear. current system considers only single-word tokens,
so if a game has a 2-word clue, it is regarded as
• classe 1973 - 33: (class 1973 - 33) this game
two separate clues (their contribution is then aver-
instance was from 2006, and that year people
age to obtain the final score). The system could be
born in 1973 were 33 years old.
optimized by using a tokenizer which keeps spe-
• ...—... - titanic: the clue being the S.O.S. bea- cific types of bigrams connected.
con in morse code.
Association metrics As described in Section 4,
One possible reason for these inconsistent cases we compute the association score between any
is that Board Game edition use slightly different pair of words in the matrix via the Pointwise Mu-
criteria to correlate words,3 and that those from tual Information measure (pmi). There is still a
3
This is supported by results in Table 1, where Board big number of alternative measures (Pecina, 2010)
Game results are lower than those from the TV set. that might lead to higher performance.
6 Conclusions and future work Acknowledgments
This research has been partly supported by the
In this paper we described U NIOR 4NLP, an ar-
PON Ricerca e Innovazione 2014/20 fund. Au-
tificial player of “La Ghigliottina”, a challenging
thorship contribution is as follows: Johanna Monti
game which requires linguistic knowledge to be
is author of Sections 1, 2, 3.3 and 6; Federico San-
solved. We described the preliminary steps that we
gati of Section 4 and 5, and Antonio Pascucci of
made before developing our system (identifying
Sections 3.1. and 3.2.
linguistic patterns that are relevant in the game)
as well as the algorithms and the methodology we
References
adopted. The system achieved a high performance
but we believe that with further tuning it can still Giuseppe Attardi. 2016. Wikiextractor. http :
be improved. //attardi.github.io/wikiextractor. Last accessed
Future work will focus on adopting the same on the 1st October 2018.
methodology to automatically create novel game Marco Baroni, Silvia Bernardini, Adriano Fer-
instances: using the same association-matrix we raresi, and Eros Zanchetta. 2009. The wacky
can choose a random word (the solution) and wide web: a collection of very large linguis-
present the list of 5 clues with high score. tically processed web-crawled corpora. Lan-
In order to make our system easily testable by guage Resources and Evaluation, 43(3):209–
the scientific community and general public, we 226.
have built an interactive version which can be ac- Pierpaolo Basile, Marco de Gemmis, Pasquale
cessed via a Telegram bot4 and on Twitter5 (see Lops, and Giovanni Semeraro. 2016. Solving
Figure 2). a complex language game by using knowledge-
based word associations discovery. IEEE Trans-
actions on Computational Intelligence and AI in
Games, 8(1):13–26.
Pierpaolo Basile, Marco de Gemmis, Lucia Sicil-
iani, and Giovanni Semeraro. 2018. Overview
of the evalita 2018 solving language games
(nlp4fun) task. In Tommaso Caselli, Nicole
Novielli, Viviana Patti, and Paolo Rosso,
editors, Proceedings of the 6th evaluation
campaign of Natural Language Processing
and Speech tools for Italian (EVALITA’18).
CEUR.org, Turin, Italy.
Mathieu Constant, Gülşen Eryiğit, Johanna Monti,
Lonneke Van Der Plas, Carlos Ramisch,
Michael Rosner, and Amalia Todirascu. 2017.
Multiword expression processing: A survey.
Computational Linguistics, 43(4):837–892.
Tullio De Mauro. 2016a. Il Nuovo De Mauro (On-
line). https://dizionario.internazionale.it. Last
accessed on the 1st October 2018.
Tullio De Mauro. 2016b. Il Nuovo vocabolario di
Figure 2: Screenshot of @U NIOR 4NLP twitter re- base della lingua italiana (pdf version). https :
sponse. / / www. internazionale . it / opinione / tullio - de -
mauro/2016/12/23/il- nuovo- vocabolario- di-
base - della - lingua - italiana. Last accessed on
4
the 1st October 2018.
https://t.me/Unior4NLPbot
5
https://twitter.com/UNIOR4NLP Antonio Dige. 2016. Raccolta di proverbi e detti
italiani. http : / / web . tiscali . it / proverbiitaliani.
Downloaded on the 24th April 2018.
David A. Ferrucci, Anthony Levas, Sugato
Bagchi, David Gondek, and Erik T. Mueller.
2013. Watson: Beyond jeopardy! Artif. Intell.,
199:93–105.
Verena Lyding, Egon Stemle, Claudia Borghetti,
Marco Brunello, Sara Castagnoli, Felice
Dell’Orletta, Henrik Dittmann, Alessandro
Lenci, and Vito Pirrelli. 2014. The PAISÀ cor-
pus of italian web texts. In Proceedings of the
9th Web as Corpus Workshop (WaC-9), pages
36–43. Association for Computational Linguis-
tics, Gothenburg, Sweden.
Pavel Pecina. 2010. Lexical association measures
and collocation extraction. Language Resources
and Evaluation, 44(1-2):137–158.
Ivan A Sag, Timothy Baldwin, Francis Bond, Ann
Copestake, and Dan Flickinger. 2002. Mul-
tiword expressions: A pain in the neck for
nlp. In International Conference on Intelligent
Text Processing and Computational Linguistics,
pages 1–15. Springer.
G. Salton, A. Wong, and C. S. Yang. 1975. A vec-
tor space model for automatic indexing. Com-
mun. ACM, 18(11):613–620.
Giovanni Semeraro, Pasquale Lops, Pierpaolo
Basile, and Marco De Gemmis. 2009. On the
tip of my thought: Playing the guillotine game.
In Proceedings of the 21st International Jont
Conference on Artifical Intelligence, IJCAI’09,
pages 1543–1548. Morgan Kaufmann Publish-
ers Inc., San Francisco, CA, USA.
Wikiquote. 2016. Proverbi italiani. https : / / it .
wikiquote . org / wiki / Proverbi _ italiani.
Downloaded on the 24th April 2018.
Georgios N Yannakakis and Julian Togelius. 2018.
Artificial Intelligence and Games. Springer.