=Paper= {{Paper |id=Vol-2263/paper044 |storemode=property |title=Exploiting Multiword Expressions to Solve “La Ghigliottina” |pdfUrl=https://ceur-ws.org/Vol-2263/paper044.pdf |volume=Vol-2263 |authors=Federico Sangati,Antonio Pascucci,Johanna Monti |dblpUrl=https://dblp.org/rec/conf/evalita/SangatiPM18 }} ==Exploiting Multiword Expressions to Solve “La Ghigliottina”== https://ceur-ws.org/Vol-2263/paper044.pdf
          Exploiting Multiword Expressions to solve “La Ghigliottina”

       Federico Sangati                    Antonio Pascucci                     Johanna Monti
     University L’Orientale              University L’Orientale               University L’Orientale
         Naples, Italy                       Naples, Italy                        Naples, Italy
    fsangati@unior.it                   apascucci@unior.it                    jmonti@unior.it



                     Abstract                             the different steps we took in order to prepare and
                                                          tune the U NIOR 4NLP system. In Section 4 we de-
    English.         The paper describes                  scribe our system and its functioning, while results
    U NIOR 4NLP a system developed to                     are presented in Section 5, where we also focus on
    solve “La Ghigliottina” game which took               error analysis concerning both the data-set of the
    part in the NLP4FUN task of the Evalita               NLP4FUN task and our system. Finally, conclu-
    2018 evaluation campaign. The system is               sions and future work are presented in Section 6.
    the best performing one in the competition
    and achieves better results than human
    players.
                                                          2   Related work

    Italiano. Il contributo descrive il sistema           From the very beginning of Artificial Intelligence
    U NIOR 4NLP, sviluppato per risolvere il              (AI) games represented an interesting playground
    gioco “La Ghigliottina”, che ha parteci-              to test the results of research in this field (Yan-
    pato alla sfida NLP4FUN della campagna                nakakis and Togelius, 2018). NLP plays an es-
    di valutazione Evalita 2018. Il sistema               sential role in solving language related games and
    risulta il migliore della competizione e ha           recent examples, such as the IBM Watson system
    prestazioni più elevate rispetto agli umani.          in Jeopardy!TM (Ferrucci et al., 2013), have proven
                                                          that its use can result in groundbreaking technol-
                                                          ogy. An interesting test-bed for this type of ap-
1   Introduction                                          proach is represented by language games, such as
In this paper we describe U NIOR 4NLP, a sys-             the Wheel of Fortune, Who Wants to be a Million-
tem which took part in the NLP4FUN task of the            aire? and “La Ghigliottina”.
Evalita 2018 evaluation campaign (Basile et al.,             The game “La Ghigliottina” is particularly chal-
2018). The goal of this task is to design a solver        lenging because its solution is based on modelling
for “La Ghigliottina”, the final game of the pop-         how words are connected to each other. A first
ular Italian TV quiz show “L’Eredità”. The game           artificial player of the game, OTTHO (Semeraro
involves a single player, who is given a set of five      et al., 2009; Basile et al., 2016) exploits i) re-
words (clues), each one linked with an unknown            sources from the web such as Wikipedia to build
sixth word that represents the solution to the game.      a lexicon and a knowledge repository and ii) a
For example, given the set of clues [ fighting, gun,      knowledge base modeling represented by an as-
roof, eater, set ] the solution is fire, because: the     sociation matrix which stores the degree of corre-
roof is on fire is a title of a famous song, while fire   lation between any two terms in the lexicon. Word
fighting, fire a gun, fire-eater, and set something       correlations are detected by connecting i) lemmas
on fire are fixed word constructions.                     to the terms in its dictionary definition, pair of
   U NIOR 4NLP relies on the assumption that              words occurring in a proverb, movie or song title,
Multiword Expressions (MWEs) play an impor-               and iii) pair of similar words by exploiting Vector
tant role in solving the game: given a set of clues,      Space Models (Salton et al., 1975).
the system outputs the solution word which forms             In our approach, we make use of similar re-
the strongest connections with all of the clues.          sources but we only rely on a very limited set
   The paper is organized as follows: in Section 2        of syntactic constructions (patterns) to correlate
we present related work. In Section 3 we describe         words and build our association matrix.
3     Solving the Ghigliottina game                     tion resulted in the definition of six patterns that
                                                        identify valid MWEs connecting clue and solution
Building an automatic solver for the Ghigliottina       pairs. We list them below with some examples
game requires a number of preliminary steps: i)         from our data-set (solution words are underlined):
the analysis of real game instances, ii) the analysis
of patterns that could help the system in solving        A B: diario segreto (‘diary secret’ → secret di-
the game, iii) the collection of the linguistic re-         ary), brutta caduta (‘ugly fall’ → bad fall),
sources necessary to tune the system for the task.          permesso premio (‘permit price’ → good be-
                                                            haviour license), dare gas (‘give gas’ → ac-
3.1    Analysis of real game instances                      celerate).
We have analyzed a sample of 100 game instances          A det B: dare il permesso (‘give the permit’ →
that we personally collected from the last five edi-        authorize).
tions of the TV show. We found out that in most
cases each clue word is connected to the solu-           A prep B: colpo di coda (‘flick of tail’ → last
tion because they form a Multiword Expression               ditch effort).
(MWE). We have used this key observation in de-          A conj B: stima e affetto (esteem and affection).
signing our system. We started working on our
system before the announcement of the NLP4FUN            A prepart B or A prep det B: virtù dei forti,
task. Since our system is not supervised, the extra         part of the famous Italian proverb La calma
data-set is not adding any advantage to our sys-            è la virtù dei forti (patience is the virtue of
tem. After the official data-set was released, we           the strong).
found out that a good number of game instances           A+B: compounds such as radio + attività = ra-
was confirming our initial finding. However we             dioattività (radio + activity = radioactivity).
also observed a number of unusual cases which
will discuss in more depth in Section 5.2.              3.3   Linguistic Resources
   A MWE can be defined as a sequence of words          On the basis of the linguistic analysis described
that presents some characteristic behaviour (at the     above, we collected the linguistic resources which
lexical, syntactic, semantic, pragmatic or statis-      we deemed necessary for the task. To this end we
tical level) and whose interpretation crosses the       used the following freely available corpora:
boundaries between words (Sag et al., 2002).            Paisà: 225 M words corpus automatically anno-
MWEs have to be considered as lexical items                tated (Lyding et al., 2014).
which convey a single meaning different from the
meanings of the constituents of the MWE, such as        itWaC: 1.5 B words corpus automatically anno-
in the idiomatic expression kick the bucket where          tated (Baroni et al., 2009)
the simple addition of the meanings of kick and         Wiki-IT-Titles: Wikipedia-IT titles down-
bucket does not convey the meaning of to die.              loaded via WikiExtractor (Attardi, 2016).
   We have different classes of MWEs, such as id-
ioms (break a leg), verb particle constructions (to     Proverbs: 1955 proverbs from Wikiquote
call off ), light verbs constructions (to provoke a        (2016) and 371 from an online collection
reaction). For a detailed overview of MWEs in              (Dige, 2016).
NLP applications we refer the reader to Constant           In addition, we have constructed the following
et al. (2017). For the purpose of the current task      lexical resources:
we considered only those MWEs characterized by
                                                        DeMauro-Ext: words extracted from “Il Nuovo
fixed syntactic patterns described in the following
                                                           vocabolario di base della lingua italiana”(De
section.
                                                           Mauro, 2016b), extended with morphological
3.2    Pattern Analysis                                    variations obtained by changing last vowel of
                                                           the word and checking if the resulting word
A first analysis of the tuples from the sample men-
                                                           has frequency ≥ 1000 in Paisà.
tioned above revealed that words in the clues are
typically nouns, verbs, or adjectives, while the        DeMauro-MWEs: MWEs extracted from the
ones in the solutions are typically nouns or ad-           “De Mauro online dictionary” (De Mauro,
jectives (never verbs). A more detailed investiga-         2016a) composed of 30,633 entries.
4   System description
In order to build our system, we started process-                                    X
                                                                  w
                                                                  cs =     max               Mpmi (ws , wc )    (5)
ing the selected corpora via standard tokeniza-                           ws ∈SLEX
                                                                                     wc ∈G
tion (only single word tokens) and removal of
punctuation marks and non-word patterns. We             that is, we choose the word in SLEX which maxi-
next constructed two lexical sets: CLEX to cover        mizes the score obtained by summing the pmi be-
the clue words, and SLEX to cover the solu-             tween each clue word and the candidate word. If
tion words. SLEX (composed of 7,942 nouns               two words are never seen co-occurring together in
and adjectives in DeMauro-Ext) is smaller than          a pattern in the training corpora, we assign to them
CLEX (composed of 19,414 words from the full            the lowest pmi value in Mpmi .
DeMauro-Ext and DeMauro-MWEs) because                      The system has been implemented in Python
solution words are almost always nouns or adjec-        and the code is open source.1 After the matrix has
tives as described in Section 3.2.                      been loaded into memory the response time on an
   Secondly, we built a co-occurrence matrix Mc         average laptop is around 1-2 seconds.
which stores the counts ci,j for every pair of words
wi ∈ SLEX and wj ∈ CLEX such that wi co-                5       Results
occurs with wj in the resources according to pat-
terns described in Section 3.2. Co-occurrence           According to Basile et al. (2018), U NIOR 4NLP
patterns were extracted from Paisà and itWaC            is the best performing system in the Evalita
with weight w = 1, from DeMauro-MWE with                NLP4FUN task. Table 1 provides the detailed
w = 200, from Proverbs with w = 100, and                results, including split results on TV and Board
from Wiki-IT-Titles with w = 50. The                    Game (BG) subsets. The system achieved a very
weight were chosen manually taking into account         high performance: in more than half of the games
the likelihood that a pattern in a given corpus rep-    (64/105) it is able to guess the correct word.
resented a valid MWE. Compound patterns (A+B)              In the attempt to compare the performance of
were extracted from CLEX : for every word w in          our AI system with that of a top player we ana-
CLEX if w = ab, a and b are both in CLEX ,              lyzed the games played by Andrea Saccone, who
and a and b have at least 4 characters, the count       has been the biggest champion of the Ghigliot-
for the pair (a, b) is incremented by 1 in the co-      tina game so far: he was champion for 13 days
occurrence matrix.                                      (3-15 March 2018), and he managed to find the
   Thirdly, for every pair of words wi and wj           correct solution three times.2 In comparison,
in Mc , we populate the association-score matrix        U NIOR 4NLP was able to win the same game in-
Mpmi via the Pointwise Mutual Information mea-          stances 9 times.
sure:
                                                                    SET         SIZE         MRR     R@100
                                  p(wi , wj )                    TEST ALL        105         0,64      0.82
        Mpmi (wi , wj ) = log                     (1)
                                p(wi ) · p(wj )
                                                                 TEST TV         66          0.67      0.88
where                                                            TEST BG         39          0.60      0.72
                                                                 DEV ALL         315         0.56      0.80
                       X
          p(wi ) =              Mc (wi , wj )     (2)
                     wj ∈CLEX                                    DEV TV          204         0.61      0.85
                                                                 DEV BG          111         0.48      0.71
                       X
          p(wj ) =              Mc (wi , wj )     (3)
                     wi ∈SLEX
                                                        Table 1: Results on the TEST and DEV set. Eval-
                         Mc (wi , wj )                  uations are the MRR (Mean Reciprocal Rank) and
        p(wi , wj ) = P                           (4)
                        x∈SLEX Mc (x, y)                R@100 (recall at 100).
                         y∈CLEX
                                                            1
   Finally, for a given game instance with the 5            https://gitlab.com/kercos/ghigliottina
                                                            2
                                                            The players who reach the “Ghigliottina” game (the
clue words G = (wc1 , wc2 , wc3 , wc4 , wc5 ), we       champion) continue to participate in the subsequent episodes
choose the solution word wcs ∈ SLEX such that:          even if they do not guess the solution word.
   The plot in Figure 1 shows the distributions of         the TV set date back to the very first editions of the
the scores for the correct and missed solutions of         TV game (when correlation criteria where proba-
our system on the full set of games in the devel-          bly not yet well defined).
opment and test set (420 in total). This allows us
to set a number of confidence values for our sys-          5.2   System error analysis
tem: if the system returns a solution of a game            In this section we analyze some types of errors that
with a score S ≥ 10 we can be reasonably cer-              our system makes, and we provide some sugges-
tain (68/69 = 99%) that the system has guessed             tion for possible improvement.
the correct solution, if 5 ≤ S < 10 we are above
chance level (50/70 = 71%), if 0 ≤ S < 5 we are            Word similarity Although quite rare, few of the
at chance level (56/112 = 50%) and if S is below           clue-solution links can be explained by the sim-
0, we are below chance level (48/169 = 28%).               ilarity relation. An example is the clue-solution
                                                           pair sincero-franco (sincere-frank). Those are not
                                                           easily captured by patterns of the types described
                                                           in Section 3.2, but could be included by means
                                                           of automatic detection of word similarity via Vec-
                                                           tor Space Models (Salton et al., 1975) as done in
                                                           Basile et al. (2016).

                                                           Missing words As explained in Section 3.2, we
                                                           restricted the set of words in the solution set. This
                                                           choice, while helping the system to restrict the
Figure 1: Distribution of the correct and missed           search space, leads to some coverage issues. For
solutions with respect to their score.                     instance, pennello (brush) is one of the solutions
                                                           of the games in the test data not present in our so-
                                                           lution set. In the future we would like to experi-
5.1   Data-set analysis
                                                           ment increasing the size of the solution set while
In the development data-set we found several               avoiding performance and memory problems.
cases which fall outside the patterns we observed
in out data-set. For instance, we noticed the pres-        Wrong PoS Our system analyzes words in their
ence of digits in some clues or solution words             surface form, so it cannot distinguish cases where
(1973, 33), game instances with a clue being also          the same word-form can have multiple Part of
the solution (‘sostanza’, ‘fuori’), and words being        Speech (PoS) (with different meaning). To avoid
spelled in different ways (‘tenère’, ‘tenere’).            this problem we could envision a system which
   Moreover, we also observed a number of ‘clue            takes PoS and word-sense disambiguation into
- solution’ pairs which are very difficult to relate.      consideration.
We list below some examples with some possible
explanation:                                               Multiword clues Although the great majority of
                                                           the clues are constituted by a single word, there are
  • g - orecchio: (g - ear) the letter ‘g’ has the         a few exceptions (typically names of saints). The
    shape of a ear.                                        current system considers only single-word tokens,
                                                           so if a game has a 2-word clue, it is regarded as
  • classe 1973 - 33: (class 1973 - 33) this game
                                                           two separate clues (their contribution is then aver-
    instance was from 2006, and that year people
                                                           age to obtain the final score). The system could be
    born in 1973 were 33 years old.
                                                           optimized by using a tokenizer which keeps spe-
  • ...—... - titanic: the clue being the S.O.S. bea-      cific types of bigrams connected.
    con in morse code.
                                                           Association metrics As described in Section 4,
   One possible reason for these inconsistent cases        we compute the association score between any
is that Board Game edition use slightly different          pair of words in the matrix via the Pointwise Mu-
criteria to correlate words,3 and that those from          tual Information measure (pmi). There is still a
  3
    This is supported by results in Table 1, where Board   big number of alternative measures (Pecina, 2010)
Game results are lower than those from the TV set.         that might lead to higher performance.
6       Conclusions and future work                    Acknowledgments
                                                       This research has been partly supported by the
In this paper we described U NIOR 4NLP, an ar-
                                                       PON Ricerca e Innovazione 2014/20 fund. Au-
tificial player of “La Ghigliottina”, a challenging
                                                       thorship contribution is as follows: Johanna Monti
game which requires linguistic knowledge to be
                                                       is author of Sections 1, 2, 3.3 and 6; Federico San-
solved. We described the preliminary steps that we
                                                       gati of Section 4 and 5, and Antonio Pascucci of
made before developing our system (identifying
                                                       Sections 3.1. and 3.2.
linguistic patterns that are relevant in the game)
as well as the algorithms and the methodology we
                                                       References
adopted. The system achieved a high performance
but we believe that with further tuning it can still   Giuseppe Attardi. 2016. Wikiextractor. http :
be improved.                                             //attardi.github.io/wikiextractor. Last accessed
   Future work will focus on adopting the same           on the 1st October 2018.
methodology to automatically create novel game         Marco Baroni, Silvia Bernardini, Adriano Fer-
instances: using the same association-matrix we         raresi, and Eros Zanchetta. 2009. The wacky
can choose a random word (the solution) and             wide web: a collection of very large linguis-
present the list of 5 clues with high score.            tically processed web-crawled corpora. Lan-
  In order to make our system easily testable by        guage Resources and Evaluation, 43(3):209–
the scientific community and general public, we         226.
have built an interactive version which can be ac-     Pierpaolo Basile, Marco de Gemmis, Pasquale
cessed via a Telegram bot4 and on Twitter5 (see          Lops, and Giovanni Semeraro. 2016. Solving
Figure 2).                                               a complex language game by using knowledge-
                                                         based word associations discovery. IEEE Trans-
                                                         actions on Computational Intelligence and AI in
                                                         Games, 8(1):13–26.
                                                       Pierpaolo Basile, Marco de Gemmis, Lucia Sicil-
                                                         iani, and Giovanni Semeraro. 2018. Overview
                                                         of the evalita 2018 solving language games
                                                         (nlp4fun) task. In Tommaso Caselli, Nicole
                                                         Novielli, Viviana Patti, and Paolo Rosso,
                                                         editors, Proceedings of the 6th evaluation
                                                         campaign of Natural Language Processing
                                                         and Speech tools for Italian (EVALITA’18).
                                                         CEUR.org, Turin, Italy.
                                                       Mathieu Constant, Gülşen Eryiğit, Johanna Monti,
                                                        Lonneke Van Der Plas, Carlos Ramisch,
                                                        Michael Rosner, and Amalia Todirascu. 2017.
                                                        Multiword expression processing: A survey.
                                                        Computational Linguistics, 43(4):837–892.
                                                       Tullio De Mauro. 2016a. Il Nuovo De Mauro (On-
                                                         line). https://dizionario.internazionale.it. Last
                                                         accessed on the 1st October 2018.
                                                       Tullio De Mauro. 2016b. Il Nuovo vocabolario di
Figure 2: Screenshot of @U NIOR 4NLP twitter re-         base della lingua italiana (pdf version). https :
sponse.                                                  / / www. internazionale . it / opinione / tullio - de -
                                                         mauro/2016/12/23/il- nuovo- vocabolario- di-
                                                         base - della - lingua - italiana. Last accessed on
    4
                                                         the 1st October 2018.
        https://t.me/Unior4NLPbot
    5
        https://twitter.com/UNIOR4NLP                  Antonio Dige. 2016. Raccolta di proverbi e detti
  italiani. http : / / web . tiscali . it / proverbiitaliani.
  Downloaded on the 24th April 2018.
David A. Ferrucci, Anthony Levas, Sugato
  Bagchi, David Gondek, and Erik T. Mueller.
  2013. Watson: Beyond jeopardy! Artif. Intell.,
  199:93–105.
Verena Lyding, Egon Stemle, Claudia Borghetti,
  Marco Brunello, Sara Castagnoli, Felice
  Dell’Orletta, Henrik Dittmann, Alessandro
  Lenci, and Vito Pirrelli. 2014. The PAISÀ cor-
  pus of italian web texts. In Proceedings of the
  9th Web as Corpus Workshop (WaC-9), pages
  36–43. Association for Computational Linguis-
  tics, Gothenburg, Sweden.
Pavel Pecina. 2010. Lexical association measures
  and collocation extraction. Language Resources
  and Evaluation, 44(1-2):137–158.
Ivan A Sag, Timothy Baldwin, Francis Bond, Ann
  Copestake, and Dan Flickinger. 2002. Mul-
  tiword expressions: A pain in the neck for
  nlp. In International Conference on Intelligent
  Text Processing and Computational Linguistics,
  pages 1–15. Springer.
G. Salton, A. Wong, and C. S. Yang. 1975. A vec-
  tor space model for automatic indexing. Com-
  mun. ACM, 18(11):613–620.
Giovanni Semeraro, Pasquale Lops, Pierpaolo
  Basile, and Marco De Gemmis. 2009. On the
  tip of my thought: Playing the guillotine game.
  In Proceedings of the 21st International Jont
  Conference on Artifical Intelligence, IJCAI’09,
  pages 1543–1548. Morgan Kaufmann Publish-
  ers Inc., San Francisco, CA, USA.
Wikiquote. 2016. Proverbi italiani. https : / / it .
 wikiquote . org / wiki / Proverbi _ italiani.
 Downloaded on the 24th April 2018.
Georgios N Yannakakis and Julian Togelius. 2018.
  Artificial Intelligence and Games. Springer.