<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Exploiting Multiword Expressions to solve “La Ghigliottina”</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Federico Sangati</string-name>
          <email>fsangati@unior.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonio Pascucci</string-name>
          <email>apascucci@unior.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johanna Monti</string-name>
          <email>jmonti@unior.it</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University L'Orientale</institution>
          ,
          <addr-line>Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University L'Orientale</institution>
          ,
          <addr-line>Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University L'Orientale</institution>
          ,
          <addr-line>Naples</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>This paper describes UNIOR4NLP, a system developed to solve the game “La Ghigliottina”, which took part in the NLP4FUN task of the Evalita 2018 evaluation campaign. The system is the best performing one in the competition and achieves better results than human players.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        In this paper we describe UNIOR4NLP, a
system which took part in the NLP4FUN task of the
Evalita 2018 evaluation campaign
        <xref ref-type="bibr" rid="ref4">(Basile et al.,
2018)</xref>
        . The goal of this task is to design a solver
for “La Ghigliottina”, the final game of the
popular Italian TV quiz show “L’Eredità”. The game
involves a single player, who is given a set of five
words (clues), each one linked with an unknown
sixth word that represents the solution to the game.
For example, given the set of clues [fighting, gun,
roof, eater, set], the solution is fire, because “The
Roof Is on Fire” is the title of a famous song, while fire
fighting, fire a gun, fire-eater, and set something
on fire are fixed word constructions.
      </p>
      <p>UNIOR4NLP relies on the assumption that
Multiword Expressions (MWEs) play an
important role in solving the game: given a set of clues,
the system outputs the solution word which forms
the strongest connections with all of the clues.</p>
      <p>The paper is organized as follows: in Section 2
we present related work. In Section 3 we describe
the different steps we took in order to prepare and
tune the UNIOR4NLP system. In Section 4 we
describe our system and its functioning, while results
are presented in Section 5, where we also focus on
error analysis concerning both the data-set of the
NLP4FUN task and our system. Finally,
conclusions and future work are presented in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related work</title>
      <p>
        From the very beginning of Artificial Intelligence
(AI), games have represented an interesting playground
to test the results of research in this field
        <xref ref-type="bibr" rid="ref17">(Yannakakis and Togelius, 2018)</xref>
        . NLP plays an
essential role in solving language-related games, and
recent examples, such as the IBM Watson system
in Jeopardy!™
        <xref ref-type="bibr" rid="ref10">(Ferrucci et al., 2013)</xref>
        , have proven
that its use can result in groundbreaking
technology. An interesting test-bed for this type of
approach is represented by language games, such as
Wheel of Fortune, Who Wants to Be a
Millionaire? and “La Ghigliottina”.
      </p>
      <p>
        The game “La Ghigliottina” is particularly
challenging because its solution is based on modelling
how words are connected to each other. A first
artificial player of the game, OTTHO
        <xref ref-type="bibr" rid="ref15 ref3">(Semeraro et al., 2009; Basile et al., 2016)</xref>
        , exploits i)
resources from the web, such as Wikipedia, to build
a lexicon and a knowledge repository and ii) a
knowledge-base model represented by an
association matrix which stores the degree of
correlation between any two terms in the lexicon. Word
correlations are detected by connecting i) lemmas
to the terms in their dictionary definitions, ii) pairs of
words occurring in a proverb, movie or song title,
and iii) pairs of similar words, by exploiting Vector
Space Models
        <xref ref-type="bibr" rid="ref14">(Salton et al., 1975)</xref>
        .
      </p>
      <p>In our approach, we make use of similar
resources but we only rely on a very limited set
of syntactic constructions (patterns) to correlate
words and build our association matrix.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Solving the Ghigliottina game</title>
      <p>Building an automatic solver for the Ghigliottina
game requires a number of preliminary steps: i)
the analysis of real game instances, ii) the analysis
of patterns that could help the system in solving
the game, iii) the collection of the linguistic
resources necessary to tune the system for the task.</p>
      <sec id="sec-3-1">
        <title>3.1 Analysis of real game instances</title>
        <p>We analyzed a sample of 100 game instances
that we personally collected from the last five
editions of the TV show. We found that in most
cases each clue word is connected to the
solution because they form a Multiword Expression
(MWE). We used this key observation in
designing our system. We started working on our
system before the announcement of the NLP4FUN
task. Since our system is not supervised, the additional
official data-set does not give it any advantage.
After the official data-set was released, we
found that a good number of game instances
confirmed our initial finding. However, we
also observed a number of unusual cases, which
we will discuss in more depth in Section 5.2.</p>
        <p>
          A MWE can be defined as a sequence of words
that presents some characteristic behaviour (at the
lexical, syntactic, semantic, pragmatic or
statistical level) and whose interpretation crosses the
boundaries between words
          <xref ref-type="bibr" rid="ref13">(Sag et al., 2002)</xref>
          .
MWEs have to be considered as lexical items
which convey a single meaning different from the
meanings of the constituents of the MWE, such as
in the idiomatic expression kick the bucket where
the simple addition of the meanings of kick and
bucket does not convey the meaning of to die.
        </p>
        <p>There are different classes of MWEs, such as
idioms (break a leg), verb-particle constructions (to
call off), and light verb constructions (to provoke a
reaction). For a detailed overview of MWEs in
NLP applications we refer the reader to
          <xref ref-type="bibr" rid="ref5">Constant et al. (2017)</xref>
          . For the purpose of the current task
we considered only those MWEs characterized by
the fixed syntactic patterns described in the following
section.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Pattern Analysis</title>
        <p>A first analysis of the tuples from the sample
mentioned above revealed that words in the clues are
typically nouns, verbs, or adjectives, while the
ones in the solutions are typically nouns or
adjectives (never verbs). A more detailed
investigation resulted in the definition of six patterns that
identify valid MWEs connecting clue and solution
pairs. We list them below with some examples
from our data-set:</p>
        <list list-type="simple">
          <list-item><p>A B: diario segreto (‘diary secret’ → secret diary), brutta caduta (‘ugly fall’ → bad fall), permesso premio (‘permit prize’ → good behaviour license), dare gas (‘give gas’ → accelerate).</p></list-item>
          <list-item><p>A det B: dare il permesso (‘give the permit’ → authorize).</p></list-item>
          <list-item><p>A prep B: colpo di coda (‘flick of tail’ → last ditch effort).</p></list-item>
          <list-item><p>A conj B: stima e affetto (esteem and affection).</p></list-item>
          <list-item><p>A prepart B or A prep det B: virtù dei forti, part of the famous Italian proverb La calma è la virtù dei forti (patience is the virtue of the strong).</p></list-item>
          <list-item><p>A+B: compounds such as radio + attività = radioattività (radio + activity = radioactivity).</p></list-item>
        </list>
      </sec>
      <sec id="sec-3-4">
        <title>3.3 Linguistic Resources</title>
        <p>On the basis of the linguistic analysis described
above, we collected the linguistic resources which
we deemed necessary for the task. To this end we
used the following freely available corpora:</p>
        <list list-type="simple">
          <list-item><p>Paisà: a 225M-word corpus, automatically annotated <xref ref-type="bibr" rid="ref11">(Lyding et al., 2014)</xref>.</p></list-item>
          <list-item><p>itWaC: a 1.5B-word corpus, automatically annotated <xref ref-type="bibr" rid="ref2">(Baroni et al., 2009)</xref>.</p></list-item>
          <list-item><p>Wiki-IT-Titles: Wikipedia-IT titles downloaded via WikiExtractor <xref ref-type="bibr" rid="ref1">(Attardi, 2016)</xref>.</p></list-item>
          <list-item><p>Proverbs: 1955 proverbs from <xref ref-type="bibr" rid="ref16">Wikiquote (2016)</xref> and 371 from an online collection <xref ref-type="bibr" rid="ref8">(Dige, 2016)</xref>.</p></list-item>
        </list>
        <p>In addition, we have constructed the following
lexical resources:</p>
        <list list-type="simple">
          <list-item><p>DeMauro-Ext: words extracted from “Il Nuovo vocabolario di base della lingua italiana” <xref ref-type="bibr" rid="ref3 ref6 ref7">(De Mauro, 2016b)</xref>, extended with morphological variations obtained by changing the last vowel of the word and checking whether the resulting word has frequency ≥ 1000 in Paisà.</p></list-item>
          <list-item><p>DeMauro-MWEs: MWEs extracted from the “De Mauro online dictionary” <xref ref-type="bibr" rid="ref3 ref6 ref7">(De Mauro, 2016a)</xref>, composed of 30,633 entries.</p></list-item>
        </list>
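        <p>The morphological extension used for DeMauro-Ext can be illustrated with a minimal sketch. It assumes that the candidate final vowels are a, e, i, o, that the frequency threshold is inclusive, and that corpus frequencies are available as a plain dictionary; the function names and the toy frequencies are hypothetical and are not taken from the released code.</p>
        <preformat><![CDATA[
```python
# Minimal sketch of the last-vowel extension; not the authors' implementation.
VOWELS = "aeio"  # assumption: candidate final vowels


def vowel_variants(word):
    """Return the forms obtained by swapping the final vowel of `word`."""
    if not word or word[-1] not in VOWELS:
        return []
    return [word[:-1] + v for v in VOWELS if v != word[-1]]


def extend_lexicon(base_words, corpus_freq, min_freq=1000):
    """Add every variant whose corpus frequency reaches `min_freq`."""
    extended = set(base_words)
    for word in base_words:
        for variant in vowel_variants(word):
            if corpus_freq.get(variant, 0) >= min_freq:
                extended.add(variant)
    return extended


# Toy frequencies, invented for illustration only.
toy_freq = {"gatti": 5200, "gatta": 1300, "gatte": 40}
lexicon = extend_lexicon({"gatto"}, toy_freq)
print(sorted(lexicon))  # ['gatta', 'gatti', 'gatto']
```
]]></preformat>
        <p>In this toy run, gatte is discarded because its invented frequency is below the threshold, which is how the frequency check filters out implausible forms.</p>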
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 System description</title>
      <p>In order to build our system, we started by
processing the selected corpora via standard
tokenization (only single-word tokens) and removal of
punctuation marks and non-word patterns. We
next constructed two lexical sets: CLEX to cover
the clue words, and SLEX to cover the
solution words. SLEX (composed of 7,942 nouns
and adjectives in DeMauro-Ext) is smaller than
CLEX (composed of 19,414 words from the full
DeMauro-Ext and DeMauro-MWEs) because
solution words are almost always nouns or
adjectives, as described in Section 3.2.</p>
      <p>Secondly, we built a co-occurrence matrix M<sub>c</sub>
which stores the counts c<sub>i,j</sub> for every pair of words
w<sub>i</sub> ∈ SLEX and w<sub>j</sub> ∈ CLEX such that w<sub>i</sub>
co-occurs with w<sub>j</sub> in the resources according to the
patterns described in Section 3.2. Co-occurrence
patterns were extracted from Paisà and itWaC
with weight w = 1, from DeMauro-MWEs with
w = 200, from Proverbs with w = 100, and
from Wiki-IT-Titles with w = 50. The
weights were chosen manually, taking into account
the likelihood that a pattern in a given corpus
represents a valid MWE. Compound patterns (A+B)
were extracted from CLEX: for every word w in
CLEX, if w = ab, a and b are both in CLEX,
and a and b each have at least 4 characters, the count
for the pair (a, b) is incremented by 1 in the
co-occurrence matrix.</p>
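      <p>The construction of the co-occurrence counts can be sketched as follows. The weights are those stated above; everything else is an assumption made for illustration: the resource labels, the function names, the representation of the matrix as a counter keyed by (solution, clue) pairs, and the choice to let either word of a matched pattern play the solution role. Pattern matching itself is assumed to have been done already.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Weights as stated in the text; the dictionary keys are hypothetical labels.
WEIGHTS = {"paisa": 1, "itwac": 1, "demauro_mwes": 200,
           "proverbs": 100, "wiki_it_titles": 50}


def add_pattern_matches(counts, matches, resource, slex, clex):
    """Add weighted counts for word pairs (a, b) already matched by a pattern."""
    weight = WEIGHTS[resource]
    for a, b in matches:
        # Assumption: either word may act as the solution.
        for solution, clue in ((a, b), (b, a)):
            if solution in slex and clue in clex:
                counts[(solution, clue)] += weight


def add_compounds(counts, clex, min_len=4):
    """Count A+B compounds: w = ab with a, b in CLEX and both parts long enough."""
    for word in clex:
        for k in range(min_len, len(word) - min_len + 1):
            a, b = word[:k], word[k:]
            if a in clex and b in clex:
                counts[(a, b)] += 1


# Toy lexicons, for illustration only.
slex = {"fuoco"}
clex = {"fuoco", "vigili", "capo", "stazione", "capostazione"}
counts = Counter()
add_pattern_matches(counts, [("vigili", "fuoco")], "paisa", slex, clex)
add_pattern_matches(counts, [("vigili", "fuoco")], "demauro_mwes", slex, clex)
add_compounds(counts, clex)
print(counts[("fuoco", "vigili")], counts[("capo", "stazione")])  # 201 1
```
]]></preformat>
      <p>A single dictionary entry outweighs many corpus occurrences here (200 against 1), which reflects the stated rationale that curated resources are more likely to contain valid MWEs.</p>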
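      <p>Thirdly, for every pair of words w<sub>i</sub> and w<sub>j</sub>
in M<sub>c</sub>, we populate the association-score matrix
M<sub>pmi</sub> via the Pointwise Mutual Information
measure:</p>
      <disp-formula id="eq-1">
        <label>(1)</label>
        <tex-math><![CDATA[M_{pmi}(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)\, p(w_j)}]]></tex-math>
      </disp-formula>
      <p>where</p>
      <disp-formula id="eq-2">
        <label>(2)</label>
        <tex-math><![CDATA[p(w_i) = \sum_{w_j \in \mathrm{CLEX}} M_c(w_i, w_j)]]></tex-math>
      </disp-formula>
      <disp-formula id="eq-3">
        <label>(3)</label>
        <tex-math><![CDATA[p(w_j) = \sum_{w_i \in \mathrm{SLEX}} M_c(w_i, w_j)]]></tex-math>
      </disp-formula>
      <disp-formula id="eq-4">
        <label>(4)</label>
        <tex-math><![CDATA[p(w_i, w_j) = \frac{M_c(w_i, w_j)}{\sum_{x \in \mathrm{SLEX},\, y \in \mathrm{CLEX}} M_c(x, y)}]]></tex-math>
      </disp-formula>
      <p>Finally, for a given game instance with the 5
clue words G = (w<sub>c1</sub>, w<sub>c2</sub>, w<sub>c3</sub>, w<sub>c4</sub>, w<sub>c5</sub>), we
choose the solution word ŵ<sub>s</sub> ∈ SLEX such that:</p>
      <disp-formula id="eq-5">
        <label>(5)</label>
        <tex-math><![CDATA[\hat{w}_s = \arg\max_{w_s \in \mathrm{SLEX}} \sum_{w_c \in G} M_{pmi}(w_s, w_c)]]></tex-math>
      </disp-formula>
      <p>that is, we choose the word in SLEX which
maximizes the score obtained by summing the pmi
between each clue word and the candidate word. If
two words are never seen co-occurring together in
a pattern in the training corpora, we assign to them
the lowest pmi value in M<sub>pmi</sub>.</p>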
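      <p>The scoring step can be sketched as follows, under stated assumptions. The counts are a toy dictionary keyed by (solution, clue) pairs, and the function names are hypothetical. In this sketch the marginals are divided by the total count so that they are proper probabilities; this is an assumption of the sketch, and it shifts every score by the same constant without changing the ranking of candidates.</p>
      <preformat><![CDATA[
```python
import math
from collections import defaultdict


def pmi_matrix(counts):
    """Map each observed (solution, clue) pair to its pointwise mutual information."""
    total = sum(counts.values())
    row = defaultdict(float)  # marginal counts of solution words
    col = defaultdict(float)  # marginal counts of clue words
    for (s, c), n in counts.items():
        row[s] += n
        col[c] += n
    return {
        (s, c): math.log((n / total) / ((row[s] / total) * (col[c] / total)))
        for (s, c), n in counts.items() if n > 0
    }


def solve(clues, slex, pmi):
    """Return the candidate in `slex` with the highest summed PMI over the clues."""
    floor = min(pmi.values())  # unseen pairs get the lowest PMI in the matrix

    def score(solution):
        return sum(pmi.get((solution, clue), floor) for clue in clues)

    return max(slex, key=score)


# Toy counts, invented for illustration only.
toy_counts = {
    ("fuoco", "vigili"): 8, ("fuoco", "arma"): 4, ("fuoco", "tetto"): 2,
    ("casa", "vigili"): 1, ("casa", "tetto"): 6,
}
pmi = pmi_matrix(toy_counts)
print(solve(["vigili", "arma", "tetto"], {"fuoco", "casa"}, pmi))  # fuoco
```
]]></preformat>
      <p>In the toy run, casa has never been seen with arma, so that pair receives the lowest PMI value in the matrix, and fuoco obtains the higher total.</p>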
      <p>The system has been implemented in Python
and the code is open source.<sup>1</sup> After the matrix has
been loaded into memory, the response time on an
average laptop is around 1-2 seconds.</p>
    </sec>
    <sec id="sec-5">
      <title>5 Results</title>
      <p>According to Basile et al. (2018), UNIOR4NLP
is the best performing system in the Evalita
NLP4FUN task. Table 1 provides the detailed
results, including split results on TV and Board
Game (BG) subsets. The system achieved a very
high performance: in more than half of the games
(64/105) it is able to guess the correct word.</p>
      <p>In an attempt to compare the performance of
our AI system with that of a top player, we
analyzed the games played by Andrea Saccone, who
has been the biggest champion of the
Ghigliottina game so far: he was champion for 13 days
(3-15 March 2018), and he managed to find the
correct solution three times.<sup>2</sup> In comparison,
UNIOR4NLP solved 9 of the same game
instances.</p>
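      <table-wrap id="tab-1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr><th>SET</th><th>SIZE</th><th>MRR</th><th>R@100</th></tr>
          </thead>
          <tbody>
            <tr><td>TEST ALL</td><td>105</td><td>0.64</td><td>0.82</td></tr>
            <tr><td>TEST TV</td><td>66</td><td>0.67</td><td>0.88</td></tr>
            <tr><td>TEST BG</td><td>39</td><td>0.60</td><td>0.72</td></tr>
            <tr><td>DEV ALL</td><td>315</td><td>0.56</td><td>0.80</td></tr>
            <tr><td>DEV TV</td><td>204</td><td>0.61</td><td>0.85</td></tr>
            <tr><td>DEV BG</td><td>111</td><td>0.48</td><td>0.71</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-3">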
        <p>The plot in Figure 1 shows the distributions of
the scores for the correct and missed solutions of
our system on the full set of games in the
development and test sets (420 in total). This allows us
to set a number of confidence values for our
system: if the system returns a solution
with a score S ≥ 10, we can be reasonably
certain (68/69 = 99%) that it has guessed
the correct solution; if 5 ≤ S &lt; 10, we are above
chance level (50/70 = 71%); if 0 ≤ S &lt; 5, we are
at chance level (56/112 = 50%); and if S is below
0, we are below chance level (48/169 = 28%).</p>
        <p>In the development data-set we found several
cases which fall outside the patterns we observed
in our data-set. For instance, we noticed the
presence of digits in some clues or solution words
(1973, 33), game instances with a clue being also
the solution (‘sostanza’, ‘fuori’), and words being
spelled in different ways (‘tenère’, ‘tenere’).</p>
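        <p>The confidence bands derived from Figure 1 earlier in this section can be restated as a small lookup, which also makes the arithmetic easy to check. The band boundaries follow the inequalities as read from the text (S ≥ 10, 5 ≤ S &lt; 10, 0 ≤ S &lt; 5, S &lt; 0); the names BANDS and confidence are hypothetical.</p>
        <preformat><![CDATA[
```python
# (lower bound, upper bound, correct, total) per score band, as reported in the text.
BANDS = [
    (10.0, float("inf"), 68, 69),
    (5.0, 10.0, 50, 70),
    (0.0, 5.0, 56, 112),
    (float("-inf"), 0.0, 48, 169),
]


def confidence(score):
    """Return the observed fraction of correct answers for the band containing `score`."""
    for low, high, correct, total in BANDS:
        if low <= score < high:
            return correct / total
    raise ValueError("score outside all bands")


print(sum(band[3] for band in BANDS))  # 420 games in total
print(round(confidence(12.3), 2))      # 0.99
print(round(confidence(-1.0), 2))      # 0.28
```
]]></preformat>
        <p>The four band sizes add up to the 420 development and test games, so every game falls into exactly one band.</p>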
        <p>Moreover, we also observed a number of ‘clue
- solution’ pairs which are very difficult to relate.
We list below some examples with possible
explanations:</p>
        <list list-type="simple">
          <list-item><p>g - orecchio (g - ear): the letter ‘g’ has the shape of an ear.</p></list-item>
          <list-item><p>classe 1973 - 33 (class 1973 - 33): this game instance was from 2006, and that year people born in 1973 were 33 years old.</p></list-item>
          <list-item><p>...—... - titanic: the clue is the S.O.S. signal in Morse code.</p></list-item>
        </list>
        <p>One possible reason for these inconsistent cases
is that the Board Game edition uses slightly different
criteria to correlate words,<fn id="fn3"><label>3</label><p>This is supported by results in Table 1, where Board Game results are lower than those from the TV set.</p></fn> and that those from
the TV set date back to the very first editions of the
TV game (when correlation criteria were
probably not yet well defined).</p>
        <sec id="sec-5-3-1">
          <title>5.2 System error analysis</title>
          <p>In this section we analyze some types of errors that
our system makes, and we provide some
suggestions for possible improvements.</p>
          <p>
            Word similarity. Although quite rare, a few of the
clue-solution links can be explained by a
similarity relation. An example is the clue-solution
pair sincero-franco (sincere-frank). Such links are not
easily captured by patterns of the types described
in Section 3.2, but could be included by means
of automatic detection of word similarity via
Vector Space Models
            <xref ref-type="bibr" rid="ref14">(Salton et al., 1975)</xref>
            , as done in
Basile et al. (2016).
          </p>
          <p>Missing words. As explained in Section 3.2, we
restricted the set of words in the solution set. This
choice, while helping the system to restrict the
search space, leads to some coverage issues. For
instance, pennello (brush) is one of the solutions
of the games in the test data that is not present in our
solution set. In the future we would like to
experiment with increasing the size of the solution set while
avoiding performance and memory problems.</p>
          <p>Wrong PoS. Our system analyzes words in their
surface form, so it cannot distinguish cases where
the same word-form can have multiple Parts of
Speech (PoS) with different meanings. To avoid
this problem we could envision a system which
takes PoS and word-sense disambiguation into
consideration.</p>
          <p>Multiword clues. Although the great majority of
the clues consist of a single word, there are
a few exceptions (typically names of saints). The
current system considers only single-word tokens,
so if a game has a 2-word clue, it is regarded as
two separate clues (their contributions are then
averaged to obtain the final score). The system could be
improved by using a tokenizer which keeps
specific types of bigrams connected.</p>
          <p>
            Association metrics. As described in Section 4,
we compute the association score between any
pair of words in the matrix via the Pointwise
Mutual Information measure (pmi). There is
a large number of alternative measures
            <xref ref-type="bibr" rid="ref12">(Pecina, 2010)</xref>
            that might lead to higher performance.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusions and future work</title>
      <p>In this paper we described UNIOR4NLP, an
artificial player of “La Ghigliottina”, a challenging
game which requires linguistic knowledge to be
solved. We described the preliminary steps that we
took before developing our system (identifying
linguistic patterns that are relevant in the game)
as well as the algorithms and the methodology we
adopted. The system achieved a high performance,
but we believe that with further tuning it can still
be improved.</p>
      <p>Future work will focus on adopting the same
methodology to automatically create novel game
instances: using the same association matrix, we
can choose a random word (the solution) and
present a list of 5 clues with high scores.</p>
      <p>In order to make our system easily testable by
the scientific community and the general public, we
have built an interactive version which can be
accessed via a Telegram bot<fn id="fn4"><label>4</label><p>https://t.me/Unior4NLPbot</p></fn> and on Twitter<fn id="fn5"><label>5</label><p>https://twitter.com/UNIOR4NLP</p></fn> (see
Figure 2).</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research has been partly supported by the
PON Ricerca e Innovazione 2014/20 fund.
Authorship contribution is as follows: Johanna Monti
is author of Sections 1, 2, 3.3 and 6; Federico
Sangati of Sections 4 and 5; and Antonio Pascucci of
Sections 3.1 and 3.2.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Giuseppe</given-names>
            <surname>Attardi</surname>
          </string-name>
          .
          <year>2016</year>
          . Wikiextractor. http://attardi.github.io/wikiextractor.
          <source>Last accessed on the 1st October</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Baroni</surname>
          </string-name>
          , Silvia Bernardini, Adriano Ferraresi, and
          <string-name>
            <given-names>Eros</given-names>
            <surname>Zanchetta</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The wacky wide web: a collection of very large linguistically processed web-crawled corpora</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>43</volume>
          (
          <issue>3</issue>
          ):
          <fpage>209</fpage>
          -
          <lpage>226</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Marco de Gemmis, Pasquale Lops, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Solving a complex language game by using knowledge-based word associations discovery</article-title>
          .
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          ,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Marco de Gemmis, Lucia Siciliani, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Overview of the EVALITA 2018 Solving Language Games (NLP4FUN) task</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          . CEUR.org, Turin, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Mathieu</given-names>
            <surname>Constant</surname>
          </string-name>
          , Gülşen Eryiğit, Johanna Monti, Lonneke Van Der Plas, Carlos Ramisch,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Rosner</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Amalia</given-names>
            <surname>Todirascu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Multiword expression processing: A survey</article-title>
          .
          <source>Computational Linguistics</source>
          ,
          <volume>43</volume>
          (
          <issue>4</issue>
          ):
          <fpage>837</fpage>
          -
          <lpage>892</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Tullio</given-names>
            <surname>De Mauro</surname>
          </string-name>
          . 2016a. Il Nuovo De Mauro (Online). https://dizionario.internazionale.it.
          <source>Last accessed on the 1st October</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Tullio</given-names>
            <surname>De Mauro</surname>
          </string-name>
          . 2016b.
          <article-title>Il Nuovo vocabolario di base della lingua italiana (pdf version)</article-title>
          . https://www.internazionale.it/opinione/tullio-de-mauro/2016/12/23/il-nuovo-vocabolario-di-base-della-lingua-italiana.
          <source>Last accessed on the 1st October</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Antonio</given-names>
            <surname>Dige</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Raccolta di proverbi e detti italiani</article-title>
          . http://web.tiscali.it/proverbiitaliani.
          <source>Downloaded on the 24th April</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>David A.</given-names>
            <surname>Ferrucci</surname>
          </string-name>
          , Anthony Levas, Sugato Bagchi, David Gondek, and
          <string-name>
            <given-names>Erik T.</given-names>
            <surname>Mueller</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Watson: Beyond Jeopardy!</article-title>
          <source>Artif. Intell.</source>
          ,
          <volume>199</volume>
          :
          <fpage>93</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Verena</given-names>
            <surname>Lyding</surname>
          </string-name>
          , Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, and
          <string-name>
            <given-names>Vito</given-names>
            <surname>Pirrelli</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>The PAISÀ corpus of Italian web texts</article-title>
          .
          <source>In Proceedings of the 9th Web as Corpus Workshop (WaC-9)</source>
          , pages
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          . Association for Computational Linguistics, Gothenburg, Sweden.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Pavel</given-names>
            <surname>Pecina</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Lexical association measures and collocation extraction</article-title>
          .
          <source>Language Resources and Evaluation</source>
          ,
          <volume>44</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>137</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Ivan A.</given-names>
            <surname>Sag</surname>
          </string-name>
          , Timothy Baldwin, Francis Bond, Ann Copestake, and
          <string-name>
            <given-names>Dan</given-names>
            <surname>Flickinger</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Multiword expressions: A pain in the neck for NLP</article-title>
          .
          <source>In International Conference on Intelligent Text Processing and Computational Linguistics</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          . Springer.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>G.</given-names>
            <surname>Salton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C. S.</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>1975</year>
          .
          <article-title>A vector space model for automatic indexing</article-title>
          .
          <source>Commun. ACM</source>
          ,
          <volume>18</volume>
          (
          <issue>11</issue>
          ):
          <fpage>613</fpage>
          -
          <lpage>620</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , Pasquale Lops, Pierpaolo Basile, and Marco De Gemmis.
          <year>2009</year>
          .
          <article-title>On the tip of my thought: Playing the guillotine game</article-title>
          .
          <source>In Proceedings of the 21st International Joint Conference on Artificial Intelligence, IJCAI'09</source>
          , pages
          <fpage>1543</fpage>
          -
          <lpage>1548</lpage>
          . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <surname>Wikiquote</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Proverbi italiani</article-title>
          . https://it.wikiquote.org/wiki/Proverbi_italiani.
          <source>Downloaded on the 24th April</source>
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
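          <string-name>
            <given-names>Georgios N.</given-names>
            <surname>Yannakakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>Julian</given-names>
            <surname>Togelius</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <source>Artificial Intelligence and Games</source>
          . Springer.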
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>