<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Overview of the EVALITA 2018 Solving language games (NLP4FUN) Task</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco de Gemmis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lucia Siciliani</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science Via E. Orabona, 4 - 70125 Bari University of Bari Aldo Moro</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. This paper describes the first edition of the “Solving language games” (NLP4FUN) task at the EVALITA 2018 campaign. The task consists in designing an artificial player for “The Guillotine” (La Ghigliottina, in Italian), a challenging language game which demands knowledge covering a broad range of topics. The game consists in finding a word which is semantically correlated with a set of 5 words called clues. Artificial players for that game can take advantage from the availability of open repositories on the web, such as Wikipedia, that provide the system with the cultural and linguistic background needed to find the solution.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Italiano. Questo lavoro descrive la
prima edizione del task “Solving
language games” (NLP4FUN) task,
proposto durante la campagna di valutazione
EVALITA 2018. Il task consiste nella
realizzazione di un giocatore artificiale
per “La Gigliottina”, un gioco linguistico
molto sfidante, la cui soluzione richiede
conoscenze in svariati campi. Il gioco
consiste nel trovare una parola il cui
significato e` correlato a quello di un insieme
di 5 parole, chiamate indizi. Un
giocatore artificiale per questo task potrebbe
sfruttare diverse sorgenti di conoscenza
disponibili online, come Wikipedia, che
forniscano al sistema le conoscenze
linguistiche e culturali necessarie per
arrivare alla soluzione.
Language games draw their challenge and
excitement from the richness and ambiguity of natural
language, and therefore have attracted the
attention of researchers in the fields of Artificial
Intelligence and Natural Language Processing. For
instance, IBM Watson is a system which
successfully challenged human champions of Jeopardy!,
a game in which contestants are presented with
clues in the form of answers, and must phrase their
responses in the form of a question
        <xref ref-type="bibr" rid="ref4 ref6">(Ferrucci et
al., 2010; Molino et al., 2015)</xref>
        . Another popular
language game is solving crossword puzzles. The
first experience reported in the literature is Proverb
        <xref ref-type="bibr" rid="ref5">(Littman et al., 2002)</xref>
        , that exploits large libraries
of clues and solutions to past crossword puzzles.
WebCrow is the first solver for Italian crosswords
        <xref ref-type="bibr" rid="ref3">(Ernandes et al., 2008)</xref>
        .
      </p>
      <p>
        The proposed task consists in designing a solver
for “The Guillotine” (La Ghigliottina, in Italian)
game. It is inspired by the final game of an Italian
TV show called “L’eredita`”. The game, broadcast
by Italian National TV, involves a single player,
who is given a set of five words - the clues - each
linked in some way to a specific word that
represents the unique solution of the game. Words
are unrelated to each other, but each of them has
a hidden association with the solution. Once the
clues are given, the player has one minute to find
the solution. For example, given the five clues:
sin, Newton, doctor, New York, bad, the solution
is apple, because: the apple is the symbol of
original sin in Christian theology; Newton discovered
the gravity by means of an apple; “an apple a day
keeps the doctor away” is a famous proverb; New
York city is also called “the big apple”; and “one
bad apple can spoil the whole bunch” is a
popular phrase which figuratively means that the
person doing wrong can have a negative influence on
those around him. “La Ghigliottina” is a
challenging language game which demands
knowledge covering a broad range of topics. Artificial
players for that game can take advantage from the
availability of open repositories on the web, such
as Wikipedia, that provide the system with the
cultural and linguistic background needed to
understand clues
        <xref ref-type="bibr" rid="ref1 ref8 ref9">(Basile et al., 2016; Semeraro et al.,
2009; Semeraro et al., 2012)</xref>
        .
      </p>
      <p>
        The task is part of EVALITA 2018, the
periodic evaluation campaign of Natural Language
Processing (NLP) and speech tools for the Italian
language
        <xref ref-type="bibr" rid="ref2">(Caselli et al., 2018)</xref>
        .
      </p>
      <p>The paper is organized as follows: Section 2
reports details about the task, the dataset and the
evaluation protocol, while Section 3 describes the
systems participating in the task, and Section 4
shows results.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Task Description: Dataset, Evaluation</title>
    </sec>
    <sec id="sec-3">
      <title>Protocol and Measures</title>
      <p>An instance of the game consists of a set of 5 clue
words and 1 word given as the official solution for
that instance. We provided:
• a training set for the system development,
containing 315 instances of the game;
• a test set for the evaluation, containing 105
instances of the game.</p>
      <p>In order to measure the performance of the
participants on games having different levels of
difficulty, we provided instances taken both from the
TV game and from the official board game. In the
training set, 204 instances (64.8%) came from the
TV game, 111 (35.2%) from the board game. In
the test set, 66 instances (62.9%) were collected
from the TV game, 39 (37.1%) from the board
game. In order to discourage participants from
cheating (e.g. finding the solution manually), in
the test set we included 300 fake games
automatically created by us. Obviously, fake games were
not taken into account in the evaluation.</p>
      <p>Any knowledge resource can be used to build
an artificial player, except further instances of the
game. For each instance of the game, a ranked
list of maximum 100 tentative solutions must be
provided.
2.1</p>
    </sec>
    <sec id="sec-4">
      <title>Data Format</title>
      <p>Both development and test set were provided in
XML format:
&lt;games&gt;
&lt;game&gt;
&lt;id&gt;3fc953bd...&lt;/id&gt;
&lt;clue&gt;uomo&lt;/clue&gt;</p>
      <p>The XML file consists of a root element games
which contains several game elements. Each game
has five clue elements and one solution. Moreover,
the element type specifies the type of the game: TV
or boardgame.</p>
      <p>The ranked list of solutions must be provided in
a single plain text file, according to the following
format:
id solution score rank time</p>
      <p>Values were separated by a whitespace
character; time taken by the system to compute the list
was also reported in milliseconds. An example of
a ranked list of solutions is reported below:
3fc953bd-... porta 0.978 1 3459
3fc953bd-... chiesa 0.932 2 3251
3fc953bd-... santo 0.897 3 4321
...
3fc953bd-... carta 0.321 100 2343
...
2.2</p>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>As evaluation measure, we adopt a weighted
version of Mean Reciprocal Rank (MRR). Since time
is a critical factor in this game, the Reciprocal
Rank is weighted by a function which lowers the
score based on the time taken by the computation.
In fact, in the TV game, the player has only one
minute to provide the solution. Taking into
account these factors, the evaluation measure was:
1 X 1 max( 1 , 1 )
|G| g∈G rg tg 10
(1)
where G is the set of games and rg is the rank
of the solution, while tg denotes the minutes taken
by the system to give the tentative solutions.
Systems that took more than 10 minutes are equally
penalized.</p>
      <p>The evaluation was performed only on the 105
test games, for which we knew the correct solution
(results provided for fake games were excluded).</p>
      <p>We provided a separate ranking for TV and
boardgame, but the final ranking was computed on
the the whole test set.
3</p>
    </sec>
    <sec id="sec-6">
      <title>Systems</title>
      <p>
        Twelve teams registered in the task, but only two
of them actually submitted the results for the
evaluation. A short description of each system
follows:
UNIOR4FUN - The system described in
        <xref ref-type="bibr" rid="ref7">(Sangati et al., 2018)</xref>
        is based on the idea that
clue words and corresponding solution are
often part of a multiword expression.
Therefore, the system exploits six linguistic
patterns1 that identify valid multiword
expressions connecting clue and solution pairs. The
core of the proposed solution is a set of freely
available corpora and lexical resources built
by the authors, which are used to find
potential solutions by computing mutual
information.
      </p>
      <p>System by Luca Squadrone - In (Squadrone,
2018), the author proposed an algorithm
based on two steps. In the first one, for each
clue of a game, a list of relevant keywords
is retrieved from linguistic corpora, so
that each clue is associated with keywords
representing the concepts having a relation
with that clue. Then, words at the
intersection of the retrieved sets are considered
as candidate solutions. In the second step,
another knowledge source made of proverbs,
book and movie titles, word definitions, is
exploited to count co-occurrences of clues
and candidate solutions.
4</p>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
    </sec>
    <sec id="sec-8">
      <title>System</title>
      <p>UNIOR4NLP
Squadrone</p>
      <p>Results of the evaluation in terms of M RR are
reported in Table 1. The best performance is
obtained by the UNIOR4NLP team. They reached a
1We must underline that patterns are extracted from a set
of 100 games collected by authors. This is in contrast with the
task guidelines; however, the games are not used for training
the system.
remarkable performance: MRR is very high, thus
showing that the system is able to place the
solution in the first positions of the ranking. We report,
also, the standard MRR (M RR(std)) computed
without taking into account the time. We notice
that for UNIOR4NLP the value is equal to M RR:
the system is able to provide the solution always in
the first minute, while the Squadrone system takes
more time for solving games.</p>
      <p>Table 2 reports the results by game type (66
instances from the TV game and 39 instances from
the boardgame). UNIOR4NLP shows similar
results for both the game types, while the system
proposed by Squadrone performs better on board
games.</p>
      <p>One possible explanation for this difference is
that board games are meant just for fun; they are
designed for the average player, whereas those
taken from the TV game are more difficult to solve
because they are intended to challenge the
contestants of the show who try to win a money prize.
Therefore, TV games generally have very specific
clues and require more extensive knowledge about
world facts and particular topics to find the
solution than the average player has. As a
consequence, the UNIOR4NLP solution based on
specific multiword expressions extracted from several
knowledge sources shows a more balanced
performance than the other system.</p>
      <p>However, despite the UNIOR4NLP system
obtained remarkable results, very difficult games,
requiring some kind of inference, are missed. For
example, for the following clues: uno, notte, la
trippa, auto, palazzo2, the solution is portiere
(porter). In order to solve that game, two difficult
inferences are needed:
• uno is the number generally assigned to the
role of the goolkeeper (portiere) in football
teams;
• “La Trippa” is the surname of “Antonio La
Trippa”, a character of the Italian movie “Gli
onorevoli”, whose job is the porter (portiere)
of a building.</p>
      <p>
        We hope that in a further edition of this task
participants will take into account these kind of games
in which the simple co-occurrence of words it is
not enough for solving the game. This is the most
2In English: one, night, “la trippa” (it was intended as a
surname in this case), car, building
challenging aspect of this game. In order to
compare system performance by taking into account
the different levels of difficulty of the games, we
plan to annotate guillottines with this information
provided by human players. A deeper analysis of
the results obtained by each system is provided in
the corresponding technical reports
        <xref ref-type="bibr" rid="ref7">(Sangati et al.,
2018; Squadrone, 2018)</xref>
        .
      </p>
      <p>Finally, by looking at the statistics about the
participation (12 registered teams, but only 2 of
them submitted the results), we conclude that the
task is attractive but perhaps it is too hard to solve.
For further task editions, we plan to support the
participants by providing pre-processed textual
resources useful for solving the task.
5</p>
    </sec>
    <sec id="sec-9">
      <title>Conclusions</title>
      <p>Language games draw their challenge and
excitement from the richness and ambiguity of natural
language. This type of games are inconsistent with
the closed world assumption: no fixed sets of rules
are sufficient to define the game play. The
proposed task consisted in building an artificial player
for a challenging language game which requires
from the player a strong linguistic and cultural
background. The systems participating in the task
were designed according to this idea: solving the
game strongly depends on the background
knowledge of the system. On the other hand, the results
demonstrated that filling in the system with a solid
background knowledge is not enough to find the
solution, but strong NLP algorithms are required
to discover hidden correlation among words. In
fact, only the system based on specific linguistic
patterns and multiword expressions was able to
achieve high performance. Moreover, some games
required a non-trivial inference step. For this kind
of games, systems must be equipped with deeper
reasoning capabilities. We hope that in further
editions of the task, participants will propose
solutions that deal with this issue.</p>
      <p>Luca Squadrone. 2018. Computer challenges
guillotine: how an artificial player can solve a complex
language TV game with web data analysis. In
Tommaso Caselli, Nicole Novielli, Viviana Patti, and
Paolo Rosso, editors, Proceedings of the 6th
evaluation campaign of Natural Language Processing and
Speech tools for Italian (EVALITA’18), Turin, Italy.
CEUR.org.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          , Marco de Gemmis, Pasquale Lops, and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Solving a complex language game by using knowledge-based word associations discovery</article-title>
          .
          <source>IEEE Transactions on Computational Intelligence and AI</source>
          in Games,
          <volume>8</volume>
          (
          <issue>1</issue>
          ):
          <fpage>13</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Tommaso</given-names>
            <surname>Caselli</surname>
          </string-name>
          , Nicole Novielli, Viviana Patti, and
          <string-name>
            <given-names>Paolo</given-names>
            <surname>Rosso</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Evalita 2018: Overview of the 6th evaluation campaign of natural language processing and speech tools for italian</article-title>
          .
          <source>In Tommaso Caselli</source>
          , Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of Sixth Evaluation Campaign of Natural Language Processing and Speech Tools for Italian. Final Workshop (EVALITA</source>
          <year>2018</year>
          ), Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Marco</given-names>
            <surname>Ernandes</surname>
          </string-name>
          , Giovanni Angelini, and
          <string-name>
            <given-names>Marco</given-names>
            <surname>Gori</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>A web-based agent challenges human experts on crosswords</article-title>
          .
          <source>AI Magazine</source>
          ,
          <volume>29</volume>
          (
          <issue>1</issue>
          ):
          <fpage>77</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Ferrucci</surname>
          </string-name>
          ,
          <string-name>
            <surname>Eric Brown</surname>
          </string-name>
          , Jennifer Chu-Carroll,
          <string-name>
            <given-names>James</given-names>
            <surname>Fan</surname>
          </string-name>
          , David Gondek, Aditya A Kalyanpur,
          <string-name>
            <given-names>Adam</given-names>
            <surname>Lally</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J William</given-names>
            <surname>Murdock</surname>
          </string-name>
          , Eric Nyberg, John Prager, et al.
          <year>2010</year>
          .
          <article-title>Building watson: An overview of the deepqa project</article-title>
          .
          <source>AI magazine</source>
          ,
          <volume>31</volume>
          (
          <issue>3</issue>
          ):
          <fpage>59</fpage>
          -
          <lpage>79</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Michael L Littman</surname>
          </string-name>
          ,
          <article-title>Greg A Keim,</article-title>
          and
          <string-name>
            <given-names>Noam</given-names>
            <surname>Shazeer</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>A probabilistic approach to solving crossword puzzles</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>134</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>23</fpage>
          -
          <lpage>55</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Piero</given-names>
            <surname>Molino</surname>
          </string-name>
          , Pasquale Lops, Giovanni Semeraro, Marco de Gemmis, and
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Playing with knowledge: A virtual player for who wants to be a millionaire? that leverages question answering techniques</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>222</volume>
          :
          <fpage>157</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Federico</given-names>
            <surname>Sangati</surname>
          </string-name>
          , Antonio Pascucci, and
          <string-name>
            <given-names>Johanna</given-names>
            <surname>Monti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Exploiting Multiword Expressions to solve “La Ghigliottina”</article-title>
          . In Tommaso Caselli, Nicole Novielli, Viviana Patti, and Paolo Rosso, editors,
          <source>Proceedings of the 6th evaluation campaign of Natural Language Processing and Speech tools for Italian (EVALITA'18)</source>
          , Turin, Italy. CEUR.org.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , Pasquale Lops, Pierpaolo Basile, and Marco de Gemmis.
          <year>2009</year>
          .
          <article-title>On the Tip of My Thought: Playing the Guillotine Game</article-title>
          . In Craig Boutilier, editor,
          <source>IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence</source>
          , Pasadena, California, USA, July
          <volume>11</volume>
          -
          <issue>17</issue>
          ,
          <year>2009</year>
          , pages
          <fpage>1543</fpage>
          -
          <lpage>1548</lpage>
          . Morgan Kaufmann.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Semeraro</surname>
          </string-name>
          , Marco de Gemmis, Pasquale Lops, and
          <string-name>
            <given-names>Pierpaolo</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>An artificial player for a language game</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>27</volume>
          (
          <issue>5</issue>
          ):
          <fpage>36</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>