<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Natat in Cerebro: Intelligent Information Retrieval for “The Guillotine” Language Game</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pierpaolo Basile</string-name>
          <email>basilepp@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pasquale Lops</string-name>
          <email>lops@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco de Gemmis</string-name>
          <email>degemmis@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giovanni Semeraro</string-name>
          <email>semeraro@di.uniba.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science, University of Bari</institution>
          ,
          <addr-line>via E. Orabona, 4, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>28</lpage>
      <abstract>
        <p>This paper describes OTTHO (On the Tip of my THOught), a system designed to solve a language game called The Guillotine. The rules are simple: the player observes five words, generally unrelated to each other, and within one minute has to provide a sixth word, semantically connected to the others. The system performs retrieval from several knowledge sources, such as a dictionary, a set of proverbs, and Wikipedia, to realize a knowledge infusion process. The main motivation for designing an artificial player for The Guillotine is the challenge of providing the machine with the cultural and linguistic background knowledge that makes it similar to a human being, able to interpret natural language documents and reason about their content. Our feeling is that the approach presented in this work has great potential for other, more practical applications beyond solving a language game.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>BACKGROUND AND MOTIVATION</title>
      <p>
        Words are popular features of many games, and they play a central role in many language games. A language game is a game involving natural language in which word meanings play an important role. Language games draw their challenge and excitement from the richness and ambiguity of natural language. In this paper we present a system that plays The Guillotine, a language game featured in a show on RAI, the Italian National Broadcasting Service, in which a player is given a set of five words (clues), each linked in some way to a specific word that represents the unique solution of the game. (The full version of this paper appears in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].)
      </p>
      <p>The player receives one word at a time and must choose between two proposed words: one is correct, the other is wrong. Each time she chooses the wrong word, the prize money is halved (hence the name Guillotine). The five words are generally unrelated to each other, but each of them is strongly related to the word representing the solution. Once the five clues are given, the player has one minute to provide the solution. An example of the game follows: given the five words Capital, Pope, City, Colosseum, YellowAndRed, the solution is Rome, because Rome is the capital of Italy, the Pope lives in Rome, Rome is a city, the Colosseum is in Rome, and YellowAndRed is an alternative name for one of the Rome football teams. Often the solution is not so intuitive, and the player needs different knowledge sources to reason and find the correct word.</p>
      <p>OTTHO (On the Tip of my THOught) tries to solve the final stage of The Guillotine. We assume that the five words are provided all at once, neglecting the initial phase of choosing between word pairs, which only affects the prize money.</p>
    </sec>
    <sec id="sec-2">
      <title>OTTHO</title>
      <p>The Guillotine is a cultural and linguistic game, so we need to define an extended knowledge base representing the cultural and linguistic background knowledge of the player. Next, we have to realize a reasoning mechanism able to retrieve the pieces of knowledge most appropriate for solving the game.</p>
    </sec>
    <sec id="sec-3">
      <title>The Knowledge Sources</title>
      <p>After a thorough analysis of the correlation between the clues and the solution, we chose to include the following knowledge sources, ranked by how frequently they helped in finding the solution of the game:
1) Dictionary: the word representing the solution is contained in the description of a lemma or in some example phrases using that lemma;
2) Encyclopedia: as for the dictionary, the description of an article contains the solution, but in this case a much larger amount of text must be processed;
3) Proverbs and aphorisms: short pieces of text in which the solution occurs very close to the clues.</p>
      <p>These sources need to be organized and processed in order to model relationships between words. The modeling process must cope with the differing characteristics of the knowledge sources, resulting in a set of source-specific heuristics for building the overall model on which the reasoning mechanism is applied. Since we are interested in finding relationships between words, we decided to model each knowledge source using the set of correlations between terms occurring in that specific source (a proverb, a definition in a dictionary, etc.). Specifically, we used a term-term matrix over the terms occurring in the modeled knowledge source, in which each cell contains a weight representing the degree of correlation between the term on the row and the one on the column. The computation of weights is different for each type of knowledge source.</p>
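To make the term-term representation concrete, here is a minimal sketch of such a matrix built from raw co-occurrence counts within each text unit; the actual system uses more elaborate, source-specific weighting heuristics, and all names below are illustrative:

```python
from collections import defaultdict
from itertools import combinations

def build_term_term_matrix(texts):
    # Sparse term-term matrix: matrix[t1][t2] counts how often the two
    # terms co-occur in the same text unit (a proverb, a definition, ...).
    matrix = defaultdict(lambda: defaultdict(float))
    for text in texts:
        terms = set(text.lower().split())
        for t1, t2 in combinations(sorted(terms), 2):
            matrix[t1][t2] += 1.0
            matrix[t2][t1] += 1.0
    return matrix

proverbs = [
    "a bird in the hand is worth two in the bush",
    "the early bird catches the worm",
]
m = build_term_term_matrix(proverbs)
print(m["bird"]["worm"])  # co-occur in the second proverb only
```

In the real model the cell values are weights rather than counts, computed differently for each source as described below.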
      <p>
        For the dictionary, we used the on-line De Mauro Paravia Italian dictionary, containing 160,000 lemmas. We obtained a lemma-term matrix whose weights represent the relationship between a lemma and the terms used to describe it. Because of the general lemma-definition organization of dictionary entries, we can fairly claim that the model is language-independent. Each Web page describing a lemma was preprocessed to extract the information most useful for computing the weights in the matrix. The text of each Web page is processed to strip the HTML tags, while formatting information is preserved in order to give higher weights to terms formatted in bold or italics. Stopwords are eliminated, and abbreviations used in the definition of the lemma are expanded. Weights in the matrix are computed using a classical tf-idf scheme, normalized with respect to the length of the definition in which the term occurs and the length of the entire dictionary. A detailed description of the heuristics for modeling the dictionary is reported in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
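As an illustration, the tf-idf weighting can be sketched as follows. This is a toy version under simplifying assumptions: the bold/italic boosts, abbreviation expansion, and dictionary-length normalization described above are omitted, and the example definitions are hypothetical:

```python
import math

def tf_idf(term, definition, all_definitions):
    # Term frequency, normalized by the length of the definition.
    tokens = definition.lower().split()
    tf = tokens.count(term) / len(tokens)
    # Inverse document frequency over the whole set of definitions.
    df = sum(1 for d in all_definitions if term in d.lower().split())
    idf = math.log(len(all_definitions) / (1 + df)) + 1.0
    return tf * idf

definitions = [
    "the capital city of italy",
    "a city built on seven hills",
    "ancient amphitheatre in rome",
]
print(round(tf_idf("city", definitions[0], definitions), 3))  # 0.2
```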
      <p>As with the dictionary, a tf-idf strategy was used to define the weights in the term-term matrix modeling the proverbs knowledge source, a collection of 1,600 proverbs gathered from the web.</p>
      <p>
        The process of modeling Wikipedia differs from the one adopted for proverbs and the dictionary, due to the huge amount of information to be processed. We adopted a more scalable approach for processing Wikipedia entries, using models that represent concepts as vectors in a high-dimensional space, such as the Semantic Vectors or WordSpace models [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The core idea behind semantic vectors is that words and concepts are represented by points in a mathematical space, and this representation is learned from text in such a way that concepts with similar or related meanings are near to one another in that space (the geometric metaphor of meaning). The semantic vectors model is grounded in the distributional hypothesis, a theory of meaning according to which the meaning of a word is determined by the rules of its use in the context of ordinary and concrete language behavior. This means that words are semantically similar to the extent that they share contexts (surrounding words). If ‘beer’ and ‘wine’ frequently occur in the same context, say after ‘drink’, the hypothesis states that they are semantically related or similar.
      </p>
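The distributional hypothesis can be sketched with a toy stand-in for the Semantic Vectors / WordSpace models: each word is represented by counts of its surrounding words, and similarity is measured by cosine. The sentences and window size below are illustrative assumptions, not the actual Wikipedia pipeline:

```python
import math
from collections import defaultdict

def context_vectors(sentences, window=2):
    # Represent each word by counts of the words occurring around it.
    vectors = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        words = sent.lower().split()
        for i, w in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    vectors[w][words[j]] += 1.0
    return vectors

def cosine(v1, v2):
    dot = sum(v1[k] * v2.get(k, 0.0) for k in v1)
    n1 = math.sqrt(sum(x * x for x in v1.values()))
    n2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

sents = ["we drink beer at dinner", "we drink wine at dinner",
         "the code compiles quickly"]
v = context_vectors(sents)
# 'beer' and 'wine' share contexts ('drink', 'at', ...), so they are close
print(cosine(v["beer"], v["wine"]) > cosine(v["beer"], v["code"]))  # True
```

Real word-space models use dimensionality reduction (e.g. random indexing) to keep the vectors tractable at Wikipedia scale.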
    </sec>
    <sec id="sec-4">
      <title>The Reasoning Mechanism</title>
      <p>
        We adopt a spreading activation model [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which has been
used in other areas of Computer Science such as
Information Retrieval [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] as reasoning mechanism for OTTHO. The
pure spreading activation model consists of a network data
structure of nodes interconnected by links, that may be
labeled and/or weighted and usually have directions. In the
network for “The Guillotine” game, nodes represent words,
while links denote associations between words obtained from
the knowledge sources. Spreading in the network is triggered
by clues. The activation of clues causes words with related
meanings (as modeled in the knowledge sources) to become
active. At the end of the weight propagation process, the
most “active” words represent good candidates to be the
solution of the game.
3.
      </p>
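A simplified sketch of this propagation, under stated assumptions: the toy network, decay factor, and pulse count below are hypothetical, and OTTHO's actual model additionally uses firing thresholds and termination conditions:

```python
def spread_activation(network, clues, decay=0.5, pulses=2):
    # Clue nodes start with activation 1.0; on each pulse, every active
    # node pushes a decayed fraction of its activation along its links.
    activation = {w: 1.0 for w in clues}
    for _ in range(pulses):
        nxt = dict(activation)
        for node, out in network.items():
            a = activation.get(node, 0.0)
            if a > 0.0:
                for neigh, weight in out.items():
                    nxt[neigh] = nxt.get(neigh, 0.0) + a * weight * decay
        activation = nxt
    # Candidate solutions: the most active words that are not clues.
    ranked = sorted((w for w in activation if w not in clues),
                    key=activation.get, reverse=True)
    return ranked

net = {
    "capital":   {"rome": 0.9, "money": 0.4},
    "pope":      {"rome": 0.8, "church": 0.7},
    "colosseum": {"rome": 0.9},
}
print(spread_activation(net, ["capital", "pope", "colosseum"])[0])  # rome
```

Because "rome" receives activation from all three clues, it accumulates the highest score and surfaces as the top candidate.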
    </sec>
    <sec id="sec-5">
      <title>BEYOND THE GAME</title>
      <p>
        The system could be used for implementing an alternative
paradigm for associative retrieval on collections of text
documents [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], in which an initial indexing phase of documents
can spread further “hidden” terms for retrieving other
related documents. The identification of hidden terms might
rely on the integration of specific pieces of knowledge
relevant for the domain of interest. This might represent a
valuable strategy for several domains, such as search engine
advertising, in which customers’ search terms (and interests)
need to be matched with those of advertisers. Spreading
activation can be also effectively combined with document
retrieval for semantic desktop search.
4.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Collins</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. F.</given-names>
            <surname>Loftus</surname>
          </string-name>
          .
          <source>A Spreading Activation Theory of Semantic Processing. Psychological Review</source>
          ,
          <volume>82</volume>
          (
          <issue>6</issue>
          ):
          <fpage>407</fpage>
          -
          <lpage>428</lpage>
          ,
          <year>1975</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>F.</given-names>
            <surname>Crestani</surname>
          </string-name>
          .
          <source>Application of Spreading Activation Techniques in Information Retrieval. Artificial Intelligence Review</source>
          ,
          <volume>11</volume>
          (
          <issue>6</issue>
          ):
          <fpage>453</fpage>
          -
          <lpage>482</lpage>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Gemmis</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          .
          <article-title>“Language Is the Skin of My Thought”: Integrating Wikipedia and AI to Support a Guillotine Player</article-title>
          .
          <source>In AI*IA</source>
          <year>2009</year>
          , LNCS
          <volume>5883</volume>
          , pages
          <fpage>324</fpage>
          -
          <lpage>333</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sahlgren</surname>
          </string-name>
          .
          <article-title>The Word-Space Model: Using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces</article-title>
          .
          <source>PhD thesis</source>
          , Stockholm University, Department of Linguistics,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Semeraro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>de Gemmis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lops</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Basile</surname>
          </string-name>
          .
          <article-title>On the Tip of my Thought: Playing the Guillotine Game</article-title>
          .
          <source>In IJCAI 2009</source>
          , pages
          <fpage>1543</fpage>
          -
          <lpage>1548</lpage>
          . Morgan Kaufmann,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>