=Paper=
{{Paper
|id=None
|storemode=property
|title=OTTHO: An Artificial Player for a Complex Language Game
|pdfUrl=https://ceur-ws.org/Vol-860/paper8.pdf
|volume=Vol-860
|dblpUrl=https://dblp.org/rec/conf/aiia/SemeraroLGB12
}}
==OTTHO: An Artificial Player for a Complex Language Game==
<pdf width="1500px">https://ceur-ws.org/Vol-860/paper8.pdf</pdf>
<pre>
    OTTHO: An Artificial Player for a Complex
              Language Game

 Giovanni Semeraro, Pasquale Lops, Marco de Gemmis, and Pierpaolo Basile

                  Dept. of Computer Science - University of Bari
                      Via E. Orabona, 4 - 70125 Bari - Italia
                {semeraro,lops,degemmis,basilepp}@di.uniba.it


      Abstract. This paper describes OTTHO (On the Tip of my THOught),
      an artificial player able to solve a very popular language game, called
      “The Guillotine”, broadcast by the Italian National TV company. The
      game demands knowledge covering a broad range of topics, such as
      politics, literature, history, proverbs, and popular culture. The rule of
      the game is simple: the player observes five words, generally unrelated to
      each other, and in one minute she has to provide a sixth word, semantically
      connected to the others. In order to find the solution, a human being has
      to perform a complex memory retrieval task within the facts retained
      in her own knowledge, concerning the meanings of thousands of words
      and their contextual relations. In order to make this task executable by
      machines, machine reading techniques are exploited for knowledge
      extraction from the web, while Artificial Intelligence techniques are used
      to infer new knowledge, in the form of keywords, from the extracted
      information.


1   Background and Motivation
The literature classifies language-related games into two main categories:
word games and language games [5]. Word games do not involve true language,
because word meanings are not important. An example of a word game is Scrabble,
in which players take turns placing letters on a grid to form words. Language
games, such as crosswords or “Who wants to be a millionaire?”, strongly involve
natural language, since word meanings play an important role. Language games
draw their challenge and excitement from the richness and ambiguity of natural
language, which is, on the other hand, the main source of complexity for
machines. OTTHO is a system designed to provide solutions for The Guillotine
game, in which the player is given a set of five words (clues), each linked in
some way to a specific word that represents the unique solution of the game.
The clues are unrelated to each other, but each of them is strongly related to
the solution. Once the five clues are given, the player has one minute to guess
the right answer. For example, given the clues sin, newton, doctor, pie, new
york, the solution is apple, because in the popular Christian tradition the
apple is the forbidden fruit in the Book of Genesis, that is, the symbol of the
original sin, Newton discovered gravity by means of an apple, “an apple a day
keeps the doctor away” is a proverb, apple pie is a fruit cake, and New York
City is called “the Big Apple”.
Often the solution is not so intuitive and only players with a strong background
knowledge are able to find the correct word. Indeed, in order to find the solution,
a human being has to perform a complex memory retrieval task within the facts
retained in her own knowledge, concerning the meanings of thousands of words
and their contextual relations [9].
    No fixed set of rules is sufficient to define the game play, thus solving the
game depends exclusively on the background knowledge of the system, which
is created by machine reading techniques. These techniques analyze unstructured
information stored in open knowledge sources on the Web, such as dictionaries
and Wikipedia, and build a memory of linguistic competencies and world facts
that can be effectively exploited by the system for a deeper understanding of
clues. Relatedness between terms, providing evidence of a strong relationship
between words, is the key factor for finding a set of candidate words that likely
contains the solution. To this purpose, OTTHO exploits a reasoning mechanism
based on Spreading Activation techniques [4, 3], which allows matching clues
with the background knowledge. The main motivation for designing an artificial
player for this game is the challenge of providing the machine with both the
cultural and linguistic background knowledge that makes it similar to a human
being, and with the ability to interpret natural language documents and reason
on their content. We believe that the approach presented in this work has
great potential for other, more practical applications besides solving a language
game, which are mentioned in the last section of the paper.


2   System Description

An extended knowledge base must be built for representing the cultural and
linguistic background knowledge of the artificial player. After a deep analysis
of the correlation between the clues and the solution, the following knowledge
sources have been processed to build the memory of the system:

 – Encyclopedia – the Italian version of Wikipedia;
 – Dictionary – the De Mauro Paravia Italian on-line dictionary (no longer
   available);
 – Compound forms – groups of words that often go together having a specific
   meaning, crawled from the IntraText Digital Library
   (http://www.intratext.com/bsi/listapolirematiche/indalfa.htm)
   and the on-line dictionary TLIO - Tesoro della Lingua Italiana delle Origini
   (http://ovipc44.ovi.cnr.it/Tliopoli/);
 – Proverbs and Aphorisms – the Italian version of Wikiquote;
 – Movies – descriptions of Italian movies crawled from the Internet Movie
   Database (http://www.imdb.com);
 – Songs – Italian songs crawled from OnlyLyrics
   (http://www.onlylyrics.com/);
 – Books – book titles crawled from several web sites.

The above-mentioned types of sources have different characteristics, therefore
an important problem is to define a uniform representation of the information
they store, which is discussed in the next section.


2.1    The Memory of the System: Cognitive Units

Formerly we modeled each source as a term-term matrix whose cells represent
the degree of correlation between the term on the row and the one on the
column, according to specific heuristics [8]. In the new version of the system,
we adopt a novel strategy based on the ACT theory of fact memory [1], according
to which information in the long-term memory of human beings is encoded as
Cognitive Units (CUs) that form an interconnected network. A cognitive unit is
a piece of information (e.g. a proposition) we can hold consciously in our focus
of attention, together with all the links (many of which are unconscious) that
can be established with other parts of our cognitive structure. According to this
idea, we see knowledge sources as repositories of CUs.
    Because of the heterogeneity of the knowledge sources involved in the
process, two problems must be solved in the implementation of the step that
turns knowledge sources into a unique machine-readable knowledge base, with
concepts represented in a homogeneous format:

 – identification of the basic unit representing a concept in each specific
   knowledge source (e.g. a Wikipedia article, a lemma in a dictionary);
 – definition of a unique representation model for cognitive units in the
   background knowledge.

Since the information provided by the knowledge sources is represented in textual
form, we regard a CU as the structured textual description of a concept. For each
knowledge source included in the memory of OTTHO, the basic unit describing
a concept is chosen: a lemma in the Dictionary, an article in Wikipedia, etc.
Basic units are turned into CUs by machine reading techniques which analyze
the text and build the corresponding descriptions of recognized concepts. Each
CU is represented by two fields:

1. head – words identifying the concept represented by the CU;
2. body – words describing the concept.

In a more compact way:

 CU = [head|body]

For example, the Wikipedia article that provides the description of the concept
Artificial Intelligence (http://en.wikipedia.org/wiki/Artificial_intelligence)
is turned into the corresponding CU and stored in a repository of cognitive
units:

CU_124 = [Artificial Intelligence (3.56) |AI (1.23),
          machine (1.14), computer science (2.58),
          Alan Turing (2.77)]
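
The [head|body] representation can be sketched as a small data structure. The
Python fragment below is only an illustration, not the OTTHO implementation:
the class name CognitiveUnit, its field names, the weight() helper and the
lower-casing of keywords are assumptions introduced here; the weights are those
of the CU_124 example.

```python
from dataclasses import dataclass, field
from typing import Dict


def _format_keywords(keywords: Dict[str, float]) -> str:
    """Render weighted keywords as 'keyword (weight)' pairs."""
    return ", ".join(f"{kw} ({w:.2f})" for kw, w in keywords.items())


@dataclass
class CognitiveUnit:
    """Hypothetical CU model: weighted head keywords plus weighted body keywords."""
    cu_id: str
    head: Dict[str, float] = field(default_factory=dict)
    body: Dict[str, float] = field(default_factory=dict)

    def weight(self, keyword: str) -> float:
        """Return the weight of a keyword in this CU, or 0.0 if it is absent."""
        key = keyword.lower()  # assumption: keywords are stored lower-cased
        return max(self.head.get(key, 0.0), self.body.get(key, 0.0))

    def __str__(self) -> str:
        return (f"{self.cu_id} = [{_format_keywords(self.head)} | "
                f"{_format_keywords(self.body)}]")


# The CU_124 example, with the weights reported in the text.
cu_124 = CognitiveUnit(
    cu_id="CU_124",
    head={"artificial intelligence": 3.56},
    body={"ai": 1.23, "machine": 1.14,
          "computer science": 2.58, "alan turing": 2.77},
)
print(cu_124)
print(cu_124.weight("Alan Turing"))
```

Keeping head and body as separate weighted maps mirrors the two fields of a CU
and leaves room for scoring a match in the head differently from a match in the
body.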


Keywords in CUs are assigned a weight representing how important they are for
that concept, similarly to the bag-of-words approach in Information Retrieval
(IR). The main advantage of this representation strategy is that an IR model can
be adopted for retrieving relevant CUs associated with a keyword. In total,
743,192 CUs have been defined: 584,527 for the Encyclopedia, 126,741 for the
Dictionary, 10,744 for Compound Forms, 11,257 for Proverbs and Aphorisms, and
9,923 for Songs, Movies and Books. The complete description of the machine
reading process can be found in [7]. This kind of “knowledge infusion” into the
system creates a memory of world facts and linguistic knowledge.
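
To illustrate how weighted keywords make CUs retrievable with an IR-style
model, the following Python sketch builds an inverted index over a toy
repository and ranks CUs by the weight of the queried keyword. The repository
contents, the weights, and the function names build_index and retrieve are
hypothetical; the retrieval model actually used by OTTHO is described in [7]
and may differ.

```python
from collections import defaultdict

# Hypothetical toy repository: cu_id -> {keyword: weight}.
# Head and body keywords are merged into one map for brevity.
REPOSITORY = {
    "CU_1": {"apple": 3.1, "fruit": 2.0, "pie": 1.2},
    "CU_2": {"new york": 3.4, "big apple": 2.2, "apple": 1.5, "city": 1.0},
    "CU_3": {"doctor": 2.9, "medicine": 2.1},
}


def build_index(repository):
    """Map every keyword to the list of (cu_id, weight) pairs that contain it."""
    index = defaultdict(list)
    for cu_id, keywords in repository.items():
        for keyword, weight in keywords.items():
            index[keyword].append((cu_id, weight))
    return index


def retrieve(index, keyword, top_k=10):
    """Return up to top_k CUs for a keyword, highest weight first."""
    postings = index.get(keyword.lower(), [])
    # Ties on weight are broken by CU identifier to keep the order stable.
    return sorted(postings, key=lambda p: (-p[1], p[0]))[:top_k]


index = build_index(REPOSITORY)
print(retrieve(index, "apple"))
```

An inverted index is chosen here only because it is the standard IR structure
for keyword lookup; with hundreds of thousands of CUs a real system would rely
on a full-text search engine rather than an in-memory dictionary.
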
   As depicted in Figure 1, clues are used to query the memory of OTTHO, i.e.
the CU repository, in order to retrieve the most appropriate “pieces of
knowledge” related to them. Both clues and retrieved CUs are then passed to the
reasoning algorithm, which produces a new list of associated keywords that are
possible solutions of the game. The reasoning mechanism is described in the
following section.


        Fig. 1. Process for finding candidate solutions for a run of the game


2.2   The Brain of the System: Spreading Activation Network

Spreading activation has proved to be a model with a high degree of explanatory
power in cognitive psychology [1]. One of the merits of this model is that it
captures the way knowledge is represented as well as the way it is processed. In
this model, knowledge is represented in terms of nodes and associative pathways
between the nodes. Specifically, concepts are represented in memory as nodes,
and relations between concepts as associative pathways between the
corresponding nodes. When part of the memory network is activated, activation
spreads along the associative pathways to related areas in memory. The spread of
activation mechanism selects the areas of the memory network that are more ready
for further cognitive processing.
    Since words and their meanings are stored in the mind in a network-like
structure [1], we adopted this model as the reasoning mechanism of OTTHO.
In the network for The Guillotine game, called SAN - Spreading Activation
Network, nodes represent either words or CUs, while links denote associations
between them, obtained from CU repositories. Links are weighted with a score
that measures the strength of the association.
    The SAN for a run of the game is built in 3 steps: (1) nodes corresponding to
clues are included in the SAN; (2) clues are connected to the most appropriate
CUs retrieved by a search mechanism which queries the CU repositories by using
clues; (3) retrieved CUs are connected to nodes representing the most informative
terms associated with them. An example of a SAN is depicted in Figure 2. All in
all, the SAN is the part of the background knowledge of the system which is
related to the clues of the game to be solved. The spreading process over the
SAN starts from clue nodes and propagates first to CUs and then to words
connected to CUs. In effect, it is a search process which selects, among all the
nodes in the SAN, those which are strongly connected to clues, and therefore
are good candidate solutions. Technical details about the spreading algorithm
are reported in [7].
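
The three construction steps and the clue -> CU -> word propagation can be
sketched in Python as follows. This is a simplified illustration under stated
assumptions, not the algorithm of [7]: the toy repository and its weights, the
function names build_san and spread, the initial activation of 1.0 per clue,
the decay factor, the additive combination of activation, and the exclusion of
clues from the candidates are all hypothetical choices made for the example.

```python
from collections import defaultdict

# Hypothetical toy CU repository: cu_id -> {keyword: weight}.
CU_REPOSITORY = {
    "cu_idiom": {"letto": 2.0, "galline": 1.5, "andare": 0.5},
    "cu_dictionary": {"letto": 1.8, "dormire": 1.2, "camera": 0.9},
    "cu_farm": {"galline": 2.0, "uova": 1.6},
}


def build_san(clues, repository, terms_per_cu=3):
    """Build the two link layers of the SAN: clue -> CU and CU -> word."""
    clue_to_cu = defaultdict(dict)
    # Steps 1 and 2: each clue becomes a node linked to the CUs that mention it.
    for clue in clues:
        for cu_id, keywords in repository.items():
            if clue in keywords:
                clue_to_cu[clue][cu_id] = keywords[clue]
    # Step 3: each retrieved CU is linked to its highest-weighted terms.
    cu_to_word = {}
    for links in clue_to_cu.values():
        for cu_id in links:
            ranked = sorted(repository[cu_id].items(), key=lambda kv: -kv[1])
            cu_to_word[cu_id] = dict(ranked[:terms_per_cu])
    return clue_to_cu, cu_to_word


def spread(clues, clue_to_cu, cu_to_word, decay=0.5):
    """Propagate activation from clues to CUs, then from CUs to words."""
    cu_activation = defaultdict(float)
    for clue in clues:
        for cu_id, weight in clue_to_cu.get(clue, {}).items():
            cu_activation[cu_id] += weight  # each clue starts with activation 1.0
    word_activation = defaultdict(float)
    for cu_id, activation in cu_activation.items():
        for word, weight in cu_to_word[cu_id].items():
            if word not in clues:  # assumption: a clue is never a candidate
                word_activation[word] += decay * activation * weight
    # Most activated words first; ties are broken alphabetically.
    return sorted(word_activation.items(), key=lambda kv: (-kv[1], kv[0]))


clues = ["galline", "camera"]
clue_to_cu, cu_to_word = build_san(clues, CU_REPOSITORY)
candidates = spread(clues, clue_to_cu, cu_to_word)
print(candidates[0][0])  # best-ranked candidate word
```

In this toy run the word letto receives activation from two different CUs, one
reached from each clue, which is the kind of convergence that makes a node a
good candidate solution.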


3     Into and Beyond the Game
Figure 2 shows the OTTHO user interface. In this scenario, the system supports
the human player by showing, within the text area on the bottom-left, some
candidate solutions for the clues visualized on the top-left of the window. A
timer is displayed in the clue area, near the OTTHO logo, to warn the player
about the time left to provide the answer. The SAN is depicted on the right.
The solution for this run is letto (bed), which actually appears in the list of
suggestions.
    Figure 3 emphasizes the part of the SAN in which the solution is found.
Notice that the solution is connected with the clue “galline” since the idiom
“andare a letto con le galline” (“to go to bed with the chickens”, which means
“to go to bed very early”) was found in the CU repository. When the user clicks
on the CU node of the idiom a795 and then on the “information” button at the top
of the SAN, OTTHO shows the explanation for the solution in the “polirematiche”
box on the left.
                             Fig. 2. A run of the game


    The system, besides supporting the player, could also assist authors of the
game in verifying the uniqueness of the “official” solution. Indeed,
the “create” button (on the left of the GUI) allows users to propose their own
clues and to verify whether other words, in addition to the official answer, are
consistent with the clues. In conclusion, the proposed system has great
potential for other practical applications in the areas of both Information
Retrieval and Information Filtering. For example, words suggested by OTTHO could
be used by search engines for intelligent query expansion [2] or by content-based
recommender systems for the computation of similarity between items [6].


           Fig. 3. Solution found in the SAN with explanation by OTTHO


References
1. Anderson, J.R.: A Spreading Activation Theory of Memory. Journal of Verbal
   Learning and Verbal Behavior 22, 261–295 (1983)
2. Carpineto, C., Romano, G.: A survey of automatic query expansion in information
   retrieval. ACM Comput. Surv. 44(1), 1 (2012)
3. Collins, A.M., Loftus, E.F.: A Spreading Activation Theory of Semantic Processing.
   Psychological Review 82(6), 407–428 (1975)
4. Crestani, F.: Application of Spreading Activation Techniques in Information
   Retrieval. Artificial Intelligence Review 11(6), 453–482 (1997)
5. Littman, M.L.: Review: Computer language games. In: Revised Papers from the
   Second International Conference on Computers and Games. pp. 396–404. CG ’00,
   Springer-Verlag, London, UK (2002)
6. Lops, P., de Gemmis, M., Semeraro, G.: Content-based Recommender Systems:
   State of the Art and Trends. In: Recommender Systems Handbook, pp. 73–105.
   Springer (2011)
7. Semeraro, G., de Gemmis, M., Lops, P., Basile, P.: Knowledge Infusion from Open
   Knowledge Sources: an Artificial Player for a Language Game. IEEE Intelligent
   Systems DOI: http://doi.ieeecomputersociety.org/10.1109/MIS.2011.37. In press
8. Semeraro, G., Lops, P., Basile, P., de Gemmis, M.: On the Tip of My Thought:
   Playing the Guillotine Game. In: Proc. of the 21st Int. Joint Conference on Artificial
   Intelligence. pp. 1543–1548. Morgan Kaufmann (2009)
9. Spitzer, M.: The Mind within the Net: Models of Learning, Thinking, and Acting.
   MIT Press (2000)

</pre>