V. Kůrková et al. (Eds.): ITAT 2014 with selected papers from Znalosti 2014, CEUR Workshop Proceedings Vol. 1214, pp. 79–84
http://ceur-ws.org/Vol-1214, Series ISSN 1613-0073, © 2014 R. Rosa


                                                Fairytale Child Chatbot

                                                              Rudolf Rosa

   Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Czech Republic,
                                                   rosa@ufal.mff.cuni.cz,
                                  WWW home page: http://ufal.mff.cuni.cz/rudolf-rosa

Abstract: Fairytale Child (Pohádkové dítě) is a simple chatbot that simulates a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show this whenever possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, identifies potential questions to ask, and then asks one of them. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible.

1 Introduction

One of the first chatbots, and the first one that became widely known and successful, was ELIZA [10]. ELIZA tried to simulate a Rogerian psychotherapist, i.e. a therapist who tries to encourage clients to realize by themselves in what ways their feelings and behaviour are affected negatively [2]. This was a particularly good choice, as the Rogerian dialogue techniques involve asking the client general questions and giving sympathetic comments, trying to motivate him to speak more and to realize things. The chatbot analyzed the user-entered sentences in a shallow way, reacted specifically to certain keywords or patterns (“I feel tired.” – “What makes you feel tired?”), and provided a more generic answer if no keyword or pattern was spotted (such as “I understand.”, “Do you really think so?” or “Can you elaborate on that?”).
   Despite its simplicity, ELIZA was able to communicate successfully and to react naturally enough that numerous users reported believing that the chatbot was actually intelligent and did understand them. One of the reasons behind the success of ELIZA is that when communicating with a psychotherapist, it is the client who “creates” the world about which he talks. It is thus natural that the psychotherapist does not know anything about that world (which would appear strange in other situations, such as discussing the current political situation). Another important decision was the choice of English as the communication language. This enabled the use of rather simple pattern matching instead of full-fledged linguistic analyses, and of extremely simple language generation techniques, thanks to English being an analytical (nearly non-inflecting) language with a fixed word order.
   In this work, we present a Czech language chatbot for a similar setting – the Fairytale Child. The chatbot tries to simulate a child who wants to hear a fairy tale. As in the therapist scenario, it is the user who creates the entire world in which the story takes place. The child knows nothing about this world, and thus asks a lot of questions to learn something about the world, the people and objects in it, and their properties.
   The behaviour of the chatbot is similar to the “Why-phase” of child development, in which the child repeatedly asks “Why?” as a reaction to virtually any sentence. However, the “Why-phase” is rather annoying for the parents, and we do not want our chatbot to be annoying. We therefore try to simulate an older child, who has a wider range of more specific questions, and who often does not ask any question but simply encourages the user to go on.
   Like several other chatbots, the Fairytale Child remembers what it was told and does not ask for it again. Instead, if the story touches the topic again, it tries to show the user that it has paid attention by showing off the knowledge it has acquired.
   The Fairytale Child chatbot is built using the Treex Natural Language Processing (NLP) framework [11], both for analysis of the user input and for response generation. The tools that are part of Treex, namely the tagger [9] and the morphological generator [3], are vital for the implementation of the chatbot. They alleviate the burden of handling the morphologically rich Czech language, and this morphological richness is one of the main obstacles to using for Czech the same simple techniques that have repeatedly been shown to perform well enough for English.
   The main aim of our work is to show that when backed by a powerful and easy-to-use NLP framework, the construction of a simple Czech chatbot able to produce naturally formed responses can be not only viable, but even rather straightforward.
   The chatbot is being developed in two versions – Fairytale Child for English, and Pohádkové dítě for Czech. As creating a chatbot for Czech is a more interesting and challenging endeavour, Pohádkové dítě is the primary focus of our work. We provide the Fairytale Child version mainly to make our work intelligible also to non-Czech speakers.

2 Example

Let us start with a real, complete transcript of one session in which a user tells the chatbot the story of The Ugly Duckling [1]. The session, listed in Section 2.1, was conducted in Czech; in Section 2.2, we provide its English version.
   Please observe the ways in which the chatbot tries to ask questions. Also please notice how it remembers what it was told about the objects. When they are mentioned again, it can decide to show that it has been listening by repeating what it was told. Interestingly, this leads to a “mistake in understanding” towards the end of the story, which, by accident, seems very natural and appropriate at that point.

2.1 Czech – Pohádkové dítě

This session transcript is real and unedited (except for a few corrections of the user’s typos). Therefore, it also shows several cases where the chatbot misanalyzed the sentence; these are discussed in Section 4.

(1) CHATBOT : Povídej mi pohádku!
(2) USER : Povím ti pohádku o ošklivém káčátku.
(3) CHATBOT : Bylo hodně ošklivé?
(4) USER : No vlastně ne, ale myslelo si to o sobě.
(5) CHATBOT : Pokračuj...
(6) USER : Byla jednou jedna kachna, a ta nakladla vajíčka.
(7) CHATBOT : Jaká to byla vajíčka?
(8) USER : No kachní vajíčka.
(9) CHATBOT : Aha, kachní vajíčka.
(10) USER : A když přišel čas a vylíhla se káčátka, jedno z nich bylo ošklivější než ta ostatní.
(11) CHATBOT : Jaké to bylo káčátko?
(12) USER : To káčátko, o kterém ti povídám.
(13) CHATBOT : Aha, to káčátko, o kterém mi povídáš.
(14) USER : Ostatní káčátka i máma kachna se mu pořád posmívali, a ono bylo z toho smutné.
(15) CHATBOT : Byla hodně smutná?
(16) USER : Hodně, protože ho nikdo neměl rád.
(17) CHATBOT : Pokračuj...
(18) USER : Pak na zimu kachny odletěly do teplých krajin, a kačátko zůstalo samo.
(19) CHATBOT : Jaké to bylo kačátko?
(20) USER : To ošklivé.
(21) CHATBOT : Pokračuj...
(22) USER : Byla mu zima, ale vzal ho k sobě domů hodný pán.
(23) CHATBOT : Jaký to byl pán?
(24) USER : Takový starý.
(25) CHATBOT : Co je to starý?
(26) USER : To je, že je mu hodně let.
(27) CHATBOT : Pokračuj...
(28) USER : A u toho pána kačátko přečkalo zimu.
(29) CHATBOT : Co je to zima?
(30) USER : To je, když je venku sníh.
(31) CHATBOT : Jaký to byl sníh?
(32) USER : Bílý a studený :-)
(33) CHATBOT : Pokračuj...
(34) USER : No a na jaře vyrazilo ven a šlo k jezeru.
(35) CHATBOT : Jaké to bylo jezero?
(36) USER : To, u kterého se kačátko narodilo.
(37) CHATBOT : Pokračuj...
(38) USER : No a kačátko uvidělo, jak k jezeru přilétají z teplých krajin krásní velcí bílí ptáci.
(39) CHATBOT : Byli hodně bílí?
(40) USER : Byli bílí jako sníh.
(41) CHATBOT : Pokračuj...
(42) USER : Byly to labutě.
(43) CHATBOT : Pokračuj...
(44) USER : Labutě přistávaly na jezero, a káčátko k nim opatrně šlo a obdivovalo je.
(45) CHATBOT : Pokračuj...
(46) USER : Čekalo, kdy si ho všimnou a budou na něj zase pokřikovat.
(47) CHATBOT : Pokračuj...
(48) USER : Ale labutě na něj ani nepokřikovaly, ani ho neodháněly, ani se mu nesmály.
(49) CHATBOT : Pokračuj...
(50) USER : Kačátko se tomu divilo, a ptalo se labutí: Copak vám nepřipadám ošklivé?
(51) CHATBOT : Pokračuj...
(52) USER : A na to se labutě divily a ptaly se ho: Proč bys nám připadalo ošklivé? Vždyť jsi stejně krásné jako my!
(53) CHATBOT : Jaké to byly labutě?
(54) USER : Ty, co na zimu odlětly do teplých krajin, stejně jako divoké kachny.
(55) CHATBOT : Byla hodně divoká?
(56) USER : Ale to se jen tak říká, divoké kachny, to znamená, že žijou volně v přírodě.
(57) CHATBOT : Jaká to byla příroda?
(58) USER : No příroda kolem toho jezera.
(59) CHATBOT : Aha, to, u kterého se kačátko narodilo.
(60) USER : Ano to. No a to kačátko se divilo: To přece není pravda, vy jste překrásné labutě a já jsem ošklivé kačátko.
(61) CHATBOT : Pokračuj...
(62) USER : A labutě mu řekly: tak pojď blíž a podívej se na svůj odraz na hladině.
(63) CHATBOT : Jaká to byla hladina?
(64) USER : Hladina toho jezera.
(65) CHATBOT : Pokračuj...
(66) USER : A tak se kačátko podívalo, ale místo ošklivého kačátka uvidělo krásnou bílou labuť.
(67) CHATBOT : Aha, ty co na zimu odlětly do teplých krajin stejně jako divoké kachny.
(68) USER : Právě že ne – uvidělo totiž samo sebe.
(69) CHATBOT : Pokračuj...
(70) USER : Nebylo to totiž ve skutečnosti kačátko, ale labuťátko.
(71) CHATBOT : Pokračuj...
(72) USER : No a to je konec.
(73) CHATBOT : To už je konec? Tak děkuju za pohádku! :-)
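The Czech questions in the transcript above, such as (7) “Jaká to byla vajíčka?”, (11) “Jaké to bylo káčátko?”, (23) “Jaký to byl pán?” and (53) “Jaké to byly labutě?”, differ only in the forms of “Jak_” and “byl_”. These forms agree with the gender and number of the noun, as described in Section 3.2. The chatbot itself obtains them from the morphological generator [3]. The following Python sketch instead uses a small hand-written table of nominative forms. It is a simplified stand-in, not the author's Perl code.

The sketch makes three assumptions:
- The noun or adjective has already been converted to the nominative.
- Its tag follows the Czech positional tagset [5], with gender at position 3 and number at position 4.
- All function names are hypothetical.

```python
# Hand-written stand-in for the morphological generator: nominative forms of
# "Jak_" and "byl_", keyed by (gender, number) read from the positional tag.
# Genders: M = masculine animate, I = masculine inanimate, F = feminine,
# N = neuter; numbers: S = singular, P = plural.
JAKY = {
    ("M", "S"): "Jaký", ("I", "S"): "Jaký", ("F", "S"): "Jaká", ("N", "S"): "Jaké",
    ("M", "P"): "Jací", ("I", "P"): "Jaké", ("F", "P"): "Jaké", ("N", "P"): "Jaká",
}
BYL = {
    ("M", "S"): "byl", ("I", "S"): "byl", ("F", "S"): "byla", ("N", "S"): "bylo",
    ("M", "P"): "byli", ("I", "P"): "byly", ("F", "P"): "byly", ("N", "P"): "byla",
}
GO_ON = "Pokračuj..."


def gender_number(tag: str) -> tuple:
    """Positions 3 and 4 of the 15-character positional tag (0-based: 2, 3)."""
    return tag[2], tag[3]


def what_noun(noun_nom: str, tag: str) -> str:
    """'Jak_ to byl_ NOUN?'; noun_nom must already be in the nominative."""
    key = gender_number(tag)
    if key not in JAKY:  # ambiguous or unsupported gender/number value
        return GO_ON
    return f"{JAKY[key]} to {BYL[key]} {noun_nom}?"


def what_is(noun_nom: str, tag: str) -> str:
    """'Co je/jsou to NOUN?', choosing the verb by morphological number."""
    verb = "jsou" if tag[3] == "P" else "je"
    return f"Co {verb} to {noun_nom}?"


def very_adjective(adj_nom: str, tag: str) -> str:
    """'Byl_ hodně ADJECTIVE?', agreeing with the adjective's gender and number."""
    key = gender_number(tag)
    if key not in BYL:
        return GO_ON
    return f"{BYL[key].capitalize()} hodně {adj_nom}?"
```

For example, `what_noun("káčátko", "NNNS1-----A----")` produces question (11), and `very_adjective("bílí", "AAMP1----1A----")` produces question (39). The table covers only the unambiguous gender values M, I, F and N. For any ambiguous value the tagger may output, the sketch falls back to “Pokračuj...”, mirroring the fallback described in Section 3.2.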


2.2 English – Fairytale Child

This is the English version of the sample Ugly Duckling session from Section 2.1. The user utterances have been translated from Czech to English. From the responses that the English version of the chatbot is able to generate, we chose the ones that correspond to the responses in the Czech session. Thus, this session is not a real one, but it is a realistic one, in the sense that the system is able (and reasonably likely) to generate exactly the responses shown here.

(1) CHATBOT : Tell me a fairy tale!
(2) USER : I will tell you the tale of the Ugly Duckling.
(3) CHATBOT : Very ugly?
(4) USER : Well, not really ugly, but it thought so about itself.
(5) CHATBOT : Go on...
(6) USER : Once upon a time, there was a duck, and the duck laid some eggs.
(7) CHATBOT : What eggs were they?
(8) USER : Well, duck eggs.
(9) CHATBOT : I see, duck eggs.
(10) USER : And then when the time came and the ducklings hatched, one of them was uglier than the others.
(11) CHATBOT : What duckling was it?
(12) USER : Oh, the one I am telling you about.
(13) CHATBOT : I see, the one you are telling me about.
(14) USER : The other ducklings and the mother duck kept laughing at it, and so it was sad about it.
(15) CHATBOT : Very sad?
(16) USER : Very sad, because nobody liked it.
(17) CHATBOT : Go on...
(18) USER : Then the ducks flew south for the winter, and the duckling stayed alone.
(19) CHATBOT : What duckling was it?
(20) USER : Oh, it was the ugly one.
(21) CHATBOT : Go on...
(22) USER : It was cold, but a kind man took it into his home.
(23) CHATBOT : What man was it?
(24) USER : An old one.
(25) CHATBOT : What is an old?
(26) USER : This means that he has lived for many years.
(27) CHATBOT : Go on...
(28) USER : And the duckling stayed at this man throughout winter.
(29) CHATBOT : What is a winter?
(30) USER : This is when there is snow falling.
(31) CHATBOT : What snow was it?
(32) USER : White and cold :-)
(33) CHATBOT : Go on...
(34) USER : Well and in spring it came out and went to the lake.
(35) CHATBOT : What lake was it?
(36) USER : It was the lake where the duckling was born.
(37) CHATBOT : Go on...
(38) USER : Well, and the duckling saw beautiful huge white birds arriving to the lake from the south.
(39) CHATBOT : Very white?
(40) USER : They were white as the snow.
(41) CHATBOT : Go on...
(42) USER : They were swans.
(43) CHATBOT : Go on...
(44) USER : The swans were landing on the lake, and the duckling approached them carefully to admire them.
(45) CHATBOT : Go on...
(46) USER : It expected them to notice it and to start heckling it.
(47) CHATBOT : Go on...
(48) USER : But the swans did neither heckle it, nor draw it away, nor laugh at it.
(49) CHATBOT : Go on...
(50) USER : The duckling was surprised, and asked the swans: Don’t you find me ugly?
(51) CHATBOT : Go on...
(52) USER : And now the swans were surprised and asked it: Why would we find you ugly? After all, you are as beautiful as we are!
(53) CHATBOT : What swans were they?
(54) USER : The swans that flew south, just as the wild ducks did.
(55) CHATBOT : Very wild?
(56) USER : Oh, this is just something you say – wild ducks means that the ducks live freely in the nature.
(57) CHATBOT : What nature was it?
(58) USER : Well, it was the nature around the lake.
(59) CHATBOT : I see, it was the lake where the duckling was born.
(60) USER : Yes, this one. Well, and the duckling was startled: This is simply nonsense, you are beautiful swans and I am just an ugly duckling.
(61) CHATBOT : Go on...
(62) USER : And the swans told it: well then come closer and look at your reflection at the surface.
(63) CHATBOT : What surface was it?
(64) USER : The surface of the lake.
(65) CHATBOT : Go on...
(66) USER : And so the duckling looked, but instead of an ugly duckling it saw a beautiful white swan.
(67) CHATBOT : I see, the swans that flew south, just as the wild ducks did.
(68) USER : Well, actually not – it actually did see itself.
(69) CHATBOT : Go on...
(70) USER : It wasn’t in fact a duckling, but a swanling.
(71) CHATBOT : Go on...
(72) USER : And this is the end.
(73) CHATBOT : This is the end already? Thanks for the story then! :-)
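The English transcript above also illustrates the knowledge-base behaviour described in Section 3:
- A filler word is dropped in (8)-(9).
- First and second person are switched in (12)-(13).
- A remembered answer is repeated in (58)-(59).

The following Python sketch reproduces the English side of this loop. It rests on several stated assumptions:
- Tokens arrive already tagged as (form, lemma, Penn Treebank tag) triples. This stands in for the Treex scenario of Table 1, which the sketch does not run.
- The person-swap table adds “am” → “are”, which the paper does not mention.
- Only the plain JJ tag counts as an adjective.
- The names `normalize_answer`, `respond` and `State` are hypothetical.

It is an illustration of the described behaviour, not the author's Perl implementation.

```python
import random
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (form, lemma, Penn Treebank tag); assumed to come from an external tagger.
Token = Tuple[str, str, str]

FILLER_WORDS = {"well", "oh"}
# 1st <-> 2nd person switch; "am" -> "are" is our own addition for agreement.
PERSON_SWAP = {"i": "you", "me": "you", "you": "me", "am": "are"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
PLURAL_TAGS = {"NNS", "NNPS"}
GO_ON = "Go on..."


def normalize_answer(utterance: str) -> str:
    """Drop sentence-initial filler words, then switch 1st and 2nd person."""
    tokens = re.findall(r"[\w']+|[^\w\s]", utterance)
    while tokens and (tokens[0].lower() in FILLER_WORDS or tokens[0] == ","):
        tokens.pop(0)
    tokens = [PERSON_SWAP.get(t.lower(), t) for t in tokens]
    if tokens and tokens[-1] in {".", "!", "?"}:
        tokens.pop()  # the "I see, ANSWER." template adds its own full stop
    text = ""
    for tok in tokens:
        # glue punctuation to the previous word, separate words by spaces
        text += tok if not text or re.fullmatch(r"[^\w\s]", tok) else " " + tok
    # naive: lowercases the first letter even for proper nouns
    return text[:1].lower() + text[1:]


@dataclass
class State:
    kb: Dict[str, str] = field(default_factory=dict)  # lemma -> normalized answer
    pending: Optional[str] = None  # lemma of the word asked about last turn


def last_with_tag(tagged: List[Token], tags: set) -> Optional[Token]:
    """The last matching token, presumably the newest piece of information."""
    for tok in reversed(tagged):
        if tok[2] in tags:
            return tok
    return None


def respond(raw: str, tagged: List[Token], state: State,
            kind: Optional[str] = None,
            rng: Optional[random.Random] = None) -> str:
    if state.pending is not None:  # this utterance answers our last question
        state.kb[state.pending] = normalize_answer(raw)
        state.pending = None
    if any(lemma == "end" for _, lemma, _ in tagged):
        return "This is the end already? Thanks for the story then! :-)"
    if kind is None:  # pick one of the three response types at random
        kind = (rng or random).choice(["what_noun", "what_is", "very_adj"])
    word = last_with_tag(tagged, {"JJ"} if kind == "very_adj" else NOUN_TAGS)
    if word is None:
        return GO_ON  # the template cannot be filled
    form, lemma, tag = word
    if lemma in state.kb:
        # nouns: show off the remembered answer; adjectives: never ask twice
        return GO_ON if kind == "very_adj" else f"I see, {state.kb[lemma]}."
    state.pending = lemma
    if kind == "what_noun":
        return f"What {form} {'were they' if tag in PLURAL_TAGS else 'was it'}?"
    if kind == "what_is":
        if tag in PLURAL_TAGS:
            return f"What are {form}?"
        article = "an" if form[0].lower() in "aeiou" else "a"
        return f"What is {article} {form}?"
    return f"Very {form}?"
```

The optional `kind` argument exists only to make the sketch testable. Left as `None`, the response type is drawn at random, as the paper describes. Because the answer is stored before the response is generated, the utterance that answers “What eggs were they?” can immediately trigger “I see, duck eggs.”, which matches (7)-(9). For plural nouns, the sketch drops the article in “What is/are a/an NOUN?”. The paper does not specify this detail, so it is an assumption.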


            Czech sentence analysis scenario                                 ral), the vice versa is not possible: for a given lemma and
            Treex::Block::W2A::CS::Tokenize                                  tag, there is always at most one corresponding word form.
            Treex::Block::W2A::CS::TagMorce                                     For English, the standard Penn Treebank tagset [6] is
            Treex::Block::W2A::CS::FixMorphoErrors
                                                                             used. Unfortunately, it does not mark the natural gender
            English sentence analysis scenario                               for English nouns, so e.g. both “boy” and “girl” are tagged
            Treex::Block::W2A::EN::Tokenize                                  identically as NN. However, it does mark noun number,
            Treex::Block::W2A::EN::NormalizeForms
                                                                             which is a feature that we make use of.
            Treex::Block::W2A::EN::FixTokenization
            Treex::Block::W2A::EN::TagMorce                                     If the chatbot asks a question, the user’s answer is stored
            Treex::Block::W2A::EN::FixTags                                   in a simple knowledge base. Before storing, the answer
            Treex::Block::W2A::EN::Lemmatize                                 is normalized in two steps. The first one is the removal
                                                                             of sentence-initial garbage words “Well”/“Oh” (“No” in
       Table 1: Treex scenarios for sentence analysis.                       Czech) – see (8)-(9). The second one is more intricate: it
                                                                             is a switch between 1st and 2nd person – see (12)-(13).
                                                                             This consists of switching “I”/“me” and “you” in English
3     Implementation                                                         (“já” and “ty” in Czech), and also switching the person for
                                                                             Czech pronouns and verbs.
The chatbot is implemented as a console application in
                                                                                The knowledge base is simply a key-value hashmap.
Perl, reading user input from standard input, and produc-
                                                                             The key is the lemma of the word (noun or adjective) that
ing responses to standard output. It extensively uses Treex
                                                                             the chatbot asked about, and the value is the normalized
framework blocks and tools, but is not implemented as
                                                                             answer of the user to that question.
a Treex application, as Treex scenarios do not support in-
teractive operation.
   The script starts by initializing all tools and asking the                3.2   Response Selection and Generation
user to tell a fairy tale (see utterance (1) in Section 2).
Then it iteratively analyses the sentence entered by the                     The chatbot randomly chooses from a set of the follow-
user with NLP tools (Section 3.1), selects a suitable pre-                   ing three possible response types. It then tries to gener-
defined response template and generates a response (Sec-                     ate the response of the given type. If this is impossible,
tion 3.2).1 This loop continues until the end of the story                   e.g. because the template requires a noun but there is no
is detected – this is done by detecting the presence of the                  noun in the sentence – see (5) – it responds “Go on...”
word “end” (“konec” in Czech), see (72)-(73). The chat-                      (“Pokračuj...”) instead. This randomness makes the re-
bot thanks for the story and terminates.                                     sponses less predictable and therefore more realistic.
                                                                                For generating Czech responses, a morphological gen-
3.1    User Utterance Analysis                                               erator [3] is used to generate appropriately inflected word
                                                                             forms based on (lemma, tag) pairs.
Each line entered by the user is analyzed with a standard
Treex scenario – see Table 1. The sentences are split into
individual tokens, and each token is assigned a lemma2                       What NOUN was it? The first response type asks the user
and a part-of-speech3 tag.                                                   to provide details about a noun – see e.g. (23). The noun
   For Czech, the standard Czech positional tagset is                        to be asked about (denoted as NOUN) is the last noun in
used [5]. The tag marks several morphological categories;                    the sentence, as this is presumably the newest and most
in our work, we make use of the following ones: part-of-                     interesting piece of information.
speech, gender, number, case, person. An important fea-                         In Czech, NOUN is first switched to nominative case us-
ture of the tagset is that while one word form may often                     ing the morphological generator (but morphological num-
be analyzed in several ways due to inflection ambiguity                      ber is preserved). The response generated has the form
(e.g. “staré”, “old”, may be e.g. neuter nominative singu-                   of “Jak_ to byl_ NOUN?”; the correct form of “Jak_”
lar, feminine genitive singular or feminine accusative plu-                  (“Jaký”, “Jaká”, “Jaké”... – “What”) and “byl_” (“byl”,
                                                                             “byli”... – “was”/“were”) is chosen based on the morpho-
     1 Our chatbot, as most other chatbots, generates a response to each     logical gender and number of the NOUN (and generated
user input. However, it would be very annoying if the chatbot gener-         by the morphological generator).
ated a question each time – unlike Eliza, our system expects a user who         In English, the response has a form of “What NOUN
knows what he wants to say and does not need to be asked something           was it/were they?”, based on the singular/plural marked in
all the time to elicit input from him; we want to simulate a child that
actually does want to hear the story and therefore sometimes just listens,   the NOUN tag (NN and NNP tags mark singular, NNS and
without asking anything. Instead of being completely silent, which could     NNPS tags mark plural).
be mistaken for long response time or for an error, we therefore some-          If a question regarding NOUN had already been asked
times either simply encourage the user to go on, or show that the chatbot
is listening by showing what it remembers from what the user had said.
                                                                             (i.e. the knowledge base already contains NOUN as a key),
     2 E.g. for the token “sons”, the lemma is “son”                         the chatbot does not ask the question again, but instead
     3 E.g. noun, pronoun, verb                                              shows off by reapeating the then-provided answer – “I see,
Fairytale Child Chatbot                                                                                                                            83


ANSWER.” (“Aha, ANSWER.”) – see (35)-(36) and then                               seconds for Czech (about 3 seconds for English), but each
(58)-(59).4                                                                      response then takes only about 70 ms for Czech (about
                                                                                 15 ms for English).6 We believe this to be perfectly ac-
                                                                                 ceptable for real-world usage.
What is a NOUN? The second response type asks the
user to explain what a noun means – see e.g. (29). Again,
the noun to be asked for (denoted as NOUN) is the last                           4     Discussion and Future Work
noun in the sentence.
   In Czech, NOUN is first switched to nominative, pre-                          The whole conversation between the user and the chatbot
serving the number. The response generated has the form                          has a reasonably natural feel. The chatbot usually reacts
of “Co je/jsou to NOUN?”, the correct choice of verb “je”                        appropriately, which is demonstrated by the ability of the
or “jsou” (“is” or “are”) being made based on the morpho-                        user to answer chatbot’s questions and continue with the
logical number of the noun.                                                      story, eventually reaching the end.
   In English, the response has a form of “What is/are                              Occasionally, we encounter errors that stem from incor-
a/an NOUN?”, based on the singular/plural marked in the                          rect morphological analysis disambiguation by the tagger.
NOUN tag, and on the first letter of NOUN.5                                      One such case can be seen in (15), where “Byla hodně
   Similarly to the first response type, if a question regard-                   smutná?” (“Was she very sad?”) was generated instead of
ing NOUN had already been asked, the chatbot reapeats                            the correct “Bylo hodně smutné?” (“Was it very sad?”), as
the answer instead.                                                              the neuter nominative “smutné” (“sad”) in (14) was mis-
                                                                                 analyzed as feminine genitive. A similar situation happens
Very ADJECTIVE? The third type asks whether a prop-                              also in (54)-(55), where feminine plural nominative is mis-
erty was intense. Similarly to previous types, the adjective                     analyzed as feminine singular genitive, and in (24)-(25),
to be asked about (denoted ADJECTIVE) is the last adjec-                         where “starý” (“old”) is analyzed as a noun instead of ad-
tive of the sentence.                                                            jective. However, such errors happen already in the tool
   In Czech, the ADJECTIVE is first converted to nomi-                           that we use, and are effectively beyond our influence.
native. Then, the response is generated in the form “Byl_ hodně
ADJECTIVE?” (the correct form of “byl_” is generated according to the
morphological gender and number of ADJECTIVE).
   In English, the form is only “Very ADJECTIVE?”, for reasons discussed
in Section 4.
   If a question regarding ADJECTIVE has already been asked, the chatbot
resorts to the default “Go on” answer, because, unlike for nouns, the
answers to adjective questions are not reproduced. The idea is that while
it is reasonable to believe that a repeated noun refers to the same
entity, this is less likely for adjectives. However, to avoid annoying
questions, the chatbot never asks about the same adjective twice.

3.3 Efficiency Evaluation

Although both Treex and Perl are sometimes blamed for their slowness, we
have not encountered such issues with our application. This is helped by
the fact that all the Treex tools are loaded only once – thus, the
startup takes about 7

   An error that we have encountered (although it does not show in the
sample conversation) is asking the “Very ADJECTIVE?” question with
inappropriate adjectives, as in “He sat there the whole day.” (“Seděl tam
celý den.”) – “Very whole?” (“Velmi celý?”). However, it seems to occur
mainly with several frequent adjectives, so enumerating them manually
seems to be a viable remedy.
   We also experimented with the “Why?” question, but have found it hard
to identify the appropriate part of the user’s utterance to use in the
question – the whole sentence is often too long and awkward, while taking
only the verb often leads to a non-sentence. This is an issue to be
addressed once we add deeper levels of linguistic analysis.
   There are still issues with the English implementation. One of them is
that the Czech question with “Jaký”, such as (23) “Jaký to byl pán?”, is
conveniently ambiguous, as it can mean both “What was the man like?” and
“Which man was it?”, and thus feels more natural than its English
counterpart. We are not aware of a similarly ambiguous construction in
English, and we thus use “What man was it?”, which tries to keep the
ambiguity but is not good English.
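The adjective-question behaviour described above (the inflected “Byl_ hodně ADJECTIVE?” template, the rule against asking about the same adjective twice, and the manual stoplist proposed as a remedy for “Very whole?”) can be summarized in a short Python sketch. It is an illustration under stated assumptions, not the chatbot’s actual Treex-based code: it assumes PDT-style 15-character positional tags whose third and fourth positions encode gender and number, replaces the morphological generator with a hypothetical lookup table of past-tense forms of “být”, and uses a hypothetical stoplist containing only the “celý”/“whole” example from the text. The class name AdjectiveQuestioner and its ask method are likewise made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Past-tense forms of Czech "být" ("to be"), keyed by the gender and number
# values at positions 3 and 4 of a PDT-style positional tag.
# Assumption: this small table stands in for the morphological generator.
BYT_PAST = {
    ("M", "S"): "byl", ("I", "S"): "byl", ("F", "S"): "byla", ("N", "S"): "bylo",
    ("M", "P"): "byli", ("I", "P"): "byly", ("F", "P"): "byly", ("N", "P"): "byla",
}

# Hypothetical hand-written stoplist of adjectives that yield odd questions
# ("Velmi celý?" / "Very whole?"); only the example from the text is listed.
STOPLIST = {"cs": {"celý"}, "en": {"whole"}}


@dataclass
class AdjectiveQuestioner:
    """Illustrative sketch, not the chatbot's actual implementation."""

    lang: str  # "cs" or "en"
    asked: Set[str] = field(default_factory=set)  # lemmas already asked about

    def ask(self, form: str, lemma: str, tag: str) -> Optional[str]:
        """Return a question about an adjective, or None to let the caller
        fall back to the default "Go on" answer."""
        # Never ask about the same adjective twice; skip stoplisted ones.
        if lemma in self.asked or lemma in STOPLIST[self.lang]:
            return None
        if self.lang == "en":
            # English tags carry no gender, so no subject is generated.
            question = f"Very {form}?"
        else:
            # tag[2] is gender and tag[3] is number (0-based indexing).
            verb = BYT_PAST.get((tag[2], tag[3]))
            if verb is None:  # ambiguous or unexpected gender/number value
                return None
            question = f"{verb.capitalize()} hodně {form}?"
        self.asked.add(lemma)
        return question


if __name__ == "__main__":
    cz = AdjectiveQuestioner("cs")
    print(cz.ask("ošklivé", "ošklivý", "AANS1----1A----"))  # Bylo hodně ošklivé?
    print(cz.ask("ošklivá", "ošklivý", "AAFS1----1A----"))  # None (already asked)
    print(cz.ask("celý", "celý", "AAIS4----1A----"))        # None (stoplisted)
    print(AdjectiveQuestioner("en").ask("ugly", "ugly", "JJ"))  # Very ugly?
```

Returning None leaves the choice of the default “Go on” answer to the caller. Keying the ask-once check on the lemma rather than on the word form makes “ošklivé” and “ošklivá” count as the same adjective.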

4 Note that (10)-(11) and (18)-(19) seem to contradict this claim. However,
on closer inspection of the original Czech transcript, it can be seen that
the user used different spellings for the word “duckling” – “káčátko” in
(10) but “kačátko” in (18). This difference has no influence on meaning in
Czech, but as the tagger preserves this distinction in the lemmas it
produces, our system treats the two as distinct, and therefore asks about
the duckling again instead of showing that it already knows what the user
is talking about. (The distinction has not been preserved in the English
version because English has no such doublet.)
5 We resort to a simplification here by treating all vowel letters as
vowel sounds – i.e. we would incorrectly generate e.g. “an uniform” instead
of the correct “a uniform”.
6 Measured on Intel® Core 2 Duo T5870 running 32-bit Ubuntu 10.10,
averaged over 5 runs. The response time is averaged over the 36 sentences
from Section 2.

   There are also cases where the rich morphology of Czech and its analysis
help – especially when dealing with gender, which is not marked in the
English analysis. In particular, when asking the “Very ADJECTIVE?”
question, we can choose the correct subject in Czech based on the
morphological features of the adjective, while this is not possible in
English. This can be seen e.g. in (3), where in Czech we generate the
whole question “Bylo hodně ošklivé?” (“Was it
very ugly?”), because the morphological tag tells us that “ošklivé”
(“ugly”) was used in the neuter gender, while in English, we currently
resort only to “Very ugly?”, since we cannot easily distinguish whether we
should use “he”, “she” or “it” (or another pronoun) as the subject. This
is often distracting and leads to unnatural questions, as in (39). Not
only is gender unmarked on the adjective itself, but the English tagset
does not even mark gender on nouns (so there is no simple and direct way
of telling that a “king” is a “he” while a “queen” is a “she”). However,
there exist tools that are able to fill that gap, and we will try to use
them in the future. Still, this will require not only tagging but also
syntactic parsing to identify the gender associated with an adjective,
since it is implied by the parent noun in a dependency tree, which might
not be straightforward to find without a proper syntactic analysis of the
sentence.
   Employing a syntactic parser and a function labeller, such as [7], would
be beneficial in other ways as well, e.g. by enabling us to use meaningful
phrases in the responses, as opposed to the current approach where we can
only use either a single word or the whole sentence, or by being able to
distinguish the syntactic subject from the syntactic object. An analysis
of the topic-focus articulation [4] could probably make it possible to
generate even more natural questions, concentrating on questions about the
focus, i.e. the new information – not about the topic, which is the
information that the speaker assumes is already known to the listener.
   An obvious and important future goal is to implement a web interface.
Currently, the application is terminal-based, reading user utterances from
the standard input and writing chatbot responses to the standard output.
This does not look very appealing; moreover, it requires the users to have
Treex installed. Even a simple web interface would make the application
available to a much wider range of users, and would also make it more
attractive. While Treex does have a web interface [8], the Treex scenarios
are not designed for interactive use, and therefore a different solution
will have to be found.

5 Conclusion

We have implemented a Fairytale Child chatbot, which acts as a child that
wants to hear a fairy tale. It analyzes the user utterances and tries to
react with a question or remark where possible, simulating a curious
child. The chatbot is able to communicate in Czech or in English.
   We used the Treex natural language processing framework both for the
analysis of the user utterances and for the generation of the chatbot
responses. This proved to be effective, especially for the morphologically
rich Czech language.
   Even though the implementation of the chatbot is quite simple, it
responds quite naturally in most cases. This shows that even for
morphologically rich languages, a simple approach can lead to a
well-performing chatbot, provided that there is a powerful NLP framework
at hand.
   The chatbot is licensed under GNU GPL v2 and is available at
https://ufal.mff.cuni.cz/dite.⁷

Acknowledgments

This research was supported by the grants GAUK 1572314 and SVV 260 104.
This work has used language resources developed, stored and distributed
by the LINDAT/CLARIN project of the Ministry of Education, Youth and
Sports of the Czech Republic (project LM2010013).

References

[1] Hans Christian Andersen. New Fairy Tales. C. A. Reitzel, Copenhagen,
    1844.
[2] Lisa M. Cepeda and Donna S. Davenport. Person-centered therapy and
    solution-focused brief therapy: An integration of present and future
    awareness. Psychotherapy: Theory, Research, Practice, Training,
    43(1):1, 2006.
[3] Jan Hajič. Disambiguation of Rich Inflection (Computational Morphology
    of Czech). Nakladatelství Karolinum, 2004.
[4] Eva Hajičová. Topic-focus articulation and coreference in models of
    discourse production. Journal of Pragmatics, 16(2):157–166, 1991.
[5] Jan Hajič. Building a Syntactically Annotated Corpus: The Prague
    Dependency Treebank. In Eva Hajičová, editor, Issues of Valency and
    Meaning. Studies in Honor of Jarmila Panevová, pages 12–19. Prague
    Karolinum, Charles University Press, 1998.
[6] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini.
    Building a large annotated corpus of English: the Penn treebank.
    Computational Linguistics, 19:313–330, June 1993.
[7] Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič.
    Non-projective dependency parsing using spanning tree algorithms. In
    HLT ’05: Proceedings of the conference on Human Language Technology
    and Empirical Methods in Natural Language Processing, pages 523–530,
    Vancouver, British Columbia, Canada, 2005.
[8] Michal Sedlák. Treex::Web. Bachelor’s thesis, Charles University in
    Prague, Faculty of Mathematics and Physics, Prague, Czechia, 2014.
[9] Drahomíra Spoustová, Jan Hajič, Jan Votrubec, Pavel Krbec, and Pavel
    Květoň. The best of two worlds: Cooperation of statistical and
    rule-based taggers for Czech. In Proceedings of the Workshop on
    Balto-Slavonic Natural Language Processing, ACL 2007, pages 67–74,
    Prague, 2007.
[10] Joseph Weizenbaum. Eliza – a computer program for the study of natural
    language communication between man and machine. Communications of the
    ACM, 9(1):36–45, 1966.
[11] Zdeněk Žabokrtský. Treex – an open-source framework for natural
    language processing. In Markéta Lopatková, editor, ITAT, volume 788,
    pages 7–14, Košice, Slovakia, 2011. Univerzita Pavla Jozefa Šafárika
    v Košiciach.

7 The 1.0 version of the chatbot, which is described in this paper, is
published at http://hdl.handle.net/11234/1-161