V. Kůrková et al. (Eds.): ITAT 2014 with selected papers from Znalosti 2014, CEUR Workshop Proceedings Vol. 1214, pp. 79–84
http://ceur-ws.org/Vol-1214, Series ISSN 1613-0073, © 2014 R. Rosa

Fairytale Child Chatbot

Rudolf Rosa

Charles University in Prague, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Czech Republic,
rosa@ufal.mff.cuni.cz,
WWW home page: http://ufal.mff.cuni.cz/rudolf-rosa

Abstract: Fairytale Child (Pohádkové dítě) is a simple chatbot trying to simulate a curious child. It asks the user to tell a fairy tale, often interrupting to ask for details and clarifications. However, it remembers what it was told and tries to show it if possible. The chatbot can communicate in Czech and in English. It analyzes the morphology of each sentence produced by the user with natural language processing tools, tries to identify potential questions to ask, and then asks one. A morphological generator is employed to generate correctly inflected sentences in Czech, so that the resulting sentences sound as natural as possible.

1 Introduction

One of the first chatbots, and the first one that became widely known and successful, was ELIZA [10]. ELIZA tried to simulate a Rogerian psychotherapist, i.e. a therapist who tries to encourage clients to realize by themselves in what ways their feelings and behaviour are affected negatively [2]. This was a particularly good choice, as the Rogerian dialogue techniques involve asking the client general questions and giving sympathetic comments, trying to motivate him to speak more and to realize things. The chatbot analyzed the user-entered sentences in a shallow way, reacted specifically to certain keywords or patterns (“I feel tired.” – “What makes you feel tired?”), and provided a more generic answer if no keyword or pattern was spotted (such as “I understand.”, “Do you really think so?” or “Can you elaborate on that?”).

Despite its simplicity, ELIZA was able to communicate and react naturally enough that numerous users reported a belief that the chatbot was actually intelligent and did understand them. One of the reasons behind the success of ELIZA is the fact that when communicating with a psychotherapist, it is the client who “creates” the world about which he talks; it is thus natural that the psychotherapist does not know anything about that world (which would appear strange in other situations, such as discussing the current political situation). Another important decision was the choice of English as the communication language, which enabled employment of rather simple pattern matching instead of full-fledged linguistic analyses, and extremely simple language generation techniques; this is thanks to English being an analytical (nearly non-inflecting) language with a fixed word order.

In this work, we present a Czech language chatbot for a similar setting – the Fairytale Child. The chatbot tries to simulate a child which wants to hear a fairy tale. Similarly to the therapist scenario, it is the user who creates the entire world in which the story takes place. The child knows nothing about this world, and thus asks a lot of questions to learn something about the world, the people and objects in it, and their properties.

The behaviour of the chatbot is similar to the “Why-phase” of child development, in which the child repeatedly asks “Why?” as a reaction to virtually any sentence. However, as the “Why-phase” is rather annoying for the parents, and we do not want our chatbot to be annoying, we try to simulate an older child, which has a wider range of more specific questions, and which often does not ask any question but simply encourages the user to go on.

Similarly to several other chatbots, the Fairytale Child remembers what it was told, and does not ask for it again; instead, it tries to show the user that it has paid attention by showing off the knowledge it has acquired if the story touches the topic again.

The Fairytale Child chatbot is built using the Treex Natural Language Processing (NLP) framework [11] both for analysis of the user input and for response generation. The tools that are part of Treex, namely the tagger [9] and morphological generator [3], are vital for the implementation of the chatbot, since they enable us to alleviate the burden of handling the morphologically rich Czech language, which is one of the main obstacles preventing usage of the same simple techniques that have been repeatedly shown to perform well enough for English.

The main aim of our work is to show that when backed by a powerful and easy-to-use NLP framework, construction of a simple Czech chatbot able to produce naturally formed responses can be not only viable, but even rather straightforward.

The chatbot is being developed in two versions – Fairytale Child for English, and Pohádkové dítě for Czech. As creating a chatbot for Czech is a more interesting and challenging endeavour, Pohádkové dítě is the primary focus of our work; we provide the Fairytale Child version mainly to make our work intelligible for non-Czech speakers as well.

2 Example

Let us start by providing a real full sample of one session of a user telling the chatbot the story of The Ugly Duckling [1]. The session, listed in Section 2.1, was conducted in Czech; in Section 2.2, we provide its English version.

Please observe the ways in which the chatbot tries to ask questions. Also please notice how it remembers what it was told about the objects, and when they are mentioned again, it can decide to show that it has been listening by repeating what it was told. Interestingly, this leads to a “mistake in understanding” towards the end of the story, which, by accident, seems very natural and appropriate at that point.

2.1 Czech – Pohádkové dítě

This session transcript is real and unedited (except for a few corrections of the user’s typos). Therefore, it also shows several cases where the chatbot misanalyzed the sentence; these will be referred to in Section 4.

(1) CHATBOT : Povídej mi pohádku!
(2) USER : Povím ti pohádku o ošklivém káčátku.
(3) CHATBOT : Bylo hodně ošklivé?
(4) USER : No vlastně ne, ale myslelo si to o sobě.
(5) CHATBOT : Pokračuj...
(6) USER : Byla jednou jedna kachna, a ta nakladla vajíčka.
(7) CHATBOT : Jaká to byla vajíčka?
(8) USER : No kachní vajíčka.
(9) CHATBOT : Aha, kachní vajíčka.
(10) USER : A když přišel čas a vylíhla se káčátka, jedno z nich bylo ošklivější než ta ostatní.
(11) CHATBOT : Jaké to bylo káčátko?
(12) USER : To káčátko, o kterém ti povídám.
(13) CHATBOT : Aha, to káčátko, o kterém mi povídáš.
(14) USER : Ostatní káčátka i máma kachna se mu pořád posmívali, a ono bylo z toho smutné.
(15) CHATBOT : Byla hodně smutná?
(16) USER : Hodně, protože ho nikdo neměl rád.
(17) CHATBOT : Pokračuj...
(18) USER : Pak na zimu kachny odletěly do teplých krajin, a kačátko zůstalo samo.
(19) CHATBOT : Jaké to bylo kačátko?
(20) USER : To ošklivé.
(21) CHATBOT : Pokračuj...
(22) USER : Byla mu zima, ale vzal ho k sobě domů hodný pán.
(23) CHATBOT : Jaký to byl pán?
(24) USER : Takový starý.
(25) CHATBOT : Co je to starý?
(26) USER : To je, že je mu hodně let.
(27) CHATBOT : Pokračuj...
(28) USER : A u toho pána kačátko přečkalo zimu.
(29) CHATBOT : Co je to zima?
(30) USER : To je, když je venku sníh.
(31) CHATBOT : Jaký to byl sníh?
(32) USER : Bílý a studený :-)
(33) CHATBOT : Pokračuj...
(34) USER : No a na jaře vyrazilo ven a šlo k jezeru.
(35) CHATBOT : Jaké to bylo jezero?
(36) USER : To, u kterého se kačátko narodilo.
(37) CHATBOT : Pokračuj...
(38) USER : No a kačátko uvidělo, jak k jezeru přilétají z teplých krajin krásní velcí bílí ptáci.
(39) CHATBOT : Byli hodně bílí?
(40) USER : Byli bílí jako sníh.
(41) CHATBOT : Pokračuj...
(42) USER : Byly to labutě.
(43) CHATBOT : Pokračuj...
(44) USER : Labutě přistávaly na jezero, a káčátko k nim opatrně šlo a obdivovalo je.
(45) CHATBOT : Pokračuj...
(46) USER : Čekalo, kdy si ho všimnou a budou na něj zase pokřikovat.
(47) CHATBOT : Pokračuj...
(48) USER : Ale labutě na něj ani nepokřikovaly, ani ho neodháněly, ani se mu nesmály.
(49) CHATBOT : Pokračuj...
(50) USER : Kačátko se tomu divilo, a ptalo se labutí: Copak vám nepřipadám ošklivé?
(51) CHATBOT : Pokračuj...
(52) USER : A na to se labutě divily a ptaly se ho: Proč bys nám připadalo ošklivé? Vždyť jsi stejně krásné jako my!
(53) CHATBOT : Jaké to byly labutě?
(54) USER : Ty, co na zimu odlětly do teplých krajin, stejně jako divoké kachny.
(55) CHATBOT : Byla hodně divoká?
(56) USER : Ale to se jen tak říká, divoké kachny, to znamená, že žijou volně v přírodě.
(57) CHATBOT : Jaká to byla příroda?
(58) USER : No příroda kolem toho jezera.
(59) CHATBOT : Aha, to, u kterého se kačátko narodilo.
(60) USER : Ano to. No a to kačátko se divilo: To přece není pravda, vy jste překrásné labutě a já jsem ošklivé kačátko.
(61) CHATBOT : Pokračuj...
(62) USER : A labutě mu řekly: tak pojď blíž a podívej se na svůj odraz na hladině.
(63) CHATBOT : Jaká to byla hladina?
(64) USER : Hladina toho jezera.
(65) CHATBOT : Pokračuj...
(66) USER : A tak se kačátko podívalo, ale místo ošklivého kačátka uvidělo krásnou bílou labuť.
(67) CHATBOT : Aha, ty co na zimu odlětly do teplých krajin stejně jako divoké kachny.
(68) USER : Právě že ne – uvidělo totiž samo sebe.
(69) CHATBOT : Pokračuj...
(70) USER : Nebylo to totiž ve skutečnosti kačátko, ale labuťátko.
(71) CHATBOT : Pokračuj...
(72) USER : No a to je konec.
(73) CHATBOT : To už je konec? Tak děkuju za pohádku! :-)

2.2 English – Fairytale Child

This is the English version of the sample Ugly Duckling session from Section 2.1. The user utterances have been translated from Czech to English. From the responses that the English version of the chatbot is able to generate, we chose the ones that correspond to the responses in the Czech session. Thus, this session is not a real one, but it is a realistic one, in the sense that the system is able (and reasonably likely) to generate exactly the responses shown here.

(1) CHATBOT : Tell me a fairy tale!
(2) USER : I will tell you the tale of the Ugly Duckling.
(3) CHATBOT : Very ugly?
(4) USER : Well, not really ugly, but it thought so about itself.
(5) CHATBOT : Go on...
(6) USER : Once upon a time, there was a duck, and the duck laid some eggs.
(7) CHATBOT : What eggs were they?
(8) USER : Well, duck eggs.
(9) CHATBOT : I see, duck eggs.
(10) USER : And then when the time came and the ducklings hatched, one of them was uglier than the others.
(11) CHATBOT : What duckling was it?
(12) USER : Oh, the one I am telling you about.
(13) CHATBOT : I see, the one you are telling me about.
(14) USER : The other ducklings and the mother duck kept laughing at it, and so it was sad about it.
(15) CHATBOT : Very sad?
(16) USER : Very sad, because nobody liked it.
(17) CHATBOT : Go on...
(18) USER : Then the ducks flew south for the winter, and the duckling stayed alone.
(19) CHATBOT : What duckling was it?
(20) USER : Oh, it was the ugly one.
(21) CHATBOT : Go on...
(22) USER : It was cold, but a kind man took it into his home.
(23) CHATBOT : What man was it?
(24) USER : An old one.
(25) CHATBOT : What is an old?
(26) USER : This means that he has lived for many years.
(27) CHATBOT : Go on...
(28) USER : And the duckling stayed at this man throughout winter.
(29) CHATBOT : What is a winter?
(30) USER : This is when there is snow falling.
(31) CHATBOT : What snow was it?
(32) USER : White and cold :-)
(33) CHATBOT : Go on...
(34) USER : Well and in spring it came out and went to the lake.
(35) CHATBOT : What lake was it?
(36) USER : It was the lake where the duckling was born.
(37) CHATBOT : Go on...
(38) USER : Well, and the duckling saw beautiful huge white birds arriving to the lake from the south.
(39) CHATBOT : Very white?
(40) USER : They were white as the snow.
(41) CHATBOT : Go on...
(42) USER : They were swans.
(43) CHATBOT : Go on...
(44) USER : The swans were landing on the lake, and the duckling approached them carefully to admire them.
(45) CHATBOT : Go on...
(46) USER : It expected them to notice it and to start heckling it.
(47) CHATBOT : Go on...
(48) USER : But the swans did neither heckle it, nor draw it away, nor laugh at it.
(49) CHATBOT : Go on...
(50) USER : The duckling was surprised, and asked the swans: Don’t you find me ugly?
(51) CHATBOT : Go on...
(52) USER : And now the swans were surprised and asked it: Why would we find you ugly? After all, you are as beautiful as we are!
(53) CHATBOT : What swans were they?
(54) USER : The swans that flew south, just as the wild ducks did.
(55) CHATBOT : Very wild?
(56) USER : Oh, this is just something you say – wild ducks means that the ducks live freely in the nature.
(57) CHATBOT : What nature was it?
(58) USER : Well, it was the nature around the lake.
(59) CHATBOT : I see, it was the lake where the duckling was born.
(60) USER : Yes, this one. Well, and the duckling was startled: This is simply nonsense, you are beautiful swans and I am just an ugly duckling.
(61) CHATBOT : Go on...
(62) USER : And the swans told it: well then come closer and look at your reflection at the surface.
(63) CHATBOT : What surface was it?
(64) USER : The surface of the lake.
(65) CHATBOT : Go on...
(66) USER : And so the duckling looked, but instead of an ugly duckling it saw a beautiful white swan.
(67) CHATBOT : I see, the swans that flew south, just as the wild ducks did.
(68) USER : Well, actually not – it actually did see itself.
(69) CHATBOT : Go on...
(70) USER : It wasn’t in fact a duckling, but a swanling.
(71) CHATBOT : Go on...
(72) USER : And this is the end.
(73) CHATBOT : This is the end already? Thanks for the story then! :-)

3 Implementation

The chatbot is implemented as a console application in Perl, reading user input from standard input, and producing responses to standard output. It extensively uses Treex framework blocks and tools, but is not implemented as a Treex application, as Treex scenarios do not support interactive operation.

The script starts by initializing all tools and asking the user to tell a fairy tale (see utterance (1) in Section 2). Then it iteratively analyses the sentence entered by the user with NLP tools (Section 3.1), selects a suitable predefined response template and generates a response (Section 3.2).¹ This loop continues until the end of the story is detected – this is done by detecting the presence of the word “end” (“konec” in Czech), see (72)-(73). The chatbot then thanks the user for the story and terminates.

¹ Our chatbot, as most other chatbots, generates a response to each user input. However, it would be very annoying if the chatbot generated a question each time – unlike Eliza, our system expects a user who knows what he wants to say and does not need to be asked something all the time to elicit input from him; we want to simulate a child that actually does want to hear the story and therefore sometimes just listens, without asking anything. Instead of being completely silent, which could be mistaken for long response time or for an error, we therefore sometimes either simply encourage the user to go on, or show that the chatbot is listening by showing what it remembers from what the user had said.

3.1 User Utterance Analysis

Each line entered by the user is analyzed with a standard Treex scenario – see Table 1. The sentences are split into individual tokens, and each token is assigned a lemma² and a part-of-speech³ tag.

² E.g. for the token “sons”, the lemma is “son”.
³ E.g. noun, pronoun, verb.

Czech sentence analysis scenario
    Treex::Block::W2A::CS::Tokenize
    Treex::Block::W2A::CS::TagMorce
    Treex::Block::W2A::CS::FixMorphoErrors
English sentence analysis scenario
    Treex::Block::W2A::EN::Tokenize
    Treex::Block::W2A::EN::NormalizeForms
    Treex::Block::W2A::EN::FixTokenization
    Treex::Block::W2A::EN::TagMorce
    Treex::Block::W2A::EN::FixTags
    Treex::Block::W2A::EN::Lemmatize
Table 1: Treex scenarios for sentence analysis.

For Czech, the standard Czech positional tagset is used [5]. The tag marks several morphological categories; in our work, we make use of the following ones: part-of-speech, gender, number, case, person. An important feature of the tagset is that while one word form may often be analyzed in several ways due to inflection ambiguity (e.g. “staré”, “old”, may be e.g. neuter nominative singular, feminine genitive singular or feminine accusative plural), the converse does not hold: for a given lemma and tag, there is always at most one corresponding word form.

For English, the standard Penn Treebank tagset [6] is used. Unfortunately, it does not mark the natural gender for English nouns, so e.g. both “boy” and “girl” are tagged identically as NN. However, it does mark noun number, which is a feature that we make use of.

If the chatbot asks a question, the user’s answer is stored in a simple knowledge base. Before storing, the answer is normalized in two steps. The first one is the removal of sentence-initial garbage words “Well”/“Oh” (“No” in Czech) – see (8)-(9). The second one is more intricate: it is a switch between 1st and 2nd person – see (12)-(13). This consists of switching “I”/“me” and “you” in English (“já” and “ty” in Czech), and also switching the person for Czech pronouns and verbs.

The knowledge base is simply a key-value hashmap. The key is the lemma of the word (noun or adjective) that the chatbot asked about, and the value is the normalized answer of the user to that question.

3.2 Response Selection and Generation

The chatbot randomly chooses from a set of the following three possible response types. It then tries to generate the response of the given type. If this is impossible, e.g. because the template requires a noun but there is no noun in the sentence – see (5) – it responds “Go on...” (“Pokračuj...”) instead. This randomness makes the responses less predictable and therefore more realistic.

For generating Czech responses, a morphological generator [3] is used to generate appropriately inflected word forms based on (lemma, tag) pairs.

What NOUN was it?
The first response type asks the user to provide details about a noun – see e.g. (23). The noun to be asked about (denoted as NOUN) is the last noun in the sentence, as this is presumably the newest and most interesting piece of information.

In Czech, NOUN is first switched to nominative case using the morphological generator (but morphological number is preserved). The response generated has the form of “Jak_ to byl_ NOUN?”; the correct form of “Jak_” (“Jaký”, “Jaká”, “Jaké”... – “What”) and “byl_” (“byl”, “byli”... – “was”/“were”) is chosen based on the morphological gender and number of the NOUN (and generated by the morphological generator).

In English, the response has a form of “What NOUN was it/were they?”, based on the singular/plural marked in the NOUN tag (NN and NNP tags mark singular, NNS and NNPS tags mark plural).

If a question regarding NOUN had already been asked (i.e. the knowledge base already contains NOUN as a key), the chatbot does not ask the question again, but instead shows off by repeating the then-provided answer – “I see, ANSWER.” (“Aha, ANSWER.”) – see (35)-(36) and then (58)-(59).⁴

⁴ Note that (10)-(11) and (18)-(19) seem to contradict this claim. However, on closer inspection of the original Czech transcript, it can be seen that the user used different spellings for the word “duckling” – “káčátko” in (10) but “kačátko” in (18). This difference has no influence on meaning in Czech, but as the tagger preserves this distinction in the lemmas it produces, our system treats the two as distinct, and therefore asks about the duckling again instead of showing that it already knows what the user is talking about. (The distinction has not been preserved in the English version because of the lack of such a doublet in English.)

What is a NOUN?
The second response type asks the user to explain what a noun means – see e.g. (29). Again, the noun to be asked for (denoted as NOUN) is the last noun in the sentence.

In Czech, NOUN is first switched to nominative, preserving the number. The response generated has the form of “Co je/jsou to NOUN?”, the correct choice of verb “je” or “jsou” (“is” or “are”) being made based on the morphological number of the noun.

In English, the response has a form of “What is/are a/an NOUN?”, based on the singular/plural marked in the NOUN tag, and on the first letter of NOUN.⁵

⁵ We resort to a simplification here by treating every word-initial vowel letter as a vowel sound – i.e. we would incorrectly generate e.g. “an uniform” instead of the correct “a uniform”.

Similarly to the first response type, if a question regarding NOUN had already been asked, the chatbot repeats the answer instead.

Very ADJECTIVE?
The third type asks whether a property was intense. Similarly to previous types, the adjective to be asked about (denoted ADJECTIVE) is the last adjective of the sentence.

In Czech, the ADJECTIVE is first converted to nominative. Then, the response is generated in the form “Byl_ hodně ADJECTIVE?” (the correct form of “byl_” is generated according to morphological gender and number of ADJECTIVE).

In English, the form is only “Very ADJECTIVE?”, for reasons discussed in Section 4.

If a question regarding ADJECTIVE had already been asked, the chatbot resorts to the default “Go on” answer, as for adjective questions, the answers are not reproduced. The idea is that while it is reasonable to believe that a repeated noun is a reference to the same entity, this is less likely for adjectives. However, to avoid annoying questions, the chatbot will not ask about the same adjective twice.

3.3 Efficiency Evaluation

Although both Treex and Perl are sometimes blamed for their slowness, we have not encountered such issues with our application. This is helped by the fact that all the Treex tools are loaded only once – thus, the startup takes about 7 seconds for Czech (about 3 seconds for English), but each response then takes only about 70 ms for Czech (about 15 ms for English).⁶ We believe this to be perfectly acceptable for real-world usage.

⁶ Measured on Intel® Core 2 Duo T5870 running 32-bit Ubuntu 10.10, averaged over 5 runs. The response time is averaged over the 36 sentences from Section 2.

4 Discussion and Future Work

The whole conversation between the user and the chatbot has a reasonably natural feel. The chatbot usually reacts appropriately, which is demonstrated by the ability of the user to answer the chatbot’s questions and continue with the story, eventually reaching the end.

Occasionally, we encounter errors that stem from incorrect morphological analysis disambiguation by the tagger. One such case can be seen in (15), where “Byla hodně smutná?” (“Was she very sad?”) was generated instead of the correct “Bylo hodně smutné?” (“Was it very sad?”), as the neuter nominative “smutné” (“sad”) in (14) was misanalyzed as feminine genitive. A similar situation happens also in (54)-(55), where feminine plural nominative is misanalyzed as feminine singular genitive, and in (24)-(25), where “starý” (“old”) is analyzed as a noun instead of an adjective. However, such errors happen already in the tool that we use, and are effectively beyond our influence.

An error that we have encountered (but that does not show in the sample conversation) is that of asking the “Very ADJECTIVE?” question with inappropriate adjectives, as in “He sat there the whole day.” (“Seděl tam celý den.”) – “Very whole?” (“Velmi celý?”). However, it seems to occur especially with several frequent adjectives, so enumerating them manually seems to be a viable remedy.

We also experimented with the “Why?” question, but have found it hard to identify the appropriate part of the user’s utterance to use in the question – the whole sentence often seems too long and awkward, while only taking the verb often leads to a non-sentence. This is an issue to be addressed once we add deeper levels of linguistic analyses.

There are still issues with the English implementation. One of them is that the Czech question with “Jaký”, such as (23) “Jaký to byl pán?”, is conveniently ambiguous, as it can mean both “What was the man like?” and “Which man was it?”, and thus feels more natural than the English one. In English, we are not aware of such an ambiguous construction, and we thus use “What man was it?”, which tries to keep the ambiguity but is not good English.

There are also cases where the rich Czech morphology and analysis helps – especially when dealing with gender, which is not marked in English analysis. Thus, when asking the “Very ADJECTIVE?” question, we can choose the correct subject in Czech based on the morphological features of the adjective, while this is not possible in English. This can be seen e.g. in (3), where in Czech we generate the whole question “Bylo hodně ošklivé?” (“Was it very ugly?”), because the morphological tag tells us that “ošklivé” (“ugly”) was used in a neuter gender, while in English, we currently resort only to “Very ugly?”, since we cannot easily distinguish whether we should use “he”, “she” or “it” (or another pronoun) as the subject. This is often distracting and leads to unnatural questions, as in (39). The gender is not only unmarked on the adjective itself, but the English tagset does not even mark gender on nouns (so there is no simple and direct way of telling that a “king” is a “he” while a “queen” is a “she”). However, there exist tools that are able to fill that shortcoming, and we will try to use them in future. Still, this will require not only tagging but also syntactic parsing to be able to identify the gender associated with an adjective, since it is implied by the parent noun in a dependency tree, which might not be straightforward to find without a proper syntactic analysis of the sentence.

Employing a syntactic parser and a function labeller, such as [7], would be beneficial in other ways as well, e.g. by enabling us to use meaningful phrases in the responses, as opposed to the current approach where we can only use either a single word or the whole sentence, or by being able to distinguish the syntactic subject from the syntactic object. An analysis of the topic-focus articulation [4] could probably make it possible to generate even more natural questions, trying to concentrate on questions about the focus, i.e. the new information – not about the topic, which is the information that the speaker assumes is already known to the listener.

An obvious and important future goal is to implement a web interface. Currently, the application is terminal-based, reading user utterances from the standard input and producing chatbot responses to standard output. This does not look very appealing; moreover, it requires the users to have Treex installed. Even a simple web interface would make the application available to a much wider range of users, and would also make it more attractive. While Treex does have a web interface [8], the Treex scenarios are not designed for interactive use, and therefore a different solution will have to be found.

5 Conclusion

We have implemented a Fairytale Child chatbot, which acts as a child that wants to hear a fairy tale. It analyzes the user utterances and tries to react with a question or remark if possible, simulating a curious child. The chatbot is able to communicate in Czech or in English.

We used the Treex natural language processing framework for both analysis of the user utterances and generation of the chatbot responses. This proved to be effective, especially for the morphologically rich Czech language.

Even though the implementation of the chatbot is quite simple, it responds quite naturally in most cases. This shows that even for morphologically rich languages, a simple approach can lead to a well-performing chatbot, provided that there is a powerful NLP framework at hand.

The chatbot is licensed under GNU GPL v2 and is available at https://ufal.mff.cuni.cz/dite.⁷

⁷ The 1.0 version of the chatbot, which is described in this paper, is published at: http://hdl.handle.net/11234/1-161

Acknowledgments

This research was supported by the grants GAUK 1572314, and SVV 260 104. This work has been using language resources developed, stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (project LM2010013).

References

[1] Hans Christian Andersen. New Fairy Tales. C. A. Reitzel, Copenhagen, 1844.
[2] Lisa M. Cepeda and Donna S. Davenport. Person-centered therapy and solution-focused brief therapy: An integration of present and future awareness. Psychotherapy: Theory, Research, Practice, Training, 43(1):1, 2006.
[3] Jan Hajič. Disambiguation of Rich Inflection (Computational Morphology of Czech). Nakladatelství Karolinum, 2004.
[4] Eva Hajičová. Topic-focus articulation and coreference in models of discourse production. Journal of Pragmatics, 16(2):157–166, 1991.
[5] Jan Hajič. Building a Syntactically Annotated Corpus: The Prague Dependency Treebank. In Eva Hajičová, editor, Issues of Valency and Meaning. Studies in Honor of Jarmila Panevová, pages 12–19. Prague Karolinum, Charles University Press, 1998.
[6] Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of English: the Penn treebank. Comp. Ling., 19:313–330, June 1993.
[7] Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajič. Non-projective dependency parsing using spanning tree algorithms. In HLT ’05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, British Columbia, Canada, 2005.
[8] Michal Sedlák. Treex::Web. Bachelor’s thesis, Charles University in Prague, Faculty of Mathematics and Physics, Prague, Czechia, 2014.
[9] Drahomíra Spoustová, Jan Hajič, Jan Votrubec, Pavel Krbec, and Pavel Květoň. The best of two worlds: Cooperation of statistical and rule-based taggers for Czech. In Proceedings of the Workshop on Balto-Slavonic Natural Language Processing, ACL 2007, pages 67–74, Praha, 2007.
[10] Joseph Weizenbaum. Eliza – a computer program for the study of natural language communication between man and machine. Communications of the ACM, 9(1):36–45, 1966.
[11] Zdeněk Žabokrtský. Treex – an open-source framework for natural language processing. In Markéta Lopatková, editor, ITAT, volume 788, pages 7–14, Košice, Slovakia, 2011. Univerzita Pavla Jozefa Šafárika v Košiciach.
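
Illustration for Section 3.1 (answer normalization and knowledge base). The two normalization steps and the key-value store described there can be sketched roughly as follows for English. This is not the chatbot's actual code, which is written in Perl on top of Treex; Python is used only for illustration. The function names, the regular-expression tokenizer, the "I am"/"you are" verb pairs, and the choice of "me" as the replacement for "you" are all assumptions, since the paper only states that "I"/"me" and "you" are switched.

```python
import re

# Sentence-initial filler words named in Section 3.1 ("Well"/"Oh").
FILLERS = {"well", "oh"}

# Assumed swap tables: the paper only says "I"/"me" and "you" are switched.
# The verb pairs and the mapping "you" -> "me" are illustrative guesses.
PAIR_SWAPS = {("i", "am"): ("you", "are"), ("you", "are"): ("I", "am")}
WORD_SWAPS = {"i": "you", "me": "you", "you": "me"}

# The knowledge base: lemma of the word asked about -> normalized answer.
knowledge_base = {}


def normalize_answer(answer):
    """Drop a leading filler word, then switch 1st and 2nd person."""
    # Crude tokenizer standing in for Treex tokenization (assumption).
    tokens = re.findall(r"[\w'’]+|[^\w\s]+", answer)
    # Step 1: remove a sentence-initial filler and the comma after it.
    if tokens and tokens[0].lower() in FILLERS:
        tokens = tokens[1:]
        if tokens and tokens[0] == ",":
            tokens = tokens[1:]
    # Step 2: swap person; word pairs are checked first so that every
    # token is rewritten at most once.
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(t.lower() for t in tokens[i:i + 2])
        if pair in PAIR_SWAPS:
            out.extend(PAIR_SWAPS[pair])
            i += 2
        else:
            out.append(WORD_SWAPS.get(tokens[i].lower(), tokens[i]))
            i += 1
    # Re-join the tokens, attaching simple punctuation to the word before it.
    return re.sub(r"\s+([.,!?;])", r"\1", " ".join(out))


def remember(lemma, answer):
    """Store the normalized answer under the lemma that was asked about."""
    knowledge_base[lemma] = normalize_answer(answer)
```

With this sketch, storing the answer from (8) via `remember("egg", "Well, duck eggs.")` and prefixing the stored value with "I see, " yields the reply in (9). The Czech side would additionally need the person of pronouns and verbs to be switched with a morphological generator, which is not attempted here.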
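
Illustration for Section 3.2 (response selection for English). The selection logic described there can be approximated by the following sketch, again in Python rather than the chatbot's Perl. It assumes the analysis has already produced (form, lemma, tag) triples with Penn Treebank tags. The adjective tag set, the article-free plural form "What are NOUN?", the optional `kind` parameter, and all function names are assumptions made for the example.

```python
import random

SINGULAR_TAGS = {"NN", "NNP"}
PLURAL_TAGS = {"NNS", "NNPS"}
NOUN_TAGS = SINGULAR_TAGS | PLURAL_TAGS
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}  # assumption: not listed in the paper
RESPONSE_TYPES = ("what_noun_was_it", "what_is_a_noun", "very_adjective")
FALLBACK = "Go on..."


def last_with_tag(tokens, tags):
    """Return the last (form, lemma, tag) triple whose tag is in `tags`."""
    for token in reversed(tokens):
        if token[2] in tags:
            return token
    return None


def what_noun_was_it(form, tag):
    ending = "were they" if tag in PLURAL_TAGS else "was it"
    return f"What {form} {ending}?"


def what_is_a_noun(form, tag):
    if tag in PLURAL_TAGS:
        return f"What are {form}?"  # assumption: no article in the plural
    # Simplification from footnote 5: decide by the first letter only.
    article = "an" if form[0].lower() in "aeiou" else "a"
    return f"What is {article} {form}?"


def respond(tokens, knowledge_base, kind=None):
    """Return (response, lemma_asked_about); the lemma is None if no
    question was asked, so the caller knows whether to store the answer."""
    kind = kind or random.choice(RESPONSE_TYPES)
    wanted = ADJECTIVE_TAGS if kind == "very_adjective" else NOUN_TAGS
    hit = last_with_tag(tokens, wanted)
    if hit is None:
        return FALLBACK, None  # template cannot be filled
    form, lemma, tag = hit
    if lemma in knowledge_base:
        if kind == "very_adjective":
            return FALLBACK, None  # adjective answers are not reproduced
        return "I see, " + knowledge_base[lemma], None
    if kind == "very_adjective":
        return f"Very {form}?", lemma
    if kind == "what_noun_was_it":
        return what_noun_was_it(form, tag), lemma
    return what_is_a_noun(form, tag), lemma
```

For the tagged sentence of (22), the sketch produces "What man was it?" when the first response type is drawn, matching (23). The Czech versions of the templates would need the morphological generator to inflect "Jak_" and "byl_", which a string template like this cannot do.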
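
Illustration for Section 3 (main loop and end-of-story detection). The control flow described at the start of Section 3 can be sketched as below. The paper does not say whether the word "end" is matched on the surface form or on the lemma; this sketch assumes a case-insensitive whole-word match on the raw utterance. The names `is_end_of_story` and `run_session`, and the `respond` callback standing in for the analysis and generation steps, are hypothetical.

```python
import re


def is_end_of_story(utterance, end_word="end"):
    """True if `end_word` occurs as a whole word (assumed matching rule)."""
    pattern = rf"\b{re.escape(end_word)}\b"
    return re.search(pattern, utterance, re.IGNORECASE) is not None


def run_session(utterances, respond):
    """Yield the chatbot's turns for a sequence of user utterances.

    `respond` is any callable mapping one utterance to one response; it
    stands in for the analysis (3.1) and generation (3.2) steps.
    """
    yield "Tell me a fairy tale!"
    for utterance in utterances:
        if is_end_of_story(utterance):
            yield "This is the end already? Thanks for the story then! :-)"
            return  # the chatbot terminates after thanking the user
        yield respond(utterance)
```

In a console setting, `utterances` could simply be the lines of standard input, with each yielded turn printed to standard output. For Czech, the same check would be called with `end_word="konec"`.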