=Paper=
{{Paper
|id=Vol-1664/w3
|storemode=property
|title=None
|pdfUrl=https://ceur-ws.org/Vol-1664/w3.pdf
|volume=Vol-1664
}}
==None==
PyGenbot for IoT: a demonstration of how to generate any restricted stateless AIML FAQ-chatter bot from text files

Giovanni De Gasperis, Dipartimento di Ingegneria e Scienze dell’Informazione e Matematica, Università degli Studi dell’Aquila, Italy, email: giovanni.degasperis@univaq.it

Abstract: Internet of Things (IoT) applications are required to interact with the user in the most natural way possible. Voice-based conversation is the ultimate human-machine interaction in terms of ease of use and of what it requires from the user; it also lets the user interact hands-free, without necessarily watching a computer screen. Chatter bots are conversational agents that simulate, and are able to sustain, a conversation with a human. Technology exists that allows the creation of a lexical knowledge base to be used by a restricted chatter bot, i.e. one that is expert in a specific domain. This work shows a methodology for restricted chatter bot generation using a Python program, called PyGenbot, that is able to derive an AIML (Artificial Intelligence Markup Language) knowledge base from a simple textual data set comprising a FAQ file, a keywords file, a stopwords file, a multiwords file and a glossary file. Any WOA attendee is welcome to supply arbitrary, simply formatted text files; using PyGenbot, I will first edit the text input files needed to automatically generate the corresponding AIML knowledge base set, which can be used with any standard AIML interpreter to implement the desired chatter bot; the bot can then be integrated into an IoT application.

===I. INTRODUCTION===
Internet of Things (IoT) applications are required to interact with the user in the most natural way possible. Voice-based conversation is the ultimate human-machine interaction in terms of ease of use and of what it requires from the user; it also lets the user interact hands-free, without necessarily watching a computer screen, as in the scenario of a car driver. Many commercial solutions have recently come from major smartphone corporations, mostly specialized on the smartphone usage scenario: sending messages, handling the calendar, fixing an appointment, searching for a restaurant close by. Home appliances with similar capabilities have also shown up on the market. Most of the time these are proprietary solutions, not readily available to developers but strictly integrated into commercial products, or offered through a licensed cloud API.

The voice recognition phase is not the focus of this work, so I take for granted that a hardware device or a cloud API exists that provides the voice-to-text recognition task. The focus of this work is to generate a proper textual reply to a text produced by the user by any means, typing or talking.

I offer a general-purpose tool that can be applied to any IoT application, even one with limited computational capability, by using automatically generated AIML chatter bots. They are conversational agents that simulate and sustain a conversation with a human, mostly in a restricted knowledge domain. Since ELIZA [1], chatter bots based on text pattern recognition have come a long way [2], [3]. A.L.I.C.E. is a hand-crafted chatter bot composed of about 50,000 lexical categories edited by a community of about 500 authors [3]. It aims to be unrestricted in its knowledge, in an attempt to pass a limited implementation of the Turing test known as the Loebner Prize [4]. A.L.I.C.E.’s lexical knowledge base is described using the Artificial Intelligence Markup Language, AIML [3]. The lexical categories in AIML are defined by means of (pattern, template) tuples in an XML-derived syntax:

 <category>
   <pattern>WHAT IS LINUX</pattern>
   <template>Linux is an open-source computer operating system</template>
 </category>

Also, different categories with a common semantic background can be linked together by means of an SRAI connection:

 <category>
   <pattern>WHAT IS GNU LINUX</pattern>
   <template><srai>WHAT IS LINUX</srai></template>
 </category>

In this way a tree of SRAI connections can link all of the different lexical forms to their lemma. Using wildcards, it is also possible to filter out common words, isolating keywords. The reasoner, i.e. the AIML interpreter, is a LISP program proposed by Richard Wallace [3], designed to search for the best text pattern match for the user input so as to give the most appropriate answer during the written conversation. The IBM question answering system (QAS) known as Watson won a human-versus-machine challenge [5] using a brute-force search algorithm on an unrestricted knowledge domain.

In this work, however, I concentrate on automating the generation of stateless restricted chatter bots only, given that their lexical knowledge can be expressed by a combination of a frequently asked questions/glossary set and keywords, multiwords and stopwords lists. The input data set can be arbitrary, so the methodology is unrestricted; the final products, i.e. the FAQ chatter bots, are not. Also, the overall method does not depend on the language, so multilingual IoT applications can be readily designed with parallel free text corpora for each language. This paper shows how to apply the PyGenbot Python program [6], [7], designed to automatically generate Artificial Intelligence Markup Language (AIML) knowledge bases.

===II. INPUT DATA SET===
The set of textual input data is defined as follows:
# a frequently asked questions (FAQ) file F
# a glossary file G
# a keywords list file K
# a stop words list file S
# a multiwords list file M

All files are simple free text documents, with some basic structure to distinguish the text of the questions from the text of the answers, or the glossary items from their respective definitions. The keywords, multiwords and stop words files just contain one word per line, or words separated by commas. The FAQ file F is completely defined by the chatter bot designer. It contains questions and answers in the simple form:

 Q <question> | {Q <question>}
 A <answer> | {A <answer>}

Alternative versions of the question widen the range of user inputs that can be intercepted; alternative answers increase the variability of the answers given by the chatter bot in response to the user input. The multiwords list is very important for isolating conceptual entities that consist of more than one word, for example “operating system” or “credit card”. The input data set and the information workflow are summarized by the diagram in Fig. 1.

Fig. 1. Input data set and information workflow of the AIML generation process. (The input data set, made of the FAQ set, stop-words, keywords, multi-words and the glossary set with Wiktionary lookup, feeds the PyGenbot chatterbot generator algorithm, which produces the AIML knowledge base.)

The glossary item definitions can be enriched using the free online dictionary Wiktionary (http://en.wiktionary.org, last accessed June 2016) or the Python NLTK (http://nltk.org, last accessed June 2016). Glossary items should cover the most significant terms of the restricted knowledge domain in which the final chatter bot is designed to be expert. Keywords should be selected from the text of the questions in order to optimize the pattern matching of the user input. The stopwords list contains just the most common words of the language, i.e. elements of the structural and connective lexicon. The input set determines:
* the language (English, Italian, German, etc.)
* the restricted knowledge domain
* the vocabulary

The generated chatter bot is considered stateless since it can only show purely reactive behavior in response to a textual stimulus. A more sophisticated prototype could be built by adopting a proactive multi-agent system logic framework such as DALI [8] or AgentSpeak/Jason [9] as a middle layer. Nevertheless, a stateless chatter bot is what is needed in the majority of IoT applications, where an appliance needs to give a correct answer to a user or to set a working parameter in order to accomplish a user-given task.

Fig. 2. Ideal textual data processing from the user input to the right answer. (User input, stop words filter, keywords match, question, answer.)

===III. THE PYGENBOT PROGRAM===
PyGenbot is a Python program of about 750 lines of code that takes the text file set as input and produces an AIML file set, ready to be uploaded to any AIML interpreter, which finally implements the actual stateless chatter bot able to interact with the user. The usage scenario is analogous to using a compiler to produce machine language (AIML) from high-level source files (FAQ, keywords, multiwords, stop words), even though in this case the underlying natural language is not context-free as programming languages are.

The algorithm has been published in [6], [7]. The reference idea that inspired the construction of this kind of restricted chatter bot is shown in Fig. 1.

PyGenbot generates three sets of AIML files:
* the FAQ/keywords/multiwords categories
* the glossary categories and “WHAT IS *” question patterns
* the stopwords filtering categories

The FAQ/keywords/multiwords AIML set can grow to several thousand categories, so the generation algorithm needs to be tuned through the maximum number of categories each AIML file can contain, given the complexity of the FAQ/keywords/multiwords text file set and the AIML interpreter tool finally adopted. The output AIML 1.0 file set is then ready to be used in an AIML hosting web service such as http://pandorabots.com.

===IV. QUALITY ASSESSMENT===
A measurable metric of the correctness of the final chatter bot needs to be introduced.
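Whatever form the metric takes, it rests on counting how many test questions receive the expected answer. A minimal sketch of that count follows, assuming a hypothetical `bot` callable that maps a question string to an answer string and a hand-written list of (question, expected answer) pairs; neither the function names nor the toy lookup bot come from PyGenbot.

```python
from typing import Callable, Iterable, Tuple


def correct_answer_rate(
    bot: Callable[[str], str],
    test_pairs: Iterable[Tuple[str, str]],
) -> float:
    """Return the fraction of test questions answered exactly as expected."""
    pairs = list(test_pairs)
    if not pairs:
        raise ValueError("at least one (question, expected answer) pair is required")
    hits = sum(
        1 for question, expected in pairs
        if bot(question).strip() == expected.strip()
    )
    return hits / len(pairs)


# Hypothetical stand-in for a generated chatter bot: a plain lookup table.
_ANSWERS = {"WHAT IS LINUX": "Linux is an open-source computer operating system"}


def toy_bot(question: str) -> str:
    """Answer known questions from the table, otherwise return an empty string."""
    return _ANSWERS.get(question.upper().strip(), "")


rate = correct_answer_rate(
    toy_bot,
    [
        ("what is linux", "Linux is an open-source computer operating system"),
        ("what is a kernel", "The kernel is the core of an operating system"),
    ],
)
print(rate)  # prints 0.5: one of the two test questions is answered as expected
```

Using the FAQ's own questions as test pairs, a rate of 1.0 means every FAQ question is answered correctly; paraphrased questions can be scored with the same function.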
As already proposed in [7], a three-level metric can be adopted:
* Level 0: the resulting chatter bot gives a correct answer to all the questions included in the FAQ, with exact text matching
* Level 1: the resulting chatter bot gives a correct answer to at least 50% of questions that do not use the exact wording of the original FAQ questions but have the same meaning
* Level 2: the resulting chatter bot gives a correct answer to at least 50% of questions with completely different wording but the same meaning as the original FAQ questions

Experiments have shown that all chatter bots generated with PyGenbot reach at least Level 0 quality, and that they can very often reach Level 1 quality if the FAQ/keywords/glossary set is accurately designed and well written. The demonstrator at the WOA 2016 Workshop is intended to confirm this statement experimentally.

===V. CONCLUSION===
The proposed demonstrator, the PyGenbot program, is capable of generating lexical knowledge bases for AIML-based stateless chatter bots. This work illustrated the underlying engineered, knowledge-unrestricted methodology and also proposed a quality assessment procedure that should objectively demonstrate that a restricted chatter bot can be generated from arbitrary text files, independently of the language.

===REFERENCES===
[1] J. Weizenbaum, “ELIZA: a computer program for the study of natural language communication between man and machine,” Communications of the ACM, vol. 9, no. 1, pp. 36–45, 1966.

[2] R. Epstein, G. Roberts, and G. Beber, Parsing the Turing Test: Philosophical and Methodological Issues in the Quest for the Thinking Computer. New York: Springer, 2008.

[3] R. S. Wallace, “The anatomy of A.L.I.C.E.,” in Parsing the Turing Test. New York: Springer, 2008, pp. 181–210.

[4] M. Mauldin, “Chatterbots, TinyMUDs, and the Turing test: Entering the Loebner Prize competition,” in AAAI ’94: Proceedings of the Twelfth National Conference on Artificial Intelligence, vol. 1. AAAI Press, 1994, pp. 16–21.

[5] S. Baker, Final Jeopardy: Man vs. Machine and the Quest to Know Everything. New York: Houghton Mifflin Harcourt Publishing Company, 2011.

[6] G. De Gasperis, “Building an AIML chatter bot knowledge-base starting from a FAQ and a glossary,” Journal of e-Learning and Knowledge Society, English Version, vol. 6, no. 2, 2010.

[7] G. De Gasperis, I. Chiari, and N. Florio, “AIML knowledge base construction from text corpora,” in Artificial Intelligence, Evolutionary Computing and Metaheuristics. Springer, 2013, pp. 287–318.

[8] G. De Gasperis, S. Costantini, and G. Nazzicone, “DALI multi agent systems framework,” doi 10.5281/zenodo.11042, DALI GitHub Software Repository, July 2014, http://github.com/AAAI-DISIM-UnivAQ/DALI.

[9] R. H. Bordini and J. F. Hübner, “BDI agent programming in AgentSpeak using Jason (tutorial paper),” in Computational Logic in Multi-Agent Systems, 6th International Workshop, CLIMA VI, Revised Selected and Invited Papers, ser. Lecture Notes in Computer Science, F. Toni and P. Torroni, Eds., vol. 3900. Springer, 2006, pp. 143–164.
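The generation step described in Section III can be illustrated on a single FAQ entry. The sketch below is not PyGenbot's published algorithm [6], [7]. It only assumes that the first question of an entry carries the answer template, that alternative questions are redirected to it through SRAI, and that several alternative answers are wrapped in a `<random>` element. Keywords, multiwords, stop words and the glossary are left out, and all function names are hypothetical.

```python
import re
from typing import List, Sequence
from xml.sax.saxutils import escape


def normalize(question: str) -> str:
    """Upper-case the text and drop punctuation to obtain an AIML-style pattern."""
    return " ".join(re.findall(r"\w+", question.upper()))


def faq_entry_to_aiml(questions: Sequence[str], answers: Sequence[str]) -> List[str]:
    """Build AIML categories for one FAQ entry (illustrative only)."""
    if not questions or not answers:
        raise ValueError("an FAQ entry needs at least one question and one answer")
    main = normalize(questions[0])
    if len(answers) == 1:
        template = escape(answers[0])
    else:
        # Several alternative answers: let the interpreter pick one at random.
        items = "".join(f"<li>{escape(answer)}</li>" for answer in answers)
        template = f"<random>{items}</random>"
    categories = [
        f"<category><pattern>{main}</pattern>"
        f"<template>{template}</template></category>"
    ]
    seen = {main}
    for alternative in questions[1:]:
        pattern = normalize(alternative)
        if pattern in seen:
            continue  # skip alternatives that collapse to an existing pattern
        seen.add(pattern)
        # Redirect the alternative wording to the main question via SRAI.
        categories.append(
            f"<category><pattern>{pattern}</pattern>"
            f"<template><srai>{main}</srai></template></category>"
        )
    return categories


def to_aiml_document(categories: Sequence[str]) -> str:
    """Wrap the categories in an AIML root element."""
    body = "\n".join(categories)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<aiml version="1.0">\n{body}\n</aiml>\n'
    )


cats = faq_entry_to_aiml(
    ["What is Linux?", "What is GNU Linux?"],
    ["Linux is an open-source computer operating system"],
)
print(to_aiml_document(cats))
```

Run on the Linux example from the introduction, this yields one category holding the answer and one SRAI category that redirects “WHAT IS GNU LINUX” to “WHAT IS LINUX”.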