Rich Lexical Knowledge-based Q&A System for Ubiquitous Knowledge Service

Asanee Kawtrakul
U-Know Center and Department of Computer Engineering, Kasetsart University, Bangkok, Thailand

Mukda Suktarachan
U-Know Center and Department of Computer Engineering, Kasetsart University, Bangkok, Thailand

Navapat Khantonthong
Department of Computer Engineering, Kasetsart University, Bangkok, Thailand

navapatk@gmail.com, asanee_naist@yahoo.com, naist_da_da@yahoo.com

Aree Thunkijjanukij
ThaiAGRIS Center, Kasetsart University, Bangkok, Thailand
thunkijja@yahoo.com

Patrick Saint-Dizier
IRIT-CNRS, 118 route de Narbonne, Toulouse, France
stdizier@irit.fr

ABSTRACT
We present the concept of a Question-Answering System for providing knowledge services. The system is based on a textual database on rice production and rice diseases. This database has been structured according to a number of ontological conceptual functions and associated annotations. In this paper, rich lexical knowledge is used to identify the semantic roles in a question. These roles are then connected with the domain knowledge base, in both ontology and text formats, in order to answer the question.

Keywords
Question-Answering System, Knowledge Services, Lexical Knowledge, Ontology.

1. INTRODUCTION
In this short text, we summarize some aspects of the development of a Question-Answering system for farmers through SMS, focusing on the language aspects. Communicating via SMS facilitates a ubiquitous and effective knowledge service for problem solving, decision making and early warning. Our application area is rice production and rice diseases.

To get a better grasp of the problem and to characterize it in depth, we collected 1000 questions raised in real life by farmers. We annotated those 1000 questions, together with the text(s) identified as responses to each query. This allowed us to understand how questions can be answered.

The agricultural knowledge base we are using is derived from the Thai AGRIS (Agricultural Research Information System) specifications and has a rich conceptual structure [7]. As a result, about 60% of the questions can be answered directly. This is done by transforming queries into a conjunction of conceptual functions of this schema via lexical descriptions and interpretation functions.

However, for about 40% of the questions this is not possible. This is the case in particular for evaluative questions (such as "what is the largest …") and for How-to questions that are related to procedures. To deal with this latter set of questions, we developed a model based on response annotation in order to induce inference rules that match a question with its answer. This is particularly crucial when there is no straightforward response, for example:
− when some forms of lexical inference are required,
− when the response is not a simple item but a well-formed fragment of text,
− when the response is a chain of events leading to a consequence event,
− when the response is a procedure [4].

The project we present here emerged from a need of real end-users: the Agricultural Land Reform Office, Ministry of Agriculture and Cooperatives, Thailand. It is part of the ALRO Cyber Brain project [1,2], a social network framework that combines knowledge engineering approaches with language engineering. Conceptual knowledge is represented in an ontology through an ontology workbench [8] for answering factoid questions. New knowledge in textual format is extracted to maintain the ontological knowledge and to answer non-factoid questions. We present below a brief outline of the main problems we have encountered.

2. TOWARDS A ‘REAL’ QA SYSTEM: CHALLENGES
First, at the level of question analysis, several problems arise in identifying the facets of the question: the type of the question, its focus and the constraints that hold on the focus.

For complex questions, another challenge is to identify their content. Our approach uses dependency parsing to tag NPs (noun phrases) and PPs (prepositional phrases) with semantic tags that correspond to the categories of the AGRIS database. In natural language, a question can indeed be asked with different words and syntactic forms. This is particularly the case in Thai, which allows many optional terms and a large freedom of constituent order.

Next, in most cases, questions and answers do not match directly, because the clue words or focus words of the question do not appear in the answers. This obviously makes it difficult to find the expected answer. For this kind of Q&A matching problem, lexical semantics devices or more elaborate reasoning schemas based on domain knowledge are needed to allow appropriate question-response matching [5,6]. This is realized in our project via text annotation and learning.

In some cases, some of the information needed to elaborate a real diagnosis is missing; the user is then asked to provide more details. We prefer to avoid setting up a dialogue, since this may lead to unexpected data or directions. Users want a relatively fast response, so simply asking for more free input is the best compromise.

The second aspect of this problem is to be able to extract the complete portion of a text that responds to the question. For that purpose, we have developed an annotation methodology whose goal is to identify the different processes at stake and the needed resources. This method allows us to identify relevant text portions and then to delimit them appropriately.

[Figure 1: Ontology with 2322 concepts, 5603 terms and 57 associative relations; 60% of questions can be directly answered by transforming queries into one or more conceptual functions via lexical inference rules and interpretation functions.]
Figure 1. Question Representation and Corresponding Text Indexing

Our question answering system is based on three interacting sources of knowledge:
− lexical data, in particular lexical semantics and lexical inference,
− the domain data as represented by the rich conceptual functions, i.e. the Rice Ontology,
− some general-purpose knowledge useful for answering questions.

Lexical representations of verbs are based on conceptual functions from FrameNet. The general form of an entry is as follows:
− the verb with its arguments,
− the selectional restrictions on those arguments,
− a conjunction of conceptual functions, with variables corresponding to argument positions.
For example:

    resist: verb, [X:NP, Y:NP], [X:plant, Y:insect ∨ disease], X 'isResistantTo' Y.

Nouns are associated with their types as defined in the domain ontology.

The semantics of verbs can be represented on the basis of conceptual functions. More complex situations, however, need further development. One example is the adjunction of constraints, which are often expressed by syntactic adjuncts. The first difficulty is to develop a compositional framework that can integrate various modifiers. For that purpose, we reuse the semantic representations we developed based on the Lexical Conceptual Structure principles, which we have integrated into the PrepNet lexical base. PrepNet proposes semantic representations for a large number of forms of adjuncts, based on a notion of prepositional modification. This is what is encountered in complex questions.

For the ontology-based Question Answering system, we use the Thai Rice Ontology [7] as a source of rich knowledge. The answer can be extracted with a simple algorithm. The algorithm matches the query with an ontological relation and then retrieves the associated concepts through inference rules, as in Figure 2.

    Q: What are the diseases of rice caused by fungi?
    With the inference rule X isDiseaseOf Y ∧ X isa Fungi, we get the following answer:
    A: Magnaporthe grisea

Figure 2. Example of an ontological inference rule.

Further answers are then collected from the full text by matching the question against the extracted full text (see Figure 3).

    Function Matching(Question Q, Answer A) {
      Match = false;
      // Relevant document
      If (Q.focus = A.index) then
        // Relevant answer
        If (Q.type = A.task type) then
          // Detect answer for the question
          If (Q.focus = A.title) then
            Match = true;
          Else if (Q.action = A.action and Q.theme = A.theme or Q.agent = A.agent) then
            Match = true;
          End If
        End If
      End If
      Return Match;
    }

Figure 3. Full-Text Q&A Matching Algorithm

3. CONCLUSION
This short paper presents some ideas on how to use rich lexical knowledge for annotating questions and answers at both the indexing and the text level. The presented methodology has been implemented in knowledge services for Thai farmers in the rice domain and is under testing. Moreover, with the Rice

4. ACKNOWLEDGMENTS
The work described in this paper has been supported by NECTEC No. NT-B-22-KE-12-50-19, within the project I-Know II: CAT, EAT, RATs, and Agricultural Question & Answering Service System, granted by KURDI, Kasetsart University. We would also like to thank the French CNRS PICS programme.

5. REFERENCES
[1] Kawtrakul, A. et al. 2009. Problems-Solving Map Extraction with Collective Intelligence Analysis and Language Engineering. Book Chapter 18 in Information Retrieval in Biomedicine. Medical Information Science Reference. ISBN: 978-1-60566-274-9; p. 460.
[2] Kawtrakul, A. et al. 2009. From CyberBrain to Q&A Services: A Development of Question-Answering Services System for the Farmer through the SMS. In Proceedings of WCCA 2009, Grand Sierra Resort, Reno, Nevada, USA.
[3] Moldovan, D. et al. 2000. The Structure and Performance of an Open-Domain Question Answering System. In Proceedings of the 38th Meeting of the Association for Computational Linguistics (ACL), Hong Kong.
[4] Delpech, E., Saint-Dizier, P. 2008. Investigating the Structure of Procedural Texts for Answering How-to Questions. LREC 2008, Marrakech.
[5] Talmy, L. 1985. Lexicalization Patterns: Semantic Structure in Lexical Forms. In T. Shopen (ed.), Language Typology and Syntactic Description 3: Grammatical Categories and the Lexicon, 57-149. Cambridge University Press.
[6] Takechi, M. et al. 2003. Feature Selection in Categorizing Procedural Expressions. The 6th International Workshop on Information Retrieval with Asian Languages (IRAL2003): 49-56.
[7] Thunkijjanukij, A. 2009. Ontology Development for Agricultural Research Knowledge Management: A Case Study for Thai Rice. PhD Dissertation, Kasetsart University.
[8] Kawtrakul, A. et al. 2008. Ontology based Knowledge Map Construction for a Smart Knowledge Service. IAALD AFITA WCCA 2008, Tokyo, Japan, 24-27 August.
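Section 2 gives the lexical entry for resist. The entry combines four parts:
− argument slots,
− selectional restrictions on those slots (a disjunction of ontological types),
− a conceptual function linking the arguments,
− a lemma with its category.

The following is a minimal Python sketch of how such an entry could be checked against typed nouns. It is not the authors' implementation. The noun-to-type table NOUN_TYPES, the terms in it ("rice", "brown planthopper", "blast") and the helpers VerbEntry and instantiate are hypothetical stand-ins for the Thai Rice Ontology and the system's lexicon.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical noun -> ontological type table; a stand-in for the domain ontology.
NOUN_TYPES: Dict[str, str] = {
    "rice": "plant",
    "brown planthopper": "insect",
    "blast": "disease",
}


@dataclass
class VerbEntry:
    lemma: str
    # Argument variables in order; each fills an NP slot (binary verbs only here).
    args: Tuple[str, str]
    # Selectional restrictions: variable -> set of allowed types (read as a disjunction).
    restrictions: Dict[str, FrozenSet[str]]
    # Conceptual function that links the two arguments.
    relation: str


# resist: verb, [X:NP, Y:NP], [X:plant, Y:insect ∨ disease], X 'isResistantTo' Y
RESIST = VerbEntry(
    lemma="resist",
    args=("X", "Y"),
    restrictions={"X": frozenset({"plant"}), "Y": frozenset({"insect", "disease"})},
    relation="isResistantTo",
)


def instantiate(entry: VerbEntry, fillers: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """Return (X, relation, Y) if every filler satisfies its restriction, else None."""
    for var in entry.args:
        noun = fillers.get(var)
        # Unknown nouns get type None, which is never in an allowed set.
        if noun is None or NOUN_TYPES.get(noun) not in entry.restrictions[var]:
            return None
    x_var, y_var = entry.args
    return (fillers[x_var], entry.relation, fillers[y_var])


print(instantiate(RESIST, {"X": "rice", "Y": "blast"}))  # ('rice', 'isResistantTo', 'blast')
print(instantiate(RESIST, {"X": "blast", "Y": "rice"}))  # None
```

The second call fails because "blast" is typed as a disease, not a plant. This is how the selectional restrictions filter out ill-typed readings of a question before any conceptual function is produced.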
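The inference rule of Figure 2, X isDiseaseOf Y ∧ X isa Fungi, can be read as a conjunctive query over relation triples. It returns every X that satisfies both conditions for Y = rice.

The following is a minimal Python sketch under the assumption that the ontology is available as a set of (subject, relation, object) triples. The four toy triples are illustrative only and are not an excerpt of the Thai Rice Ontology. The second pathogen, Xanthomonas oryzae, a bacterium, is included only to show the filtering. The helpers subjects and diseases_caused_by are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Toy triples (illustrative only): one fungal and one bacterial rice pathogen.
TRIPLES: Set[Triple] = {
    ("Magnaporthe grisea", "isDiseaseOf", "rice"),
    ("Magnaporthe grisea", "isa", "Fungi"),
    ("Xanthomonas oryzae", "isDiseaseOf", "rice"),
    ("Xanthomonas oryzae", "isa", "Bacteria"),
}


def subjects(triples: Iterable[Triple], relation: str, obj: str) -> Set[str]:
    """All X such that (X, relation, obj) is asserted."""
    return {s for s, r, o in triples if r == relation and o == obj}


def diseases_caused_by(triples: Set[Triple], host: str, kind: str) -> Set[str]:
    """Evaluate the rule  X isDiseaseOf host  AND  X isa kind  by set intersection."""
    return subjects(triples, "isDiseaseOf", host) & subjects(triples, "isa", kind)


print(diseases_caused_by(TRIPLES, "rice", "Fungi"))  # {'Magnaporthe grisea'}
```

The conjunction ∧ becomes a set intersection, so each additional constraint in a question narrows the candidate set. A real ontology store would typically express the same pattern as a query over its triples rather than a Python set comprehension.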
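The full-text matching procedure of Figure 3 can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation, and it rests on four assumptions:
− The dataclass fields mirror the names used in the figure, with task_type standing for the figure's "task type".
− The figure does not state how "Q.action = A.action and Q.theme = A.theme or Q.agent = A.agent" is grouped. The sketch assumes that "and" binds tighter than "or".
− The sketch adds a guard so that an unfilled role never counts as a match. This guard is not in Figure 3.
− The example values ("blast", "how-to", "Seed treatment") are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    # Facets identified during question analysis (names follow Figure 3).
    focus: str
    type: str
    action: Optional[str] = None
    theme: Optional[str] = None
    agent: Optional[str] = None


@dataclass
class Answer:
    # Annotations of a candidate text portion; task_type stands for "task type".
    index: str
    task_type: str
    title: str
    action: Optional[str] = None
    theme: Optional[str] = None
    agent: Optional[str] = None


def same(x: Optional[str], y: Optional[str]) -> bool:
    # An unfilled role never counts as a match (an assumption, not in Figure 3).
    return x is not None and x == y


def matching(q: Question, a: Answer) -> bool:
    # Relevant document: the question focus must be an index term of the text.
    if q.focus != a.index:
        return False
    # Relevant answer: the question type must agree with the annotated task type.
    if q.type != a.task_type:
        return False
    # Detect the answer: a focus/title match suffices ...
    if q.focus == a.title:
        return True
    # ... otherwise compare semantic roles; `and` is assumed to bind tighter than `or`.
    return (same(q.action, a.action) and same(q.theme, a.theme)) or same(q.agent, a.agent)


q = Question(focus="blast", type="how-to", action="prevent", theme="blast")
a = Answer(index="blast", task_type="how-to", title="Seed treatment", action="prevent", theme="blast")
print(matching(q, a))  # True
```

The early returns flatten the nested If blocks of Figure 3 without changing their order. A text portion is only compared role by role after it has passed the document-level (index) and answer-level (task type) filters.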