SUBJECTIVE MODEL ANSWER GENERATION TOOL FOR DIGITAL EVALUATION SYSTEMS

Shubham, Research Scholar, Bhilai Institute of Technology, Durg, +91-9644026902, Shubhamlive1010@gmail.com
Dr. Arpana Rawal, Professor, Bhilai Institute of Technology, Durg, +91-9907180993, arpana.rawal@gmail.com
Dr. Ani Thomas, Professor, Bhilai Institute of Technology, Durg, +91-9893165872, ani.thomas@bitdurg.ac.in

ABSTRACT
Automated subjective answer assessment in modern digital evaluation environments promises structural consistency, but it can distort the very nature of the complex, context-rich information put up for evaluation. In modern teaching-learning environments, where a wide variety of bias is observed while fabricating human-scripted Memoranda-of-Instructions, it becomes difficult to evaluate subjective answers with appropriate justification. Answer evaluation systems have seen extensive research by academicians for a few decades; research on subjective model answer generation, on the other hand, is still in its infancy. An algorithm for subjective model answer generation has therefore become necessary for developing a generic framework covering all types of subjective questions. In this paper, we describe one such algorithm for generating model answers to all types of descriptive (subjective) questions from a given text corpus.

CCS Concepts
• Information systems → Information retrieval → Retrieval tasks and goals → Question answering.

Keywords
Answer Generation Systems; Algorithm; Content filtering; Restricted Domain; Vocabulary; Seed; Co-occurring; Domain specific; Answer retrieval

1. INTRODUCTION
As the modern education system is augmented with digital environments around the globe, evaluation systems are also being digitized progressively. With the advent of automated subjective answer evaluation tools such as Electronic Essay Rater (E-rater) by Burstein, Kukich, Wolff, Chi and Chodorow (1998), Conceptual Rater (C-rater) (Valenti et al., 2003), Intelligent Essay Assessor (IEA) (Valenti et al., 2003), Educational Testing Service (ETS-I) by Whittington and Hunt (1999), BETSY (Valenti et al., 2003) and Schema Extract Analyze and Report (SEAR) (Christie, 1999), a drastic drift is seen in the preparation of question paper manuscripts from MCQ questionnaires to a blend of both objective and subjective questions [1, 2]. Academicians are observed spending more of their time setting question papers and evaluating answers than analyzing the scores and counselling the students. According to recent statistics, it takes one month on average to evaluate 700 candidate answer scripts for six subjects in total, so that results for the same lot of students take almost two to three months to declare. Apart from this, even with expert evaluators it is not always possible to justify which answer is better and why. Envisioning such a series of hurdles, an attempt is made here to ease the task of manual answer generation by obtaining machine-generated answers for some question categories.

The rest of this paper is organized as follows: Section 2 addresses preprocessing issues that have been dealt with by other systems for model answer generation. Section 3 outlines the subjective answer generation algorithm. Section 4 suggests further applications and developments possible in the near future.

2. PRE-PROCESSING ISSUES
The considerable issues that need to be investigated in depth before building the prototype tool for generating model answers are enumerated below:

Language Support: One of the design issues in the algorithm demands concurrent modification of passive data objects in an already existing dictionary while checking for terms in that domain-specific vocabulary, in an attempt to expand the vocabulary at runtime. Not all languages support this feature; hence there arises a need to choose an appropriate language for tool development. This issue can also be resolved by using the method described by D. Clarke et al. in their article [3].

Natural Language Processing (NLP) Tool selection: The selected NLP tool must support the following annotator properties: Tokenization, Sentence Splitting, Lemmatization, Parts-of-Speech Tagging, Constituency Parsing, Dependency Parsing and Co-reference Resolution (a configuration sketch is given at the end of this section).

Supporting Domain-specific Vocabularies: Using 'WordNet' as the source of open-domain vocabulary may seem optimal at first, but it usually hinders the generation of the most accurate answers, which demand information retrieval over a narrowly specified subject domain. The Information Retrieval model built for Question-Answering (QA) systems by IBM's statistical system finds its greatest hindrance in the last step of trimming the set of optimal sentences from the ranked set of passages obtained in the previous step. The best alternative for reducing such system errors is to use restricted domains as background knowledge rather than open domains [4]. In another exhaustive survey, L. Hirschman and R. Gaizauskas emphasized the crucial role of passages in the extraction of answers to subjective questions through IR techniques [5].
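The paper does not commit to a particular NLP toolkit. As one illustration of the annotator requirements listed above, the sketch below configures Stanford CoreNLP (an assumed choice, not named in this paper) for tokenization, sentence splitting, POS tagging, lemmatization, constituency and dependency parsing, and co-reference resolution; the annotator identifiers follow CoreNLP's conventions and the sample text is arbitrary.

    import java.util.Properties;
    import edu.stanford.nlp.pipeline.CoreDocument;
    import edu.stanford.nlp.pipeline.StanfordCoreNLP;

    public class AnnotatorSetup {
        public static void main(String[] args) {
            // Annotators covering the properties listed above; "ner" is included
            // because the co-reference annotator depends on named-entity tags.
            Properties props = new Properties();
            props.setProperty("annotators",
                    "tokenize,ssplit,pos,lemma,ner,parse,depparse,coref");
            StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

            CoreDocument doc = new CoreDocument(
                    "An operating system schedules processes. It also manages memory.");
            pipeline.annotate(doc);

            // Sentence splits and per-token lemmas are now available for the
            // seed-sentence and vocabulary-expansion steps described in Section 3.
            doc.sentences().forEach(s -> System.out.println(s.lemmas()));
        }
    }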
3. SUBJECTIVE ANSWER GENERATION ALGORITHM
The syntax used in the pseudo code borrows some elements from Java syntax. All input and output objects are specified in bold. Language constructs such as conditionals and loops use italics. Square brackets denote the index used for storing and accessing array elements. The assignment operation is denoted by the symbolic notation ←. The variables and procedures used in the pseudo code are described as follows:

a. Q is the raw question string to be used for finding answers.
b. K is the List of keyword strings present in question Q, which can be generated using open-source NLP tools.
c. C is the List of corpus sections as strings, which can be the sections or chapters defined in a standard text book.
d. V is the source vocabulary, which can be generated for all the keywords of Corpus C using either WordNet for an enhanced domain vocabulary or manual human intervention for a restricted domain.
e. Get-Entry-Points: Function for initial filtering based on the count of question keywords K found in the different sections of the text corpus (an illustrative sketch follows this list).
f. Get-Seed-Sentences: Function for getting the seed sentences present in a particular section, based on the keywords K and the keyword vocabulary for the given question Q.
g. Get-Section-Sentences: Function returning the list of all sentences present in a text paragraph from a section.
h. Get-Vocab: Returns a list of strings for each term supplied; each list in the returned composite list contains the synonyms of the respective word in the input list.
i. Get-Keywords: Returns a list of related keywords based on the NLP dependencies provided by NLP parsers.
j. Get-Co-Occurring-NP: Returns the co-occurring noun phrases after performing anaphora resolution on the supplied text.
k. Get-Seed-Index: Returns the position of a given seed sentence within the supplied list of section sentences.
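The procedures above are specified only by their behaviour. As one possible reading of item (e), the following minimal Java sketch filters corpus sections by counting how many question keywords each section mentions; the class name, method name and sample data are illustrative assumptions, not part of the tool described in this paper.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Locale;

    public class EntryPointFilter {

        // Returns the indices of corpus sections that mention at least one question
        // keyword (item e, Get-Entry-Points); a stricter threshold or a ranking by
        // the hit count could equally be applied here.
        public static List<Integer> getEntryPoints(List<String> keywords, List<String> corpusSections) {
            List<Integer> entryPoints = new ArrayList<>();
            for (int i = 0; i < corpusSections.size(); i++) {
                String section = corpusSections.get(i).toLowerCase(Locale.ROOT);
                int hits = 0;
                for (String keyword : keywords) {
                    if (section.contains(keyword.toLowerCase(Locale.ROOT))) {
                        hits++;
                    }
                }
                if (hits > 0) {
                    entryPoints.add(i);
                }
            }
            return entryPoints;
        }

        public static void main(String[] args) {
            List<String> keywords = List.of("process", "scheduling");
            List<String> corpus = List.of(
                    "A process is a program in execution...",
                    "File systems organise data on disk...",
                    "Process scheduling decides which process runs next...");
            System.out.println(getEntryPoints(keywords, corpus));   // prints [0, 2]
        }
    }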
The algorithm for generating answers is as follows:

ALGORITHM Generate-Answer is
  INPUT:  Question Q with Keywords K,
          Text Corpus C as List of section fragments,
          Vocab Source V
  OUTPUT: Answer A comprising concatenated fragments

  E ← Get-Entry-Points(Q, C)
  CREATE an empty list Answer_Fragments of type String
  FOR i = 0 to E.size do
    Cur_Segment ← E[i]
    Seed_Sentences ← Get-Seed-Sentences(E[i], K, C)
    Section_Sentences ← Get-Section-Sentences(E[i], C)
    CREATE an empty list SectionWise_Fragments of type String
    FOR j = 0 to Seed_Sentences.size do
      Seed_Index ← Get-Seed-Index(Section_Sentences, Seed_Sentences[j])
      CREATE List Seed_Vocab of type String
      CREATE List Co_Occurring_NP of type String
      Seed_Vocab ← Get-Vocab(Get-Keywords(Section_Sentences[Seed_Index]), V)
      Left_Marker ← Seed_Index
      Right_Marker ← Seed_Index
      WHILE there exists a String from Seed_Vocab or Co_Occurring_NP
            in Section_Sentences[Left_Marker]
        Cur_Co_Occurring_NP ← Get-Co-Occurring-NP(Section_Sentences[Left_Marker])
        add Cur_Co_Occurring_NP to Co_Occurring_NP
        Left_Marker ← Left_Marker - 1
        IF Left_Marker = 0
          break from while loop
        END IF
      END WHILE
      WHILE there exists a String from Seed_Vocab or Co_Occurring_NP
            in Section_Sentences[Right_Marker]
        Cur_Co_Occurring_NP ← Get-Co-Occurring-NP(Section_Sentences[Right_Marker])
        add Cur_Co_Occurring_NP to Co_Occurring_NP
        Right_Marker ← Right_Marker + 1
        IF Right_Marker = Section_Sentences.size
          break from while loop
        END IF
      END WHILE
      INITIALIZE Cur_Frag to empty String
      FOR k = Left_Marker to Right_Marker
        Concatenate Section_Sentences[k] to Cur_Frag
      END FOR
      Add Cur_Frag to SectionWise_Fragments
    END FOR
    Remove duplicate sentences from SectionWise_Fragments
    INITIALIZE Cur_Section_Answer to empty String
    FOR j = 0 to SectionWise_Fragments.size do
      Concatenate SectionWise_Fragments[j] to Cur_Section_Answer
    END FOR
    Add Cur_Section_Answer to Answer_Fragments
  END FOR
  INITIALIZE A to empty String
  FOR i = 0 to Answer_Fragments.size
    Concatenate Answer_Fragments[i] to A
  END FOR
  RETURN A
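The pseudo code above is deliberately implementation-agnostic. As a concrete illustration of the fragment-extraction step, the following minimal Java sketch grows a fragment leftwards and rightwards from one seed sentence over the section's sentence list; the vocabulary and co-occurring-NP test is reduced here to a plain substring match, and the class, methods and sample data are illustrative assumptions rather than the authors' implementation.

    import java.util.List;

    public class FragmentExpansion {

        // True if the sentence mentions any term from the seed vocabulary; in the
        // full algorithm this test also covers co-occurring NPs found by anaphora
        // resolution (Get-Co-Occurring-NP).
        static boolean mentionsAny(String sentence, List<String> terms) {
            String s = sentence.toLowerCase();
            return terms.stream().anyMatch(t -> s.contains(t.toLowerCase()));
        }

        // Expands left and right from the seed sentence while neighbouring sentences
        // keep mentioning seed-vocabulary terms, then concatenates the covered
        // sentences into one answer fragment (cf. the Left_Marker / Right_Marker
        // loops in the pseudo code).
        static String expandFragment(List<String> sectionSentences, int seedIndex,
                                     List<String> seedVocab) {
            int left = seedIndex;
            int right = seedIndex;
            while (left - 1 >= 0 && mentionsAny(sectionSentences.get(left - 1), seedVocab)) {
                left--;
            }
            while (right + 1 < sectionSentences.size()
                    && mentionsAny(sectionSentences.get(right + 1), seedVocab)) {
                right++;
            }
            StringBuilder fragment = new StringBuilder();
            for (int k = left; k <= right; k++) {
                fragment.append(sectionSentences.get(k)).append(' ');
            }
            return fragment.toString().trim();
        }

        public static void main(String[] args) {
            List<String> section = List.of(
                    "Deadlock handling is an important OS topic.",
                    "A deadlock occurs when processes wait for each other's resources.",
                    "Deadlock prevention breaks one of the four necessary conditions.",
                    "File allocation tables are unrelated to this discussion.");
            List<String> seedVocab = List.of("deadlock", "resources");
            // Prints the first three sentences joined into one fragment.
            System.out.println(expandFragment(section, 1, seedVocab));
        }
    }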
4. FURTHER APPLICATIONS AND DEVELOPMENT
This tool is observed to provide answer fragments that compare fairly well with model answers prepared by human assessors. The algorithm presented here is capable of generating answers with high precision, depending on the vocabulary source, but other parameters such as context continuity and context span must be incorporated to limit the locality of context and obtain more accurate results with high recall. Software testing of the tool shows promising results in performing fair and unbiased evaluation of students' answer scripts. Combining this algorithm with a good answer evaluation approach can provide a robust answer evaluation feature for automating digital evaluation systems. Another field of application for this tool is the evaluation of online assignments at the institute level for analyzing students' appraisals on a continuous scale. Future upgrades of the tool include real-time answer generation for different types of subjective questions posed in a wide variety of grammatical styles and for versatile subject domains.

5. ACKNOWLEDGMENTS
This work was supported by the Research and Development Laboratory, Department of Computer Science and Engineering, Bhilai Institute of Technology, Durg, Chhattisgarh, India, and is awaiting sponsorship from suitable funding agencies.

6. REFERENCES
[1] Valenti, S., Neri, F. and Cucchiarelli, A. 2003. An Overview of Current Research on Automated Essay Grading. Journal of Information Technology Education (JITE), pp. 319-330.
[2] Christie, J. 1999. Assessment of Essay Marking - Focus on Style and Content. In Proceedings of the 3rd International Computer Assisted Assessment Conference (CAA), pp. 39-45.
[3] Diekema, R., Yilmazel, O. and Liddy, E. D. 2004. Minimal Ownership of Active Objects. In Proceedings of the 6th Asian Symposium on Programming Languages and Systems (APLAS 2008), Bangalore, pp. 139-154.
[4] Guruji, P. A., Pagnis, M. M., Pawar, S. M. and Kulkarni, P. J. 2015. Evaluation of Subjective Answers Using GLSA Enhanced with Contextual Synonymy. International Journal on Natural Language Computing (IJNLC), Vol. 4, No. 1, pp. 51-60.
[5] Tiedemann, J. 2005. Integrating Linguistic Knowledge in Passage Retrieval for Question Answering. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, Vancouver, British Columbia, Canada, pp. 939-946.